Low TPS with a powerful dedicated server [1.16.5 - Paper 468]

Hello,

We are having TPS drops (down to 16 or below) on our server once there are around 45+ players, running Paper build 468 (1.16.5).

The dedicated server is equipped with:

  • CPU: AMD Ryzen 9 5950x
  • RAM: 128GB
  • SSD: 3.84 TB NVMe

Here is the plugins list:

Here is a Spark Report when the TPS was low: https://spark.lucko.me/#DueHh8LO92

The world has been entirely pre-generated with FastChunkPregenerator (from -25000 to +25000).

There is Multiverse in the plugin list, but there is only one other world, and it is not currently used.

Here is an Aikar timings report, but taken while the TPS was fine: https://timings.aikar.co/?id=7ae3cc81f89c489295c33bfd9194ddff

We are using Aikar’s flags.

With Spark, we can see that the CPU usage is only at 7%.

What can I do to optimize the server?

Thank you very much :slight_smile:

Here is a timings report with 70+ players: https://timings.aikar.co/?id=6ccb6ce2810649439702c15fb058fc41

Could the lag come from Anti-Xray in Engine Mode 2?

I’m no expert, just a regular Java developer. Looking at the Spark report, PlayerChunkMap#forEachVisibleChunk seems to be the cause, so it is probably something related to view-distance or to chunks in general. No guarantees, though.
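If view-distance does turn out to be the issue, the relevant settings live in server.properties and spigot.yml. The values below are just the usual defaults, shown for orientation rather than as a recommendation:

```
# server.properties
view-distance=10

# spigot.yml
world-settings:
  default:
    view-distance: default   # "default" here means it falls back to server.properties
```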

Looking at the timings, it is either chunks or entities. The heaviest entity only took 6% of the tick, so it is probably chunks again.

Your hardware is good, and chunk loading in Paper is generally asynchronous, among other things. So either the world is corrupted, a plugin is doing synchronous chunk work (e.g. teleporting) without using PaperLib, you have a memory leak or a malicious (bitcoin-miner) plugin, or players are simply overloading your server (e.g. flying with elytra at high speed, loading chunks continuously, abusing exploits, etc.).

However, these are all assumptions.

There is no exact solution to lag. Both Spigot and Paper advertise the same thing: “High Performance Minecraft”. However, even when Paper does a good job, plugins and other factors complicate things.

I can only give you some recommendations. You should NOT apply them blindly and expect them to solve the issue magically; take backups and apply them at your own responsibility. These are just suggestions of mine to try out.

Just try them, see if they solve the issue, and revert if you do not see any improvement.


a) Remove the extra arguments “-Duser.language=en -Duser.region=US -Dpaper.playerconnection.keepalive=120”

I assume you added the first two to work around some plugins’ bad locale behaviour. Plugins should use locale-aware String methods themselves, and you should not have to set those.

The last one is the most dangerous of the three. The default value for the keep-alive timeout is 20; Paper raised it to 30 so that players on a reasonable network connection are less likely to be kicked with a timeout.

However, values above 30 cause bugs, such as players who quit not actually disconnecting until some time later instead of instantly. This can lead to issues like a player being detected as online even though they are not.
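For illustration only, the idea is simply to drop those three -D properties and keep everything else. The heap size and jar name below are placeholders, and the flags shown are the standard Aikar set rather than your exact line:

```
# Sketch of a cleaned-up startup line. 20G and paper-468.jar are placeholders;
# keep your real heap size, jar name and current flag values, and only remove
# -Duser.language, -Duser.region and -Dpaper.playerconnection.keepalive.
java -Xms20G -Xmx20G \
  -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 \
  -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC -XX:+AlwaysPreTouch \
  -XX:G1NewSizePercent=30 -XX:G1MaxNewSizePercent=40 -XX:G1HeapRegionSize=8M \
  -XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 \
  -XX:InitiatingHeapOccupancyPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 \
  -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 \
  -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 \
  -jar paper-468.jar nogui
```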


b) Tweak MaxGCPauseMillis

Aikar’s Java flags are the best (or most stable) flag collection out there, coming straight from a Paper developer, but they can still be tuned. Since your heap is very large and your player count is not that high, you can tune the GC for the lowest pause times.

MaxGCPauseMillis sets a target for the maximum time spent in “stop-the-world” GC events; these events pause the server so memory can be reclaimed. Even a large heap does not mean GC will not happen. Garbage collection is a core part of the Java runtime, and it is required to make things work.

Even when the young generation (the G1NewSizePercent and G1MaxNewSizePercent arguments) is configured to a high percentage, some things such as method locals, soft references, etc. still have to be collected, and those collections cause minor spikes.

What counts as a minor spike depends on context, though. For Minecraft, a freeze/spike longer than 50 milliseconds lowers TPS and skips ticks.

Aikar’s flags and the Java developers use 200 as the default. However, I’d recommend setting it to 49 so the GC at least tries not to skip any ticks.

This value is not an absolute maximum; Java will exceed it if the alternative would be, for example, an OutOfMemoryError.
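In practice this is a one-flag change in the startup line, for example:

```
# Change only the pause-time target; keep the rest of the flags as they are.
-XX:MaxGCPauseMillis=49   # down from the default 200, just under the 50 ms tick budget
```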


c) Tweak G1HeapRegionSize

I do not know much about this argument, but from the online documentation it is the size of each region of the heap, and the heap is split into many regions, so setting it higher gets you more memory usage (or wasted memory, if a region does not use all of the space specified) in exchange for performance (in my understanding).

It is 8M in the default Aikar flags and 16M in the Aikar flags for heaps larger than 12GB. However, it is 32M in the default Minecraft client settings, so I’d recommend trying 32M to see if it changes anything.
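Again, a single flag to swap in the startup line, for example:

```
# Try the client's region size instead of the 8M/16M from Aikar's flags.
-XX:G1HeapRegionSize=32M
```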


d) (Advanced) Upgrade your Linux Kernel or Java Version

This probably will not solve the problem directly, since most performance and security patches get backported to older versions, and you are most likely on LTS releases anyway.

However, upgrading to a newer Linux kernel (5.x) or a newer Java release such as Java 15 may still improve performance.
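Before deciding whether an upgrade is worthwhile, check what you are actually running, e.g.:

```
uname -r        # current Linux kernel version
java -version   # current JVM vendor and version
```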


I can also recommend asking for help in the Paper Discord; these community forums are not very active.

Also check the logs for any spammed errors/warnings and for watchdog dumps. Those provide stack traces you can use to investigate what caused the spikes/freezes, which are the most common causes of TPS drops.
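A rough way to spot this, assuming the default log location (logs/latest.log):

```
grep -c "Can't keep up" logs/latest.log     # how many times the server skipped ticks
grep -n "Watchdog" logs/latest.log | tail   # last watchdog dumps, if any
grep -ci "ERROR" logs/latest.log            # overall error count; open the file if this is high
```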

A Minecraft server has to tick 20 times per second, and it constantly tries to do so. When you type /tps, if it says *20, your TPS was effectively above 20 but the server clamped it to 20 so as not to break any game mechanics. If it is a plain 20, the server is doing its best to keep latency as low as possible.

If it drops below 20, lag will start.

There is a different metric, MSPT, which is milliseconds per tick: the average time a tick takes. If a tick takes more than 50 milliseconds, the server has to skip ticks to catch up on TPS.
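For example, if the average tick takes 62.5 ms, the server can only fit 1000 / 62.5 = 16 ticks into a second, so TPS drops to 16.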

So, a low tick time and a high TPS essentially mean no lag, apart from ping, minor GC spikes, or plugins doing background work.

Thank you so much for these clear explanations :smiley:

I’ve done the recommended modifications (except the kernel and Java updates), and they have improved the TPS a little bit, but we still have the chunk issue (ChunkProviderServer.tick() taking 42.14%).
How is it possible, knowing that all the chunks are already generated? I’ve also changed Anti-Xray to Engine Mode 1.

Here is a new Spark profile: https://spark.lucko.me/#yn4MqI456e
And here is a new Aikar timings report: https://timings.aikar.co/?id=f13567c6b3be4521a1b896084e38a096

I’ll also try to talk about this in Discord.

Thank you :slight_smile:

“How is it possible, knowing that all the chunks are generated?” Well, even if you pre-generate chunks, they still have to be loaded from disk into memory. Pre-generating only removes the overhead of world/chunk/region generation; chunks are still loaded from disk and saved back to it.

However, your problem is most probably not loading them but ticking them, i.e. updating the entities in them, etc.

After looking at your configs, I found that your paper.yml has delay-chunk-unloads-by: 60s. This setting delays all chunk unloads by 60 seconds, even when the chunks are no longer needed. The default is 10s, and Purpur/Tuinity default to 5s as far as I remember. I recommend trying the default value: since entities/tile entities keep ticking as long as the chunk they are in stays loaded, this may be the culprit.
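In paper.yml that would look something like this (per-world sections inherit from world-settings.default):

```
world-settings:
  default:
    delay-chunk-unloads-by: 10s   # back to the default; your config currently has 60s
```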