Heavy memory usage difference between 0.24 and 0.25 #480
Hi, I noticed this as well when upgrading axum. I think the reason for this is #468. Can you retry and set […]
Can you […]
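The two comments above are cut off here, but for reference, this is roughly where such a buffer-size knob is applied on the server side. A minimal sketch, assuming a tungstenite release whose `WebSocketConfig` exposes a `read_buffer_size` field (availability and exact field names vary by version, so treat this as illustrative rather than as the thread's exact suggestion):

```rust
use tokio::net::TcpListener;
use tokio_tungstenite::{accept_async_with_config, tungstenite::protocol::WebSocketConfig};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9001").await?;

    // Hypothetical: shrink the per-connection read buffer. Check the
    // WebSocketConfig docs of the tungstenite release you actually use;
    // read_buffer_size is only present in versions that expose a
    // configurable read buffer.
    let mut config = WebSocketConfig::default();
    config.read_buffer_size = 16 * 1024;

    while let Ok((stream, _addr)) = listener.accept().await {
        let config = config.clone();
        tokio::spawn(async move {
            if let Ok(_ws) = accept_async_with_config(stream, Some(config)).await {
                // echo / application logic goes here
            }
        });
    }
    Ok(())
}
```

With tens of thousands of mostly idle connections, this per-connection buffer is what dominates resident memory, which is why the setting matters far more for servers than for single-connection clients.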
I don't quite understand why the memory trade-off for 3% of speed is worth it in #468...
I can't reproduce the issue with the […]
Did you try to use 50k sockets for […]?

The actual "speed vs memory" trade-off is not as simple as it seems. Speed is not just speed; it also determines the number of concurrent connections needed. Being faster effectively means having fewer clients in the wait queue and, for many applications, this is equivalent to fewer connections in parallel. 1 GB of RAM out of 64 GB (1.5% of RAM) for 3% more performance is a fair trade. On the other hand, many applications do not benefit at all from the increased buffer size due to the nature of their communication. They just waste RAM and get nothing in exchange.

TL;DR: there is no "one-size-fits-all" buffer size. The configuration depends on the actual requirements.
@Zarathustra2 is right in assuming that it was the increase in the size of the read buffer that caused the higher memory consumption; however, it's not the linked PR that caused such a stark difference. The aforementioned PR only changed it from 64 KiB to 128 KiB. The real difference was introduced earlier, when our internal custom […]
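For a rough sense of scale, an eagerly allocated per-connection read buffer accounts for most of the difference on its own. A back-of-envelope calculation (not from the thread, just arithmetic over the buffer sizes discussed here):

```rust
// Rough totals if every connection eagerly allocates its read buffer.
// 10_000 sockets matches the benchmark in the issue; the buffer sizes are
// the ones mentioned in this thread (16 KiB proposal, 64 KiB old, 128 KiB new).
fn main() {
    const CONNECTIONS: u64 = 10_000;
    for buf_kib in [16u64, 64, 128] {
        let total_mib = CONNECTIONS * buf_kib / 1024;
        println!("{buf_kib:>3} KiB per connection -> ~{total_mib} MiB across {CONNECTIONS} sockets");
    }
}
```

At 128 KiB per connection that is on the order of 1.2 GiB for 10K sockets, the same ballpark as the ~908 MB observed with 0.25.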
@Totodore, so once you set […]
That particular function should not appear in the stack trace at all if we're talking about the k6 configuration that is available in your repository. I suppose the screenshot refers to a different benchmark (I've noticed that the filename in your README does not match the config name in the repository).

PS: I agree that our current default read buffer configuration might be too large for a general use case. I believe the buffer size was set to such a high value due to our benchmarks that showed better results with larger buffer sizes. Perhaps we should tune it down to 16 KiB or something like this.
Thanks for the details, I understand this much better now. The call traces are generated with the k6 client provided in the repo (there is just a typo in the readme). I don't understand why they should not be visible; the k6 client is doing TCP calls in the same way as a classic ws implementation, no? Even if the […]
Right. However, the config in the repository lets each client connection send a single short message. Such a message would perfectly fit in the read buffer without needing an additional re-allocation inside the […]
I think we can improve the […]

Here is a comparison of different read buffer sizes for the 100k small-message read bench: […]
128k is maybe a good default for users with lower numbers of ws connections (common for client usage?), as the performance improvement is desirable and the absolute extra memory required isn't much when it isn't multiplied across many connections. But it is perhaps not as good for users expecting many connections, as is common for servers. I suppose it is a poor default for write-heavy servers that don't expect to read much at all. Some potential actions: […]
Note that the write buffer setting has less impact, as its capacity is used more lazily and may not be fully allocated when flushing eagerly, whereas the read buffer is fully allocated on init so it is available for reading into.
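To illustrate that last point in isolation, here is a small analogy using `bytes::BytesMut` directly (not tungstenite's internal types), showing eager versus lazy capacity:

```rust
use bytes::{BufMut, BytesMut};

fn main() {
    // A read-style buffer is typically sized up front so a read() call has
    // somewhere to land: the full capacity is allocated immediately.
    let read_buf = BytesMut::with_capacity(128 * 1024);
    println!("read buffer capacity up front: {} bytes", read_buf.capacity());

    // A write-style buffer that is flushed eagerly only ever holds the bytes
    // queued between flushes, so its allocation can stay small.
    let mut write_buf = BytesMut::new();
    write_buf.put_slice(b"ping");
    println!("write buffer capacity after one small message: {} bytes", write_buf.capacity());
}
```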
One of the users of socketioxide, a socket.io implementation built on tokio-tungstenite, reported really high memory usage (around 3 GB for 50K sockets) when benchmarking a basic ping/pong socket.io server.
I checked a bit and it appears that there is a big difference between the 0.24 and 0.25 versions of tungstenite/tokio-tungstenite. I took the official echo example and benchmarked it with 10K sockets echoing messages.
Here is the repository with all the details and results: https://github.com/Totodore/tokio-tungstenite-mem-usage
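For context, the benchmark server is essentially the stock echo setup; a condensed sketch of what that looks like (simplified from the official tokio-tungstenite echo example, error handling trimmed, so see the linked repository for the exact code):

```rust
use futures_util::{SinkExt, StreamExt};
use tokio::net::TcpListener;
use tokio_tungstenite::accept_async;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("127.0.0.1:8080").await.expect("bind");
    while let Ok((stream, _)) = listener.accept().await {
        tokio::spawn(async move {
            let mut ws = accept_async(stream).await.expect("handshake");
            // Echo every text/binary message back until the connection closes.
            while let Some(Ok(msg)) = ws.next().await {
                if msg.is_text() || msg.is_binary() {
                    if ws.send(msg).await.is_err() {
                        break;
                    }
                }
            }
        });
    }
}
```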
Recap:
- tokio-tungstenite 0.24.0: 106.1 MB of RAM
- tokio-tungstenite 0.25.0: 908 MB of RAM
Tokio-tungstenite 0.25 eats 908 MB of RAM vs 106.1 MB for 0.24!
Possible causes
0.25 introduces the `Bytes` struct in the `Message` API. As it shares data directly received from the internal socket, I suspect that the cause is around this feature. According to the heaptrack trace, it comes from the `FrameCodec` `in_buffer`, which seems to grow if it doesn't have enough space. But no reference to the message content is kept, so the `FrameCodec` should not grow that much.

I also tried without echoing anything and the results are the same: