I’m having a hard time reaching higher levels of requests per second, and I’m looking to understand which factors limit performance so I can decide whether it’s worth investing in hardware or software changes.
Using the guides I can find online, I can get around 50,000 to 80,000 requests per second for a trivial static request on localhost; with default settings, the results are about 10 times slower. This is pretty good, but it seems low compared to what the techempower.com benchmarks show. How can they report servers that are 50 times faster on comparable hardware? I don’t get those results when I test some of them myself. It seems like the systems I’m testing have bottlenecks outside the process that I don’t know how to tune, or else the network card speed matters more than everything else.
The change below seems to matter most (I made the equivalent change in the Windows registry as well). It allows many more ephemeral ports to be used:
net.ipv4.ip_local_port_range = 2000 65000
I also raised the open file descriptor limit with ulimit -n 16384.
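For completeness, this is roughly how I applied those two settings on the Linux boxes (the file names under /etc are the standard sysctl and PAM limits locations; adjust as needed):

```shell
# Apply immediately (as root); lost on reboot:
sysctl -w net.ipv4.ip_local_port_range="2000 65000"
ulimit -n 16384   # only affects the current shell and its children

# Persist across reboots:
echo 'net.ipv4.ip_local_port_range = 2000 65000' >> /etc/sysctl.d/99-benchmark.conf
echo '* soft nofile 16384' >> /etc/security/limits.conf
echo '* hard nofile 16384' >> /etc/security/limits.conf
```

Note that services started by systemd don’t read limits.conf; they need LimitNOFILE= in the unit file instead.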
There are several other settings I tried that don’t seem to help as much.
Is it possible to use even more ports than that on the same system by adding separate IP addresses? I wasn’t able to demonstrate a benefit in VirtualBox, but maybe it holds on a dedicated server with multiple NICs? I understand the raw throughput would be higher, but I’m not sure whether 2 or 3 NICs would effectively double or triple the requests per second, or whether this is more of an operating system limit.
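To be concrete about the experiment I mean: each TCP connection is identified by its (source IP, source port, destination IP, destination port) 4-tuple, so pointing clients at a second destination address gives them a fresh ephemeral-port space even on one NIC. A rough sketch (eth0, the 10.0.0.2 address, and port 8080 are placeholders for my setup):

```shell
# Add a second address to the interface (assuming it is eth0):
ip addr add 10.0.0.2/24 dev eth0

# Point part of the load at each address; connections to the two
# destinations draw from independent ephemeral-port ranges:
wrk -t2 -c100 -d15s http://10.0.0.1:8080/ &
wrk -t2 -c100 -d15s http://10.0.0.2:8080/ &
wait
```

This only removes the per-destination port limit; whether it raises total requests per second is exactly what I couldn’t verify.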
My realistic C++ application has been optimized and can process up to 2 million internal requests per second, and I’m trying to find the most efficient way to handle the network side so I preserve as much of that speed as possible. If I just put the application behind Nginx, it slows down to 80,000 requests per second or lower, which is still good, but I’ve been optimizing each area and I don’t know whether to settle for that or dive deeper. I’d like to hold on to as much performance as possible, even if I have to implement the socket handling myself. That said, if I can’t verify that a faster setup is possible, I don’t really want to write my own socket layer, since relying on other people to get HTTP/2 and TLS right is preferable to me.
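To clarify what I mean by implementing the socket handling myself, this is the kind of thing I’m considering: a bare epoll loop (Linux only) that answers every connection with a fixed response. This is only a sketch of the idea, not production code — no real HTTP parsing, no partial-write handling, and it assumes each request arrives in a single read:

```cpp
// Minimal single-threaded epoll responder sketch (Linux).
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

static const char kResponse[] =
    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: keep-alive\r\n\r\nok";

static void set_nonblocking(int fd) {
  fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

// Create a non-blocking listener on 127.0.0.1:port; returns -1 on failure.
int make_listener(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  int one = 1;
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) return -1;
  listen(fd, SOMAXCONN);
  set_nonblocking(fd);
  return fd;
}

// Serve until max_requests responses have been written. A real benchmark
// run would loop forever instead of counting.
void event_loop(int listen_fd, long max_requests) {
  int ep = epoll_create1(0);
  epoll_event ev{};
  ev.events = EPOLLIN;
  ev.data.fd = listen_fd;
  epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);
  epoll_event events[64];
  char buf[4096];
  long served = 0;
  while (served < max_requests) {
    int n = epoll_wait(ep, events, 64, -1);
    for (int i = 0; i < n; ++i) {
      int fd = events[i].data.fd;
      if (fd == listen_fd) {
        // Drain the accept queue; accept() returns -1 (EAGAIN) when empty.
        int client;
        while ((client = accept(listen_fd, nullptr, nullptr)) >= 0) {
          set_nonblocking(client);
          epoll_event cev{};
          cev.events = EPOLLIN;
          cev.data.fd = client;
          epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
        }
      } else {
        ssize_t r = read(fd, buf, sizeof(buf));
        if (r <= 0) { close(fd); continue; }  // peer closed, or error
        // Assumes the whole request fit in one read -- fine for tiny
        // benchmark requests, wrong for real traffic.
        write(fd, kResponse, sizeof(kResponse) - 1);
        ++served;
      }
    }
  }
  close(ep);
}
```

The open question for me is whether this buys anything over Nginx on the same box, or whether both hit the same kernel/hardware ceiling.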
I’ve been configuring and benchmarking nginx, openlitespeed, libreactor, and Java AsynchronousServerSocketChannel, and I get nearly the same “best case” results on each one once they are optimized. I expected them to differ much more, closer to the relationship they show in the techempower benchmarks, but it seems like the operating system or hardware is limiting them all.
I’ve been testing on several systems I have access to: Windows, Ubuntu in VirtualBox, Ubuntu on a dedicated server with a Xeon Gold CPU, and a DigitalOcean droplet with just 2 shared CPUs. None of them have 10GbE; I think they are all 1GbE. They all give similar results. I’d expect the Xeon Gold on Linux to be the best, but I’ve never seen any benchmark on these systems get anywhere close to what the techempower.com web server benchmarks show.
I’m also not sure whether network card speed is a factor for localhost performance, but I usually test on localhost without TLS to see the fastest possible result. I’ve used multiple instances of ab, or a single instance of wrk, and generally a concurrency of 100 to 200 gives the best result.
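For reference, these are the kinds of invocations I mean (port 8080 stands in for whatever the server under test is listening on):

```shell
# wrk: 4 threads, 200 open keep-alive connections, 15 second run
wrk -t4 -c200 -d15s http://127.0.0.1:8080/

# ab is single-threaded, so I run several instances in parallel;
# -k enables keep-alive, which matters a lot at these rates
ab -k -c 200 -n 1000000 http://127.0.0.1:8080/
```

Without -k, ab opens a new connection per request and the ephemeral-port settings above become the bottleneck much sooner.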
Are higher speeds mainly a factor of the network card, or is there more configuration I can do? I’m currently using systems with 1GbE connections because that was practical and sufficient. Where would I get the most improvement: the network card, the CPU, some system configuration, or something else?
I also don’t know whether localhost is stuck at the speed of the NIC or whether it can run at the speed of the CPU.