elastic beanstalk – Nginx + uWSGI + Flask Connection Reset

Problem

I have a Flask app deployed using Elastic Beanstalk’s “Single Container Docker” platform (latest revision 3.2.5 at the time of writing), with an “Application Load Balancer” in front of it. I had this same Flask app deployed in EB with the “Python 3.6” platform (and a “Classic Load Balancer”) for ages, and the issues only started after I moved to the new deployment. I am a relative novice when it comes to configuring Nginx / uWSGI, so bear with me…

Specific Issue

I see the following errors in my Nginx error.log file on ~0.01% of the requests my environment handles:

<timestamp> (error) 22400#0: *52192 upstream prematurely closed connection while reading response header from upstream, client: <ip>, server: , request: "POST <endpoint> HTTP/1.1", upstream: "http://<ip>:5000/<endpoint>", host: "<my hostname>"
...
<timestamp> (error) 22400#0: *101979 readv() failed (104: Connection reset by peer) while reading upstream, client: <ip>, server: , request: "POST <endpoint> HTTP/1.1", upstream: "http://<ip>:5000/<endpoint>", host: "<my hostname>"

I see these errors across requests to different endpoints, using different HTTP methods (GET and POST), and at seemingly random times. I also do not see any application errors in my Flask app logs, which suggests that this is a configuration issue rather than an application one.

Discussion

I ended up reading and trying a lot of stuff, so I’ll recount my experience for posterity. The answer I arrived at seems so simple that I’m still not convinced I’ve got it right.

From the reading I’ve done, this sounded like a pretty straightforward issue with some misconfigured timeouts between Nginx + uWSGI. I was encouraged after reading this post which describes almost my exact situation with Elastic Beanstalk.

Part 1: Semi-Random Flailing

In the numerous and varied answers on this post I found some things to try:

  1. I tried setting the uWSGI parameter post-buffering = 32768, as several people suggested. It did not help, which makes sense: the setting applies only to requests with a body, and I had been seeing the errors on GET requests as well.
  2. I tried playing with Nginx’s keepalive + keepalive_timeout and uWSGI’s so-keepalive, http-timeout, and socket-timeout.

I realized from reading the docs that these uWSGI settings definitely weren’t going to help, although I held out hope for so-keepalive.

At this point I did notice a fairly significant drop in the frequency of these errors, but they did not go away altogether. Like a bad engineer, I changed multiple variables at once in some of these trials, so it’s hard to know exactly what helped. I suspect the improvement came from setting Nginx’s keepalive to a value no greater than the maximum number of connections reported in the uWSGI log (100 connections). Anyone else’s insight on that one is welcome, although there’s not much to go on…
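For anyone who wants to repeat that experiment one variable at a time, here is a minimal sketch of the Nginx side. It is not my exact config: the timeout values are illustrative, and I am assuming (not confirming) that the Elastic Beanstalk location block already sets the HTTP/1.1 and Connection lines, which the Nginx docs say upstream keepalive needs for HTTP backends. Also note that a keepalive_timeout inside an upstream block is only available from Nginx 1.15.3 on.

```nginx
http {
    # Client-facing idle timeout (here the client is the load balancer); illustrative value.
    keepalive_timeout 65s;

    upstream docker {
        server <some ip>:5000;
        # Idle upstream connections cached per worker process.
        # 100 is the connection limit reported in the uWSGI log.
        keepalive 100;
        # Idle timeout for cached upstream connections (Nginx 1.15.3+); illustrative value.
        keepalive_timeout 30s;
    }

    server {
        location / {
            proxy_pass http://docker;
            # Upstream keepalive only takes effect with HTTP/1.1 and a cleared Connection header.
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
```

Per the Nginx docs, keepalive caps the number of idle connections kept in each worker's cache, not the total number of connections Nginx may open to the upstream, so matching it to the uWSGI limit is at best a rough heuristic.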

Part 2: A fix, I think…

I decided to try overriding the default upstream definition Elastic Beanstalk puts into the Nginx config. The original looked like this:

upstream docker {
    server <some ip>:5000;
    keepalive 256;
}

All I did was replace this with my own upstream that simply does not set the keepalive directive, and change the Nginx location to point at it. Like so:

upstream myapp {
    server <some ip>:5000;
}
...
location / {
    # proxy_pass http://docker;
    proxy_pass http://myapp;
}

This seems to work… Since putting in the change I have seen basically zero 5xx errors in my Elastic Beanstalk environment. It also seems to be corroborated by this answer, which mentions:

… a uwsgi connection isn’t reusable and so it gets closed and a new connection opened with every request; it wouldn’t remain idle for any significant amount of time in normal operation.

I’m not sure where that is documented, but I didn’t notice it when reading about using uWSGI + Nginx. If that’s accurate, it certainly explains a lot.
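If that quote is right, one way to make the no-reuse behaviour explicit rather than implied is to spell out what the Nginx docs list as the proxy defaults: HTTP/1.0 to the upstream and a Connection: close header. This is a sketch under that assumption, not something I have verified against the generated Elastic Beanstalk config, which may set these directives differently elsewhere.

```nginx
upstream myapp {
    server <some ip>:5000;
    # No keepalive directive: Nginx does not cache upstream connections,
    # so each request gets a fresh connection to uWSGI.
}

server {
    location / {
        proxy_pass http://myapp;
        # Documented Nginx defaults, written out so nothing inherited can change them.
        proxy_http_version 1.0;
        proxy_set_header Connection close;
    }
}
```

The trade-off is one TCP handshake per request between Nginx and uWSGI, which should be cheap when both run on the same host.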

Conclusion / Help?

I’m really glad I was able to figure this one out and the API seems to be working really well, but I can’t shake the feeling that either I don’t understand why this works or I’ve committed some grave sin with this configuration.

It felt a bit cumbersome to override this stuff in Elastic Beanstalk, which makes me think I shouldn’t have. Given the popularity of uWSGI for Python web apps, my spidey-sense is telling me that there should already be numerous posts about this Nginx keepalive playing poorly with uWSGI, especially since it is in the default configuration for this Elastic Beanstalk platform.

If you’ve read this far and know things, feel free to weigh in on the situation. Hopefully, at the very least, the next person to see those errors in their Nginx logs has another data point as to what the problem could be.