We are running nginx and the upstream on the same machine using Docker, so
there is no firewall between them.
I ran a test locally and captured the network packets.
For the normal requests, the upstream sends a [FIN, ACK] to nginx after the
keep-alive timeout (500 ms), nginx sends a [FIN, ACK] back, and the
upstream then sends an [ACK] to close the connection completely.
No. Time                       Source     Destination Protocol Length Info
1   2017-11-12 17:11:04.299146 172.18.0.3 172.18.0.2  TCP      74     48528 → 8000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=32031305 TSecr=0 WS=128
2   2017-11-12 17:11:04.299171 172.18.0.2 172.18.0.3  TCP      74     8000 → 48528 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=32031305 TSecr=32031305 WS=128
3   2017-11-12 17:11:04.299194 172.18.0.3 172.18.0.2  TCP      66     48528 → 8000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=32031305 TSecr=32031305
4   2017-11-12 17:11:04.299259 172.18.0.3 172.18.0.2  HTTP     241    GET /_healthcheck HTTP/1.1
5   2017-11-12 17:11:04.299267 172.18.0.2 172.18.0.3  TCP      66     8000 → 48528 [ACK] Seq=1 Ack=176 Win=30080 Len=0 TSval=32031305 TSecr=32031305
6   2017-11-12 17:11:04.299809 172.18.0.2 172.18.0.3  HTTP     271    HTTP/1.1 200 OK (text/html)
7   2017-11-12 17:11:04.299852 172.18.0.3 172.18.0.2  TCP      66     48528 → 8000 [ACK] Seq=176 Ack=206 Win=30336 Len=0 TSval=32031305 TSecr=32031305
8   2017-11-12 17:11:04.800805 172.18.0.2 172.18.0.3  TCP      66     8000 → 48528 [FIN, ACK] Seq=206 Ack=176 Win=30080 Len=0 TSval=32031355 TSecr=32031305
9   2017-11-12 17:11:04.801120 172.18.0.3 172.18.0.2  TCP      66     48528 → 8000 [FIN, ACK] Seq=176 Ack=207 Win=30336 Len=0 TSval=32031355 TSecr=32031355
10  2017-11-12 17:11:04.801151 172.18.0.2 172.18.0.3  TCP      66     8000 → 48528 [ACK] Seq=207 Ack=177 Win=30080 Len=0 TSval=32031355 TSecr=32031355
For the failed requests, the upstream received a new HTTP request just as
it had closed the connection after the keep-alive timeout (500 ms), but
before it had a chance to send the [FIN] packet. Because the connection was
already closed from the upstream's perspective, it sent a [RST] in response
to that request.
No.  Time                       Source     Destination Protocol Length Info
433  2017-11-12 17:11:26.548449 172.18.0.3 172.18.0.2  TCP      74     48702 → 8000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=32033530 TSecr=0 WS=128
434  2017-11-12 17:11:26.548476 172.18.0.2 172.18.0.3  TCP      74     8000 → 48702 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=32033530 TSecr=32033530 WS=128
435  2017-11-12 17:11:26.548502 172.18.0.3 172.18.0.2  TCP      66     48702 → 8000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=32033530 TSecr=32033530
436  2017-11-12 17:11:26.548609 172.18.0.3 172.18.0.2  HTTP     241    GET /_healthcheck HTTP/1.1
437  2017-11-12 17:11:26.548618 172.18.0.2 172.18.0.3  TCP      66     8000 → 48702 [ACK] Seq=1 Ack=176 Win=30080 Len=0 TSval=32033530 TSecr=32033530
438  2017-11-12 17:11:26.549173 172.18.0.2 172.18.0.3  HTTP     271    HTTP/1.1 200 OK (text/html)
439  2017-11-12 17:11:26.549230 172.18.0.3 172.18.0.2  TCP      66     48702 → 8000 [ACK] Seq=176 Ack=206 Win=30336 Len=0 TSval=32033530 TSecr=32033530
440  2017-11-12 17:11:27.049668 172.18.0.3 172.18.0.2  HTTP     241    GET /_healthcheck HTTP/1.1
441  2017-11-12 17:11:27.050324 172.18.0.2 172.18.0.3  HTTP     271    HTTP/1.1 200 OK (text/html)
442  2017-11-12 17:11:27.050378 172.18.0.3 172.18.0.2  TCP      66     48702 → 8000 [ACK] Seq=351 Ack=411 Win=31360 Len=0 TSval=32033580 TSecr=32033580
443  2017-11-12 17:11:27.551182 172.18.0.3 172.18.0.2  HTTP     241    GET /_healthcheck HTTP/1.1
444  2017-11-12 17:11:27.551294 172.18.0.2 172.18.0.3  TCP      66     8000 → 48702 [RST, ACK] Seq=411 Ack=526 Win=32256 Len=0 TSval=32033630 TSecr=32033630
When nginx receives the [RST] packet, it logs a "Connection reset"
error.
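For context, the nginx side is a plain proxy with upstream keepalive
enabled. A minimal sketch of such a configuration (the upstream name and
addresses here are placeholders, not the exact config from my test):

```nginx
# Hypothetical minimal config; the upstream name is a placeholder.
upstream nodejs_backend {
    server 172.18.0.2:8000;
    keepalive 16;                        # idle keep-alive connections to cache
}

server {
    listen 80;

    location / {
        proxy_pass http://nodejs_backend;
        proxy_http_version 1.1;          # upstream keep-alive needs HTTP/1.1
        proxy_set_header Connection "";  # clear the "Connection: close" header
    }
}
```

With `keepalive` enabled, nginx reuses idle connections to the backend,
which is exactly the window in which the backend's 500 ms timeout can race
with a newly forwarded request.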
I tested with the following environment:
Upstream (Node.js server):
- Set keep-alive timeout to 500 ms
Test client:
- Keeps sending requests at an interval
- The interval starts at 500 ms and decreases by 0.1 ms after each request
For a more detailed description of the test process, see my
post at:
https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/
To fix the issue, I tried adding a timeout for kept-alive upstream
connections; you can check the patch at:
https://github.com/weixu365/nginx/blob/docker-1.13.6/docker/stretch/patches/01-http-upstream-keepalive-timeout.patch
The patch is against my current test build, and I can rework it into a
different format if you need.
Regards
Wei Xu
Post by Maxim Dounin
Hello!
Post by Wei Xu
Hi,
I saw there's an issue talking about "implement keepalive timeout for
upstream" <https://trac.nginx.org/nginx/ticket/1170>.
I have a different scenario for this requirement.
I'm using a Node.js web server as the upstream, with the keep-alive timeout
set to 60 seconds in the Node.js server. The problem is that I found more
than a hundred "Connection reset by peer" errors every day.
Because there were no errors on the Node.js side, I guessed it was because
the upstream had disconnected and, at the same time, nginx sent a new
request and then received a TCP RST.
Could you please trace what actually happens on the network level
to confirm the guess is correct?
Also, please check that there are no stateful firewalls between
nginx and the backend. A firewall which drops the state before
the timeout expires looks like a much more likely cause of such
errors.
--
Maxim Dounin
http://mdounin.ru/
_______________________________________________
nginx-devel mailing list
http://mailman.nginx.org/mailman/listinfo/nginx-devel