Discussion:
[module] Add http upstream keep alive timeout parameter
Wei Xu
2017-11-02 09:41:16 UTC
Hi,
I saw there's an existing issue about "implement keepalive timeout for
upstream" <https://trac.nginx.org/nginx/ticket/1170>.

I have a different scenario for this requirement.

I'm using a Node.js web server as the upstream and set its keep-alive
timeout to 60 seconds. The problem is that I see more than a hundred
"Connection reset by peer" errors every day.

Because there are no errors on the Node.js side, my guess is that the
upstream had closed the connection while, at the same time, nginx sent a
new request on it and then received a TCP RST.

I tried Tengine <http://tengine.taobao.org/>, a fork of nginx maintained
by Taobao, set its upstream keepalive timeout to 30s, and after that there
were no more errors.

So I'd like to know: is there any plan to work on this enhancement, or can
I submit a patch for it?



Best Regards
Wei Xu
Maxim Dounin
2017-11-09 17:07:02 UTC
Hello!
Post by Wei Xu
Hi,
I saw there's an existing issue about "implement keepalive timeout for
upstream" <https://trac.nginx.org/nginx/ticket/1170>.
I have a different scenario for this requirement.
I'm using a Node.js web server as the upstream and set its keep-alive
timeout to 60 seconds. The problem is that I see more than a hundred
"Connection reset by peer" errors every day.
Because there are no errors on the Node.js side, my guess is that the
upstream had closed the connection while, at the same time, nginx sent a
new request on it and then received a TCP RST.
Could you please trace what actually happens on the network level
to confirm the guess is correct?

Also, please check that there are no stateful firewalls between
nginx and the backend. A firewall which drops the state before
the timeout expires looks like a much more likely cause for such
errors.
--
Maxim Dounin
http://mdounin.ru/
Wei Xu
2017-11-12 12:25:20 UTC
We are running nginx and the upstream on the same machine using Docker, so
there's no firewall.

I did a test locally and captured the network packets.

For the normal requests, the upstream sends a [FIN, ACK] to nginx after the
keep-alive timeout (500 ms), nginx sends a [FIN, ACK] back, and then the
upstream sends an [ACK] to close the connection completely.
No.  Time                        Source      Destination  Protocol  Length  Info
1    2017-11-12 17:11:04.299146  172.18.0.3  172.18.0.2   TCP       74      48528 → 8000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=32031305 TSecr=0 WS=128
2    2017-11-12 17:11:04.299171  172.18.0.2  172.18.0.3   TCP       74      8000 → 48528 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=32031305 TSecr=32031305 WS=128
3    2017-11-12 17:11:04.299194  172.18.0.3  172.18.0.2   TCP       66      48528 → 8000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=32031305 TSecr=32031305
4    2017-11-12 17:11:04.299259  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
5    2017-11-12 17:11:04.299267  172.18.0.2  172.18.0.3   TCP       66      8000 → 48528 [ACK] Seq=1 Ack=176 Win=30080 Len=0 TSval=32031305 TSecr=32031305
6    2017-11-12 17:11:04.299809  172.18.0.2  172.18.0.3   HTTP      271     HTTP/1.1 200 OK (text/html)
7    2017-11-12 17:11:04.299852  172.18.0.3  172.18.0.2   TCP       66      48528 → 8000 [ACK] Seq=176 Ack=206 Win=30336 Len=0 TSval=32031305 TSecr=32031305
8    2017-11-12 17:11:04.800805  172.18.0.2  172.18.0.3   TCP       66      8000 → 48528 [FIN, ACK] Seq=206 Ack=176 Win=30080 Len=0 TSval=32031355 TSecr=32031305
9    2017-11-12 17:11:04.801120  172.18.0.3  172.18.0.2   TCP       66      48528 → 8000 [FIN, ACK] Seq=176 Ack=207 Win=30336 Len=0 TSval=32031355 TSecr=32031355
10   2017-11-12 17:11:04.801151  172.18.0.2  172.18.0.3   TCP       66      8000 → 48528 [ACK] Seq=207 Ack=177 Win=30080 Len=0 TSval=32031355 TSecr=32031355


For the failed requests, the upstream received a new HTTP request after its
keep-alive timeout (500 ms) had expired but before it had a chance to send
the [FIN] packet. Because the connection had already been closed from the
upstream's perspective, it sent a [RST] in response to this request.

No.  Time                        Source      Destination  Protocol  Length  Info
433  2017-11-12 17:11:26.548449  172.18.0.3  172.18.0.2   TCP       74      48702 → 8000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=32033530 TSecr=0 WS=128
434  2017-11-12 17:11:26.548476  172.18.0.2  172.18.0.3   TCP       74      8000 → 48702 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=32033530 TSecr=32033530 WS=128
435  2017-11-12 17:11:26.548502  172.18.0.3  172.18.0.2   TCP       66      48702 → 8000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=32033530 TSecr=32033530
436  2017-11-12 17:11:26.548609  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
437  2017-11-12 17:11:26.548618  172.18.0.2  172.18.0.3   TCP       66      8000 → 48702 [ACK] Seq=1 Ack=176 Win=30080 Len=0 TSval=32033530 TSecr=32033530
438  2017-11-12 17:11:26.549173  172.18.0.2  172.18.0.3   HTTP      271     HTTP/1.1 200 OK (text/html)
439  2017-11-12 17:11:26.549230  172.18.0.3  172.18.0.2   TCP       66      48702 → 8000 [ACK] Seq=176 Ack=206 Win=30336 Len=0 TSval=32033530 TSecr=32033530
440  2017-11-12 17:11:27.049668  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
441  2017-11-12 17:11:27.050324  172.18.0.2  172.18.0.3   HTTP      271     HTTP/1.1 200 OK (text/html)
442  2017-11-12 17:11:27.050378  172.18.0.3  172.18.0.2   TCP       66      48702 → 8000 [ACK] Seq=351 Ack=411 Win=31360 Len=0 TSval=32033580 TSecr=32033580
443  2017-11-12 17:11:27.551182  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
444  2017-11-12 17:11:27.551294  172.18.0.2  172.18.0.3   TCP       66      8000 → 48702 [RST, ACK] Seq=411 Ack=526 Win=32256 Len=0 TSval=32033630 TSecr=32033630

When nginx receives the [RST] packet, it logs a 'Connection reset' error.

I tested by setting up the following environment:

Upstream (Node.js server):

- Keep-alive timeout set to 500 ms

Test client:

- Keeps sending requests with an interval between them
- The interval starts at 500 ms and decreases by 0.1 ms after each request

For a more detailed description of the test process, you can refer to my
post at:
https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/

To fix the issue, I tried adding a timeout for kept-alive upstream
connections; you can check the patch at:
https://github.com/weixu365/nginx/blob/docker-1.13.6/docker/stretch/patches/01-http-upstream-keepalive-timeout.patch

The patch is what I'm currently testing with, and I can provide it in a
different format if you need.

Regards

Wei Xu
Post by Maxim Dounin
Hello!
[...]
Could you please trace what actually happens on the network level
to confirm the guess is correct?
Also, please check that there are no stateful firewalls between
nginx and the backend. A firewall which drops the state before
the timeout expires looks like a much more likely cause for such
errors.
--
Maxim Dounin
http://mdounin.ru/
Maxim Dounin
2017-11-13 19:49:46 UTC
Hello!
Post by Wei Xu
We are running nginx and the upstream on the same machine using Docker, so
there's no firewall.
Note that this isn't usually true. Docker uses iptables
implicitly, and unless you specifically checked your iptables
configuration, you are likely using a firewall.
Post by Wei Xu
I did a test locally and captured the network packets.
For the normal requests, the upstream sends a [FIN, ACK] to nginx after the
keep-alive timeout (500 ms), nginx sends a [FIN, ACK] back, and then the
upstream sends an [ACK] to close the connection completely.
[...]
Post by Wei Xu
For a more detailed description of the test process, you can refer to my
post at:
https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/
The test demonstrates that it is indeed possible to trigger the
problem in question. Unfortunately, it doesn't provide any proof
that what you observed in production is the same issue though.

While it is more or less clear that the race condition in question
is real, it seems to be very unlikely with typical workloads. And
even when triggered, in most cases nginx handles it well enough,
retrying the request per proxy_next_upstream.

Nevertheless, thank you for the detailed testing. A simple test case
that reliably demonstrates the race is appreciated, and I was able
to reduce it to your client script and nginx with the following
trivial configuration:

upstream u {
    server 127.0.0.1:8082;
    keepalive 10;
}

server {
    listen 8080;

    location / {
        proxy_pass http://u;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

server {
    listen 8082;

    keepalive_timeout 500ms;

    location / {
        return 200 ok\n;
    }
}
Post by Wei Xu
To fix the issue, I tried adding a timeout for kept-alive upstream
connections; you can check the patch at:
https://github.com/weixu365/nginx/blob/docker-1.13.6/docker/stretch/patches/01-http-upstream-keepalive-timeout.patch
The patch is what I'm currently testing with, and I can provide it in a
different format if you need.
The patch looks good enough for testing, though there are various
minor issues - notably, it tests the timeout against NGX_CONF_UNSET_MSEC
at runtime and uses the wrong type for the timeout during parsing
(time_t instead of ngx_msec_t).

Also, I tend to think that using a separate keepalive_timeout
directive would be easier, and we probably want to introduce some
default value for it.
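
For illustration only, with such a directive an upstream block would
presumably be configured along these lines (the upstream name, backend
address and the 30s value are made up for the example; the idea is just
to keep the timeout below the backend's own keep-alive timeout):

upstream backend {
    server 127.0.0.1:8000;

    keepalive 16;
    keepalive_timeout 30s;
}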

Please take a look if the following patch works for you:

# HG changeset patch
# User Maxim Dounin <***@mdounin.ru>
# Date 1510601341 -10800
# Mon Nov 13 22:29:01 2017 +0300
# Node ID 9ba0a577601b7c1b714eb088bc0b0d21c6354699
# Parent 6f592a42570898e1539d2e0b86017f32bbf665c8
Upstream keepalive: keepalive_timeout directive.

The directive configures maximum time a connection can be kept in the
cache. By configuring a time which is smaller than the corresponding
timeout on the backend side one can avoid the race between closing
a connection by the backend and nginx trying to use the same connection
to send a request at the same time.

diff --git a/src/http/modules/ngx_http_upstream_keepalive_module.c b/src/http/modules/ngx_http_upstream_keepalive_module.c
--- a/src/http/modules/ngx_http_upstream_keepalive_module.c
+++ b/src/http/modules/ngx_http_upstream_keepalive_module.c
@@ -12,6 +12,7 @@
 
 typedef struct {
     ngx_uint_t                         max_cached;
+    ngx_msec_t                         timeout;
 
     ngx_queue_t                        cache;
     ngx_queue_t                        free;
@@ -84,6 +85,13 @@ static ngx_command_t ngx_http_upstream_
       0,
       NULL },
 
+    { ngx_string("keepalive_timeout"),
+      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
+      ngx_conf_set_msec_slot,
+      NGX_HTTP_SRV_CONF_OFFSET,
+      offsetof(ngx_http_upstream_keepalive_srv_conf_t, timeout),
+      NULL },
+
       ngx_null_command
 };
 
@@ -141,6 +149,8 @@ ngx_http_upstream_init_keepalive(ngx_con
 
     us->peer.init = ngx_http_upstream_init_keepalive_peer;
 
+    ngx_conf_init_msec_value(kcf->timeout, 60000);
+
     /* allocate cache items and add to free queue */
 
     cached = ngx_pcalloc(cf->pool,
@@ -261,6 +271,10 @@ found:
     c->write->log = pc->log;
     c->pool->log = pc->log;
 
+    if (c->read->timer_set) {
+        ngx_del_timer(c->read);
+    }
+
     pc->connection = c;
     pc->cached = 1;
 
@@ -339,9 +353,8 @@ ngx_http_upstream_free_keepalive_peer(ng
 
     pc->connection = NULL;
 
-    if (c->read->timer_set) {
-        ngx_del_timer(c->read);
-    }
+    ngx_add_timer(c->read, kp->conf->timeout);
+
     if (c->write->timer_set) {
         ngx_del_timer(c->write);
     }
@@ -392,7 +405,7 @@ ngx_http_upstream_keepalive_close_handle
 
     c = ev->data;
 
-    if (c->close) {
+    if (c->close || c->read->timedout) {
         goto close;
     }
 
@@ -485,6 +498,8 @@ ngx_http_upstream_keepalive_create_conf(
      * conf->max_cached = 0;
      */
 
+    conf->timeout = NGX_CONF_UNSET_MSEC;
+
     return conf;
 }
--
Maxim Dounin
http://mdounin.ru/
Wei Xu
2017-11-14 03:03:04 UTC
Hi,

Really nice, much simpler than my patch. It's great to have a default
timeout value. Thanks for your time.
Post by Maxim Dounin
[...]
Also, I tend to think that using a separate keepalive_timeout
directive would be easier, and we probably want to introduce some
default value for it.
Please take a look if the following patch works for you:
[...]
--
Maxim Dounin
http://mdounin.ru/
Wei Xu
2017-11-22 06:31:25 UTC
Hi,

Is there any place to view the status of currently proposed patches? I'm
not sure whether this patch has been accepted, is still waiting, or was
rejected.

In order to avoid errors in production, I'm running the patched version
now. But I think it would be better to run an official build, and then I
could also introduce this solution for 'Connection reset by peer' errors
to other teams.
Post by Wei Xu
Really nice, much simpler than my patch. It's great to have a default
timeout value. Thanks for your time.
[...]
Maxim Dounin
2017-11-22 16:00:17 UTC
Hello!
Post by Wei Xu
Is there any place to view the status of currently proposed patches? I'm
not sure whether this patch has been accepted, is still waiting, or was
rejected.
In order to avoid errors in production, I'm running the patched version
now. But I think it would be better to run an official build, and then I
could also introduce this solution for 'Connection reset by peer' errors
to other teams.
The patch in question is sitting in my patch queue waiting for
further work - I am considering introducing keepalive_requests at the
same time, and probably the $upstream_connection and
$upstream_connection_requests variables as well.
--
Maxim Dounin
http://mdounin.ru/
Wei Xu
2018-01-05 04:53:46 UTC
Hi,

Is it possible to merge the upstream keepalive timeout feature first? It's
a valuable and simple patch.

We're using React server-side rendering, and by adding nginx as a reverse
proxy on each server, our AWS EC2 instance count dropped by 25%, from 43
to 27-37 c4.large instances.

I wrote a detailed article to explain what happened and why it works at:
https://theantway.com/2017/12/metrics-driven-development-how-did-i-reduced-aws-ec2-costs-to-27-and-improved-performance/

The only problem now is that we're still using a custom patched version,
which makes it *difficult to share the solution with other teams*. So,
back to the initial question: is it possible to merge this feature first,
and create separate patches later if you need to add more features?

Regards

Wei
Post by Maxim Dounin
The patch in question is sitting in my patch queue waiting for
further work - I am considering introducing keepalive_requests at the
same time, and probably the $upstream_connection and
$upstream_connection_requests variables as well.
--
Maxim Dounin
http://mdounin.ru/
Maxim Dounin
2018-01-07 16:33:56 UTC
Hello!
Post by Wei Xu
Is it possible to merge the upstream keep alive feature first? because it's
a valuable and simple patch.
We're using React server render, and by adding Nginx as the reverse proxy
on each server, our AWS EC2 instances count reduced 25%, from 43 to 27-37
C4.Large instances.
https://theantway.com/2017/12/metrics-driven-development-how-did-i-reduced-aws-ec2-costs-to-27-and-improved-performance/
The only problem now is we still using the custom patched version, which
makes it *difficult to share the solution with other teams*. So back to the
initial question, is it possible to merge this feature first, and you can
create separate patches if you need to add more features later.
Sorry, but it's unlikely I'll be able to spend more time on this in the
upcoming couple of weeks at least. And I certainly don't want to
commit an incomplete solution, as keepalive_requests might be
equally important for some workloads.

Meanwhile, you may want to consider solutions which do not require
any patching, in particular:

- configuring the upstream group and proxy_next_upstream
appropriately, so nginx will retry failed requests (this is the
default as long as you have more than one upstream server
configured and requests are idempotent) - see the sketch below;

- tuning your backend to use higher keepalive timeouts, which will
make the race unlikely.
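
As a rough sketch of the first option (the upstream name and server
addresses are made up; proxy_next_upstream is spelled out with its
default "error timeout" value just for clarity):

upstream backend {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;

    keepalive 16;
}

server {
    listen 8080;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # if the connection to one server fails or is reset,
        # nginx retries the request on the next server
        proxy_next_upstream error timeout;
    }
}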
--
Maxim Dounin
http://mdounin.ru/