SSL_read error on multiple simultaneous upstream SSL downloads

Discussion:

Agent Coulson

2013-10-18 18:01:14 UTC

I am able to reproduce the following error when I have nginx configured
with an upstream https connection. I have tweaked various settings all to
no avail (proxy_buffer_size, proxy_buffers, proxy_ssl_session_reuse).

2013/10/18 17:17:31 [debug] 15644#0: *39 SSL_read: -1, SSL_pending: 16384
2013/10/18 17:17:31 [debug] 15644#0: *39 SSL_get_error: 1
2013/10/18 17:17:31 [error] 15644#0: *39 SSL_read() failed (SSL:
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record
mac) while sending to client, client: 127.0.0.1, server: -, request: "GET
/test-1 HTTP/1.1", upstream: "https://x.x.x.x:443/test-1", host:
"localhost:1182"

I've applied the following patch to log the SSL_pending bytes after an
SSL_read.

--- dist/nginx-1.4.3/src/event/ngx_event_openssl.c 2013-10-08
12:07:14.000000000 +0000
+++ new/nginx-1.4.3/src/event/ngx_event_openssl.c 2013-10-18
17:37:15.059940303 +0000
@@ -952,7 +952,9 @@ ngx_ssl_recv(ngx_connection_t *c, u_char

n = SSL_read(c->ssl->connection, buf, size);

- ngx_log_debug1(NGX_LOG_DEBUG_EVENT, c->log, 0, "SSL_read: %d", n);
+ ngx_log_debug2(NGX_LOG_DEBUG_EVENT, c->log, 0,
+ "SSL_read: %d, SSL_pending: %d",
+ n, SSL_pending(c->ssl->connection));

if (n > 0) {
bytes += n;

I've seen a bug report on this too (http://trac.nginx.org/nginx/ticket/215),
so thought i would send this here to see if anyone else is actively working
on the issue.

Here are my configure settings:

./configure --prefix=/var/nginx --with-debug --with-http_ssl_module
--without-http_auth_basic_module --without-http_autoindex_module
--without-http_browser_module --without-http-cache
--without-http_charset_module --without-http_empty_gif_module
--without-http_fastcgi_module --without-http_geo_module
--without-http_gzip_module --without-http_limit_conn_module
--without-http_map_module --without-http_memcached_module
--without-http_referer_module --without-http_rewrite_module
--without-http_scgi_module --without-http_split_clients_module
--without-http_ssi_module --without-http_upstream_ip_hash_module
--without-http_userid_module --without-http_uwsgi_module
--without-mail_imap_module --without-mail_pop3_module
--without-mail_smtp_module

Here is my configuration:

### Begin nginx.conf ###

worker_processes 1;

error_log logs/error.log debug;

pid logs/nginx.pid;

events {

worker_connections 1024;

}

http {

include mime.types;
default_type application/octet-stream;

access_log logs/access.log;

keepalive_timeout 60;

upstream http {

server upstream.srv:443;
keepalive 512;

}

server {

listen 1182 default_server;

server_name -;

ssl_protocols SSLv3 TLSv1;
ssl_ciphers RC4:HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;

location / {

proxy_pass https://http;

proxy_redirect off;
proxy_read_timeout 10s;
proxy_connect_timeout 6s;

proxy_buffering off;
proxy_buffer_size 64k;
proxy_buffers 6 16k;
proxy_busy_buffers_size 80k;

proxy_pass_header Server;
proxy_pass_header Date;
proxy_pass_header X-Pad;

proxy_set_header Connection "Keep-Alive";
proxy_set_header Host "upstream.srv";

}

}

}
### End nginx.conf ###
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20131018/db37df66/attachment.html>

Maxim Dounin

2013-10-18 19:06:05 UTC

Permalink

Hello!

Post by Agent Coulson
I am able to reproduce the following error when I have nginx configured
with an upstream https connection. I have tweaked various settings all to
no avail (proxy_buffer_size, proxy_buffers, proxy_ssl_session_reuse).
2013/10/18 17:17:31 [debug] 15644#0: *39 SSL_read: -1, SSL_pending: 16384
2013/10/18 17:17:31 [debug] 15644#0: *39 SSL_get_error: 1
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record
mac) while sending to client, client: 127.0.0.1, server: -, request: "GET
"localhost:1182"

I tend to think it's highly unlikely it's a problem in nginx.
Most likely, it's a problem either in OpenSSL library used on
nginx side, or in SSL implementation used on a backend.

First thing I would recommend to test is to make sure you are able
to reporoduce the problem:

1. Using nginx statically compiled with a known version of the
OpenSSL library (--with-openssl=..., with sources from
openssl.org).

2. Using the same nginx as a backend.

[...]

Post by Agent Coulson
I've seen a bug report on this too (http://trac.nginx.org/nginx/ticket/215),
so thought i would send this here to see if anyone else is actively working
on the issue.

As of now, no one provided enough steps to reproduce the problem.
And, see above, most likely the problem is not in nginx.

--
Maxim Dounin
http://nginx.org/en/donation.html

Agent Coulson

2013-10-18 19:59:40 UTC

Permalink

Yes, I am able to reproduce this talking to the same nginx as an upstream,
here is my new config. To reproduce, create a file in the root which is
several Mb, i used 20Mb, and issus multiple simultaneous curl's to the
object, i found rate-limiting my curl is the best way to repro. This
suggests there is some problem when we have to buffer. I'm skeptical that
this is an openssl issue as I have used multiple different openssl versions
and still run into this. However for completeness, I've reprod with
openssl sources from openssl.org (openssl-1.0.1e) as you suggested.

Updated conf:

### Begin ngxin.conf ###

worker_processes 1;

error_log logs/error.log debug;

pid logs/nginx.pid;

events {
worker_connections 1024;
}

http {
include mime.types;
default_type application/octet-stream;

access_log logs/access.log;

keepalive_timeout 60;

upstream http {
server 127.0.0.1:1183;
keepalive 512;
}

server {
listen 1182 default_server;

server_name -;

ssl_protocols SSLv3 TLSv1;
ssl_ciphers RC4:HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;

location / {
proxy_pass https://http;

proxy_redirect off;
proxy_read_timeout 10s;
proxy_connect_timeout 6s;

proxy_buffering off;
proxy_buffer_size 64k;
proxy_buffers 6 16k;
proxy_busy_buffers_size 80k;

proxy_pass_header Server;
proxy_pass_header Date;
proxy_pass_header X-Pad;

proxy_set_header Connection "Keep-Alive";
proxy_set_header Host "upstream.srv";
}
}

server {
listen 1183 ssl;
server_name upstream.srv;

ssl_certificate /var/nginx/conf/upstream.srv.pem;
ssl_certificate_key /var/nginx/conf/upstream.srv.key;

ssl_protocols SSLv3 TLSv1;
ssl_ciphers RC4:HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;

location / {
root /var/nginx/html;
}
}
}

### End ngxin.conf ###

configure flags for nginx 1.4.3:

./configure --prefix=/var/nginx --with-debug --with-http_ssl_module
--without-http_auth_basic_module --without-http_autoindex_module
--without-http_browser_module --without-http-cache
--without-http_charset_module --without-http_empty_gif_module
--without-http_fastcgi_module --without-http_geo_module
--without-http_gzip_module --without-http_limit_conn_module
--without-http_map_module --without-http_memcached_module
--without-http_referer_module --without-http_rewrite_module
--without-http_scgi_module --without-http_split_clients_module
--without-http_ssi_module --without-http_upstream_ip_hash_module
--without-http_userid_module --without-http_uwsgi_module
--without-mail_imap_module --without-mail_pop3_module
--without-mail_smtp_module --with-openssl=/tmp/openssl-1.0.1e

I start nginx and then issue 3 simultaneous curl's from the local box,
rate-limited. This should be sufficient for anyone else to repro the issue.

curl --limit-rate 800k -v -o /dev/null http://localhost:1182/20m.txt&
curl --limit-rate 800k -v -o /dev/null http://localhost:1182/20m.txt&
curl --limit-rate 800k -v -o /dev/null http://localhost:1182/20m.txt&

At least one will fail with bytes remaining, and you will see the error in
the error.log.

2013/10/18 19:56:50 [error] 14667#0: *4 SSL_read() failed (SSL:
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record
mac) while sending to client, client: 127.0.0.1, server: -, request: "GET
/20m.bin HTTP/1.1", upstream: "https://127.0.0.1:1183/20m.bin", host:
"localhost:1182"

thanks for your attention.

Post by Maxim Dounin
Hello!

Post by Agent Coulson
I am able to reproduce the following error when I have nginx configured
with an upstream https connection. I have tweaked various settings all

Post by Agent Coulson
no avail (proxy_buffer_size, proxy_buffers, proxy_ssl_session_reuse).
2013/10/18 17:17:31 [debug] 15644#0: *39 SSL_read: -1, SSL_pending: 16384
2013/10/18 17:17:31 [debug] 15644#0: *39 SSL_get_error: 1
error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad

record

Post by Agent Coulson
mac) while sending to client, client: 127.0.0.1, server: -, request: "GET
"localhost:1182"

I tend to think it's highly unlikely it's a problem in nginx.
Most likely, it's a problem either in OpenSSL library used on
nginx side, or in SSL implementation used on a backend.
First thing I would recommend to test is to make sure you are able
1. Using nginx statically compiled with a known version of the
OpenSSL library (--with-openssl=..., with sources from
openssl.org).
2. Using the same nginx as a backend.
[...]

Post by Agent Coulson
I've seen a bug report on this too (

http://trac.nginx.org/nginx/ticket/215),

Post by Agent Coulson
so thought i would send this here to see if anyone else is actively

working

Post by Agent Coulson
on the issue.

As of now, no one provided enough steps to reproduce the problem.
And, see above, most likely the problem is not in nginx.
--
Maxim Dounin
http://nginx.org/en/donation.html
_______________________________________________
nginx-devel mailing list
nginx-devel at nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20131018/58a6a3e5/attachment-0001.html>

Maxim Dounin

2013-10-19 12:58:47 UTC

Permalink

Hello!

Post by Agent Coulson
Yes, I am able to reproduce this talking to the same nginx as an upstream,
here is my new config. To reproduce, create a file in the root which is
several Mb, i used 20Mb, and issus multiple simultaneous curl's to the
object, i found rate-limiting my curl is the best way to repro. This
suggests there is some problem when we have to buffer. I'm skeptical that
this is an openssl issue as I have used multiple different openssl versions
and still run into this. However for completeness, I've reprod with
openssl sources from openssl.org (openssl-1.0.1e) as you suggested.

--
Maxim Dounin
http://nginx.org/en/donation.html

Agent Coulson

2013-10-21 17:50:31 UTC

Permalink

Hi!

thanks for that input, I have done some debugging and examined the SSL
context when this state arrises. Two SSL* structs (from different
connections) point to the same packet data. Disabling the read_ahead flag
mitigates this.

I've attached a patch, after applying I was unable to repro using
openssl-1.0.1e.

I'll submit a report to the upstream openssl project.

thanks!

Post by Maxim Dounin
Hello!

Post by Agent Coulson
Yes, I am able to reproduce this talking to the same nginx as an

upstream,

Post by Agent Coulson
here is my new config. To reproduce, create a file in the root which is
several Mb, i used 20Mb, and issus multiple simultaneous curl's to the
object, i found rate-limiting my curl is the best way to repro. This
suggests there is some problem when we have to buffer. I'm skeptical

that

Post by Agent Coulson
this is an openssl issue as I have used multiple different openssl

versions

Post by Agent Coulson
and still run into this. However for completeness, I've reprod with
openssl sources from openssl.org (openssl-1.0.1e) as you suggested.

[...]
Ok, I was able to reproduce this.
Looks like a regression in OpenSSL 1.0.0+. I'm able to reporduce
the problem with OpenSSL 1.0.0 and more recent versions, including
recent git snapshot, but everything is fine with OpenSSL 0.9.8y
and previous versions.
Bisection on OpenSSL 1.0.0 branch may be a helpful to trace the
exact cause.
--
Maxim Dounin
http://nginx.org/en/donation.html
_______________________________________________
nginx-devel mailing list
nginx-devel at nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20131021/bb967c54/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.patch
Type: application/octet-stream
Size: 978 bytes
Desc: not available
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20131021/bb967c54/attachment.obj>

Maxim Dounin

2013-10-21 19:49:37 UTC

Permalink

Hello!

Post by Agent Coulson
Hi!
thanks for that input, I have done some debugging and examined the SSL
context when this state arrises. Two SSL* structs (from different
connections) point to the same packet data. Disabling the read_ahead flag
mitigates this.
I've attached a patch, after applying I was unable to repro using
openssl-1.0.1e.
I'll submit a report to the upstream openssl project.

--
Maxim Dounin
http://nginx.org/en/donation.html

Agent Coulson

2013-10-21 20:00:32 UTC

Permalink

I think i would go with Piotr's suggestion, I had only started to dig into
the code, so hadn't got as far as he did (I took the weekend off)!

thanks again

Post by Maxim Dounin
Hello!

flag

Post by Agent Coulson
mitigates this.
I've attached a patch, after applying I was unable to repro using
openssl-1.0.1e.
I'll submit a report to the upstream openssl project.

Disabling the read_ahead as a workaround looks wrong for me.
While it probably reduces a chance for a problem to appear, it's
likely still there.
Have you tried looking into the OpenSSL code to find out what
causes the actual problem?
I think it's likely SSL_MODE_RELEASE_BUFFERS related (and I indeed
can't reproduce the error without SSL_MODE_RELEASE_BUFFERS set),
but I don't see any obvious problems in the code.
--
Maxim Dounin
http://nginx.org/en/donation.html
_______________________________________________
nginx-devel mailing list
nginx-devel at nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20131021/0819b1c5/attachment.html>

Piotr Sikora

2013-10-21 18:57:34 UTC

Permalink

Hey,

Post by Maxim Dounin
Looks like a regression in OpenSSL 1.0.0+. I'm able to reporduce
the problem with OpenSSL 1.0.0 and more recent versions, including
recent git snapshot, but everything is fine with OpenSSL 0.9.8y
and previous versions.
Bisection on OpenSSL 1.0.0 branch may be a helpful to trace the
exact cause.

I've looked a bit into this over the weekend and it seems that it's
being triggered by use of both: reading ahead and releasing buffers
(introduced in OpenSSL-1.0.0, hence the regression) on the client side
with upstream buffering off (I wasn't able to reproduce it with
upstream buffering on, but that might be just because it's harder to
trigger, as OpenSSL code path is effectively the same in both cases).

I don't think that we're affected on the server side (which would
actually suggest nginx bug), so the work-around for the issue (at
least for the time being) is to stop releasing buffers when nginx acts
as a client. I'm a bit tempted to do it only for the case with
buffering turned off, but from looking at the code I can't tell why it
would make a difference.

Patch attached.

Best regards,
Piotr Sikora

# HG changeset patch
# User Piotr Sikora <piotr at cloudflare.com>
# Date 1382381412 25200
# Mon Oct 21 11:50:12 2013 -0700
# Node ID 353b7360da138542933e01686f359ed72e7930d1
# Parent e6a1623f87bc96d5ec62b6d77356aa47dbc60756
SSL: release OpenSSL buffers only when acting as a server.

Signed-off-by: Piotr Sikora <piotr at cloudflare.com>

diff -r e6a1623f87bc -r 353b7360da13 src/event/ngx_event_openssl.c
--- a/src/event/ngx_event_openssl.c Mon Oct 21 18:20:32 2013 +0800
+++ b/src/event/ngx_event_openssl.c Mon Oct 21 11:50:12 2013 -0700
@@ -237,10 +237,6 @@ ngx_ssl_create(ngx_ssl_t *ssl, ngx_uint_
SSL_CTX_set_options(ssl->ctx, SSL_OP_NO_COMPRESSION);
#endif

-#ifdef SSL_MODE_RELEASE_BUFFERS
- SSL_CTX_set_mode(ssl->ctx, SSL_MODE_RELEASE_BUFFERS);
-#endif
-
SSL_CTX_set_read_ahead(ssl->ctx, 1);

SSL_CTX_set_info_callback(ssl->ctx, ngx_ssl_info_callback);
diff -r e6a1623f87bc -r 353b7360da13 src/http/modules/ngx_http_ssl_module.c
--- a/src/http/modules/ngx_http_ssl_module.c Mon Oct 21 18:20:32 2013 +0800
+++ b/src/http/modules/ngx_http_ssl_module.c Mon Oct 21 11:50:12 2013 -0700
@@ -528,6 +528,10 @@ ngx_http_ssl_merge_srv_conf(ngx_conf_t *
return NGX_CONF_ERROR;
}

+#ifdef SSL_MODE_RELEASE_BUFFERS
+ SSL_CTX_set_mode(conf->ssl.ctx, SSL_MODE_RELEASE_BUFFERS);
+#endif
+
#ifdef SSL_CTRL_SET_TLSEXT_HOSTNAME

if (SSL_CTX_set_tlsext_servername_callback(conf->ssl.ctx,
diff -r e6a1623f87bc -r 353b7360da13 src/mail/ngx_mail_ssl_module.c
--- a/src/mail/ngx_mail_ssl_module.c Mon Oct 21 18:20:32 2013 +0800
+++ b/src/mail/ngx_mail_ssl_module.c Mon Oct 21 11:50:12 2013 -0700
@@ -285,6 +285,10 @@ ngx_mail_ssl_merge_conf(ngx_conf_t *cf,
return NGX_CONF_ERROR;
}

+#ifdef SSL_MODE_RELEASE_BUFFERS
+ SSL_CTX_set_mode(conf->ssl.ctx, SSL_MODE_RELEASE_BUFFERS);
+#endif
+
cln = ngx_pool_cleanup_add(cf->pool, 0);
if (cln == NULL) {
return NGX_CONF_ERROR;

Maxim Dounin

2013-10-21 20:00:36 UTC

Permalink

Hello!

Post by Piotr Sikora
Hey,

While I tend to think that the problem is indeed related to
SSL_MODE_RELEASE_BUFFERS I don't see any reasons why the server
side shouldn't be affected. Could you please point out why you
think so?

In any case I don't think we should commit any workarounds before
the problem is at least understood. Trivial mitigation for the
errors observed so far would be to switch proxy_buffering back to
on, as by default, and/or use larger buffers.

--
Maxim Dounin
http://nginx.org/en/donation.html

Piotr Sikora

2013-10-23 21:26:41 UTC

Permalink

Hey Maxim,

Post by Maxim Dounin
While I tend to think that the problem is indeed related to
SSL_MODE_RELEASE_BUFFERS I don't see any reasons why the server
side shouldn't be affected. Could you please point out why you
think so?

Well, I don't see this from the code, so it's just a hunch, but:
- I wasn't able to reproduce it on the server side with big uploads,
- I wasn't able to reproduce it on the client side with buffering on,
- I was able to consistently reproduce it on the client side with buffering off,
- I did a fast scan on some of our production logs and I see those
errors only for content that would be transferred with proxy buffering
off,
- I think that we would see much more complains if this was happening
on the server side or on the client side with default settings
(buffering on).

I know this isn't very scientific, but those are the facts.

Note: I didn't play around with WebSockets... They are effectively
unbuffered, so they might be triggering this issue on the server side.

Post by Maxim Dounin
In any case I don't think we should commit any workarounds before
the problem is at least understood.

Agreed. I should be more clear that my patch was more of a temporary
work-around that a proper solution.

Post by Maxim Dounin
Trivial mitigation for the
errors observed so far would be to switch proxy_buffering back to
on, as by default, and/or use larger buffers.

Surprisingly, at least on OSX, larger buffers are part of the problem.
Small buffers (4k/8k) work fine, but the moment I increase the buffer
size (16k+) the error is triggered right away.

Best regards,
Piotr Sikora

Maxim Dounin

2013-10-23 21:46:06 UTC

Permalink

Hello!

Post by Piotr Sikora
Hey Maxim,

- I wasn't able to reproduce it on the server side with big uploads,
- I wasn't able to reproduce it on the client side with buffering on,
- I was able to consistently reproduce it on the client side with buffering off,
- I did a fast scan on some of our production logs and I see those
errors only for content that would be transferred with proxy buffering
off,
- I think that we would see much more complains if this was happening
on the server side or on the client side with default settings
(buffering on).
I know this isn't very scientific, but those are the facts.
Note: I didn't play around with WebSockets... They are effectively
unbuffered, so they might be triggering this issue on the server side.

As far as I understand, the problem happens if for some reason
nginx isn't able to read all the data OpenSSL read from a socket,
i.e. if some data are left in the OpenSSL read buffers. (And of
course it only happens if OpenSSL uses the same buffers for
multiple connections.)

This is not something impossible on the server side - but likely
much less common than with proxy_buffering set to off. It can
happen e.g. with pipelined requests, or if a request with a body
is delayed with limit_req.

--
Maxim Dounin
http://nginx.org/en/donation.html