Tengine load-balancing health check module sometimes reports errors

huopen commented on 13 Jan 2014

With Tengine 2.0, ngx_http_upstream_check_module frequently reports errors when the backend is WebLogic 11g. The error messages are:
2014/01/13 14:37:36 [error] 17793#0: http parse status line error with peer: 210.xxx.xx.145:8089
2014/01/13 14:37:36 [error] 17793#0: check protocol http error with peer: 210.xxx.xx.145:8089
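The symptom: after every few successful checks, one or two checks fail, which produces a lot of error messages. With a JBoss 4.2 backend this error does not occur.
After I rolled back to Tengine 1.5.2 everything was normal and the errors stopped. Could you please check whether this is a bug?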

The configuration is as follows:
check interval=2000 rise=2 fall=3 timeout=30000 type=http;
check_http_send "GET / HTTP/1.0\r\n\r\n";
check_http_expect_alive http_2xx http_3xx;
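
For readers unfamiliar with the `rise` and `fall` parameters in the configuration above, the following Python sketch models how they are usually described: a peer is marked up after `rise` consecutive successful checks and down after `fall` consecutive failures. This is a simplified model, not Tengine's implementation; the class and function names are hypothetical, and the peer is assumed to start in the down state. It illustrates why isolated failures with `fall=3` fill the error log without necessarily marking the peer down.

```python
class PeerState:
    """Hypothetical model of rise/fall counting for one upstream peer."""

    def __init__(self, rise=2, fall=3, initially_up=False):
        self.rise = rise
        self.fall = fall
        self.up = initially_up      # assumption: peers start as down
        self.rise_count = 0         # consecutive successful checks
        self.fall_count = 0         # consecutive failed checks

    def record(self, check_ok):
        """Feed one check result and return whether the peer is up."""
        if check_ok:
            self.rise_count += 1
            self.fall_count = 0
            if not self.up and self.rise_count >= self.rise:
                self.up = True
        else:
            self.fall_count += 1
            self.rise_count = 0
            if self.up and self.fall_count >= self.fall:
                self.up = False
        return self.up


def simulate(results, rise=2, fall=3):
    """Return the up/down state after each check result."""
    peer = PeerState(rise=rise, fall=fall)
    return [peer.record(ok) for ok in results]


# Pattern similar to the report: several successes, then one or two failures.
pattern = [True, True, True, False, False, True, True, False, True]
print(simulate(pattern))
```

In this model the peer only goes down when three failures arrive in a row, so a higher failure rate than the one reported here would be needed before traffic is affected.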

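lilbedwin (Contributor) commented on 13 Jan 2014

In tengine-2.0, backend health checks use keep-alive (persistent) connections by default, so a keep-alive header has to be added to the request, like this: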
check_http_send "GET / HTTP/1.0\r\nconnection: keep-alive\r\n\r\n";
Once a keep-alive connection is established, each connection sends 100 requests by default, then is closed and reopened.

If you still want to use short connections, add one more directive so that each connection sends only one request:
check_keepalive_requests 1;
check_http_send "GET / HTTP/1.0\r\nconnection: keep-alive\r\n\r\n";
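
To make the effect of `check_keepalive_requests` concrete, here is a back-of-the-envelope calculation in Python. It assumes the behavior described above (a connection is closed and reopened after a fixed number of check requests), one check per interval per peer, and the `interval=2000` value from the original configuration. The function name is hypothetical.

```python
import math


def connections_opened(num_checks, requests_per_connection):
    """Connections needed for num_checks health checks, assuming each
    connection is closed after requests_per_connection requests."""
    return math.ceil(num_checks / requests_per_connection)


interval_ms = 2000
checks_per_hour = 3600 * 1000 // interval_ms  # one check every 2 seconds

# Keep-alive with the default of 100 requests per connection.
print(connections_opened(checks_per_hour, 100))
# Short connections: check_keepalive_requests 1.
print(connections_opened(checks_per_hour, 1))
```

Under these assumptions, keep-alive reduces connection setups per peer by a factor of 100, which is the motivation for the new default.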

huopen commented on 13 Jan 2014

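I just tested it: short connections work fine. With keep-alive connections the same error still occurs. I wonder whether it is because there is a firewall between my frontend and backend servers.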

huopen commented on 13 Jan 2014

Why does a JBoss backend work without any problem with both keep-alive and short connections, while WebLogic only works with short connections?

lilbedwin (Contributor) commented on 14 Jan 2014

We are investigating the cause and will reply as soon as we have a result. Thanks for the feedback!

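lilbedwin (Contributor) commented on 15 Jan 2014

Hi huopen, we have largely pinned down the problem:
With keep-alive connections, the response to the previous check is not fully received. The next check then reads the leftover data of the previous response from the socket and parses it as a response header, which causes the error.
For a quick workaround, use short connections with the configuration I posted earlier.
We will also release a new Tengine version as soon as possible to fix this bug.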
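
The failure mode described above can be reproduced in miniature. The following Python sketch is not Tengine's code: it simulates a checker that reads only a fixed number of bytes per check from a persistent connection. The `FakeConnection` class, the `naive_check` function, and the 64-byte read size are all invented for illustration. When the response is longer than what one check reads, the leftover bytes are parsed as the next status line and the check fails.

```python
import re

STATUS_LINE = re.compile(rb"^HTTP/\d\.\d (\d{3}) ")


class FakeConnection:
    """Stands in for a keep-alive socket: bytes sent by the backend
    stay buffered until the checker reads them."""

    def __init__(self):
        self.buffer = b""

    def backend_responds(self, body_size):
        body = b"x" * body_size
        head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % body_size
        self.buffer += head + body

    def recv(self, n):
        data, self.buffer = self.buffer[:n], self.buffer[n:]
        return data


def naive_check(conn, body_size, read_size=64):
    """One check that reads at most read_size bytes and never drains the rest."""
    conn.backend_responds(body_size)
    data = conn.recv(read_size)
    match = STATUS_LINE.match(data)
    # Accept 2xx and 3xx, like check_http_expect_alive http_2xx http_3xx.
    return match is not None and match.group(1).startswith((b"2", b"3"))


def run_checks(body_size, count, keepalive=True):
    conn = FakeConnection()
    results = []
    for _ in range(count):
        if not keepalive:
            conn = FakeConnection()  # short connection: fresh socket per check
        results.append(naive_check(conn, body_size))
    return results


print(run_checks(body_size=10, count=3))                    # small response
print(run_checks(body_size=500, count=3))                   # large response, keep-alive
print(run_checks(body_size=500, count=3, keepalive=False))  # large response, short connections
```

The sketch exaggerates the effect: here every check after the first fails, whereas the reported errors were intermittent, presumably because the real outcome depends on timing and on how much data each read returns. It does show why a small response or a fresh connection per check avoids the problem.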

huopen commented on 15 Jan 2014

Thank you.

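cfsego (Member) commented on 21 Jan 2014

Hi huopen,

Our latest changes have passed testing locally. However, the symptoms we saw with the original version differ in some details from what you reported, so could you help us test the fix in your environment? We have extracted the change into a patch, see https://github.com/cfsego/tengine-patches/blob/master/tengine-hc-keepalive.patch

Thanks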

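higkoo (Contributor) commented on 28 Jan 2014

In my production Tengine cluster, I upgraded one machine to version 2.0.
The result was somewhat similar to the original poster's: some backends were detected as down, which caused 502 responses. That same day there happened to be a network-wide DNS outage and the load was heavy, so I rolled Tengine back to the old version before I had time to investigate.
What is the currently known issue?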

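yaoweibin (Member) commented on 28 Jan 2014

To reduce the number of health check connections being established, the new version of Tengine changed health check connections from short connections to keep-alive connections. The problem shows up when the response to a health check request is large: the data from the previous check is not fully read, so the next check reads the wrong data and fails.

This has already been fixed in the development branch, and a new version will be released after the Lunar New Year holiday.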

cfsego (Member) commented on 27 Feb 2014

The original poster has gone quiet, so I am closing this.

cfsego closed this on 27 Feb 2014

higkoo (Contributor) commented on 5 Jun 2014

Yes, it went back to normal with 2.0.1.

jiacheo commented on 22 Aug 2015

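@cfsego This problem still exists in the following version:

Tengine version: Tengine/2.1.0 (nginx/1.6.2)
Adding
check_keepalive_requests 1;
did not solve it either.
I am sure it is not a firewall problem, because one of my backends is on the local machine.
My configuration is as follows: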

 upstream log-service {
        server localhost:8101;
        server 10.173.83.215:8101;
        check interval=1000 rise=2 fall=2 timeout=1000 type=http;
        check_keepalive_requests 1;
        check_http_send "GET /status.stat HTTP/1.1\r\n\r\n";
        check_http_expect_alive http_2xx http_3xx;
    }

Part of the error log:

2015/08/22 11:19:10 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:10 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:11 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:12 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:12 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:13 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:13 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:14 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:15 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:15 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:16 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:16 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:17 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:18 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:18 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101
2015/08/22 11:19:19 [error] 14387#0: check protocol http error with peer: 10.173.83.215:8101
2015/08/22 11:19:19 [error] 14387#0: check protocol http error with peer: 127.0.0.1:8101

Requesting these URLs with curl works fine.

jiacheo commented on 22 Aug 2015

OK, the problem was HTTP/1.1. After changing it to HTTP/1.0 everything works. The only reason I had HTTP/1.1 is that my other machines are configured with HTTP/1.1...

yujinqiu (Contributor) commented on 14 Apr 2017

@jiacheo If you use HTTP/1.1, you need to specify a Host header, something like this:

check_http_send "GET / HTTP/1.1\r\nconnection: keep-alive\r\nHost: foo.bar.com\r\n\r\n";

See RFC 2616, section 14.23:
A client MUST include a Host header field in all HTTP/1.1 request messages.

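HTTP/1.0 uses short connections by default and has no notion of multiple virtual hosts, so it does not require a Host header. That is why the HTTP/1.0 request works.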
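
As a small aid for writing `check_http_send` values, the Python helper below builds a request string and refuses to build an HTTP/1.1 request without a `Host` header, following the RFC requirement quoted above. The helper names `build_check_request` and `as_directive` and the host `foo.bar.com` are placeholders, and none of this is part of Tengine.

```python
def build_check_request(path="/", version="1.0", host=None, keepalive=False):
    """Build a raw HTTP GET request for use as a health check probe."""
    if version == "1.1" and not host:
        raise ValueError("HTTP/1.1 requests must include a Host header")
    lines = ["GET %s HTTP/%s" % (path, version)]
    if host:
        lines.append("Host: %s" % host)
    if keepalive:
        lines.append("Connection: keep-alive")
    # Header lines end with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


def as_directive(request):
    """Render the request as a check_http_send directive with escaped CRLFs."""
    escaped = request.replace("\r", "\\r").replace("\n", "\\n")
    return 'check_http_send "%s";' % escaped


print(as_directive(build_check_request("/status.stat", "1.1", host="foo.bar.com")))
```

Whether a missing `Host` header actually breaks the check depends on how the backend reacts to it; a strict HTTP/1.1 server may answer with 400, which is not matched by `http_2xx http_3xx`.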

Posted on 2018-08-31 00:31