tcp write error broken pipe

2025-10-09 18:52:00
丁国栋
Summary: This post records and explains why the error "TCP write error: broken pipe" occurs.

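Today one of our services, which runs on FrankenPHP (Caddy + PHP), became slow to respond and then recovered on its own after a short while. The service runs in Kubernetes. The application logs showed no errors, but the Pod logs contained the following: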


# kubectl logs -f --tail 10 -n quickon-system zentaopaas-frankenphp-5545d648c6-rvlnq
{"level":"warn","ts":1759973384.78088,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:48146: write: broken pipe"}
{"level":"warn","ts":1759973517.6707053,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:25396: write: broken pipe"}
{"level":"info","ts":1759973625.34224,"logger":"tls","msg":"storage cleaning happened too recently; skipping for now","storage":"FileStorage:/data/caddy","instance":"25237dcd-61d2-4ae3-b9d9-5c968a7ef5b6","try_again":1760060025.342238,"try_again_in":86399.999999672}
{"level":"info","ts":1759973625.3431087,"logger":"tls","msg":"finished cleaning storage units"}
{"level":"warn","ts":1759973644.7477026,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.1.0:50412: write: broken pipe"}
{"level":"warn","ts":1759973702.706567,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:32784: write: broken pipe"}
{"level":"warn","ts":1759973781.1995344,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:61128: write: broken pipe"}
{"level":"warn","ts":1759973810.600418,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:28986: write: broken pipe"}
{"level":"warn","ts":1759973870.5739777,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:48154: write: broken pipe"}
{"level":"warn","ts":1759973870.5742023,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:64440: write: broken pipe"}
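In the logs above, 10.42.8.201 is the Pod's IP, while 10.42.2.0 and 10.42.1.0 are the IPs of the flannel.1 interface on two cluster nodes, which are also the nodes where the Nginx Ingress Pods run. All of these addresses fall within the Pod CIDR.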
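To see which peers the failed writes were addressed to, the JSON log lines can be tallied by remote IP. The following is a minimal sketch, not part of the original investigation: the function name `count_broken_pipes` and the regular expression are my own, the pattern only handles IPv4 addresses, and the sample is a few lines copied from the log above.

```python
import json
import re
from collections import Counter

# A few lines copied from the Pod log above. In practice, feed in the
# output of `kubectl logs` instead.
SAMPLE_LOG = """\
{"level":"warn","ts":1759973384.78088,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:48146: write: broken pipe"}
{"level":"info","ts":1759973625.3431087,"logger":"tls","msg":"finished cleaning storage units"}
{"level":"warn","ts":1759973644.7477026,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.1.0:50412: write: broken pipe"}
{"level":"warn","ts":1759973702.706567,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:32784: write: broken pipe"}
"""

# Matches the "local->remote" part of the error text (IPv4 only), e.g.
# "write tcp 10.42.8.201:80->10.42.2.0:48146: write: broken pipe".
PEER_RE = re.compile(
    r"write tcp ([\d.]+):(\d+)->([\d.]+):(\d+): write: broken pipe"
)


def count_broken_pipes(log_text):
    """Return a Counter mapping remote peer IP to number of broken-pipe warnings."""
    peers = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        match = PEER_RE.search(entry.get("error", ""))
        if entry.get("msg") == "write error" and match:
            peers[match.group(3)] += 1  # group 3 is the remote IP
    return peers


peer_counts = count_broken_pipes(SAMPLE_LOG)
for ip, n in peer_counts.most_common():
    print(ip, n)
```

If nearly all failures point at the node addresses where the ingress Pods run, that is consistent with the ingress side dropping connections rather than some other in-cluster client.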


The errors suggest a problem between FrankenPHP and Nginx Ingress, or between FrankenPHP and the nodes.

According to https://github.com/caddyserver/caddy/issues/6000, mohammed90 explains: "This is harmless, as you're experiencing. This usually happens when the client disconnects without completing the handshake and the necessary components of a proper HTTP connection." In other words, the error is harmless and usually occurs when the client disconnects before the connection has been fully established. He also notes: "They should be concerning to you because they mean the client is not closing the connection properly/cleanly. It's harmless in in smaller numbers as it can mean the client closed the webpage/app in the middle of the loading request. However, in large numbers this is a form of an attack to exhaust your server. It's good to know about it." That is, it is worth having these errors logged and noticed, especially when they appear in large numbers; when there are only a few, they can usually be ignored.
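At the socket level, "broken pipe" is the EPIPE error: the server tried to write to a connection that the other side had already closed. The sketch below reproduces that general mechanism on the loopback interface. It is an illustration under my own assumptions, not FrankenPHP's code path; the function name `provoke_write_error` is hypothetical, and depending on the operating system and timing the failure may surface as a connection reset instead of a broken pipe.

```python
import errno
import socket
import time


def provoke_write_error(attempts=50):
    """Write to a TCP peer that has already closed; return the resulting OSError."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    client.close()  # the "client" walks away before the response is sent

    try:
        for _ in range(attempts):
            try:
                # The first write is usually accepted by the local kernel.
                # The peer then answers with a reset, and a later write fails.
                server_side.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
            except OSError as exc:
                return exc
            time.sleep(0.05)  # give the reset time to arrive
        return None
    finally:
        server_side.close()
        listener.close()


err = provoke_write_error()
print(type(err).__name__, err)
```

On Linux this typically reports a `BrokenPipeError`. The point is that the error is raised on the writing side and says nothing about why the peer went away, which is why the logs alone cannot tell a user closing a tab from a proxy timing out.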

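Coming back to the slowness itself, the way to track it down is still to rule out causes segment by segment and layer by layer: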


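  1. When one service is slow, are other services proxied by Nginx Ingress slow as well?
  2. Is it still slow when bypassing Nginx Ingress and accessing the service directly from inside its Pod?
  3. For network-level problems, capture packets and inspect them.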
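The first two checks amount to timing the same request along different paths and comparing the results. A minimal sketch of that comparison follows, assuming plain HTTP endpoints; the function names `time_request` and `compare` are my own, and proxy environment variables are deliberately ignored so that the measured path is the direct one.

```python
import statistics
import time
import urllib.request

# Ignore http_proxy/https_proxy so that we measure the direct path only.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def time_request(url, timeout=10.0):
    """Return the seconds taken to fetch `url` and read the whole body."""
    start = time.perf_counter()
    with _opener.open(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def compare(targets, samples=5, timeout=10.0):
    """Return {label: median latency in seconds} for each labelled URL."""
    results = {}
    for label, url in targets.items():
        timings = [time_request(url, timeout) for _ in range(samples)]
        results[label] = statistics.median(timings)
    return results
```

A call might look like `compare({"via ingress": "http://zentao.example.internal/", "pod direct": "http://10.42.8.201:80/"})`, where the hostname is a placeholder and the Pod IP is the one from the logs above. If only the ingress path is slow, the problem lies in front of the Pod; if both are slow, look at the service itself.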

Other improvements and things to watch:


  1. Check whether the Pod has TCP probes and HTTP probes configured
  2. Monitor response times
  3. Design good service logs


