tcp write error broken pipe
- 2025-10-09 18:52:00
- 丁国栋
- Original 19
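Today one of our services, served by frankenphp (Caddy + PHP), was slow to respond for a while and then recovered on its own. It runs in Kubernetes. The service's application logs showed no errors, but the Pod logs contained the following errors: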
# kubectl logs -f --tail 10 -n quickon-system zentaopaas-frankenphp-5545d648c6-rvlnq
{"level":"warn","ts":1759973384.78088,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:48146: write: broken pipe"}
{"level":"warn","ts":1759973517.6707053,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:25396: write: broken pipe"}
{"level":"info","ts":1759973625.34224,"logger":"tls","msg":"storage cleaning happened too recently; skipping for now","storage":"FileStorage:/data/caddy","instance":"25237dcd-61d2-4ae3-b9d9-5c968a7ef5b6","try_again":1760060025.342238,"try_again_in":86399.999999672}
{"level":"info","ts":1759973625.3431087,"logger":"tls","msg":"finished cleaning storage units"}
{"level":"warn","ts":1759973644.7477026,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.1.0:50412: write: broken pipe"}
{"level":"warn","ts":1759973702.706567,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:32784: write: broken pipe"}
{"level":"warn","ts":1759973781.1995344,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:61128: write: broken pipe"}
{"level":"warn","ts":1759973810.600418,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:28986: write: broken pipe"}
{"level":"warn","ts":1759973870.5739777,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:48154: write: broken pipe"}
{"level":"warn","ts":1759973870.5742023,"logger":"frankenphp","msg":"write error","error":"write tcp 10.42.8.201:80->10.42.2.0:64440: write: broken pipe"}
In the logs above, 10.42.8.201 is the Pod's IP, while 10.42.2.0 and 10.42.1.0 are the addresses of the flannel.1 interface on two cluster nodes; the Nginx Ingress Pods also run on these two nodes. All of these addresses fall within the Pod CIDR.
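So the problem appears to lie between frankenphp and Nginx Ingress, or between frankenphp and those nodes. "write: broken pipe" means that by the time frankenphp wrote its response, the peer (whose address here is a node's flannel.1 IP) had already closed the connection.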
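frankenphp is written in Go, and the logged text appears to have the shape of Go's `*net.OpError` ("write tcp LOCAL->PEER: write: broken pipe"). The minimal sketch below reproduces that kind of error locally, assuming Linux loopback behaviour: a client connects and closes immediately, and the server keeps writing until the kernel reports that the peer is gone. The function names `brokenPipeDemo` and `peerClosed` are hypothetical, not part of frankenphp or Caddy.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// brokenPipeDemo (hypothetical helper) mimics the situation behind the
// warning: a client connects, hangs up early, and the server keeps writing.
// It returns the first error reported by the server-side Write.
func brokenPipeDemo() (writeErr, setupErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return nil, err
	}
	server, err := ln.Accept()
	if err != nil {
		client.Close()
		return nil, err
	}
	defer server.Close()

	// The client goes away, like a browser tab closed mid-load. Its kernel
	// sends FIN and answers any further data from the server with RST.
	client.Close()

	chunk := make([]byte, 4096)
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		// The first write is usually accepted by the local kernel; once the
		// RST has arrived, a later write fails.
		if _, err := server.Write(chunk); err != nil {
			return err, nil
		}
		time.Sleep(10 * time.Millisecond)
	}
	return nil, errors.New("no write error within 2s")
}

// peerClosed reports whether err means the remote side had already closed
// or reset the connection.
func peerClosed(err error) bool {
	return errors.Is(err, syscall.EPIPE) || errors.Is(err, syscall.ECONNRESET)
}

func main() {
	writeErr, setupErr := brokenPipeDemo()
	if setupErr != nil {
		fmt.Println("demo failed:", setupErr)
		return
	}
	// Typically "write tcp 127.0.0.1:<port>->127.0.0.1:<port>: write: broken pipe"
	// on Linux, although the first failure can also be "connection reset by peer".
	fmt.Println(writeErr)
	fmt.Println("peer closed:", peerClosed(writeErr))
}
```

The demo shows why the warning on its own does not prove a network fault: any client that gives up before the response is fully written produces it.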
In https://github.com/caddyserver/caddy/issues/6000, mohammed90 explains: "This is harmless, as you're experiencing. This usually happens when the client disconnects without completing the handshake and the necessary components of a proper HTTP connection." In other words, the error is harmless and usually occurs when the client disconnects before a proper HTTP exchange has been completed. He adds: "They should be concerning to you because they mean the client is not closing the connection properly/cleanly. It's harmless in in smaller numbers as it can mean the client closed the webpage/app in the middle of the loading request. However, in large numbers this is a form of an attack to exhaust your server. It's good to know about it." That is, these errors are worth logging and watching: a small number usually just means clients abandoned a page mid-load and can be ignored, but a large volume can be an attack aimed at exhausting the server.
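Coming back to the slowness itself, the way forward is still to isolate the problem segment by segment and layer by layer:
- When one service is slow, check whether the other services proxied by the same Nginx Ingress are slow too
- Bypass Nginx Ingress and request the service directly from inside its Pod to see whether it is still slow
- For network-level problems, capture packets as well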
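Other improvements and things to watch:
- Whether the Pod has TCP probes or HTTP probes configured
- Monitoring of response times
- Well-designed service logs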
--