Troubleshooting a Pod Stuck in Pending
- 2025-11-25 19:31:00
- 丁国栋
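During today's routine inspection I found two newly created Pods that could not be assigned to a node. The event message was: 0/12 nodes are available: 5 Insufficient cpu, 7 node(s) didn't match Pod's node affinity.
Yet the nodes matching the Pod's node selector label appeared to have plenty of spare resources.
The Pod's resource definition is: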
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 500m
        memory: 256Mi
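Meanwhile, kubectl top node -l name=value showed:
    NAME        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    10.8.0.15   413m         2%     6522Mi          23%
    10.8.0.4    447m         2%     7661Mi          27%
    10.8.0.41   625m         4%     8365Mi          30%
    10.8.0.49   548m         3%     8070Mi          29%
    10.8.0.9    811m         5%     7435Mi          26%
Checking each node's resource allocation with kubectl describe node -l name=value |grep 'Allocated resources' -A 12 showed that CPU requests had reached 99% on every one of them. The scheduler places Pods according to requests rather than actual usage, which is why nodes running at 2% to 5% CPU can still be reported as having insufficient CPU.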
    kubectl describe node -l name=value |grep 'Allocated resources' -A 12
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource                      Requests           Limits
      --------                      --------           ------
      cpu                           15552m (99%)       35220m (225%)
      memory                        14625692160 (50%)  27878641152 (96%)
      ephemeral-storage             0 (0%)             0 (0%)
      hugepages-1Gi                 0 (0%)             0 (0%)
      hugepages-2Mi                 0 (0%)             0 (0%)
      tke.cloud.tencent.com/eip     0                  0
      tke.cloud.tencent.com/eni-ip  1                  1
    Events:                         <none>
    --
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource                      Requests           Limits
      --------                      --------           ------
      cpu                           15502m (99%)       38220m (245%)
      memory                        17310046720 (59%)  33784221184 (116%)
      ephemeral-storage             0 (0%)             0 (0%)
      hugepages-1Gi                 0 (0%)             0 (0%)
      hugepages-2Mi                 0 (0%)             0 (0%)
      tke.cloud.tencent.com/eip     0                  0
      tke.cloud.tencent.com/eni-ip  1                  1
    Events:                         <none>
    --
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource                      Requests           Limits
      --------                      --------           ------
      cpu                           15502m (99%)       32320m (207%)
      memory                        15967869440 (55%)  28683947520 (99%)
      ephemeral-storage             0 (0%)             0 (0%)
      hugepages-1Gi                 0 (0%)             0 (0%)
      hugepages-2Mi                 0 (0%)             0 (0%)
      tke.cloud.tencent.com/eip     0                  0
      tke.cloud.tencent.com/eni-ip  1                  1
    Events:                         <none>
    --
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource                      Requests           Limits
      --------                      --------           ------
      cpu                           15552m (99%)       39420m (252%)
      memory                        17146468864 (59%)  34857963008 (120%)
      ephemeral-storage             0 (0%)             0 (0%)
      hugepages-1Gi                 0 (0%)             0 (0%)
      hugepages-2Mi                 0 (0%)             0 (0%)
      tke.cloud.tencent.com/eip     0                  0
      tke.cloud.tencent.com/eni-ip  1                  1
    Events:                         <none>
    --
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource                      Requests           Limits
      --------                      --------           ------
      cpu                           15552m (99%)       38620m (247%)
      memory                        14059461120 (48%)  30026124800 (103%)
      ephemeral-storage             0 (0%)             0 (0%)
      hugepages-1Gi                 0 (0%)             0 (0%)
      hugepages-2Mi                 0 (0%)             0 (0%)
      tke.cloud.tencent.com/eip     0                  0
      tke.cloud.tencent.com/eni-ip  0                  0
    Events:                         <none>
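The gap between the two views comes down to simple arithmetic: a Pod fits only if its CPU request is no larger than the node's allocatable CPU minus the requests already placed there, whatever kubectl top reports. The sketch below estimates an upper bound on the unrequested CPU from the cpu rows in the output above. It assumes the percentage kubectl prints is requested divided by allocatable, truncated or rounded to a whole number. max_free_cpu and pod_fits are illustrative helper names, not kubectl features.

```shell
# Upper bound (in millicores) on CPU not yet claimed by requests, derived
# from a "cpu <requested>m (<pct>%)" row of `kubectl describe node`.
# Assumption: pct is requested/allocatable as a whole number, so in the
# worst case (rounding) allocatable <= requested * 100 / (pct - 0.5).
max_free_cpu() {
  requested=$1
  pct=$2
  # Integer form of requested*100/(pct-0.5); the +1 rounds up so the
  # result stays an upper bound.
  max_alloc=$(( requested * 200 / (2 * pct - 1) + 1 ))
  echo $(( max_alloc - requested ))
}

# Prints "yes" if a Pod requesting $3 millicores could still fit on a node
# with $1 millicores requested at $2 percent, otherwise "no".
pod_fits() {
  free=$(max_free_cpu "$1" "$2")
  if [ "$3" -le "$free" ]; then echo yes; else echo no; fi
}

# The two distinct cpu rows from the output, against the Pod's 500m request.
pod_fits 15552 99 500
pod_fits 15502 99 500
```

Both calls print no: under these assumptions at most about 237m of CPU is unrequested on any of the five nodes, well short of the Pod's 500m request.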
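My first thought was that the CPU requests on the existing Pods were set too high. Changing the requests of Pods that were already running was not really practical, and adding new nodes would mean purchasing more resources, which was not ideal either. As a temporary workaround, I applied the same label to other nodes that had spare capacity so that the Pods could be scheduled onto them.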
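The relabeling step could look roughly like the sketch below. The label name=value is the same placeholder used in the commands above, the node names in the usage note are hypothetical, and label_nodes is just an illustrative wrapper around kubectl label node. Setting KUBECTL="echo kubectl" previews the commands instead of running them.

```shell
# Placeholder label; substitute the key=value the Pod's node selector expects.
LABEL="name=value"
# Override with KUBECTL="echo kubectl" to print the commands without running them.
KUBECTL="${KUBECTL:-kubectl}"

# Apply LABEL to every node name passed as an argument.
label_nodes() {
  for node in "$@"; do
    $KUBECTL label node "$node" "$LABEL"
  done
}
```

For example, label_nodes node-a node-b would label two spare nodes. Because this is a temporary fix, the label can be removed later with kubectl label node node-a name- (the key followed by a trailing dash).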