I am using fluent-bit to collect logs and forward them to fluentd for processing in a Kubernetes environment. The fluent-bit instances are controlled by a DaemonSet and read logs from the Docker containers.
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
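For context, the matching forward output section (not included in the question) would look roughly like this; a minimal sketch, assuming the standard `forward` output plugin is used and that the Host value is the one visible in the timeout error below:

```
[OUTPUT]
    Name    forward
    Match   kube.*
    Host    monitoring-fluent-bit-dips
    Port    24224
```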
There is also a fluent-bit service running:
Name: monitoring-fluent-bit-dips
Namespace: dips
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=fluent-bit-dips
app.kubernetes.io/version=1.8.10
helm.sh/chart=fluent-bit-0.19.6
Annotations: meta.helm.sh/release-name: monitoring
meta.helm.sh/release-namespace: dips
Selector: app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=fluent-bit-dips
Type: ClusterIP
IP Families: <none>
IP: 10.43.72.32
IPs: <none>
Port: http 2020/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.144:2020,10.42.1.155:2020,10.42.2.186:2020 + 1 more...
Session Affinity: None
Events: <none>
The fluentd service is described as follows:
Name: monitoring-logservice
Namespace: dips
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=logservice
app.kubernetes.io/version=1.9
helm.sh/chart=logservice-0.1.2
Annotations: meta.helm.sh/release-name: monitoring
meta.helm.sh/release-namespace: dips
Selector: app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=logservice
Type: ClusterIP
IP Families: <none>
IP: 10.43.44.254
IPs: <none>
Port: http 24224/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.143:24224
Session Affinity: None
Events: <none>
But the fluent-bit logs are not reaching fluentd, and the following error appears:
[error] [upstream] connection #81 to monitoring-fluent-bit-dips:24224 timed out after 10 seconds
I have tried several things, such as:
- Redeploying the fluent-bit pods
- Redeploying the fluentd pods
- Upgrading fluent-bit from version 1.7.3 to 1.8.10
This is a Kubernetes environment in which fluent-bit was able to communicate with fluentd during the earlier stages of the deployment. Apart from that, the same fluent versions work fine when I deploy them locally in a docker-desktop environment.
My guesses are:
- fluent-bit cannot manage the number of log processes
- the fluent services cannot communicate with each other after a service restart

Does anyone have experience with this, or any ideas on how to debug the issue further?
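One way to debug further is to confirm DNS resolution and TCP reachability of the fluentd service from inside a fluent-bit pod. A rough sketch; the `svc_fqdn` helper is purely illustrative, the DaemonSet name is assumed from the service name, and the commented commands assume the container image ships `nslookup` and `nc`:

```shell
# Illustrative helper: compose the in-cluster DNS name of a Kubernetes Service
svc_fqdn() { printf '%s.%s.svc.cluster.local' "$1" "$2"; }

# The address fluent-bit should be targeting for the fluentd service:
svc_fqdn monitoring-logservice dips
# monitoring-logservice.dips.svc.cluster.local

# From a fluent-bit pod, check name resolution and TCP reachability of fluentd
# (requires cluster access; DaemonSet name assumed):
# kubectl exec -n dips ds/monitoring-fluent-bit-dips -- \
#     nslookup monitoring-logservice.dips.svc.cluster.local
# kubectl exec -n dips ds/monitoring-fluent-bit-dips -- \
#     nc -zv -w 5 monitoring-logservice.dips.svc.cluster.local 24224
```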
Update: the running fluentd pod is described below
Name: monitoring-logservice-5b8864ffd8-gfpzc
Namespace: dips
Priority: 0
Node: sl-sy-k3s-01/10.16.1.99
Start Time: Mon, 29 Nov 2021 13:09:13 +0530
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/name=logservice
pod-template-hash=5b8864ffd8
Annotations: kubectl.kubernetes.io/restartedAt: 2021-11-29T12:37:23+05:30
Status: Running
IP: 10.42.0.143
IPs:
IP: 10.42.0.143
Controlled By: ReplicaSet/monitoring-logservice-5b8864ffd8
Containers:
logservice:
Container ID: containerd://102483a7647fd2f10bead187eddf69aa4fad72051d6602dd171e1a373d4209d7
Image: our.private.repo/dips/logservice/splunk:1.9
Image ID: our.private.repo/dips/logservice/splunk@sha256:531f15f523a251b93dc8a25056f05c0c7bb428241531485a22b94896974e17e8
Ports: 24231/TCP, 24224/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Mon, 29 Nov 2021 13:09:14 +0530
Ready: True
Restart Count: 0
Liveness: exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOME_ENV_VARS
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from monitoring-logservice-token-g9kwt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
monitoring-logservice-token-g9kwt:
Type: Secret (a volume populated by a Secret)
SecretName: monitoring-logservice-token-g9kwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
2 Answers

sirbozc5:
The error shows fluent-bit timing out against monitoring-fluent-bit-dips:24224, which is the fluent-bit service itself rather than fluentd. Try changing the fluent-bit configuration that points to the fluentd service to monitoring-logservice.dips:24224.
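In fluent-bit config terms, that would mean pointing the forward output's Host at the fluentd service; a sketch, assuming the standard `forward` output plugin (the fully-qualified form of the service name is shown):

```
[OUTPUT]
    Name    forward
    Match   kube.*
    Host    monitoring-logservice.dips.svc.cluster.local
    Port    24224
```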
zzwlnbp8:

https://docs.fluentbit.io/manual/pipeline/filters/kubernetes

In my case it was a Kubernetes API server SSL error.