"[error] [upstream] connection timeout after 10 seconds" failure when fluent-bit tries to communicate with fluentd in Kubernetes

piv4azn7  posted on 2022-11-02  in  Kubernetes

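I am using fluent-bit to collect logs and pass them to fluentd for processing in a Kubernetes environment. The fluent-bit instances are controlled by a DaemonSet and read logs from the Docker containers.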

[INPUT]
      Name tail
      Path /var/log/containers/*.log
      Parser docker
      Tag kube.*
      Mem_Buf_Limit 5MB
      Skip_Long_Lines On

There is also a fluent-bit service running:

Name:              monitoring-fluent-bit-dips
Namespace:         dips
Labels:            app.kubernetes.io/instance=monitoring
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=fluent-bit-dips
                   app.kubernetes.io/version=1.8.10
                   helm.sh/chart=fluent-bit-0.19.6
Annotations:       meta.helm.sh/release-name: monitoring
                   meta.helm.sh/release-namespace: dips
Selector:          app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=fluent-bit-dips
Type:              ClusterIP
IP Families:       <none>
IP:                10.43.72.32
IPs:               <none>
Port:              http  2020/TCP
TargetPort:        http/TCP
Endpoints:         10.42.0.144:2020,10.42.1.155:2020,10.42.2.186:2020 + 1 more...
Session Affinity:  None
Events:            <none>

The Fluentd service description is as follows:

Name:              monitoring-logservice
Namespace:         dips
Labels:            app.kubernetes.io/instance=monitoring
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=logservice
                   app.kubernetes.io/version=1.9
                   helm.sh/chart=logservice-0.1.2
Annotations:       meta.helm.sh/release-name: monitoring
                   meta.helm.sh/release-namespace: dips
Selector:          app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=logservice
Type:              ClusterIP
IP Families:       <none>
IP:                10.43.44.254
IPs:               <none>
Port:              http  24224/TCP
TargetPort:        http/TCP
Endpoints:         10.42.0.143:24224
Session Affinity:  None
Events:            <none>

But the fluent-bit logs do not reach fluentd, and the following error appears:

[error] [upstream] connection #81 to monitoring-fluent-bit-dips:24224 timed out after 10 seconds

I have tried several things, such as:

  • Redeploying the fluent-bit pods
  • Redeploying the Fluentd pod
  • Upgrading fluent-bit from version 1.7.3 to 1.8.10

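This is a Kubernetes environment in which fluent-bit was able to communicate with fluentd in the early stages of the deployment. Apart from that, the same fluent-bit version works fine when I deploy it locally in a docker-desktop environment.
My guesses are:

  • fluent-bit cannot handle the volume of logs being processed
  • the fluent services are unable to communicate once the services have been restarted

Does anyone have experience with this, or any ideas on how to debug this issue more deeply?
Update: the description of the running Fluentd pod is below.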

Name:         monitoring-logservice-5b8864ffd8-gfpzc
Namespace:    dips
Priority:     0
Node:         sl-sy-k3s-01/10.16.1.99
Start Time:   Mon, 29 Nov 2021 13:09:13 +0530
Labels:       app.kubernetes.io/instance=monitoring
              app.kubernetes.io/name=logservice
              pod-template-hash=5b8864ffd8
Annotations:  kubectl.kubernetes.io/restartedAt: 2021-11-29T12:37:23+05:30
Status:       Running
IP:           10.42.0.143
IPs:
  IP:           10.42.0.143
Controlled By:  ReplicaSet/monitoring-logservice-5b8864ffd8
Containers:
  logservice:
    Container ID:   containerd://102483a7647fd2f10bead187eddf69aa4fad72051d6602dd171e1a373d4209d7
    Image:          our.private.repo/dips/logservice/splunk:1.9
    Image ID:       our.private.repo/dips/logservice/splunk@sha256:531f15f523a251b93dc8a25056f05c0c7bb428241531485a22b94896974e17e8
    Ports:          24231/TCP, 24224/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 29 Nov 2021 13:09:14 +0530
    Ready:          True
    Restart Count:  0
    Liveness:       exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      SOME_ENV_VARS
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from monitoring-logservice-token-g9kwt (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  monitoring-logservice-token-g9kwt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  monitoring-logservice-token-g9kwt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
sirbozc5  #1

Try changing the fluent-bit configuration that points at the fluentd service to monitoring-logservice.dips:24224
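
For context, the error line in the question shows fluent-bit connecting to monitoring-fluent-bit-dips:24224, which is the fluent-bit service itself (it only exposes port 2020), while the Fluentd service is monitoring-logservice on port 24224. A minimal sketch of what the corrected output section might look like is below. It assumes the logs are shipped with the forward output plugin and that the section is set through an outputs value in the Helm chart, in the same style as the filters snippet elsewhere in this thread; the question does not show the actual [OUTPUT] section, so treat the layout as illustrative.

```
outputs: |
    [OUTPUT]
        # Assumption: the forward plugin is used to reach Fluentd.
        Name   forward
        # Matches the kube.* tag set by the tail input in the question.
        Match  kube.*
        # <service>.<namespace> of the Fluentd service, not the fluent-bit one.
        Host   monitoring-logservice.dips
        Port   24224
```

Because the DaemonSet and the Fluentd service are both in the dips namespace, the short name monitoring-logservice should resolve as well; the namespaced form simply removes any ambiguity.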

zzwlnbp8  #2

https://docs.fluentbit.io/manual/pipeline/filters/kubernetes

filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Kube_URL            https://kubernetes.default:443
        tls.verify Off

In my case, the problem was a Kubernetes API server SSL error.
