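First, I'd like to clear something up: if I run a telegraf DaemonSet in a Kubernetes cluster, will it collect metrics for the pods, or for the physical nodes?
I created a telegraf DaemonSet in my test Kubernetes cluster, which runs under Hyper-V on my laptop, based on this Kubernetes cluster installation:
I want to collect pod metrics, but nothing reaches the Kafka machines. I see this error in the log: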
2019-05-08T02:36:35Z I! Starting Telegraf 1.9.2
2019-05-08T02:36:35Z I! Using config file: /etc/telegraf/telegraf.conf
2019-05-08T02:46:36Z E! [agent] Failed to connect to output kafka, retrying in 15s, error was 'kafka: client has run out of available brokers to talk to (Is your cluster reachable?)'
Here is the DaemonSet definition file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      hostname = "$HOSTNAME"
      interval = "60s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "2s"
      precision = ""
      debug = false
      quiet = true
      logfile = ""
    [[outputs.kafka]]
      brokers = ["10.121.63.5:9092", "10.121.63.18:9092", "10.121.62.64:9092", "10.121.62.80:9092", "10.121.63.22:9092"]
      topic = "telegraf-measurements-json"
      client_id = "golangsarama__1.18.0__serverinfra__telegraf"
      routing_tag = "host"
      version = "0.11.0.2"
      compression_codec = 2
      required_acks = 1
      data_format = "json"
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.kubernetes]]
      url = "https://192.168.213.18:6443"
      insecure_skip_verify = true
---
# Section: Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: docker.io/telegraf:1.9.2
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: utmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: utmp
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name: telegraf
This is the article I followed to create the DaemonSet.
Here are the pods:
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
default       nginx-65f88748fd-jztrz               1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-rl48l              1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-x8fvx              1/1     Running   0          7d18h
kube-system   etcd-k8s-master                      1/1     Running   2          7d18h
kube-system   kube-apiserver-k8s-master            1/1     Running   2          7d18h
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-96tsl          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-b884r          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-pdqmq          1/1     Running   0          7d18h
kube-system   kube-proxy-42k2g                     1/1     Running   0          7d18h
kube-system   kube-proxy-77pw9                     1/1     Running   0          7d18h
kube-system   kube-proxy-n5mbs                     1/1     Running   0          7d18h
kube-system   kube-scheduler-k8s-master            1/1     Running   2          7d18h
monitoring    telegraf-dvtcl                       1/1     Running   5          117m
monitoring    telegraf-n2mqz                       1/1     Running   5          117m
tcpdump shows what is being sent from the DaemonSet:
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
z#0........................
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
But I can't see anything on our Grafana dashboard. If I install a standalone RPM-based telegraf on the node, it sends data and I can see the metrics. But I'm curious about pod metrics.
1 Answer
8yparm6h #1
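This error from telegraf simply means it could not connect to any of the 10.x.x.x broker IPs listed in the brokers array in your config. Depending on how you set up networking and routing, you may just have a simple routing problem reaching the private IPs that host your Kafka cluster.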
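One way to narrow this down is to test plain TCP reachability of each broker from the same network the telegraf pods use. The tcpdump in the question shows only SYN packets (Flags [S]), which would be consistent with no reply coming back. Below is a minimal sketch in Python using only the standard library; the function names `can_connect` and `check_brokers` are made up for illustration, and it assumes you can run Python somewhere with the same network view as the pod (for example a debug pod on the same node), since the telegraf image may not ship Python.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "no route to host".
        return False


def check_brokers(brokers, timeout=3.0):
    """Map each "host:port" string to whether a TCP connect succeeded."""
    results = {}
    for broker in brokers:
        host, _, port = broker.rpartition(":")
        results[broker] = can_connect(host, int(port), timeout)
    return results


if __name__ == "__main__":
    # Broker addresses are passed on the command line, e.g. the entries
    # from the brokers array in [[outputs.kafka]].
    for broker, ok in check_brokers(sys.argv[1:]).items():
        print(f"{broker}: {'reachable' if ok else 'UNREACHABLE'}")
```

Saved under a name of your choosing (say `check_brokers.py`), it could be run as `python check_brokers.py 10.121.63.5:9092 10.121.63.18:9092`. If every broker comes back unreachable from the pod network but reachable from the node itself, that points at pod-to-external routing rather than at telegraf or Kafka. Note that this only checks the TCP handshake, not whether the Kafka protocol version or advertised listeners are right.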