Parsing a string of key=value pairs with escaped characters using regex

Posted by sirbozc5 on 2023-11-20 in Other


Loki emits the following log in a key-value format structured as key1=value1 key2=value2:

level=info ts=2023-10-20T14:30:48.716410806Z caller=metrics.go:159 component=frontend org_id=fake traceID=58290ebda8d79180 latency=fast query=\"sum by (level) (count_over_time({k8s_namespace=\\\"ingress-nginx\\\"} |= ``[1s]))\" query_hash=110010092 query_type=metric range_type=range length=15m0.001s start_delta=15m0.833402507s end_delta=832.40267ms step=1s duration=61.999532ms status=200 limit=1000 returned_lines=0 throughput=4.2MB total_bytes=260kB total_bytes_structured_metadata=0B lines_per_second=4209 total_lines=261 post_filter_lines=261 total_entries=1 store_chunks_download_time=0s queue_time=819.962996ms splits=2 shards=32 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s source=logvolhist

In fluentd, I tried parsing this log with the Labeled Tab-separated Values (LTSV) parser, setting delimiter_pattern to /\s+/ and label_delimiter to =, and got the following result:

{
  "level": "info",
  "caller": "metrics.go:159",
  "component": "frontend",
  "org_id": "fake",
  "traceID": "58290ebda8d79180",
  "latency": "fast",
  "query": "\"sum",
  "(count_over_time({k8s_namespace": "\\\"ingress-nginx\\\"}",
  "|": "",
  "query_hash": "110010092",
  "query_type": "metric",
  "range_type": "range",
  ...
}

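For the key query, the parser captures only the first word of the value, then treats each following space as a delimiter that starts another key-value pair.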
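To make the failure concrete, here is a rough Python emulation of that splitting strategy. It is not fluentd's code, only a sketch of "split on whitespace, then split each field on the first =". The sample line is shortened, and it assumes the raw log contains query="..." with the inner quotes escaped as \" (the listing above appears to show a further-escaped form). Fields without = are skipped here, which is an assumption made to mirror the output shown above.

```python
import re

def naive_split(line):
    """Emulate whitespace-delimited key=value parsing (no quote awareness)."""
    record = {}
    for field in re.split(r"\s+", line.strip()):
        if "=" not in field:
            continue  # assumption: fields without a label delimiter are dropped
        key, _, value = field.partition("=")  # split on the first '=' only
        record[key] = value
    return record

# Shortened sample; the raw escaping (\" inside a quoted value) is assumed.
sample = (
    'latency=fast '
    'query="sum by (level) (count_over_time({k8s_namespace=\\"ingress-nginx\\"} |= ``[1s]))" '
    'query_hash=110010092'
)

record = naive_split(sample)
for key, value in record.items():
    print(repr(key), "->", repr(value))
```

The quoted query value is cut at its first space, and fragments of it such as `(count_over_time({k8s_namespace` and `|` become keys, which matches the broken record above.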
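I tried various regular expressions and two parser plugins (https://shihadeh.dev/ruby-gems/Key-ValueParser/ and https://github.com/fluent-plugins-nursery/fluent-plugin-kv-parser), but no luck so far.
Is this a matter of getting the regular expression right, using a different parser, trying to unescape the characters, or something else?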
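One possible direction, sketched in Python rather than as a fluentd configuration: match each pair with a pattern whose value is either a double-quoted string (allowing backslash escapes) or a bare run of non-space characters, then unescape the quoted values. The function name parse_pairs and the shortened sample are hypothetical, and the sketch assumes the raw log carries query="..." with inner quotes written as \". Whether the same pattern can be dropped into a particular fluentd parser is not verified here.

```python
import re

# key=value where value is a quoted string with backslash escapes, or a bare token
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')
UNESCAPE_RE = re.compile(r"\\(.)")

def parse_pairs(line):
    """Parse key=value pairs, keeping quoted values (with escapes) intact."""
    result = {}
    for key, raw in PAIR_RE.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and turn \" into ", \\ into \
            raw = UNESCAPE_RE.sub(r"\1", raw[1:-1])
        result[key] = raw
    return result

# Shortened sample; the raw escaping (\" inside a quoted value) is assumed.
sample = (
    'level=info caller=metrics.go:159 latency=fast '
    'query="sum by (level) (count_over_time({k8s_namespace=\\"ingress-nginx\\"} |= ``[1s]))" '
    'query_hash=110010092 status=200'
)

parsed = parse_pairs(sample)
print(parsed["query"])
print(parsed["query_hash"])
```

Because the quoted alternative is tried first and consumes the whole string, the k8s_namespace= inside the query is never seen as a separate pair.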

regex

Source: https://stackoverflow.com/questions/77331986/parse-string-of-key-values-with-escaped-characters