json - jq: split multiple text files into multiple arrays

r6l8ljro posted on 2023-08-08 in Other

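I have some files whose content consists of several key-value pairs, and I want to convert them into multiple arrays.
Let me illustrate what I mean with a made-up example. First, the contents of the files: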

# cat content/1.yaml
time: "2020-09-14T22:33:40Z"
id: ed1d4321
name: One
description: 'Here is number "one"
  this is good'

# cat content/2yaml
time: "2021-09-14T22:33:40Z"
id: eg134841
name: Two
description: 'Here is number "two"
  best of all'
newkey: value

In the next step I merge these files into a single blob that also contains the filenames, which I want to keep:

# for file in $(ls content/*yaml); do echo filename: $file; cat $file; done
filename: content/1.yaml
time: "2020-09-14T22:33:40Z"
id: ed1d4321
name: One
description: 'Here is number "one"
  this is good'
filename: content/2yaml
time: "2021-09-14T22:33:40Z"
id: eg134841
name: Two
description: 'Here is number "two"
  best of all'
newkey: value


Now the problem starts: how do I collect these into JSON arrays?
This is what I have come up with so far:

# for file in $(ls content/*yaml); do echo filename: $file; cat $file; done | jq -Rn '[inputs|split(": ")] | map({(.[0]): .[1]})'
[
  {
    "filename": "content/1.yaml"
  },
  {
    "time": "\"2020-09-14T22:33:40Z\""
  },
  {
    "id": "ed1d4321"
  },
  {
    "name": "One"
  },
  {
    "description": "'Here is number \"one\""
  },
  {
    "  this is good'": null
  },
  {
    "filename": "content/2yaml"
  },
  {
    "time": "\"2021-09-14T22:33:40Z\""
  },
  {
    "id": "eg134841"
  },
  {
    "name": "Two"
  },
  {
    "description": "'Here is number \"two\""
  },
  {
    "  best of all'": null
  },
  {
    "newkey": "value"
  }
]


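This is already close, but there are still a few problems I have not found a solution for:

1. The filenames are not split into separate arrays: everything ends up in one flat list instead of one entry per file.
2. The time field should not contain an escaped quoted string. I would like a solution that iterates over all fields and unwraps such quoted content, as in this example: "time": "2021-09-14T22:33:40Z"
3. The description value is spread over multiple lines. I want the lines merged into a single value, which is not what currently happens. It should look like this: "description": "Here is number \"two\" best of all". The single quotes should not be kept.

So the final result should look like this: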
[
  {
    "filename": "content/1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\"  this is good"
  },
  {
    "filename": "content/2yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\"  best of all",
    "newkey": "value"
  }
]

3ks5zfa0 #1

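The following code processes one file at a time and assumes that jq is invoked with the -R and -s command-line options (jq -Rs). Combining the results for multiple files is left as an exercise. (Hint: for the filename, use input_filename.)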

def objectify:
    capture("(?<key>[^:]+): *(?<value>.*)")
    | .value = (.value | (fromjson? // .))
    | [.]
    | from_entries;

  gsub("\n  *"; " ")        # join dangling text
  | . / "\n"                # split
  | map(select(length>0))   # ignore ""
  | map(objectify)          # {key, value}
  | add
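
For readers who would rather not solve the exercise in jq, the same steps (join dangling lines, split into lines, turn each line into a key/value pair, add the filename) can be sketched in Python using only the standard library. This is a rough illustration, not a YAML parser: the names `objectify`, `parse_text` and `parse_files` are made up for this example, and it assumes every non-empty line has the form `key: value` with continuation lines indented by spaces.

```python
import json
import re


def objectify(line):
    """Split one 'key: value' line and unwrap JSON-quoted string values."""
    key, _, value = line.partition(":")  # split on the first colon only
    value = value.strip()
    try:
        decoded = json.loads(value)  # a double-quoted value decodes to a plain string
    except ValueError:
        return key, value  # not valid JSON (e.g. ed1d4321): keep the raw text
    # Assumption: only strings are unwrapped, so 123 or null stay as written.
    return key, decoded if isinstance(decoded, str) else value


def parse_text(text):
    """Parse the content of one file into a dict."""
    joined = re.sub(r"\n +", " ", text)  # join dangling continuation lines
    return dict(objectify(line) for line in joined.split("\n") if line)


def parse_files(paths):
    """Build one object per file, with the filename as the first key."""
    records = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            records.append({"filename": path, **parse_text(handle.read())})
    return records
```

Calling `json.dumps(parse_files(paths), indent=2)` on a list of file paths would then give one JSON object per file. The sketch does not strip the single quotes around the description value; that would need an extra step.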


mfuanj7w #2

This is a partial solution: the values have not been "cleaned up" yet. That is left as an exercise for the reader :-)
Start with jq --slurp --raw-input:

# split lines
split("\n")
# join lines starting with whitespace with previous line
| reduce .[] as $l (
    null;
    if $l | startswith(" ") then .[-1] += $l else . += [$l] end
)
# split on first colon, returning an array of objects like {key: X, value: Y}
| map(capture("^(?<key>[^:]+):\\s*(?<value>.*)$"))
# combine these simple objects into bigger objects but begin a new objects when encountering "filename"
| reduce .[] as $e (null; 
    if $e.key == "filename" then . += [{}] else . end
    | .[-1][$e.key] = $e.value
)

The output looks like this:

[
  {
    "filename": "content/1.yaml",
    "time": "\"2020-09-14T22:33:40Z\"",
    "id": "ed1d4321",
    "name": "One",
    "description": "'Here is number \"one\"  this is good'"
  },
  {
    "filename": "content/2yaml",
    "time": "\"2021-09-14T22:33:40Z\"",
    "id": "eg134841",
    "name": "Two",
    "description": "'Here is number \"two\"  best of all'",
    "newkey": "value"
  }
]
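
The remaining cleanup step could be sketched as a small Python post-processing pass over that JSON output. The helper names `clean_value` and `clean_records` are hypothetical, and the sketch assumes a value is "dirty" only when it is wrapped in one matching pair of single or double quotes. It does not interpret YAML escape sequences inside the quotes.

```python
def clean_value(value):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value  # unquoted or mismatched quotes: leave untouched


def clean_records(records):
    """Apply clean_value to every value of every object in the list."""
    return [
        {key: clean_value(value) for key, value in record.items()}
        for record in records
    ]
```

Feeding the parsed jq output through `clean_records` (for example after `json.load(sys.stdin)`) turns `"\"2020-09-14T22:33:40Z\""` into `"2020-09-14T22:33:40Z"` and drops the single quotes around the description values.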

sbdsn5lh #3

This is probably a better fit for yq than trying to re-implement a YAML parser.
Something like this would work:

yq eval-all -o=json '[{"filename": filename} + .]' *.yaml

which results in:

[
  {
    "filename": "1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\" this is good"
  },
  {
    "filename": "2.yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\" best of all",
    "newkey": "value"
  }
]

nwo49xxi #4

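Well, I found another answer that does not use yq but Python, which is likely to be installed on many machines (the yaml module comes from the PyYAML package, which has to be installed as well):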

# for file in $(ls content/*yaml); do (echo filename: $file; cat $file) | python -c 'import yaml; import json; import sys; print(json.dumps(yaml.safe_load(sys.stdin)));' ; done | jq -s
[
  {
    "filename": "content/1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\" this is good"
  },
  {
    "filename": "content/2yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\" best of all",
    "newkey": "value"
  }
]


au9on6nz #5

For the record, you can use gojq, the Go implementation of jq, since it supports YAML:

gojq -n --yaml-input '[inputs | {filename: input_filename} + .] ' *.yaml

Note that gojq sorts object keys.
