jq: split multiple text files into multiple arrays (JSON)

Posted by r6l8ljro on 2023-08-08 in Other


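I have some files whose contents consist of several key-value pairs, and I want to convert them into multiple arrays.
Let me illustrate what I mean with some made-up examples. First, the contents of the files: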

# cat content/1.yaml
time: "2020-09-14T22:33:40Z"
id: ed1d4321
name: One
description: 'Here is number "one"
  this is good'

# cat content/2yaml
time: "2021-09-14T22:33:40Z"
id: eg134841
name: Two
description: 'Here is number "two"
  best of all'
newkey: value

In the next step I merge these files into a single blob that includes the filenames, which I want to keep:

# for file in $(ls content/*yaml); do echo filename: $file; cat $file; done
filename: content/1.yaml
time: "2020-09-14T22:33:40Z"
id: ed1d4321
name: One
description: 'Here is number "one"
  this is good'
filename: content/2yaml
time: "2021-09-14T22:33:40Z"
id: eg134841
name: Two
description: 'Here is number "two"
  best of all'
newkey: value

Now the problem begins: how do I collect these into JSON arrays?
This is what I have come up with so far:

# for file in $(ls content/*yaml); do echo filename: $file; cat $file; done | jq -Rn '[inputs|split(": ")] | map({(.[0]): .[1]})'
[
  {
    "filename": "content/1.yaml"
  },
  {
    "time": "\"2020-09-14T22:33:40Z\""
  },
  {
    "id": "ed1d4321"
  },
  {
    "name": "One"
  },
  {
    "description": "'Here is number \"one\""
  },
  {
    "  this is good'": null
  },
  {
    "filename": "content/2yaml"
  },
  {
    "time": "\"2021-09-14T22:33:40Z\""
  },
  {
    "id": "eg134841"
  },
  {
    "name": "Two"
  },
  {
    "description": "'Here is number \"two\""
  },
  {
    "  best of all'": null
  },
  {
    "newkey": "value"
  }
]

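This is already close, but there are still a few problems I have not found a solution for:

1. The entries are not grouped per filename into separate objects.
2. The time field should not contain an escaped quoted string. I would like a solution that iterates over all fields and unwraps quoted content, as in this example: "time": "2021-09-14T22:33:40Z"
3. The description value is spread across multiple lines. I would like the lines merged into a single value, which is not what currently happens. It should look like this: "description": "Here is number \"two\" best of all". Single quotes should not be kept.

So the final result should look like this: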

[
  {
    "filename": "content/1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\"  this is good"
  },
  {
    "filename": "content/2yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\"  best of all",
    "newkey": "value"
  }
]


Source: https://stackoverflow.com/questions/76744674/jq-split-multiple-text-files-into-multiple-arrays

5 answers


#1 3ks5zfa0

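The following code processes one file at a time and assumes that jq is invoked with the -R and -s command-line options (jq -Rs). Combining the results for multiple files is left as an exercise. (Hint: for the filename, use input_filename.)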

def objectify:
    capture("(?<key>[^:]+): *(?<value>.*)")
    | .value = (.value | (fromjson? // .))
    | [.]
    | from_entries;

  gsub("\n  *"; " ")        # join dangling text
  | . / "\n"                # split
  | map(select(length>0))   # ignore ""
  | map(objectify)          # {key, value}
  | add
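
For readers who want to check the logic outside jq, here is a rough Python sketch of the same steps (join dangling lines, split, objectify), extended to tag each result with its filename. It is not a translation the answer provides: the names `objectify`, `parse_text` and `combine` are made up for illustration, and `json.loads` only approximates `fromjson? // .`. As with the jq filter, single-quoted values keep their quotes.

```python
import json
import re


def objectify(line):
    """Split 'key: value' and JSON-decode the value when that is possible."""
    match = re.match(r"(?P<key>[^:]+): *(?P<value>.*)", line)
    if match is None:
        return None
    key, value = match.group("key"), match.group("value")
    try:
        value = json.loads(value)   # '"2020-..."' becomes '2020-...'
    except ValueError:
        pass                        # not valid JSON: keep the raw text
    return key, value


def parse_text(text):
    """Parse the text of one file into a dict."""
    joined = re.sub(r"\n +", " ", text)               # join dangling text
    lines = [ln for ln in joined.split("\n") if ln]   # split, ignore ""
    pairs = (objectify(ln) for ln in lines)
    return dict(pair for pair in pairs if pair is not None)


def combine(files):
    """files maps filename -> text (hypothetical input shape)."""
    return [{"filename": name, **parse_text(text)}
            for name, text in files.items()]


sample = {
    "content/1.yaml": 'time: "2020-09-14T22:33:40Z"\nid: ed1d4321\n'
                      'name: One\ndescription: \'Here is number "one"\n'
                      "  this is good'\n",
}
print(json.dumps(combine(sample), indent=2))
```

The per-file function is kept separate from `combine` so that the filename can be attached by whatever reads the files, which is the role input_filename plays in the hint above.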


#2 mfuanj7w

This is a partial solution: the values have not been "cleaned up" yet. That is left as an exercise for the reader :-)
Start with jq --slurp --raw-input:

# split lines
split("\n")
# join lines starting with whitespace with previous line
| reduce .[] as $l (
    null;
    if $l | startswith(" ") then .[-1] += $l else . += [$l] end
)
# split on first colon, returning an array of objects like {key: X, value: Y}
| map(capture("^(?<key>[^:]+):\\s*(?<value>.*)$"))
# combine these simple objects into bigger objects, but begin a new object when encountering "filename"
| reduce .[] as $e (null; 
    if $e.key == "filename" then . += [{}] else . end
    | .[-1][$e.key] = $e.value
)

The output looks like this:

[
  {
    "filename": "content/1.yaml",
    "time": "\"2020-09-14T22:33:40Z\"",
    "id": "ed1d4321",
    "name": "One",
    "description": "'Here is number \"one\"  this is good'"
  },
  {
    "filename": "content/2yaml",
    "time": "\"2021-09-14T22:33:40Z\"",
    "id": "eg134841",
    "name": "Two",
    "description": "'Here is number \"two\"  best of all'",
    "newkey": "value"
  }
]
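
The clean-up step left as an exercise could look roughly like the following Python sketch. It only strips one pair of matching outer quotes from each value, which is enough for the sample data; it does not handle YAML escape rules (for example a doubled '' inside single quotes). The helper names `clean_value` and `clean_records` are hypothetical.

```python
import json


def clean_value(value):
    """Strip one pair of matching outer quotes (' or ") from a string."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def clean_records(records):
    """Apply clean_value to every field of every record."""
    return [{key: clean_value(val) for key, val in rec.items()}
            for rec in records]


# First object from the output above, used as sample input.
records = [{
    "filename": "content/1.yaml",
    "time": "\"2020-09-14T22:33:40Z\"",
    "id": "ed1d4321",
    "name": "One",
    "description": "'Here is number \"one\"  this is good'",
}]
print(json.dumps(clean_records(records), indent=2))
```

Quotes inside a value, such as the ones around "one", are left alone because only the first and last characters are inspected.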


#3 sbdsn5lh

This is probably a better fit for yq than trying to reimplement a YAML parser.
Something like this would work:

yq eval-all -o=json '[{"filename": filename} + .]' *.yaml

resulting in

[
  {
    "filename": "1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\" this is good"
  },
  {
    "filename": "2.yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\" best of all",
    "newkey": "value"
  }
]


#4 nwo49xxi

OK, I found another answer that uses Python instead of yq; Python is likely to be installed on a lot of machines:

# for file in $(ls content/*yaml); do (echo filename: $file; cat $file) | python -c 'import yaml; import json; import sys; print(json.dumps(yaml.safe_load(sys.stdin)));' ; done | jq -s
[
  {
    "filename": "content/1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\" this is good"
  },
  {
    "filename": "content/2yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\" best of all",
    "newkey": "value"
  }
]
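
The same idea can be written as a small Python function instead of a shell loop, which avoids parsing the output of ls and the extra jq -s step. This is only a sketch: `load_records` and its `parse` parameter are hypothetical names, and the default parser assumes the third-party PyYAML package is installed, as the one-liner above also does.

```python
import json


def load_records(paths, parse=None):
    """Parse each file and tag the result with the file's name."""
    if parse is None:
        import yaml  # PyYAML, third-party (assumed installed)
        parse = yaml.safe_load
    records = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            data = parse(handle) or {}   # an empty file parses to None
        # "filename" comes first; a "filename" key in the file would override it.
        records.append({"filename": path, **data})
    return records
```

A possible invocation is `print(json.dumps(load_records(sorted(glob.glob("content/*yaml"))), indent=2))` after `import glob`. The parser is passed in as an argument so that the function can be tried with another loader, for example `json.load`, on machines without PyYAML.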


#5 au9on6nz

For the record, you can use gojq, the Go implementation of jq, since it supports YAML:

gojq -n --yaml-input '[inputs | {filename: input_filename} + .] ' *.yaml

Note that gojq sorts object keys.
