Split one JSON file into multiple JSON files based on the values of one of its keys?

Posted by dpiehjr4 on 2023-02-06 in Other

I currently have a JSON file in the following format:

{
    "Comment":"json data",
    "Changes":[
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record1",
                "Type":"CNAME",
                "SetIdentifier":"record1-ap-northeast",
                "GeoLocation":{
                    "CountryCode":"JP"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record1"
                    }
                ],
                "HealthCheckId":"ID"
            }
        },
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record2",
                "Type":"CNAME",
                "SetIdentifier":"record2-ap-south",
                "GeoLocation":{
                    "CountryCode":"SG"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record2"
                    }
                ],
                "HealthCheckId":"ID"
            }
        },
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record3",
                "Type":"CNAME",
                "SetIdentifier":"record3-ap-west",
                "GeoLocation":{
                    "CountryCode":"IN"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record3"
                    }
                ],
                "HealthCheckId":"ID"
            }
        },
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record4.",
                "Type":"CNAME",
                "SetIdentifier":"record4",
                "GeoLocation":{
                    "CountryCode":"*"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record4-ap-west"
                    }
                ],
                "HealthCheckId":"ID"
            }
        }
    ]
}

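The original file has 20,000 such entries under the "Changes" key. I want to produce files with 830 entries each, creating as many files as that takes. To do that, I need each file to be in the format below.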

{
    "Comment":"json data",
    "Changes":[
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record4.",
                "Type":"CNAME",
                "SetIdentifier":"record4", #830 such arrays in each file
                "GeoLocation":{
                    "CountryCode":"*"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record4-ap-west"
                    }
                ],
                "HealthCheckId":"ID"
            }
        }
    ]
}

I wrote the following shell script to do this:

#!/bin/bash

# Set the input file name
input_file="input.json"

# Set the output file prefix
output_file_prefix="output"

# Set the number of objects per output file
objects_per_file=830

# Skip the first two lines of the input file
tail -n +3 "$input_file" > temp.json

# Get the total number of lines in the input file
total_lines=$(wc -l < temp.json)

# Calculate the number of output files needed
output_files=$(((total_lines + objects_per_file - 1) / objects_per_file))

# Split the input file into multiple output files
split -l $objects_per_file temp.json "$output_file_prefix"

# Loop through each output file and add the opening and closing square brackets
for file in "$output_file_prefix"*; do
  echo "[" > "$file".json
  cat "$file" >> "$file".json
  echo "]" >> "$file".json
  rm "$file"
done

# Remove the temporary file
rm temp.json

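**With this script I get roughly the expected output, but it is broken, because it splits every 830 lines rather than every 830 array elements.** The output looks like this: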

#start of file
[
{
"Action": "DELETE",
"ResourceRecordSet":
{
  "Name": "record1",
  "Type": "CNAME",
  "SetIdentifier": "record1-ap-northeast",
  "GeoLocation": {
    "CountryCode": "JP"
  },
  "TTL": 60,
  "ResourceRecords": [
    {
      "Value": "record1"
    }
  ],
  "HealthCheckId": "ID"
}
},

#end of file
{
"Action": "DELETE",
"ResourceRecordSet":
{
"Action": "DELETE",
"ResourceRecordSet":
{
  "Name": "record4.",
  "Type": "CNAME",
  "SetIdentifier": "record4"
]

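How can I get the desired result? Because of a character limit, I cannot put more than 830 of these array elements in each file. I tried to do this with the jq tool, but I'm completely new to it. Can you help?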

Answer #1, by tcomlyy6

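If you want to use jq, you'll have to do it in two or three steps, but each step is very simple.
First, use jq with the -c option to create a JSON Lines file containing the desired JSON objects: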

< input.json jq -c '
  (.Changes | _nwise(830)) as $C   # 830 per problem statement
  | .Changes = $C
' > output.jsonl
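To see what `_nwise(830)` does, here is the same chunking on a five-element array with a chunk size of 2. `_nwise` is an internal helper in jq's builtin library (the leading underscore marks it as undocumented), so the sketch also shows an equivalent filter built only from documented features, `range/3` plus array slicing, in case a particular jq build lacks it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Chunk a 5-element array into pieces of at most 2 with jq's internal _nwise.
nwise_out=$(jq -nc '[1,2,3,4,5] | _nwise(2)')

# The same chunking with documented features only: range(0; length; 2) yields
# each chunk's start index, and $a[$i:$i+2] slices that chunk out.
slice_out=$(jq -nc '[1,2,3,4,5] as $a | range(0; $a | length; 2) as $i | $a[$i:$i+2]')

printf '%s\n' "$nwise_out"
# [1,2]
# [3,4]
# [5]
```

Applied to the command above, the portable form of its first line would be `(.Changes as $a | range(0; $a | length; 830) as $i | $a[$i:$i+830]) as $C`.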

Next, split output.jsonl into the files you want. This can be done in many ways, for example with awk, or even with the shell's read builtin.
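A minimal sketch of the shell `read` route, assuming the JSON Lines file from the first step. `split_jsonl` and the `PREFIX_N.json` naming are hypothetical; the demo writes three small stand-in objects to a temporary directory instead of using the real `output.jsonl`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write each line of a JSON Lines stream on stdin to its
# own file named PREFIX_1.json, PREFIX_2.json, ... and print how many it wrote.
split_jsonl() {
  local prefix=$1 n=0 line=
  # IFS= and -r keep each line byte-for-byte (no trimming, no backslash escapes);
  # the [ -n "$line" ] test also catches a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    printf '%s\n' "$line" > "${prefix}_${n}.json"
  done
  echo "$n"
}

# Demo: three compact objects standing in for output.jsonl.
workdir=$(mktemp -d)
printf '%s\n' '{"Changes":[1,2]}' '{"Changes":[3,4]}' '{"Changes":[5]}' > "$workdir/output.jsonl"
count=$(split_jsonl "$workdir/output" < "$workdir/output.jsonl")
echo "wrote $count files"
```

With the real data this would be `split_jsonl output < output.jsonl`. The awk route is a one-liner along the lines of `awk '{ f = "output_" NR ".json"; print > f; close(f) }' output.jsonl`; the `close(f)` keeps awk from running out of open file descriptors when there are many output files.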
Finally, if you want the individual files to be pretty-printed, jq can do that in the obvious way.
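One way to do the pretty-printing, assuming compact `output_N.json` files like those produced by the previous step (the demo creates two stand-ins in a temporary directory). jq's identity filter `.` re-emits its input indented by two spaces by default.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo files standing in for the compact output_N.json files.
workdir=$(mktemp -d)
printf '%s\n' '{"Comment":"json data","Changes":[{"Action":"DELETE"}]}' > "$workdir/output_1.json"
printf '%s\n' '{"Comment":"json data","Changes":[]}' > "$workdir/output_2.json"

# Pretty-print each file in place. Write to a temporary file first: redirecting
# jq's output straight onto its own input would truncate the file before jq reads it.
for f in "$workdir"/output_*.json; do
  jq . "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$workdir/output_1.json"
# {
#   "Comment": "json data",
#   "Changes": [
#     {
#       "Action": "DELETE"
#     }
#   ]
# }
```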

Answer #2, by uajslkp6

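So even assuming that some of the JSON "lines" are compacted onto a single line (as with jq -c) while others are printed as a tree, all you need is the right regex in awk to identify the record separator ("RS"):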

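# Assumes $json_in_1 holds the single-record JSON document shown in the question.
# mawk or a recent nawk should also work in place of gawk.
cat <( printf '%s' "$json_in_1$json_in_1" | jq -c ) \
    <( printf '%s\n%s\n%s' "$json_in_1" "$json_in_1" "$json_in_1" ) |
gawk '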
 BEGIN { RS = (_ = "[[:space:]]*") (__ = "[}]") \
              (_)__ (_)"[]]" (_)__ (FS =  "\n") "?"
        ORS = (_ = "}")_     ("]")_ FS
        OFS = "\f\r\t"
       _+=_^= __ = (_<_)
 } { 
     printf(" NR # %d | NF = %d :: %s>>>>%s%s%.*s%s>>>>%s",
              NR, NF, FS, FS, $__, _<NF, FS, ORS, FS, FS) }'
NR # 1 | NF = 1 :: 
>>>>
{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4.","Type":"CNAME","SetIdentifier":"record4","GeoLocation":{"CountryCode":"*"},"TTL":60,"ResourceRecords":[{"Value":"record4-ap-west"}],"HealthCheckId":"ID"}}]}
>>>>
 NR # 2 | NF = 1 :: 
>>>>
{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4.","Type":"CNAME","SetIdentifier":"record4","GeoLocation":{"CountryCode":"*"},"TTL":60,"ResourceRecords":[{"Value":"record4-ap-west"}],"HealthCheckId":"ID"}}]}
>>>>
 NR # 3 | NF = 19 :: 
>>>>
{
    "Comment":"json data",
    "Changes":[
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record4.",
                "Type":"CNAME",
                "SetIdentifier":"record4",
                "GeoLocation":{
                    "CountryCode":"*"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record4-ap-west"
                    }
                ],
                "HealthCheckId":"ID"
}}]}
>>>>
 NR # 4 | NF = 19 :: 
>>>>
{
    "Comment":"json data",
    "Changes":[
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record4.",
                "Type":"CNAME",
                "SetIdentifier":"record4",
                "GeoLocation":{
                    "CountryCode":"*"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record4-ap-west"
                    }
                ],
                "HealthCheckId":"ID"
}}]}
>>>>
 NR # 5 | NF = 19 :: 
>>>>
{
    "Comment":"json data",
    "Changes":[
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record4.",
                "Type":"CNAME",
                "SetIdentifier":"record4",
                "GeoLocation":{
                    "CountryCode":"*"
                },
                "TTL":60,
                "ResourceRecords":[
                    {
                        "Value":"record4-ap-west"
                    }
                ],
                "HealthCheckId":"ID"
}}]}
>>>>
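The compact `RS` above expands to the regex `[[:space:]]*[}][[:space:]]*[}][[:space:]]*[]][[:space:]]*[}]\n?`, that is, the closing `}}]}` of each document with any whitespace between the brackets. Below is a more readable sketch of the same record-separator idea on a trimmed, hypothetical version of the question's record. It uses `[ \t\r\n]` instead of `[[:space:]]` so that older mawk builds without POSIX character classes should also accept it, and, like the original, it assumes an awk that treats a multi-character `RS` as a regex (gawk, mawk, recent BWK awk), which POSIX does not require.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Two trimmed copies of the question's document: one compact, one pretty-printed.
input='{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4."}}]}
{
    "Comment":"json data",
    "Changes":[
        {
            "Action":"DELETE",
            "ResourceRecordSet":{
                "Name":"record4."
            }
        }
    ]
}'

out=$(printf '%s\n' "$input" | awk '
  BEGIN {
    ws = "[ \t\r\n]*"
    # Record separator: the final "}}]}" of each document, optional whitespace
    # between the brackets, and an optional newline after it.
    RS = ws "[}]" ws "[}]" ws "[]]" ws "[}]\n?"
  }
  # awk strips the separator from each record, so add the "}}]}" back on output.
  { printf "--- record %d\n%s}}]}\n", NR, $0 }
')
printf '%s\n' "$out"
```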

Once you can isolate the individual "Changes" records, writing out a file every 830 records should be relatively straightforward.
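A minimal sketch of that grouping step, assuming the records have already been isolated as one compact `Changes` element per line (for example with `jq -c '.Changes[]' input.json`). `group_changes`, the output naming, and the hard-coded `"Comment":"json data"` wrapper are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read one compact "Changes" element per line on stdin and
# write them in groups of at most N to PREFIX_1.json, PREFIX_2.json, ..., each
# wrapped back into {"Comment":"json data","Changes":[...]}.
group_changes() {
  awk -v n="$1" -v p="$2" '
    function flush() {
      if (buf == "") return
      f = p "_" (++k) ".json"
      printf "{\"Comment\":\"json data\",\"Changes\":[%s]}\n", buf > f
      close(f)
      buf = ""
    }
    # Append the current element to the pending batch, comma-separated.
    { buf = (buf == "" ? $0 : buf "," $0) }
    # Every n-th element closes a batch; END writes the final partial one.
    NR % n == 0 { flush() }
    END { flush() }
  '
}

# Demo: three stand-in elements, grouped two per file.
workdir=$(mktemp -d)
printf '%s\n' '{"Action":"DELETE","n":1}' '{"Action":"DELETE","n":2}' '{"Action":"DELETE","n":3}' |
  group_changes 2 "$workdir/output"
cat "$workdir"/output_*.json
```

For the question's data this would be along the lines of `jq -c '.Changes[]' input.json | group_changes 830 output`, which for 20,000 elements yields 25 files: 24 with 830 elements and one with 80.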
You can pipe the output of the awk command above further downstream to confirm that it is valid JSON:

... | awk '/^[{]/,/[}][}][]][}]$/' | jq .
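When the end result is a set of files rather than one stream, a per-file check may be more convenient. This sketch assumes jq is available; `check_batches` is a hypothetical helper that reports any file that fails to parse or whose `Changes` array holds more than the given maximum (830 in the question). `jq -e` sets its exit status from the last value produced (non-zero for `false` or `null`), and a parse error also makes jq exit non-zero.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: check_batches MAX FILE...
# Prints "bad: FILE" for every file that is not valid JSON or whose "Changes"
# array has more than MAX elements; returns 1 if any file was reported.
check_batches() {
  local max=$1 f status=0
  shift
  for f in "$@"; do
    if ! jq -e --argjson max "$max" '(.Changes | length) <= $max' "$f" > /dev/null 2>&1; then
      echo "bad: $f"
      status=1
    fi
  done
  return "$status"
}

# Demo: one good file, one with too many elements, one truncated.
workdir=$(mktemp -d)
printf '%s\n' '{"Changes":[1,2]}' > "$workdir/ok.json"
printf '%s\n' '{"Changes":[1,2,3]}' > "$workdir/too_big.json"
printf '%s\n' '{"Changes":[1,' > "$workdir/truncated.json"

# "|| true" keeps set -e from aborting when the check reports problems.
report=$(check_batches 2 "$workdir"/*.json || true)
printf '%s\n' "$report"
```

On the real output this would be `check_batches 830 output_*.json`.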

As long as the input structure is well defined, awk can handle JSON perfectly well without a dedicated parser.
