Pandas record_path for multiple arrays in JSON data

ccgok5k5  posted on 2023-03-06  in  Other

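How can I read a nested JSON file into a pandas DataFrame? I tried pd.json_normalize with the record_path= option, but could not get auto_scan_date. I would like to understand how the path works and get some help with this case.
Here is the sample JSON content: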

[{
  "owner": "admin@123",
  "ipadd": "10.10.10.10",
  "servername": "demoserver1.admin.com",
  "Status": "live",
  "config": [
    {
      "ipadd": "10.10.10.10",
      "scan": {
        "last_scan_date": "2000-10-10 23:53",
        "auto_scan": [
          {
            "auto_scan_date": "2000-10-11 23:53"
          }
        ],
        "scan_status": "Enable",
        "enable_datetime": "2000-09-20 23:53",
        "scanned_by": "serveradmin"
      },
      "repo": "main"
    }
  ],
  "repo": "main"
},{
  "owner": "admin@123",
  "ipadd": "10.10.10.11",
  "servername": "demoserver2.admin.com",
  "Status": "live",
  "config": [
    {
      "ipadd": "10.10.10.10",
      "scan": {
        "last_scan_date": "2000-10-10 23:53",
        "auto_scan": [
          {
            "auto_scan_date": "2000-10-11 23:53"
          }
        ],
        "scan_status": "Enable",
        "enable_datetime": "2000-09-20 23:53",
        "scanned_by": "serveradmin"
      },
      "repo": "main"
    }
  ],
  "repo": "main"
}]

I am looking for the output below:

servername| ipadd| last_scan_date| auto_scan_date| scan_status| enable_datetime
--------------------------------------------------------------------------------
demoserver1.admin.com|10.10.10.10|2000-10-10 23:53|2000-10-11 23:53| Enable|2000-09-20 23:53

kg7wmglp  #1

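A simple approach is to flatten the JSON with flatten_json and pass the flattened data to pandas. If you uncomment the print(df) line in the snippet, you will see every column named by its full record path. Rename the columns you care about, then select them to get the DataFrame you want.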

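import pandas as pd
from flatten_json import flatten

# json_data is the list of dicts parsed from the JSON in the question
flattened_data = [flatten(d, '.') for d in json_data]
df = pd.DataFrame(flattened_data)
# print(df)  # uncomment to see every flattened column name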
df = df.rename(
    columns={
        'config.0.scan.last_scan_date': 'last_scan_date',
        'config.0.scan.auto_scan.0.auto_scan_date': 'auto_scan_date',
        'config.0.scan.scan_status': 'scan_status',
        'config.0.scan.enable_datetime': 'enable_datetime'
    })

df = df[[
  'servername', 'ipadd', 'last_scan_date', 'auto_scan_date', 'scan_status', 'enable_datetime'
]]

print(df)
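
To see where column names such as config.0.scan.auto_scan.0.auto_scan_date come from, here is a minimal sketch of the flattening idea. The helper flatten_sketch is hypothetical and written only for illustration; it is not the flatten_json implementation, and it simply drops empty dicts and lists.

```python
def flatten_sketch(obj, sep=".", prefix=""):
    """Hypothetical helper: flatten nested dicts/lists into one flat dict.

    Dict keys and list indexes are joined with `sep` to form each new key.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Scalar leaf: the accumulated path is the column name.
        return {prefix: obj}

    flat = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, sep, path))
    return flat


# A trimmed-down record with the same nesting as the question's data.
record = {
    "servername": "demoserver1.admin.com",
    "config": [{"scan": {"auto_scan": [{"auto_scan_date": "2000-10-11 23:53"}]}}],
}

flat = flatten_sketch(record)
print(flat)
# {'servername': 'demoserver1.admin.com',
#  'config.0.scan.auto_scan.0.auto_scan_date': '2000-10-11 23:53'}
```

The list indexes (the 0 segments) are part of the key, which is why the rename mapping has to spell them out.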

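If you prefer json_normalize, you need to give it ordered lists of keys. json_normalize chains each list into a path to reach the value you want, and with the record_path and meta parameters the chained paths become the column names. You can then rename the columns and finally reorder the DataFrame. Both approaches produce the same output.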
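
As a small illustration of how chained paths turn into column names, here is a sketch on a trimmed-down record. The variable names sample, df_sketch and df_underscore are made up for this example, and the data is a shortened copy of the question's JSON; sep is the json_normalize parameter that controls the joining character.

```python
import pandas as pd

# Shortened copy of one record from the question (illustration only).
sample = [{
    "servername": "demoserver1.admin.com",
    "config": [{
        "scan": {
            "last_scan_date": "2000-10-10 23:53",
            "auto_scan": [{"auto_scan_date": "2000-10-11 23:53"}],
        },
    }],
}]

# Each meta entry is either a top-level key or an ordered list of keys.
df_sketch = pd.json_normalize(
    sample,
    record_path=["config", ["scan", "auto_scan"]],
    meta=["servername", ["config", "scan", "last_scan_date"]],
)
print(list(df_sketch.columns))
# ['auto_scan_date', 'servername', 'config.scan.last_scan_date']

# The same call with sep="_" joins the meta path with underscores instead.
df_underscore = pd.json_normalize(
    sample,
    record_path=["config", ["scan", "auto_scan"]],
    meta=["servername", ["config", "scan", "last_scan_date"]],
    sep="_",
)
print(list(df_underscore.columns))
# ['auto_scan_date', 'servername', 'config_scan_last_scan_date']
```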


j9per5c4  #2

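The record path should point to the list you want to flatten, which here is auto_scan. Each entry in record_path steps into one list level of the JSON data; when the next list sits behind plain dict keys (here scan), group those keys in a nested list: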

df = pd.json_normalize(data, record_path=['config', ['scan', 'auto_scan']])

Output:

     auto_scan_date
0  2000-10-11 23:53
1  2000-10-11 23:53
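
To make the "list to flatten" idea concrete, here is a sketch in which auto_scan holds two entries. The second date is made up for illustration and does not appear in the question's data; the names sample and rows are likewise hypothetical.

```python
import pandas as pd

sample = [{
    "servername": "demoserver1.admin.com",
    "config": [{
        "scan": {
            "auto_scan": [
                {"auto_scan_date": "2000-10-11 23:53"},
                {"auto_scan_date": "2000-10-12 23:53"},  # made-up second entry
            ],
        },
    }],
}]

# Every element of the list reached by record_path becomes one row.
rows = pd.json_normalize(sample, record_path=["config", ["scan", "auto_scan"]])
print(len(rows))                       # 2
print(rows["auto_scan_date"].tolist())
# ['2000-10-11 23:53', '2000-10-12 23:53']
```

So the number of rows follows the total number of auto_scan entries, not the number of top-level records.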

Next, you have to supply the meta parameter to pull in all the other columns you need:

df = pd.json_normalize(data, record_path=['config', ['scan', 'auto_scan']], meta=['servername', 'ipadd', ['config', 'scan', 'last_scan_date'], ['config', 'scan', 'scan_status'], ['config', 'scan', 'enable_datetime']])

df.T

Output:

                                                 0                      1
auto_scan_date                    2000-10-11 23:53       2000-10-11 23:53
servername                   demoserver1.admin.com  demoserver2.admin.com
ipadd                                  10.10.10.10            10.10.10.11
config.scan.last_scan_date        2000-10-10 23:53       2000-10-10 23:53
config.scan.scan_status                     Enable                 Enable
config.scan.enable_datetime       2000-09-20 23:53       2000-09-20 23:53

To get exactly the result you asked for, finish by renaming and reordering the columns:

df = df.rename({'config.scan.last_scan_date': 'last_scan_date', 'config.scan.scan_status': 'scan_status', 'config.scan.enable_datetime': 'enable_datetime'}, axis=1)
df = df[['servername', 'ipadd', 'last_scan_date', 'auto_scan_date', 'scan_status', 'enable_datetime']]

df.T

Output:

                                     0                      1
servername       demoserver1.admin.com  demoserver2.admin.com
ipadd                      10.10.10.10            10.10.10.11
last_scan_date        2000-10-10 23:53       2000-10-10 23:53
auto_scan_date        2000-10-11 23:53       2000-10-11 23:53
scan_status                     Enable                 Enable
enable_datetime       2000-09-20 23:53       2000-09-20 23:53

zf9nrax1  #3

Your data, as json_data:

json_data = [
    {
    "owner": "admin@123",
    "ipadd": "10.10.10.10",
    "servername": "demoserver1.admin.com",
    "Status": "live",
    "config": [
        {
            "ipadd": "10.10.10.10",
            "scan": {
                "last_scan_date": "2000-10-10 23:53",
                "auto_scan": [
                    {
                        "auto_scan_date": "2000-10-11 23:53"
                    }
                ],
                "scan_status": "Enable",
                "enable_datetime": "2000-09-20 23:53",
                "scanned_by": "serveradmin"
            },
            "repo": "main"
        }
    ],
    "repo": "main"
    },
    {
    "owner": "admin@123",
    "ipadd": "10.10.10.11",
    "servername": "demoserver2.admin.com",
    "Status": "live",
    "config": [
        {
            "ipadd": "10.10.10.10",
            "scan": {
                "last_scan_date": "2000-10-10 23:53",
                "auto_scan": [
                    {
                        "auto_scan_date": "2000-10-11 23:53"
                    }
                ],
                "scan_status": "Enable",
                "enable_datetime": "2000-09-20 23:53",
                "scanned_by": "serveradmin"
            },
            "repo": "main"
        }
    ],
    "repo": "main"
    }
]

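You can try something like this:

import pandas as pd

# Index straight into the nested structure; this assumes every record has
# at least one config entry and one auto_scan entry.
df = pd.DataFrame(
        {"servername": [data['servername'] for data in json_data],
         # Note: this reads ipadd from config[0], not the top-level ipadd,
         # so both rows show 10.10.10.10.
         "ipadd": [data['config'][0]['ipadd'] for data in json_data],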
         "last_scan_date": [data['config'][0]['scan']['last_scan_date'] for data in json_data],
         "auto_scan_date": [data['config'][0]['scan']['auto_scan'][0]['auto_scan_date'] for data in json_data],
         "scan_status": [data['config'][0]['scan']['scan_status'] for data in json_data],
         "enable_datetime": [data['config'][0]['scan']['enable_datetime'] for data in json_data]
        })
print(df.head())

              servername        ipadd    last_scan_date    auto_scan_date scan_status   enable_datetime
0  demoserver1.admin.com  10.10.10.10  2000-10-10 23:53  2000-10-11 23:53      Enable  2000-09-20 23:53
1  demoserver2.admin.com  10.10.10.10  2000-10-10 23:53  2000-10-11 23:53      Enable  2000-09-20 23:53
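
Direct indexing like the above raises an IndexError or KeyError as soon as one record lacks a config or auto_scan entry. A minimal sketch of a more forgiving variant follows; the helper dig and the second record (with an empty config list) are hypothetical and added only for illustration.

```python
import pandas as pd


def dig(obj, *path, default=None):
    """Hypothetical helper: follow keys/indexes in `path` one step at a time.

    Returns `default` as soon as any step is missing.
    """
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


records = [
    {"servername": "demoserver1.admin.com",
     "config": [{"scan": {"auto_scan": [{"auto_scan_date": "2000-10-11 23:53"}]}}]},
    {"servername": "demoserver2.admin.com", "config": []},  # made-up: no config entry
]

df_safe = pd.DataFrame({
    "servername": [dig(r, "servername") for r in records],
    "auto_scan_date": [
        dig(r, "config", 0, "scan", "auto_scan", 0, "auto_scan_date") for r in records
    ],
})
print(df_safe)
```

The second row ends up with a missing value for auto_scan_date instead of the whole construction failing.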
