How to extract data from a nested JSON file and create a DataFrame in Python

Posted by zzzyeukh on 2023-05-23 in Python


I have this nested JSON data:

{
  "result": [
    {
    "deviceid": 33,
    "devicename": "server101",
    "objectName": "CPU",
    "data": [
    {
      "value":0.59,
      "rvalue":null
    },
    {
      "value":90,
      "rvalue":null
    },
    {
      "value": 85,
      "rvalue":null
    }
          ]
  },
  {
  "deviceid": 30,
    "devicename": "server10",
    "objectName": "CPU",
    "data": [
    {
      "value":0.30,
      "rvalue":null
    },
    {
      "value":60,
      "rvalue":null
    },
    {
      "value": 79,
      "rvalue":null
    }
    ]
  },
  {
  "deviceid": 0,
    "devicename": "server300",
    "objectName": "CPU",
    "data": [
    {
      "value":0.10,
      "rvalue":null
    },
    {
      "value":0.20,
      "rvalue":null
    },
    {
      "value": 0.25,
      "rvalue:":null
    }]
  }
  ],
  "timeRanges": [
    {
      "name":"1st Month",
      "startTime":1680000000,
      "endTime": 1689000000
    },
    {
      "name":"2nd Month",
      "startTime": 1680000000,
      "endTime": 1689000000
    },
    {
      "name":"3rd Month",
      "startTime": 1680000000,
      "endTime": 1689000000 
    }
  ]
}

I need to extract the data from this JSON and load it into a DataFrame.
The output should look like this:

deviceid deviceName objectName 1stMonth 2ndMonth 3rdMonth. startTime. endTime
33      server101  CPU      0.59     90      85       1680000000  1689000000
30      server10   CPU      0.30     60      79      1680000000  1689000000
0       server300  CPU      0.10     0.20    0.25     1680000000  1689000000

I'm new to this and would appreciate any guidance.


hgb9j2n61#

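The code below appears to produce the expected result. It replaces each null with the string "null" so that the payload is a valid Python literal. However, as mentioned in the comments below, Python can handle null directly if you parse the text with json.loads().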

import pandas as pd

adict =    {
  "result": [
    {
    "deviceid": 33,
    "devicename": "server101",
    "objectName": "CPU",
    "data": [
    {
      "value":0.59,
      "rvalue":"null"
    },
    {
      "value":90,
      "rvalue":"null"
    },
    {
      "value": 85,
      "rvalue":"null"
    }
          ]
  },
  {
  "deviceid": 30,
    "devicename": "server10",
    "objectName": "CPU",
    "data": [
    {
      "value":0.30,
      "rvalue":"null"
    },
    {
      "value":60,
      "rvalue":"null"
    },
    {
      "value": 79,
      "rvalue":"null"
    }
    ]
  },
  {
  "deviceid": 0,
    "devicename": "server300",
    "objectName": "CPU",
    "data": [
    {
      "value":0.10,
      "rvalue":"null"
    },
    {
      "value":0.20,
      "rvalue":"null"
    },
    {
      "value": 0.25,
      "rvalue:":"null"
    }]
  }
  ],
  "timeRanges": [
    {
      "name":"1st Month",
      "startTime":1680000000,
      "endTime": 1689000000
    },
    {
      "name":"2nd Month",
      "startTime": 1680000000,
      "endTime": 1689000000
    },
    {
      "name":"3rd Month",
      "startTime": 1680000000,
      "endTime": 1689000000 
    }
  ]
}

rows = []
# Build one row per device: its identifiers, its three readings, then a time range.
for ind in range(len(adict["result"])):
    device = adict["result"][ind]
    # Note: timeRanges is indexed by device position. That only works here because
    # there are as many time ranges as devices and every range has the same times.
    row = [device["deviceid"], device["devicename"], device["objectName"],
           device["data"][0]["value"], device["data"][1]["value"],
           device["data"][2]["value"],
           adict["timeRanges"][ind]["startTime"], adict["timeRanges"][ind]["endTime"]
          ]
    rows.append(row)

column_names = ["deviceid", "deviceName", "objectName", "1stMonth",
                "2ndMonth", "3rdMonth", "startTime", "endTime"]
df = pd.DataFrame(data=rows, columns=column_names)
print(df)

Output:

   deviceid deviceName objectName  1stMonth  2ndMonth  3rdMonth   startTime     endTime
0        33  server101        CPU      0.59      90.0     85.00  1680000000  1689000000
1        30   server10        CPU      0.30      60.0     79.00  1680000000  1689000000
2         0  server300        CPU      0.10       0.2      0.25  1680000000  1689000000
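
As the note about json.loads() suggests, the payload does not need to be edited by hand: parsing the raw text turns each JSON null into Python's None. The following is a minimal sketch of that idea, not the answerer's code. It uses a shortened copy of the question's data, and the names `raw`, `parsed`, and `month_names` are illustrative. It also assumes every time range shares the same start and end times, as in the question.

```python
import json

import pandas as pd

# A shortened copy of the question's payload, kept as raw JSON text.
raw = """
{
  "result": [
    {"deviceid": 33, "devicename": "server101", "objectName": "CPU",
     "data": [{"value": 0.59, "rvalue": null},
              {"value": 90, "rvalue": null},
              {"value": 85, "rvalue": null}]}
  ],
  "timeRanges": [
    {"name": "1st Month", "startTime": 1680000000, "endTime": 1689000000},
    {"name": "2nd Month", "startTime": 1680000000, "endTime": 1689000000},
    {"name": "3rd Month", "startTime": 1680000000, "endTime": 1689000000}
  ]
}
"""

parsed = json.loads(raw)  # JSON null becomes Python None

# Derive the month column names from timeRanges instead of hardcoding them.
month_names = [tr["name"].replace(" ", "") for tr in parsed["timeRanges"]]

rows = []
for device in parsed["result"]:
    row = {
        "deviceid": device["deviceid"],
        "deviceName": device["devicename"],
        "objectName": device["objectName"],
    }
    # Pair each reading with the time range at the same position.
    for name, item in zip(month_names, device["data"]):
        row[name] = item["value"]
    # Assumption: all time ranges share one start and end, so the first is used.
    row["startTime"] = parsed["timeRanges"][0]["startTime"]
    row["endTime"] = parsed["timeRanges"][0]["endTime"]
    rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

Building each row as a dict keeps the column order explicit and avoids repeating the long `adict["result"][ind]` lookups.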

I've attached the actual JSON file here:


oymdgrw72#

After fixing a few typos in the JSON file, you can use:

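import json

import pandas as pd

with open("file.json") as file:
    obj = json.load(file)

df = (
    # One row per reading, carrying the device fields along as metadata.
    pd.json_normalize(
        obj["result"], "data", ["deviceid", "devicename", "objectName"])
    # Repeat timeRanges once per device so row i lines up with reading i.
    .join(pd.concat([pd.DataFrame(obj["timeRanges"])] * len(obj["result"]),
                    ignore_index=True))
    # Spread the month names into columns.
    .pivot(
        index=["deviceid", "devicename", "objectName", "startTime", "endTime"],
        columns="name", values="value")
    .reset_index()
    .rename_axis(None, axis=1)
)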

Output:

print(df.to_string(index=False))

 deviceid devicename objectName  startTime    endTime  1st Month  2nd Month  3rd Month
        0  server300        CPU 1680000000 1689000000       0.10       0.20       0.25
       30   server10        CPU 1680000000 1689000000       0.30      60.00      79.00
       33  server101        CPU 1680000000 1689000000       0.59      90.00      85.00
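
One typo in the question's JSON is the key "rvalue:" (with a stray colon) in the last data entry. If editing the file by hand is not practical, the keys can be cleaned after loading. The following is only a sketch under that assumption: `clean_keys` is a hypothetical helper, not part of pandas or the json module, and it only handles stray colons and surrounding whitespace.

```python
import pandas as pd

# The last "data" list from the question, including the misspelled "rvalue:" key.
data = [
    {"value": 0.10, "rvalue": None},
    {"value": 0.20, "rvalue": None},
    {"value": 0.25, "rvalue:": None},
]


def clean_keys(record):
    """Return a copy of record with whitespace and trailing colons stripped from keys."""
    return {key.strip().rstrip(":"): value for key, value in record.items()}


cleaned = [clean_keys(item) for item in data]

# Without cleaning, the misspelled key becomes its own column.
print(pd.json_normalize(data).columns.tolist())
# After cleaning, only the intended columns remain.
print(pd.json_normalize(cleaned).columns.tolist())
```

Since only the "value" column is pivoted, the stray key does not change the final numbers, but cleaning it keeps the intermediate frame free of an unexpected extra column.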

tf7tbtn23#

I feel like I'm missing something because of your JSON and/or its structure... but this produces the result you asked for.

import pandas as pd

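# CLEAN THE PAYLOAD: lets a pasted JSON literal containing bare null evaluate in Python
null = None

# WHAT YOU PROVIDED (placeholder: paste the nested JSON from the question here)
json = <your nested json>

df = pd.DataFrame(json)
# One row per reading, with the device fields repeated alongside it.
results = pd.json_normalize(df["result"], "data", ["deviceid", "devicename", "objectName"])
# Time ranges without their names; only startTime and endTime remain.
times = pd.json_normalize(df["timeRanges"]).drop(columns=['name'])
# Collect each device's readings into a list, in their original order.
trans = results.pivot_table(index=["deviceid", "devicename", "objectName"],values='value',aggfunc=list).squeeze()
# Expand each list into one column per month.
data = pd.DataFrame(trans.tolist(),index=trans.index,columns=['1stMonth','2ndMonth','3rdMonth.']).reset_index()
# Attach the time columns by row position.
final = pd.concat([data, times], axis=1)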
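
The approach above names the month columns by hand and relies on the order of the readings. A variation, sketched here under the assumption that the n-th reading of each device belongs to the n-th entry of timeRanges, takes the column names from timeRanges instead. The names `payload`, `names`, `long_df`, and `wide` are illustrative, and the payload is a shortened copy of the question's data.

```python
import pandas as pd

# Shortened copy of the question's payload (two devices).
payload = {
    "result": [
        {"deviceid": 33, "devicename": "server101", "objectName": "CPU",
         "data": [{"value": 0.59, "rvalue": None},
                  {"value": 90, "rvalue": None},
                  {"value": 85, "rvalue": None}]},
        {"deviceid": 30, "devicename": "server10", "objectName": "CPU",
         "data": [{"value": 0.30, "rvalue": None},
                  {"value": 60, "rvalue": None},
                  {"value": 79, "rvalue": None}]},
    ],
    "timeRanges": [
        {"name": "1st Month", "startTime": 1680000000, "endTime": 1689000000},
        {"name": "2nd Month", "startTime": 1680000000, "endTime": 1689000000},
        {"name": "3rd Month", "startTime": 1680000000, "endTime": 1689000000},
    ],
}

names = [tr["name"] for tr in payload["timeRanges"]]

# One row per reading, with the device fields repeated alongside it.
long_df = pd.json_normalize(
    payload["result"], "data", ["deviceid", "devicename", "objectName"]
)

# Position of each reading within its device (0, 1, 2), mapped to a range name.
long_df["pos"] = long_df.groupby("deviceid").cumcount()
long_df["name"] = long_df["pos"].map(dict(enumerate(names)))

# Spread the range names into columns, one row per device.
wide = (
    long_df.pivot(
        index=["deviceid", "devicename", "objectName"],
        columns="name",
        values="value",
    )
    .reset_index()
    .rename_axis(None, axis=1)
)
print(wide)
```

Passing a list to `index` in `pivot` requires a reasonably recent pandas release (1.1 or later). If a device has more readings than there are time ranges, the extra readings get a missing name, so it is worth checking the lengths first.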
