Appending multiple JSON values to a single pandas column with Python

cyvaqqii  posted on 2023-06-20  in  Python

I am having trouble extracting values from JSON and storing them in a DataFrame. My JSON looks like this:

{
  "issues": [
    {
      "expand": "operations",
      "id": "1",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": "false"
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "2",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1002",
            "value": "Desktop",
            "id": "1002",
            "disabled": false
          },
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": false
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "3",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1003",
            "value": "ios",
            "id": "1002",
            "disabled": false
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "4",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1002",
            "value": "Desktop",
            "id": "1002",
            "disabled": false
          },
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": false
          },
          {
            "self": "https://url1003",
            "value": "ios",
            "id": "1003",
            "disabled": false
          }
        ]
      }
    }
  ]
}

Here is part of my code:

df2 = pd.DataFrame()
d = pd.json_normalize(json.loads(df1['customfield_100'].to_json(orient='records')))
filtered_component = []
for index in range(len(issues.id)):
    if pd.json_normalize(df1['customfield_100'][index]).size > 0:
        # appends a single value per issue
        filtered_component.append(d[0][index]['value'])
    else:
        filtered_component.append('No Component')
df2['Component'] = filtered_component

When I print df2['Component'], I get the following output:

'Mobile'
'Desktop'
'ios'
'Desktop'

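I would like my output to look like this (when I print df2['Component']): if customfield_100 has multiple values, I want them separated by ;. I am not sure how the loop/code should be written.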

'Mobile'
'Desktop';'Mobile'
'ios'
'Desktop';'Mobile';'ios'

ecbunoof  #1

Another possible solution:

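import pandas as pd

# one row per customfield_100 entry, tagged with the id of its issue
df = pd.json_normalize(
    data,
    record_path=['issues', 'fields', 'customfield_100'],
    meta=[['issues', 'id']])

# join the values of each issue with ';'
df.groupby('issues.id')['value'].agg(';'.join)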

where

data = {
  "issues": [
    {
      "expand": "operations",
      "id": "1",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": False
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "2",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1002",
            "value": "Desktop",
            "id": "1002",
            "disabled": False
          },
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": False
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "3",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1003",
            "value": "ios",
            "id": "1002",
            "disabled": False
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "4",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1002",
            "value": "Desktop",
            "id": "1002",
            "disabled": False
          },
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": False
          },
          {
            "self": "https://url1003",
            "value": "ios",
            "id": "1003",
            "disabled": False
          }
        ]
      }
    }
  ]
}

Output:

issues.id
1                Mobile
2        Desktop;Mobile
3                   ios
4    Desktop;Mobile;ios
Name: value, dtype: object
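
One caveat with the groupby approach: an issue whose customfield_100 list is empty produces no rows in the normalized frame, so it drops out of the grouped result. Below is a minimal sketch of one way to bring such issues back with the 'No Component' placeholder from the question. The two-issue sample and the names flat, joined, all_ids and component are illustrative assumptions, and the sketch assumes the field is always a list (never null).

```python
import pandas as pd

# Hypothetical sample: issue "2" has no customfield_100 entries.
data = {
    "issues": [
        {"id": "1", "fields": {"customfield_100": [{"value": "Desktop"},
                                                   {"value": "Mobile"}]}},
        {"id": "2", "fields": {"customfield_100": []}},
    ]
}

# One row per customfield_100 entry; issue "2" contributes no rows.
flat = pd.json_normalize(
    data,
    record_path=["issues", "fields", "customfield_100"],
    meta=[["issues", "id"]],
)

# Join the values of each issue with ';'.
joined = flat.groupby("issues.id")["value"].agg(";".join)

# Reindex on every issue id so issues without entries get the placeholder.
all_ids = [issue["id"] for issue in data["issues"]]
component = joined.reindex(all_ids, fill_value="No Component")
print(component)
```

Reindexing on the full list of ids also keeps the result in the same order as the issues in the input.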

stszievb  #2

If data contains the parsed JSON data, you can do the following:

import pandas as pd

all_data = []
for i in data['issues']:
    for k, v in i['fields'].items():
        for vv in v:
            # one row per entry, tagged with the issue id and the field name
            all_data.append({'main_id': i['id'], 'field_id': k, **vv})

df = pd.DataFrame(all_data)
print(df)

This prints:

  main_id         field_id             self    value    id disabled
0       1  customfield_100  https://url1001   Mobile  1001    false
1       2  customfield_100  https://url1002  Desktop  1002    False
2       2  customfield_100  https://url1001   Mobile  1001    False
3       3  customfield_100  https://url1003      ios  1002    False
4       4  customfield_100  https://url1002  Desktop  1002    False
5       4  customfield_100  https://url1001   Mobile  1001    False
6       4  customfield_100  https://url1003      ios  1003    False

You can then group by main_id, for example:

df = df.groupby('main_id')['value'].agg(';'.join)
print(df)

This prints:

main_id
1                Mobile
2        Desktop;Mobile
3                   ios
4    Desktop;Mobile;ios
Name: value, dtype: object
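
If the goal is the df2['Component'] column from the question, a plain-Python sketch that skips the intermediate long table could look like the following. The helper join_values and its parameters are hypothetical names. It assumes every entry carries a 'value' key, and it treats a missing, null, or empty field as 'No Component'.

```python
import pandas as pd

def join_values(issue, field="customfield_100", sep=";", default="No Component"):
    """Join the 'value' of every entry in one issue's field (hypothetical helper)."""
    entries = issue.get("fields", {}).get(field) or []  # missing/None -> []
    values = [entry["value"] for entry in entries]
    return sep.join(values) if values else default

# Trimmed-down sample in the shape of the question's JSON (illustrative only).
data = {
    "issues": [
        {"id": "1", "fields": {"customfield_100": [{"value": "Mobile"}]}},
        {"id": "2", "fields": {"customfield_100": [{"value": "Desktop"},
                                                   {"value": "Mobile"}]}},
        {"id": "3", "fields": {"customfield_100": None}},
    ]
}

# One row per issue, with all values of the field in a single column.
df2 = pd.DataFrame({
    "id": [issue["id"] for issue in data["issues"]],
    "Component": [join_values(issue) for issue in data["issues"]],
})
print(df2)
```

Because this builds one row per issue directly, issues without any entries stay in the frame and no groupby step is needed.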
