Pandas Dataframe 中深嵌套Json数据的规范化

6fe3ivhb  于 2023-02-20  发布在  其他
关注(0)|答案(2)|浏览(139)

我刚接触Pandas,很难从一系列化妆品中提取颜色变化的数据。
我的目标是建立一个数据框架,所有的颜色变化,为每个产品在自己的名单。
大致如下:
| 姓名|类型|网址|价格|说明|图像|阴影|六角|
| - ------|- ------|- ------|- ------|- ------|- ------|- ------|- ------|
| 产品1|口红|...|二十七时|...|[.,.,.]|[.,.,.]|[.,.]|
I am trying to flatten this data, but I keep receiving key errors.
这是最初的请求。

import requests
import pandas as pd

headers = {
    'authority': 'ncsa.sdapi.io',
    'accept': 'application/json',
    'accept-language': 'en-US,en;q=0.9',
    'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
    'business-unit': '2-mc-us-en-ecommv1',
    'cache-control': 'no-cache',
    'clientid': 'stardust-fe-client',
    'content-type': 'application/json',
    'origin': 'https://m.maccosmetics.com',
    'referer': 'https://m.maccosmetics.com/',
    'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
    'sec-ch-ua-mobile': '?1',
    'sec-ch-ua-platform': '"Android"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}

json_data = {
    'query': '{\n                products(environment: {prod:true},\n                    filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n                    sort: [{tags:{product_display_order:ASCENDING}}]\n                ) {\n                    \n        ... product__collection \n        \n        items {\n            ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n            skus {\n                total\n                items {\n                    ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n                }\n            }\n        }\n    \n    \n                }\n            }\n\nfragment product__collection \n    on product_collection {\n        items {\n            product_id\n            skus {\n                items {\n                    inventory_status\n                    sku_id\n                }\n            }\n        }\n    }\n\n\nfragment product_default \n    on product {\n        default_category {\n            id\n            value\n        }\n        description\n        display_name\n        is_hazmat\n        meta {\n            description\n        }\n        product_badge\n        product_id\n        product_url\n        short_description\n        tags {\n            total\n            items {\n                id\n                value\n                key\n            }\n        }\n    }\n\n\nfragment product_productSkinType \n    on product {\n        skin {\n            type {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_form \n    on product {\n        form {\n            key\n            value\n        }\n    }\n\n\nfragment product_productCoverage \n    on product {\n        coverage {\n            key\n            value\n        }\n    }\n\n\nfragment product_benefit \n    on product {\n        benefit {\n            benefits {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_productReview \n    on product {\n        reviews {\n            average_rating\n            number_of_reviews\n        }\n    }\n\n\nfragment product_skinConcern \n    on product {\n        skin {\n            concern {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_usage \n    on product {\n        usage {\n            content\n            label\n            type\n        }\n    }\n\n\nfragment product_productFinish \n    on product {\n        finish {\n            key\n            value\n        }\n    }\n\n\nfragment product_usageOptions \n    on product {\n        usage_options {\n            key\n            value\n        }\n    }\n\n\nfragment product_brushTypes \n    on product {\n        brush {\n            types {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_brushShapes \n    on product {\n        brush {\n            shapes {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product__skus_default \n    on product__skus {\n        is_default_sku\n        is_discountable\n        is_giftwrap\n        is_under_weight_hazmat\n        iln_listing\n        iln_version_number\n        inventory_status\n        material_code\n        prices {\n            currency\n            is_discounted\n            include_tax {\n                price\n                original_price\n                price_per_unit\n                price_formatted\n                original_price_formatted\n                price_per_unit_formatted\n            }\n        }\n        sizes {\n            value\n            key\n        }\n        shades {\n            name\n            description\n            hex_val\n        }\n        sku_id\n        sku_badge\n        unit_size_formatted\n        upc\n    }\n\n\nfragment product__skus_autoReplenish \n    on product__skus {\n        is_replenishable\n    }\n\n\nfragment product__skus_colorFamily \n    on product__skus {\n        color_family {\n            key\n            value\n        }\n    }\n\n\nfragment product__skus_skuLargeImages \n    on product__skus {\n        media {\n            large {\n                src\n                alt\n                height\n                width\n            }\n        }\n    }\n\n\nfragment product__skus_skuMediumImages \n    on product__skus {\n        media {\n            medium {\n                src\n                alt\n                height\n                width\n            }\n        }\n    }\n\n\nfragment product__skus_skuSmallImages \n    on product__skus {\n        media {\n            small {\n                src\n                alt\n                height\n                width\n            }\n        }\n    }\n\n\nfragment product__skus_vtoFoundation \n    on product__skus {\n        vto {\n            is_foundation_experience\n        }\n    }\n\n\nfragment product__skus_vtoMakeup \n    on product__skus {\n        vto {\n            is_color_experience\n        }\n    }\n',
    'variables': {},
}

response = requests.post(
    'https://ncsa.sdapi.io/stardust-prodcat-product-v3/graphql/core/v1/extension/v1',
    headers=headers,
    json=json_data,
)

所有这些值都按预期返回

json_object = response.json()

result_items = json_object['data']['products']['items']

result_items[0]['skus']['items'][0]['prices'][0]['include_tax']['price_formatted']

result_items[0]['skus']['items'][0]['shades']

result_items[0]['skus']['items'][0]['media']['large'][0]['src']

result_items[0]['skus']['items'][0]['media']['large'][0]['alt']

result_items[0]['skus']['items'][0]['color_family'][0]['value']

我能够访问单个产品的颜色名称,如

shade_list = []
def get_shade_names():
    items = result_items[0]['skus']['items']
    for item in items:
        shades = item['shades']
        for shade_data in shades:
            shade = shade_data['name']
            shade_list.append(shade)

get_shade_names()
print(shade_list)

但是实现列表的列表的嵌套循环的几次尝试仅导致单个列表或一系列错误。
这是我从DataFrame转到json_normalize的时候,但是,当我尝试使用记录路径和 meta时,我一直收到键错误。
有人能告诉我怎么做吗?我试着脱离Pandas文档中的例子,但似乎什么都不起作用。任何帮助都将不胜感激。

yhived7q

yhived7q1#

创建多个 Dataframe 您的输出不是很清楚,但您可以使用类似于:

# Extract base data from top level records
main_cols = ['product_id', 'display_name', 'description']
main_df = pd.json_normalize(result_items)[main_cols]

# Extract sub dataset
shade_df = pd.json_normalize(result_items, ['skus', 'items', 'shades'], 'product_id', record_prefix='shade.')

# Merge base and other sub dataset
df = main_df.merge(shade_df, on='product_id')

输出:

>>> df
    product_id                        display_name                                        description            shade.name     shade.description shade.hex_val
0        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...        Marrakesh-mere  Intense orange brown       #b0594d
1        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...         Sheer Outrage       Grapefruit pink       #ca5a5a
2        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...         Dubonnet Buzz         Deep red wine       #c95c54
3        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...          Mull It Over           Dirty peach       #a45f51
4        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...             Rose Mary            Soft mauve       #b96161
..         ...                                 ...                                                ...                   ...                   ...           ...
329      19393                      Lipmix / Satin  Lipmix is to the makeup artist as tubes of pai...                 Satin                             #dedede
330       1625                           Lip Erase  M·A·C Pro Lip Erase is a professional product ...                  Pale                   N27       #e3bd92
331      19392                      Lipmix / Gloss  Lipmix is to the makeup artist as tubes of pai...                 Gloss                             #dfdbcb
332      82134         Lipstick / Frosted Firework  A blast of five holiday-exclusive Lustre, Fros...  Once Bitten, Ice Shy  Sheer white w/ pearl       #eae8df
333      52596                     Lustre Lipstick  M·A·C Lipstick – the iconic product that made ...              Lady Bug         Yellow tomato       #b23532
qgelzfjb

qgelzfjb2#

使用meta和record_path的示例:

data = response.json()

df = pd.json_normalize(
    data=data["data"]["products"]["items"],
    meta="product_id",
    record_path=["skus", "items", "shades"]
)

选择产品:

shades = df.query("product_id.eq('99908')")["name"].to_list()
print(shades)

输出:

['Marrakesh-mere', 'Sheer Outrage', 'Dubonnet Buzz', 'Mull It Over', 'Rose Mary', 'Sweet Cinnamon', 'Devoted To Chili', 'Wild Rebel', 'Devoted To Danger', 'Love Clove', 'Ruby New', 'Gingerella', 'Stay Curious', 'Peppery Pink', 'All-Star Anise', 'Nice Spice', 'Spice World', 'Over The Taupe', 'Brickthrough', 'Nutmeg Ganache', 'Sorry Not Sorry', 'Pumpkin Spiced', 'Hot Paprika']

相关问题