Unable to scrape certain values from a JSON file in Python

kulphzqa  posted on 2023-06-25 in Python

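I want to scrape data from a JSON file, but I can't get the availability value (the "available" key in the JSON). The other values are scraped successfully.
The availability column comes out blank.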

varavailability = "" if i >= len(variants) else variants[i].get('available', '')
import asyncio
import os
import random
import time
import openpyxl
import aiohttp
from urllib import request

# path="C:/Users/pengoul/Downloads/dl" 
path = os.getcwd()
print(f"CWD is {path}")
path = os.path.join(path, "download")
if not os.path.exists(path):
        os.makedirs(path)

# picpath= os.makedirs('picture')
async def request():
    async with aiohttp.ClientSession() as session:
        async with session.get(url='https://hiutdenim.co.uk/products.json?limit=500') as resp:
            html = await resp.json()
            k = list()
            f = openpyxl.Workbook()
            sheet = f.active
            sheet.append(['Name', 'Barcode', 'Product Category', 'Image', 'Internal Reference', 'Sales Price','Product Tags'])

            products = []

            print("Saving to excel ...")
            for i in html['products']:
                title = i.get('title')
                id1 = i.get('id')
                product_type = i.get('product_type')
                images = [img.get('src', '') for img in i.get('images', [])]
                products.append((title, id1, product_type, images))
                variants = [var for var in i.get('variants')]
                for i in range(max(len(images), len(variants))):
                    imgsrc = "" if i >= len(images) else images[i]
                    varsku = "" if i >= len(variants) else variants[i].get('sku', '')
                    varprice = "" if i >= len(variants) else variants[i].get('price', '')
                    varavailability= "" if i >= len(variants) else variants[i].get('available', '')
                    sheet.append([title, "'" + str(id1), product_type, imgsrc, varsku, varprice, varavailability])
                f.save(f"result230102.xlsx")

            print("Downloading images ...")
            for product in products:
                title, id1, product_type, images = product
                for seq, imgurl in enumerate(images):
                    print(f"Downloading img for {id1} ({seq + 1}/{len(images)})")
                    request.urlretrieve(imgurl, os.path.join(path, f"{id1}-{seq + 1}.jpg"))

async def download(url):
    image = url[0]
    file_name = f'{url[1]}.jpg'
    print(f'picpath/{file_name}')
    async with aiohttp.ClientSession() as session:
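        # asyncio.sleep yields to the event loop; time.sleep would block it
        await asyncio.sleep(random.random())
        async with session.get(image) as resp:
            # os.path.join inserts the path separator that path + file_name omitted
            with open(os.path.join(path, file_name), mode='wb') as f: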
                f.write(await resp.content.read())

#     print(f'picpath/{file_name}')

async def main():
    if not os.path.exists(path):
        os.mkdir(path)
    tasks = []
    await request()
    # for url in urls:
    #     tasks.append(asyncio.create_task(download(url)))
    # await asyncio.wait(tasks)

if __name__ == '__main__':
    print(os.getpid())
    t1 = time.time()
    urls = []
    loop = asyncio.get_event_loop()  
    loop.run_until_complete(main())  
    t2 = time.time()
    print('total:', t2 - t1)

The column shows up blank.
I want to extract the value of "available" from the JSON.

1u4esq0p  #1

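I ran your code in a debugger with a breakpoint on the line in question. The breakpoint is hit many times during execution. In some cases the line produces a True value for varavailability, as you expect.
The line also ends up executing when the value of i is 1 and the length of variants is also 1. In that case, because of the condition if i >= len(variants), the variable varavailability is set to "". i is allowed to reach 1 because the length of images is 5 for that product. The loop for i in range(max(len(images), len(variants))): therefore iterates from i == 0 through i == 4, and for every value of i greater than 0, varavailability is set to "". I'm not sure whether this is the case you are asking about, but it would explain the blank cells.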
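The behaviour described above can be reproduced without the network. This is a minimal sketch with made-up data: the URLs, the SKU, and the helper name build_rows are hypothetical, and the only assumption taken from the debugging session is a product with five images and one variant.

```python
# Hypothetical sample data: one product with 5 images and 1 variant.
images = [f"https://example.com/img{n}.jpg" for n in range(5)]
variants = [{"sku": "SKU-1", "price": "10.00", "available": True}]


def build_rows(images, variants):
    """Mimic the loop from the question; return (imgsrc, varavailability) pairs."""
    rows = []
    for i in range(max(len(images), len(variants))):
        imgsrc = "" if i >= len(images) else images[i]
        # Once i runs past the end of variants, the fallback "" is used.
        varavailability = "" if i >= len(variants) else variants[i].get("available", "")
        rows.append((imgsrc, varavailability))
    return rows


rows = build_rows(images, variants)
for row in rows:
    print(row)
```

Only the first row carries the availability value; the remaining four rows, which exist only because there are more images than variants, get an empty string.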
Update:
As for how to fix it, the issue comes down to how the contents of variants and images relate to each other, and what you do in this loop:

for i in range(max(len(images), len(variants))):
    imgsrc = "" if i >= len(images) else images[i]
    varsku = "" if i >= len(variants) else variants[i].get('sku', '')
    varprice = "" if i >= len(variants) else variants[i].get('price', '')
    varavailability= "" if i >= len(variants) else variants[i].get('available', '')
    sheet.append([title, "'" + str(id1), product_type, imgsrc, varsku, varprice, varavailability])

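The code appears to iterate over a list of products, each of which has two associated lists: a list of images and a list of variants. My guess is that the contents of these two lists are independent, that is, a given entry in images does not correspond to a particular entry in variants.
If what you want is a table of product variants, one possible solution is to associate all of a product's images with each of its variants and then simply iterate over the variants. Something like this: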

imgsrc = " ".join(images)  # all image URLs for this product, space-separated
for variant in variants:
    # read from the loop variable (a dict), not from the variants list
    varsku = variant.get('sku', '')
    varprice = variant.get('price', '')
    varavailability = variant.get('available', '')
    sheet.append([title, "'" + str(id1), product_type, imgsrc, varsku, varprice, varavailability])
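To check the one-row-per-variant idea in isolation, here is a self-contained sketch that builds the rows as plain lists. The function name variant_rows and the sample_product dictionary are hypothetical; the key names (title, id, product_type, images, src, variants, sku, price, available) are the ones the question's code already reads.

```python
def variant_rows(product):
    """Return one row per variant, each carrying all of the product's image URLs."""
    title = product.get("title")
    id1 = product.get("id")
    product_type = product.get("product_type")
    images = [img.get("src", "") for img in product.get("images", [])]
    imgsrc = " ".join(images)
    rows = []
    for variant in product.get("variants", []):
        rows.append([
            title,
            "'" + str(id1),
            product_type,
            imgsrc,
            variant.get("sku", ""),
            variant.get("price", ""),
            variant.get("available", ""),
        ])
    return rows


# Hypothetical product shaped like the entries the question iterates over.
sample_product = {
    "title": "Sample Jeans",
    "id": 123,
    "product_type": "Jeans",
    "images": [{"src": "https://example.com/a.jpg"}, {"src": "https://example.com/b.jpg"}],
    "variants": [
        {"sku": "SJ-30", "price": "150.00", "available": True},
        {"sku": "SJ-32", "price": "150.00", "available": False},
    ],
}

result = variant_rows(sample_product)
for row in result:
    print(row)
```

Each returned list could then be passed to sheet.append in place of the rows built in the original loop, so every row has an availability value taken from its own variant.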
