Python: cannot save all links to a dict in a JSON file, only the last one

Posted by xnifntxz on 2023-06-25 in Python


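I wrote this Python code to collect all the links and put them in a JSON file, but for some reason I only get the last link (see the code for the website and class). Any ideas why it is not working correctly?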

import requests
from bs4 import BeautifulSoup
import json

headers = {
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}

number = 0

for page_number in range(1, 2):
    url = f"https://www.sothebysrealty.com/eng/associates/int/{page_number}-pg"
    req = requests.get(url, headers=headers)
    src = req.text
    soup = BeautifulSoup(src, "lxml")
    name_link = soup.find_all("a", class_="Entities-card__cta btn u-text-uppercase u-color-sir-blue palm--hide")

    all_links_dict = {}
    for item in name_link:
        value_links = ("https://www.sothebysrealty.com" + item.get("href"))

    all_links_dict[number + 1] = value_links

    with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
        json.dump(all_links_dict, file, indent=4, ensure_ascii=False)


Source: https://stackoverflow.com/questions/76530648/python-can-not-save-all-links-to-a-dict-in-json-file-only-the-last-one

3 Answers


0x6upsns1#

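This happens because all_links_dict[number + 1] = value_links is not inside the for item in name_link loop, so you only add to the dict once, after the loop has finished, when value_links holds the last link.
You also have to increment number inside the loop.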

for item in name_link:
    value_links = ("https://www.sothebysrealty.com" + item.get("href"))
    all_links_dict[number] = value_links
    number += 1
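
As an alternative to managing the counter by hand, enumerate can supply the key. This is only a minimal sketch: the hrefs list is made-up sample data standing in for the item.get("href") values, not output from the real site.

```python
BASE_URL = "https://www.sothebysrealty.com"

# Hypothetical sample data standing in for item.get("href") values.
hrefs = ["/eng/associate/a", "/eng/associate/b", "/eng/associate/c"]

all_links_dict = {}
# enumerate supplies the running key, so no manual `number += 1` is needed.
for number, href in enumerate(hrefs, start=1):
    all_links_dict[number] = BASE_URL + href

print(all_links_dict)
```

Note that enumerate restarts at 1 for every new list, so when several pages are scraped the counter would still need to carry over between pages, for example by keeping number outside the page loop.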

2023-06-25

jslywgbw2#

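I noticed a few things.
First, your page range is range(1, 2). In Python, the stop value is not included in the range, so the for loop runs only once, with the page number equal to 1.
Second, the line all_links_dict = {} resets the dictionary to an empty dict on every iteration of the page loop.
Finally, you open the file in 'w' mode on every loop iteration and then dump the JSON, which overwrites whatever was there before.
I suggest adjusting the range, moving the dictionary initialization out of the for loop, and dumping the dictionary to the file once, after the for loop has finished.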
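
The restructuring suggested above could look roughly like the following. It is a sketch under assumptions, not the asker's final code: the helper collect_links and the pages sample data are hypothetical, the network request and HTML parsing are left out so the example runs offline, and the file is written to a temporary directory with plain utf-8 instead of the utf-8-sig used in the question.

```python
import json
import os
import tempfile

BASE_URL = "https://www.sothebysrealty.com"


def collect_links(pages):
    """Build one dict from every page's hrefs, numbering links from 1."""
    all_links_dict = {}  # created once, before the page loop
    number = 0
    for hrefs in pages:  # outer loop: one pass per page
        for href in hrefs:  # inner loop: one pass per link
            number += 1
            all_links_dict[number] = BASE_URL + href
    return all_links_dict


# Hypothetical sample data: two "pages" of hrefs.
pages = [["/a", "/b"], ["/c"]]
links = collect_links(pages)

# Dump once, after all pages have been processed.
out_path = os.path.join(tempfile.mkdtemp(), "all_links_dict.json")
with open(out_path, "w", encoding="utf-8") as file:
    json.dump(links, file, indent=4, ensure_ascii=False)

# Read the file back to confirm it is a single valid JSON document.
with open(out_path, encoding="utf-8") as file:
    reloaded = json.load(file)
print(reloaded)
```

One detail to keep in mind: JSON object keys are always strings, so the integer keys come back as "1", "2", "3" when the file is loaded again.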

2023-06-25

rsaldnfx3#

There are several problems:

all_links_dict = {}
for item in name_link:
    value_links = ("https://www.sothebysrealty.com" + item.get("href"))
all_links_dict[number + 1] = value_links

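You never update number, so each pass through the page loop stores only a single value under the key 1. Either derive the key from page_number on each iteration, or add a line that increments number and move the assignment into the inner loop.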

all_links_dict = {}
for item in name_link:
    value_links = ("https://www.sothebysrealty.com" + item.get("href"))
    number += 1
    all_links_dict[number] = value_links

with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
    json.dump(all_links_dict, file, indent=4, ensure_ascii=False)

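If you write on every iteration, you should open the file with mode="a" instead of "w" so that you append rather than overwrite. Be aware, though, that after the second iteration the file will no longer be valid JSON (that is, you will not be able to decode it). It is better to keep a list that you append to each time, and then write the list to JSON once, after the loop has finished.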
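
The difference between the two approaches can be checked with a small offline experiment. This is a sketch only: the file names and the sample link values are made up for illustration, and the files are written to a temporary directory.

```python
import json
import os
import tempfile

tmp_dir = tempfile.mkdtemp()

# Appending one JSON document per iteration: the file ends up holding
# two objects back to back, which is not a single valid JSON document.
appended_path = os.path.join(tmp_dir, "appended.json")
for page_links in ({"1": "/a"}, {"2": "/b"}):
    with open(appended_path, "a", encoding="utf-8") as file:
        json.dump(page_links, file)

try:
    with open(appended_path, encoding="utf-8") as file:
        json.load(file)
    append_is_valid = True
except json.JSONDecodeError:
    append_is_valid = False

# Collecting into a list and writing once keeps the file decodable.
all_links = []
for page_links in (["/a"], ["/b", "/c"]):
    all_links.extend(page_links)

list_path = os.path.join(tmp_dir, "all_links.json")
with open(list_path, "w", encoding="utf-8") as file:
    json.dump(all_links, file, indent=4)

with open(list_path, encoding="utf-8") as file:
    reloaded = json.load(file)

print(append_is_valid, reloaded)
```

The appended file fails to decode because json.load expects exactly one top-level value, while the list written in a single dump round-trips unchanged.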

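There is also the fact that for page_number in range(1, 2): results in only one iteration (with page_number equal to 1), so even with all of the above fixed, only one page of information will be saved unless the range is widened to include more pages.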
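
The exclusive stop value is easy to verify directly. In this short sketch the choice of three pages is arbitrary; how many pages the site actually has is not stated in the question.

```python
# range(start, stop) yields start, start + 1, ..., stop - 1.
single_page = list(range(1, 2))

# To visit pages 1 through 3, the stop value must be 4.
three_pages = list(range(1, 4))

# Page URLs built the same way as in the question.
urls = [
    f"https://www.sothebysrealty.com/eng/associates/int/{page_number}-pg"
    for page_number in three_pages
]
print(single_page, three_pages)
```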

2023-06-25
