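I want to scrape part of the Jumia website: the results of the search "pc". I want to go through all of the result pages, but unfortunately it doesn't work. I don't know why the file gets overwritten even though the write happens outside the loop. I also tried using: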
with pd.ExcelWriter("output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    pop.to_excel(writer, sheet_name="sheet1")
instead of:
with open("output.xlsx", "a"):
    with pd.ExcelWriter("output.xlsx") as writer:
        pop.to_excel(writer, sheet_name="sheet2")
but that leads to an error:
File "c:\Users\hp\Desktop\python_projects\test3.py", line 40, in <module>
find_computers()
File "c:\Users\hp\Desktop\python_projects\test3.py", line 33, in find_computers
with pd.ExcelWriter("output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\hp\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 61, in __init__
super().__init__(
File "C:\Users\hp\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\excel\_base.py", line 1263, in __init__
self._handles = get_handle(
^^^^^^^^^^^
File "C:\Users\hp\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\common.py", line 872, in get_handle
handle = open(handle, ioargs.mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'output.xlsx'
Here is my actual code:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
import openpyxl
from bs4 import Tag

def find_computers():
    n=1
    while n<=50:
        html_text=requests.get(f"https://www.jumia.ma/catalog/?q=pc&page={n}#catalog-listing").text
        soup=BeautifulSoup(html_text,"lxml")
        computers=soup.find_all("a",class_="core")
        df={"price": [],"original price": [],"promo":[]}
        computer_name_list=[]
        for computer in computers:
            computer_name=computer.find("h3",class_="name").text.strip()
            price=computer.find("div",class_="prc").text.strip()
            original_price_element=computer.find("div",class_="old")
            original_price=original_price_element.text.strip() if isinstance(original_price_element, Tag) else "N/A"
            promo_element = computer.find("div", class_="bdg _dsct _sm")
            promo = promo_element.text.strip() if isinstance(promo_element, Tag) else "N/A"
            df["price"].append(price)
            df["original price"].append(original_price)
            df["promo"].append(promo)
            computer_name_list.append(computer_name)
        n+=1
    pop=pd.DataFrame(df,index=computer_name_list)
    pd.set_option('colheader_justify', 'center')
    with pd.ExcelWriter("output.xlsx") as writer:
        pop.to_excel(writer,sheet_name="sheet2")

if __name__=="__main__":
    while True:
        find_computers()
        time_s = 10
        time.sleep(6 * time_s)
3 Answers
lfapxunr1#
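It works fine for me as well. Beyond that, avoid creating an empty dataframe and appending to it inside the loop. Store the results as you go and create the dataframe once, after everything has been collected.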
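A minimal sketch of that approach, assuming the same Jumia markup as in the question. The helper names `text_or_na`, `parse_page` and `scrape` are hypothetical, and `html.parser` is swapped in for `lxml` only to avoid the extra dependency. Each product becomes one dict in a list, and the DataFrame is built a single time after the loop:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

COLUMNS = ["name", "price", "original price", "promo"]


def text_or_na(element):
    """Return the stripped text of a tag, or "N/A" when the tag is missing."""
    return element.get_text(strip=True) if element is not None else "N/A"


def parse_page(html_text):
    """Turn one results page into a list of row dicts (one per product)."""
    soup = BeautifulSoup(html_text, "html.parser")
    rows = []
    for computer in soup.find_all("a", class_="core"):
        rows.append({
            "name": text_or_na(computer.find("h3", class_="name")),
            "price": text_or_na(computer.find("div", class_="prc")),
            "original price": text_or_na(computer.find("div", class_="old")),
            "promo": text_or_na(computer.find("div", class_="bdg _dsct _sm")),
        })
    return rows


def scrape(last_page=50):
    """Collect rows from every page, then build the DataFrame once."""
    rows = []
    for n in range(1, last_page + 1):
        url = f"https://www.jumia.ma/catalog/?q=pc&page={n}"
        rows.extend(parse_page(requests.get(url, timeout=30).text))
    # Passing the column list keeps set_index working even if nothing was found.
    return pd.DataFrame(rows, columns=COLUMNS).set_index("name")
```

With this, something like `scrape().to_excel("output.xlsx", sheet_name="sheet2")` would write all pages in one call, so nothing is overwritten between pages.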
ajsxfq5m2#
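You need to move the line that initializes df outside the loop. As written, df is reset on every page, so only the rows from the last page end up in the file.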
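A minimal sketch of that change, with the network call replaced by a hypothetical `fake_pages` list so that it runs offline. The dict and the name list are created once, before the loop, so every page appends to the same containers:

```python
import pandas as pd

# Hypothetical stand-in for the scraped pages: (name, price) pairs per page.
fake_pages = [
    [("PC A", "100 Dhs"), ("PC B", "200 Dhs")],
    [("PC C", "300 Dhs")],
]

# Created once, before the loop, so earlier pages are not discarded.
df = {"price": []}
computer_name_list = []

for page in fake_pages:
    for name, price in page:
        df["price"].append(price)
        computer_name_list.append(name)

pop = pd.DataFrame(df, index=computer_name_list)
print(len(pop))  # 3 rows: both pages are kept
```

In the question's code the same applies to `computer_name_list`, which is also re-created inside the `while` loop.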
jgovgodb3#
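One of the problems is that you can't append a df/sheet to a *non-existent* Excel file. That is what the FileNotFoundError is telling you: mode="a" expects output.xlsx to be there already.
You should either have a pre-created file or create one before starting the process.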
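One way to create the file up front, as a rough sketch: `ensure_workbook` and `append_sheet` are hypothetical helper names, and the empty workbook is made with openpyxl's `Workbook().save()`.

```python
import os

import pandas as pd
from openpyxl import Workbook


def ensure_workbook(path):
    """Create an empty .xlsx file if none exists yet."""
    if not os.path.exists(path):
        Workbook().save(path)


def append_sheet(path, frame, sheet_name):
    """Append (or replace) one sheet in a workbook that is guaranteed to exist."""
    ensure_workbook(path)
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="replace"
    ) as writer:
        frame.to_excel(writer, sheet_name=sheet_name)
```

Note that a workbook created this way contains openpyxl's default sheet named "Sheet", which stays in the file next to the sheets you append.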
Here is a suggestion: save the computer info (*per page*) on a separate sheet, in real time, as each page is scraped.
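A sketch of how the per-page saving could look. The helper name `save_page` and the `page_{n}` sheet naming are assumptions for illustration. The writer uses plain write mode for the first page, which creates the file, and append mode afterwards, since `if_sheet_exists` is only accepted together with `mode="a"`:

```python
import os

import pandas as pd


def save_page(path, frame, page_number):
    """Write one page of results to its own sheet, creating the file on first use."""
    if os.path.exists(path):
        # Append to the existing workbook; replace the sheet if it is already there.
        options = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        # First page: plain write mode creates the file.
        options = {"mode": "w"}
    with pd.ExcelWriter(path, engine="openpyxl", **options) as writer:
        frame.to_excel(writer, sheet_name=f"page_{page_number}")
```

Calling `save_page("output.xlsx", pop, n)` at the end of each iteration of the `while` loop would then leave one sheet per page in the workbook, instead of a single sheet that is overwritten.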
Preview (*output.xlsx*, with 2 pages): [image]