Selenium: scraped data gets overwritten, how can I solve this?

hlswsv35  posted on 2022-11-10  in  Other

Every iteration of the loop overwrites the previously extracted data. How can I fix this?

from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.select import Select
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# url='https://www.amazon.com/dp/B00M0DWQYI?th=1'

# url='https://www.amazon.com/dp/B010RWD4GM?th=1'

PATH=r"C:\Program Files (x86)\chromedriver.exe"
driver =webdriver.Chrome(PATH)
df_urls = pd.read_csv('D:/selenium/inputs/amazone-asin.csv',encoding='utf-8')
list_dicts_urls =df_urls.to_dict('records')

item=dict()
product=[]
for url in list_dicts_urls:

    product_url = 'https://' + url['MARKETPLACE'] + '/dp/' + url['ASIN']
    driver.get(product_url)

    try:
        item['title'] = driver.find_element(By.CSS_SELECTOR,'span#productTitle').text
    except:
        item['title'] = ''

    try:
        item['brand'] = driver.find_element(By.CSS_SELECTOR,'a#bylineInfo').text.replace('Visit the','').replace('Store','').strip()
    except:
        item['brand'] = ''
    try:
        rating = driver.find_element(By.CSS_SELECTOR,'span#acrCustomerReviewText').text.replace('ratings','').strip()
        rating = int(rating.replace(',', ''))
        item['rating'] = rating
    except:
        item['rating'] = ''

    time.sleep(2)
    try:
        p1=driver.find_element(By.XPATH, '//span[@class="a-price-whole"]').text
        p2= driver.find_element(By.XPATH, '//span[@class="a-price-fraction"]').text
        item['price']=p1+p2
    except:
        item['price']=''

    product.append(item)

df=pd.DataFrame(product)
df.to_csv("ama.csv")

uklbhaso  #1

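I think you need to create item=dict() inside the for loop. Otherwise every iteration writes into the same single item object, and product.append(item) stores a reference to that object rather than a copy, so every entry in product ends up showing the values from the last iteration.
Try this: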

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.select import Select
from selenium.webdriver.support import expected_conditions as EC

# url='https://www.amazon.com/dp/B00M0DWQYI?th=1'

# url='https://www.amazon.com/dp/B010RWD4GM?th=1'

# Raw string so the backslashes in the Windows path are taken literally.
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Newer Selenium 4 releases may require wrapping the path in a Service object,
# i.e. webdriver.Chrome(service=Service(PATH)), instead of passing it directly.
driver = webdriver.Chrome(PATH)

# Each CSV row becomes a dict with the MARKETPLACE and ASIN columns.
df_urls = pd.read_csv('D:/selenium/inputs/amazone-asin.csv', encoding='utf-8')
list_dicts_urls = df_urls.to_dict('records')

product = []
for url in list_dicts_urls:

    # Create a fresh dict for every product so earlier rows are not overwritten.
    item = dict()

    product_url = 'https://' + url['MARKETPLACE'] + '/dp/' + url['ASIN']
    driver.get(product_url)

    # Fall back to an empty string whenever a field is missing from the page.
    try:
        item['title'] = driver.find_element(By.CSS_SELECTOR, 'span#productTitle').text
    except Exception:
        item['title'] = ''

    try:
        item['brand'] = driver.find_element(By.CSS_SELECTOR, 'a#bylineInfo').text.replace('Visit the', '').replace('Store', '').strip()
    except Exception:
        item['brand'] = ''

    try:
        # Text looks like "1,234 ratings"; keep only the number.
        rating = driver.find_element(By.CSS_SELECTOR, 'span#acrCustomerReviewText').text.replace('ratings', '').strip()
        rating = int(rating.replace(',', ''))
        item['rating'] = rating
    except Exception:
        item['rating'] = ''

    time.sleep(2)
    try:
        # The price is split into a whole part and a fraction part on the page.
        p1 = driver.find_element(By.XPATH, '//span[@class="a-price-whole"]').text
        p2 = driver.find_element(By.XPATH, '//span[@class="a-price-fraction"]').text
        item['price'] = p1 + p2
    except Exception:
        item['price'] = ''

    product.append(item)

# One row per product.
df = pd.DataFrame(product)
df.to_csv("ama.csv")
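
If it helps to see the cause in isolation, the same effect can be reproduced without a browser. This is only a minimal sketch: the names shared, buggy and fixed and the sample titles are made up for illustration and are not part of the original script.

```python
# Buggy pattern: one dict is created before the loop and appended repeatedly.
shared = {}
buggy = []
for title in ["first", "second", "third"]:
    shared["title"] = title
    buggy.append(shared)  # appends a reference to the same dict, not a copy

# Fixed pattern: a new dict is created on every iteration.
fixed = []
for title in ["first", "second", "third"]:
    item = {}
    item["title"] = title
    fixed.append(item)

print(buggy)  # every entry shows the last title, "third"
print(fixed)  # each entry keeps its own title
print(all(entry is shared for entry in buggy))  # True: one object, listed three times
```

If the dict has to stay outside the loop for some reason, appending a shallow copy with product.append(item.copy()) should have the same effect here, since the stored values are plain strings and numbers.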
