如何将python/beautiful soup中的web抓取数据导入mysql数据库

5q4ezhmt  于 2021-06-25  发布在  Mysql
关注(0)|答案(1)|浏览(408)

虽然我在python中得到了10个以上的结果,但现在我只能得到mysql数据库中出现的最后一个产品(id为12以及价格、图片等信息)。我需要修复它,使他们都出现,而不仅仅是一个产品。
python代码如下。

import requests
from bs4 import BeautifulSoup
import mysql.connector

url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

source = requests.get(url).text

soup = BeautifulSoup(source, 'lxml')

conn = mysql.connector.connect(host='127.0.0.1', user='x', database='scrape',password="x")                         
cursor = conn.cursor()  

item_container = soup.find_all('div', class_='item-container')

def get_data():
lists = []
    for index, item_name in enumerate(item_container):
        name = item_name.find_all('a', class_='item-title')[0].text
        lists.append({'name': name})
        lists[index]['index'] = index

    for index, item_price in enumerate(item_container):
        price = item_price.find('li', class_='price-current').find('strong')
        if price == None:

            price == ('Not Available')
            lists[index]['price'] = price

        else:

            price = ('$' + price.text +'.99')
            prices = []
        lists[index]['price'] = price

    for index, item_picture in enumerate(item_container):
            picture = 'http:' + item_picture.find('img', class_='lazy-img')['data-src']

            lists[index]['picture'] = picture

    for index, item_shipping in enumerate(item_container):
            shipping = (item_shipping.find('li', class_='price-ship').text).strip()
            lists[index]['shipping'] = shipping

def create_table():

    val_index = lists[index]['index']
    val_name = lists[index]['name']
    val_picture = lists[index]['picture']
    val_price = lists[index]['price']
    val_shipping = lists[index]['shipping']

    add_item = ("INSERT INTO newegg "
                "(id, itemname, itempic, itemprice, itemshipping) "
                "VALUES (%s, %s, %s, %s, %s)")

    data_item = (val_index, val_name, val_picture, val_price, val_shipping)

    cursor.execute("DELETE FROM newegg ")
    conn.commit()
    cursor.execute(add_item, data_item)  
    conn.commit()

    cursor.close() 
    conn.close()                                                                  

create_table();
get_data()
vmjh9lq9

vmjh9lq91#

所以需要解决的主要问题是 create_table() . 我们不希望它在插入项之前删除数据库内容。此外,我们需要循环检查您的系统中的所有项目 lists . 我会这样做的。

def create_table():
    cursor.execute("DELETE FROM newegg ")
    conn.commit()

    for product in lists:
        val_index = product['index']
        val_name = product['name']
        val_picture = product['picture']
        val_price = product['price']
        val_shipping = product['shipping']

        add_item = ("INSERT INTO newegg "
                    "(id, itemname, itempic, itemprice, itemshipping) "
                    "VALUES (%s, %s, %s, %s, %s)")

        data_item = (val_index, val_name, val_picture, val_price, val_shipping)

        cursor.execute(add_item, data_item)
        conn.commit()

注意, create_table() 也不再为您关闭连接。我建议在初始化连接的同一作用域(在本例中是全局作用域)中关闭连接。功能 create_table() 不“拥有”连接资源,因此不应允许它破坏它。尽管初始化和破坏函数内部的连接是非常有意义的。
此外,请注意,这将清除您的表每次你刮。这可能很好,但如果你想改变主意 id 随着时间的推移,不要从一开始就删除,然后获取你的 id 列自动递增之类的。

相关问题