Python / BeautifulSoup: getting an attribute value in <option> (HTML)

Posted by 00jrzges on 2023-06-20 in Python

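I am a Python / BeautifulSoup newbie.
I am trying to get the value of an attribute in a tag. My code and the HTML snippet are below. Specifically, I am trying to retrieve the value of the first "data-inventory-quantity" attribute (60 in this example).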

import requests
import bs4
import lxml
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv

def getTitle(soup):
    return soup.find('title').text

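def getInventory(soup):
    # Unfinished: this should return the first option's data-inventory-quantity
    # value without hard-coding that option's unique "value" attribute.
    ...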

def getPrice(soup):
    return soup.find("meta", {"property" : "og:price:amount"}).attrs['content']

urlList = []

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title', 'Inventory', 'Price'])
    
    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        except URLError:
            print("error")
        else:
            soup = bs4.BeautifulSoup(html.read(), 'html.parser')
            row = [getTitle(soup), getInventory(soup), getPrice(soup)]
            print(row)
            csv_output.writerow(row)

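However, since I need to run this against multiple URLs, each with a unique "value", I can't figure out how to edit the code so that I don't need to use this specific option "value". I tried soup.find on a higher-level tag, e.g. "soup.find('select', id='variant-listbox')['data-inventory-quantity']", but that gave me "KeyError: 'data-inventory-quantity'". Is there a way to find data-inventory-quantity when all the other attribute values in this option tag are different for each URL?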
HTML:

<div class="variants ">
  <select id="variant-listbox" name="id" class="medium">
    <option
      data-sku=""
      selected="selected" value="40323576791107"
      data-inventory-quantity="60"
    >
      Regular - $75.00
    </option>

    <option
      data-sku=""
      value="40323576823875"
      data-inventory-quantity="4"
    >
      Variant - $100.00
    </option>
  </select>
</div>
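The KeyError comes from indexing the <select> tag: tag['name'] only looks at that tag's own attributes, and data-inventory-quantity lives on the <option> children. A minimal sketch, assuming the markup above is representative of each product page, that searches inside the <select> for any option that has the attribute, whatever its value:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumed representative of each URL).
html_doc = """
<div class="variants ">
  <select id="variant-listbox" name="id" class="medium">
    <option data-sku="" selected="selected" value="40323576791107"
            data-inventory-quantity="60">Regular - $75.00</option>
    <option data-sku="" value="40323576823875"
            data-inventory-quantity="4">Variant - $100.00</option>
  </select>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
select = soup.find("select", id="variant-listbox")

# The <select> itself has no such attribute, hence the KeyError in the question.
print(select.has_attr("data-inventory-quantity"))  # False

# attrs={name: True} matches any <option> that has the attribute at all,
# so the per-URL "value" attribute never needs to be known.
first_option = select.find("option", attrs={"data-inventory-quantity": True})
print(first_option["data-inventory-quantity"])  # 60
```

Attribute values come back as strings, so convert with int() if you need a number.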
Answer 1

Try:

from bs4 import BeautifulSoup

html_doc = '''\
 <div class="variants ">
              <select id="variant-listbox" name="id" class="medium">

                  <option
                    data-sku=""

                    selected="selected"  value="40323576791107"

                      data-inventory-quantity="60"

                  >
                    Regular - $75.00
                  </option>

                  <option
                    data-sku=""

                     value="40323576823875"

                      data-inventory-quantity="4"

                  >
                    Variant - $100.00
                  </option>

              </select>
            </div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

o = soup.select_one('option[data-inventory-quantity]')
print(o['data-inventory-quantity'])

Prints:

60

If you want the selected option specifically:

o = soup.select_one('option[data-inventory-quantity][selected]')
print(o['data-inventory-quantity'])
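If a page lists several variants and you want every quantity rather than just the first, select() with the same CSS selector returns all matching options in document order. A sketch using a trimmed copy of the question's markup; keying the result on the option label is an illustrative choice:

```python
from bs4 import BeautifulSoup

html_doc = """
<select id="variant-listbox" name="id" class="medium">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">Regular - $75.00</option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">Variant - $100.00</option>
</select>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# One entry per <option> that carries the attribute, label -> quantity.
inventory = {
    option.get_text(strip=True): int(option["data-inventory-quantity"])
    for option in soup.select("option[data-inventory-quantity]")
}
print(inventory)  # {'Regular - $75.00': 60, 'Variant - $100.00': 4}
```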

EDIT: As a getInventory(soup) function:

def getInventory(soup):
    o = soup.select_one('option[data-inventory-quantity]')
    return o['data-inventory-quantity']
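When looping over many URLs, some pages may have no variant list, and select_one() then returns None, so o['data-inventory-quantity'] would raise a TypeError. A more defensive variant of the same function, sketched under the assumption that the pre-selected option is the one you want, falling back to the first option and returning None when neither exists:

```python
from bs4 import BeautifulSoup


def getInventory(soup):
    """Return the inventory quantity as an int, or None if the page has none."""
    # Prefer the pre-selected option; otherwise take the first one with the attribute.
    option = (soup.select_one("option[data-inventory-quantity][selected]")
              or soup.select_one("option[data-inventory-quantity]"))
    if option is None:
        return None  # e.g. a product page without a variant list
    return int(option["data-inventory-quantity"])


page = BeautifulSoup(
    '<select><option value="1" data-inventory-quantity="4">A</option>'
    '<option value="2" selected data-inventory-quantity="60">B</option></select>',
    "html.parser",
)
print(getInventory(page))  # 60
print(getInventory(BeautifulSoup("<p>no variants</p>", "html.parser")))  # None
```

Returning None instead of raising lets the CSV loop keep going and simply write an empty cell for that URL.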
Answer 2

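I prefer to use find_all_next to get the following tags when parsing with bs4 (it returns matching tags anywhere after the current tag in document order, not only its children). Find each element by name and read the value of the data-inventory-quantity attribute. Code below: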

import bs4

code = ''' <div class="variants ">
              <select id="variant-listbox" name="id" class="medium">
                
                  <option
                    data-sku=""
                    
                    selected="selected"  value="40323576791107"
                    
                      data-inventory-quantity="60"
                    
                  >
                    Regular - $75.00
                  </option>
                
                  <option
                    data-sku=""
                    
                     value="40323576823875"
                    
                      data-inventory-quantity="4"
                    
                  >
                    Variant - $100.00
                  </option>
                
              </select>
            </div>'''
soup = bs4.BeautifulSoup(code, 'html.parser')
div = soup.find_all('div')[0]
select = div.find_all_next('select')[0]
option = select.find_all_next('option', {'selected': 'selected'})[0]
print(option.get('data-inventory-quantity'))
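Because find_all_next() keeps scanning past the end of the current element, it can pick up an <option> from a different <select> later in the page, whereas find() only searches the element's descendants. A small sketch on hypothetical markup where the first <select> has no pre-selected option but a later, unrelated one does:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the second, unrelated <select> has a selected option.
html_doc = """
<select id="variant-listbox">
  <option value="a" data-inventory-quantity="60">Regular</option>
</select>
<select id="other-widget">
  <option value="b" selected="selected" data-inventory-quantity="9">Other</option>
</select>
"""

soup = BeautifulSoup(html_doc, "html.parser")
listbox = soup.find("select", id="variant-listbox")

# find_all_next() continues past </select> and finds the other widget's option.
leaked = listbox.find_all_next("option", selected=True)
print([o["data-inventory-quantity"] for o in leaked])  # ['9']

# find() stays inside the listbox, so nothing is found there.
print(listbox.find("option", selected=True))  # None
```

On the question's own page both approaches return 60, but scoping the search with find() avoids surprises on pages with several dropdowns.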
