How to get the text of certain elements with BeautifulSoup in Python

Posted by brvekthn on 2023-02-01 in Python

I have HTML code like this:

<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>

I need to get the text of the 3rd and 5th td of each tr.
Obviously, this doesn't work :)

from bs4 import BeautifulSoup
import index

soup = BeautifulSoup(index.index_doc, 'lxml')

for i in soup.find_all('tr')[2:]:
    print(i[2].text, i[4].text)

Answer #1 (umuewwlo)

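You can use CSS selectors with the pseudo-class :nth-of-type() to select the elements (I assumed you want the date, so I selected the 6th td rather than the 5th):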

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

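To get a list of (name, date) tuples, pair every other element. select() returns the matches in document order, so this assumes every matching row contains both cells:

list(zip(data[::2], data[1::2]))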
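
If some rows might be missing one of the two cells, pairing a flat list can drift out of step. A minimal sketch of a per-row alternative follows; the sample HTML and the names `pairs`, `name` and `date` are hypothetical, and `html.parser` is chosen only so that the snippet needs no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second row has too few cells and should be skipped.
html = """
<table>
  <tr><td>1</td><td>2</td><td><p><sup> Alice </sup></p></td><td>4</td><td>5</td><td><sup>25.01.1980</sup></td></tr>
  <tr><td>1</td><td>2</td></tr>
  <tr><td>1</td><td>2</td><td>Bob</td><td>4</td><td>5</td><td>01.02.1990</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
for row in soup.select("tr"):
    # Query each row separately so both cells always come from the same <tr>.
    name = row.select_one("td:nth-of-type(3)")
    date = row.select_one("td:nth-of-type(6)")
    if name is None or date is None:
        continue  # row does not have both cells
    pairs.append((name.get_text(strip=True), date.get_text(strip=True)))

print(pairs)
```

Selecting inside each row costs a few more lines, but a short or empty row then only drops itself instead of shifting every later pair.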
Example
from bs4 import BeautifulSoup

html = '''
<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

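# pair each name with the date that follows it (assumes both cells exist in every matching row)
print(list(zip(data[::2], data[1::2])))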
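
As for why the loop in the question fails: subscripting a Tag, as in `i[2]`, looks up an HTML attribute rather than a child element, so it raises a KeyError. A sketch that stays closer to the original indexing idea is to collect the row's cells into a list first. The sample HTML and the names `rows` and `cells` are hypothetical, and the 6th cell (index 5) is used on the assumption that the date is what is wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one full row and one row that is too short.
html = """
<table>
  <tr><td>a</td><td>b</td><td><p><sup>Name Name Name</sup></p></td><td>d</td><td>e</td><td><p><sup>25.01.1980</sup></p></td></tr>
  <tr><td>only</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # recursive=False keeps only the row's direct <td> children
    cells = tr.find_all("td", recursive=False)
    if len(cells) >= 6:  # skip rows without enough cells
        rows.append((cells[2].get_text(strip=True), cells[5].get_text(strip=True)))

print(rows)
```

List indices are 0-based, so `cells[2]` is the 3rd td and `cells[5]` the 6th, whereas `:nth-of-type()` counts from 1.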
