jQuery-like HTML parsing in Python?

Posted by bt1cpqcv 12 months ago in jQuery

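Is there any way in Python to parse an HTML document the way jQuery does?
That is, I want to use CSS selector syntax to grab an arbitrary set of nodes from the document and read their contents, attributes, and so on.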

Answer #1, by 2j4z5cfb

If you are already familiar with BeautifulSoup, you can add soupselect on top of it.
Soupselect is a CSS selector extension for BeautifulSoup.
Usage:

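from urllib.request import urlopen  # Python 3 (the original used Python 2's urllib.urlopen)

from bs4 import BeautifulSoup as Soup
from soupselect import select  # third-party extension, written in the BeautifulSoup 3 era

soup = Soup(urlopen('http://slashdot.org/'), 'html.parser')
select(soup, 'div.title h3')  # the <h3> tags inside <div class="title">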
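BeautifulSoup 4 now ships its own `select()` method, so the same query can also be sketched without soupselect. The snippet below assumes `bs4` is installed and parses a small inline HTML string (invented for illustration, standing in for the Slashdot page) so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page.
html = """
<div class="title"><h3>First story</h3></div>
<div class="title"><h3>Second story</h3></div>
<div class="other"><h3>Not a title</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selector as the soupselect call: every <h3> inside a <div class="title">.
headlines = [h3.get_text() for h3 in soup.select("div.title h3")]
print(headlines)  # ['First story', 'Second story']
```

The `<h3>` under `div.other` is skipped because the descendant selector only matches inside `div.title`, exactly as it would with `$('div.title h3')` in jQuery.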


Answer #2, by vom3gejh

Consider PyQuery:
http://packages.python.org/pyquery/

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url='http://google.com/')
>>> d = pq(url='http://google.com/', opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)  # path_to_html_file is a placeholder for your own file
>>> # The lines below assume the loaded document contains <p id="hello" class="hello">Hello world !</p>
>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> p.html()
'Hello world !'
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> p.html()
'you know <a href="http://python.org/">Python</a> rocks'
>>> p.text()
'you know Python rocks'


Answer #4, by rvpgvaaj

BeautifulSoup, which supports **CSS selectors**:

import requests
from bs4 import BeautifulSoup as Soup

html = requests.get('https://stackoverflow.com/questions/3051295').content
soup = Soup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning


  • Title of this question
soup.select('h1.grid--cell :first-child')[0].text


  • Question upvote count

# select_one() returns only the first match
soup.select_one('[itemprop="upvoteCount"]').text
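The difference between `select()` and `select_one()` matters when a selector stops matching, which can happen because the live Stack Overflow markup changes over time. A small offline sketch, assuming `bs4` is installed; the markup below is invented and only loosely modelled on the selectors used above, and the vote count is a dummy value.

```python
from bs4 import BeautifulSoup

# Invented markup loosely modelled on the selectors above (not the real page).
html = (
    '<h1 class="grid--cell"><a href="/questions/3051295">'
    "jQuery-like HTML parsing in Python?</a></h1>"
    '<div itemprop="upvoteCount">7</div>'
)

soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, so [0] raises IndexError when nothing matches.
title_link = soup.select("h1.grid--cell :first-child")[0]
title = title_link.get_text()
href = title_link["href"]  # attributes are read like dict keys

# select_one() returns the first match, or None when there is no match.
votes = soup.select_one('[itemprop="upvoteCount"]')
vote_text = votes.get_text() if votes is not None else None

missing = soup.select_one("span.does-not-exist")

print(title, href, vote_text, missing)
```

Checking for `None` before calling `.text` on a `select_one()` result avoids an `AttributeError` when the page structure no longer matches.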
