Scrapy: how do I scrape the value of a tag attribute? [closed]

Posted by l2osamch on 2022-11-09 in Other

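I'm trying to extract the data stored in the attributes of the tag below, but I'm not sure how to get at it. I'm new to web scraping, and I'm using Scrapy in the shell.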

<object id="svg_chart" class="map-svg uk-animation-fade" type="image/svg+xml" data="https://services.dat.com/svg/graph.svg?showAllLabels=true&amp;vgrid=true&amp;lineWidth=4&amp;op_min=75&amp;minYValue=0&amp;maxYValue=10&amp;yStep=2&amp;title=Van%20Load-to-Truck%20Ratio&amp;labels=Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec&amp;colors=_F89929,_57CAF1,_B686BB&amp;ds_3=2020|2.23,1.84,2.89,0.98,1.91,3.52,4.40,5.31,5.45,4.33,4.49,4.84&amp;ds_2=2021|4.27,7.54,5.78,4.79,6.12,5.56,5.81,6.46,6.32,5.59,5.19,6.54&amp;ds_1=2022|9.34,7.33,4.57,3.42,4.39,3.88,3.84,3.54,null,null,null,null&amp;copyright=%40%202022%20DAT%20Freight%20%26%20Analytics" width="" height=""></object>

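I was able to pull the main tag out of the response, and I have tried .get() and .extract() to get the data. I haven't tried .re() yet, so apologies if that is the answer.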

mzillmmw #1

Try bs4 (BeautifulSoup):

from bs4 import BeautifulSoup
import lxml  # not used directly; it must be installed for the 'lxml' parser below

# The <object> tag from the question, as a plain string
mystr = '<object id="svg_chart" class="map-svg uk-animation-fade" type="image/svg+xml" data="https://services.dat.com/svg/graph.svg?showAllLabels=true&amp;vgrid=true&amp;lineWidth=4&amp;op_min=75&amp;minYValue=0&amp;maxYValue=10&amp;yStep=2&amp;title=Van%20Load-to-Truck%20Ratio&amp;labels=Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec&amp;colors=_F89929,_57CAF1,_B686BB&amp;ds_3=2020|2.23,1.84,2.89,0.98,1.91,3.52,4.40,5.31,5.45,4.33,4.49,4.84&amp;ds_2=2021|4.27,7.54,5.78,4.79,6.12,5.56,5.81,6.46,6.32,5.59,5.19,6.54&amp;ds_1=2022|9.34,7.33,4.57,3.42,4.39,3.88,3.84,3.54,null,null,null,null&amp;copyright=%40%202022%20DAT%20Freight%20%26%20Analytics" width="" height=""></object>'
soup = BeautifulSoup(mystr, 'lxml')

# Find the first <object> element in the parsed document
obj = soup.find('object')

# Attributes are read by subscripting the tag with the attribute name
print(obj['id'])
print(obj['class'])  # class is multi-valued, so bs4 returns a list
print(obj['type'])
print(obj['data'])   # &amp; entities are decoded to & by the parser

...result:

'''
svg_chart
['map-svg', 'uk-animation-fade']
image/svg+xml
https://services.dat.com/svg/graph.svg?showAllLabels=true&vgrid=true&lineWidth=4&op_min=75&minYValue=0&maxYValue=10&yStep=2&title=Van%20Load-to-Truck%20Ratio&labels=Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec&colors=_F89929,_57CAF1,_B686BB&ds_3=2020|2.23,1.84,2.89,0.98,1.91,3.52,4.40,5.31,5.45,4.33,4.49,4.84&ds_2=2021|4.27,7.54,5.78,4.79,6.12,5.56,5.81,6.46,6.32,5.59,5.19,6.54&ds_1=2022|9.34,7.33,4.57,3.42,4.39,3.88,3.84,3.54,null,null,null,null&copyright=%40%202022%20DAT%20Freight%20%26%20Analytics
'''
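The `data` value printed above is itself a URL whose query string appears to carry the chart series. As a rough sketch, the standard library's `urllib.parse` can split it into usable values. This assumes each `ds_N` parameter has the form `year|comma-separated values`, as it does in the question's URL; the helper name `parse_series` is made up for illustration, and the URL is shortened to two series with four values each.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the URL from the question (two series, four months each).
data_url = (
    "https://services.dat.com/svg/graph.svg?labels=Jan,Feb,Mar,Apr"
    "&ds_2=2021|4.27,7.54,5.78,4.79"
    "&ds_1=2022|9.34,7.33,4.57,null"
)


def parse_series(url):
    """Return {series_name: [values]} for every ds_* query parameter.

    Hypothetical helper: assumes the 'name|v1,v2,...' layout seen in the
    question's URL, with the literal text 'null' marking a missing value.
    """
    query = parse_qs(urlsplit(url).query)
    series = {}
    for key, values in query.items():
        if not key.startswith("ds_"):
            continue
        name, _, numbers = values[0].partition("|")
        series[name] = [
            None if n == "null" else float(n) for n in numbers.split(",")
        ]
    return series


# The month labels live in their own query parameter.
labels = parse_qs(urlsplit(data_url).query)["labels"][0].split(",")
series = parse_series(data_url)

print(labels)
print(series)
```

Keeping `null` as `None` rather than dropping it keeps every series aligned with the month labels.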

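Subscripting such as obj['data'] works because obj.attrs is a dictionary, so you can do anything with it that you can do with a dict: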

print(type(obj.attrs))

# result

# <class 'dict'>

# You can get all of the attrs like this too

for k, v in obj.attrs.items():
    print(k + ' == ' + str(v))

# result

'''
id == svg_chart
class == ['map-svg', 'uk-animation-fade']
type == image/svg+xml
data == https://services.dat.com/svg/graph.svg?showAllLabels=true&vgrid=true&lineWidth=4&op_min=75&minYValue=0&maxYValue=10&yStep=2&title=Van%20Load-to-Truck%20Ratio&labels=Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec&colors=_F89929,_57CAF1,_B686BB&ds_3=2020|2.23,1.84,2.89,0.98,1.91,3.52,4.40,5.31,5.45,4.33,4.49,4.84&ds_2=2021|4.27,7.54,5.78,4.79,6.12,5.56,5.81,6.46,6.32,5.59,5.19,6.54&ds_1=2022|9.34,7.33,4.57,3.42,4.39,3.88,3.84,3.54,null,null,null,null&copyright=%40%202022%20DAT%20Freight%20%26%20Analytics
width ==
height ==
'''
