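I am trying to extract a book's description from the Amazon website. Note: I am using a Scrapy spider. This is the link to the book on Amazon: https://www.amazon.com/Local-Woman-Missing-Mary-Kubica/dp/1665068671
This is the div that contains the description text: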
<div aria-expanded="true" class="a-expander-content a-expander-partial-collapse-content a-expander-content-expanded" style="padding-bottom: 20px;">
<p><span class="a-text-bold">MP3 CD Format</span></p>
<p><span class="a-text-bold">People don’t just disappear without a trace…</span></p>
<p class="a-text-bold"><span class="a-text-bold">Shelby Tebow is the first to go missing. Not long after, Meredith Dickey and her six-year-old daughter, Delilah, vanish just blocks away from where Shelby was last seen, striking fear into their once-peaceful community. Are these incidents connected? After an elusive search that yields more questions than answers, the case eventually goes cold.</span></p>
<p class="a-text-bold"><span class="a-text-bold">Now, eleven years later, Delilah shockingly returns. Everyone wants to know what happened to her, but no one is prepared for what they’ll find…</span></p>
<p class="a-text-bold"><span class="a-text-bold">In this smart and chilling thriller, master of suspense and New York Times bestselling author Mary Kubica takes domestic secrets to a whole new level, showing that some people will stop at nothing to keep the truth buried.</span></p>
<p></p>
</div>
This is what I have tried so far:
div = response.css(".a-expander-content.a-expander-partial-collapse-content.a-expander-content-expanded")
description = " ".join([re.sub('<.*?>', '', span) for span in response.css('.a-expander-content span').extract()])
It did not work as expected. If you have any ideas, please share them here. Thanks in advance.
1 Answer
Here is the code:
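One possible approach, not necessarily the answerer's original code, is to select the text nodes directly with Scrapy's `::text` pseudo-element instead of stripping tags with a regex, and then normalize the whitespace. The sketch below shows only the joining step as a plain function so it can be run without a live response. The helper name `join_text_nodes` and the sample fragments are hypothetical, and the selector in the comment is an assumption based on the markup in the question. It also assumes the div is present in the HTML that Scrapy actually receives.

```python
import re


def join_text_nodes(fragments):
    """Join text fragments into one string, separated by single spaces."""
    # Collapse runs of whitespace inside each fragment (the markup may be
    # wrapped across lines) and drop fragments that end up empty.
    cleaned = (re.sub(r"\s+", " ", fragment).strip() for fragment in fragments)
    return " ".join(piece for piece in cleaned if piece)


# Inside a Scrapy callback the fragments would come from the selector, e.g.
# (assumed selector, based on the div shown in the question):
#     fragments = response.css("div.a-expander-content span::text").getall()
#     description = join_text_nodes(fragments)

# Hypothetical sample data standing in for the extracted text nodes.
sample = [
    "MP3 CD Format",
    "People don’t just disappear\nwithout a trace…",
    "   ",
]
print(join_text_nodes(sample))
```

If `getall()` returns an empty list in the spider, the element is probably missing from the downloaded page, which is worth checking with `scrapy shell` before changing the selector.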