Scraping data from a website using Scrapy

Posted by t5zmwmid on 2023-04-12 in Other


I am trying to scrape data from a website using Scrapy. This is the element I am targeting:

<div _ngcontent-amb-c25="" appcoloredmultiplier="" class="bubble-multiplier font-weight-bold" style="padding: 2px 11px; border-radius: 11px; color: rgb(52, 180, 255);"> 1.21x </div>

I want to extract the text between the tags, which is 1.21x. How can I update my code to extract it?

def parse(self, response):
    # Extract game history data from the webpage
    game_history_elements = response.css('div.bubble-multiplier')

    # Extract the multiplier value from each game history element
    game_history = [re.search(r'(\d+\.\d+)x', element.css('::text').get()).group(1) for element in game_history_elements]

    # Print the game history data
    print(game_history)

scrapy

Source: https://stackoverflow.com/questions/75966476/scraping-data-from-a-website-using-scrapy

2 Answers


jtjikinw1#

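As mentioned in the comments, you can append the ::text CSS directive to your CSS expression to get the text between the tags, and then call the get or getall method on the resulting selector.
If there are multiple divs with the class bubble-multiplier and you need the text of each one, use getall(). If there is only one matching element, or you only need the first one, use get().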

def parse(self, response):
    game_history = response.css('div.bubble-multiplier::text').get()
    print(game_history)

or

def parse(self, response):
    game_history = response.css('div.bubble-multiplier::text').getall()
    print(game_history)

You can still use getall when there is only one match. The only difference is that the return value is a list containing a single string.
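Note that the strings returned by get() and getall() keep the surrounding whitespace from the markup (the sample element yields ' 1.21x '). The following is a minimal sketch of one way to turn those strings into numbers. parse_multipliers and MULTIPLIER_RE are hypothetical names and not part of Scrapy, and the sample list stands in for the result of response.css('div.bubble-multiplier::text').getall(). The pattern assumes every multiplier is a number followed by "x".

```python
import re

# Hypothetical helper, not part of Scrapy: converts the raw strings returned
# by .getall() into floats and skips text nodes that are not multipliers.
# Assumption: a multiplier looks like "1.21x" or "15x".
MULTIPLIER_RE = re.compile(r'(\d+(?:\.\d+)?)\s*x')


def parse_multipliers(texts):
    values = []
    for text in texts:
        # .get() can return None when nothing matches, so guard against it
        match = MULTIPLIER_RE.search(text or '')
        if match:
            values.append(float(match.group(1)))
    return values


# Sample strings standing in for
# response.css('div.bubble-multiplier::text').getall()
sample = [' 1.21x ', ' 15x ', '\n', ' 2.05x ']
print(parse_multipliers(sample))
```

Inside parse you could pass the getall() result straight to this helper. Skipping non-matching strings instead of calling .group(1) unconditionally avoids an AttributeError when a text node is empty or contains only whitespace.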


e5nqia272#

import re

def parse(self, response):
    # Select the text node of every multiplier bubble
    game_history_texts = response.css('div.bubble-multiplier::text').getall()

    # Extract the multiplier value from each text node,
    # skipping nodes that do not match the pattern
    matches = (re.search(r'(\d+\.\d+)x', text) for text in game_history_texts)
    game_history = [match.group(1) for match in matches if match]

    # Print the game history data
    print(game_history)

Try this.

