python: How to crawl the like counts of comments on a YouTube video?

Posted by 3okqufwl on 2022-12-10 in Python


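I'm trying to crawl the comments of a certain YouTube video with selenium and BeautifulSoup (I haven't tried the YouTube Data API because of its limits).
I almost made it, but I could only get results for the comments and the ids...
I inspected the element that contains the like count and put its selector into my code. The code runs without errors, but it retrieves nothing for the likes, and I don't know why.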

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd 
import re
from collections import Counter
from konlpy.tag import Twitter

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path=r'C:\chrome\chromedriver_win32\chromedriver.exe', options=options)
url = 'https://www.youtube.com/watch?v=D4pxIxGdR_M&t=2s'
driver.get(url)
driver.implicitly_wait(10)

SCROLL_PAUSE_TIME = 3

# Get scroll height
last_height = driver.execute_script("return document.documentElement.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

html_source = driver.page_source

driver.close()

soup = BeautifulSoup(html_source, 'lxml')

ids = soup.select('div#header-author > a > span')

comments = soup.select('div#content > yt-formatted-string#content-text')

likes = soup.select('ytd-comment-action-buttons-renderer#action-buttos > div#tollbar > span#vote-count-middle')

print('ID :', len(ids), 'Comments : ', len(comments), 'Likes : ' ,len(likes))

But only 0 is printed for the likes... I have searched for ways to handle this, but most of the answers just tell me to use the API.

python

Source: https://stackoverflow.com/questions/65381693/how-to-crawl-like-counts-of-comments-on-youtube-video

3 Answers


qlfbtfca1#

Actually, I wouldn't use BeautifulSoup for the extraction at all; just use Selenium's built-in element lookups, i.e.:

# Run these while the browser is still open, i.e. before driver.close()
ids = driver.find_elements_by_xpath('//*[@id="author-text"]/span')
comments = driver.find_elements_by_xpath('//*[@id="content-text"]')
likes = driver.find_elements_by_xpath('//*[@id="vote-count-middle"]')

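This way you can still use len(), because each result is a list. You can also iterate over the likes variable, read each element's .text value, and add the values together: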

total_likes = 0
for like in likes:
    total_likes += int(like.text)

To make this more Pythonic, you might as well use a proper list comprehension.
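One thing to watch with the summing loop above: int() raises ValueError on anything that is not a plain integer string. The sketch below shows one possible way to do the sum with a generator expression while tolerating other label formats. It assumes, without confirmation from the page itself, that a comment with no likes has an empty label and that large counts are abbreviated in the English style ("1.2K", "3M"); other interface languages would need different suffixes. The helper name parse_like_count and the sample list like_texts are made up for illustration.

```python
import re

# Multipliers for the abbreviated suffixes assumed here (e.g. "1.2K", "3M").
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_like_count(text):
    """Convert a like-count label such as '', '17', '1,234' or '1.2K' to an int."""
    cleaned = text.strip().replace(",", "").upper()
    if not cleaned:
        # Assumption: an empty label means the comment has no likes.
        return 0
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMB]?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognized like count: {text!r}")
    number, suffix = match.groups()
    return round(float(number) * _SUFFIXES.get(suffix, 1))


# Stand-in for [like.text for like in likes] from the Selenium snippet above.
like_texts = ["17", "", "1.2K", "1,234"]
total_likes = sum(parse_like_count(t) for t in like_texts)
print(total_likes)
```

With a live driver, the last lines would become total_likes = sum(parse_like_count(like.text) for like in likes).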


3pmvbmvn2#

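I'm not sure this is exactly what you are after, but sites such as socialbalde and moreofit can be combined to get a general overview, and depending on your purpose they may be a better starting point for crawling. I personally found them useful.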


dgtucam13#

Because YouTube hides its comment data, the response you get from simply fetching the video URL contains no comment data.
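In practice this means the comments only show up in a browser-driven session once the page has loaded them, so a scraper has to wait before it looks for them. The sketch below polls until an XPath matches something. It assumes driver is an already opened Selenium WebDriver, and the helper name wait_for_elements is made up for illustration. Selenium ships WebDriverWait for this purpose; the hand-written loop here is only a dependency-free stand-in that shows the idea.

```python
import time


def wait_for_elements(driver, xpath, timeout=10.0, poll=0.5):
    """Poll until `xpath` matches at least one element; return [] on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        # "xpath" is the string value of Selenium's By.XPATH constant.
        elements = driver.find_elements("xpath", xpath)
        if elements or time.monotonic() >= deadline:
            return elements
        time.sleep(poll)
```

For example, comments = wait_for_elements(driver, '//*[@id="content-text"]') would return the comment elements once at least one is present, or an empty list after roughly ten seconds.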

