Scrapy shell and Playwright

yqlxgs2m  posted on 2022-11-09  in  Other

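Is it possible to call Playwright from within the Scrapy shell?
I would like to use the shell to test my XPaths, which I intend to put in a spider that uses scrapy-playwright.
My Scrapy settings file has the usual Playwright setup: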


# Scrapy Playwright Setup

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

xnifntxz1#

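Yes, it is possible. In fact, all you have to do is run scrapy shell inside a folder that contains a Scrapy project. It will automatically load all the default settings from settings.py. You can see this in the logs printed when scrapy shell starts.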
In addition, you can override settings with the -s parameter.

scrapy shell -s DOWNLOAD_HANDLERS='<<your custom handlers>>'
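For a dict setting such as DOWNLOAD_HANDLERS, my understanding is that Scrapy accepts a JSON-encoded string on the command line. The sketch below builds such a command from the handlers in the question. The function name `build_shell_command` and the example URL are assumptions for illustration, and the quoting is for POSIX shells.

```python
import json
import shlex

# Same handlers as in the question's settings file.
handlers = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}


def build_shell_command(url: str) -> str:
    """Return a `scrapy shell` command line with the Playwright settings passed via -s."""
    args = [
        "scrapy", "shell",
        # Dict-valued settings are passed as a JSON string.
        "-s", "DOWNLOAD_HANDLERS=" + json.dumps(handlers),
        "-s", "TWISTED_REACTOR=twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        url,
    ]
    # shlex.quote protects the braces, quotes and spaces in the JSON from the shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_shell_command("https://example.com"))
```

This is mostly useful outside a project folder, where there is no settings.py for the shell to load.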

Happy scraping :)


oprakyz72#

I believe the shell command might not work with scrapy-playwright. Here I am using python3 as a demonstration.
This documentation link should help you further: https://playwright.dev/python/docs/intro#interactive-mode-repl
Instead of the Scrapy shell, I believe you just need python3 in interactive mode. This way you get autocompletion, which the Scrapy shell never had.
Here is a synchronous example in a file called spider_interactive.py:

from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
browser = playwright.firefox.launch()
page = browser.new_page()
page.goto("http://whatsmyuseragent.org/")

# Remember to run these manually when you're done, to avoid leaving
# stray browser processes on the machine:
# browser.close()
# playwright.stop()

Run with:
python3 -i spider_interactive.py
Then you can enter for example the following command:

page.locator("p.intro-text").all_inner_texts()

returns
['Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0', 'My IP Address: your_ip_address_here']
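
Since the question is about testing XPaths rather than CSS selectors, here is a small sketch of how the same interactive session could be used for that. Playwright locators accept an `xpath=` prefix. The helper name `xpath_texts` is hypothetical, and `page` is assumed to be the object created in spider_interactive.py.

```python
def xpath_texts(page, xpath: str) -> list:
    """Return the inner text of every element matching an XPath expression.

    `page` is expected to be a playwright.sync_api.Page, such as the one
    created in spider_interactive.py.
    """
    # The "xpath=" prefix tells Playwright to treat the selector as XPath.
    return page.locator(f"xpath={xpath}").all_inner_texts()


# In the interactive session:
#   xpath_texts(page, "//p[contains(@class, 'intro-text')]")
```

An XPath that works here should carry over to `response.xpath(...)` in the spider, as long as the spider sees the same rendered HTML.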
