How to use Scrapy on a website that does not change its URL when the language changes

Posted by w8f9ii69 on 2022-11-09 in Other

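As far as I can tell, when the language button is pressed, this site, https://www.learnit.nl/, fetches the English version by sending a POST request to https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1. I don't know how to replicate that with Scrapy. I would appreciate any help.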

waxmsbnn #1

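The data is returned as JSON by an API call made with the POST method, and the request payload is a large JSON object. To replicate this with Scrapy, you can follow this example: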

import json
import scrapy

class CourseSpider(scrapy.Spider):

    name = 'course'
    # Placeholder: replace with the JSON payload copied from the browser's POST request
    body = {}

    def start_requests(self):
        yield scrapy.Request(
            url='https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST",
            headers={
                # Add any request headers the API needs, e.g. copied from the browser's request
            }
        )

    def parse(self, response):
        # Decode the JSON body; the translated strings are under 'to_words'
        data = json.loads(response.text)

        for word in data['to_words']:
            yield {
                'course': word
            }
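
The parsing step can be checked offline, without calling the API. Here is a minimal sketch, assuming the response body is a JSON object whose `to_words` key holds a list of translated strings, which is what the spider's `parse` method relies on. The helper name `iter_courses` and the sample body are hypothetical; the two titles are taken from the output shown in this answer.

```python
import json


def iter_courses(raw_body):
    """Yield one item per translated string in a response body.

    Assumes the body is a JSON object with a `to_words` list;
    a body without that key simply yields nothing.
    """
    data = json.loads(raw_body)
    for word in data.get("to_words", []):
        yield {"course": word}


# Hypothetical sample body for illustration, not a captured API response.
sample = json.dumps({"to_words": ["Writing clear texts", "HTML e-mail"]})
items = list(iter_courses(sample))
print(items)
```

Using `.get("to_words", [])` instead of direct indexing is a design choice for this sketch: it keeps the helper from raising `KeyError` if the API returns an error object instead of translations.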

Output from running the spider:

{'course': 'Writing clear texts'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML e-mail'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Basics'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Continued'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML Training E-learning'}

 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 1.879555,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 4, 28, 16, 3, 22, 536326),
 'httpcompression/response_bytes': 36269,
 'httpcompression/response_count': 1,
 'item_scraped_count': 514,

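... and so on.
Because the payload is a large JSON object, it cannot be posted here without exceeding the length limit. The full working code is here.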
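
Since the payload is too large to paste inline, one option is to save it to a file and load it when the spider starts. The following is a minimal sketch under that assumption, not the code from the linked answer. The file name `payload.json`, the helper names, and the demo payload keys are all hypothetical; the real payload has to be copied from the browser's POST request.

```python
import json
import tempfile
from pathlib import Path


def load_payload(path):
    """Read a JSON payload saved from the browser and return it as a Python object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def to_request_body(payload):
    """Serialize the payload with json.dumps, as the spider does for the request body."""
    return json.dumps(payload)


# Demo with a made-up payload; the keys here are placeholders, not the real API schema.
demo_payload = {"example_key": ["example value"]}
with tempfile.TemporaryDirectory() as tmp:
    payload_file = Path(tmp) / "payload.json"  # hypothetical file name
    payload_file.write_text(json.dumps(demo_payload), encoding="utf-8")
    loaded = load_payload(payload_file)

body = to_request_body(loaded)
print(body)
```

In the spider, the class attribute could then be set with `body = load_payload('payload.json')`, leaving the rest of `start_requests` unchanged.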
