How to add a line break after each heading or paragraph when scraping with Scrapy in Python

3pvhb19x posted on 2022-11-09 in Python

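I want a line break after each heading or paragraph when scraping an article with Scrapy in Python. Without line breaks the output is hard to read, and it is difficult to tell which text is a heading and which is a paragraph.
Here is my code: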

import scrapy

class NewsSpider(scrapy.Spider):
    name = "travelandleisure"

    def start_requests(self):
        url = input("Enter the article url: ")

        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        # join every text node under the article container into one string
        Content = ''.join(response.xpath('//*[@id="mntl-sc-page_1-0"]//text()').getall()).strip()
        yield {
            'Article Content': Content
        }

It gives output like this:

The last to years may not have brought an immediate end to the coronavirus pandemic, but it brought a renewed sense of hope when it comes to traveling. And many Americans are taking advantage of that feeling, looking ahead, and planning their next vacations.\n\n\n\nThe options for those who want to add another stamp to their passport have steadily grown since the world was first put on hold last year — albeit often with more paperwork, testing, and pre-planning required. Now, foreign national air travelers to the United States will be required to be fully vaccinated and 
to provide proof of vaccination status and a negative test prior to boarding an airplane to the United States. The United States\' new international air travel policy, replaces the existing country-by-country restrictions, putting in place a consistent approach worldwide.\n\n\n\nThose who fly back to the United States will also be required to show a negative test before boarding a flight home. To provide even greater peace of mind to travelers, many airlines and airports have started offering on-site rapid COVID-19 tests.\n\n\n\nBelow is a list of countries currently accepting American travelers along with each destination\'s travel protocol and their advisory level determined by the State Department. Countries that are accepting American travelers but require visitors 
to quarantine for two weeks upon arrival are also listed separately.\n\n\n  Albania  \n\n\n\n\n\n\n\n\n\n\nA woman wearing a face mask, walks in Tirana\'s main square.\nGENT SHKULLAKU/AFP via Getty Images\n\n\n\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\n\n\nVaccinated U.S. citizens are allowed to enter Albania without showing any test results or being required to quarantine, according to the U.S. Embassy in Albania.\n\n\n  Anguilla  \n\nLevel 2: Moderate Levels of COVID-19 Transmission (CDC) \n\n\n\nTravelers to Anguilla must be vaccinated and submit a negative COVID-19 test within two days of arrival. Guests who have not received their booster must also test upon arrival.\n\n\n\nGuests staying for more than eight days may need to test again on day four.\n\n\n  Antigua and Barbuda  \n\nLevel 3: High levels of COVID-19 Transmission (CDC)\n\n\n\nVaccinated travelers no longer need to test prior to travel to Antigua and Barbuda. Unvaccinated travelers must present a negative PCR test within three days or an antigen test within 24 hours.\n\n\n  Argentina  \n\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\n\n\nEligible travelers to Argentina must have received a completed vaccination at least 14 days before entering and will also have to fill out an "Affidavit of 
Migration" and show proof they have insurance that covers COVID-19.\n\n\n\n\n\n\n  Armenia  \n\nLevel 1: Low Levels of COVID-19 Transmission (CDC)\n\n\n\nAmericans can enter Armenia by air and must either arrive with proof of vaccination, negative PCR COVID-19 test taken within 72 hours before arrival or get tested upon arrival at the airport, which will also require self isolation until there id a negative result. according to the U.S. Embassy in Armenia. Children under 6 are exempt from testing.\n\n\n  Australia 
 \n\nLevel 3: High Level of COVID-19 Transmission (CDC)\n\n\n\nAustralia is open to fully vaccinated passengers. Upon arrival, all passengers will need to present a Digital Passenger Declaration within 72 hours of departure for Australia.\n\n\n\n\n\n\n\n\n\n\n\n\nCourtesy of Aruba Tourism Authority\n\n\n  Aruba  \n\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\n\n\nTravelers to Aruba no longer are required to be vaccinated or present a negative Covid-19 test. Passengers will be required to purchase Aruba Visitor Insurance and complete an embarkation form prior to arrival.\n\n\n  Bahamas  \n\nLevel 3: High Level of Transmission (CDC)\n\n\n\nTravelers to the Bahamas can skip the islands\' mandatory quarantine if they test negative for COVID-19 within three days before their departure, along with applying for a Bahamas Health Travel Visa after their test. Unvaccinated travelers must take a molecular test, while vaccinated travelers have the choice between taking a rapid test or a molecular test.\n\n\n\nVisitors are then required to opt-in to mandatory COVID-19 health insurance when applying for their Health Travel Visa.\n\n\n  Barbados  \n\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\n\n\nBarbados requires travelers to show proof of a negative COVID-19 PCR test taken within three days of their arrival to enter, or a rapid PCR test within one day of travel, according to the Barbados tourism website. Unvaccinated travelers must quarantine for at least five days before taking a second PCR test. Upon return to the United States outbound travelers will be required to pay $100 USD per test.\n\n\n\nTravelers must complete an immigration form and download the BIMSafe app, which public health teams will use to check-in. 
Travelers must also monitor their temperature for seven days after arrival.\n\n\n\nMask wearing is required in public spaces.\n\n\n\nBarbados is also welcoming visitors to move to the island for a year for the ultimate remote work experience.\n\n\n  Bahrain  \n\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\n\n\nPassengers must download the BeAware Bahrain app. Bahrain no longer requires testing or proof of vaccination.\n\n\n  Belize  \n\nLevel 3: High Level of Transmission (CDC)\n\n\n\nVaccinated travelers to Belize will no longer need a negative test to enter. Unvaccinated visitors aged 5 and older will need a negative test to enter. All tourists will need proof of country health insurance  and will need to stay in a government approved accommodation.\n\n\n\nTesting is available in the airport for $50 cash only.\n\n\n\n\n\n\n\n\n\n\n\n\nStonehole Bay in Bermuda.\nBermuda Tourism Authority\n\n\n  Bermuda  \n\nLevel 3: High COVID-19 Transmission (CDC)\n\n\n\nBermuda will require all visitors to show proof of current vaccination status and a negative COVID-19 test result (both antigen or PCR tests are allowed) within two days of arriving on the island, according to the Bermuda Tourism Authority.

This is the URL: https://www.travelandleisure.com/travel-news/where-can-americans-travel-right-now-a-country-by-country-guide

23c0lvtd


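You can use a dictionary to split the article into sections, with each heading as a key and the paragraphs under it as the value. That makes it easier to tell which text is a heading and which text belongs to which section.
For example: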

import scrapy

class NewsSpider(scrapy.Spider):
    name = "travelandleisure"

    def start_requests(self):
        url = "https://www.travelandleisure.com/travel-news/where-can-americans-travel-right-now-a-country-by-country-guide"

        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        sections = {}
        header, paragraphs = "main", []
        for element in response.xpath('//*[@id="mntl-sc-page_1-0"]/*'):
            tag = element.root.tag  # tag name of the underlying lxml element
            # if it's a paragraph, add its text to the paragraph list
            if tag == "p":
                paragraphs += element.xpath(".//text()").getall()
            # if it's a heading, store the previous heading and its paragraphs
            # in the dictionary and start a new section with the current text
            elif tag == "h3":
                sections[header] = ''.join(paragraphs)
                header = ' '.join(element.xpath(".//text()").getall()).strip()
                paragraphs = []
        sections[header] = ''.join(paragraphs)  # store the last section too
        yield sections  # yield all sections
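
The grouping step in the spider does not depend on Scrapy, so it can be tried on plain data first. Below is a minimal sketch that assumes the page has already been reduced to (tag, text) pairs in document order. The function name `group_by_heading`, its parameters, and the shortened sample data are hypothetical and only serve as illustration. Note that the final section has to be stored after the loop, because no later heading closes it.

```python
def group_by_heading(elements, heading_tag="h3", first_key="main"):
    """Group paragraph texts under the most recent heading.

    `elements` is an iterable of (tag, text) pairs in document order.
    """
    sections = {}
    header, paragraphs = first_key, []
    for tag, text in elements:
        if tag == "p":
            paragraphs.append(text.strip())
        elif tag == heading_tag:
            # A new heading closes the section collected so far.
            sections[header] = paragraphs
            header, paragraphs = text.strip(), []
    # Store the final section, which no later heading closes.
    sections[header] = paragraphs
    return sections


# Hypothetical, shortened sample standing in for the parsed page.
sample = [
    ("p", "Intro paragraph."),
    ("h3", "  Albania  "),
    ("p", "Level 3: High Levels of COVID-19 Transmission (CDC)"),
    ("p", "Vaccinated U.S. citizens are allowed to enter Albania."),
    ("h3", "  Anguilla  "),
    ("p", "Level 2: Moderate Levels of COVID-19 Transmission (CDC)"),
]
sections = group_by_heading(sample)
```

Keeping the paragraphs as a list rather than one joined string is a design choice of this sketch: it preserves the paragraph boundaries, so line breaks can be added later in whatever form is needed.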

JSON output of the spider:

[
  {
    "main": "\nThe last to years may not have brought an immediate end to the coronavirus pandemic, but it brought a renewed sense of hope when it comes to traveling. And many Americans are taking advantage of that feeling, looking ahead, and planning their next vacations.\n\nThe options for those who want to add another stamp to their passport have steadily grown since the world was first put on hold last year \u2014 albeit often with more paperwork, testing, and pre-planning required. Now, foreign national air travelers to the United States will be required to be fully vaccinated and to provide proof of vaccination status and a negative test prior to boarding an airplane to the United States. The United States' new international air travel policy, replaces the existing country-by-country restrictions, putting in place a consistent approach worldwide.\n\nThose who fly back to the United States will also be required to show a negative test before boarding a flight home. To provide even greater peace of mind to travelers, many airlines and airports have started offering on-site rapid COVID-19 tests.\n\nBelow is a list of countries currently accepting American travelers along with each destination's travel protocol and their advisory level determined by the State Department. Countries that are accepting American travelers but require visitors to quarantine for two weeks upon arrival are also listed separately.\n",
    "Albania": "\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\nVaccinated U.S. citizens are allowed to enter Albania without showing any test results or being required to quarantine, according to the U.S. Embassy in Albania.\n",
    "Anguilla": "\nLevel 2: Moderate Levels of COVID-19 Transmission (CDC) \n\nTravelers to Anguilla must be vaccinated and submit a negative COVID-19 test within two days of arrival. Guests who have not received their booster must also test upon arrival.\n\nGuests staying for more than eight days may need to test again on day four.\n",
    "Antigua and Barbuda": "\nLevel 3: High levels of COVID-19 Transmission (CDC)\n\nVaccinated travelers no longer need to test prior to travel to Antigua and Barbuda. Unvaccinated travelers must present a negative PCR test within three days or an antigen test within 24 hours.\n",
    "Argentina": "\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\nEligible travelers to Argentina must have received a completed vaccination at least 14 days before entering and will also have to fill out an \"Affidavit of Migration\" and show proof they have insurance that covers COVID-19.\n",
    "Armenia": "\nLevel 1: Low Levels of COVID-19 Transmission (CDC)\n\nAmericans can enter Armenia by air and must either arrive with proof of vaccination, negative PCR COVID-19 test taken within 72 hours before arrival or get tested upon arrival at the airport, which will also require self isolation until there id a negative result. according to the U.S. Embassy in Armenia. Children under 6 are exempt from testing.\n",
    "Australia": "\nLevel 3: High Level of COVID-19 Transmission (CDC)\n\nAustralia is open to fully vaccinated passengers. Upon arrival, all passengers will need to present a Digital Passenger Declaration within 72 hours of departure for Australia.\n",
    "Aruba": "\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\nTravelers to Aruba no longer are required to be vaccinated or present a negative Covid-19 test. Passengers will be required to purchase Aruba Visitor Insurance and complete an embarkation form prior to arrival.\n",
    "Bahamas": "\nLevel 3: High Level of Transmission (CDC)\n\nTravelers to the Bahamas can skip the islands' mandatory quarantine if they test negative for COVID-19 within three days before their departure, along with applying for a Bahamas Health Travel Visa after their test. Unvaccinated travelers must take a molecular test, while vaccinated travelers have the choice between taking a rapid test or a molecular test.\n\nVisitors are then required to opt-in to mandatory COVID-19 health insurance when applying for their Health Travel Visa.\n",
    "Barbados": "\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\nBarbados requires travelers to show proof of a negative COVID-19 PCR test taken within three days of their arrival to enter, or a rapid PCR test within one day of travel, according to the Barbados tourism website. Unvaccinated travelers must quarantine for at least five days before taking a second PCR test. Upon return to the United States outbound travelers will be required to pay $100 USD per test.\n\nTravelers must complete an immigration form and download the BIMSafe app, which public health teams will use to check-in. Travelers must also monitor their temperature for seven days after arrival.\n\nMask wearing is required in public spaces.\n\nBarbados is also welcoming visitors to move to the island for a year for the ultimate remote work experience.\n",
    "Bahrain": "\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\nPassengers must download the BeAware Bahrain app. Bahrain no longer requires testing or proof of vaccination.\n",
    "Belize": "\nLevel 3: High Level of Transmission (CDC)\n\nVaccinated travelers to Belize will no longer need a negative test to enter. Unvaccinated visitors aged 5 and older will need a negative test to enter. All tourists will need proof of country health insurance  and will need to stay in a government approved accommodation.\n\nTesting is available in the airport for $50 cash only.\n",
    "Bermuda": "\nLevel 3: High COVID-19 Transmission (CDC)\n\nBermuda will require all visitors to show proof of current vaccination status and a negative COVID-19 test result (both antigen or PCR tests are allowed) within two days of arriving on the island, according to the Bermuda Tourism Authority. Travelers will need to complete an authorization form with this information 24 \u2013 48 hours prior to arrival. Up-to-date vaccination status is defined as having received a second dose within six months, or three doses of the vaccine.\n\nNo further testing will be required upon arrival. If the country origin requires a negative test to reenter, Bermuda will automatically schedule the test for visitors.\n",
...
...
}]
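
If the goal is plain text with a line break after every heading and paragraph, a dictionary like the one above can be turned into readable text afterwards. The following is a rough sketch, not part of the original answer: `format_sections` is a hypothetical helper, and the sample dictionary is a shortened copy of two entries from the output. It splits each section body on newlines, drops the empty pieces, and separates every heading and paragraph with one blank line.

```python
def format_sections(sections):
    """Return text in which each heading and paragraph is followed by a blank line."""
    blocks = []
    for heading, body in sections.items():
        blocks.append(heading.strip())
        # Split the body on newlines and keep only non-empty paragraphs.
        for line in body.splitlines():
            line = line.strip()
            if line:
                blocks.append(line)
    return "\n\n".join(blocks)


# Shortened sample in the same shape as the spider's output.
sample_sections = {
    "Albania": "\nLevel 3: High Levels of COVID-19 Transmission (CDC)\n\n"
               "Vaccinated U.S. citizens are allowed to enter Albania.\n",
    "Anguilla": "\nLevel 2: Moderate Levels of COVID-19 Transmission (CDC) \n",
}
text = format_sections(sample_sections)
print(text)
```

In a spider, the resulting string could be yielded as a single field or written to a text file, depending on how the article is meant to be read.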
