Nesting parsers in Scrapy

vxbzzdmp posted on 2022-11-23 in Other

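I'm using the Scrapy framework in Python to scrape data from this page. I want to create a single spider that first follows the links to the six galleries, scrapes some data from each gallery page, and then follows the link on each of those pages to Read the Curators' Statement, so that it can scrape the text of the statement from that page. How should the parsers be nested to accomplish this?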

import scrapy

class GalleriesSpider(scrapy.Spider):
    name = "galleries"
    start_urls = ['https://www.exploratorium.edu/visit/galleries']

    def parse(self, response):
        galleries_page_links = response.xpath('//h2[text()="Museum Galleries"]/following-sibling::div//h5/a/@href')
        yield from response.follow_all(galleries_page_links, self.parse_gallery)

    def parse_gallery(self, response):
        def extract(query):
            return response.xpath(query).get(default='').replace(u'\xa0', u' ').strip()

        def extracts(query):
            return [item.replace(u'\xa0', u' ').strip() for item in response.xpath(query).getall()]

        # def parse_curator(response):
        #     def extracts_merge(query):
        #         return ' '.join(extracts(query))
        # 
        #     yield {
        #         'curator-statement': extracts_merge('//div[@id="main-content"]'
        #                                                       '//div[@class="field-items"]//p//text()')
        #     }

        # this_curator_url = extracts('//div[@id="main-content"]//p/a/@href')[-1]
        # this_curator_statement = response.follow(this_curator_url, parse_curator(this_curator_url))

        yield {
            'url': response.url,
            'title': extract('//div[@id="main-content"]//h1/text()'),
            'subtitle': extract('//div[@id="main-content"]//h3/text()'),
            'description': extract('//div[@id="main-content"]//h3/following-sibling::p/text()'),
            'highlights_url': extracts('//div[@class="grid-33 grid-parent pod-body"]//h5/a/@href'),
            'curator-url': extract('//div[@id="main-content"]//p/a/@href'),
        }\
            #.update(this_curator_statement)

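The code above produces a spider that scrapes data from the gallery pages (as expected). However, when I try to add the commented-out code, I get AttributeError: 'str' object has no attribute 'xpath'. I think this is because this_curator_url is not a Scrapy Response object. What is the best way to nest parsers in this situation?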

wsewodh2

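You don't need nested parsers at all. What you need is a separate parse callback method for each kind of page you crawl, chained so that each one is called in turn once the necessary data has been extracted from the previous page. You can pass the information collected so far on to the next callback through the cb_kwargs argument of the Scrapy request, so the item is completed in a single pass and only the final result is yielded.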
For example:

import scrapy

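def extract(query, response):
    # First XPath match as a string, with non-breaking spaces turned into spaces and whitespace trimmed
    return response.xpath(query).get(default='').replace(u'\xa0', u' ').strip()

def extracts(query, response):
    # Same cleanup applied to every XPath match
    return [item.replace(u'\xa0', u' ').strip() for item in response.xpath(query).getall()]

def extracts_merge(query, response):
    # All matches joined into one space-separated string
    return ' '.join(extracts(query, response))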

class GalleriesSpider(scrapy.Spider):
    name = "galleries"
    start_urls = ['https://www.exploratorium.edu/visit/galleries']

    def parse(self, response):
        galleries_page_links = response.xpath('//h2[text()="Museum Galleries"]/following-sibling::div//h5/a/@href')
        yield from response.follow_all(galleries_page_links, self.parse_gallery)

    def parse_gallery(self, response):
        # Collect everything available on the gallery page into a partial item
        kwargs = {
            'url': response.url,
            'title': extract('//div[@id="main-content"]//h1/text()', response),
            'subtitle': extract('//div[@id="main-content"]//h3/text()', response),
            'description': extract('//div[@id="main-content"]//h3/following-sibling::p/text()', response),
            'highlights_url': extracts('//div[@class="grid-33 grid-parent pod-body"]//h5/a/@href', response),
            'curator-url': extract('//div[@id="main-content"]//p/a/@href', response),
        }
        # Follow the curator statement link, carrying the partial item along;
        # Scrapy passes each cb_kwargs entry to parse_curator as a keyword argument
        url = response.urljoin(kwargs['curator-url'])
        yield scrapy.Request(url, self.parse_curator, cb_kwargs=kwargs)

    def parse_curator(self, response, **kwargs):
        # kwargs holds the gallery fields; add the statement and yield the finished item
        kwargs['curator-statement'] = extracts_merge('//div[@id="main-content"]//div[@class="field-items"]//p//text()', response)
        yield kwargs

Output

2022-09-02 18:07:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.exploratorium.edu/visit/outdoor-gallery/curator-statement> (referer: https://www.exploratorium.edu/visit/gallery-5)
2022-09-02 18:07:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.exploratorium.edu/visit/west-gallery/curator-statement>
{'url': 'https://www.exploratorium.edu/visit/gallery-1', 'title': 'Bernard and Barbro Osher Gallery 1: Human Phenomena', 'subtitle': 'Experiment with thoughts, feelings, and social behavior.', 'description': 'Humans think, feel, and interact, and these phenomena are all open to scientific investigation and creative exploration. Here, you and others are the exhibits—so play with social interactions, observe others, and contribute yourreflections.', 'highlights_url': ['/arts/black-box', '/visit/calendar/stories-of-change', '/exhibits/recollections', '/exhibits/pi-has-your-number', '/exhibits/catenary-arch', '/exhibits/survival-game'], 'curator-url': '/visit/west-gallery/curator-statement', 'curator-statement': "The experiences in the Osher Gallery focus on cognition, emotion, social behavior, and the interplay between science, society, art, and culture. We all perceive the world, remember the past, look forward to the future, and communicate with each other—and both scientists and artists investigate how and why we do so. In this gallery, you can explore how your mind works and learn about the scientific study of human behavior through exhibits on emotion, language, memory, and pattern recognition. The space is also home to Science of Sharing , a project funded by the National Science Foundation to develop exhibits that let you experiment with cooperation, competition, and strategies for sharing resources. Here, you're the exhibit; the mechanisms presented here are just tools through which you can play with and reflect on your experiences. The gallery is also a venue for dynamic temporary exhibitions; the first was The Changing Face of What Is Normal , a collection of artifacts and experiences exploring the evolving natureof normality and the lives of those affected by mental illness. In addition, the Black Box offers a state-of-the-art immersive environment for large, media-based exhibitions by visiting artists. 
The gallery also features works by past and present Exploratorium artists-in-residence.  Pamela Winfrey , Curator  Hugh McDonald , Associate Curator"}
2022-09-02 18:07:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.exploratorium.edu/visit/bay-observatory-gallery/curator-statement>
{'url': 'https://www.exploratorium.edu/visit/gallery-6', 'title': 'Fisher Bay Observatory Gallery 6: Observing Landscapes', 'subtitle': 'Uncover the history, geography, and ecology of the Bay Area.', 'description': 'Natural and human forces interact to create the dynamic landscape surrounding us. Learn to uncover the stories embedded in a place by directly observing the geography, history, and ecology of the San Francisco Bay region.', 'highlights_url': ['/environmental-field-station', '/visit/calendar/conversations-about-landscape', '/exhibits/library-of-earth-anatomy', '/exhibits/bay-lexicon', '/exhibits/visualizing-the-bay-area', '/exhibits/timepieces'], 'curator-url': '/visit/bay-observatory-gallery/curator-statement', 'curator-statement': 'This second-floor, indoor/outdoor exhibition space features spectacular views of the Bay and San Francisco’s northern waterfront, as well as its urban, downtown cityscape. The Fisher Bay Observatory Gallery  and Terrace use these views as an entry point for investigations of the history and dynamic processes in the local landscape, and the human impact. The exhibits, artworks, and instruments here probe the environment from multiple perspectives, such as physical and geographic sciences, ecology, astronomy, history, and contemporary experience. A smallbrowsing library of maps and books from the past and present helps visitors explore ideas that shape the Bay Area. The Fisher Bay Observatory Gallery also introduces visitors to the process of observation, and the tools and methods scientists use to gather information about the world around us. Some instruments like cameras and telescopes help us observe the landscape directly, while other exhibits present live or archived data or, visualizations, and eventually video streams, creating a picture of our surroundings that we otherwise might never see. Susan Schwartzenberg , Curator'}
2022-09-02 18:07:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.exploratorium.edu/visit/east-gallery/curator-statement>
{'url': 'https://www.exploratorium.edu/visit/gallery-4', 'title': 'Gordon and Betty Moore Gallery 4: Living Systems', 'subtitle': 'Explore life from DNA and cells to organisms and ecosystems.', 'description': 'Sometimes life is hard to observe, because it’s too tiny or fast or is hidden underground or in the ocean. Discover what you’ve been missing: use scientific tools to investigate living things of different sizes, the ecosystems they inhabit, and the processes they share.', 'highlights_url': ['/cellstoself', '/exhibits/living-systems-explainer-station', '/exhibits/plankton-populations', '/exhibits/tidal-memory', '/exhibits/live-chicken-embryos',  'http://www.exploratorium.edu/imaging_station/'], 'curator-url': '/visit/east-gallery/curator-statement', 'curator-statement': 'Gallery 4 fosters an appreciation of the living world and the many ways to explore it. Using authentic scientific methods and tools, visitors learn about living things at different scales, the processes they share, and their ecosystems. Anchored by the Life Sciences Laboratory, a working laboratory that cultivates organisms for exhibits, the gallery is a dynamic space where contributions by the scientific and artistic communities come together to provide unique and engaging experiences. A Cells and Development section includes the renovated Microscope Imaging Station, which gives visitors a direct look through research-grade microscopes at stem cell biology and other aspects of development. Living Liquid is a new exhibit area that focuses on tiny drifting marine organisms called plankton as well as often-overlooked species in the Bay and the ocean beyond. Life around Us blends new and classic exhibits that examine familiar organisms and reveal their amazing behaviors and unusual features. Located at the eastern end of Pier 15, with a magnificent view of the Bay, the gallery invites an exploratory yet contemplative interaction with the biological world. 
Kristina Yu ,Curator  Jennifer Frazier , Associate Curator'}
2022-09-02 18:07:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.exploratorium.edu/visit/central-gallery/curator-statement>
{'url': 'https://www.exploratorium.edu/visit/gallery-3', 'title': 'Bechtel Gallery 3: Seeing & Reflections', 'subtitle': 'Experiment with light, mirrors, and bubbles.', 'description': 'Our eyes respond to light, but this is just one aspect of how we perceive the world. Playing with light is a great way to learn how it works. And investigating real phenomena can give you a deeper understanding of the scientific process.', 'highlights_url': ['/exhibits/cubatron-core', '/exhibits/giant-mirror', '/exhibits/soap-film-painting', '/exhibits/colored-shadows', '/exhibits/monochromatic-room', '/exhibits/out-quiet-yourself'], 'curator-url': '/visit/central-gallery/curator-statement', 'curator-statement': 'Bechtel Gallery 3 is the heart of the Exploratorium, a place designed to spark and nurture visitors’ curiosity and challenge them to investigate natural phenomena for themselves—with tools and gentle guidance to catalyze their explorations. The gallery features many of our favorite classic exhibits, but it also introduces new exhibits and experimental prototypes that reflect our efforts to share current science as it advances. The primary activity in the gallery is experimentation in the broadest sense. Visitors are encouraged to discover things for themselves through exhibits designed as experiments, with opportunities for experimental variations and controls. Most importantly, visitors have a unique opportunity to learn-by-doing about the scientific process itself, the power of experiment to answer questions, and the roles of knowledge and creativity in discovering connections among diverse phenomena. Immersive and evocative experiences will inspire further explorations. Thomas Humphrey, Curator Richard O. Brown, Associate Curator'}
2022-09-02 18:07:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.exploratorium.edu/visit/south-gallery/curator-statement>
{'url': 'https://www.exploratorium.edu/visit/gallery-2', 'title': 'Gallery 2: Tinkering', 'subtitle': 'Explore your creativity and our curious contraptions.', 'description': 'Making things and developing ideas by hand helps us construct understanding. Slow down, settle in, and make something personally meaningful—from playful contraptions to surprising connections between mechanical systems and natural phenomena.', 'highlights_url': ['/exhibits/tinkerers-clock', 'https://www.exploratorium.edu/tinkering/blog', '/video/art-tinkering-scott-weavers-100000-toothpick-sculpture-san-francisco', '/exhibits/your-turn-counts', '/exhibits/lariat-chain', 'https://www.exploratorium.edu/tinkering/projects/cardboard-automata', 'https://www.exploratorium.edu/tinkering/projects/chain-reaction', 'https://www.exploratorium.edu/tinkering/projects/circuit-boards'], 'curator-url': '/visit/south-gallery/curator-statement', 'curator-statement': 'A tall, fanciful, interactive Tinkerer’s Clock towers over Gallery 2, welcoming you to a public workshop area where you can make, build, or tinker, either alone or with others, as a way of exploring the world and your own creativity. Here, familiar materials are used in unfamiliar ways, and exhibits highlight the beauty—and, sometimes, whimsy—of scientific complexity and discovery. The Tinkering Studio is the heart of this gallery. In this immersive space, visitors use tools and materials to explore the intersection of science, art, and technology. We try experiments for the first time, or play along with other makers and artists. Whether expert of novice, we’re all learning together by making something that is personally meaningful. Adjacent to the gallery is the museum’s exhibit-building workshop, whee most of our exhibits are made. Open to public view, you’ll see our staff working with a variety of materials—woodworking tools, drills, and lathes, for example—and some of our exhibits in various stages of development. The Learning Studio is also in the gallery space. It serves as a research-and-development lab for staff and artists/collaborators. Here, we try things out, make mistakes, get excited, become delighted, and every now and then stumble on to something great that we share with visitors in the Tinkering Studio and in professional development workshops for teachers and museum educators.  Mike Petrich and Karen Wilkinson, Curators '}
2022-09-02 18:07:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.exploratorium.edu/visit/outdoor-gallery/curator-statement>
{'url': 'https://www.exploratorium.edu/visit/gallery-5', 'title': 'Gallery 5: Outdoor Exhibits', 'subtitle': 'Explore winds, tides, and natural phenomena.', 'description': 'Investigate forces shaping the City, Bay, and region. Watch shifting winds and tides, reveal hidden life, shake a bridge, observe human behavior, and find new ways to notice the places we inhabit.', 'highlights_url': ['/exhibits/aeolian-harp', '/exhibits/color-of-water', '/exhibits/bike-rope-squirter', '/exhibit/wind-arrows', '/exhibits/sun-swarm', '/exhibits/research-buoy', '/exhibits/golden-gate-bridge', '/exhibits/disappearing-rings', '/visit/outdoor-gallery/remote-rains'], 'curator-url': '/visit/outdoor-gallery/curator-statement', 'curator-statement': 'The guiding principle of the Gallery 5 is to support and expand the Exploratorium’s role as a community museum dedicated to awareness. Helping to reinvent the civic role of a public museum as a place to gather and exchange ideas, the gallery also exemplifies how direct observations of natural and urban phenomena can blossom into artistic endeavors, scientific investigations, and open-ended inquiries. The gallery features a combination of large- and small-scale exhibits, rotating art installations, and public programs  (including vendors, performance artists, and public exhibitions). Our defining location on the urban edge of the city and the Bay enhances visitors’ ability to perceive their surroundings with heightened precision and clarity that leads to deepened insight and understanding. The gallery team is also extending its efforts beyond the boundaries of the Exploratorium campus, developing community-based partnerships that stretch throughout San Francisco and the Bay Area to create interactive outposts that both engage and delight. Shawn Lani , Curator Eric Dimond, Associate Curator'}
