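I'm trying to extract values from a web page, but I get an AttributeError and I don't understand why. Looking at the code, I can't find anything that would cause it. The first value, plan, is extracted fine; the problem is with the second value, price. What am I doing wrong? This is the error: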
price = plan.xpath('normalize-space(.//h4)').get()
AttributeError: 'str' object has no attribute 'xpath'
Here is my full code:
import requests
from scrapy import Selector
headers = {
'authority': 'www.spectrum.com',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-language': 'en-PK,en;q=0.9,ur-PK;q=0.8,ur;q=0.7,en-GB;q=0.6,en-US;q=0.5,sv;q=0.4,it;q=0.3',
'cache-control': 'max-age=0',
'cookie': "akaas_AB-Testing=2147483647~rv=43~id=c941a85b15ef281f13281b8048cf4ae5; bm_sz=2632BBEE923E9AD7E1EA69FF928F5DBB~YAAQjr3XF/bDqsiBAQAACQuP0xCDh7v649Sa+PSXCzISjJqKSYw3+PEzlGEQ08DzO/hwCZjGgWULy/XTuertMLmrj9NhRZGG3f4suAnQfrfqZMrXfLmCdP7IvopupSVOQ2qe1ZStkhlKIbXRwa42EKQDoBeZ5kbnOv9YmFeNKCmuSAZARPgAhHrVHy/R0/2eELZ/6yNHbaBBkuqrZOpbgcSPALVHVmebbaU6TcUdtN9wWkvC04SsZP1cByGcljFlokraEKx73zOWRmTXFAf40kHHYngR+diPWYNgpn+25gtQv81EjA==~4277812~4339012; akacd_RWASP-default-phased-release=3834564575~rv=100~id=46a77af27ddb8f9add7bb7387dd9bda5; PIM-SESSION-ID=JcAqb8mZ1FcAPeQJ; domain=%22spectrum.com%22; omnitureId=%22b9eff532-7818-4db6-b867-ef1f30f6ff13%22; akamaiHeader=%7B%22georegion%22%3A%22167%22%2C%22country_code%22%3A%22PK%22%2C%22city%22%3A%22ISLAMABAD%22%2C%22lat%22%3A%2233.70%22%2C%22long%22%3A%2273.17%22%2C%22timezone%22%3A%22GMT%2B5%22%2C%22continent%22%3A%22AS%22%2C%22asnum%22%3A%2217557%22%2C%22throughput%22%3A%22vhigh%22%2C%22bw%22%3A%225000%22%2C%22client_ip%22%3A%2239.40.46.103%22%2C%22device_os%22%3A%22Windows%20NT%22%2C%22brand_name%22%3A%22Chrome%22%2C%22is_wireless%22%3A%22false%22%2C%22internal_corp_traffic%22%3A%22false%22%2C%22zip%22%3A%22%22%7D; spectrum-residential-user-profile=%7B%22zipcode%22%3A%22%22%2C%22city%22%3A%22%22%2C%22state%22%3A%22%22%2C%22serviceVendorName%22%3A%22%22%2C%22isSPP1%22%3A%22%22%2C%22isSPP2%22%3A%22%22%2C%22isSPP3%22%3A%22%22%2C%22isSPP4%22%3A%22%22%2C%22isSPP5%22%3A%22%22%2C%22isSPP6%22%3A%22%22%2C%22isSPP7%22%3A%22%22%2C%22isSPP8%22%3A%22%22%2C%22isNPP%22%3A%22%22%2C%22isTwcD3%22%3A%22%22%2C%22isTwcSTDA%22%3A%22%22%2C%22isTwcSTD%22%3A%22%22%2C%22isTwcSELA%22%3A%22%22%2C%22isTwcSEL%22%3A%22%22%2C%22isCharterD3%22%3A%22%22%2C%22isCharterD3NCS%22%3A%22%22%2C%22isCharterSTDS%22%3A%22%22%2C%22isCharterD3STL%22%3A%22%22%2C%22isCharterSELS%22%3A%22%22%2C%22isBhnSTD%22%3A%22%22%2C%22isBhnSEL%22%3A%22%22%2C%22isBackToSchool%22%3A%22%22%2C%22isBHNMultipleMSO%22%3A%22%22%2C%22isCharterMultipleMSO%22%3A%22%22%2C%22isTWCMultipleMSO%22%3A%22%22%2C%22isServiceableHawaii%22%3A%22%22%2C%22isNYCOutOfFootprint%22%3A%22%22%2C%22isResi30%22%3A%22%22%2C%22isResi60%22%3A%22%22%2C%22isResi100%22%3A%22%22%2C%22isResi200%22%3A%22%22%2C%22isResi400%22%3A%22%22%2C%22isResi940%22%3A%22%22%2C%22isNewWaveSwitch%22%3A%22%22%2C%22isCDELightbandSwitch%22%3A%22%22%2C%22isMicrologicSwitch%22%3A%22%22%2C%22isLocalTelSwitch%22%3A%22%22%2C%22isSpectrumInternetAssist%22%3A%22%22%2C%22isMINet%22%3A%22%22%2C%22isMontanaOpticom%22%3A%22%22%2C%22isSilverStarCommunications%22%3A%22%22%2C%22isTCTWest%22%3A%22%22%2C%22isTSC%22%3A%22%22%2C%22isCPWS%22%3A%22%22%2C%22isCitiLinks%22%3A%22%22%2C%22isATMC%22%3A%22%22%2C%22isVast%22%3A%22%22%2C%22isHorizon%22%3A%22%22%2C%22isClevelandOHZipcode%22%3A%22%22%2C%22isColumbusOHZipcode%22%3A%22%22%2C%22isEvansvilleINZipcode%22%3A%22%22%7D; SERVERID=pub19ncw_aem65; ak_bmsc=7CFCF89508757C6F483E18BBF380D8F1~000000000000000000000000000000~YAAQjr3XF9XEqsiBAQAAdRuP0xC1bQwnbConyIesU3jSyxFV3s4s0K86rrYUyh3i/scryjxBAa5tSYNpQd7MZiVfNUsXmFooX+ASzzQ9qsJNng1o+6iPzfYLJoWaN6wt/1CaWpqaJP0DhvHHnX0MfKYITq8DgsHVIwd1EKwaCB+g67hQlS8kB1N3yIXyauX2Gpni1NgGsyRzMoALxQ4VdZEJ6WYV97Wg1vtPz/Lwh+9pb/4HrP7yGwwMBIANj+pPuazwkyQ2TI7DstOeEVdmZkMcW4YI2l8agJm6gS4F9k6kipH8TGAUDwSWbyuJvxpWoqBXLUhSdGxx+38Di0hIvV4FNyzb5/+BR4rrl2MBDC7yU1BHk1JW+/sTntHKSWxcABHImErdMA1C+WDmYFNKx4kWj9nDNomgN+vdh0FoT5G3B2TSyahQsqfKDaxv39FO4hZ+DCFtF/XiA8aKkPrxtaGLagk+bPnzWm/TSAgWx0TiaBIhZQ1NVUsev5U=; bm_sv=045DB25F286F7DA3315AE8AE1E1C7E71~YAAQjr3XF//EqsiBAQAAgh2P0xDZIIr9HHEmWcpdNzKev8fZcqKHcP8rdJpPncyVd1SK6nw+n4lHEEe02QIQ/ksIcZ5ICEIgijODVMciTV6oKVE8qcGMRqiUKHFGb7aT3lmGH15wVUS1DTF75Axpil33ILOnZ9y3UskYq7Ii+TzXr2S8u8pKKFuzonfdXjgo0/omvKHVQTj/+zGBLoRxdWwdpVwQ7MhwJQpo6XxKyeHGsgDAD9sOktUKAdhh8rRURLw=~1; _abck=9B09AE84C627BD6C56ED5324C2377BFC~0~YAAQjr3XFxHFqsiBAQAAQx6P0wgnXELg9/IFkwYW75PtoKE9E6Jc32OQ2EjSzgj2xQBs6VFVawJUKc3KWHVLU0zfebAO8EU4QvFCRduo3iBXp/wX3XeB3IHiwhFU+XQCu6vgWvZkXXcN02TIRleJV7BrEFYB8oTAKVGYNvfSq4gtXd+EfUwCXeML71VlUpqg+ux6tv9DzUxMIzjR6phg3sJkwvJdRSgQC8sHBxMGqO/bceL+pQvP3ocsAvcXBUHqpffNXN29NqMRgZXl1LnDYENX7/pM38sCYbgCk1wM1SX6CPR2RBxS1zTx5DQH7j9rdxR79eGPTeVT8thZ+LES0hbMDnbrCM25xifVxEhmVGUAE8GMZjliHpL5y3fgS5qrt+32P5nhOxOoeO4nAC0O61SBQbnsR77Asno=~-1~||-1||~-1; _cls_v=fd065e19-d04d-4037-95f1-2915722ce6fa; _cls_s=c2d97766-d6a9-400e-9256-cc7f306d74ff:0; _fbp=fb.1.1657111781756.1149817254; _uetsid=1a9139f0fd2a11ecac7a8b3cd8b5e640; _uetvid=1a917710fd2a11eca2847d6bd10b6b3f; _gcl_au=1.1.730465435.1657111782; com.silverpop.iMAWebCookie=ea054721-7458-28b9-e149-b14adde1da9d; com.silverpop.iMA.session=ade0dd24-3755-e2c5-5a79-b3a6053a5ce5; com.silverpop.iMA.page_visit=1519396976:; sc.ASP.NET_SESSIONID=vx0d41gedc4quweaf0dopvwd; _tq_id.18365409-1.91a5=a49be617f522fe17.1657111783.0.1657111783..; sc.UserId=bbe2768c-41cc-45d2-8e4f-d5f20bf1d6e0; _clck=hx5wof|1|f2x|0; btIdentify=13899909-7620-426b-df11-185cd019a2f0; _bts=94b9da89-0af8-4fb8-c8cf-4cfb1d262628; _clsk=14hzxd1|1657111783980|1|0|l.clarity.ms/collect; AMCVS_97C902BE53295FC80A490D4C%40AdobeOrg=1; AMCV_97C902BE53295FC80A490D4C%40AdobeOrg=-330454231%7CMCMID%7C22421919196769338640237431338961795837%7CMCAAMLH-1657716584%7C3%7CMCAAMB-1657716584%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1657118984s%7CNONE%7CvVersion%7C3.1.2; s_cc=true; _bti=%7B%22app_id%22%3A%22charter%22%2C%22bsin%22%3A%22pEXKvtttqTjjtgQef5zYKUDXao8SkpfqlrDanE%2F2zNR9slbf6%2FhE2Gyr4IjNsJC9w5PwIyuprzJtPvIM%2F2KPhw%3D%3D%22%2C%22is_identified%22%3Afalse%7D; aam_uuid=12467149554957211440944457970446103725; akavpau_Global=1657112106~id=e008851a51085afc5fa869b5b4bcbf42; s_sess=%20s_ppv%3D23%3B%20s_prop20%3Dbrowse%3B%20search_prop17%3DNo%2520Information%2520Avaliable%3B; s_pers=%20s_vnum%3D1688647784115%2526vn%253D1%7C1688647784115%3B%20s_previousPage%3Dcom%253Ainternet%7C1657113615499%3B%20s_nr%3D1657111815507-New%7C1659703815507%3B%20s_invisit%3Dtrue%7C1657113615510%3B%20s_dayslastvisit%3D1657111815512%7C1751719815512%3B%20s_dayslastvisit_s%3DFirst%2520Visit%7C1657113615512%3B; utag_main=v_id:0181d38f1b81000e58ce096006f10506f002106700bd0{_sn:1$_ss:1$_pn:1%3Bexp-session$_st:1657113615521$ses_id:1657111780225%3Bexp-session$_ga:3936920691.1657111781$vapi_domain:spectrum.com$dcsyncran:1%3Bexp-session$dc_visit:1$dc_event:1%3Bexp-session$dc_region:eu-central-1%3Bexp-session$aam_load:true%3Bexp-session;} RT=\"z=1&dm=www.spectrum.com&si=b2dc65b1-5a7d-4a10-896d-0e0a7e16f4ef&ss=l59lkcgo&sl=1&tt=8t5&bcn=%2F%2F684d0d44.akstat.io%2F&ld=8t8&ul=z6k\"",
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
r = requests.get('https://www.spectrum.com/internet', headers=headers)
response = Selector(text=r.text)
plans = response.xpath('(//div[@class="cardsContainer__body"])[1]/div')
for plan in plans:
    plan = plan.xpath('normalize-space(.//div[@class="iconcard__subheader"])').get()
    price = plan.xpath('normalize-space(.//h4)').get()
    data = {
        "Plan": plan,
        "Price": price
    }
    print(data)
Answer 1
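See https://docs.scrapy.org/en/latest/topics/selectors.html. The .xpath(...) method returns a SelectorList of Selector objects, and you can call .xpath again on those results. Calling .get(), however, returns a plain string value. So
plan = plan.xpath('normalize-space(.//div[@class="iconcard__subheader"])').get()
sets plan to a string, and on the next line you can no longer call .xpath on it. Try this: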
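Update
Based on what the OP posted in their own answer, it sounds like their original line
price = plan.xpath('normalize-space(.//h4)').get()
was meant to use the original plan Selector, not the string they replaced it with on the line above via plan = plan.xpath(...). So the code should be: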
Answer 2
Very strange behavior. I moved the price line above the line that reassigns plan, and it worked.