I'm a beginner at web scraping. I'm trying to get the latitude and longitude from this website:
https://urbania.pe/inmueble/proyecto/ememhvin-proyecto-mariscal-castilla-lima-santiago-de-surco-tale-inmobiliaria-65659522
The part of the page that contains this data is:
<script type="text/javascript">
const POSTING = {{[...]
"locationId":"V1-B-4368","name":"Lima","label":"PROVINCIA","depth":1,"parent":{"locationId":"V1-A-111","name":"Peru urbania","label":"PAIS","depth":0,"parent":null,"acronym":null},"acronym":null},"acronym":null},"acronym":null},"postingGeolocation":{"geolocation":{"latitude":-12.133920500000000,"longitude":-77.014942900000000},
[...]
</script>
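In this snippet, `postingGeolocation.geolocation` is a small JSON object. Instead of matching each number on its own, you can capture that object and parse it with `json.loads`. The sketch below assumes the page keeps the `"geolocation":{...}` shape shown above. The `script_text` sample is a hypothetical, trimmed copy of that script, not the real page source.

```python
import json
import re

# Hypothetical, trimmed copy of the <script> contents shown above.
script_text = (
    'const POSTING = {"postingGeolocation":{"geolocation":'
    '{"latitude":-12.133920500000000,"longitude":-77.014942900000000},"other":1}};'
)

# Capture the brace-delimited object (no nested braces) that follows "geolocation":
match = re.search(r'"geolocation"\s*:\s*(\{[^{}]*\})', script_text)

# json.loads turns the captured text into a dict with float values.
geolocation = json.loads(match.group(1)) if match else None
print(geolocation)
```

The `[^{}]*` part only works because the geolocation object has no nested braces. If the object ever gains nested fields, this pattern stops matching and `geolocation` is `None`.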
I tried the following, but it didn't work:
import requests
import pandas as pd
import re
from bs4 import BeautifulSoup
import time
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
import urllib.parse
sa_key = 'ea69223fa47f72fac0907759' # API token obtained from the ScrapingAnt website
sa_api = 'https://api.scrapingant.com/v2/general'
page='https://urbania.pe/inmueble/proyecto/ememhvin-proyecto-mariscal-castilla-lima-santiago-de-surco-tale-inmobiliaria-65659522'
qParams = {'url':page , 'x-api-key': sa_key} # NOTE: be careful here with /proyecto/ vs. /clasificado/ URLs; this structure is for the first kind
reqUrl = f'{sa_api}?{urllib.parse.urlencode(qParams)}'
r = requests.get(reqUrl)
soup = BeautifulSoup(r.content, 'html.parser')
list_geolocalization=[]
# trying to get latitude and longitude
geolocalization=soup.find_all('script',{'type': 'text/javascript'})
for tag in geolocalization:
list_geolocalization.append(tag.find('latitude'))
df_geolocalization=pd.DataFrame(list_geolocalization,columns = ["geolocalization"])
# another attempt
lat, long=re.findall(r'(?is)("latitude":|"longitude":)([0-9.]+)',geolocalization)
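Several details in the attempt above explain why it returns nothing:

- `tag.find('latitude')` searches for a child element named `<latitude>`, not for the text `latitude`, so it returns `None`.
- `re.findall` needs a string, but `geolocalization` is a list of tags.
- The pattern `[0-9.]+` cannot match the leading minus sign of the coordinates.
- `re.findall` with two groups returns tuples, so unpacking its result straight into `lat, long` does not yield numbers.

The check below runs both patterns on the fragment quoted above. It is a small illustration, not a full scraper.

```python
import re

# Fragment copied from the page source shown in the question.
text = '"geolocation":{"latitude":-12.133920500000000,"longitude":-77.014942900000000}'

# Pattern from the question: [0-9.]+ cannot start at the "-" sign, so nothing matches.
original = re.findall(r'(?is)("latitude":|"longitude":)([0-9.]+)', text)

# Same pattern with an optional minus sign.
fixed = re.findall(r'(?is)("latitude":|"longitude":)(-?[0-9.]+)', text)

print(original)  # []
print(fixed)

# findall returns (key, value) tuples; keep only the values and convert them.
lat, lon = (float(value) for _, value in fixed)
print(lat, lon)
```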
Can anyone help me? Thanks in advance!
Answer:
In this case, you can use a regular expression, as follows:
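Here is a minimal sketch of the regex approach. It assumes the page source is already available as a string, for example `r.text` from the question's ScrapingAnt request, and that it still contains the `"latitude":...,"longitude":...` fragment shown in the question. The function name `extract_coordinates` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "latitude":<number> and "longitude":<number>. The optional "-" is
# needed because both coordinates in the sample are negative.
COORD_RE = re.compile(r'"(latitude|longitude)"\s*:\s*(-?\d+(?:\.\d+)?)')


def extract_coordinates(html: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from the first values found, or None."""
    found = {}
    for key, value in COORD_RE.findall(html):
        found.setdefault(key, float(value))  # keep the first occurrence of each key
    if "latitude" in found and "longitude" in found:
        return found["latitude"], found["longitude"]
    return None


# Hypothetical sample: the fragment quoted in the question.
sample = ('"postingGeolocation":{"geolocation":'
          '{"latitude":-12.133920500000000,"longitude":-77.014942900000000},')
lat, lon = extract_coordinates(sample)
print(lat, lon)
```

With the question's code, `extract_coordinates(r.text)` would replace the BeautifulSoup loop. When a pair is found, `pd.DataFrame([coords], columns=["latitude", "longitude"])` turns it into a one-row table.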
Output: