regex: Looking for a string inside a web page without saving it to a file?

Posted by laximzn5 on 2023-04-07 in Other


I'm new to Python and have a few questions!

import os
import re
import urllib2  # Python 2

def extractdownloadurl(url):
    uresponse = urllib2.urlopen(url)  # open the URL
    contents = uresponse.readlines()  # read all lines of the page
    fo = open("test.html", "w")  # open test.html for writing
    for line in contents:
        fo.write(line)  # write each line of the page to the file
    fo.close()  # close the file

    # keep only the lines that contain both "uploads" and "zip"
    cadena = os.system('more test.html | grep uploads | grep zip >> cadena.html')

    f = open("cadena.html", "r")
    text = f.read()
    f.close()

    match = re.search(r'href=[\'"]?([^\'" >]+)', text)
    if match:
        cadena = match.group(0)

    texto = cadena[6:]  # drop the leading 'href="'

    os.system('rm test.html')
    os.system('rm cadena.html')
    return texto

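This is my function. It downloads a web page and picks out a URL that meets certain conditions. It works, but I would like something more efficient than saving the page to a file. I want to do something like grep without writing and reading files (which is really slow), and I want a faster way to copy the URL into a string.
Please write code that looks for the URL in the page contents without saving the contents to a file.
I know that is a lot of questions, but I would be very grateful if you could answer all of them.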

regex

Source: https://stackoverflow.com/questions/7895163/look-for-an-string-inside-a-webpage-without-saving-it-in-a-file

2 Answers


9bfwbjaz1#

This script prints all the links on a web page using a regular expression:

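# Python 2 (urllib.urlopen and the print statement do not exist in Python 3)
import re, urllib
page = urllib.urlopen("http://sebsauvage.net/index.html").read()  # page source as one string
urls = re.findall('href=[\'"]?([^\'" >]+)', page)  # capture every href target
for url in urls:
    print url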
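
For reference, here is a small offline sketch of what that pattern captures. The sample HTML string is made up for illustration, and the snippet is written for Python 3, unlike the script above.

```python
import re

# Same pattern as in the answer: an optional quote after href=, then capture
# everything up to the next quote, space, or '>'.
HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')

# Hypothetical sample markup, not taken from any real page.
sample = (
    '<a href="http://example.com/uploads/a.zip">double quotes</a>'
    "<a href='/relative/page.html'>single quotes</a>"
    '<a href=unquoted.html>no quotes</a>'
)

links = HREF_RE.findall(sample)
print(links)
# ['http://example.com/uploads/a.zip', '/relative/page.html', 'unquoted.html']
```

Because the pattern has one capture group, `findall` returns only the URL part, so no slicing is needed to strip the `href="` prefix.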

2023-04-07

dojqjjoe2#

Lycha's answer, updated for Python 3:

import re, urllib.request
page = urllib.request.urlopen("http://sebsauvage.net/index.html").read().decode('utf-8')
urls = re.findall('href=[\'"]?([^\'" >]+)', page)
for url in urls:
    print(url)
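
To reproduce the `grep uploads | grep zip` step from the question without temporary files, the filtering can probably be done on the in-memory string. A minimal sketch, assuming Python 3 and a page that decodes as UTF-8; the helper names `find_download_url` and `extract_download_url` are made up for illustration:

```python
import re
import urllib.request

HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')


def find_download_url(html):
    """Return the first href on a line containing both 'uploads' and 'zip', else None."""
    for line in html.splitlines():
        # In-memory equivalent of: grep uploads | grep zip
        if "uploads" in line and "zip" in line:
            match = HREF_RE.search(line)
            if match:
                return match.group(1)  # group(1) is the URL without the href=" prefix
    return None


def extract_download_url(url):
    """Download the page at `url` and search it without touching the disk."""
    with urllib.request.urlopen(url) as response:
        # Assumes UTF-8; undecodable bytes are replaced instead of raising.
        html = response.read().decode("utf-8", errors="replace")
    return find_download_url(html)
```

Calling `extract_download_url(page_url)` should return the URL as a string, or `None` when no line matches. Using `match.group(1)` takes the place of the `cadena[6:]` slicing in the question.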

2023-04-07
