Python: how to merge strings | regular expressions

7lrncoxx  posted on 2022-10-30  in Python
import re

def tst():
  text = '''
  <script>
  '''
  if proxi := re.findall(r"(?:<td\s[^>]*?><font\sclass\=spy14>(.*?)<script.*?\"\+(.*?)\)<\/script)", text):
    for proxy, port in proxi:
      yield f"{proxy}:{''.join(port)}"

    if dtt := re.findall(r"<td colspan=1><font class\=spy1><font class\=spy14>(.*?)</font> (\d+[:]\d+) <font class\=spy5>([(]\d+ \w+ \w+[)])", text):
      for date, time, taken in dtt:
        yield f"{date} {' '.join([time, taken])}"

    return None
  return None

for proxy in tst():
  print(proxy)

The output I get:

51.155.10.0:8000
178.128.96.80:7497
98.162.96.41:4145
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)

So I use the following regular expression to capture groups from that output:

(\w+[.]\w+[.]\w+[.]\w+[:]\w+)|(\w+.*)

I want a result like the one below. How can I merge the output into this form?

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

lyr7nygr1#

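Assuming the regular expressions in the code at the top of your (edited) question work correctly and both produce the same number of matches, you can use zip: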

import re

def tst():
    text = '''
    <script>
    '''
    proxi = re.findall(r"(?:<td\s[^>]*?><font\sclass\=spy14>(.*?)<script.*?\"\+(.*?)\)<\/script)", text)
    dtt = re.findall(r"<td colspan=1><font class\=spy1><font class\=spy14>(.*?)</font> (\d+[:]\d+) <font class\=spy5>([(]\d+ \w+ \w+[)])", text)
    if proxi and dtt:
        for (proxy, port), (date, time, taken) in zip(proxi, dtt):
            # join each proxy with its timestamp using the " - " separator the question asks for
            yield f"{proxy}:{port} - {date} {time} {taken}"

for proxy in tst():
    print(proxy)
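
The zip approach above silently drops the extra items when the two regexes return different numbers of matches. Here is a minimal sketch of one way to catch that case. The helper name `merge_pairs` and the sample lists are hypothetical, not part of the original code:

```python
def merge_pairs(proxies, timestamps):
    """Pair the i-th proxy with the i-th timestamp (hypothetical helper)."""
    # zip() stops at the shorter input, so check the counts explicitly first.
    if len(proxies) != len(timestamps):
        raise ValueError(
            f"got {len(proxies)} proxies but {len(timestamps)} timestamps"
        )
    return [f"{proxy} - {stamp}" for proxy, stamp in zip(proxies, timestamps)]


# Sample data in the shape produced by the question's generator.
proxies = ["157.245.247.84:7497", "184.190.137.213:8111"]
timestamps = ["27-oct-2022 11:05 (49 mins ago)", "27-oct-2022 11:04 (50 mins ago)"]

for line in merge_pairs(proxies, timestamps):
    print(line)
```

Whether a mismatch should raise an error or just be truncated depends on how much you trust the scraped page. Raising makes a changed page layout visible early.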

irtuqstp2#

This approach reads all the lines into a list, then walks the IP lines and the date lines in step to build the output.

text = '''157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)'''
lines = text.split('\n')
output = []
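half = len(lines) // 2  # integer division: range() and list indices need an int in Python 3
for i in range(half):
    # pair the i-th IP line with the i-th date line in the second half
    val = lines[i] + ' - ' + lines[i + half]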
    output.append(val)

print('\n'.join(output))

This prints:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

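Note that this answer assumes every IP line has exactly one matching date line. It also assumes the lines are ordered and that all IP lines come before the date lines.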
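
If the second assumption does not hold, one option is to sort lines by what they look like rather than by where they are. The sketch below is only an illustration: `PROXY_RE` and `merge_by_kind` are hypothetical names, and it assumes a proxy line is a dotted IPv4 address followed by a colon and a port, with every other non-empty line treated as a date line. It still assumes the i-th proxy belongs with the i-th date.

```python
import re

# Assumed shape of a proxy line: dotted IPv4 address, colon, port number.
PROXY_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}:\d+")


def merge_by_kind(text):
    """Split lines into proxies and dates by pattern, then pair them in order."""
    proxies, stamps = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if PROXY_RE.fullmatch(line):
            proxies.append(line)
        else:
            stamps.append(line)
    return [f"{proxy} - {stamp}" for proxy, stamp in zip(proxies, stamps)]


# Interleaved input: the position-based approach would pair these incorrectly.
text = """157.245.247.84:7497
27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111
27-oct-2022 11:04 (50 mins ago)"""

for merged in merge_by_kind(text):
    print(merged)
```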


gstyhher3#

If the text is guaranteed to contain N lines of IP addresses followed by N lines of "timestamps", you can do this:

text = '''157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)'''

lines = text.splitlines()

for ip, t in zip(lines, lines[len(lines)//2:]):
    print(f'{ip} - {t}')

Output:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

dl5txlt94#

Using regular expressions:

import re

text = '''
157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)
'''
ip_regex = r"(?:\d{1,3}\.){3}\d{1,3}\:\d{4}"
time_regex = r'\d{2}\-\w+\-\d{4}\s\d{2}\:\d{2}\s\(.+\)'

ip_list = re.findall(ip_regex, text)
time_list = re.findall(time_regex, text)

for i in range(len(ip_list)):
    print(f'{ip_list[i]} - {time_list[i]}')

Output:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)
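
The port pattern above matches exactly four digits, so a proxy on a port such as 80 or 10808 would be skipped or cut short. Here is a hedged variant that accepts one to five port digits and pairs the two lists with zip instead of indexing. The names `IP_PORT`, `STAMP` and `merge` are hypothetical, and the second sample address is made up to show a short port:

```python
import re

# Same idea as the patterns above, but the port may have 1 to 5 digits.
IP_PORT = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}")
# [^)]* stops at the first closing parenthesis, so a match cannot
# run on into a second timestamp on the same line.
STAMP = re.compile(r"\d{2}-\w+-\d{4}\s\d{2}:\d{2}\s\([^)]*\)")


def merge(text):
    """Pair each ip:port match with the timestamp match at the same index."""
    return [
        f"{ip} - {stamp}"
        for ip, stamp in zip(IP_PORT.findall(text), STAMP.findall(text))
    ]


sample = """157.245.247.84:7497
51.155.10.0:80
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)"""

for line in merge(sample):
    print(line)
```

Unlike the index loop, zip does not raise an error when one list is shorter. It simply stops at the shorter one.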
