I'm new to Python and I want to run reverse DNS lookups on the IPs in a CSV file. The CSV has an 'ip' column, with addresses formatted as follows:
[image: example of source input data]
My Python script works for rows with a single IP, but I'm not sure how to read rows that contain multiple IP addresses.
#pip install pandas
#pip install dnspython
#pip install xlrd, pip install xlsxwriter
### Import needed libraries ###
import pandas as pd
import time
from pandas.io.excel import ExcelWriter
from dns import resolver,reversename
### Time variable ###
startTime = time.time()
#Custom Delimiter
custom_delimiter = ','
logs = pd.read_csv('C:\\Users\\user\\Desktop\\rdnslookup\\input_file\\logs.csv', sep=custom_delimiter, quotechar=',', engine='python', converters={'ip': lambda x:x.strip('["]')}, on_bad_lines = "skip")
#Diagnostic print statement
print(logs.head())
# Create new Dataframe that removed duplicate IP addresses
logs_filtered = logs.drop_duplicates(['ip']).copy()
### Perform DNS lookup on deduplicated IPs ###
def reverseDns(ip):
    try:
        return str(resolver.query(reversename.from_address(ip), 'PTR')[0])
    except:
        return 'N/A'
### Create DNS column with the reverse IP DNS result ###
logs_filtered['dns'] = logs_filtered['ip'].apply(reverseDns)
### Merge DNS column to full logs matching IP ###
logs_filtered = logs.merge(logs_filtered[['ip','dns']], how='left', on=['ip'])
### Output IP addresses to CSV with DNS lookups ###
writer = ExcelWriter('C:\\Users\\user\\Desktop\\rdnslookup\\output_file\\validated_logs.xlsx', engine='xlsxwriter',)
logs_filtered.to_excel(writer,'Sheet1', index=False)
writer.close()
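I tried changing the quotechar to "", but that did not strip the quotes from the rows with multiple addresses. Any help is appreciated. Thanks.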
1 Answer
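You are suffering from "when all you have is a hammer, the whole world starts to look like a nail" syndrome. Pandas is nice, but it is not efficient to spend your time squeezing the problem into a shape that fits Pandas. Your problem is easier to solve by reading the data as what it really is: a list of JSON records.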
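As a rough sketch of what "reading it as JSON" could look like: the sample input is not shown in the question, so the layout below is an assumption based on the `strip('["]')` converter, namely that an `ip` cell holds either a bare address or a JSON-style list such as `["1.1.1.1","8.8.4.4"]`. The `sample` text and the helper name `parse_ip_cell` are made up for illustration.

```python
import csv
import io
import json


def parse_ip_cell(cell):
    """Return the addresses found in one 'ip' cell as a list of strings."""
    cell = cell.strip()
    if cell.startswith("["):
        # Assumed multi-address form: a JSON list like ["1.1.1.1","8.8.4.4"]
        return [str(ip).strip() for ip in json.loads(cell)]
    # Assumed single-address form: the bare address, or an empty cell
    return [cell] if cell else []


# Hypothetical stand-in for logs.csv; the real file layout may differ.
sample = 'id,ip\n1,8.8.8.8\n2,"[""1.1.1.1"",""8.8.4.4""]"\n'

rows = list(csv.DictReader(io.StringIO(sample)))
ips_per_row = [parse_ip_cell(row["ip"]) for row in rows]
```

The standard `csv` module handles the quoted cell, and `json.loads` turns the list text into a real Python list, so there is no need to strip brackets and quotes by hand.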
I don't have the dns module installed, so I faked it.
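A minimal sketch of the faked lookup, assuming the rows have already been turned into lists of addresses. `fake_reverse_dns` and `lookup_all` are hypothetical names, and the hostnames the fake returns are invented, not real PTR records. Each distinct address is resolved once and the result is reused for every row that contains it, which mirrors the deduplication step in the question.

```python
def fake_reverse_dns(ip):
    """Stand-in for a real PTR lookup; returns a made-up hostname."""
    return "host-" + ip.replace(".", "-") + ".example.invalid"


def lookup_all(ips_per_row, lookup=fake_reverse_dns):
    """Resolve every address in every row, calling `lookup` once per distinct IP."""
    cache = {}
    results = []
    for ips in ips_per_row:
        names = []
        for ip in ips:
            if ip not in cache:
                try:
                    cache[ip] = lookup(ip)
                except Exception:
                    # Same fallback value the question's script uses
                    cache[ip] = "N/A"
            names.append(cache[ip])
        results.append(names)
    return results


# Hypothetical input: one row with a single IP, one row with two
rows = [["8.8.8.8"], ["1.1.1.1", "8.8.8.8"]]
resolved = lookup_all(rows)
```

To do real lookups, pass your own function as `lookup`, for example one that wraps the `resolver` and `reversename.from_address` calls from the question. In dnspython 2.x, `resolver.resolve` is the current name for what the question calls `resolver.query`.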