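I have ATM log data containing the transaction log details for each customer. I am trying to extract customer data from the log file, but I am having trouble extracting the Trx_datetime field from the text file.
My sample data: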
[01012020 101221 168][1][INFO]> -Cash Withdraw Initiated -------------
[01012020 101221 177][1][INFO]> -----Amount : 2500
[01012020 101221 187][21][INFO]> ----AUX NO : :xx:A00000XXX0200101101DD1-02
[01012020 101221 196][21][INFO]> ----AUX NO : :xx:A00000XXX200101101DD1-03
[01012020 101221 205][21][INFO]> ----AUX NO : :xx:A00000942020010XXXX221-04
[01012020 101222 487][1][INFO]> ---- Image Capture (TRX_RESPONSE_WITHDRAW)
[01012020 101222 560][1][INFO]> -----Withdraw Status : OK
[01012020 101222 567][1][INFO]> -----Account : 60700XXXXXXXX
[01012020 101222 574][1][INFO]> -----Action Code :na
[01012020 101222 580][1][INFO]> -----Response : 000
[01012020 101222 587][1][INFO]> -----Trace ID : 000000
[01012020 101222 595][1][INFO]> -----EOD ID :
[01012020 101222 602][1][INFO]> -----BATCH ID :
[01012020 101222 609][1][INFO]> -----TRX NO :
[01012020 101222 615][1][INFO]> ---Cash Withdraw Initiated Completed
[01012020 101222 757][1][INFO]> ---Send Online Data
[01012020 101222 763][1][INFO]> -----ARC : 3030
[01012020 101222 770][1][INFO]> -----Trx DateTime : 11/1/2020
[01012020 101222 777][1][INFO]> -----Online Status : Online_Perfoamed
[01012020 101223 091][1][INFO]> -EMV Transaction Completed------------
[01012020 101223 099][1][INFO]> --- Status : Success
[01012020 101223 108][1][INFO]> --- Message : Approved
[01012020 101941 893][1][INFO]> -Cash Withdraw Initiated -------------
[01012020 101941 900][1][INFO]> -----Amount : 30000
[01012020 101941 910][15][INFO]> ----AUX NO : :xx:A00000942xxxxxxxxx1941-02
[01012020 101941 919][15][INFO]> ----AUX NO : :xx:A000009420200XXXXXXXXX-03
[01012020 101941 928][15][INFO]> ----AUX NO : :xx:A000009xxxxxxxxx11xx41-04
[01012020 101943 317][1][INFO]> ---- Image Capture (TRX_RESPONSE_WITHDRAW)
[01012020 101943 406][1][INFO]> -----Withdraw Status : OK
[01012020 101943 415][1][INFO]> -----Account : 6075XXXXXXXXX8
[01012020 101943 422][1][INFO]> -----Action Code :na
[01012020 101943 429][1][INFO]> -----Response : 000
[01012020 101943 436][1][INFO]> -----Trace ID : 165870
[01012020 101943 442][1][INFO]> -----EOD ID :
[01012020 101943 449][1][INFO]> -----BATCH ID :
[01012020 101943 456][1][INFO]> -----TRX NO :
[01012020 101943 463][1][INFO]> ---Cash Withdraw Initiated Completed
[01012020 101943 605][1][INFO]> ---Send Online Data
[01012020 101943 613][1][INFO]> -----ARC : 3030
[01012020 101943 619][1][INFO]> -----Trx DateTime : 1/1/2020
[01012020 101943 628][1][INFO]> -----Online Status : Online_Perfoamed
[01012020 101943 972][1][INFO]> -EMV Transaction Completed------------
[01012020 101943 979][1][INFO]> --- Status : Success
[01012020 101943 986][1][INFO]> --- Message : Approved
[01012020 102838 263][1][INFO]> -Cash Withdraw Initiated -------------
[01012020 102838 271][1][INFO]> -----Amount : 5000
[01012020 102838 281][10][INFO]> ----AUX NO : :xx:A000009420XXXXXXXXXXXX-02
[01012020 102838 290][10][INFO]> ----AUX NO : :xx:A00000942XXXXXXXXXXXXX-03
[01012020 102838 298][10][INFO]> ----AUX NO : :xx:A00000942XXXXXXXXXXXXX-04
[01012020 102839 660][1][INFO]> ---- Image Capture (TRX_RESPONSE_WITHDRAW)
[01012020 102839 735][1][INFO]> -----Withdraw Status : OK
[01012020 102839 742][1][INFO]> -----Account : 106XXXXXXXXX
[01012020 102839 748][1][INFO]> -----Action Code :na
[01012020 102839 755][1][INFO]> -----Response : 000
[01012020 102839 762][1][INFO]> -----Trace ID : 167030
[01012020 102839 768][1][INFO]> -----EOD ID :
[01012020 102839 777][1][INFO]> -----BATCH ID :
[01012020 102839 783][1][INFO]> -----TRX NO :
[01012020 102839 790][1][INFO]> ---Cash Withdraw Initiated Completed
[01012020 102839 931][1][INFO]> ---Send Online Data
[01012020 102839 940][1][INFO]> -----ARC : 3030
[01012020 102839 947][1][INFO]> -----Trx DateTime : 11/12/2020
[01012020 102839 953][1][INFO]> -----Online Status : Online_Perfoamed
[01012020 102840 273][1][INFO]> -EMV Transaction Completed------------
[01012020 102840 280][1][INFO]> --- Status : Success
[01012020 102840 325][1][INFO]> --- Message : Approved
I tried this code:
import re
import pandas as pd
# Extract the required data from the text file using regular expressions
amounts = [int(m) for m in re.findall(r'Amount\s*:\s*(\d+)', text)]
withdraw_statuses = re.findall(r'Withdraw\s+Status\s*:\s*(\w+)', text)
accounts = re.findall(r'Account\s*:\s*(\d+)', text)
#trace_ids = [int(m) for m in re.findall(r'Trace\s+ID\s*:\s*(\d+)', text)]
trace_ids = re.findall(r'Trace\s+ID\s*:\s*(\d+)', text)
#trx_datetimes = re.findall(r'Trx\s+DateTime\s\s:\s*(.+)', text)
trx_datetimes = re.findall(r'Trx\s+DateTime\s\s:\s\d{1,2}\/\d{1,2}\/\d{4}\s+', text)
#trx_datetimes = re.findall(r'Trx\s+DateTime\s\s:\s(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+(?:AM|PM))', text)
online_statuses = re.findall(r'Online\s+Status\s*:\s*(.+)', text)
statuses = re.findall(r'Status\s\s:\s*(.+)', text)
messages = re.findall(r'Message\s*:\s*(.+)', text)
# Create a list of dictionaries to store the extracted data for each transaction
data_list = []
for i in range(len(amounts)):
    data_dict = {
        'amount': amounts[i],
        'withdraw_status': withdraw_statuses[i],
        'account': accounts[i],
        'trace_id': trace_ids[i],
        'trx_datetime': trx_datetimes[i],
        'online_status': online_statuses[i],
        'status': statuses[i],
        'message': messages[i],
    }
    data_list.append(data_dict)
# Create a pandas dataframe from the list of dictionaries
dff = pd.DataFrame(data_list)
dff['trx_datetime'] = pd.to_datetime(dff['trx_datetime'])
dff['upload_datetime'] = pd.Timestamp('now')
dff
My output is:
trx_datetime has a null value in the second row; only the first value is captured. How can I capture all of the trx_datetime values in the DataFrame?
1 Answer
Your code works fine, except that you forgot the capture group for trx_datetimes.
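A minimal sketch of what adding the capture group could look like. It assumes the date follows the label directly, as in the sample log. The `text` variable here is only a three-line excerpt of that sample, and the `\s*` around the colon is a loosening of the question's `\s\s` chosen for this illustration, not the pattern from the original answer.

```python
import re

# Three "Trx DateTime" lines copied from the sample log in the question.
text = """\
[01012020 101222 770][1][INFO]> -----Trx DateTime : 11/1/2020
[01012020 101943 619][1][INFO]> -----Trx DateTime : 1/1/2020
[01012020 102839 947][1][INFO]> -----Trx DateTime : 11/12/2020
"""

# Without a capture group, re.findall returns the whole match, label included.
no_group = re.findall(r'Trx\s+DateTime\s*:\s*\d{1,2}/\d{1,2}/\d{4}', text)

# With a capture group, re.findall returns only the parenthesised part.
trx_datetimes = re.findall(r'Trx\s+DateTime\s*:\s*(\d{1,2}/\d{1,2}/\d{4})', text)

print(no_group[0])      # the label is still attached to the date
print(trx_datetimes)    # bare date strings, one per transaction
```

The bare date strings can then go through pd.to_datetime as in the question's code. A string that still carries the "Trx DateTime :" prefix is not a parseable date.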