I have a protocol dump in a plain-text file, in the following format:
Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits)
Bluetooth HCI H4
[Direction: Sent (0x00)]
HCI Packet Type: ACL Data (0x02)
0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................
0010 00 00 00 ...
Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
HCI Packet Type: HCI Event (0x04)
0000 04 13 05 01 0b 00 01 00 ........
Frame 382: 23 bytes on wire (184 bits), 23 bytes captured (184 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
HCI Packet Type: ACL Data (0x02)
0000 02 0b 20 12 00 0e 00 01 00 05 12 0a 00 47 00 00 .. ..........G..
0010 00 00 00 01 02 00 04 .......
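In this simplified example, the frame numbers (380, 381, etc.) appear in the first line of each frame in the text. I want to convert this into a pandas DataFrame of the following form: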
+---------------------------------------------------------------------------------------+
| FrameNumber | Details |
|---------------------------------------------------------------------------------------|
| | Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits) |
| | Bluetooth HCI H4 |
| 380 | [Direction: Sent (0x00)] |
| | HCI Packet Type: ACL Data (0x02) |
| | 0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................ |
| | 0010 00 00 00 |
|---------------------------------------------------------------------------------------|
| | Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits) |
| | Bluetooth HCI H4 |
| 381 | [Direction: Rcvd (0x01)] |
| | HCI Packet Type: HCI Event (0x04) |
| | 0000 04 13 05 01 0b 00 01 00 ........ |
|---------------------------------------------------------------------------------------|
| | Frame 382: 23 bytes on wire (184 bits), 23 bytes captured (184 bits) |
| | Bluetooth HCI H4 |
| 382 | [Direction: Rcvd (0x01)] |
| | HCI Packet Type: ACL Data (0x02) |
| | 0000 02 0b 20 12 00 0e 00 01 00 05 12 0a 00 47 00 00 .. ..........G.. |
| | 0010 00 00 00 01 02 00 04 ....... |
+---------------------------------------------------------------------------------------+
I tried pandas read_csv(), but my knowledge of multi-line regex matching is limited and I could not get it to work. Can someone help me with a simple solution?
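Since the difficulty here is multi-line regex matching, the following small illustration shows two regex ideas that are useful for this kind of dump, assuming Python 3.7 or later (where `re.split` accepts patterns that can match an empty string). The sample string is an abridged copy of the dump above.

```python
import re

# Two frames from the dump in the question, shortened for brevity.
dump = """Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits)
Bluetooth HCI H4
[Direction: Sent (0x00)]
Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
"""

# Without re.MULTILINE, ^ only matches at the very start of the string.
print(re.findall(r"^Frame (\d+):", dump))              # ['380']

# With re.MULTILINE, ^ matches at the start of every line.
print(re.findall(r"^Frame (\d+):", dump, flags=re.M))  # ['380', '381']

# A zero-width lookahead splits the text *before* each header line,
# so every chunk keeps its own "Frame N:" line. (?m) is the inline MULTILINE flag.
chunks = [c for c in re.split(r"(?m)^(?=Frame \d+:)", dump) if c.strip()]
print(len(chunks))                                     # 2
```

The empty string produced by the split at position 0 is filtered out by the `if c.strip()` condition.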
2 answers

Answer 1
Another solution, using the re module:
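A minimal sketch of a `re`-based approach, not necessarily the answerer's code: one pattern compiled with the MULTILINE and DOTALL flags captures each frame from its `Frame N:` header up to the next header or the end of the text. The function name `split_frames_re` and the file name in the comment are hypothetical; the sample is an abridged copy of the dump in the question.

```python
import re
import pandas as pd

# (?ms): ^ matches at each line start (M) and . also matches newlines (S).
# The lazy .*? stops at the first point where the lookahead sees the next
# "Frame N:" header line or the end of the text (\Z).
FRAME_RE = re.compile(r"(?ms)^Frame (\d+):.*?(?=^Frame \d+:|\Z)")

def split_frames_re(text: str) -> pd.DataFrame:
    """Hypothetical helper: one row per frame, with its number and full text."""
    rows = [
        {"FrameNumber": int(m.group(1)), "Details": m.group(0).rstrip("\n")}
        for m in FRAME_RE.finditer(text)
    ]
    return pd.DataFrame(rows, columns=["FrameNumber", "Details"])

sample = """Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits)
Bluetooth HCI H4
[Direction: Sent (0x00)]
HCI Packet Type: ACL Data (0x02)
0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................
0010 00 00 00 ...
Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
HCI Packet Type: HCI Event (0x04)
0000 04 13 05 01 0b 00 01 00 ........
"""

df = split_frames_re(sample)
print(df["FrameNumber"].tolist())  # [380, 381]
print(df.loc[1, "Details"])

# For a real file (hypothetical path):
# with open("dump.txt", encoding="utf-8") as f:
#     df = split_frames_re(f.read())
```

The lazy `.*?` together with the lookahead is what keeps each match from swallowing the rest of the file; without the DOTALL flag, `.` would stop at the first newline.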
Answer 2
Using extract and groupby:
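A minimal sketch of the `extract` + `groupby` idea, assuming `extract` refers to pandas' `Series.str.extract`; this is an illustration, not necessarily the answerer's code. Each line is tagged with the frame number taken from its header line, the tag is forward-filled, and the lines of each group are joined back together. `split_frames_groupby` is a hypothetical name, and the sample is an abridged copy of the dump in the question.

```python
import pandas as pd

def split_frames_groupby(text: str) -> pd.DataFrame:
    """Hypothetical helper: one row per frame, with its number and full text."""
    lines = pd.Series(text.splitlines())
    # Frame number from header lines only; every other line gets NaN.
    frame_no = lines.str.extract(r"^Frame (\d+):", expand=False)
    # Forward-fill so each line inherits the number of the frame it belongs to.
    frame_no = frame_no.ffill()
    df = (
        lines.groupby(frame_no, sort=False)  # keep frames in file order
        .agg("\n".join)                      # re-join each frame's lines
        .rename_axis("FrameNumber")
        .reset_index(name="Details")
    )
    df["FrameNumber"] = df["FrameNumber"].astype(int)
    return df

sample = """Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits)
Bluetooth HCI H4
[Direction: Sent (0x00)]
HCI Packet Type: ACL Data (0x02)
0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................
0010 00 00 00 ...
Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
HCI Packet Type: HCI Event (0x04)
0000 04 13 05 01 0b 00 01 00 ........
"""

df = split_frames_groupby(sample)
print(df["FrameNumber"].tolist())  # [380, 381]
print(df.loc[0, "Details"])
```

Any lines that appear before the first `Frame` header get no frame number, and `groupby` drops them because it skips NaN keys by default.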