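I have a file that basically consists of lines of strings. I am trying to extract the parts that sit between certain lines into separate files. The file looks like this: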
**File Begins**
"Name: XXX_2"
"Description: Object 1210 , 111"
"Sampling_info: statexy=1346"
"Num value: 15"
"32 707; 33 71; 37 11; 38 3; 40 146; "
"41 64; 42 36; 43 24; 44 69; 45 324; "
"46 49; 47 52; 50 11; 51 90; 52 22; "
"Name: XXX_3"
"Description: Object 1341 , 111"
"Sampling_info: statexy=1346"
"Num value: 18"
"32 999; 33 4; 34 17; 39 84; 41 84; "
"42 4; 44 137; 45 102; 50 13; 52 22; "
"53 4; 54 4; 55 84; 58 40; 59 13; "
"65 57; 66 13; 67 173; "
"Name: XXX_4"
"Description: Object 1561 , 111"
"Sampling_info: statexy=1346"
"Num value: 21"
"32 925; 34 5; 40 409; 41 55; 44 43; "
"45 154; 46 5; 47 5; 50 38; 52 16; "
"56 99; 58 5; 59 110; 61 5; 62 55; "
"63 11; 68 5; 69 38; 70 22; 73 999; "
"74 49; "
"Name: XXX_5"
**And then the next entry begins**
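I want to get everything between "Num value: 15" and "Name: XXX_3", excluding those two lines, and put it into its own text file. The same goes for the next two entries. This would be done in a for loop or some other loop, so that every individual entry in the file is extracted into its own file.
I tried str_match, but it returns NA:
str_match(data, "Name: UNK_1\\s*(.*?)\\s*Name: UNK_2")
I also tried gsub, but it returns the whole file...:
gsub(".*Name: UNK_1 (.+) Name: UNK_2.*", "\\1", data)
Is there something wrong with how I am using str_match and gsub?
Thanks in advance!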
3 Answers
41ik7eoe1#
How about something like this:
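A minimal sketch in base R, not necessarily the code this answer originally contained. It assumes the file was read with `readLines()` into a character vector with one element per line and without the surrounding quotes. In that case `str_match` and `gsub` are applied to each element separately, so a pattern meant to span several lines has nothing to match against. The sketch instead groups lines by the `Name:` line that precedes them. The output directory (`tempdir()`) and the file naming scheme are assumptions for illustration.

```r
# Sample lines standing in for readLines("your_file.txt") (hypothetical path)
lines <- c(
  "Name: XXX_2",
  "Description: Object 1210 , 111",
  "Sampling_info: statexy=1346",
  "Num value: 15",
  "32 707; 33 71; 37 11; 38 3; 40 146; ",
  "41 64; 42 36; 43 24; 44 69; 45 324; ",
  "46 49; 47 52; 50 11; 51 90; 52 22; ",
  "Name: XXX_3",
  "Description: Object 1341 , 111",
  "Sampling_info: statexy=1346",
  "Num value: 18",
  "32 999; 33 4; 34 17; 39 84; 41 84; ",
  "42 4; 44 137; 45 102; 50 13; 52 22; ",
  "53 4; 54 4; 55 84; 58 40; 59 13; ",
  "65 57; 66 13; 67 173; "
)

# Every "Name:" line starts a new entry; cumsum() turns that into a group id
entry_id <- cumsum(grepl("^Name:", lines))
entries  <- split(lines, entry_id)

out_dir   <- tempdir()      # assumption: write somewhere harmless
out_files <- character(0)

for (entry in entries) {
  # The first line of each group is its "Name:" line
  name <- sub("^Name:\\s*", "", entry[1])
  # Drop the four header lines, keep only the value lines
  is_header <- grepl("^(Name|Description|Sampling_info|Num value):", entry)
  values <- trimws(entry[!is_header])
  path <- file.path(out_dir, paste0(name, ".txt"))
  writeLines(values, path)
  out_files <- c(out_files, path)
}
```

Each entry ends up in a file named after its `Name:` line, for example `XXX_2.txt`, containing only the value lines.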
3phpmpom2#
A loop-free approach:
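Output (the names in the `value` column come from the initial `Name: XXX` lines). You may want to break the pipeline above into pieces and inspect the intermediate data frames to see what happens at each step.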
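For illustration, here is a sketch in the same loop-free spirit, written in base R rather than as a pipeline. It is not this answer's original code, and the column names `entry`, `x` and `y` are hypothetical. It assumes every value line starts with a digit and every `;`-separated item holds exactly two whitespace-separated numbers.

```r
# Sample lines standing in for the file contents (one element per line)
lines <- c(
  "Name: XXX_2", "Description: Object 1210 , 111",
  "Sampling_info: statexy=1346", "Num value: 15",
  "32 707; 33 71; 37 11; 38 3; 40 146; ",
  "41 64; 42 36; 43 24; 44 69; 45 324; ",
  "46 49; 47 52; 50 11; 51 90; 52 22; ",
  "Name: XXX_3", "Description: Object 1341 , 111",
  "Sampling_info: statexy=1346", "Num value: 18",
  "32 999; 33 4; 34 17; 39 84; 41 84; ",
  "42 4; 44 137; 45 102; 50 13; 52 22; ",
  "53 4; 54 4; 55 84; 58 40; 59 13; ",
  "65 57; 66 13; 67 173; "
)

# Label every line with the name of the entry it belongs to
name_line   <- grepl("^Name:", lines)
entry_names <- sub("^Name:\\s*", "", lines[name_line])
entry       <- entry_names[cumsum(name_line)]

# Value lines are the ones starting with a digit (assumption)
is_value <- grepl("^[0-9]", lines)

# Strip the trailing ";" and split each value line into "x y" items
clean <- sub(";\\s*$", "", trimws(lines[is_value]))
pairs <- strsplit(clean, ";\\s*")

# One row per item, repeating the entry name as often as needed
long <- data.frame(
  entry = rep(entry[is_value], lengths(pairs)),
  pair  = unlist(pairs),
  stringsAsFactors = FALSE
)

# Split each "x y" item into two integer columns
xy <- do.call(rbind, strsplit(long$pair, "\\s+"))
long$x <- as.integer(xy[, 1])
long$y <- as.integer(xy[, 2])
long$pair <- NULL

# One two-column data frame per entry
by_entry <- split(long[c("x", "y")], long$entry)
```

Comparing `nrow(by_entry$XXX_2)` with the entry's `Num value:` line is a cheap sanity check on the parsing.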
htrmnn0y3#
With `base` and `for`, and various notes.
Index msp_list for the start and end of the future sub dfs. I am assuming your .msp is properly formed, which is to say all entries are complete (I've wished away your line 25 above).
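The indexing step could be sketched as follows. This is an illustration under the same "all entries complete" assumption, not the answer's original code. Only `msp_list` comes from the answer; `name_idx`, `num_idx`, `start_idx` and `end_idx` are hypothetical names.

```r
# Sample standing in for msp_list, one element per line of the file
msp_list <- c(
  "Name: XXX_2", "Description: Object 1210 , 111",
  "Sampling_info: statexy=1346", "Num value: 15",
  "32 707; 33 71; 37 11; 38 3; 40 146; ",
  "41 64; 42 36; 43 24; 44 69; 45 324; ",
  "46 49; 47 52; 50 11; 51 90; 52 22; ",
  "Name: XXX_3", "Description: Object 1341 , 111",
  "Sampling_info: statexy=1346", "Num value: 18",
  "32 999; 33 4; 34 17; 39 84; 41 84; ",
  "42 4; 44 137; 45 102; 50 13; 52 22; ",
  "53 4; 54 4; 55 84; 58 40; 59 13; ",
  "65 57; 66 13; 67 173; "
)

# Positions of the lines that delimit each entry
name_idx <- grep("^Name:", msp_list)
num_idx  <- grep("^Num value:", msp_list)

# Values start right after "Num value:" ...
start_idx <- num_idx + 1
# ... and end right before the next "Name:" (or at the end of the file)
end_idx <- c(name_idx[-1] - 1, length(msp_list))
```

With the sample above, the values of the first entry occupy lines 5 to 7 and those of the second entry lines 12 to 15.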
At this point we have what we need in the global environment to inform the operations in the `for` loop, so it won't complain that some value isn't found. Things are also detailed enough to operate on a single index (i.e. not nested `i,j`), or at least I hope so. We'll do a bunch of data cleaning here, but hopefully return an extracted list of .msp values, each as a two-column df (this basically relies on the regularity of the .msp file format).
Extending:
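A minimal sketch of such a single-index `for` loop, assuming well-formed entries as above. The names `start_idx`, `end_idx`, `msp_names` and `msp_dfs` are hypothetical, and the cleaning steps are one possible choice rather than the answer's original code.

```r
# Sample standing in for msp_list, one element per line of the file
msp_list <- c(
  "Name: XXX_2", "Description: Object 1210 , 111",
  "Sampling_info: statexy=1346", "Num value: 15",
  "32 707; 33 71; 37 11; 38 3; 40 146; ",
  "41 64; 42 36; 43 24; 44 69; 45 324; ",
  "46 49; 47 52; 50 11; 51 90; 52 22; ",
  "Name: XXX_3", "Description: Object 1341 , 111",
  "Sampling_info: statexy=1346", "Num value: 18",
  "32 999; 33 4; 34 17; 39 84; 41 84; ",
  "42 4; 44 137; 45 102; 50 13; 52 22; ",
  "53 4; 54 4; 55 84; 58 40; 59 13; ",
  "65 57; 66 13; 67 173; "
)

# Start/end of the value lines of each entry, plus the entry names
start_idx <- grep("^Num value:", msp_list) + 1
end_idx   <- c(grep("^Name:", msp_list)[-1] - 1, length(msp_list))
msp_names <- sub("^Name:\\s*", "", grep("^Name:", msp_list, value = TRUE))

# Pre-allocate a named list, one slot per entry
msp_dfs <- vector("list", length(start_idx))
names(msp_dfs) <- msp_names

for (i in seq_along(start_idx)) {
  # Glue the value lines of entry i into one string
  block <- paste(msp_list[start_idx[i]:end_idx[i]], collapse = " ")
  # Split on ";" and drop the empty item left after the last ";"
  pairs <- trimws(strsplit(block, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  # Each item is "col1 col2"; split on whitespace
  parts <- strsplit(pairs, "\\s+")
  msp_dfs[[i]] <- data.frame(
    col1 = as.integer(vapply(parts, `[`, character(1), 1)),
    col2 = as.integer(vapply(parts, `[`, character(1), 2))
  )
}
```

Each element of `msp_dfs` could then be written out with `write.table()` or similar if separate files are wanted.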
so it can be done with a `for` loop. Accessing/addressing elements in this list may be a little less apparent when reaching into the col1, col2 values, as you have
which is unexpected, at first.
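One common source of that surprise is the difference between `[` and `[[` on a list. The sketch below uses a small hand-built list as a stand-in for the extracted result (the name `msp_dfs` and its contents are assumptions for illustration).

```r
# Hand-built stand-in for a list of two-column data frames
msp_dfs <- list(
  XXX_2 = data.frame(col1 = c(32L, 33L, 37L), col2 = c(707L, 71L, 11L)),
  XXX_3 = data.frame(col1 = c(32L, 33L, 34L), col2 = c(999L, 4L, 17L))
)

# Single brackets return a list of length one, not the data frame itself
class(msp_dfs[1])           # "list"
msp_dfs[1]$col1             # NULL, because the list has no element "col1"

# Double brackets (or a name) return the data frame, whose columns are reachable
class(msp_dfs[[1]])         # "data.frame"
msp_dfs[[1]]$col1           # 32 33 37
msp_dfs[["XXX_3"]]$col2[2]  # 4
```

In short, use `[[i]]` or `$name` to get at a single data frame and only then `$col1` or `$col2` for its columns.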