Split a CSV file into multiple files based on a pattern

Posted by p5cysglq on 2023-04-03 in Other

I have a CSV file with the following structure:

time,magnitude
0,13517
292.5669,370
620.8469,528
0,377
832.3269,50187
5633.9419,3088
20795.0950,2922
21395.6879,2498
21768.2139,647
21881.2049,194
0,3566
292.5669,370
504.1510,712
1639.4800,287
46709.1749,365
46803.4400,500

I want to split this CSV file into separate CSV files, like this:
File one:

time,magnitude
0,13517
292.5669,370
620.8469,528

File two:

time,magnitude
0,377
832.3269,50187
5633.9419,3088
20795.0950,2922
21395.6879,2498

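And so on.
I have read several similar posts (e.g., this, this, and this one), but they all search one column for a specific value and save each group of values to a separate file. In my case, however, the values in the time column differ from row to row, so there is no repeated value to group on. I want to split the file based on a condition: if time = 0, save that row and all subsequent rows to a new file until the next time = 0.
Can someone tell me how to do this?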

Answer 1 (klr1opcd)

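With pandas, you can use groupby with boolean indexing: mark the rows where time == 0, then turn those markers into block numbers with cumsum: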

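# pip install pandas
import pandas as pd

df = pd.read_csv("input_file.csv", sep=",")  # <- change the sep if needed

# eq(0) is True on every row where time == 0; cumsum() turns those markers
# into a running block number (1, 1, 1, 2, 2, ...), one number per output file
for n, g in df.groupby(df["time"].eq(0).cumsum()):
    g.to_csv(f"file_{n}.csv", index=False, sep=",")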

Output (contents of the first two files, shown as DataFrames):

    time  magnitude  # <- file_1.csv
  0.0000      13517
292.5669        370
620.8469        528

      time  magnitude # <- file_2.csv
    0.0000        377
  832.3269      50187
 5633.9419       3088
20795.0950       2922
21395.6879       2498
Answer 2 (toe95027)

datasplit.awk

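#!/usr/bin/awk -f

BEGIN {
    filename = "output_file_"
    fileext = ".csv"
    FS = ","

    c = 0
    file = filename c fileext
    getline                  # read the header line from the input
    header = $0
}
{
    if ($1 == 0) {           # a row with time == 0 starts a new file
        close(file)          # close the previous file to free its descriptor
        c = c + 1
        file = filename c fileext
        print header > file
        print $0 >> file
    } else {
        print >> file
    }
}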

Make the file executable:

chmod +x datasplit.awk

Then run it from the folder that contains the data file:

./datasplit.awk datafile
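The awk script streams the input line by line and starts a new output file whenever the first field is 0, repeating the header in each file. The same streaming logic can be sketched in Python with only the standard library csv module, so only one block is held in memory at a time. The function names split_on_zero and split_csv are hypothetical, and the output naming output_file_N.csv simply mirrors the awk script. One difference: rows before the first 0 (if any) land in the first file with a header, rather than in a headerless output_file_0.csv.

```python
import csv
from typing import Iterable, Iterator, List


def split_on_zero(rows: Iterable[List[str]]) -> Iterator[List[List[str]]]:
    """Yield blocks of rows; a row whose first field equals 0 starts a new block."""
    block: List[List[str]] = []
    for row in rows:
        if not row:  # skip blank lines, which csv.reader yields as []
            continue
        # float() also accepts "0.0"; this assumes every time value is numeric
        if float(row[0]) == 0 and block:
            yield block
            block = []
        block.append(row)
    if block:
        yield block


def split_csv(src: str, prefix: str = "output_file_") -> int:
    """Write each block to prefix1.csv, prefix2.csv, ... with the header repeated."""
    count = 0
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for count, block in enumerate(split_on_zero(reader), start=1):
            with open(f"{prefix}{count}.csv", "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(block)
    return count  # number of files written
```

For example, split_csv("datafile") would write output_file_1.csv, output_file_2.csv, ... into the current directory and return how many files it created.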
Answer 3 (8cdiaqws)

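I took the liberty of creating some data similar to yours to test my solution. Also, instead of reading an input CSV file, I built a DataFrame directly. Here is my solution: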

import pandas as pd

# Build a sample DataFrame resembling the question's data

data = {
   'time': [0, 292.5669, 620.8469, 0, 832.3269, 5633.9419, 20795.0950, 21395.6879, 0, 230.5678, 456.8468, 0, 784.3265, 5445.9452, 20345.0980, 21095.6898],
   'magnitude': [13517, 370, 528, 377, 50187, 3088, 2922, 2498, 13000, 369, 527, 376, 50100, 3087, 2921, 2497]
}

df = pd.DataFrame(data)

# Function to split a DataFrame based on a pattern

def split_dataframe_by_pattern(df, output_prefix):
    file_count = 1
    current_group = pd.DataFrame(columns=df.columns)  # Initialize the current group

    for index, row in df.iterrows():
        if row['time'] == 0 and not current_group.empty:  # If time = 0 and the current group is not empty, create a new file
            output_file = f'{output_prefix}_{file_count}.csv'

            # Save the current group to the new file

            current_group.to_csv(output_file, index=False)
            current_group = pd.DataFrame(columns=df.columns)  # Reset the current group
            file_count += 1

        # Use pandas.concat to append the row to the current group
        current_group = pd.concat([current_group, row.to_frame().T], ignore_index=True)

    # Save the last group to a file

    current_group.to_csv(f'{output_prefix}_{file_count}.csv', index=False)

# Example usage:
output_prefix = 'output_file'
split_dataframe_by_pattern(df, output_prefix)

My output is four CSV files:
output_file_1.csv

time,magnitude
0.0,13517.0
292.5669,370.0
620.8469,528.0

output_file_2.csv

time,magnitude
0.0,377.0
832.3269,50187.0
5633.9419,3088.0
20795.095,2922.0
21395.6879,2498.0

output_file_3.csv

time,magnitude
0.0,13000.0
230.5678,369.0
456.8468,527.0

output_file_4.csv

time,magnitude
0.0,376.0
784.3265,50100.0
5445.9452,3087.0
20345.098,2921.0
21095.6898,2497.0
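The .0 suffixes in this output (13517.0 instead of 13517) come from iterrows(), which returns each row as a Series with a single common dtype, so the integer magnitude column is upcast to float. Appending rows one at a time with pd.concat also copies the growing group on every call. A minimal sketch of an alternative, assuming the same time column: find the positions where time is 0 and slice the original DataFrame with iloc, which keeps each column's dtype. The helper name split_positions is hypothetical.

```python
from typing import List

import numpy as np
import pandas as pd


def split_positions(df: pd.DataFrame, col: str = "time") -> List[pd.DataFrame]:
    """Split df into consecutive blocks, each starting at a row where df[col] == 0."""
    if df.empty:
        return []
    # 0-based positions of the rows whose value in `col` is 0
    starts = np.flatnonzero(df[col].to_numpy() == 0)
    # Rows before the first 0 (if any) form their own leading block
    if len(starts) == 0 or starts[0] != 0:
        starts = np.r_[0, starts]
    # Each block ends where the next one starts; the last one runs to the end
    ends = np.r_[starts[1:], len(df)]
    # iloc slices of the original frame keep each column's original dtype
    return [df.iloc[s:e] for s, e in zip(starts, ends)]


df = pd.DataFrame(
    {
        "time": [0, 292.5669, 620.8469, 0, 832.3269],
        "magnitude": [13517, 370, 528, 377, 50187],
    }
)
blocks = split_positions(df)
print([len(b) for b in blocks])  # → [3, 2]
print(pd.api.types.is_integer_dtype(blocks[0]["magnitude"]))  # → True

# Writing the files with the answer's naming scheme:
# for n, block in enumerate(blocks, start=1):
#     block.to_csv(f"output_file_{n}.csv", index=False)
```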
Answer 4 (owfi6suc)

You can do this quite easily with pandas, like this:

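import pandas as pd

df = pd.read_csv("mydata.csv")
last_idx = 0  # position of the first row of the current block
file_idx = 0  # number used in the output file name
for i, time in enumerate(df.time):
    # a 0 in "time" (except in the very first row) closes the previous block
    if time == 0 and i != 0:
        df.iloc[last_idx:i].to_csv(f"mydata_{file_idx}.csv", index=False)
        file_idx += 1
        last_idx = i
# write the final block
df.iloc[last_idx:].to_csv(f"mydata_{file_idx}.csv", index=False)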
