Split a CSV file into multiple files based on a pattern

Posted by p5cysglq on 2023-04-03 in Other


I have a CSV file with the following structure:

time,magnitude
0,13517
292.5669,370
620.8469,528
0,377
832.3269,50187
5633.9419,3088
20795.0950,2922
21395.6879,2498
21768.2139,647
21881.2049,194
0,3566
292.5669,370
504.1510,712
1639.4800,287
46709.1749,365
46803.4400,500

I want to split this CSV file into separate CSV files, like this:
File one:

time,magnitude
0,13517
292.5669,370
620.8469,528

File two:

time,magnitude
0,377
832.3269,50187
5633.9419,3088
20795.0950,2922
21395.6879,2498

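And so on.
I have read several similar posts (for example, this, this, or this one), but they all search for specific values in a column and save each group of values to a separate file. In my case, however, the values in the time column are not all the same. I want to split based on a condition: if time = 0, save that row and all subsequent rows in a new file until the next time = 0.
Can someone tell me how to do this?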

csv

Source: https://stackoverflow.com/questions/75908794/split-a-csv-file-into-multiple-files-based-on-a-pattern

4 answers


klr1opcd #1

With pandas, you can use groupby and boolean indexing:

#pip install pandas
import pandas as pd

df = pd.read_csv("input_file.csv", sep=",") # <- change the sep if needed

for n, g in df.groupby(df["time"].eq(0).cumsum()):
    g.to_csv(f"file_{n}.csv", index=False, sep=",")

Output:

time  magnitude   # <- file_1.csv
  0.0000      13517
292.5669        370
620.8469        528

      time  magnitude # <- file_2.csv
    0.0000        377
  832.3269      50187
 5633.9419       3088
20795.0950       2922
21395.6879       2498
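
To see why this grouping key works, it may help to look at the intermediate values. The following is a minimal sketch that builds the first rows of the question's data inline (an assumption for illustration, in place of reading input_file.csv) and prints the key that groupby receives:

```python
import pandas as pd

# First rows of the question's data, built inline for illustration.
df = pd.DataFrame({
    "time": [0, 292.5669, 620.8469, 0, 832.3269, 5633.9419],
    "magnitude": [13517, 370, 528, 377, 50187, 3088],
})

# True wherever a new block starts (time == 0).
starts = df["time"].eq(0)

# The running sum of the flags increases by one at every start row,
# so all rows of one block share the same integer key.
group_id = starts.cumsum()

print(group_id.tolist())  # [1, 1, 1, 2, 2, 2]
```

If the file does not begin with a time = 0 row, the leading rows get the key 0 and would end up in file_0.csv.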

2023-04-03

toe95027 #2

datasplit.awk

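#!/usr/bin/awk -f

# The opening brace must be on the same line as BEGIN.
BEGIN {
    filename = "output_file_"
    fileext = ".csv"
    FS = ","

    c = 0
    file = filename c fileext
    # Read the header line once so it can be repeated in every output file.
    getline
    header = $0
}
{
    if ($1 == 0) {
        # A row with time = 0 starts a new output file.
        close(file)
        c = c + 1
        file = filename c fileext
        print header > file
        print $0 > file
    } else {
        print > file
    }
}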

Make the file executable:

chmod +x datasplit.awk

Run it from the folder that contains the data (the leading ./ is needed unless the script is on your PATH):

./datasplit.awk datafile
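
For readers without awk, the same streaming idea can be sketched in Python using only the standard library csv module. This is an illustrative equivalent, not part of the original answer. The function name split_csv_on_zero and its parameters are hypothetical. Unlike the awk script, rows that appear before the first time = 0 row go into the first numbered file rather than output_file_0.csv, and the input is assumed to have a header line and numeric values in the first column.

```python
import csv


def split_csv_on_zero(in_path, prefix="output_file_", ext=".csv"):
    """Start a new output file at every row whose first field equals 0.

    Returns the list of paths that were written, in order.
    """
    paths = []
    out = None      # currently open output file
    writer = None   # csv writer bound to `out`
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # remember the header for every output file
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Open a new file at time == 0, or for rows before the first 0.
            if float(row[0]) == 0 or writer is None:
                if out is not None:
                    out.close()
                path = f"{prefix}{len(paths) + 1}{ext}"
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                paths.append(path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return paths
```

Calling split_csv_on_zero("datafile") would write output_file_1.csv, output_file_2.csv, and so on into the current folder. Because it reads one row at a time, it does not need to hold the whole file in memory.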

2023-04-03

8cdiaqws #3

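I took the liberty of creating some data similar to what you provided in order to test my solution. Also, instead of reading an input CSV file, I used a DataFrame. Here is my solution: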

import pandas as pd

# Create a sample DataFrame

data = {
   'time': [0, 292.5669, 620.8469, 0, 832.3269, 5633.9419, 20795.0950, 21395.6879, 0, 230.5678, 456.8468, 0, 784.3265, 5445.9452, 20345.0980, 21095.6898],
   'magnitude': [13517, 370, 528, 377, 50187, 3088, 2922, 2498, 13000, 369, 527, 376, 50100, 3087, 2921, 2497]
}

df = pd.DataFrame(data)

# Function to split a DataFrame based on a pattern

def split_dataframe_by_pattern(df, output_prefix):
    file_count = 1
    current_group = pd.DataFrame(columns=df.columns)  # Initialize the current group

    for index, row in df.iterrows():
        if row['time'] == 0 and not current_group.empty:  # If time = 0 and the current group is not empty, create a new file
            output_file = f'{output_prefix}_{file_count}.csv'

            # Save the current group to the new file

            current_group.to_csv(output_file, index=False)
            current_group = pd.DataFrame(columns=df.columns)  # Reset the current group
            file_count += 1

        # Use pandas.concat to append the row to the current group
        current_group = pd.concat([current_group, row.to_frame().T], ignore_index=True)

    # Save the last group to a file

    current_group.to_csv(f'{output_prefix}_{file_count}.csv', index=False)

# Example usage:
output_prefix = 'output_file'
split_dataframe_by_pattern(df, output_prefix)

My output is four CSV files:
output_file_1.csv

time,magnitude
0.0,13517.0
292.5669,370.0
620.8469,528.0

output_file_2.csv

time,magnitude
0.0,377.0
832.3269,50187.0
5633.9419,3088.0
20795.095,2922.0
21395.6879,2498.0

output_file_3.csv

time,magnitude
0.0,13000.0
230.5678,369.0
456.8468,527.0

output_file_4.csv

time,magnitude
0.0,376.0
784.3265,50100.0
5445.9452,3087.0
20345.098,2921.0
21095.6898,2497.0
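
The magnitudes above are written as 13517.0 rather than 13517 because iterrows() returns each row as a single Series, which upcasts the integer magnitude column to float alongside time. A minimal sketch of one way to keep the original dtypes, assuming the same time and magnitude columns, is to slice the DataFrame at the positions where time is 0 instead of rebuilding it row by row. The helper name split_by_zero_rows is hypothetical.

```python
import numpy as np
import pandas as pd


def split_by_zero_rows(df):
    """Return a list of sub-DataFrames, each starting at a time == 0 row."""
    # Positional indices of the rows where a new block begins.
    starts = np.flatnonzero(df["time"].to_numpy() == 0)
    # Make sure rows before the first zero (if any) form their own block.
    if len(starts) == 0 or starts[0] != 0:
        starts = np.insert(starts, 0, 0)
    # Each block runs from one start up to the next start (or the end).
    ends = np.append(starts[1:], len(df))
    return [df.iloc[s:e] for s, e in zip(starts, ends)]


df = pd.DataFrame({
    "time": [0, 292.5669, 620.8469, 0, 832.3269],
    "magnitude": [13517, 370, 528, 377, 50187],
})

parts = split_by_zero_rows(df)
print([len(p) for p in parts])      # [3, 2]
print(parts[0]["magnitude"].dtype)  # an integer dtype, not float64
```

Each part can then be saved with something like part.to_csv(f"output_file_{i}.csv", index=False), and the magnitude values are written without the trailing .0.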

2023-04-03

owfi6suc #4

You can do this quite easily with pandas, like this:

import pandas as pd
df = pd.read_csv("mydata.csv")
last_idx = 0
file_idx = 0
for i,time in enumerate(df.time):
    if time == 0 and i != 0:
        df.iloc[last_idx:i].to_csv(f"mydata_{file_idx}.csv", index=False)
        file_idx += 1
        last_idx = i
df.iloc[last_idx:].to_csv(f"mydata_{file_idx}.csv", index=False)

2023-04-03
