pandas: implementing a train-validation-test split for a list of pandas DataFrames

xdyibdwo, posted on 2023-02-27 in Other

I want to implement a train-validation-test split for a list of DataFrames.
For a single DataFrame, I can do:

train, validate, test = np.split(df.sample(frac=1, random_state=42), [int(.6*len(df)), int(.8*len(df))])

However, I can't get it to work for a list of DataFrames.

import pandas as pd
train, validate, test = zip(*[(dfs[i][np.split(dfs[i].sample(frac=1, random_state), [int(.6*len(dfs[i])), int(.8*len(dfs[i]))]) for i in range(len(dfs))])])

Traceback:

SyntaxError: positional argument follows keyword argument
xyhw6mcr #1

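Since zip tries to unpack the DataFrames into separate variables, I would structure this slightly differently and store each split in its own list.
Perhaps the following code helps you solve the problem: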

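import numpy as np

# One list per split; each gets one entry per DataFrame in dfs.
trainData, validateData, testData = [], [], []

for df in dfs:
    # Shuffle the rows, then cut at 60% and 80% for a 60/20/20 split.
    splitData = np.split(df.sample(frac=1, random_state=42), [int(.6*len(df)), int(.8*len(df))])
    trainData.append(splitData[0])
    validateData.append(splitData[1])
    testData.append(splitData[2])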

Afterwards you can access the splits of each DataFrame by the corresponding index.
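For reference, the SyntaxError in the question comes from sample(frac=1, random_state): random_state is passed positionally after the keyword argument frac=1. If the zip-based form from the question is still preferred, a minimal sketch could look like the following. The helper name split_frame and the sample data are hypothetical. The sketch also swaps np.split for iloc slicing, on the assumption that recent pandas versions warn about the DataFrame.swapaxes call that np.split relies on.

```python
import pandas as pd

def split_frame(df, cuts=(0.6, 0.8), seed=42):
    """Shuffle df and return (train, validate, test) row subsets.

    Hypothetical helper: `cuts` holds the cumulative fractions at which
    the shuffled rows are cut (60% and 80% give a 60/20/20 split).
    """
    shuffled = df.sample(frac=1, random_state=seed)
    first = int(cuts[0] * len(df))
    second = int(cuts[1] * len(df))
    return shuffled.iloc[:first], shuffled.iloc[first:second], shuffled.iloc[second:]

# Hypothetical sample data standing in for the real list `dfs`.
dfs = [pd.DataFrame({"x": range(10)}), pd.DataFrame({"x": range(20)})]

# zip(*...) turns the sequence of (train, validate, test) tuples
# into three tuples, each holding one DataFrame per input DataFrame.
train, validate, test = zip(*(split_frame(df) for df in dfs))
```

Here train[i], validate[i] and test[i] all come from dfs[i], so the three tuples stay aligned with the original list.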

vngu2lb8 #2

from sklearn.model_selection import train_test_split

# Define the size of the train, test, and validation sets
train_size = 0.7  # 70% of the data for training
test_size = 0.15  # 15% of the data for testing
val_size = 0.15   # 15% of the data for validation

# Split the list of dataframes into train and test sets
train_data, test_data = train_test_split(dataframes, test_size=test_size, random_state=42)

# Split the test set further into test and validation sets
test_data, val_data = train_test_split(test_data, test_size=val_size/(test_size+val_size), random_state=42)
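Note that the snippet above partitions the list itself, so each DataFrame ends up whole in exactly one of the three sets. If the goal is instead to split the rows of every DataFrame with the same 70/15/15 proportions, a rough pandas-only sketch might look like this. The function name split_rows and the sample data are hypothetical, and it assumes each DataFrame has a unique index.

```python
import pandas as pd

train_size = 0.70  # 70% of the rows for training
test_size = 0.15   # 15% of the rows for testing
val_size = 0.15    # 15% of the rows for validation

def split_rows(df, seed=42):
    """Return (train, test, val) row subsets of a single DataFrame.

    Hypothetical helper; relies on df having a unique index so that
    drop() removes exactly the sampled rows.
    """
    train = df.sample(frac=train_size, random_state=seed)
    rest = df.drop(train.index)
    # Share of the remaining rows that goes to the test set.
    test = rest.sample(frac=test_size / (test_size + val_size), random_state=seed)
    val = rest.drop(test.index)
    return train, test, val

# Hypothetical sample data standing in for the real list `dataframes`.
dataframes = [pd.DataFrame({"x": range(20)}), pd.DataFrame({"x": range(40)})]

train_data, test_data, val_data = [], [], []
for df in dataframes:
    train, test, val = split_rows(df)
    train_data.append(train)
    test_data.append(test)
    val_data.append(val)
```

Each of the three lists then holds one DataFrame per input, in the same order as dataframes.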
