pandas 将特定值从一个pd Dataframe 传输到另一个pd Dataframe

ffscu2ro  于 2023-03-11  发布在  其他
关注(0)|答案(1)|浏览(135)

你能帮帮我吗?
我有两个panda Dataframe ,我需要从列表中第一个 Dataframe 的特定列中每隔10个值提取一个值。然后我需要在第二个 Dataframe 中创建一个新列,并将这些值放入此列。我还需要在两个 Dataframe 中匹配行号。以下是代码:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

# we will use random forest classifier as our classifier
forest_classifier=RandomForestClassifier(max_depth=4)

# reading in accelerometer data
train_time = pd.read_csv("https://courses.edx.org/assets/courseware/v1/b98039c3648763aae4f153a6ed32f38b/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/train_time_series.csv", index_col=0)
train_labels = pd.read_csv("https://courses.edx.org/assets/courseware/v1/d64e74647423e525bbeb13f2884e9cfa/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/train_labels.csv", index_col=0)

x = []
y = []
z = []

# making lists out of the x, y, z columns
for i in range(3, len(train_time), 10):
    x.append(train_time.iloc[i]["x"])
    y.append(train_time.iloc[i]["y"])
    z.append(train_time.iloc[i]["z"])
    
print(z)  # checking the list

# making new columns in train_labels file with the obtained lists
train_labels["x"] = pd.Series([x])
train_labels["y"] = pd.Series([y])
train_labels["z"] = pd.Series([z])

train_labels.head()

但是我得到了输出,其中创建的列只有“NaN”的值
输出应该是包含创建的x、y、z列的 Dataframe ,这些列具有相应的观测数。

qhhrdooz

qhhrdooz1#

您可以堆叠xyz并用途:

train_labels[['x', 'y', 'z']] = np.stack([x, y, z], axis=1)

输出:

>>> train_labels
           timestamp                 UTC time  label         x         y         z
20589  1565109931087  2019-08-06T16:45:31.087      1 -0.053802 -0.987701  0.068985
20599  1565109932090  2019-08-06T16:45:32.090      1  0.013718 -0.852371 -0.000870
20609  1565109933092  2019-08-06T16:45:33.092      1  0.145584 -1.007843 -0.036819
20619  1565109934094  2019-08-06T16:45:34.094      1 -0.099380 -1.209686  0.304489
20629  1565109935097  2019-08-06T16:45:35.097      1  0.082794 -1.001434 -0.025375
...              ...                      ...    ...       ...       ...       ...
24289  1565110302030  2019-08-06T16:51:42.030      4 -0.641953 -1.469177  0.301041
24299  1565110303032  2019-08-06T16:51:43.032      4 -0.171616 -0.366074 -0.059082
24309  1565110304034  2019-08-06T16:51:44.034      4  0.401810 -1.077698  0.258911
24319  1565110305037  2019-08-06T16:51:45.037      4  0.330338 -1.470062  0.303894
24329  1565110306039  2019-08-06T16:51:46.039      4  0.689346 -0.991043  0.034973

[375 rows x 6 columns]

Note 1:您的代码不起作用,因为您创建pd.Series时没有使用train_labels中的索引,因此索引未对齐:

train_labels['x'] = pd.Series(x, index=train_labels.index)
train_labels['y'] = pd.Series(y, index=train_labels.index)
train_labels['z'] = pd.Series(z, index=train_labels.index)

注2:您也可以避免循环:

x, y, z = train_time.iloc[3::10][['x', 'y', 'z']].T.values

因此代码可能是:

x, y, z = train_time.iloc[3::10][['x', 'y', 'z']].T.values
train_labels[['x', 'y', 'z']] = np.stack([x, y, z], axis=1)

# Or (if you needn't x, y, and z)
train_labels[['x', 'y', 'z']] = train_time.iloc[3::10][['x', 'y', 'z']

相关问题