Merging datasets with Pandas

wi3ka0sx  posted on 2022-11-27  in Other

Below is the code I was given to join two datasets.

import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

df= pd.read_csv("student/student-por.csv")
ds= pd.read_csv("student/student-mat.csv")

print("before merge")

print(df)
print(ds)

print("After merging:")

dq = pd.merge(df,ds,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))

print(dq)

I get this error:

Traceback (most recent call last):
  File "/Users/PycharmProjects/datamining/main.py", line 15, in <module>
    dq = pd.merge(df, ds,by=c ("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
NameError: name 'c' is not defined

Any help would be great. I have been trying for a while, and I believe 'by=c' is the problem.
Thanks

bn31dyow  #1

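Hi 👋🏻 Hope you're doing well!
The error occurs because c(...) is called inside the arguments to merge, and Python has no built-in function named c (it is R's vector constructor, and by=c(...) is how R's merge is written). The pandas merge function also has a different signature: it has no by parameter. The equivalent is on, which takes a column name or a list of column names 🙂. Putting that together, it should look something like this: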

import pandas as pd

df = pd.read_csv("student/student-por.csv")
ds = pd.read_csv("student/student-mat.csv")

print("Before merge.")
print(df)
print(ds)

print("After merge.")
dq = pd.merge(
    left=df,
    right=ds,
    on=[
        "school",
        "sex",
        "age",
        "address",
        "famsize",
        "Pstatus",
        "Medu",
        "Fedu",
        "Mjob",
        "Fjob",
        "reason",
        "nursery",
        "internet",
    ],
)
print(dq)

Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
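
If you want to try this without the CSV files, here is a minimal sketch using two tiny made-up frames. The row values, the shortened key list, and the G3 column are assumptions for illustration only and are not taken from the question's data. Because both files may share column names outside the join keys, pandas appends _x and _y to the overlapping ones by default; the suffixes argument of merge lets you pick clearer names.

```python
import pandas as pd

# Toy stand-ins for student-por.csv and student-mat.csv.
# All values here are invented for illustration.
por = pd.DataFrame({
    "school": ["GP", "GP", "MS"],
    "sex": ["F", "M", "F"],
    "age": [18, 17, 16],
    "G3": [11, 12, 14],  # hypothetical non-key column present in both frames
})
mat = pd.DataFrame({
    "school": ["GP", "MS", "MS"],
    "sex": ["F", "F", "M"],
    "age": [18, 16, 17],
    "G3": [6, 10, 15],
})

# A plain Python list replaces R's c(...), and `on` replaces `by`.
keys = ["school", "sex", "age"]

# The default is an inner join: only rows whose key values appear
# in both frames are kept. `suffixes` labels the overlapping G3 column.
merged = pd.merge(por, mat, on=keys, suffixes=("_por", "_mat"))
print(merged)
```

With these toy frames only the two key combinations present on both sides survive the merge. If you need to keep unmatched rows as well, passing how="outer" (or "left" / "right") to merge is the usual way to do it.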
