Problem during a pandas merge: either no data comes out, or the data is simply combined by doubling up the columns

z4iuyo4d  posted on 2023-11-15 in Other
# Here's the first try:

import pandas as pd

# Create a custom function for merging the data together.
# df1, df2 and df3 (income statement, balance sheet, cash flow) are loaded beforehand.
def getXDataMerged():
    print('Income Statement CSV data is(rows, columns): ', df1.shape)
    print('Balance Sheet CSV data is: ', df2.shape)
    print('Cash Flow CSV data is: ', df3.shape)

    # Merge the data together
    result = pd.merge(df1, df2, on=['Ticker', 'SimFinId', 'Currency',
                      'Fiscal Year', 'Fiscal Period', 'Report Date', 'Publish Date'], how='inner')
    result = pd.merge(result, df3, on=['Ticker', 'SimFinId', 'Currency',
                      'Fiscal Year', 'Report Date', 'Publish Date'])
    print('Merged X data matrix shape is: ', result.shape)
    return result

# Use getXDataMerged() to retrieve some data, and then save it to a CSV file named "Annual_Stock_Price_Fundamentals.csv"
X = getXDataMerged()
X.to_csv("Annual_Stock_Price_Fundamentals.csv")

# Output for first try:

Income Statement CSV data is(rows, columns):  (17185, 28)
Balance Sheet CSV data is:  (17185, 30)
Cash Flow CSV data is:  (17185, 28)
Merged X data matrix shape is:  (0, 73)

# Second try (only changed the merging method to 'outer', everything else stays the same):
  
    # Merge the data together
    result = pd.merge(df1, df2, on=['Ticker', 'SimFinId', 'Currency',
                      'Fiscal Year', 'Fiscal Period', 'Report Date', 'Publish Date'], how='outer')    
    result = pd.merge(result, df3, on=['Ticker','SimFinId','Currency',
                    'Fiscal Year','Report Date','Publish Date'])        

# Output for second try:

Income Statement CSV data is(rows, columns):  (17185, 28)
Balance Sheet CSV data is:  (17185, 30)
Cash Flow CSV data is:  (17185, 28)
Merged X data matrix shape is:  (34370, 73)


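I tried merging the data with 'inner', and got no data at all.

I tried merging the data with 'outer', and the data doubled up, but I could not get the rows lined up by merging on their common values.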


a1o7rhls  #1

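I can't see your data, so it is hard to say exactly where the problem is. With how='inner', pd.merge() keeps only the rows whose key values match in both of the frames you are merging. If both frames hold the same keys, 'inner' is always fine. As far as I can tell, your data isn't reproducible here, and the key values differ between the frames you are trying to merge, which is why nothing is left after the merge. For example, if the 'SimFinId' column is not the same in the two frames, no row will match. Try merging on fewer columns.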
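
One way to check this hypothesis is to count, for each key column, how many values the two frames actually share. The sketch below uses small made-up frames (`left`, `right`, and all their values are assumptions for illustration, not the asker's data) in which 'Report Date' is formatted differently on each side, so the inner merge on all keys comes back empty until that key is dropped.

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration.
left = pd.DataFrame({
    "Ticker": ["A", "B"],
    "Fiscal Year": [2020, 2020],
    "Report Date": ["2020-12-31", "2020-12-31"],
    "Revenue": [100, 200],
})
right = pd.DataFrame({
    "Ticker": ["A", "B"],
    "Fiscal Year": [2020, 2020],
    "Report Date": ["31/12/2020", "31/12/2020"],  # same dates, different format
    "Total Assets": [500, 900],
})

keys = ["Ticker", "Fiscal Year", "Report Date"]

# Inner merge on all keys: no row matches on every key, so the result is empty.
empty = pd.merge(left, right, on=keys, how="inner")
print(empty.shape)

# For each key column, count how many left values also occur in the right frame.
overlap = {k: int(left[k].isin(right[k]).sum()) for k in keys}
print(overlap)

# Merge only on the keys that overlap at all.
good_keys = [k for k in keys if overlap[k] > 0]
merged = pd.merge(left, right, on=good_keys, how="inner")
print(merged.shape)
```

A key column with zero overlap is the first place to look for a formatting or dtype difference, for instance dates stored as strings in one file and parsed as datetimes in the other.
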
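With the 'outer' method the story is the same, except that it also returns the rows that did not match. That is why you cannot line them up in any way: every row of the two frames is treated as different.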
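
The `indicator=True` option of pd.merge() makes this visible by adding a `_merge` column that records where each row came from. A minimal sketch, again with invented toy frames in which 'SimFinId' never agrees between the two sides:

```python
import pandas as pd

# Hypothetical toy frames; 'SimFinId' deliberately differs between them.
left = pd.DataFrame({"Ticker": ["A", "B"], "SimFinId": [1, 2],
                     "Revenue": [100, 200]})
right = pd.DataFrame({"Ticker": ["A", "B"], "SimFinId": [10, 20],
                      "Total Assets": [500, 900]})

# Outer merge keeps unmatched rows from both sides; '_merge' labels each row
# as 'left_only', 'right_only' or 'both'.
outer = pd.merge(left, right, on=["Ticker", "SimFinId"],
                 how="outer", indicator=True)

n_both = int((outer["_merge"] == "both").sum())
n_left_only = int((outer["_merge"] == "left_only").sum())
n_right_only = int((outer["_merge"] == "right_only").sum())
print(len(outer), n_both, n_left_only, n_right_only)
```

When no keys match, the outer result has as many rows as both inputs combined, which is the pattern in the question: 17185 + 17185 = 34370.
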
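If all you want is to add one frame onto the other, use the 'cross' option of pd.merge() (note that it returns every combination of rows from the two frames), or check which keys actually match in both frames and merge only on those columns.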
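
Both options can be sketched on toy data (the frames and values below are assumptions for illustration only). The column names shared by the two frames are collected first and used as merge keys; the cross merge is included to show how quickly its row count grows.

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration.
left = pd.DataFrame({"Ticker": ["A", "B"], "Fiscal Year": [2020, 2021],
                     "Revenue": [100, 200]})
right = pd.DataFrame({"Ticker": ["A", "B"], "Fiscal Year": [2020, 2021],
                      "Total Assets": [500, 900]})

# Column names present in both frames, in the left frame's order.
shared = [c for c in left.columns if c in right.columns]
print(shared)

# Merge on the shared columns only.
merged = pd.merge(left, right, on=shared, how="inner")
print(merged.shape)

# A cross merge pairs every left row with every right row (2 x 2 = 4 here).
cross = pd.merge(left, right, how="cross")
print(len(cross))
```

Sharing a column name does not guarantee that the values agree, so it is still worth checking each shared column before using it as a key. On frames of about 17,000 rows each, a cross merge would produce roughly 295 million rows.
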
If you need more help, please add your data so I can take a look.


uemypmqf  #2

My approach:

import pandas as pd

def getXDataMerged(myLocalPath='C:/Users/...'):
    # Read the three CSV files, which use ';' as the delimiter.
    incomeStatementData = pd.read_csv(myLocalPath + 'us-income-annual.csv',
                                      delimiter=';')
    balanceSheetData = pd.read_csv(myLocalPath + 'us-balance-annual.csv',
                                   delimiter=';')
    CashflowData = pd.read_csv(myLocalPath + 'us-cashflow-annual.csv',
                               delimiter=';')
    # Print the shape of each data set.
    print('Income Statement CSV data is(rows, columns): ',
          incomeStatementData.shape)
    print('Balance Sheet CSV data is: ',
          balanceSheetData.shape)
    print('Cash Flow CSV data is: ',
          CashflowData.shape)
    # Merge the first two data sets on the listed key columns and assign to 'result'.
    result = pd.merge(incomeStatementData, balanceSheetData,
                      on=['Ticker', 'SimFinId', 'Currency',
                          'Fiscal Year', 'Report Date', 'Publish Date'])
    # Merge 'result' with the third data set on the same key columns as before.
    result = pd.merge(result, CashflowData,
                      on=['Ticker', 'SimFinId', 'Currency',
                          'Fiscal Year', 'Report Date', 'Publish Date'])

    print('Merged X data matrix shape is: ', result.shape)

    return result

X = getXDataMerged()
X.to_csv("Annual_Stock_Price_Fundamentals.csv")

Output:

Income Statement CSV data is(rows, columns):  (17213, 28)
Balance Sheet CSV data is:  (17213, 30)
Cash Flow CSV data is:  (17213, 28)
Merged X data matrix shape is:  (17213, 74)
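
The 74 columns are consistent with 28 + 30 + 28 minus the six key columns shared in each of the two merges. The sketch below, on invented toy frames, shows the same arithmetic on a small scale, along with how a column that exists in both inputs but is not used as a key ('Fiscal Period' here) comes back twice with `_x` and `_y` suffixes. The `validate="one_to_one"` argument is an optional guard, not part of the answer above, that raises an error if a key combination is duplicated on either side.

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration.
income = pd.DataFrame({"Ticker": ["A", "B"], "Fiscal Year": [2020, 2020],
                       "Fiscal Period": ["FY", "FY"], "Revenue": [100, 200]})
balance = pd.DataFrame({"Ticker": ["A", "B"], "Fiscal Year": [2020, 2020],
                        "Fiscal Period": ["FY", "FY"],
                        "Total Assets": [500, 900]})

keys = ["Ticker", "Fiscal Year"]

# 'Fiscal Period' is in both frames but is not a key, so it is suffixed.
merged = pd.merge(income, balance, on=keys, how="inner",
                  validate="one_to_one")

# Key columns appear once; every other column is kept from both sides.
expected_cols = income.shape[1] + balance.shape[1] - len(keys)
print(merged.shape, expected_cols)
print(list(merged.columns))
```
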
