pandas: How to subtract column data in Python

kb5ga3dv  posted on 2023-09-29  in Python

I have this code and need to compute [diff = max - min]. I get an error on the last line, where I attempt the subtraction. I used the pd.concat function; should I use merge instead?

import numpy as np
import pandas as pd

ts = ['25/02/2023 0:00', '25/02/2023 0:01', '25/02/2023 0:02', 
      '25/02/2023 0:03', '25/02/2023 0:04', '25/02/2023 0:05', 
      '25/02/2023 0:06', '25/02/2023 0:07', '25/02/2023 0:08', 
      '25/02/2023 0:09', '25/02/2023 0:10', '25/02/2023 0:11', 
      '25/02/2023 0:12', '25/02/2023 0:13', '25/02/2023 0:14', 
      '25/02/2023 0:15', '25/02/2023 0:16', '25/02/2023 0:17', 
      '25/02/2023 0:18', '25/02/2023 0:19', '25/02/2023 0:20', 
      '25/02/2023 0:21', '25/02/2023 0:22', '25/02/2023 0:23', 
      '25/02/2023 0:24', '25/02/2023 0:25', '25/02/2023 0:26', 
      '25/02/2023 0:27', '25/02/2023 0:28', '25/02/2023 0:29', 
      '25/02/2023 0:30', '25/02/2023 0:31', '25/02/2023 0:32', 
      '25/02/2023 0:33', '25/02/2023 0:34', '25/02/2023 0:35', 
      '25/02/2023 0:36', '25/02/2023 0:37', '25/02/2023 0:38', 
      '25/02/2023 0:39', '25/02/2023 0:40', '25/02/2023 0:41', 
      '25/02/2023 0:42', '25/02/2023 0:43', '25/02/2023 0:44', 
      '25/02/2023 0:45', '25/02/2023 0:46', '25/02/2023 0:47', 
      '25/02/2023 0:48', '25/02/2023 0:49', '25/02/2023 0:50', 
      '25/02/2023 0:51', '25/02/2023 0:52', '25/02/2023 0:53', 
      '25/02/2023 0:54', '25/02/2023 0:55', '25/02/2023 0:56', 
      '25/02/2023 0:57', '25/02/2023 0:58', '25/02/2023 0:59', 
      '25/02/2023 1:00']

temp = ['0', '21', '20', '30', '40', '50', '6', '7', '8', '9', 
        '10', '11', '12', '13', '14', '15', '16', '17', '18', 
        '19', '20', '21', '22', '23', '24', '25', '26', '27', 
        '28', '29', '68', '31', '32', '33', '34', '35', '36', 
        '37', '38', '39', '40', '41', '42', '43', '44', '45', 
        '46', '47', '48', '49', '50', '51', '52', '53', '54', 
        '55', '56', '57', '58', '59', '60', '61', '62']

df = pd.DataFrame(list(zip(ts, temp)),
                  columns = ['ts', 'temp'])
df['ts'] = pd.to_datetime(df['ts'])
df1 = df.set_index('ts')
print(df1)

df2 = df1.rolling(1, step=1).agg(['min'])  
df3 = df2[df2.index.minute.isin([0,30])]      
df31 = pd.DataFrame(df3)
df32 = df31.reset_index()

df4 = df1.rolling(1, step=1).agg(['max'])
df5 = df4[df4.index.minute.isin([13,43])]  
df51 = pd.DataFrame(df5)
df52 = df51.reset_index()

df6 = pd.concat([df32, df52], axis=1, join='inner')
df7 = pd.DataFrame(df6)

df7['diff_1'] = df7.apply(lambda x: x['temp max'] - 
x['temp min'], axis=0)
df7

The output is shown below. I want to subtract the min from the max and display the result.

                   ts  temp                  ts  temp
                        min                       max
0 2023-02-25 00:00:00   0.0 2023-02-25 00:13:00  13.0
1 2023-02-25 00:30:00  68.0 2023-02-25 00:43:00  43.0
2 2023-02-25 01:00:00  60.0 2023-02-25 01:13:00  73.0
ryhaxcpt1#

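The error in your code likely stems from the multi-level column index created by the agg(['min']) and agg(['max']) calls. The columns become hierarchical, so a plain string key such as 'temp max' no longer matches any column, which makes further operations (such as subtraction) more awkward.
Instead of pd.concat, I used pd.merge to combine the two DataFrames. I also made sure the temp values are floats so that arithmetic works on them. Finally, I compute the difference directly in the merged DataFrame.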

import pandas as pd

# Sample data
ts = ['25/02/2023 0:00', '25/02/2023 0:01', '25/02/2023 0:12', '25/02/2023 0:13']
temp = ['0', '21', '12', '13']

# Create the DataFrame
df = pd.DataFrame(list(zip(ts, temp)), columns=['ts', 'temp'])
df['ts'] = pd.to_datetime(df['ts'], format='%d/%m/%Y %H:%M')  # Timestamps are day-first, so state the format
df['temp'] = df['temp'].astype(float)  # Convert to float for calculations
df.set_index('ts', inplace=True)

# Rolling calculations for min and max
df_min = df.rolling(window='30min').min()  # 'min' is the non-deprecated spelling of the 'T' alias
df_max = df.rolling(window='30min').max()

# Reset index for merge
df_min.reset_index(inplace=True)
df_max.reset_index(inplace=True)

# Merge the two DataFrames
df_merged = pd.merge(df_min, df_max, on='ts', suffixes=('_min', '_max'))

# Calculate the difference
df_merged['diff'] = df_merged['temp_max'] - df_merged['temp_min']

df_merged
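
To make the hierarchical columns mentioned above concrete, here is a minimal sketch with made-up values (the frame and the names `agg`, `diff_tuple` and `flat` are illustrative and not taken from the question). It shows that passing a list to agg() yields tuple column labels, and two ways to subtract them: selecting by tuple, or flattening the labels first.

```python
import pandas as pd

# Small numeric frame with a DatetimeIndex (made-up values for illustration)
idx = pd.date_range("2023-02-25 00:00", periods=4, freq="min")
df = pd.DataFrame({"temp": [0.0, 21.0, 20.0, 30.0]}, index=idx)

# Passing a list to agg() produces two-level labels such as ('temp', 'min')
agg = df.rolling(2).agg(["min", "max"])
print(agg.columns.tolist())

# Option 1: select each column with a tuple key, not a string like 'temp min'
diff_tuple = agg[("temp", "max")] - agg[("temp", "min")]

# Option 2: flatten the labels, then use plain string keys
flat = agg.copy()
flat.columns = ["_".join(col) for col in flat.columns]
flat["diff"] = flat["temp_max"] - flat["temp_min"]
print(flat)
```

Calling .min() or .max() directly, as in the code above, avoids the extra column level altogether.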

Does this work for you?

import pandas as pd

# Sample data
ts = ['25/02/2023 0:00', '25/02/2023 0:01', '25/02/2023 0:12', '25/02/2023 0:13']
temp = ['0', '21', '12', '13']

# Create DataFrame and set index
df = pd.DataFrame(list(zip(ts, temp)), columns=['ts', 'temp'])
df['ts'] = pd.to_datetime(df['ts'], format='%d/%m/%Y %H:%M')  # Timestamps are day-first, so state the format
df['temp'] = df['temp'].astype(float)  # Convert to float for calculations
df1 = df.set_index('ts')

# Rolling min and filter for specific minutes
df2 = df1.rolling('1min').min()
df3 = df2[df2.index.minute.isin([0, 30])]  # Filtering for minutes 0 and 30
df32 = df3.reset_index()

# Rolling max and filter for specific minutes
df4 = df1.rolling('1min').max()
df5 = df4[df4.index.minute.isin([13, 43])]  # Filtering for minutes 13 and 43
df52 = df5.reset_index()

df6 = pd.concat([df32, df52], axis=1, ignore_index=True)
df6.columns = ['ts_min', 'temp_min', 'ts_max', 'temp_max']  # Distinct names for the two timestamp columns

# Calculate the difference
df6['diff'] = df6['temp_max'] - df6['temp_min']  # Column name changed to 'diff'

df6
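
The concat above pairs rows purely by position, so it only lines up when both filtered frames have the same number of rows in the same order. As a hedged alternative, the sketch below keys each reading by the half-hour block it falls in and lets pandas align on that key. It assumes the goal is to compare the reading at minute 0/30 with the reading at minute 13/43 of the same half hour; the data and the names `start`, `later`, `temp_start` and `temp_later` are made up for illustration.

```python
import pandas as pd

# Made-up minute-by-minute readings covering one hour
idx = pd.date_range("2023-02-25 00:00", periods=60, freq="min")
df = pd.DataFrame({"temp": [float(i) for i in range(60)]}, index=idx)

# Readings taken at minutes 0/30 and at minutes 13/43
start = df[df.index.minute.isin([0, 30])]
later = df[df.index.minute.isin([13, 43])]

# Key both sets by the start of the half-hour block they belong to
start_by_block = start["temp"].groupby(start.index.floor("30min")).first()
later_by_block = later["temp"].groupby(later.index.floor("30min")).first()

# Building a frame from the two Series aligns them on the shared block key
result = pd.DataFrame({"temp_start": start_by_block, "temp_later": later_by_block})
result["diff"] = result["temp_later"] - result["temp_start"]
print(result)
```

If a half hour has a reading in one set but not the other, the corresponding diff comes out as NaN rather than being silently paired with the wrong row.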
