pandas: Using linear regression to find future values for unique column entries in a DataFrame

f0ofjuux  posted on 2023-01-15  in Other

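In the example DataFrame below, the "Label" column contains three unique labels: "A", "B" and "C".
I would like to use linear regression to predict the future value of "Value1" for each of "A", "B" and "C" at the point where "Value2" reaches 65000000.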

**Example DataFrame**

import pandas as pd
data = {'Label': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'],
        'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840',
                   '1672972200','1672963800','1672966140', '1672967760','1672969020',
                   '1672970520', '1672971360','1672963200','1672964700','1672966260',
                   '1672967820', '1672969980', '1672971180'],
        'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130',
                   '59598560','59603190','59605060','59611320','59628900','59630950',
                   '58047810','58049680','58051550','58058460','58068740','58088280']}
df=pd.DataFrame(data)
print(df)

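When "A" is the only label in the DataFrame, I can predict the future value of "Value1" **(see below)**.
However, I am having trouble applying this approach to the example DataFrame. Is there a simple way to modify this code so that it predicts "Value1" for every label found in the example DataFrame?
Desired output: the predicted Value1 for A = "X", B = "Y", C = "Z", and so on.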

import numpy as np
import pandas as pd

data = {'Label': ['A','A','A','A','A','A'],
        'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840', '1672972200'],
        'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130']}

# Create dataframe using data
df = pd.DataFrame(data)
# Convert Value1 and Value2 from object (string) to int64
df["Value1"] = df["Value1"].astype("int64")
df["Value2"] = df["Value2"].astype("int64")
# Calc means for x (Value1) and y (Value2) respectively
xmean = np.mean(df["Value1"])
ymean = np.mean(df["Value2"])
# Calc numerator and denominator of beta
df["xyCov"] = (df["Value1"] - xmean) * (df["Value2"] - ymean)
df["xVar"] = (df["Value1"] - xmean) ** 2  # variance of x, so Value1 (not Value2)
# Calc beta (slope) and alpha (intercept) of Value2 = alpha + beta * Value1
beta = df["xyCov"].sum() / df["xVar"].sum()
alpha = ymean - (beta * xmean)
# Solve for the Value1 timestamp at which Value2 reaches 65000000
Predicted_Value1 = (65000000 - alpha) / beta
print("Future A value", Predicted_Value1)
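
For reference, the quantities the snippet above is meant to compute are the ordinary least-squares estimates for the line Value2 = α + β · Value1, which is then solved for Value1 at the target Value2. Here x stands for Value1, y for Value2 and y* for the target (65000000); these symbols are notation introduced for this note and do not appear in the code:

```latex
\beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\alpha = \bar{y} - \beta\,\bar{x},
\qquad
x^{*} = \frac{y^{*} - \alpha}{\beta}
```

Note that the denominator of β is the sum of squared deviations of x (Value1) from its own mean.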

rqqzpn5f1#

Here is one way to handle the example DataFrame, using pandas groupby and Python f-strings:

import numpy as np

# df is the full example DataFrame (labels A, B and C) from the question
for label, df_ in df.groupby("Label"):
    df_ = df_.copy()  # work on a copy to avoid SettingWithCopyWarning

    # Convert Value1 and Value2 from object (string) to int64
    df_["Value1"] = df_["Value1"].astype("int64")
    df_["Value2"] = df_["Value2"].astype("int64")

    # Calc means for x (Value1) and y (Value2) respectively
    xmean = np.mean(df_["Value1"])
    ymean = np.mean(df_["Value2"])

    # Calc numerator and denominator of beta
    df_["xyCov"] = (df_["Value1"] - xmean) * (df_["Value2"] - ymean)
    df_["xVar"] = (df_["Value1"] - xmean) ** 2  # variance of x, so Value1 (not Value2)

    # Calc beta (slope) and alpha (intercept)
    beta = df_["xyCov"].sum() / df_["xVar"].sum()
    alpha = ymean - (beta * xmean)

    # Solve for the Value1 timestamp at which Value2 reaches 65000000
    Predicted_Value1 = (65000000 - alpha) / beta

    print(f"Future {label} value: {Predicted_Value1:.0f}")

Output:

Future A value: 1674406699
Future B value: 1674162081
Future C value: 1674462957
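
As a variation on the loop above, the per-label fit can be wrapped in a small function that returns all predictions at once. This is only a sketch: `predict_value1` and its parameter names are hypothetical, and it assumes that Value1 holds Unix timestamps in seconds (as the comments in the question suggest). It works with deviations from the means throughout, which avoids forming the very large intercept explicitly.

```python
import pandas as pd

def predict_value1(df: pd.DataFrame, target_value2: float) -> pd.Series:
    """Fit Value2 = alpha + beta * Value1 per label (least squares) and
    return the Value1 at which the fitted line reaches target_value2."""
    predictions = {}
    for label, group in df.groupby("Label"):
        x = group["Value1"].astype("int64")
        y = group["Value2"].astype("int64")
        dx = x - x.mean()                      # deviations of Value1
        dy = y - y.mean()                      # deviations of Value2
        beta = (dx * dy).sum() / (dx ** 2).sum()
        # Same as (target - alpha) / beta, written without alpha
        predictions[label] = x.mean() + (target_value2 - y.mean()) / beta
    return pd.Series(predictions, name="Predicted_Value1")

data = {'Label': ['A'] * 6 + ['B'] * 6 + ['C'] * 6,
        'Value1': ['1672964520', '1672966620', '1672967460', '1672969380', '1672971840',
                   '1672972200', '1672963800', '1672966140', '1672967760', '1672969020',
                   '1672970520', '1672971360', '1672963200', '1672964700', '1672966260',
                   '1672967820', '1672969980', '1672971180'],
        'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130',
                   '59598560', '59603190', '59605060', '59611320', '59628900', '59630950',
                   '58047810', '58049680', '58051550', '58058460', '58068740', '58088280']}
df = pd.DataFrame(data)

predicted = predict_value1(df, 65_000_000)
print(predicted)
# Assumption: Value1 is a Unix timestamp in seconds
print(pd.to_datetime(predicted.round(), unit="s"))
```

Returning a Series indexed by label makes it easy to join the predictions back onto other per-label data or to convert them to datetimes in one step.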
