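In the example DataFrame below, the "Label" column contains three unique labels: "A", "B", and "C".
I would like to use linear regression to predict the future value of "Value1" for each of "A", "B", and "C" when "Value2" is 65000000.
**Example DataFrame**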
import pandas as pd
data = {'Label': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'],
'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840',
'1672972200','1672963800','1672966140', '1672967760','1672969020',
'1672970520', '1672971360','1672963200','1672964700','1672966260',
'1672967820', '1672969980', '1672971180'],
'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130',
'59598560','59603190','59605060','59611320','59628900','59630950',
'58047810','58049680','58051550','58058460','58068740','58088280']}
df = pd.DataFrame(data)
print(df)
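I can predict the future value of "Value1" when "A" is the only label in the DataFrame **(see below)**.
However, I am having trouble applying this approach to the example DataFrame. Is there a simple way to modify this code so that it predicts "Value1" for every label found in the example DataFrame?
Desired output: the predicted Value1 for each label, e.g. A = "X", B = "Y", C = "Z", and so on.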
import numpy as np
import pandas as pd

data = {'Label': ['A','A','A','A','A','A'],
'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840', '1672972200'],
'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130']}
# Create dataframe using data
df = pd.DataFrame(data)
# Convert Value1 and Value2 from object (string) to int64
df["Value1"] = df["Value1"].astype("int64")
df["Value2"] = df["Value2"].astype("int64")
# Calc means for x (Value1) and y (Value2) respectively
xmean = np.mean(df["Value1"])
ymean = np.mean(df["Value2"])
# Calc numerator and denominator of beta
df["xyCov"] = (df["Value1"] - xmean) * (df["Value2"] - ymean)
df["xVar"] = (df["Value1"] - xmean) ** 2
# Calc beta and alpha for the fit Value2 = alpha + beta * Value1
beta = df["xyCov"].sum() / df["xVar"].sum()
alpha = ymean - (beta * xmean)
# Solve the fitted line for Value1 (a timestamp) at Value2 = 65000000
Predicted_Value1 = (65000000 - alpha) / beta
# Print the predicted timestamp
print("Future A value", Predicted_Value1)
1 answer
Here is one way to handle the example DataFrame using pandas groupby and Python f-strings:
Which outputs:
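The code and output for this answer are not shown here. As a rough sketch of the groupby and f-string approach it describes (not necessarily the answerer's actual code), the question's single-label calculation could be wrapped in a helper and run once per label. The names `predict_value1`, `TARGET_VALUE2`, and `predictions` are made up for illustration, and the fit assumes the same model as the question, Value2 = alpha + beta * Value1:

```python
import pandas as pd

data = {'Label': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'],
        'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840',
                   '1672972200','1672963800','1672966140','1672967760','1672969020',
                   '1672970520','1672971360','1672963200','1672964700','1672966260',
                   '1672967820','1672969980','1672971180'],
        'Value2': ['54727520','54729380','54740070','54744720','54775410','54779130',
                   '59598560','59603190','59605060','59611320','59628900','59630950',
                   '58047810','58049680','58051550','58058460','58068740','58088280']}

# Convert the string columns to integers before doing any arithmetic
df = pd.DataFrame(data).astype({"Value1": "int64", "Value2": "int64"})

TARGET_VALUE2 = 65_000_000  # hypothetical name for the Value2 level from the question


def predict_value1(group: pd.DataFrame, target: float = TARGET_VALUE2) -> float:
    """Fit Value2 = alpha + beta * Value1 on one label's rows, then solve for Value1."""
    x = group["Value1"].astype("float64")
    y = group["Value2"].astype("float64")
    x_dev = x - x.mean()
    # Least-squares slope: covariance of x and y over variance of x
    beta = (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()
    alpha = y.mean() - beta * x.mean()
    return (target - alpha) / beta


# Iterating over a groupby yields (label, sub-DataFrame) pairs
predictions = {label: predict_value1(group) for label, group in df.groupby("Label")}

for label, value in predictions.items():
    print(f"Future {label} value: {value:.0f}")
```

Because the helper only sees one label's rows at a time, any new label added to the DataFrame is picked up without changing the code.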