SciPy time-series linear regression results do not match Excel's linear regression

kuuvgm7e  posted on 2023-10-20  in  Other

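I built a linear regression model for a time-series dataset using seaborn and sklearn. Both models (seaborn and sklearn) output the same slope and intercept for a simple linear model y = mx + b. The slope matches the Excel result. However, the intercept from both Python approaches, -35874.5873, is very different from the intercept Excel reports, -1404.3.
Is a setting in my Python code wrong? Or are the models computed differently?
Here is the data from Excel.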

Date         Bicarbonate
1/1/2002     446
4/1/2002     450
7/1/2002     454
10/1/2002    483
1/1/2003     457
4/1/2003     465
7/1/2003     465
10/1/2003    474
1/1/2004     495

The Python script is as follows:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats  # import the stats submodule explicitly so scipy.stats.linregress resolves
from sklearn import linear_model

df = pd.read_excel(r'TestData.xls')
print(df)

Bicarbonate = df['Bicarbonate']
Date = df['Date']
DateO = df['Date'].apply(lambda x: x.toordinal())
df['DateO'] = DateO
print(DateO)

# Plotting the regression model with seaborn
ax1 = sns.regplot(x = 'DateO', y = 'Bicarbonate', data=df, color='magenta', label='Linear Model', ci =None, scatter=True)

# calculate slope and intercept of regression equation.
slope, intercept, r, p, se = scipy.stats.linregress(x=ax1.get_lines()[0].get_xdata(),
                                                       y=ax1.get_lines()[0].get_ydata())

print(slope)
print(intercept)
print(p)

# Linear Regression with sklearn.
x = df['DateO'].values.reshape(-1, 1)
y = df['Bicarbonate'].values.reshape(-1,1)
model = linear_model.LinearRegression().fit(x,y)
print('intercept:', model.intercept_)
print('slope:', model.coef_)

Answer #1 (6rqinv9w)

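Excel and Python give different intercepts because the two programs number dates differently.
The intercept of a regression on a date axis has no intrinsic meaning; it depends entirely on which date is defined as 0.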
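
To make the dependence on the zero point explicit, here is a short derivation. The symbols c, x' and b' are introduced here for illustration and do not appear in the original answer; c stands for the number of days between the two zero points.

```latex
\begin{align*}
x' &= x - c \\
y  &= m x + b = m (x' + c) + b = m x' + (b + m c) \\
b' &= b + m c
\end{align*}
```

Shifting the day numbering by c days therefore leaves the slope m unchanged and moves the intercept by m c. For the question's nine data points the slope works out to about 0.0497 per day, so the gap of roughly 34470.3 between -35874.5873 and -1404.3 corresponds to a shift of about 693,600 days, or close to 1,899 years.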

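In Excel:

With the default 1900 date system, dates are stored as serial numbers: 1900/01/01 is 1, and every later date is numbered from there. Because Excel also counts a nonexistent 1900/02/29, the serial number of any date from March 1900 onward equals the number of days since 1899/12/30.

In Python:

toordinal() from the datetime module, which you used for the conversion, counts from 0001/01/01 as day 1.
If you really want the two intercepts to line up (not sure why you would), subtract 693594 (the ordinal of 1899/12/30) from your ordinal dates.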
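
As a rough check of the reasoning above, the sketch below fits the question's nine data points twice: once against Python ordinals and once against Excel-style serial numbers. It assumes Excel's default 1900 date system (serial number = days since 1899-12-30 for dates after February 1900). The `ols` helper and the constant name `EXCEL_EPOCH_ORDINAL` are made up for this illustration, and only the standard library is used so the arithmetic stays visible.

```python
from datetime import date

# Data from the question (dates are month/day/year).
dates = [date(2002, 1, 1), date(2002, 4, 1), date(2002, 7, 1),
         date(2002, 10, 1), date(2003, 1, 1), date(2003, 4, 1),
         date(2003, 7, 1), date(2003, 10, 1), date(2004, 1, 1)]
bicarbonate = [446, 450, 454, 483, 457, 465, 465, 474, 495]

# Assumption: Excel's default 1900 date system, where the serial number of a
# date after February 1900 is the number of days since 1899-12-30.
EXCEL_EPOCH_ORDINAL = date(1899, 12, 30).toordinal()


def ols(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Same dates, two different zero points for the x axis.
x_ordinal = [d.toordinal() for d in dates]
x_excel = [o - EXCEL_EPOCH_ORDINAL for o in x_ordinal]

slope_ord, intercept_ord = ols(x_ordinal, bicarbonate)
slope_xl, intercept_xl = ols(x_excel, bicarbonate)

print(f"ordinal x: slope={slope_ord:.6f}, intercept={intercept_ord:.4f}")
print(f"Excel x:   slope={slope_xl:.6f}, intercept={intercept_xl:.4f}")
```

Both fits should report the same slope, with the ordinal intercept near the question's -35874.5873 and the serial-number intercept near Excel's -1404.3. The same shift can be applied to the `DateO` column in the question's script before fitting.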

Answer #2 (yrdbyhpb)

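  • As shown in How to plot a regression line on a timeseries line plot.
  • Predict with model.predict(x_test), using x_test = np.arange(0, ax1.get_xlim()[1]).reshape(-1, 1), and plot the result to see where the line crosses the y-axis at x = 0.
  • The 'Date' data has been converted to ordinals, and ordinal date 1 corresponds to '0001-01-01 00:00:00'. Since that is ≈1971 years before the earliest date in the data, the intercept makes sense, especially given the gentle slope of 0.00243.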
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats  # import the stats submodule explicitly so scipy.stats.linregress resolves
from sklearn import linear_model
import numpy as np
from datetime import datetime
import yfinance as yf  # for sample data; conda install -c conda-forge yfinance

# download sample data
df = pd.concat((yf.download(ticker, start='1970-02-01', end='2023-09-06').assign(tkr=ticker) for ticker in ['pfe']), ignore_index=False).reset_index()

Bicarbonate = df['Open']
Date = df['Date']
df['DateO'] = df['Date'].apply(lambda x: x.toordinal())

# Plotting the regression model with seaborn
fig, ax1 = plt.subplots(figsize=(12, 8))
sns.regplot(x = 'DateO', y = 'Open', data=df, color='magenta', label='Linear Model', ci =None, scatter=True, ax=ax1)

# calculate slope and intercept of regression equation.
slope, intercept, r, p, se = scipy.stats.linregress(x=ax1.get_lines()[0].get_xdata(), y=ax1.get_lines()[0].get_ydata())

print('scipy intercept:', intercept)
print('scipy slope:', slope)

# Linear Regression with sklearn.
x = df['DateO'].values.reshape(-1, 1)
y = df['Open'].values.reshape(-1,1)
model = linear_model.LinearRegression().fit(x,y)
print('sklearn intercept:', model.intercept_)
print('sklearn slope:', model.coef_)

# test data with x to 0
x_test = np.arange(0, ax1.get_xlim()[1]).reshape(-1, 1)

# predicted y values
y_pred = model.predict(x_test)

# plot
ax1.plot(x_test, y_pred)

fig.suptitle(f'At x=0 the linear model crosses y at {round(intercept)}')

ax1.margins(0)

Output:
scipy intercept: -1757.1923682739996
scipy slope: 0.002432274440853285
sklearn intercept: [-1757.19236827]
sklearn slope: [[0.00243227]]

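  • To move the zero point next to the data, measure the dates as elapsed time since 1970-01-01 with .total_seconds(). The intercept shrinks accordingly. Note that x is now in seconds, so the slope is per second, and that an exact match with Excel needs days counted from 1899-12-30 instead.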
# download sample data
df = pd.concat((yf.download(ticker, start='1970-02-01', end='2023-09-06').assign(tkr=ticker) for ticker in ['pfe']), ignore_index=False).reset_index()

Bicarbonate = df['Open']
Date = df['Date']

# total seconds from 1970
df['DateO'] = (df['Date'] - datetime.fromisoformat('1970-01-01')).dt.total_seconds()

# Plotting the regression model with seaborn
fig, ax1 = plt.subplots(figsize=(12, 8))
sns.regplot(x = 'DateO', y = 'Open', data=df, color='magenta', label='Linear Model', ci =None, scatter=True, ax=ax1)

# calculate slope and intercept of regression equation.
slope, intercept, r, p, se = scipy.stats.linregress(x=ax1.get_lines()[0].get_xdata(), y=ax1.get_lines()[0].get_ydata())

print('scipy intercept:', intercept)
print('scipy slope:', slope)

# Linear Regression with sklearn.
x = df['DateO'].values.reshape(-1, 1)
y = df['Open'].values.reshape(-1, 1)
model = linear_model.LinearRegression().fit(x,y)
print('sklearn intercept:', model.intercept_)
print('sklearn slope:', model.coef_)

# test data with x to 0
x_test = np.arange(0, df['DateO'].max(), 10000).reshape(-1, 1)

# predicted y values
y_pred = model.predict(x_test)

# plot
ax1.plot(x_test, y_pred)

fig.suptitle(f'At x=0 the linear model crosses y at {round(intercept)}')

ax1.margins(0)

plt.show()

Output:
scipy intercept: -7.990584566627653
scipy slope: 2.815132454691304e-08
sklearn intercept: [-7.99058457]
sklearn slope: [[2.81513245e-08]]
