I'm trying to implement my own kernel regression that is compatible with the sklearn library. My implementation is as follows:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import euclidean_distances
import models.kernel as ker

class MyKerReg(BaseEstimator, RegressorMixin):
    def __init__(self, kernel="gaussian", bandwidth=1.0):
        self.kernel = ker.kernel(kernel)
        self.bandwidth = bandwidth

    def fit(self, X, y):
        X, y = check_X_y(X, y, accept_sparse=True, ensure_2d=False)
        self.is_fitted_ = True
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        X = check_array(X, accept_sparse=True, ensure_2d=False)
        check_is_fitted(self, 'is_fitted_')
        pred = []
        for x in X:
            tmp = [x - v for v in self.X_]
            ker_values = [(1/self.bandwidth)*self.kernel(v/self.bandwidth) for v in tmp]
            ker_values = np.array(ker_values)
            values = np.array(self.y_)
            num = np.dot(ker_values.T, values)
            denom = np.sum(ker_values)
            pred.append(num/denom)
        return pred
When I call predict on its own, everything works fine. When I use this object in cross_val_score, like this...
y, x = misc.data_generating_process(1000)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 44)
kr = ker_reg.MyKerReg(kernel = "gaussian", bandwidth = 0.5)
print(cross_val_score(kr, x_train, y_train, scoring="neg_mean_squared_error", cv=5))
...I get the following error:
Exception has occurred: RuntimeError
Cannot clone object MyKerReg(bandwidth=0.5, kernel=<models.kernel.kernel object at 0x7fab359bc940>), as the constructor either does not set or modifies parameter kernel
During handling of the above exception, another exception occurred:
File "/home/dragos/Projects/ML_Homework/kernel_regression/main.py", line 24, in main
print(cross_val_score(kr, x_train, y_train, scoring="neg_mean_squared_error", cv=5))
File "/home/dragos/Projects/ML_Homework/kernel_regression/main.py", line 85, in <module>
main()
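Does anyone know how to solve this? I know there are similar threads on this topic, but I still can't figure it out. Thanks, everyone.
I have read the documentation and articles on this topic, and it looks like I'm doing everything correctly.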
1 Answer
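The __init__ method should set its parameters as attributes, without renaming or validating them. In your example, self.kernel = ker.kernel(kernel) is the culprit. You can move it to the beginning of fit: keep only self.kernel = kernel in __init__, and put self.kernel_ = ker.kernel(self.kernel) in fit. predict then has to call self.kernel_ instead of self.kernel. From the developer guide: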
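Every keyword argument accepted by __init__ should correspond to an attribute on the instance. Scikit-learn relies on this to find the relevant attributes to set on an estimator when doing model selection. [...]
There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.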
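
For illustration, here is a minimal sketch of the estimator with that change applied. The asker's models.kernel module is not shown in the question, so gaussian_kernel and the KERNELS lookup below are hypothetical stand-ins for ker.kernel; the loop in predict is also replaced by a vectorized version, and the inputs are assumed to be dense 2-D arrays. Treat it as one possible layout under those assumptions, not the asker's actual code.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


def gaussian_kernel(u):
    """Hypothetical stand-in for ker.kernel("gaussian").

    Evaluates a Gaussian kernel on the squared norm taken over the last axis.
    """
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / np.sqrt(2 * np.pi)


# Hypothetical name-to-function table replacing the models.kernel module.
KERNELS = {"gaussian": gaussian_kernel}


class MyKerReg(RegressorMixin, BaseEstimator):
    def __init__(self, kernel="gaussian", bandwidth=1.0):
        # Store the arguments verbatim: no conversion, no validation.
        self.kernel = kernel
        self.bandwidth = bandwidth

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Derived state is built here and gets a trailing underscore.
        self.kernel_ = KERNELS[self.kernel]
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        check_is_fitted(self, "kernel_")
        X = check_array(X)
        # Scaled pairwise differences, shape (n_query, n_train, n_features).
        diffs = (X[:, None, :] - self.X_[None, :, :]) / self.bandwidth
        weights = self.kernel_(diffs) / self.bandwidth
        # Kernel-weighted average of the training targets for each query row.
        return weights @ self.y_ / weights.sum(axis=1)


# Synthetic data, only to exercise clone and cross_val_score.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = np.sin(X[:, 0])

kr = MyKerReg(kernel="gaussian", bandwidth=0.5)
cloned = clone(kr)  # works because __init__ stores its arguments unchanged
scores = cross_val_score(kr, X, y, scoring="neg_mean_squared_error", cv=5)
print(scores)
```

Because get_params now returns exactly what was passed to the constructor, clone can rebuild the estimator for each fold, and bandwidth or kernel can also be tuned with tools such as GridSearchCV.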