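I am trying to replicate the paper Recommendation for Effective Standardized Exam Preparation, in which the authors use a special term called the correctness probability function, described as:
For the correctness probability function p, we use the matrix factorization model introduced in this paper. Using the users' question-response sequences, we find a factorization X = L * R_Transpose that minimizes the binary cross-entropy (BCE) loss with Frobenius norm regularization, where L = (L_uj) represents student u's understanding of hidden concept j, and R = (R_qj) represents the contribution of hidden concept j to question q. The entry X = (X_uq) represents student u's understanding of question q, and the probability of a correct answer, p(q|I_u) = P_uq, is computed from X_uq based on a variation of the M2PL latent trait model in IRT given in this paper.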
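To make the shapes in that description concrete, here is a minimal sketch of the factorization itself. The sizes and the random values are placeholders chosen only for illustration; they are not taken from the paper.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
n_students, n_questions, n_concepts = 4, 6, 3

L = torch.rand(n_students, n_concepts)   # L[u, j]: student u's understanding of concept j
R = torch.rand(n_questions, n_concepts)  # R[q, j]: contribution of concept j to question q

X = L @ R.T                              # X[u, q]: student u's understanding of question q

# Each entry of X is the dot product of one student row and one question row.
u, q = 1, 2
x_uq = torch.dot(L[u], R[q])
```

So X has one row per student and one column per question, and X_uq is the sum over concepts j of L_uj * R_qj.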
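They then go on to define the exact procedure, as shown in the figure below:
They use a modified version of the sigmoid called phi, which takes 3 extra parameters: phi_a = 0.25, phi_b = 0.5, phi_c = 10. My implementation of phi is as follows: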
# Modified sigmoid function
# Defaults are literals: default arguments are evaluated when the function is defined,
# so they cannot refer to constants that are only assigned later in the script.
def phi(x, phi_a=0.25, phi_b=0.5, phi_c=10):
    '''
    Map an understanding score x to the probability of answering the question correctly.
    According to the paper, the custom sigmoid phi is:
    φ(x) = φ_a + (1 - φ_a) / (1 + e^(-φ_c (x - φ_b))), with (φ_a, φ_b, φ_c) = (0.25, 0.5, 10)
    '''
    return phi_a + (1 - phi_a) / (1 + torch.exp(-phi_c * (x - phi_b)))
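A quick way to sanity-check phi is to evaluate it at a few points. The sketch below redefines the function with the constants quoted above so that it runs on its own; the test inputs are arbitrary.

```python
import torch

def phi(x, phi_a=0.25, phi_b=0.5, phi_c=10.0):
    """Modified sigmoid: phi_a + (1 - phi_a) / (1 + exp(-phi_c * (x - phi_b)))."""
    return phi_a + (1 - phi_a) / (1 + torch.exp(-phi_c * (x - phi_b)))

x = torch.tensor([0.0, 0.5, 1.0])
p = phi(x)

# At x = phi_b the exponential is exp(0) = 1, so phi = 0.25 + 0.75 / 2 = 0.625.
# Far below phi_b the output approaches phi_a; far above it approaches 1.
low = phi(torch.tensor(-100.0))
high = phi(torch.tensor(100.0))
```

The output therefore never drops below phi_a = 0.25 and never exceeds 1, with the steepest part of the curve centered at x = phi_b = 0.5.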
They have a custom optimization problem, which I will call the custom loss, and which I defined as follows:
# Custom loss function
def custom_loss(Y_true, Y_pred, L, R, mu):
    '''
    In the paper, it is given as:
    BCE loss + (mu / 2) * (squared Frobenius norm of L + squared Frobenius norm of R)
    Since Frobenius_norm(L) = √(Σ (L_ij)^2), squaring it removes the square root, leaving Σ (L_ij)^2.
    '''
    bce_loss = torch.nn.BCELoss()(Y_pred, Y_true)
    frobenius_norm_L = torch.sum(L ** 2)  # squared Frobenius norm of L
    frobenius_norm_R = torch.sum(R ** 2)  # squared Frobenius norm of R
    reg_loss = (mu / 2) * (frobenius_norm_L + frobenius_norm_R)
    return bce_loss + reg_loss
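Two details of this loss can be checked numerically. The sketch below uses arbitrary random tensors and confirms that the sum of squared entries equals the squared Frobenius norm. It also shows that torch.nn.BCELoss averages over all entries by default, whereas the regularization term is a sum. Whether the paper averages or sums the BCE term is not stated here, so the relative weight of mu is an assumption worth keeping in mind.

```python
import torch

torch.manual_seed(0)
L = torch.rand(5, 3)

# Sum of squared entries versus the squared Frobenius norm.
sum_of_squares = torch.sum(L ** 2)
fro_squared = torch.linalg.matrix_norm(L, ord="fro") ** 2

# BCELoss reduction: "mean" (the default) versus "sum".
Y_true = torch.randint(0, 2, (5, 4)).float()
Y_pred = torch.rand(5, 4).clamp(0.01, 0.99)
bce_mean = torch.nn.BCELoss(reduction="mean")(Y_pred, Y_true)
bce_sum = torch.nn.BCELoss(reduction="sum")(Y_pred, Y_true)
```

With the default mean reduction the BCE term is smaller than the summed version by a factor of n_students * n_questions, so the same mu regularizes much more strongly.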
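Some conditions are given in the image below:
1. 0 <= L_uj <= 1: all entries of the Student - Latent Topic matrix L lie in [0, 1].
2. 0 <= R_qj <= 1: all entries of the Latent Topic - Question matrix R lie in [0, 1].
3. The elements R_qj in each row must sum to 1. That is, if each question is represented by n_topics values, those n_topics values sum to exactly 1 for each question (like a softmax).
4. The paper solves the matrix factorization problem with stochastic gradient descent (SGD).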
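One simple way to enforce these conditions after each SGD step is to clamp and then renormalize. The sketch below is a heuristic, not an exact Euclidean projection and not necessarily what the paper does. The helper name project_constraints and the uniform fallback for all-zero rows are assumptions made for illustration. Clamping before normalizing matters: dividing non-negative entries by their row sum keeps them in [0, 1], whereas clamping after normalizing can break the row sums.

```python
import torch

def project_constraints(L, R, eps=1e-12):
    """Clamp L and R to [0, 1] in place, then make each row of R sum to 1 (hypothetical helper)."""
    with torch.no_grad():
        L.clamp_(0.0, 1.0)
        R.clamp_(0.0, 1.0)
        row_sum = R.sum(dim=1, keepdim=True)
        uniform = torch.full_like(R, 1.0 / R.shape[1])
        # Rows that were clamped to all zeros fall back to a uniform row (an assumption).
        R.copy_(torch.where(row_sum <= eps, uniform, R / row_sum.clamp_min(eps)))
    return L, R

torch.manual_seed(0)
L = torch.randn(4, 3, requires_grad=True)  # deliberately outside [0, 1]
R = torch.randn(6, 3, requires_grad=True)
project_constraints(L, R)

X = (L @ R.T).detach()
```

Once the conditions hold, each X_uq is a weighted average of the entries L_uj with weights R_qj, so X_uq itself lies in [0, 1]. That is consistent with phi being centered at phi_b = 0.5.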
Then somehow (by hook or by crook, and with a lot of help), I got to the point where I came up with this solution:
import numpy as np
import torch

# Dummy numbers
n_students = 50
n_questions = 700
n_concepts = 15
# Hyperparameters
n_epochs = 10
learning_rate = 0.01
mu = 0.1 # regularization
# Given constants for the phi() function
phi_a, phi_b, phi_c = 0.25, 0.5, 10
# ---------------------------------------------------
# Generate random student responses (1 if correct, 0 otherwise); BCELoss needs a float tensor
Y = torch.tensor(np.random.randint(0, 2, (n_students, n_questions)), dtype=torch.float32)
# Initialize L and R randomly as leaf tensors that require gradients, so SGD can update them
L = torch.rand(n_students, n_concepts, requires_grad=True)
R = torch.rand(n_questions, n_concepts)
R = (R / R.sum(dim=1, keepdim=True)).requires_grad_() # start with rows of R summing to 1
optimizer = torch.optim.SGD([L, R], lr=learning_rate) # SGD as given
# -------------------------------------------------------------
# Training loop
for epoch in range(n_epochs):
    optimizer.zero_grad()
    X = torch.matmul(L, torch.transpose(R, 0, 1))
    Y_pred = phi(X) # predicted score
    loss = custom_loss(Y, Y_pred, L, R, mu)
    loss.backward()
    optimizer.step()
    # Enforce the 3 conditions without tracking gradients
    with torch.no_grad():
        L.clamp_(min=0, max=1) # 0 ≤ L[u, j] ≤ 1
        R.clamp_(min=0, max=1) # 0 ≤ R[q, j] ≤ 1
        # Normalize rows of R to sum to 1; done after clamping so entries stay in [0, 1]
        R.div_(R.sum(dim=1, keepdim=True).clamp_min(1e-12))
# Calculate the optimized understanding matrix X_opt (n_students x n_questions)
X_opt = torch.matmul(L, torch.transpose(R, 0, 1)).detach()
# --- Evaluate --------------------------------
# Calculate the probability score for a student u and question q using the modified sigmoid function
u, q = 0, 4
probability_score = phi(X_opt[u, q]) # Given in the formula
print(f"Probability score for student {u} and question {q}: {probability_score.item()}")
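An alternative to fixing up L and R after every step is to reparameterize them so the conditions hold by construction. This is not the paper's procedure; it is a swapped-in technique, and the names A and B as well as the small sizes and hyperparameters below are assumptions for illustration. L is obtained as a sigmoid of unconstrained parameters and R as a row-wise softmax, so plain SGD can be used without any clamping.

```python
import torch

def phi(x, phi_a=0.25, phi_b=0.5, phi_c=10.0):
    return phi_a + (1 - phi_a) / (1 + torch.exp(-phi_c * (x - phi_b)))

torch.manual_seed(0)
n_students, n_questions, n_concepts = 20, 30, 5  # small dummy sizes
mu, learning_rate, n_epochs = 0.1, 0.1, 20

Y = torch.randint(0, 2, (n_students, n_questions)).float()  # dummy responses

# Unconstrained parameters (hypothetical names); L and R are derived from them.
A = torch.randn(n_students, n_concepts, requires_grad=True)
B = torch.randn(n_questions, n_concepts, requires_grad=True)
optimizer = torch.optim.SGD([A, B], lr=learning_rate)

losses = []
for epoch in range(n_epochs):
    optimizer.zero_grad()
    L = torch.sigmoid(A)         # every entry lies in (0, 1)
    R = torch.softmax(B, dim=1)  # every entry lies in (0, 1) and each row sums to 1
    Y_pred = phi(L @ R.T)
    bce = torch.nn.functional.binary_cross_entropy(Y_pred, Y)
    loss = bce + (mu / 2) * (L.pow(2).sum() + R.pow(2).sum())
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

L_final = torch.sigmoid(A).detach()
R_final = torch.softmax(B, dim=1).detach()
```

The trade-off is that entries can only approach 0 or 1, never reach them exactly, and the optimization landscape differs from the clamped version, so results need not match.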
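The problem now is that I don't know whether this is the correct solution and whether it gives the results it is supposed to.
PS: If there is any other way to solve this, please do let me know. I just want to replicate this paper using any library.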
1 Answer
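Everything you have done seems reasonable.
The missing piece is synthetic data generation: randomize L and R, obtain Y_train from them, and then generate Y_validation in the same way. Y_train is used in the training loop, and Y_validation is used to evaluate the final performance.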
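The answer does not spell out how the labels are obtained, so the following is only one plausible reading: draw ground-truth matrices that satisfy the constraints, turn them into probabilities with phi, and sample Y_train and Y_validation as independent Bernoulli draws from those same probabilities. The function names make_synthetic and validation_bce are hypothetical.

```python
import torch

def phi(x, phi_a=0.25, phi_b=0.5, phi_c=10.0):
    return phi_a + (1 - phi_a) / (1 + torch.exp(-phi_c * (x - phi_b)))

def make_synthetic(n_students, n_questions, n_concepts, seed=0):
    """Sample ground-truth L and R, then two independent response matrices (hypothetical helper)."""
    g = torch.Generator().manual_seed(seed)
    L_true = torch.rand(n_students, n_concepts, generator=g)   # entries in [0, 1)
    R_true = torch.rand(n_questions, n_concepts, generator=g)
    R_true = R_true / R_true.sum(dim=1, keepdim=True)          # rows sum to 1
    P = phi(L_true @ R_true.T)                                 # true probability of a correct answer
    Y_train = torch.bernoulli(P, generator=g)                  # 1 if correct, 0 otherwise
    Y_validation = torch.bernoulli(P, generator=g)             # independent draw from the same P
    return L_true, R_true, P, Y_train, Y_validation

def validation_bce(P_hat, Y_validation):
    """Mean BCE of predicted probabilities against held-out responses."""
    return torch.nn.functional.binary_cross_entropy(P_hat, Y_validation).item()

L_true, R_true, P, Y_train, Y_validation = make_synthetic(50, 200, 5)
oracle_bce = validation_bce(P, Y_validation)  # score obtained by the true probabilities
```

After training on Y_train, passing phi(L @ R.T) from the fitted matrices to validation_bce and comparing the result with oracle_bce gives a rough sense of how close the fit is to the best achievable score on this data.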