PyTorch: Calculating multilabel recall for this problem

Posted by 6xfqseft on 2022-12-04 in Other

I have a table with two columns; the two entries in a row indicate that they are related:
| Col1 | Col2 |
| --- | --- |
| a | A |
| b | B |
| a | C |
| c | A |
| b | D |
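Here a is related to A and C, b is related to B and D, and c is related to A, which means the same entry in Col1 can have multiple related labels in Col2. I trained a machine learning model to quantify the relationship between Col1 and Col2 by creating vector embeddings for Col1 and Col2 and optimizing the cosine_similarity between the two vectors. Now I want to test my model by computing recall on a test set: I want to check what proportion of these positive relations can be retrieved at different values of recall@N. Assuming I have normalized the vector representations of all entries in each column, I can compute the pairwise cosine similarity between them as a matrix product of the unit vectors (the code below names the result `cosine_distance`):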

cosine_distance = torch.mm(col1_feature, col2_feature.t())
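
To turn that similarity matrix into recall@N, one option is to rank the Col2 entries for every Col1 entry with `torch.topk` and compare the top N against a binary relevance matrix. The sketch below is a minimal illustration, not a library API: the helper name `recall_at_n`, the `relevance` matrix, and the toy similarity scores are all assumptions introduced for the example. It returns two aggregates, the share of all positive pairs retrieved (micro) and the mean of the per-entry recalls (macro).

```python
import torch

def recall_at_n(similarity: torch.Tensor, relevance: torch.Tensor, n: int):
    """Hypothetical helper: recall@n from a (num_col1, num_col2) similarity matrix.

    relevance[i, j] is 1 when Col1 entry i and Col2 entry j are a true pair.
    Assumes every Col1 entry has at least one true pair.
    """
    # Indices of the n most similar Col2 entries for each Col1 entry
    top_idx = similarity.topk(n, dim=1).indices
    # Number of retrieved Col2 entries that are true pairs, per row
    hits = relevance.gather(1, top_idx).sum(dim=1).float()
    positives = relevance.sum(dim=1).float()
    micro = (hits.sum() / positives.sum()).item()  # share of all positive pairs retrieved
    macro = (hits / positives).mean().item()       # mean of per-entry recall
    return micro, macro

# Toy data mirroring the question's table: rows a, b, c; columns A, B, C, D
relevance = torch.tensor([[1, 0, 1, 0],   # a -> A, C
                          [0, 1, 0, 1],   # b -> B, D
                          [1, 0, 0, 0]])  # c -> A
# Made-up scores standing in for torch.mm(col1_feature, col2_feature.t())
similarity = torch.tensor([[0.9, 0.1, 0.8, 0.2],
                           [0.3, 0.7, 0.6, 0.5],
                           [0.4, 0.9, 0.1, 0.2]])

micro_1, macro_1 = recall_at_n(similarity, relevance, n=1)
micro_2, macro_2 = recall_at_n(similarity, relevance, n=2)
```

Which aggregate to report depends on the goal: the micro value answers "what proportion of positive relations is retrieved", while the macro value weights every Col1 entry equally regardless of how many labels it has.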

pytorch

Source: https://stackoverflow.com/questions/74633636/calculating-multilabel-recall-for-this-problem

2 Answers

4nkexdtk #1

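You can use a clustering algorithm to group the entries in Col1 and Col2 into clusters. You can then use the MultilabelRecall metric to compute the recall for each cluster. That way, you don't have to specify the number of labels for each entry in Col1.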

Answered 2022-12-04

efzxgjgh #2

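If the table has a large number of rows, computing the cosine distance between all pairs of entries in Col1 and Col2 may be inefficient. One way to improve efficiency is to use an approximate nearest neighbor (ANN) algorithm, which can quickly find the closest vectors in a high-dimensional space. These algorithms typically build a data structure that supports efficient search, such as a k-d tree or locality-sensitive hashing. Once the data structure is built, you can use it to quickly find the Col2 entries closest to a given Col1 entry and then compute recall@k for those entries.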
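
One detail worth checking when a tree-based search replaces the full similarity matrix: scikit-learn's k-d tree ranks neighbors by Euclidean distance by default, whereas the question ranks by cosine similarity. Assuming the embeddings are unit-normalized, as the question states, the two rankings agree. The short derivation below shows this for two unit vectors u and v (symbols introduced here only for illustration):

```latex
\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\,u^\top v = 2 - 2\cos(u, v),
\qquad \text{where } \|u\| = \|v\| = 1
```

The smallest Euclidean distance therefore corresponds to the largest cosine similarity, so the nearest neighbors are the same under either measure.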
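Here is an example of computing recall@k this way. The code uses the k-d tree implementation from scikit-learn (which performs an exact rather than an approximate search) to index the embedding vectors, finds the nearest neighbors of each Col1 vector, and then computes recall@k over those neighbors.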

import numpy as np
from sklearn.neighbors import KDTree

# Assumed inputs (not defined in the original answer):
#   col1_feature, col2_feature: L2-normalized NumPy arrays, one row per entry
#       (convert torch tensors with .cpu().numpy() first)
#   col2: Col2 labels, aligned with the rows of col2_feature
#   true_labels_for_col1[i]: the Col2 labels related to the i-th Col1 entry
#   k: number of neighbors to retrieve

# Index only the Col2 vectors, so the returned indices point into col2
tree = KDTree(col2_feature)

# query returns (distances, indices), each of shape (n_col1, k)
distances, indices = tree.query(col1_feature, k=k)

# Accumulate the recall@k of each vector in Col1
recall_at_k = 0.0
for i, neighbor_indices in enumerate(indices):
    # Labels of the k nearest Col2 neighbors of the current vector
    neighbor_labels = set(np.asarray(col2)[neighbor_indices])
    true_labels = set(true_labels_for_col1[i])

    # Recall divides the hits by the number of true labels;
    # dividing by k instead would give precision@k
    recall_at_k += len(neighbor_labels & true_labels) / len(true_labels)

# Average recall@k over all vectors in Col1
average_recall_at_k = recall_at_k / len(col1_feature)
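
The snippet above relies on a per-entry lookup of true labels (`true_labels_for_col1`) that the answer does not show how to build. As a rough sketch, assuming the relations are available as a list of (Col1, Col2) pairs like the question's table, it could be assembled as a dictionary of sets. The names `pairs`, `true_labels`, `recall_at_k`, and the `ranked` lists are hypothetical and serve only the illustration.

```python
from collections import defaultdict

# Positive pairs from the question's table: (Col1 entry, Col2 entry)
pairs = [("a", "A"), ("b", "B"), ("a", "C"), ("c", "A"), ("b", "D")]

# Group the Col2 labels by Col1 entry, so each entry maps to a set of true labels
true_labels = defaultdict(set)
for left, right in pairs:
    true_labels[left].add(right)

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant labels that appear among the first k ranked labels."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

# Hypothetical model output: Col2 labels sorted from most to least similar
ranked = {
    "a": ["A", "C", "D", "B"],
    "b": ["B", "C", "D", "A"],
    "c": ["B", "A", "D", "C"],
}

for k in (1, 2):
    per_entry = {e: recall_at_k(ranked[e], true_labels[e], k) for e in ranked}
    mean_recall = sum(per_entry.values()) / len(per_entry)
    print(f"recall@{k}: {per_entry} mean={mean_recall:.3f}")
```

Dividing by `len(relevant)` rather than by k is what makes this a recall; dividing by k would measure precision@k.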

Answered 2022-12-04
