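Suppose I am building an application for products in which every user in the system can rate products. I want to build a recommender system that recommends products to the active user based on the ratings users have given to other products, using item-based collaborative filtering with KNN. When we compute the similarity between products and select the top k items, which of the following approaches is correct?
1- Compute the similarity between all products, then take the top k for each product the user rated highly (the dataset is a matrix holding the rating each user gave to each item).
2- Apply KNN to the products the user liked, one at a time: for each liked product, compute its similarity to the products the user has not rated, take the top k most similar unrated products, and show them to the user (each time KNN is applied, the dataset is a matrix containing the ratings of the liked product plus the ratings of all items the user has not yet rated).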
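For comparison, option 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the class `ItemKnnSketch`, its method names, the `ratings[item][user]` layout, the use of 0.0 for "not rated", and the like threshold passed as a parameter are all hypothetical and not taken from the application. The item-item similarity matrix is computed once and then reused for every product the active user liked.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of option 1: similarities are computed once for all
// item pairs, then looked up for each product the active user liked.
final class ItemKnnSketch {

    // Cosine similarity of two rating vectors; 0.0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Full symmetric item-item similarity matrix; ratings[item][user].
    static double[][] similarityMatrix(double[][] ratings) {
        int n = ratings.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                sim[i][j] = sim[j][i] = cosine(ratings[i], ratings[j]);
            }
        }
        return sim;
    }

    // For every item the user rated >= likeThreshold, add its k most similar
    // items that the user has not rated (assumption: 0.0 means "not rated").
    static Set<Integer> recommend(double[][] ratings, double[][] sim,
                                  int user, int k, double likeThreshold) {
        Set<Integer> recommended = new LinkedHashSet<>();
        for (int liked = 0; liked < ratings.length; liked++) {
            if (ratings[liked][user] < likeThreshold) {
                continue;
            }
            List<Integer> candidates = new ArrayList<>();
            for (int other = 0; other < ratings.length; other++) {
                if (ratings[other][user] == 0.0) {
                    candidates.add(other);
                }
            }
            final int likedItem = liked;
            // Sort candidates by similarity to the liked item, most similar first.
            candidates.sort((x, y) -> Double.compare(sim[likedItem][y], sim[likedItem][x]));
            recommended.addAll(candidates.subList(0, Math.min(k, candidates.size())));
        }
        return recommended;
    }
}
```

Both options rank the same candidates for a given liked product; the difference in this sketch is only that the similarities are cached in `sim` instead of being recomputed on every call.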
I used the second approach in my application, as follows:
Preparing the dataset:
// data is a matrix containing the rating each user gave to each product (rows = products, columns = users)
private void prepareDataSet(Double[][] data, int userIndex) {
    ArrayList<String> productsID = new ArrayList<String>();
    for (int i = 0; i < data.length; i++) { // iterate over the products (rows of the rating matrix)
        if (data[i][userIndex] >= 3) { // the active user rated product i with 3 or more, i.e. liked it
            ArrayStructure DataSet = new ArrayStructure();
            int index = 0; // row counter, restarted for every liked product
            // row 0 of the dataset is the liked product itself (the main item)
            for (int col = 0; col < data[i].length; col++) {
                DataSet.add(index, col, data[i][col]);
            }
            index++; // advance once per row, not once per column
            for (int j = 0; j < data.length; j++) { // scan all products again for those the active user has not rated
                // a rating of zero means that the user did not rate the product
                if (data[j][userIndex] == 0.0) {
                    for (int col = 0; col < data[j].length; col++) {
                        DataSet.add(index, col, data[j][col]);
                    }
                    productsID.add(products.get(j)); // one ID per dataset row; "products" holds all product IDs in the system
                    index++;
                } // end if
            } // end for loop j
            Double[][] dataset = DataSet.toArray();
            int k = determineK(dataset.length - 1); // the sample size
            K_Nearest_Algorithm(dataset, k, productsID);
            productsID.clear();
        } // end if
    } // end for loop i
    readPosts(); // after all calculations
}
The KNN algorithm:
// our algorithm
// takes the dataset (row 0 = main item, remaining rows = candidate items), the number of neighbors k, and the candidates' IDs
private void K_Nearest_Algorithm(Double[][] dataset, int k, ArrayList<String> productsID) {
    // item_record holds a candidate product's ID and its similarity to the main item
    ArrayList<item_record> item_similarity_values = new ArrayList<item_record>();
    for (int i = 1; i < dataset.length; i++) { // start at i=1 to skip dataset[0], the main item
        double similarity = calcCosineSimilarity(dataset[0], dataset[i]); // cosine similarity between the two rating vectors
        item_record Record = new item_record(productsID.get(i - 1), similarity); // dataset row i corresponds to productsID entry i-1
        item_similarity_values.add(Record);
    } // end for loop i
    // sort once, after all similarities are known; descending order puts the most similar items first
    // (this call requires item_record to implement Comparable, ordered by similarity)
    Collections.sort(item_similarity_values, Collections.reverseOrder());
    int limit = Math.min(k, item_similarity_values.size()); // never read past the end of the list
    for (int item = 0; item < limit; item++) { // add the top k items to the list of items recommended to the active user
        if (!recommended_items.contains(item_similarity_values.get(item).getId()))
            recommended_items.add(item_similarity_values.get(item).getId());
    }
    item_similarity_values.clear(); // cleared so it can be reused for another rated item
}
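The post calls `calcCosineSimilarity` and sorts `item_record` objects with `Collections.reverseOrder()`, but shows neither. The sketch below is a guess at what minimal versions might look like, not the actual code from the application: the wrapper class `SimilaritySketch`, the `getSimilarity` accessor, and the choice to return 0.0 for an all-zero vector are assumptions made for illustration. The key point is that `item_record` has to implement `Comparable` (ordered by similarity) for the reverse-order sort to put the most similar items first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the item_record class used by K_Nearest_Algorithm.
// Natural ordering is ascending by similarity, so Collections.reverseOrder()
// yields the most similar items first.
final class item_record implements Comparable<item_record> {
    private final String id;
    private final double similarity;

    item_record(String id, double similarity) {
        this.id = id;
        this.similarity = similarity;
    }

    String getId() { return id; }
    double getSimilarity() { return similarity; }

    @Override
    public int compareTo(item_record other) {
        return Double.compare(similarity, other.similarity);
    }
}

final class SimilaritySketch {
    // Cosine similarity between two rating vectors of equal length.
    // Assumption: returns 0.0 when either vector contains only zeros.
    static double calcCosineSimilarity(Double[] a, Double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Double[] mainItem = {5.0, 4.0, 0.0};
        Double[] itemA = {4.0, 5.0, 0.0};
        Double[] itemB = {0.0, 0.0, 5.0};
        List<item_record> records = new ArrayList<>();
        records.add(new item_record("B", calcCosineSimilarity(mainItem, itemB)));
        records.add(new item_record("A", calcCosineSimilarity(mainItem, itemA)));
        Collections.sort(records, Collections.reverseOrder());
        for (item_record r : records) {
            System.out.println(r.getId() + " " + r.getSimilarity());
        }
    }
}
```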