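I have a question about word similarity between two sentences. I'd like code for a similarity score defined as the number of words the two sentences have in common divided by the number of words in the longer sentence. Which library should I use for this? Thanks.
There are many kinds of similarity measures, so I'd like to know which named measure this one corresponds to.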
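Written as a formula, the measure described in the question is roughly the following. Here $c_i(w)$ is the number of times word $w$ appears in sentence $S_i$, and $|S_i|$ is that sentence's word count. Counting a repeated word at most $\min(c_1(w), c_2(w))$ times is an assumption about how duplicates should be handled; it is how the code in the answer treats them.

```latex
\operatorname{sim}(S_1, S_2) = \frac{\sum_{w} \min\bigl(c_1(w),\, c_2(w)\bigr)}{\max\bigl(|S_1|,\, |S_2|\bigr)},
\qquad |S_i| = \sum_{w} c_i(w)
```

Because the numerator can never exceed the shorter sentence's word count, the score lies between 0 and 1 for non-empty sentences, and it equals 1 only when both sentences contain exactly the same words with the same counts.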
You can use a document-similarity technique such as cosine similarity. Here, I have implemented a solution based on your description.
import java.util.HashMap;
import java.util.Map;

public class SentenceSimilarity {

    // Counts how many times each word occurs in the given array.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Splits on runs of whitespace, so extra or leading spaces don't produce empty "words".
    // A blank sentence yields no words.
    static String[] splitWords(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Number of shared words (a repeated word counts at most as often as it appears
    // in both sentences) divided by the word count of the longer sentence.
    // Comparison is case-sensitive, and punctuation stays attached to words.
    static double findSimilarityRatio(String sentence1, String sentence2) {
        String[] firstSentenceWords = splitWords(sentence1);
        String[] secondSentenceWords = splitWords(sentence2);
        int totalWords = Math.max(firstSentenceWords.length, secondSentenceWords.length);
        if (totalWords == 0) {
            return 0.0; // both sentences are blank
        }
        Map<String, Integer> firstSentenceMap = countWords(firstSentenceWords);
        Map<String, Integer> secondSentenceMap = countWords(secondSentenceWords);
        // The sum of min(count1, count2) is symmetric, so iterating over one map is enough.
        int totalHits = 0;
        for (Map.Entry<String, Integer> entry : firstSentenceMap.entrySet()) {
            Integer otherCount = secondSentenceMap.get(entry.getKey());
            if (otherCount != null) {
                totalHits += Math.min(entry.getValue(), otherCount);
            }
        }
        return (double) totalHits / totalWords;
    }
}
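For comparison with the cosine similarity mentioned above, here is a minimal sketch (not part of the original answer) that treats each sentence as a vector of word counts and computes dot(a, b) / (|a| * |b|). The class and method names are hypothetical, and it assumes simple whitespace tokenization with case-sensitive matching; only java.util is needed, no third-party library.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    // Hypothetical helper: builds a word -> count map, splitting on whitespace (case-sensitive).
    static Map<String, Integer> wordCounts(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return counts; // blank sentence: empty vector
        }
        for (String word : trimmed.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine of the angle between the two word-count vectors:
    // dot(a, b) / (|a| * |b|). Returns 0.0 if either sentence has no words.
    static double cosineSimilarity(String sentence1, String sentence2) {
        Map<String, Integer> a = wordCounts(sentence1);
        Map<String, Integer> b = wordCounts(sentence2);

        // Dot product: only words present in both sentences contribute.
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }

        // Squared Euclidean norms of each count vector.
        double normA = 0.0;
        for (int count : a.values()) {
            normA += (double) count * count;
        }
        double normB = 0.0;
        for (int count : b.values()) {
            normB += (double) count * count;
        }

        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity("the cat sat", "the cat ran away"));
    }
}
```

For "the cat sat" versus "the cat ran away", the sentences share 2 words and the longer one has 4, so findSimilarityRatio gives 2/4 = 0.5, while the cosine sketch gives 2/(√3·√4) ≈ 0.577. The two measures generally disagree, so the ratio described in the question is not cosine similarity itself.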
Hope this helps, cheers!