Java: how to create a list of strings where each string references its own HashMap<String, Integer>

yk9xbfzb  posted on 2023-06-28 in Java

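I am working on an assignment in which I have to build a language detector in Java. Basically, it takes a lot of data in the form of Strings and creates a HashMap for each language. This HashMap stores how often a particular n-gram occurs.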
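As a rough illustration of what "how often a particular n-gram occurs" means, here is a minimal sketch that counts n-grams with the JDK's `java.util.HashMap`. The class name `NGramCounter` and the method `count` are made up for this example and are not part of the assignment, which requires a hand-written map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original assignment code.
class NGramCounter {

    // Counts how often each n-gram (substring of length n) occurs in text.
    static Map<String, Integer> count(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<String, Integer> counts = new HashMap<>();
        // The last n-gram starts at index text.length() - n.
        for (int start = 0; start + n <= text.length(); start++) {
            String ngram = text.substring(start, start + n);
            // merge inserts 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(ngram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "banana" contains the bigrams ba, an, na, an, na.
        Map<String, Integer> counts = count("banana", 2);
        System.out.println(counts.get("an")); // 2
    }
}
```

A text shorter than n simply yields an empty map, because the loop body never runs.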

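Previously I implemented a simple HashMap class with the following parts:

  • an Entry class with a key (String) and a value
  • an add method
  • a get method
  • hashCode
  • a probing method

Here is the constructor of my HashMap class (N = table length, basis = 31):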
public HashMap(int N, int basis) {
    table = (Entry[]) Array.newInstance(Entry[].class.getComponentType(), N);
    this.basis = basis;
}

Now I need to implement the following method:

public void learnLanguage(String language, String text)

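I would first check whether the language is already in my list 1 and add it if it is not. Then I would create a HashMap for that particular language, in which I store the n-grams. (From there I think it is straightforward: count how often each n-gram occurs in the String "text" and store that count in the HashMap.)
But I am struggling to create the HashMap. I have attached an image of what it should look like.
[Image: how it should look]
I hope someone can help me!
What I tried:
I tried to create a HashMap inside a HashMap, but without success: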

HashMap<String, HashMap<Integer>> languageMap;

This produces errors that I don't know how to get rid of. I also don't think it is the best option.
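For comparison, here is a minimal sketch of the nested structure built from the JDK's `java.util.HashMap`, which takes two type parameters (key and value), so both the outer and the inner map have to name both. The class `NestedMapSketch` and its methods are hypothetical names chosen for this example. It assumes the standard library is allowed, which may not hold for the assignment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch using the JDK map instead of a hand-written one.
class NestedMapSketch {

    // Outer map: language -> (inner map: n-gram -> frequency).
    private final Map<String, Map<String, Integer>> languages = new HashMap<>();

    // Adds the n-gram counts of text to the inner map of the given language.
    void learn(String language, String text, int n) {
        // Creates the inner map the first time a language is seen.
        Map<String, Integer> ngrams =
                languages.computeIfAbsent(language, key -> new HashMap<>());
        for (int start = 0; start + n <= text.length(); start++) {
            ngrams.merge(text.substring(start, start + n), 1, Integer::sum);
        }
    }

    // Returns 0 for unknown languages and unknown n-grams.
    int getCount(String language, String ngram) {
        Map<String, Integer> ngrams = languages.get(language);
        return ngrams == null ? 0 : ngrams.getOrDefault(ngram, 0);
    }

    public static void main(String[] args) {
        NestedMapSketch sketch = new NestedMapSketch();
        sketch.learn("english", "the then", 2);
        System.out.println(sketch.getCount("english", "th")); // 2
        System.out.println(sketch.getCount("german", "th"));  // 0
    }
}
```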

f0ofjuux1#

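So basically, what has to be done is to create a HashMap inside a HashMap.
We have one HashMap (table 1) with Strings (the languages) as keys and another HashMap (table 2) as the value. Each key, i.e. each language, therefore has its own HashMap, which stores the n-grams together with how often they occur in that language.
What I needed was to use an instance of the HashMap I wrote myself (so that my own hash and probing functions and the other methods I implemented are used). But because it has the same name as the JDK's HashMap class, it caused confusion, as @bohemian pointed out.
The correct initialization of table 1 is: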

LanguageDetector.HashMap<LanguageDetector.HashMap<Integer>> languages;

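I forgot to mention that the key in our HashMap is always a String, while the value can be whatever type we instantiate it with.
Here is the complete code of our school assignment: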
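The name clash can be shown in isolation. The following is only a sketch: `ShadowingSketch` is a hypothetical stand-in for `LanguageDetector`, and its nested `HashMap<T>` delegates to the JDK map purely to keep the example short, whereas the real assignment class uses its own table, hash function and probing.

```java
// Hypothetical stand-in for LanguageDetector; names are for illustration only.
class ShadowingSketch {

    // Nested class with the same simple name as java.util.HashMap.
    // Keys are always Strings, so there is only one type parameter (the value).
    static class HashMap<T> {
        // Delegating to the JDK map is a shortcut for this sketch only.
        private final java.util.HashMap<String, T> delegate = new java.util.HashMap<>();

        void add(String key, T value) {
            delegate.put(key, value);
        }

        T get(String key) {
            return delegate.get(key);
        }
    }

    public static void main(String[] args) {
        // Inside ShadowingSketch, the simple name HashMap means the nested class.
        HashMap<HashMap<Integer>> languages = new HashMap<>();
        languages.add("english", new HashMap<>());
        languages.get("english").add("th", 3);
        System.out.println(languages.get("english").get("th")); // 3

        // The JDK class stays reachable through its fully qualified name.
        java.util.Map<String, Integer> jdkMap = new java.util.HashMap<>();
        jdkMap.put("th", 3);
        System.out.println(jdkMap.get("th")); // 3
    }
}
```

From outside the enclosing class, the nested type is written `ShadowingSketch.HashMap<...>`, which mirrors the `LanguageDetector.HashMap<...>` spelling used in the declaration above.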

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.lang.reflect.Array;
import java.text.DecimalFormat;
import java.util.ArrayList;

public class LanguageDetector {

    // n = the length of n-grams to use
    int n;

    // N = the size of the language-specific hash tables ("Tabelle 2")
    int N;

    LanguageDetector.HashMap<LanguageDetector.HashMap<Integer>> languages;
    ArrayList<String> languagesList;
    
    public LanguageDetector(int n, int N) {
        this.n = n;
        this.N = N;
        languages = new LanguageDetector.HashMap<LanguageDetector.HashMap<Integer>>(101, 31);
        languagesList = new ArrayList<>();
    }

    //a given String is split into n-grams, and for each n-gram the language in which it occurs most often gets a "point".
    //languages and their points are stored in a hash map
    public HashMap<Integer> apply(String text) {
        LanguageDetector.HashMap<Integer> resultMap = new HashMap<>(languagesList.size(), 31);
        ArrayList<String> ngramWinners = new ArrayList<>();
       

        for(String lang : languagesList){
            resultMap.add(lang, 0);  
        }

        int currentCount = 0;
        int maxCount = 0;
        
        //outer loop: iterate over all n-grams of the text
        int k = n;
        for(int j = 0; j<=text.length()-n; j++){

            //iterate over all languages for each n-gram
            for(int i = 0; i<languagesList.size(); i++){
                
                //determine in which language the current n-gram occurs most often
                String ngram = text.substring(j, k);
                currentCount = getCount(ngram, languagesList.get(i));
                //System.out.println(currentCount);
                if(currentCount>maxCount){
                    maxCount = currentCount;
                    //currentCount = 0;
                    ngramWinners.clear();
                    ngramWinners.add(languagesList.get(i));
                }
                else if(currentCount==maxCount){
                    ngramWinners.add(languagesList.get(i));
                }
            }

            //if the n-gram is not used in any language, nobody gets a point
            if(maxCount != 0){
                //store and update the "score"
                for(int i = 0; i<ngramWinners.size();i++){
                    int newCount = resultMap.get(ngramWinners.get(i)) + 1;
                    resultMap.add(ngramWinners.get(i), newCount);
                    //System.out.println(ngramWinners.get(i));
                }
            }
            ngramWinners.clear();
            maxCount = 0;
            k+=1;
        }

        return resultMap;
    }
    
    public void learnLanguage(String language, String text) {      
        if(languages.get(language) == null){
            languages.add(language, new LanguageDetector.HashMap<>(N, 31));
            languagesList.add(language);
        }
        LanguageDetector.HashMap<Integer> ngramMap = languages.get(language);

        int k = n;
        for(int j = 0; j<=text.length()-n; j++){
            String ngram = text.substring(j, k);
            if(ngramMap.get(ngram) != null){
                int ngramPlus = ngramMap.get(ngram) + 1;
                ngramMap.add(ngram, ngramPlus);
            } else{
                ngramMap.add(ngram, 1);
            }
            k+=1;
        }    
        
        //ngramMap.printHashMap();
    }

    public int getCount(String ngram, String language){
        if(languages.get(language) == null || languages.get(language).get(ngram) == null){
            return 0;
        }
        return languages.get(language).get(ngram);
    }

    //the hashMap returned from apply method can be inserted here to get the language as a string
    public String checkLanguage(LanguageDetector.HashMap<Integer> result){
        String finalLanguage;
        int currentCount = 0;
        int maxCount = 0;
        ArrayList<String> languageWinners = new ArrayList<>();

        //look for languages with most votes (and store in languageWinners)
        for(int i = 0; i < languagesList.size(); i++){
            currentCount = result.get(languagesList.get(i));
                
            if(currentCount>maxCount){
                maxCount = currentCount;
                //currentCount = 0;
                languageWinners.clear();
                languageWinners.add(languagesList.get(i));
            }
            else if(currentCount==maxCount){
                languageWinners.add(languagesList.get(i));
            }
        }

        //System.out.println(languageWinners);

        //there is one language with most votes
        if(languageWinners.size() == 1){
            finalLanguage = languageWinners.get(0);
            return finalLanguage;
        }

        //multiple languages have the same number of votes --> sort lexicographically and return the first.
        for(int i = 0; i<languageWinners.size();i++){          
            for (int j = i + 1; j < languageWinners.size(); j++) {
                if (languageWinners.get(i).compareToIgnoreCase(languageWinners.get(j)) > 0) {
                    String temp = languageWinners.get(i);
                    languageWinners.set(i, languageWinners.get(j));
                    languageWinners.set(j, temp);
                }
            }            
        }
        finalLanguage = languageWinners.get(0);
        return finalLanguage;
    }
    
    // open a file, read its lines, return them concatenated as a single String.
    private static String read(String filename) {
        // !!! NO NEED TO CHANGE !!!
                
        try {
            
            StringBuilder sb = new StringBuilder();
                
            Reader in = new InputStreamReader(new FileInputStream(filename),
                            "UTF-8");

            BufferedReader reader = new BufferedReader(in);
                
            String s;
            while ((s = reader.readLine()) != null) {
                // ignore empty lines and comment lines (starting with '#')
                if (s.length() != 0 && s.charAt(0) != '#') {
                    sb.append(s);
                }
            }
                
            reader.close();
                
            return sb.toString();
            
        } catch (IOException e) {       
            String msg = "I/O-Fehler bei " + filename + "\n" + e.getMessage();
            throw new RuntimeException(msg);        
        }
    } 

    //creating own hashmap class
    public static class HashMap<T> {

        //creating Entry class
        public class Entry {
            String key;
            T value;
            Entry next;
                
            Entry(String key, T value) {
                this.key = key;
                this.value = value;
            }
        }

        Entry[] table;
        int basis;

        public HashMap(int N, int basis) {
            table = (Entry[]) Array.newInstance(Entry[].class.getComponentType(), N);
            this.basis = basis;
        }     

        //add method: uses the custom hashCode and probe function to find the slot holding the key or a free slot.
        //returns true if the entry was inserted or updated, false if no slot could be found.
        public boolean add(String key, T value) {
            int index = hashCode(key);
            Entry entryOnIndex = table[index];

            if(entryOnIndex == null){
                table[index] = new Entry(key, value);
                return true;
            }
            if (entryOnIndex.key.equals(key)) {
                entryOnIndex.value = value;
                return true;
            }

            // search for the key (or a free slot) along the probe sequence
            int currentIndex = index;
            int m = 0;

            while (table[currentIndex] != null && m <= table.length) {
                currentIndex = sondieren(currentIndex, m);
                Entry currentEntry = table[currentIndex];

                if (currentEntry != null && currentEntry.key.equals(key)) {
                    currentEntry.value = value;
                    return true;
                }

                m++;
            }

            if(table[currentIndex] == null){
                table[currentIndex] = new Entry(key, value);
                return true;
            }
            return false;
        }

        //very similar to the add method --> possible alternative: move the shared code into a third method that returns the found index
        public T get(String key) {
            int index = hashCode(key);
            Entry entryOnIndex = table[index];

            if(entryOnIndex == null){
                return null;
            }
            if (entryOnIndex != null && entryOnIndex.key.equals(key)) {
                return entryOnIndex.value;
            }
            else{
                // search for the key along the probe sequence, since probing may have been used on insert
                int currentIndex = index;
                int m = 0;

                while (table[currentIndex] != null && m <= table.length) {
                    currentIndex = sondieren(currentIndex, m);
                    Entry currentEntry = table[currentIndex];

                    if (currentEntry != null && currentEntry.key.equals(key)) {
                        return currentEntry.value;
                    }

                    m++;
                }
            }
            return null;
        }

        //returns the fillratio relative to the HashMap Capacity
        public double fillRatio() {
            double ratio = 0;
            for(Entry e : table){
                if(e!=null){
                    ratio+=1;
                }
            }
            return ratio/table.length;
        }
                
        //hash function for the key (simplified): h(k) = (k₁ * B^(n-1) + k₂ * B^(n-2) + ... + kₙ₋₁ * B + kₙ) % N
        public int hashCode(String k) {
            double hash = 0;
            int n = k.length();

            for(int i = 0; i<k.length(); i++){
                hash = hash + (k.charAt(i) * Math.pow(basis,n-(i+1)) );
                hash %= table.length; 
            }
            
            //System.out.println("Hash-Code for " + k + ": "+ hash);
            return (int) hash;
        }

        //probe function: g(m + 1) = (g(m) + 2m + 1) % N
        public int sondieren(int index, int m) {
            return (index + 2 * m + 1) % table.length;
        }
        
        //for testing purposes only
        public void printHashMap() {
            System.out.println();
            for (int i = 0; i < this.table.length; i++) {
                Entry entry = (Entry) this.table[i];
                System.out.print(i + ": ");
                while (entry != null) {
                    System.out.print("(" + entry.key + " = " + entry.value + ") ");
                    entry = entry.next;
                }
                System.out.println();
            }
        }
    //end of HashMap Class
    }

    //reads txt files into Strings, which are used to learn the languages --> the given sentences are then checked and the detection accuracy is printed [for different n-gram lengths and hash map sizes]
    public static LanguageDetector runTest(String BASE, int n, int N) {
        
        LanguageDetector myDetector = new LanguageDetector(n, N);

        String alice_en = read("src\\alice\\alice.en.txt");
        String alice_de = read("src\\alice\\alice.de.txt");
        String alice_eo = read("src\\alice\\alice.eo.txt");
        String alice_fi = read("src\\alice\\alice.fi.txt");
        String alice_fr = read("src\\alice\\alice.fr.txt");
        String alice_it = read("src\\alice\\alice.it.txt");

        myDetector.learnLanguage("english", alice_en);
        myDetector.learnLanguage("german", alice_de);
        myDetector.learnLanguage("esperanto", alice_eo);
        myDetector.learnLanguage("finnish", alice_fi);
        myDetector.learnLanguage("french", alice_fr);
        myDetector.learnLanguage("italian", alice_it);

        
        String[] sentences = new String[] {

            // 11 x English
            "I'm going to make him an offer he can't refuse.",
            "Toto, I've got a feeling we're not in Kansas anymore.",
            "May the Force be with you.",
            "If you build it, he will come.",
            "I'll have what she's having.",
            "A martini. Shaken, not stirred.",
            "Some people can’t believe in themselves until someone else believes in them first.",
            "I feel the need - the need for speed!",
            "Carpe diem. Seize the day, boys. Make your lives extraordinary.",
            "Nobody puts Baby in a corner.",
            "I'm king of the world!",

            // 11 x German
            "Aber von jetzt an steht ihr alle in meinem Buch der coolen Leute.",
            "Wäre, wäre, Fahradkette",
            "Sehe ich aus wie jemand, der einen Plan hat?",
            "Erwartet mein Kommen, beim ersten Licht des fünften Tages.",
            "Du bist terminiert!",
            "Ich hab eine Wassermelone getragen.",
            "Einigen wir uns auf Unentschieden!",
            "Du wartest auf einen Zug, ein Zug der dich weit weg bringen wird.",
            "Ich bin doch nur ein Mädchen, das vor einem Jungen steht, und ihn bittet, es zu lieben.",
            "Ich genoss seine Leber mit ein paar Fava-Bohnen, dazu einen ausgezeichneten Chianti.",
            "Dumm ist der, der Dummes tut.",

            // 9 x Esperanto
            "Al du sinjoroj samtempe oni servi ne povas.",
            "Al la fiŝo ne instruu naĝarton.",
            "Fiŝo pli granda malgrandan englutas.",
            "Kia patrino, tia filino.",
            "La manĝota fiŝo estas ankoraŭ en la rivero.",
            "Ne kotas besto en sia nesto.",
            "Ne singardema kokino fidas je vulpo.",
            "Por sperto kaj lerno ne sufiĉas eterno.",
            "Unu hako kverkon ne faligas.",

            // 8 x Finnish
            "Hei, hauska tavata.",
            "Olen kotoisin Suomesta.",
            "Yksi harrastuksistani on lukeminen.",
            "Nautin musiikin kuuntelusta.",
            "Juhannusperinteisiin kuuluu juhannussauna tuoreiden saunavihtojen kera, sekä pulahtaminen järveen.",
            "Aamu on iltaa viisaampi.",
            "Työ tekijäänsä neuvoo.",
            "Niin metsä vastaa, kuin sinne huudetaan.",

            // 9 x French
            "Franchement, ma chère, c’est le cadet de mes soucis.",
            "À tes beaux yeux.",
            "Si j’aurais su, j’aurais pas v’nu!",
            "Merci la gueuse. Tu es un laideron mais tu es bien bonne.",
            "Vous croyez qu’ils oseraient venir ici?",
            "La barbe ne fait pas le philosophe.",
            "Inutile de discuter.",
            "Paris ne s’est pas fait en un jour!",
            "Quand on a pas ce que l’on aime, il faut aimer ce que l’on a.",

            // 9 x Italian
            "Azzurro, il pomeriggio è troppo azzurro e lungo per me",
            "Con te, cos lontano e diverso Con te, amico che credevo perso ",
            "è restare vicini come bambini la felicità",
            "Buongiorno, Principessa!",
            "Ho Ucciso Napoleone.",
            "L’amore vince sempre.",
            "La semplicità è l’ultima sofisticazione.",
            "Una cena senza vino e come un giorno senza sole.",
            "Se non hai mai pianto, i tuoi occhi non possono essere belli."
        };

        String[] labels = new String[] {

            "english","english","english","english","english","english","english","english","english","english","english",
            "german","german","german","german","german","german","german","german","german","german","german",
            "esperanto","esperanto","esperanto","esperanto","esperanto","esperanto","esperanto","esperanto","esperanto",
            "finnish","finnish","finnish","finnish","finnish","finnish","finnish","finnish",
            "french","french","french","french","french","french","french","french","french",
            "italian","italian","italian","italian","italian","italian","italian","italian","italian"

        };

        assert sentences.length == labels.length;

        LanguageDetector.HashMap<Integer> result = new LanguageDetector.HashMap<>(N, 31);
        int countCorrectDetection = 0;
        int loopCount = 0;
        double successRate = 0;

        //detect language and count how many languages were checked and how many were correct
        for(int i = 0; i<sentences.length; i++){
            result = myDetector.apply(sentences[i]);
            String detectedLanguage = myDetector.checkLanguage(result);
            //System.out.println("detected language *"+detectedLanguage+"* for: \n" + sentences[i]+"\n");
            //result.printHashMap();
            //System.out.println();
            if(detectedLanguage.equals(labels[i])){
                countCorrectDetection += 1;
            }
            loopCount += 1;
        }

        //calculate percentage
        if(countCorrectDetection != 0 && loopCount != 0){
            successRate = (double) countCorrectDetection / loopCount;
            DecimalFormat prozentFormat = new DecimalFormat("#0.00%");
            // format the double as a percentage with two decimal places
            String succesRateP = prozentFormat.format(successRate);
            
            System.out.println("For n = "+n+", N = "+N);
            System.out.println(loopCount + " senctences were checked. For "+countCorrectDetection+" senctences the correct language was detected.\nsucces-rate: "+succesRateP+"\n");
        }
    
        return myDetector;
    }


    public static void main(String[] args) {
        runTest("31", 1, 120001);
        runTest("31", 2, 120001);
        runTest("31", 2, 120001);
        runTest("31", 3, 120001);
        runTest("31", 4, 120001);
        runTest("31", 6, 120001);
        runTest("31", 5, 120001);
        runTest("31", 7, 120001);
        runTest("31", 8, 120001);
        runTest("31", 9, 120001);

        runTest("31", 1, 1200001);
        runTest("31", 2, 1200001);
        runTest("31", 2, 1200001);
        runTest("31", 3, 1200001);
        runTest("31", 4, 1200001);
        runTest("31", 5, 1200001);
        runTest("31", 6, 1200001);
        runTest("31", 7, 1200001);
        runTest("31", 8, 1200001);
        runTest("31", 9, 1200001);
    
    }
/*
 N = 120001

 create table here...
 */

    /*
For n= 1, N= 120001
57 senctences were checked. For 11 senctences the correct language was detected.
succes-rate: 19.298245614035086%

For n= 2, N= 120001
57 senctences were checked. For 11 senctences the correct language was detected.
succes-rate: 19.298245614035086%

For n= 2, N= 120001
57 senctences were checked. For 11 senctences the correct language was detected.
succes-rate: 19.298245614035086%

For n= 3, N= 120001
57 senctences were checked. For 12 senctences the correct language was detected.
succes-rate: 21.052631578947366%

For n= 4, N= 120001
57 senctences were checked. For 38 senctences the correct language was detected.
succes-rate: 66.66666666666666%

For n= 5, N= 120001
57 senctences were checked. For 56 senctences the correct language was detected.
succes-rate: 98.24561403508771%

For n= 7, N= 120001
57 senctences were checked. For 56 senctences the correct language was detected.
succes-rate: 98.24561403508771%

For n= 8, N= 120001
57 senctences were checked. For 53 senctences the correct language was detected.
succes-rate: 92.98245614035088%

For n= 9, N= 120001
57 senctences were checked. For 46 senctences the correct language was detected.
succes-rate: 80.7017543859649%

For n= 1, N= 1200001
57 senctences were checked. For 11 senctences the correct language was detected.
succes-rate: 19.298245614035086%

For n= 2, N= 1200001
57 senctences were checked. For 11 senctences the correct language was detected.
succes-rate: 19.298245614035086%

For n= 2, N= 1200001
57 senctences were checked. For 11 senctences the correct language was detected.
succes-rate: 19.298245614035086%

For n= 3, N= 1200001
57 senctences were checked. For 12 senctences the correct language was detected.
succes-rate: 21.052631578947366%

For n= 4, N= 1200001
57 senctences were checked. For 38 senctences the correct language was detected.
succes-rate: 66.66666666666666%

For n= 5, N= 1200001
57 senctences were checked. For 56 senctences the correct language was detected.
succes-rate: 98.24561403508771%

For n= 7, N= 1200001
57 senctences were checked. For 56 senctences the correct language was detected.
succes-rate: 98.24561403508771%

For n= 8, N= 1200001
57 senctences were checked. For 53 senctences the correct language was detected.
succes-rate: 92.98245614035088%

For n= 9, N= 1200001
57 senctences were checked. For 46 senctences the correct language was detected.
succes-rate: 80.7017543859649%
     */
}
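The `hashCode` method in the listing evaluates the polynomial with `double` and `Math.pow`. For longer keys the intermediate products can exceed the range in which `double` represents integers exactly. One possible alternative, shown here only as a sketch, is Horner's rule with `long` arithmetic, which takes the remainder after every step. The class name `PolynomialHash` and the parameter names are hypothetical, and the sketch assumes a positive `basis` and `tableSize`.

```java
// Hypothetical alternative to the double-based hashCode in the listing.
class PolynomialHash {

    // Computes (k1 * B^(n-1) + k2 * B^(n-2) + ... + kn) mod N with Horner's rule.
    // Assumes basis > 0 and tableSize > 0.
    static int hash(String key, int basis, int tableSize) {
        long hash = 0;
        for (int i = 0; i < key.length(); i++) {
            // hash stays below tableSize, so hash * basis fits into a long.
            hash = (hash * basis + key.charAt(i)) % tableSize;
        }
        return (int) hash;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98: (97 * 31 + 98) % 101 = 3105 % 101 = 75
        System.out.println(hash("ab", 31, 101)); // 75
    }
}
```

For long keys the result can differ from the value the `double` version produces, so tables filled with one function should not be read with the other.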

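I hope this clarifies my question and makes it easier to understand. I just could not figure out the correct initialization of my HashMap.
Sorry for the unstructured code; I hope it is not too painful for anyone interested in reading it.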
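The comment above `get` in the listing suggests moving the code shared by `add` and `get` into a third method that returns the found index. A minimal sketch of that idea follows, using the same probe step g(m + 1) = (g(m) + 2m + 1) % N. It works on a plain `String[]` of keys rather than on the `Entry` table, and the names `ProbingSketch` and `findIndex` are hypothetical.

```java
// Hypothetical sketch of a shared lookup helper; not the assignment's code.
class ProbingSketch {

    // Returns the slot that holds key, or the first empty slot on the probe
    // path, or -1 if neither was found after keys.length probes.
    static int findIndex(String[] keys, String key, int home) {
        int index = home;
        for (int m = 0; m < keys.length; m++) {
            if (keys[index] == null || keys[index].equals(key)) {
                return index;
            }
            // probe step: g(m + 1) = (g(m) + 2m + 1) % N
            index = (index + 2 * m + 1) % keys.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] keys = new String[7];
        keys[3] = "x";
        keys[4] = "y";
        System.out.println(findIndex(keys, "x", 3)); // 3 (found at home slot)
        System.out.println(findIndex(keys, "z", 3)); // 0 (probes 3, 4, then 0)
    }
}
```

With such a helper, an add method would write to the returned slot and a get method would read from it, each treating -1 as "not stored".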
