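I want to create a TreeMap<String,List<String,Integer>>. The conditions are:
If a word does not exist yet: insert the word into the TreeMap and associate it with an ArrayList of (docid, count).
If the word is already in the TreeMap: check whether the current docid matches the entry in the ArrayList, and if so increment the count.
Below is the code I am using.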
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StemTreeMap
{
    private static final String r1 = "\\$DOC";
    private static final String r2 = "\\$TITLE";
    private static final String r3 = "\\$TEXT";
    private static Pattern p1,p2,p3;
    private static Matcher m1,m2,m3;
    public static void main(String[] args)
    {
        BufferedReader rd,rd1;
        String docid = null;
        String id;
        int tf = 0;
        //CountPerDocument cp = new CountPerDocument(docid, count);
        List<CountPerDocument> ls = new ArrayList<>();
        Map<String,List<CountPerDocument>> mp = new TreeMap<>();
        try
        {
            rd = new BufferedReader(new FileReader(args[0]));
            rd1 = new BufferedReader(new FileReader(args[0]));
            int docCount = 0;
            String line = rd.readLine();
            p1 = Pattern.compile(r1);
            p2 = Pattern.compile(r2);
            p3 = Pattern.compile(r3);
            while(line != null)
            {
                m1 = p1.matcher(line);
                m2 = p2.matcher(line);
                m3 = p3.matcher(line);
                if(m1.find())
                {
                    docid = line.substring(5, line.length());
                    docCount++;
                    //System.out.println("The Document ID is :");
                    //System.out.println(docid);
                    line = rd.readLine();
                }
                else if(m2.find()||m3.find())
                {
                    line = rd.readLine();
                }
                else
                {
                    if(!(mp.containsKey(line))) // if the stem is not on the TreeMap
                    {
                        //System.out.println("The stem is not present in the tree");
                        //System.out.println("The stem is not present in the tree: " + line + " The Document is :" + docid);
                        tf = 1;
                        ls.add(new CountPerDocument(docid,tf));
                        mp.put(line, ls);
                        System.out.println("Inserted string is: "+ mp.get(line));
                        line = rd.readLine();
                    }
                    else
                    {
                        if(ls.indexOf(docid) > 0) //if its last entry matches the current document number
                        {
                            //System.out.println("The Stem is present for the same docid so incrementing docid: " +line + ":"+ docid);
                            tf = tf+1;
                            ls.add(new CountPerDocument(docid,tf));
                            line = rd.readLine();
                        }
                        else
                        {
                            //System.out.println("Stem is present but not the same docid so inserting new docid: "+line + ":"+ docid);
                            tf = 1;
                            ls.add(new CountPerDocument(docid,tf)); //set did to the current document number and tf to 1
                            line = rd.readLine();
                        }
                    }
                }
            }
            rd.close();
            System.out.println("The Number of Documents in the file is:"+ docCount);
            //Write to an output file
            String l = rd1.readLine();
            File f = new File("dictionary.txt");
            if (f.createNewFile())
            {
                System.out.println("File created: " + f.getName());
            }
            else
            {
                System.out.println("File already exists.");
                Path path = Paths.get("dictionary.txt");
                Files.deleteIfExists(path);
                System.out.println("Deleted Existing File:: Creating New File");
                f.createNewFile();
            }
            FileWriter fw = new FileWriter("dictionary.txt");
            fw.write("The Total Number of Stems: " + mp.size() +"\n");
            /*Set<Map.Entry<String,List<CountPerDocument>>> entries = mp.entrySet();
            for(Map.Entry<String,List<CountPerDocument>> entry : entries)
            {
                fw.write(entry.getKey() + entry.getValue());
            } */
            Iterator<Map.Entry<String, List<CountPerDocument>>> iterator = mp.entrySet().iterator();
            Map.Entry<String, List<CountPerDocument>> entry = null;
            while(iterator.hasNext())
            {
                entry = iterator.next();
                fw.write(entry.getKey() + "=>" + entry.getValue() + "\n" );
            }
            //System.out.println(mp.get("todai"));
            fw.close();
        }catch(IOException e)
        {
            e.printStackTrace();
        }
    }
}
To build the entries of the ArrayList, I use this class:
public class CountPerDocument
{
    private final String documentId;
    private final int count;
    CountPerDocument(String documentId, int count)
    {
        this.documentId = documentId;
        this.count = count;
    }
    public String getDocumentId()
    {
        return this.documentId;
    }
    public int getCount()
    {
        return this.count;
    }
    @Override
    public String toString()
    {
        return this.documentId + "-" + this.count;
    }
}
When I print mp.get(line), I get the following output:
Stem is: attempt
DocId is: LA010190-0002TF is : 1
Inserted string is: [LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, 
LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, 
LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, 
LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, 
LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1]
I don't understand why so many entries are being inserted. Is the printing wrong, or is there a problem with the approach I chose?
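The growing list in that output is consistent with two things in the code above, which a small reproduction can show. First, ls is created once, before the while loop, and that same list object is put into mp for every new stem, so all keys share one list. Second, ls.indexOf(docid) searches a list of CountPerDocument objects for a String; no element equals a String, so it returns -1, the `> 0` test fails, and every repeated stem falls into the else branch that appends another entry with count 1. The sketch below is a hypothetical, trimmed-down reproduction (the names SharedListDemo, Entry and buildLikeQuestion are invented for illustration), not the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SharedListDemo {

    // Hypothetical, trimmed-down stand-in for the question's CountPerDocument.
    static final class Entry {
        final String documentId;
        final int count;

        Entry(String documentId, int count) {
            this.documentId = documentId;
            this.count = count;
        }

        @Override
        public String toString() {
            return documentId + "-" + count;
        }
    }

    // Mirrors the question's insert/increment branches for a single document.
    static Map<String, List<Entry>> buildLikeQuestion(String[] stems, String docid) {
        List<Entry> ls = new ArrayList<>();          // created ONCE, outside the loop
        Map<String, List<Entry>> mp = new TreeMap<>();
        for (String stem : stems) {
            if (!mp.containsKey(stem)) {
                ls.add(new Entry(docid, 1));
                mp.put(stem, ls);                    // every new key gets the SAME list
            } else if (ls.indexOf(docid) > 0) {      // String vs Entry: never equal, so -1
                ls.add(new Entry(docid, 2));         // the question's tf + 1 branch, never reached
            } else {
                ls.add(new Entry(docid, 1));         // each repeat appends another "-1" entry
            }
        }
        return mp;
    }

    public static void main(String[] args) {
        String docid = "LA010190-0001";
        Map<String, List<Entry>> mp =
                buildLikeQuestion(new String[] {"attempt", "todai", "attempt"}, docid);
        System.out.println(mp.get("attempt") == mp.get("todai")); // true: one shared list
        System.out.println(mp.get("attempt").indexOf(docid));     // -1
        System.out.println(mp.get("todai").size());               // 3, although "todai" appeared once
    }
}
```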
1 Answer
Primitives versus objects
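Java collections hold objects (technically, object references), not primitives. So you cannot use int when defining a List. Use Integer, the object-oriented equivalent of the int primitive.
List of one type
There is no such thing as List<String, Integer>. A list is a single sequence of elements that all share the same type. You can have List<String> or List<Integer>, but not a combination of the two.
Map of maps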
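Apparently you want to count words across multiple documents while tracking each word's usage per document. You want to associate each word with a collection that, in turn, associates each document with the count of that word in that document.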
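The collection for associating one object with another is a Map. So you need a Map that maps each word to another Map, a map of document identifier to count. In other words, a Map whose keys are strings and whose values are maps, with one inner map per word. … The first String is the word being counted; the second String is the document identifier. Your logic should go like this: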
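As you encounter each word in each document, look for that word as a key in the outer map. If it is not found, put the key along with a new, empty inner map into the outer map. At this point you have an inner map in hand: either a pre-existing one or a new, empty one.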
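The outer-map step just described can be sketched in Java as follows. This is a minimal sketch, not the answerer's code: the class name OuterMapStep and the method innerMapFor are hypothetical, and TreeMap is assumed for the inner map so document ids come out sorted.

```java
import java.util.Map;
import java.util.TreeMap;

public class OuterMapStep {

    // Returns the inner map (document id -> count) for a word, first creating
    // and storing a new empty one if the word is not yet a key.
    static Map<String, Integer> innerMapFor(Map<String, Map<String, Integer>> outer, String word) {
        Map<String, Integer> inner = outer.get(word);
        if (inner == null) {               // word not found as a key
            inner = new TreeMap<>();       // new, empty inner map
            outer.put(word, inner);
        }
        return inner;                      // pre-existing or freshly created
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> outer = new TreeMap<>();
        Map<String, Integer> first = innerMapFor(outer, "attempt");
        Map<String, Integer> again = innerMapFor(outer, "attempt");
        System.out.println(first == again); // true: the second call reuses the existing map
        System.out.println(outer);          // {attempt={}}
    }
}
```

Since Java 8, outer.computeIfAbsent(word, k -> new TreeMap<>()) performs the same get-or-create step in a single call.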
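In that inner map, search for the document identifier. If it is not found, put the document id into the inner map with a new Integer set to zero. So now you have in hand either a pre-existing Integer or a new one. Add one to that Integer to get a new Integer. Put the document id, paired with the new Integer, back into the inner map.
Alternatively, you could use AtomicInteger rather than Integer. Then you can call its increment method instead of replacing one immutable Integer with another immutable Integer.
You must be a student doing homework, so I'll leave the rest to you.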
Tip: Notice how writing out the logic in plain prose gives you clarity and an outline for when you write the code.