How to use Scanner.useDelimiter() to match two characters next to each other followed by a word?

1qczuiv0 posted on 2021-06-30 in Java


I am trying to parse a plain .txt file with the following general structure:

[[Title]]
CATEGORIES: text, text, text
some text etc...
[[Next Title]]
CATEGORIES: text, text, text
Next other text etc ...

In my code, I use this pattern:

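import java.io.File;
import java.util.Scanner;

// fileEntry is assumed to be the java.io.File being parsed
// (new Scanner(File) throws FileNotFoundException)
Scanner inputScanner = new Scanner(fileEntry);
inputScanner.useDelimiter("\\]\\]|\\[\\[");
while (inputScanner.hasNext()) {
   // Get title of wiki article and contents
   String wikiName = inputScanner.next();
   String wikiContents = inputScanner.next();
}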

But it is also catching things like:

"[some text [ some other text ] some more text ]"
"[[Vertebrate trachea|trachea]]s from human stem cells. Several [[artificial urinary bladder]]s"
"[[Image:Bohr-atom-PAR.svg|thumb|right|310px|The Rutherford–Bohr model of the hydrogen atom ([tpl]nowrap|Z [tpl]=[/tpl] 1[/tpl]) or a hydrogen-like ion ([tpl]nowrap|Z > 1[/tpl]), results in a photon of wavelength 656 nm (red light).]]"
"[[File:Gettysburg Campaign.png|thumb|350px|Gettysburg Campaign (through July 3); cavalry movements shown with dashed lines. [tpl]legend|#ff0000|Confederate[/tpl]]]"
"observed is not some nonphysical world of [[consciousness]], mind, or mental life "
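A minimal sketch of why the original delimiter over-splits: \]\]|\[\[ matches every pair of brackets, so an inline link such as [[consciousness]] is cut out as its own token, and the question's next()/next() loop then reads it as an article title. The class name NaiveDelimiterDemo, the helper tokens() and the sample text are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class NaiveDelimiterDemo {
    // Hypothetical helper: splits with the question's delimiter,
    // which matches EVERY "]]" and every "[[" in the input.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\]\\]|\\[\\[");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: one article whose body contains an inline [[link]]
        String text = "[[Title]]\nCATEGORIES: a\nsee [[consciousness]], mind\n";
        List<String> tokens = tokens(text);
        // Read tokens in (name, contents) pairs, as the question's loop does.
        // Prints "Name: Title" and then "Name: consciousness".
        for (int i = 0; i + 1 < tokens.size(); i += 2) {
            System.out.println("Name: " + tokens.get(i));
        }
    }
}
```

The leading [[ is skipped by Scanner as a delimiter, so the four tokens are Title, the CATEGORIES text up to the link, consciousness, and the remainder of the line.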

I want the scanner to split only when it sees

'[[' or ']] CATEGORIES'

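But I don't know how to do that, because I'm not very good with patterns or regular expressions. Can anyone come up with a pattern that might work? I've tried looking at other delimiter questions and at the Javadocs, but I'm having a hard time applying them to my problem. Thank you for your time and any help you can give!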

Java regex java.util.scanner String

Source: https://stackoverflow.com/questions/65137971/how-to-use-scanner-usedelimiter-to-match-two-characters-next-to-each-other-fol

1 answer


nkcskrwz (answer #1)

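To match the titles correctly, we can use a positive lookahead in the regex:

\[\[(?=.*]]\nCATEGORIES:)|]]\n(?=CATEGORIES:)

Explanation:
- Match [[ only if it is followed, on the same line, by any sequence of characters, then ]], a line break and the string CATEGORIES:. Because everything after [[ sits inside the positive lookahead, only the [[ itself is matched (consumed as the delimiter).
- Similarly, match ]] plus the line break after it, but only if the next line starts with CATEGORIES: (again checked by a lookahead, so CATEGORIES: stays in the token).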
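To see that the lookahead is zero-width, here is a small sketch that runs the explanation regex with java.util.regex.Matcher and records what each match actually consumes. The class LookaheadDemo, the helper consumed() and the sample text are hypothetical; the pattern is the one from the explanation, written as a Java string literal (its "\n" is a real newline character, which the regex matches literally).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Explanation regex from the answer; "." does not match line breaks by default
    static final Pattern DELIM =
            Pattern.compile("\\[\\[(?=.*]]\nCATEGORIES:)|]]\n(?=CATEGORIES:)");

    // Hypothetical helper: returns the exact text consumed by each delimiter match
    static List<String> consumed(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DELIM.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The inline [[link]] is not followed by "]]\nCATEGORIES:", so it never matches
        String text = "[[title1]]\nCATEGORIES: a [[link]] b\n";
        for (String s : consumed(text)) {
            // Escape the newline so it is visible when printed
            System.out.println("consumed: " + s.replace("\n", "\\n"));
        }
    }
}
```

Only two matches are found: the [[ before title1 and the ]] plus line break after it; the lookahead text (title1]] and CATEGORIES:) is checked but left in the input.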
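Updated snippet (the delimiter here uses \s* around the line break so that trailing spaces, as in "[[title1]] \n", are tolerated too):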

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        String text = "[[title1]] \n" +
                "CATEGORIES: [some text [ some other text ] some more text ]\n" +
                "[[Vertebrate trachea|trachea]]s from human stem cells. Several [[artificial urinary bladder]]s\n" +
                "[[Image:Bohr-atom-PAR.svg|thumb|right|310px|The Rutherford–Bohr model of the hydrogen atom ([tpl]nowrap|Z [tpl]=[/tpl] 1[/tpl]) or a hydrogen-like ion ([tpl]nowrap|Z > 1[/tpl]), results in a photon of wavelength 656 nm (red light).]]\n" +
                "[[File:Gettysburg Campaign.png|thumb|350px|Gettysburg Campaign (through July 3); cavalry movements shown with dashed lines. [tpl]legend|#ff0000|Confederate[/tpl]]]\n" +
                "observed is not some nonphysical world of [[consciousness]], mind, or mental life\n" +
                "[[title2]]\n" +
                "CATEGORIES: [[some more text]]";

        Scanner inputScanner = new Scanner(text);
        // Split on "[[" only if its closing "]]" is followed by CATEGORIES:,
        // and on "]]" plus line break only if CATEGORIES: comes next
        inputScanner.useDelimiter("\\[\\[(?=.*]]\\s*CATEGORIES:)|]]\\s*\n(?=\\s*CATEGORIES:)");
        while (inputScanner.hasNext()) {
            String wikiName = inputScanner.next();
            String wikiContents = inputScanner.next();
            System.out.printf("Name:%s\nContents:%s\n\n", wikiName, wikiContents);
        }
    }
}

Output:

Name:title1
Contents:CATEGORIES: [some text [ some other text ] some more text ]
[[Vertebrate trachea|trachea]]s from human stem cells. Several [[artificial urinary bladder]]s
[[Image:Bohr-atom-PAR.svg|thumb|right|310px|The Rutherford–Bohr model of the hydrogen atom ([tpl]nowrap|Z [tpl]=[/tpl] 1[/tpl]) or a hydrogen-like ion ([tpl]nowrap|Z > 1[/tpl]), results in a photon of wavelength 656 nm (red light).]]
[[File:Gettysburg Campaign.png|thumb|350px|Gettysburg Campaign (through July 3); cavalry movements shown with dashed lines. [tpl]legend|#ff0000|Confederate[/tpl]]]
observed is not some nonphysical world of [[consciousness]], mind, or mental life

Name:title2
Contents:CATEGORIES: [[some more text]]
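If the articles are needed as data rather than printed, the same delimiter can feed a map. This is a sketch under assumptions: the class WikiDumpParser and the method parse() are hypothetical names, the tokens are trimmed (which removes the trailing line break left before the next [[title]]), and a guard stops a final title with no contents from throwing NoSuchElementException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class WikiDumpParser {
    // Delimiter taken from the answer's snippet
    static final String DELIMITER =
            "\\[\\[(?=.*]]\\s*CATEGORIES:)|]]\\s*\n(?=\\s*CATEGORIES:)";

    // Hypothetical helper: title -> contents, kept in file order
    static Map<String, String> parse(String text) {
        Map<String, String> articles = new LinkedHashMap<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(DELIMITER);
            while (scanner.hasNext()) {
                String title = scanner.next().trim();
                // Guard against an odd number of tokens (title without contents)
                String contents = scanner.hasNext() ? scanner.next().trim() : "";
                articles.put(title, contents);
            }
        }
        return articles;
    }

    public static void main(String[] args) {
        // Hypothetical sample with an inline [[link]] in the first article
        String text = "[[Title]]\n"
                + "CATEGORIES: a, b\n"
                + "see [[consciousness]], mind\n"
                + "[[Next Title]]\n"
                + "CATEGORIES: c\n"
                + "more text";
        parse(text).forEach((title, contents) ->
                System.out.printf("Name:%s%nContents:%s%n%n", title, contents));
    }
}
```

A LinkedHashMap keeps the articles in the order they appear; note that duplicate titles would overwrite each other, so a List of pairs may be preferable for real dumps.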

2021-06-30
