Java regex: parsing parts of a long text

11dmarpk  posted on 2021-07-06  in Java

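I have a huge txt file that can contain the names of several houses, and for each house there are some values specific to that house, and so on. Here is a representative excerpt of my txt: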

getHouseName: house1
random useless text
price: 1000
squaremtr: 75
sellVal: 1000
random useless text
random useless text
random useless text
rentPrice: 150
getHouseName: house2
price: 1004
squaremtr: 85
sellVal: 950
random useless text
rentPrice: 150
getHouseName: house3
price: 1099
squaremtr: 90
random useless text
random useless text
sellVal: 1100
random useless text
rentPrice: 199

For each house, I want to retrieve that house's specific values and store them in variables using a regex. This is my code so far:

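// Needs: java.io.File, java.io.FileNotFoundException, java.util.Scanner,
//        java.util.regex.Matcher, java.util.regex.Pattern
public void testHouse() throws FileNotFoundException {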
    Scanner txt = new Scanner(new File("path//to//file"));

    String houseName ="";
    String price = "";
    String squaremtr = "";
    String sellVal = "";
    String rentPrice = "";

    Pattern houseNamePatt = Pattern.compile("getHouseName: ((_!getHouseName: \\s).)*", Pattern.DOTALL);

    while(txt.hasNextLine()) {
        String str = txt.nextLine();
        Matcher m = houseNamePatt.matcher(str);
        if (m.find()) {
            houseName=str.substring(m.end());
            System.out.println("houses: " + m.group());
        }
    }
}

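But in this case all I get is a list of all the house names, not the lines between each name, and I have no way to assign a specific house's values to my variables. Where am I going wrong? Thank you.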

gmxoilav  #1

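You can get all the values by matching each field name followed by a capturing group. If lines with random text can appear in between, use a negative lookahead, (?!...), to match every line that does not start with the next expected field name. Then set each variable to the value of the corresponding group number.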

^getHouseName:\h+(.+)(?:\R(?!price:).*)*\Rprice: (\d+)(?:\R(?!squaremtr:).*)*\Rsquaremtr:\h+(\d+)(?:\R(?!sellVal:).*)*\RsellVal:\h+(\d+)(?:\R(?!rentPrice:).*)*\RrentPrice:\h+(\d+)

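In parts:

^    Start of string (start of each line when Pattern.MULTILINE is set)
getHouseName:\h+(.+)    Match getHouseName and capture its value in group 1
(?:\R(?!price:).*)*\Rprice: (\d+)    Match lines until the next line that starts with price, and capture 1+ digits in group 2
(?:\R(?!squaremtr:).*)*\Rsquaremtr:\h+(\d+)    Match lines until the next line that starts with squaremtr, and capture 1+ digits in group 3
(?:\R(?!sellVal:).*)*\RsellVal:\h+(\d+)    Match lines until the next line that starts with sellVal, and capture 1+ digits in group 4
(?:\R(?!rentPrice:).*)*\RrentPrice:\h+(\d+)    Match lines until the next line that starts with rentPrice, and capture 1+ digits in group 5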
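
To make the pattern above concrete, here is a minimal Java sketch that applies it to a text containing random lines between the fields. The class name HouseRegexDemo, the parse helper, and the sample text are assumptions made for illustration; they are not part of the original answer. Pattern.MULTILINE is set so that ^ matches at the start of every line rather than only at the start of the whole input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the parse() helper are illustrative only.
class HouseRegexDemo {

    // The pattern from the answer, written as a Java string (backslashes doubled).
    // MULTILINE lets ^ match at the start of each line, so every house block is found.
    static final Pattern HOUSE = Pattern.compile(
            "^getHouseName:\\h+(.+)"
            + "(?:\\R(?!price:).*)*\\Rprice: (\\d+)"
            + "(?:\\R(?!squaremtr:).*)*\\Rsquaremtr:\\h+(\\d+)"
            + "(?:\\R(?!sellVal:).*)*\\RsellVal:\\h+(\\d+)"
            + "(?:\\R(?!rentPrice:).*)*\\RrentPrice:\\h+(\\d+)",
            Pattern.MULTILINE);

    // Returns one String[5] per house: name, price, squaremtr, sellVal, rentPrice.
    static List<String[]> parse(String text) {
        List<String[]> houses = new ArrayList<>();
        Matcher m = HOUSE.matcher(text);
        while (m.find()) {
            houses.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)
            });
        }
        return houses;
    }

    public static void main(String[] args) {
        // Sample input modeled on the question, with unrelated lines in between.
        String text = String.join("\n",
                "getHouseName: house1",
                "random useless text",
                "price: 1000",
                "squaremtr: 75",
                "sellVal: 1000",
                "random useless text",
                "rentPrice: 150",
                "getHouseName: house2",
                "price: 1004",
                "squaremtr: 85",
                "sellVal: 950",
                "random useless text",
                "rentPrice: 150");

        for (String[] house : parse(text)) {
            System.out.println(String.join(" ", house));
        }
    }
}
```

Each call to m.find() consumes one complete house block, so the five groups always belong to the same house. The sketch assumes every house lists all five fields in this order; a block with a missing field would not match as intended.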

a11xaf1n  #2

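The following regex will do it, provided the five fields of each house appear on consecutive lines (it does not skip unrelated lines in between):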

(?m)^getHouseName: (.*)\Rprice: (.*)\Rsquaremtr: (.*)\RsellVal: (.*)\RrentPrice: (.*)

Test

String hugeText = "getHouseName: house1\n" + 
                  "price: 1000\n" + 
                  "squaremtr: 75\n" + 
                  "sellVal: 1000\n" + 
                  "rentPrice: 150\n" + 
                  "getHouseName: house2\n" + 
                  "price: 1004\n" + 
                  "squaremtr: 85\n" + 
                  "sellVal: 950\n" + 
                  "rentPrice: 150\n" + 
                  "getHouseName: house3\n" + 
                  "price: 1099\n" + 
                  "squaremtr: 90\n" + 
                  "sellVal: 1100\n" + 
                  "rentPrice: 199";

String regex = "(?m)^" +
               "getHouseName: (.*)\\R" + 
               "price: (.*)\\R" + 
               "squaremtr: (.*)\\R" + 
               "sellVal: (.*)\\R" + 
               "rentPrice: (.*)";
for (Matcher m = Pattern.compile(regex).matcher(hugeText); m.find(); ) {
    String houseName = m.group(1);
    String price     = m.group(2);
    String squaremtr = m.group(3);
    String sellVal   = m.group(4);
    String rentPrice = m.group(5);
    System.out.printf("%-8s %6s %4s %6s %5s%n",
                      houseName, price, squaremtr, sellVal, rentPrice);
}

Output

house1     1000   75   1000   150
house2     1004   85    950   150
house3     1099   90   1100   199
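
The test above works on an in-memory string, while the question starts from a file. As a rough sketch, the same regex can be applied to a file's contents by reading the whole file into one string first. The class HouseFileReader, its methods, and the map keys are assumptions made for this example, and the approach presumes the file fits comfortably in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the original answer.
class HouseFileReader {

    // Same regex as in the answer: the five fields must be on consecutive lines.
    static final Pattern HOUSE = Pattern.compile(
            "(?m)^getHouseName: (.*)\\R"
            + "price: (.*)\\R"
            + "squaremtr: (.*)\\R"
            + "sellVal: (.*)\\R"
            + "rentPrice: (.*)");

    // Map keys, in the same order as the capturing groups 1..5.
    static final String[] FIELDS = {"houseName", "price", "squaremtr", "sellVal", "rentPrice"};

    // Parses already-loaded text into one ordered map per house.
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> houses = new ArrayList<>();
        Matcher m = HOUSE.matcher(text);
        while (m.find()) {
            Map<String, String> house = new LinkedHashMap<>();
            for (int i = 0; i < FIELDS.length; i++) {
                house.put(FIELDS[i], m.group(i + 1));
            }
            houses.add(house);
        }
        return houses;
    }

    // Reads the whole file as UTF-8 (an assumption about the encoding) and parses it.
    static List<Map<String, String>> readHouses(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return parse(text);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HouseFileReader <path-to-txt>");
            return;
        }
        for (Map<String, String> house : readHouses(Paths.get(args[0]))) {
            System.out.println(house);
        }
    }
}
```

Keeping the values of one house together in a single map (or a small class) avoids the problem from the question, where one set of variables is overwritten on every loop iteration.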
