Perl - match with quantifier up to length of capture variable

Posted by whlutmcx on 2023-03-03 in Perl

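I am trying to print the org mode node that matches "cheatsheet".
A node starts with a sequence of "*" (asterisks) and ends at the next sequence of "*" (asterisks) that is shorter than or equal in length to the first one. Nodes begin only at the start of a line.
Here is what I have so far: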

#! /usr/bin/perl
$data= <<EOF;
not printed
* not printed
** cheatsheet printed
printed
* not printed
no printed
EOF

$\="\n";
my @data = split /\n/, $data;
foreach $_( @data ) { 
    /^((\*)+) cheatsheet/ ... /^((\*)){1,length("$1")}/ and print ;
}

The program should print:

** cheatsheet printed
printed

My intention is for perl to replace length("$1") with 2, so that the second match operation is dynamically turned into:

/^((\*)){1,2}/

and therefore stops at the "* not printed" line.
The error message:

Nested quantifiers in regex; marked by <-- HERE in m/^((\*)){1,length("** <-- HERE ")}/ at ./tst.pl line 14.

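Even if perl behaved as I intended, it would still print the line "* not printed". Any ideas on how to solve this are welcome.
I am aware of the parser modules for org mode; I want to practice with regex.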

perl

Source: https://stackoverflow.com/questions/75585770/perl-match-with-quantifier-up-to-length-of-capture-variable

3 Answers

uxhixvfz1#

if ( X ... Y ) { ... }

is basically

if ( !$flag ) {
   if ( X ) {
      $flag = 1;
   }
} else {
   if ( Y ) {
      $flag = 0;
   }
}

if ( $flag ) { ... }

In our case, that means

if ( !$flag ) {
   if ( /^((\*)+) cheatsheet/ ) {
      $flag = 1;
   }
} else {
   if ( /^((\*)){1,length("$1")}/ ) {
      $flag = 0;
   }
}

In the second match, you want to use the number of stars matched by the first match.

if ( !$flag ) {
   if ( /^(\*+) cheatsheet\b/ ) {
      $flag = 1;
      $level = length( $1 );
   }
} else {
   if ( /^\*{1,$level} / ) {
      $flag = 0;
   }
}

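In addition to tracking the number of stars, I removed some needless parentheses and added a space to the second regex. That space, or something like (?!\*), is important because /^\*{1,2}/ also matches *****.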
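To make the over-matching concrete, here is a minimal sketch. The helper name ends_node is hypothetical (it is not part of the answer); it uses the (?!\*) guard instead of the trailing space.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line starts with between 1 and $level stars
# and the run of stars is not longer than that.
sub ends_node {
    my ($line, $level) = @_;
    return $line =~ /^\*{1,$level}(?!\*)/ ? 1 : 0;
}

# Without a guard, the quantifier matches a prefix of a longer run of stars.
print "unguarded pattern matches five stars\n" if '***** deep' =~ /^\*{1,2}/;

# With the guard, the deeper heading is rejected and the shallow one accepted.
print "guarded pattern rejects five stars\n" unless ends_node('***** deep', 2);
print "guarded pattern accepts one star\n"   if     ends_node('* top', 2);
```

The guard makes the engine backtrack through every allowed count and fail each time, because a star still follows the matched run.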
By the way, we can avoid building the pattern dynamically.

if ( !$flag ) {
   if ( /^(\*+) cheatsheet\b/ ) {
      $flag = 1;
      $level = length( $1 );
   }
} else {
   if ( /^(\*+)/ && length( $1 ) <= $level ) {
      $flag = 0;
   }
}
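
For reference, the flag-based logic can be dropped into a complete loop over the question's data. This is only a sketch: the sub name cheatsheet_lines and the choice to collect lines in an array are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the flag logic: returns the lines that belong
# to a "cheatsheet" node, excluding the heading that terminates it.
sub cheatsheet_lines {
    my @lines = @_;
    my ( $flag, $level, @out ) = ( 0, 0 );
    for my $line (@lines) {
        if ( !$flag ) {
            if ( $line =~ /^(\*+) cheatsheet\b/ ) {
                $flag  = 1;
                $level = length $1;    # remember the depth of the node
            }
        }
        elsif ( $line =~ /^(\*+)/ && length($1) <= $level ) {
            $flag = 0;                 # same or shallower heading ends the node
        }
        push @out, $line if $flag;
    }
    return @out;
}

my $data = <<'EOF';
not printed
* not printed
** cheatsheet printed
printed
* not printed
no printed
EOF

print "$_\n" for cheatsheet_lines( split /\n/, $data );
```

With the question's data this prints the two lines "** cheatsheet printed" and "printed". Because the flag is cleared before the line is pushed, the terminating heading is not included.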

Using ...:

print
   if  /^((\*)+) cheatsheet/ && ( $level = length( $1 ) )
   ... /^(\*+)/ && length( $1 ) <= $level
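
One thing to keep in mind: unlike the flag version, the range operator also returns true on the line where its right operand matches, so the terminating heading would be printed too. In scalar context the operator returns a sequence number, and the last one in a range has "E0" appended, which can be used to skip that line. A sketch, with the sub name cheatsheet_lines_ff chosen purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical wrapper: collects the lines selected by the flip-flop,
# leaving out the line that closes the range.
# Note: the flip-flop keeps its state between calls of this sub.
sub cheatsheet_lines_ff {
    my @out;
    my $level = 0;
    for (@_) {
        my $seq = /^(\*+) cheatsheet\b/ && ( $level = length $1 )
              ... /^(\*+)/ && length($1) <= $level;
        # $seq is '' outside the range, 1, 2, ... inside it,
        # and ends in "E0" on the terminating line.
        push @out, $_ if $seq && $seq !~ /E0\z/;
    }
    return @out;
}

my @lines = (
    'not printed', '* not printed', '** cheatsheet printed',
    'printed', '* not printed', 'no printed',
);
print "$_\n" for cheatsheet_lines_ff(@lines);
```

With the question's data this prints "** cheatsheet printed" and "printed"; the "* not printed" line that closes the range is dropped by the E0 check.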

2023-03-03

cl25kdpy2#

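Match against the whole text instead of line by line, and use a negative lookahead. The excerpt below slightly extends your test data, just to make sure that subnodes under **cheatsheet are handled correctly.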

my $txt = <<"";
not printed
* not printed
** cheatsheet printed
printed
*** subnode printed
subnode content printed
* not printed
no printed

my $rx = qr{
  ^                # start of a line
  (\*+)            # stars, captured in $1
  \s cheatsheet    # the target node
  (                # now either ..
     (?!^\*).      #   something that is not a node
    |              # .. or ..
      ^\1          #   a node with at least the same length
   )*              # repeat this as long as possible
 }xms;

say $& if $txt =~ /$rx/;

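Here is another version, following ikegami's suggestion, with some comments added. Because the /s flag is dropped, the meaning of '.' changes, so we also have to include \n explicitly at the end of the "cheatsheet" line.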

my $rx2 = qr{
  ^                # start of a line
  (\*+)            # stars, captured in $1
  \s cheatsheet    # the target node ..
  .* \n            # .. up to the end of that node's line
  (?: ^            # now treat each following line
     (?:           # either ..
        [^*\n] .*  #   a whole line which does not start with a star
     |             # .. or ..
        \1 .*      #   a node with at least the same length
      )?           # optional because the line may be empty
     \n            # newline character
   )*              # repeat for each line
 }xm;
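
The second pattern is shown without a call site, so here is a small sketch of how it might be applied. The sub name extract_node is an assumption; the pattern is the same as $rx2 above, written compactly. The match is read from $& rather than from an extra capture group, because wrapping the pattern in another pair of parentheses would renumber the groups and change what \1 refers to.

```perl
use strict;
use warnings;

# Same pattern as $rx2 above, written on fewer lines.
my $rx2 = qr{
    ^ (\*+) \s cheatsheet .* \n     # heading line of the target node
    (?: ^ (?: [^*\n] .* | \1 .* )? \n )*   # plain lines, empty lines, deeper or equal headings
}xm;

# Hypothetical helper: returns the matched node text, or undef if none.
sub extract_node {
    my ($txt) = @_;
    return $txt =~ $rx2 ? $& : undef;
}

my $txt = join "\n",
    'not printed',
    '* not printed',
    '** cheatsheet printed',
    'printed',
    '*** subnode printed',
    'subnode content printed',
    '* not printed',
    'no printed',
    '';

my $node = extract_node($txt);
print defined $node ? $node : "no match\n";
```

With the extended test data this prints the four lines from "** cheatsheet printed" through "subnode content printed".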

2023-03-03

sdnqo3pr3#

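You can use one of Perl's code constructs, (??{ code }), to build a segment of the regex.
It is inline and executed at run time. Its return value is treated as a pattern.
This uses the regex system variable $^N to get the most recently captured value just before it is used.
By taking its length, the range quantifier can be constructed as part of the code segment.
Note that a quantifier cannot follow nothing, so the asterisk has to be included in the construct.
Using a lookahead, the match continues until it finds the next line of asterisks that falls within that range.
Note that I did this to spec, so a line that begins with more asterisks than the required range could still match; that is easily overcome, for example with a (?!\*) right after the quantifier.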

use strict;
use warnings;

$/ = '';
my $str = <DATA>;

if ( $str =~ /
       (?sm)
       ^ ( \*+ ) [ ] cheatsheet .*?             # (1) Found *'s at BOL, then cheatsheet
       (?= ^(??{{ '\*{1,'.length($^N).'}' }})   # Lookahead. Construct * with quantifier. $^N is last captured group
         (?!\*)
       )
   /x ) {
  print $&, "\n" }
else {
  print "no match\n"; }

__DATA__
not printed
* not printed
** cheatsheet printed
printed
* not printed
no printed

2023-03-03
