In Perl, is there a limit on the number of capture groups in a regular expression?

wfauudbj  posted on 2022-12-19 in Perl

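Is there a limit on the number of capture groups in a regular expression? I used to think it was 9 ($1 ... $9), but I could not find anything in the perlre documentation to confirm that. In fact, the following code shows that at least 26 are available.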

#!/usr/local/bin/perl

use strict;
use warnings;

my $line = " a b c d e f g h i j k l m n o p q r s t u v w x y z ";

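my $lp  = "(\\w) ";      # one word character in a capture group, then a space
my $pat = $lp x 26;      # repeat it to get 26 capture groups

# Only read the capture variables if the match succeeded.
if ($line =~ /$pat/) {
    print "$1 $2 $3 $24 $25 $26\n";
}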

Note that this question: How many captured groups are supported by pcre2 substitute function is only about the PCRE2 C library; I am asking about Perl.

wgmfuz8q  #1

https://perldoc.perl.org/perlre says:
There is no limit to the number of captured substrings that you may use.
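
As a quick check of that statement, here is a minimal sketch that builds a pattern with far more than nine groups and asks Perl how many groups the last successful match had, using `$#+` (the last index of the `@+` array described in perlvar). The names `$n`, `$pattern`, and `$text`, and the count of 100, are assumptions picked only for illustration.

```perl
use strict;
use warnings;

my $n       = 100;            # hypothetical group count, chosen for illustration
my $pattern = '(.)' x $n;     # 100 single-character capture groups
my $text    = 'x' x $n;       # just long enough for the pattern to match

my $matched = $text =~ /$pattern/;
my $groups  = $#+;            # number of groups in the last successful match

print "matched: ", ($matched ? "yes" : "no"), "\n";
print "groups:  $groups\n";
print "last:    $100\n";      # capture variables with several digits work too
```

Raising `$n` should mostly cost memory and time, which is what the other answers measure.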

yhived7q  #2

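Why not test it? A regexp with 20 million captures ought to be enough for anyone, which makes me think memory is the real limit here. This took 25 seconds on my old laptop with perl v5.30: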

my $n = 20_000_000;                 # 20 million
my $re = join"", map "(.)", 1..$n;  # create regexp with 20 million captures
my $str = "ABC" x $n;               # create a more than long enough string
$str =~ /$re/;                      # match & capture
print $19999987, "\n";              # print the "A" in capture var number 19999987
print ${^CAPTURE}[19999987-1],"\n"; # same
print "Length: ".@{^CAPTURE}."\n";  # prints 20000000, length of array
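
With this many groups, numbered variables become awkward. A match in list context returns the captures as a list, so they can be collected in an ordinary array instead. The sketch below reuses the same pattern construction with a much smaller, hypothetical `$n` so that it runs instantly; `@captures` and `$count` are illustrative names.

```perl
use strict;
use warnings;

my $n   = 1_000;                          # hypothetical, far below the 20 million above
my $re  = join "", map "(.)", 1 .. $n;    # regexp with $n capture groups
my $str = "ABC" x $n;                     # more than long enough to match

my @captures = $str =~ /$re/;             # list context: one element per capture group
my $count    = @captures;                 # array in scalar context gives its length

print "Captured $count groups\n";
print "Group 987: $captures[987 - 1]\n";  # same value that $987 holds
```

Unlike `@{^CAPTURE}`, which needs Perl 5.26 or later, the list-context form also works on older versions.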

vhmi4jdf  #3

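You can just try it! Even if there is no intrinsic limit, there may be a practical one.
Let's try it on my M1 Mac Mini with Perl v5.36.
Here is a small program that takes the number of captures I want, builds a string long enough to match that many captures, and builds a pattern with that many capture groups (note the use of v5.36's builtin::ceil):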

#!perl

use v5.36;
use experimental qw(builtin);
use builtin qw(ceil);

my $n = shift;
say "N is $n";

my $alpha = join '', 'a' .. 'z';
my $multiple = ceil($n / 26);
my $text = $alpha x ($multiple + 1);

my $n_mod_26 = $n % 26;
my $expected_letter = substr $alpha, $n_mod_26 - 1, 1;

my $pattern_text = '(.)' x $n;
my $pattern = qr/$pattern_text/;

my $result = $text =~ $pattern;
say $result ? "Matched" : 'Did not match';

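# Look up the capture variable $N by name; this symbolic reference
# needs strict refs switched off, which the do block scopes tightly.
my $matched = do { no strict 'refs'; ${"$n"} };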
print "Matched <$matched>; expected <$expected_letter>\n";

When I run it with different lengths, I eventually get the shell to give up:

brian@M1-Mini Desktop % for i in 1 3 7 50 500 5000 70000 900000 3000000 40000000 1234567890; do echo '----' && time perl test.pl $i; done
----
N is 1
Matched
Matched <a>; expected <a>
perl test.pl $i  0.02s user 0.01s system 67% cpu 0.047 total
----
N is 3
Matched
Matched <c>; expected <c>
perl test.pl $i  0.01s user 0.00s system 91% cpu 0.014 total
----
N is 7
Matched
Matched <g>; expected <g>
perl test.pl $i  0.01s user 0.00s system 92% cpu 0.011 total
----
N is 50
Matched
Matched <x>; expected <x>
perl test.pl $i  0.01s user 0.00s system 92% cpu 0.010 total
----
N is 500
Matched
Matched <f>; expected <f>
perl test.pl $i  0.01s user 0.00s system 92% cpu 0.008 total
----
N is 5000
Matched
Matched <h>; expected <h>
perl test.pl $i  0.01s user 0.00s system 93% cpu 0.008 total
----
N is 70000
Matched
Matched <h>; expected <h>
perl test.pl $i  0.02s user 0.00s system 97% cpu 0.022 total
----
N is 900000
Matched
Matched <j>; expected <j>
perl test.pl $i  0.20s user 0.02s system 97% cpu 0.229 total
----
N is 3000000
Matched
Matched <p>; expected <p>
perl test.pl $i  0.69s user 0.06s system 95% cpu 0.786 total
----
N is 40000000
Matched
Matched <n>; expected <n>
perl test.pl $i  9.32s user 1.08s system 91% cpu 11.402 total
----
N is 1234567890
zsh: killed     perl test.pl $i
perl test.pl $i  127.80s user 6.17s system 83% cpu 2:39.69 total

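My machine gave up at 1,234,567,890 groups, which might have nothing to do with the number of groups; maybe something else in perl decided it was not happy, or maybe the program exceeded some process resource limit. Your own machine might give up at a different point (or not give up at all). I don't know what killed it and I don't really care, because even if I knew, I wouldn't do anything to fix it.
But can I find the largest number that works? It is around 389,000,000 captures. That is not a fixed number I can consistently predict, and it probably depends on other unrelated things happening on the machine at the same time.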
