Perl: splitting a string that contains double quotes and spaces

Posted by slhcrj9b on 2023-06-06 in Perl

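I have a string like this: "abc" "cd - e". I need to split it into the following two strings:

  1. "abc"
  2. "cd - e"

I have tried several options in Perl, but none of them gives the result I need. Can anyone point me in the right direction? Thanks.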

7ivaypg9 #1

You can split on whitespace that is preceded by " and followed by ":

use strict;
use warnings; 

my $s = '"abc" "cd - e"';
my @matches = split /(?<=")\s+(?=")/, $s;
# "abc"
# "cd - e"
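
To check that the lookaround split tolerates any amount of whitespace between fields while leaving spaces inside the quotes alone, here is a minimal sketch. The `split_quoted` helper name and the three-field sample input are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: split only on whitespace that sits between
# a closing quote and an opening quote.
sub split_quoted {
    my ($s) = @_;
    return split /(?<=")\s+(?=")/, $s;
}

# Assumed sample: three spaces, then a tab, between the fields.
my @parts = split_quoted(qq{"abc"   "cd - e"\t"f g"});
print "$_\n" for @parts;
```

The spaces inside "cd - e" are not preceded by a quote, so the lookbehind rejects them and the field stays in one piece.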

disho6za #2

my @strings = $input =~ /"[^"]*"/g;
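  • This assumes the input is valid. Basically, you can use a regex match to either validate or extract, but doing both at the same time is very hard.
  • This assumes quoted fields cannot contain quotes, since you did not mention an escaping mechanism.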
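
One way to act on the first point is to validate and extract in two separate passes. The following is only a sketch under the same assumptions as the answer (double-quoted fields, no escaping mechanism); the `extract_quoted` name and the validation pattern are illustrative choices, not something the answer specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the quoted fields, or dies if the line
# contains anything other than double-quoted fields and whitespace.
sub extract_quoted {
    my ($input) = @_;

    # Step 1: validate the shape of the whole line.
    $input =~ /\A\s*(?:"[^"]*"\s*)*\z/
        or die "Malformed input: $input\n";

    # Step 2: extract, now that the shape is known to be valid.
    return $input =~ /"[^"]*"/g;
}

my @strings = extract_quoted('"abc" "cd - e"');
print "$_\n" for @strings;
```

Keeping the two regexes separate means each one stays simple: the first only answers "is this line well formed?", the second only collects the fields.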

hrysbysz #3

If your input contains exactly two strings, as you suggest (rather than an arbitrary number n of strings), then this should work:

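my $s = '"abc" "cd - e"';

# A list-context match returns the captures, or an empty list on failure
my ($s1, $s2) = $s =~ /(".*") (".*")/
    or die "Input does not contain two quoted strings";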

Alternatively, you can make it a little safer by replacing . with "non-quote", i.e. [^"]:

$s =~ /("[^"]*") ("[^"]*")/;
$s1 = $1;
$s2 = $2;
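
To illustrate why the non-quote version is safer, here is a small sketch that runs both patterns on an input with three fields. The three-field string is an assumption chosen to expose the difference; the original question only has two fields, where both patterns agree.

```perl
use strict;
use warnings;

my $s = '"a" "b" "c"';    # assumed input with three fields

# Greedy .* is free to run across quote boundaries.
my @greedy = $s =~ /(".*") (".*")/;
print "greedy: $_\n" for @greedy;

# [^"]* cannot cross a quote, so each capture is exactly one field.
my @strict = $s =~ /("[^"]*") ("[^"]*")/;
print "strict: $_\n" for @strict;
```

With the greedy pattern the first capture swallows `"a" "b"` and the second is `"c"`; with the non-quote pattern the captures are `"a"` and `"b"`.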

pw9qyyiw #4

Here is a not-so-small implementation of a split_line function that handles quotes and escaped spaces.

sub split_line {
    my $string = shift;
    my $orig_line = $string;

    my $accumulated = '';
    my @result      = ();
    my $in_str      = 0;
    my $sep_char    = '';
    for my $tok ( split /\s/, $string ) {

        # Found a string boundary in this token
        if ( $tok =~ /'|"/ ) {
            my $orig_sep_char = $sep_char;

            # Check that we are not mismatching simple and double quote
            if ( $tok =~ /'/ ) {
                die "Simple quote (') matched with double quote (\") in $orig_line" if ( $sep_char eq '"' );
                $sep_char = "'";
            }
            if ( $tok =~ /"/ ) {
                die "Double quote (\") matched with simple quote (') in $orig_line" if ( $sep_char eq "'" );
                $sep_char = '"';
            }

            die "Please don't mix quotes and escaped spaces"                          if ( $tok =~ /\\$/ );

            # Cleanup the sep char
            $tok =~ s/"|'//;

            if ( $tok =~ s/('|")// ) { # Two quotes in the same chunk. Deal with it if it's eg: >>"something"<<
                die "Mismatch between simple quote (') and double quote (\") in $orig_line" if ($sep_char ne $1);
                die "Please don't use more than two quote signs per elements, that's too hard to parse." if ( $tok =~ /'|"/ );

                $sep_char = $orig_sep_char; # Revert the fact that we are entering a quote

                # Deal with that chunk as if it were not quoted
                if ($in_str) {
                    $accumulated .= " $tok";
                } elsif ( length $accumulated ) {
                    push @result, "$accumulated $tok";
                    $accumulated = "";
                } else {
                    push @result, $tok;
                }
                next;
            }

            # Accumulate the string if entering the string (in_str = false before that chunk),
            # or push the previously accumulated things if exiting the string (in_str = true previously).
            if ($in_str) {
                push @result, "$accumulated $tok";
                $accumulated = "";
                $sep_char    = '';
            } else {
                $accumulated = $tok;
            }
            $in_str = not $in_str;
            next;
        }

        # This token is ended with an escaped space
        if ( $tok =~ /\\$/ ) {
            chop $tok;
            $accumulated = ( length $accumulated ? "$accumulated " : '' ) . $tok;
            next;
        }

        # Currently within a string, no boundary in sight
        if ($in_str) {
            $accumulated .= " $tok";
            next;
        }

        # Nothing specific about this item
        if ( length $accumulated ) {
            push @result, "$accumulated $tok";
            $accumulated = "";
        } else {
            push @result, $tok;
        }
    }
    die "Expecting end of quote" if $in_str;
    return @result;
}

Here is a usage example:

print join "#", split_line("a  'b c   c d  ' 'titi' \"toto\" e\\ f  g\\  h ij'k l'  m\"n \" 'l '");
print "#\n";

This displays the following:

a##b c   c d  #titi#toto#e f##g #h#ijk l##mn #l #

This implementation is not perfect. Here are some remaining issues:

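  • Quotes cannot be mixed: >>"da 'd d'<< is invalid
  • Empty elements are ignored: >>\ a<< results in "a" instead of " a"
  • Quotes and escaped spaces cannot be mixed: >>"a B"c\ d<< is invalid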

For the record, I implemented this in my project. The code on GitHub may evolve in the future to fix these issues.
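
As a point of comparison with the hand-written parser, the core module Text::ParseWords already handles quotes and backslash-escaped spaces. The sketch below is not the author's method, only an alternative shown under the assumption that shell-like quoting rules are acceptable; the sample inputs are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

# keep = 1: the surrounding quotes stay in the results,
# which matches what the question asks for.
my @kept = quotewords('\s+', 1, '"abc" "cd - e"');
print "$_\n" for @kept;

# shellwords strips the quotes and honours backslash-escaped spaces.
my @words = shellwords(q{a 'b c' "toto" e\ f});
print join('#', @words), "\n";
```

Note that its semantics differ from split_line in places (for example, runs of whitespace are treated as a single delimiter), so it is not a drop-in replacement.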
