Perl: split a string that has double quotes and spaces

Posted by slhcrj9b on 2023-06-06 in Perl


I have a string like this: "abc" "cd - e". I need to split it into the following two strings:

"abc"
"cd - e"
I have tried several options in Perl, but none of them gives me what I need. Can anyone point me in the right direction? Thanks

perl

Source: https://stackoverflow.com/questions/29041011/perl-split-string-that-has-double-quotes-and-space

4 answers


7ivaypg91#

You can split on whitespace that is preceded by " and followed by ":

use strict;
use warnings; 

my $s = '"abc" "cd - e"';
my @matches = split /(?<=")\s+(?=")/, $s;
# "abc"
# "cd - e"
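A quick sketch of the same lookaround split on a hypothetical three-field input, to show that it is not limited to two fields. Whitespace inside a field (as in "b c") is kept, because it is not surrounded by quotes on both sides. This assumes field contents never contain a quote next to whitespace; if they could, the split would break such a field apart.

```perl
use strict;
use warnings;

# Hypothetical input with three quoted fields, one containing a space.
my $s = '"a" "b c" "d"';

# Split only on whitespace that has a quote right before it (?<=")
# and a quote right after it (?="). The space inside "b c" does not qualify.
my @fields = split /(?<=")\s+(?=")/, $s;

print "$_\n" for @fields;
# "a"
# "b c"
# "d"
```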

Answered 2023-06-06

disho6za2#

my @strings = $input =~ /"[^"]*"/g;

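This assumes the input is valid. In general, a regex match can either validate or extract, but doing both at once is quite difficult.
It also assumes quoted fields cannot contain quotes, since you didn't mention an escape mechanism.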
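A sketch of the two-pass idea (validate first, then extract), plus two variations that are assumptions rather than part of the answer: a capture group that drops the quote characters, and a pattern for fields containing backslash-escaped quotes. The \" escape syntax is hypothetical; the question does not define any escape mechanism.

```perl
use strict;
use warnings;

my $input = '"abc" "cd - e"';

# Pass 1 (validation): the whole line must consist of quoted fields
# separated by whitespace, with no quotes inside a field.
die "Unexpected input: $input\n"
    unless $input =~ /\A\s*(?:"[^"]*"\s*)*\z/;

# Pass 2 (extraction): the answer's pattern; //g in list context
# returns every "..." substring, quotes included.
my @strings = $input =~ /"[^"]*"/g;

# With a capture group, //g returns only the captured text,
# i.e. the field contents without the surrounding quotes.
my @contents = $input =~ /"([^"]*)"/g;

# Hypothetical escape-aware variant: a field body is any run of
# non-quote/non-backslash characters or backslash-escaped characters.
my $escaped = '"say \"hi\"" "x"';
my @with_escapes = $escaped =~ /"((?:[^"\\]|\\.)*)"/g;

print "$_\n" for @strings, @contents, @with_escapes;
```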

Answered 2023-06-06

hrysbysz3#

If your input has exactly two strings, as you suggest (rather than an arbitrary number n of strings), this should work:

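my $s = '"abc" "cd - e"';

# Capture both quoted fields at once; die if the match fails
# instead of silently reading stale values from $1/$2.
my ($s1, $s2) = $s =~ /(".*") (".*")/
    or die "Input is not two quoted strings separated by a space\n";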

Alternatively, you can make it a bit safer by replacing . with "non-quote", i.e. [^"]:

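# [^"]* cannot run past a quote, so each capture stays inside one field.
my ($s1, $s2) = $s =~ /("[^"]*") ("[^"]*")/
    or die "Input is not two quoted strings separated by a space\n";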
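To illustrate why [^"] is safer, here is a sketch that feeds both patterns a hypothetical three-field input. The greedy .* can run across quote characters, so the first capture swallows two fields; [^"]* cannot, so each capture stays within a single field.

```perl
use strict;
use warnings;

# Hypothetical input with more fields than the patterns expect.
my $s = '"a" "b" "c"';

# Greedy .* backtracks only as far as needed, to the LAST '" "' boundary.
my ($g1, $g2) = $s =~ /(".*") (".*")/;

# [^"]* stops at the first closing quote of each field.
my ($n1, $n2) = $s =~ /("[^"]*") ("[^"]*")/;

print "greedy:    $g1 | $g2\n";   # greedy:    "a" "b" | "c"
print "non-quote: $n1 | $n2\n";   # non-quote: "a" | "b"
```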

Answered 2023-06-06

pw9qyyiw4#

Below is a not-so-small implementation of a split_line function that handles quotes and escaped spaces.

sub split_line {
    my $string = shift;
    my $orig_line = $string;

    my $accumulated = '';
    my @result      = ();
    my $in_str      = 0;
    my $sep_char    = '';
    for my $tok ( split /\s/, $string ) {

        # Found a string boundary in this token
        if ( $tok =~ /'|"/ ) {
            my $orig_sep_char = $sep_char;

            # Check that we are not mismatching simple and double quote
            if ( $tok =~ /'/ ) {
                die "Simple quote (') matched with double quote (\") in $orig_line" if ( $sep_char eq '"' );
                $sep_char = "'";
            }
            if ( $tok =~ /"/ ) {
                die "Double quote (\") matched with simple quote (') in $orig_line" if ( $sep_char eq "'" );
                $sep_char = '"';
            }

            die "Please don't mix quotes and escaped spaces" if ( $tok =~ /\\$/ );

            # Cleanup the sep char
            $tok =~ s/"|'//;

            if ( $tok =~ s/('|")// ) { # Two quotes in the same chunk. Deal with it if it's eg: >>"something"<<
                die "Mismatch between simple quote (') and double quote (\") in $orig_line" if ($sep_char ne $1);
                die "Please don't use more than two quote signs per elements, that's too hard to parse." if ( $tok =~ /'|"/ );

                $sep_char = $orig_sep_char; # Revert the fact that we are entering a quote

                # Deal with that chunk as if it were not quoted
                if ($in_str) {
                    $accumulated .= " $tok";
                } elsif ( length $accumulated ) {
                    push @result, "$accumulated $tok";
                    $accumulated = "";
                } else {
                    push @result, $tok;
                }
                next;
            }

            # Accumulate the string if entering it (in_str was false before this chunk),
            # or push the previously accumulated text if exiting it (in_str was true).
            if ($in_str) {
                push @result, "$accumulated $tok";
                $accumulated = "";
                $sep_char    = '';
            } else {
                $accumulated = $tok;
            }
            $in_str = not $in_str;
            next;
        }

        # This token is ended with an escaped space
        if ( $tok =~ /\\$/ ) {
            chop $tok;
            $accumulated = ( length $accumulated ? "$accumulated " : '' ) . $tok;
            next;
        }

        # Currently within a string, no boundary in sight
        if ($in_str) {
            $accumulated .= " $tok";
            next;
        }

        # Nothing specific about this item
        if ( length $accumulated ) {
            push @result, "$accumulated $tok";
            $accumulated = "";
        } else {
            push @result, $tok;
        }
    }
    die "Expecting end of quote" if $in_str;
    return @result;
}

Here is a usage example:

print join "#", split_line("a  'b c   c d  ' 'titi' \"toto\" e\\ f  g\\  h ij'k l'  m\"n \" 'l '");
print "#\n";

This prints the following:

a##b c   c d  #titi#toto#e f##g #h#ijk l##mn #l #

This implementation isn't perfect. Here are some remaining issues:

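Quotes cannot be mixed: >>"da 'd d'<< is invalid
Leading escaped spaces are dropped: >>\ a<< results in "a" instead of " a"
Quotes and escaped spaces cannot be mixed: >>"a b"c\ d<< is invalid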

For the record, I implemented this in my project. The code on GitHub may evolve in the future to address these issues.
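For comparison, Perl's core module Text::ParseWords already implements quote-aware splitting. The sketch below uses its quotewords and shellwords functions; it is a different tool from split_line above, and its rules are not identical (for example, it returns an empty list on unbalanced quotes instead of dying).

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

my $s = '"abc" "cd - e"';

# keep = 1: quote characters are preserved in the returned fields.
my @kept = quotewords('\s+', 1, $s);

# keep = 0: quotes are removed and backslash escapes are processed.
my @bare = quotewords('\s+', 0, $s);

# shellwords() splits like a shell: quotes group words and "\ " escapes a space.
my @sh = shellwords(q{a "b c" d\ e});

print join('|', @kept), "\n";   # "abc"|"cd - e"
print join('|', @bare), "\n";   # abc|cd - e
print join('|', @sh),   "\n";   # a|b c|d e
```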

Answered 2023-06-06
