sub split_line {
    my $string = shift;
    my $orig_line = $string;
    my $accumulated = '';
    my @result = ();
    my $in_str = 0;
    my $sep_char = '';
    for my $tok ( split /\s/, $string ) {
        # Found a string boundary in this token
        if ( $tok =~ /'|"/ ) {
            my $orig_sep_char = $sep_char;
            # Check that we are not mismatching single and double quotes
            if ( $tok =~ /'/ ) {
                die "Simple quote (') matched with double quote (\") in $orig_line" if ( $sep_char eq '"' );
                $sep_char = "'";
            }
            if ( $tok =~ /"/ ) {
                die "Double quote (\") matched with simple quote (') in $orig_line" if ( $sep_char eq "'" );
                $sep_char = '"';
            }
            die "Please don't mix quotes and escaped spaces" if ( $tok =~ /\\$/ );
            # Clean up the separator character
            $tok =~ s/"|'//;
            if ( $tok =~ s/('|")// ) {    # Two quotes in the same chunk. Deal with it if it's e.g. >>"something"<<
                die "Mismatch between simple quote (') and double quote (\") in $orig_line" if ($sep_char ne $1);
                die "Please don't use more than two quote signs per elements, that's too hard to parse." if ( $tok =~ /'|"/ );
                $sep_char = $orig_sep_char;    # Revert the fact that we are entering a quote
                # Deal with that chunk as if it were not quoted
                if ($in_str) {
                    $accumulated .= " $tok";
                } elsif ( length $accumulated ) {
                    push @result, "$accumulated $tok";
                    $accumulated = "";
                } else {
                    push @result, $tok;
                }
                next;
            }
            # Accumulate the string if entering the string (in_str = false before that chunk),
            # or push the previously accumulated things if exiting the string (in_str = true previously).
            if ($in_str) {
                push @result, "$accumulated $tok";
                $accumulated = "";
                $sep_char = '';
            } else {
                $accumulated = $tok;
            }
            $in_str = not $in_str;
            next;
        }
        # This token ends with an escaped space
        if ( $tok =~ /\\$/ ) {
            chop $tok;
            $accumulated = ( length $accumulated ? "$accumulated " : '' ) . $tok;
            next;
        }
        # Currently within a string, no boundary in sight
        if ($in_str) {
            $accumulated .= " $tok";
            next;
        }
        # Nothing specific about this item
        if ( length $accumulated ) {
            push @result, "$accumulated $tok";
            $accumulated = "";
        } else {
            push @result, $tok;
        }
    }
    die "Expecting end of quote" if $in_str;
    return @result;
}
Here is a usage example:
print join "#", split_line("a 'b c c d ' 'titi' \"toto\" e\\ f g\\ h ij'k l' m\"n \" 'l '");
print "#\n";
4 Answers
7ivaypg91#
You can split on whitespace that is preceded by " and followed by ":
disho6za2#
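A minimal sketch of the lookaround split described in the first answer, assuming the input is a series of double-quoted strings separated by whitespace. The sample input and the variable names are made up for illustration, and this is not necessarily the exact pattern that answer used:

```perl
use strict;
use warnings;

# Hypothetical sample input: two double-quoted strings separated by a space.
my $input = '"foo bar" "baz qux"';

# Split only on whitespace that has a double quote immediately before it
# (lookbehind) and immediately after it (lookahead). Spaces inside the
# quoted strings are left alone, and the quotes themselves are kept.
my @parts = split /(?<=")\s+(?=")/, $input;

print "$_\n" for @parts;
```

Note that the quotes stay attached to each piece; they would need to be stripped in a separate step if only the contents are wanted.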
hrysbysz3#
If your input has exactly two strings, as you suggest (rather than an arbitrary number n of strings), then this should work:
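As a rough sketch of the two-string case, assuming the input consists of exactly two double-quoted strings separated by whitespace. The sample input and the names $first and $second are hypothetical, and the pattern is one plausible form rather than the answer's original code:

```perl
use strict;
use warnings;

# Hypothetical sample input with exactly two quoted strings.
my $input = '"foo bar" "baz qux"';

# Capture the contents of the two quoted strings. Each (.*) is greedy,
# so this relies on the input really containing only two strings.
my ( $first, $second ) = $input =~ /"(.*)"\s+"(.*)"/;

print "first:  $first\n";
print "second: $second\n";
```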
Alternatively, you can make it a bit safer by replacing . with a "non-quote" character class, i.e. [^"]:
pw9qyyiw4#
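Here is a not-so-small implementation of the split_line function (listed at the top of this page), which handles both quotes and escaped spaces.
A usage example follows the listing.
Running that example prints:
a#b c c d #titi#toto#e f#g h#ijk l#mn #l #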
This implementation is not perfect. Here are some remaining issues:
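One limitation that can be read off the listing, offered as an observation and not necessarily one of the author's own points: split_line tokenizes with split /\s/, which splits on every single whitespace character, so a run of spaces produces empty tokens. A minimal sketch comparing it with Perl's special split ' ' form, on a made-up input:

```perl
use strict;
use warnings;

# Hypothetical input with two spaces between "a" and "b".
my $input = "a  b";

# /\s/ matches one whitespace character at a time, so the second space
# yields an empty field between "a" and "b".
my @per_char = split /\s/, $input;

# The special pattern ' ' splits on runs of whitespace and skips leading
# whitespace, so no empty field appears.
my @per_run = split ' ', $input;

print scalar(@per_char), " fields with /\\s/\n";
print scalar(@per_run),  " fields with ' '\n";
```

Since split_line pushes an unquoted token as-is, such an empty field would end up as an empty element in its result.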
For the record, I implemented this in my project. The code on GitHub may evolve in the future to fix these issues.