Perl Curses and Unicode: why does addstr print fine while addstring prints garbage?

js81xvg6  posted on 2023-01-09  in  Perl

addstr, code and output:

use Curses;
initscr;
addstr 0, 0, 'Ж 会 र';
addstr 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;
Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd

addstring, code and output:
一个二个一个一个
Why is this behavior observed?
Given that addstr is legacy and addstring is meant to support Unicode, shouldn't it be the other way around?
https://metacpan.org/pod/Curses#Wide-Character-Aware-Functions
https://metacpan.org/pod/Curses#Available-Wide-Character-Aware-Functions

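Update:

A more extensive example, using Unicode strings that are:

  • hardcoded,
  • taken from a variable,
  • passed as a CLI argument,
  • read from a file via backticks,
  • read from a file via open.

We need a file containing a Unicode string: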

echo -n 'Ж 会 र' > unicode.string.txt

Case 1: addstr, no additional declarations:

use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstr 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstr 1, 0, 'variable : ' . $unicode_string_variable;
addstr 2, 0, 'argv     : ' . $unicode_string_argv;
addstr 3, 0, 'backticks: ' . $unicode_string_backticks;
addstr 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstr 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

Run:

perl curses-unicode.addstr.pl 'Ж 会 र'

Curses output, all Unicode works:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd

STDOUT output, all Unicode works:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र

File output, all Unicode works:

cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र

Case 2: addstring, no additional declarations:

use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv     : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

Run:

perl curses-unicode.addstring.pl 'Ж 会 र'

Curses output, all Unicode broken:

hardcoded: Ð~V ä¼~Z र
variable : Ð~V ä¼~Z र
argv     : Ð~V ä¼~Z र
backticks: Ð~V ä¼~Z र
open_pipe: Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd

STDOUT output, all Unicode works:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र

File output, all Unicode works:

cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र

Case 3: addstring, with the additional declarations use utf8, -CA and :encoding(UTF-8):

use utf8;
use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv     : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

Run:

perl -CA curses-unicode.addstring.utf8,CA,encodingUTF8.pl 'Ж 会 र'

Curses output, Unicode partly works and is partly broken:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ð~V ä¼~Z र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd

STDOUT and STDERR output, all Unicode works:

Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 12, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 15, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 18, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 24, <$open_pipe_read_handle> line 1.
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 28, <$open_pipe_read_handle> line 1.
hardcoded: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 29, <$open_pipe_read_handle> line 1.
variable : Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 30, <$open_pipe_read_handle> line 1.
argv     : Ж 会 र
backticks: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 32, <$open_pipe_read_handle> line 1.
open_pipe: Ж 会 र

File output, all Unicode works:

cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
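  • Why does Unicode just work for STDOUT and for writing to files in all three cases, while it does not with Curses? What is special about Curses? Given that STDOUT and files are fine, isn't this some kind of bug in Curses?
  • Is there a single place to enable Unicode, or does it have to be specified separately for each kind of input, and why?:
      • use utf8 for Unicode in source code;
      • -CA for CLI arguments;
      • :encoding(UTF-8) for open
  • How can Unicode be fixed for backticks?
  • What are the "Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line ..., <$open_pipe_read_handle> line 1." messages on STDERR, and how can I get rid of them?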

uqxowvwt  #1

You need the use utf8; pragma:

use utf8;
use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;

Output:

Ж 会 र
Curses 1.43, perl v5.34.0, OS: linux

Why the addstr version works is probably a matter of luck (on my system only the third character is displayed correctly).
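One plausible reading of the difference, assuming addstring treats its argument as a string of characters (as the Curses documentation describes for the wide-character-aware functions): without use utf8 the literal is a string of UTF-8 bytes, and each byte is then taken as a separate character. The sketch below uses only the core Encode module and spells the bytes out explicitly; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of 'Ж 会 र', written as escapes so the example does
# not depend on how this file itself is encoded.
my $octets = "\xD0\x96 \xE4\xBC\x9A \xE0\xA4\xB0";

# Decoding turns the 10 bytes into 5 characters.
my $chars = decode('UTF-8', $octets);

# What a character-oriented function would see in each case.
my @seen_undecoded = map { sprintf 'U+%04X', ord } split //, $octets;
my @seen_decoded   = map { sprintf 'U+%04X', ord } split //, $chars;

print "undecoded: @seen_undecoded\n";
print "decoded:   @seen_decoded\n";
```

The undecoded string starts with U+00D0 (Ð) and later contains U+00E4 (ä) and U+00BC (¼), which is consistent with the Ð and ä¼ in the broken Case 2 output.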
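If you want to treat the command-line arguments in @ARGV as UTF-8, you need a different approach. One way is to invoke Perl explicitly with the -C flag set to A (numeric value 32), the setting that tells Perl to expect UTF-8 in @ARGV, or to set the PERL_UNICODE environment variable to A in your terminal.
Alternatively, you can decode @ARGV from within the code: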

use Encode qw(decode_utf8);
@ARGV = map { decode_utf8($_, 1) } @ARGV;

In this case you do not need the command-line flag.
This alternative also works for backtick substitution:

use Encode qw(decode_utf8);
my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`, 1);

Source: https://www.perl.com/pub/2012/04/perlunicookbook-decode-argv-as-utf8.html/
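The second argument of 1 in the decode_utf8 calls above asks Encode to die on malformed input. A minimal sketch of wrapping that in a helper that names the offending argument follows; decode_arg is a hypothetical name, not part of Encode, and the sketch assumes the script is run without -CA so that @ARGV still holds raw bytes.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: decode one argument strictly and report which
# argument failed if its bytes are not valid UTF-8.
sub decode_arg {
    my ($index, $octets) = @_;    # work on a copy; decode may modify its input
    my $text = eval { decode('UTF-8', $octets, FB_CROAK) };
    die "argument $index is not valid UTF-8\n" unless defined $text;
    return $text;
}

my @decoded = map { decode_arg($_, $ARGV[$_]) } 0 .. $#ARGV;
print scalar(@decoded), " argument(s) decoded\n";
```

The same helper could be applied to the output of backticks, since that is also a plain byte string until it is decoded.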
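However, there is a simpler solution that sets up UTF-8 in one go for hardcoded strings, argv, filehandles, printf, backticks and so on: the utf8::all module. With this module you need neither command-line flags nor the Encode module. Because it also targets STDOUT, the wide-character warnings are resolved as well.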

use utf8::all;
use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
#my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`,1);
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv     : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

Source: https://blog.ostermiller.org/perl-wide-character-in-print/
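The "Wide character in print" warnings from Case 3 appear when decoded text is printed to a handle that has no encoding layer. The sketch below reproduces this in isolation with core modules only. It assumes no -C flag or PERL_UNICODE setting is in effect; warnings_for_mode is a hypothetical helper name and the temporary file is used purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "\x{416} \x{4F1A} \x{930}";    # 'Ж 会 र' as decoded characters

# Print $text through the given open() mode and return any warnings raised.
sub warnings_for_mode {
    my ($mode) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, $mode, $path or die "cannot open $path: $!";
    print {$fh} $text;
    close $fh or die "cannot close $path: $!";
    return @warnings;
}

my @raw     = warnings_for_mode('>');                    # no encoding layer
my @layered = warnings_for_mode('>:encoding(UTF-8)');    # explicit layer

print "no layer:        ", scalar(@raw),     " warning(s)\n";
print ":encoding layer: ", scalar(@layered), " warning(s)\n";
```

For STDOUT the corresponding step would be binmode(STDOUT, ':encoding(UTF-8)') before the first print.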
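If for some reason you cannot or do not want to install utf8::all, then use utf8; combined with the command-line flags -CSDA also solves everything. Note that with these flags you should not call decode_utf8() in your code, since the data is already decoded by then.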
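If the settings should live in the script rather than on the command line, a rough in-code counterpart can be sketched with the open pragma and one explicit decode of @ARGV. This is an approximation under the assumption that no -C flags are passed as well (otherwise @ARGV would be decoded twice); round_trip is a hypothetical helper used only to show that an open() without an explicit layer picks up the default.

```perl
use strict;
use warnings;
use utf8;                              # source literals are UTF-8
use open qw(:std :encoding(UTF-8));    # default layers, plus STDIN/STDOUT/STDERR
use Encode qw(decode);
use File::Temp qw(tempfile);

# Rough counterpart of -CA: decode the arguments once, at startup.
@ARGV = map { decode('UTF-8', $_) } @ARGV;

# Write a string to a temporary file and read it back. Neither open()
# names a layer, so both use the default set by the open pragma.
sub round_trip {
    my ($text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} $text;
    close $out or die "cannot close $path: $!";
    open my $in, '<', $path or die "cannot read $path: $!";
    my $read = <$in>;
    close $in;
    return $read;
}

my $copy = round_trip('Ж 会 र');
printf "%d characters read back\n", length $copy;
```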
