How to properly convert between a UTF-8 buffer and a common Delphi string?

Posted by dtcbnfnu on 2022-11-23 in Other


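I prepared some code snippets based on Delphi's recommendations. They compile without warnings or implicit conversions, but the result does not satisfy me.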

procedure Convert;
type
  TUTF8Buf = array [0 .. 5] of byte;
var
  s: string;
  sutf8: UTF8String; // managed UTF-8 string
  utf8str: TUTF8Buf; // unmanaged buffer
begin
  utf8str := Default (TUTF8Buf); // utf8str = (0,0,0,0,0,0)
  s := UTF8ArrayToString(utf8str); // s = #0#0#0#0#0#0
  s := 'abc'; // s = 'abc'
  sutf8 := UTF8Encode(s); // sutf8 = 'abc'
  Move(sutf8[1], utf8str[0], Min(Length(sutf8), sizeof(utf8str) - 1)); // utf8str = (97, 98, 99, 0, 0, 0)
  s := UTF8ArrayToString(utf8str); // s = 'abc'#0#0#0
  s := UTF8ToString(sutf8); // s = 'abc'
end;

This code works perfectly well with the managed UTF-8 string, but with the unmanaged buffer it always produces trailing zeros.

delphi

Source: https://stackoverflow.com/questions/74496836/how-to-properly-convert-between-a-utf-8-buffer-and-a-common-delphi-string

2 Answers


fnvucqvd1#

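The UTF8ArrayToString() function converts the entire array as a whole; it does not stop when it encounters a $0 byte. You should use a different routine that lets you specify how many bytes of the array to convert, such as Utf8ToUnicode(), UnicodeFromLocaleChars(), or TEncoding.UTF8.GetChars().
That said, the easiest way to work with UTF-8 is to use UTF8String itself. The RTL knows how to convert between UnicodeString and UTF8String implicitly, so let it do the work for you. You do not need UTF8Encode() and UTF8Decode(), since they have been deprecated since Delphi 2009.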
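The two points above can be combined in a minimal sketch: keep track of how many bytes were written into the buffer, then decode only that many. The helper name Utf8BytesToString and the program name are hypothetical, and the example uses SetString plus UTF8ToString rather than the routines named above, assuming a Unicode Delphi version with namespaced units (XE2 or later).

```pascal
program Utf8CountDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math; // System.Math provides Min

type
  TUTF8Buf = array [0 .. 5] of Byte;

// Hypothetical helper (not part of the RTL): decode only the first
// Count bytes of a raw UTF-8 buffer instead of the whole array.
function Utf8BytesToString(const Buf; Count: Integer): string;
var
  U8: UTF8String;
begin
  SetString(U8, PAnsiChar(@Buf), Count); // copies exactly Count bytes
  Result := UTF8ToString(U8);            // UTF-8 -> UTF-16
end;

var
  S: string;
  U8: UTF8String;
  Buf: TUTF8Buf;
  Used: Integer;
begin
  S := 'abc';
  U8 := UTF8String(S);                      // the RTL converts UTF-16 -> UTF-8
  Buf := Default(TUTF8Buf);                 // zero the buffer
  Used := Min(Length(U8), SizeOf(Buf) - 1); // remember the byte count
  if Used > 0 then
    Move(U8[1], Buf[0], Used);
  S := Utf8BytesToString(Buf, Used);        // no trailing #0 characters
  Writeln(Length(S));                       // prints 3
end.
```

Note that truncating to the buffer size can cut a multi-byte sequence in half when the text is not pure ASCII, so a real implementation would need to back up to a character boundary.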

Answered 2022-11-23

okxuctiv2#

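Characters in a UTF8String are variable length, but the bytes #$00-#$7F never occur inside a multi-byte character. So, just as with an AnsiString, you can determine the length by scanning for the terminating #0. StrLen (moved to the System.AnsiStrings unit in recent versions) does this for you.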

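// Requires System.AnsiStrings for the PAnsiChar version of StrLen in recent Delphi versions.
function ToUTF8String(const Buf): UTF8String;
var
  len: Integer;
begin
  // Buf must contain a #0 terminator; StrLen scans until it finds one.
  len := StrLen(PAnsiChar(@Buf));
  SetLength(Result, len);
  if len > 0 then
    Move(Buf, Result[1], len); // copy exactly len bytes, excluding the terminator
end;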

Note: this should work, but I have not tested it.
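Scanning for a terminator assumes the buffer actually contains a #0 byte. As a hedged sketch of the same idea, the variant below also takes the buffer size so the scan cannot run past the end of a completely filled buffer. The name ToUTF8StringBounded and the sample bytes are assumptions made for illustration, not part of the answer.

```pascal
program Utf8BoundedDemo;
{$APPTYPE CONSOLE}

// Hypothetical variant: stop at the first #0 byte or after MaxLen bytes,
// whichever comes first.
function ToUTF8StringBounded(const Buf; MaxLen: Integer): UTF8String;
var
  P: PAnsiChar;
  Len: Integer;
begin
  P := PAnsiChar(@Buf);
  Len := 0;
  while (Len < MaxLen) and (P[Len] <> #0) do
    Inc(Len);
  SetString(Result, P, Len); // copies Len bytes; an empty result when Len = 0
end;

const
  // Sample data: U+00E9 encoded as two UTF-8 bytes, then 'a', then zero padding.
  Raw: array [0 .. 5] of Byte = ($C3, $A9, $61, 0, 0, 0);

var
  S: string;
begin
  S := UTF8ToString(ToUTF8StringBounded(Raw, SizeOf(Raw)));
  Writeln(Length(S)); // prints 2: two characters, no trailing #0
end.
```

Passing SizeOf of the buffer as MaxLen keeps the scan inside the array even when every byte is in use.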

Answered 2022-11-23
