本文整理了Java中com.google.common.base.Utf8
类的一些代码示例,展示了Utf8
类的具体用法。这些代码示例主要来源于Github
/Stackoverflow
/Maven
等平台,是从一些精选项目中提取出来的代码,具有较强的参考意义,能在一定程度帮忙到你。Utf8
类的具体详情如下:
包路径:com.google.common.base.Utf8
类名称:Utf8
[英]Low-level, high-performance utility methods related to the Charsets#UTF_8character encoding. UTF-8 is defined in section D92 of The Unicode Standard Core Specification, Chapter 3.
The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences, even though the JDK decoder may accept them.
[中]与字符集#UTF_8字符编码相关的低级别、高性能实用方法。UTF-8在The Unicode Standard Core Specification, Chapter 3的D92节中定义。
这个类实现的UTF-8的变体是Unicode 3.1中引入的UTF-8的受限定义。这意味着它拒绝"non-shortest form"字节序列,即使JDK解码器可能会接受它们。
代码示例来源:origin: google/guava
/**
* Returns {@code true} if {@code bytes} is a <i>well-formed</i> UTF-8 byte sequence according to
* Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be
* decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte
* sequences, but encoding never reproduces these. Such byte sequences are <i>not</i> considered
* well-formed.
*
* <p>This method returns {@code true} if and only if {@code Arrays.equals(bytes, new
* String(bytes, UTF_8).getBytes(UTF_8))} does, but is more efficient in both time and space.
*/
public static boolean isWellFormed(byte[] bytes) {
return isWellFormed(bytes, 0, bytes.length);
}
代码示例来源:origin: google/error-prone
private boolean isValidTag(String tag) {
return Utf8.encodedLength(tag) <= 23;
}
代码示例来源:origin: google/guava
utf8Length += ((0x7f - c) >>> 31); // branch free!
} else {
utf8Length += encodedLengthGeneral(sequence, i);
break;
代码示例来源:origin: google/guava
/**
* Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by
* {@link #isWellFormed(byte[])}. Note that this can be false even when {@code
* isWellFormed(bytes)} is true.
*
* @param bytes the input buffer
* @param off the offset in the buffer of the first byte to read
* @param len the number of bytes to read from the buffer
*/
public static boolean isWellFormed(byte[] bytes, int off, int len) {
int end = off + len;
checkPositionIndexes(off, end, bytes.length);
// Look for the first non-ASCII character.
for (int i = off; i < end; i++) {
if (bytes[i] < 0) {
return isWellFormedSlowPath(bytes, i, end);
}
}
return true;
}
代码示例来源:origin: google/guava
private static int encodedLengthGeneral(CharSequence sequence, int start) {
int utf16Length = sequence.length();
int utf8Length = 0;
for (int i = start; i < utf16Length; i++) {
char c = sequence.charAt(i);
if (c < 0x800) {
utf8Length += (0x7f - c) >>> 31; // branch free!
} else {
utf8Length += 2;
// jdk7+: if (Character.isSurrogate(c)) {
if (MIN_SURROGATE <= c && c <= MAX_SURROGATE) {
// Check that we have a well-formed surrogate pair.
if (Character.codePointAt(sequence, i) == c) {
throw new IllegalArgumentException(unpairedSurrogateMsg(i));
}
i++;
}
}
}
return utf8Length;
}
代码示例来源:origin: google/j2objc
/**
* Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by
* {@link #isWellFormed(byte[])}. Note that this can be false even when {@code
* isWellFormed(bytes)} is true.
*
* @param bytes the input buffer
* @param off the offset in the buffer of the first byte to read
* @param len the number of bytes to read from the buffer
*/
public static boolean isWellFormed(byte[] bytes, int off, int len) {
int end = off + len;
checkPositionIndexes(off, end, bytes.length);
// Look for the first non-ASCII character.
for (int i = off; i < end; i++) {
if (bytes[i] < 0) {
return isWellFormedSlowPath(bytes, i, end);
}
}
return true;
}
代码示例来源:origin: google/j2objc
private static int encodedLengthGeneral(CharSequence sequence, int start) {
int utf16Length = sequence.length();
int utf8Length = 0;
for (int i = start; i < utf16Length; i++) {
char c = sequence.charAt(i);
if (c < 0x800) {
utf8Length += (0x7f - c) >>> 31; // branch free!
} else {
utf8Length += 2;
// jdk7+: if (Character.isSurrogate(c)) {
if (MIN_SURROGATE <= c && c <= MAX_SURROGATE) {
// Check that we have a well-formed surrogate pair.
if (Character.codePointAt(sequence, i) == c) {
throw new IllegalArgumentException(unpairedSurrogateMsg(i));
}
i++;
}
}
}
return utf8Length;
}
代码示例来源:origin: google/j2objc
/**
* Returns {@code true} if {@code bytes} is a <i>well-formed</i> UTF-8 byte sequence according to
* Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be
* decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte
* sequences, but encoding never reproduces these. Such byte sequences are <i>not</i> considered
* well-formed.
*
* <p>This method returns {@code true} if and only if {@code Arrays.equals(bytes, new
* String(bytes, UTF_8).getBytes(UTF_8))} does, but is more efficient in both time and space.
*/
public static boolean isWellFormed(byte[] bytes) {
return isWellFormed(bytes, 0, bytes.length);
}
代码示例来源:origin: springside/springside4
/**
* 计算字符串被UTF8编码后的字节数 via guava
*
* @see Utf8#encodedLength(CharSequence)
*/
public static int utf8EncodedLength(@Nullable CharSequence sequence) {
if (StringUtils.isEmpty(sequence)) {
return 0;
}
return Utf8.encodedLength(sequence);
}
}
代码示例来源:origin: wildfly/wildfly
/**
* Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by
* {@link #isWellFormed(byte[])}. Note that this can be false even when {@code
* isWellFormed(bytes)} is true.
*
* @param bytes the input buffer
* @param off the offset in the buffer of the first byte to read
* @param len the number of bytes to read from the buffer
*/
public static boolean isWellFormed(byte[] bytes, int off, int len) {
int end = off + len;
checkPositionIndexes(off, end, bytes.length);
// Look for the first non-ASCII character.
for (int i = off; i < end; i++) {
if (bytes[i] < 0) {
return isWellFormedSlowPath(bytes, i, end);
}
}
return true;
}
代码示例来源:origin: google/j2objc
utf8Length += ((0x7f - c) >>> 31); // branch free!
} else {
utf8Length += encodedLengthGeneral(sequence, i);
break;
代码示例来源:origin: wildfly/wildfly
private static int encodedLengthGeneral(CharSequence sequence, int start) {
int utf16Length = sequence.length();
int utf8Length = 0;
for (int i = start; i < utf16Length; i++) {
char c = sequence.charAt(i);
if (c < 0x800) {
utf8Length += (0x7f - c) >>> 31; // branch free!
} else {
utf8Length += 2;
// jdk7+: if (Character.isSurrogate(c)) {
if (MIN_SURROGATE <= c && c <= MAX_SURROGATE) {
// Check that we have a well-formed surrogate pair.
if (Character.codePointAt(sequence, i) == c) {
throw new IllegalArgumentException(unpairedSurrogateMsg(i));
}
i++;
}
}
}
return utf8Length;
}
代码示例来源:origin: wildfly/wildfly
/**
* Returns {@code true} if {@code bytes} is a <i>well-formed</i> UTF-8 byte sequence according to
* Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be
* decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte
* sequences, but encoding never reproduces these. Such byte sequences are <i>not</i> considered
* well-formed.
*
* <p>This method returns {@code true} if and only if {@code Arrays.equals(bytes, new
* String(bytes, UTF_8).getBytes(UTF_8))} does, but is more efficient in both time and space.
*/
public static boolean isWellFormed(byte[] bytes) {
return isWellFormed(bytes, 0, bytes.length);
}
代码示例来源:origin: google/guava
public void testEncodedLength_validStrings() {
assertEquals(0, Utf8.encodedLength(""));
assertEquals(11, Utf8.encodedLength("Hello world"));
assertEquals(8, Utf8.encodedLength("Résumé"));
assertEquals(
461,
Utf8.encodedLength(
"威廉·莎士比亞(William Shakespeare,"
+ "1564年4月26號—1616年4月23號[1])係隻英國嗰演員、劇作家同詩人,"
+ "有時間佢簡稱莎翁;中國清末民初哈拕翻譯做舌克斯毕、沙斯皮耳、筛斯比耳、"
+ "莎基斯庇尔、索士比尔、夏克思芘尔、希哀苦皮阿、叶斯壁、沙克皮尔、"
+ "狹斯丕爾。[2]莎士比亞編寫過好多作品,佢嗰劇作響西洋文學好有影響,"
+ "哈都拕人翻譯做好多話。"));
// A surrogate pair
assertEquals(4, Utf8.encodedLength(newString(MIN_HIGH_SURROGATE, MIN_LOW_SURROGATE)));
}
代码示例来源:origin: org.jboss.eap/wildfly-client-all
/**
* Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by
* {@link #isWellFormed(byte[])}. Note that this can be false even when {@code
* isWellFormed(bytes)} is true.
*
* @param bytes the input buffer
* @param off the offset in the buffer of the first byte to read
* @param len the number of bytes to read from the buffer
*/
public static boolean isWellFormed(byte[] bytes, int off, int len) {
int end = off + len;
checkPositionIndexes(off, end, bytes.length);
// Look for the first non-ASCII character.
for (int i = off; i < end; i++) {
if (bytes[i] < 0) {
return isWellFormedSlowPath(bytes, i, end);
}
}
return true;
}
代码示例来源:origin: wildfly/wildfly
utf8Length += ((0x7f - c) >>> 31); // branch free!
} else {
utf8Length += encodedLengthGeneral(sequence, i);
break;
代码示例来源:origin: com.diffplug.guava/guava-core
private static int encodedLengthGeneral(CharSequence sequence, int start) {
int utf16Length = sequence.length();
int utf8Length = 0;
for (int i = start; i < utf16Length; i++) {
char c = sequence.charAt(i);
if (c < 0x800) {
utf8Length += (0x7f - c) >>> 31; // branch free!
} else {
utf8Length += 2;
// jdk7+: if (Character.isSurrogate(c)) {
if (MIN_SURROGATE <= c && c <= MAX_SURROGATE) {
// Check that we have a well-formed surrogate pair.
if (Character.codePointAt(sequence, i) == c) {
throw new IllegalArgumentException(unpairedSurrogateMsg(i));
}
i++;
}
}
}
return utf8Length;
}
代码示例来源:origin: google/guava
private static void assertWellFormed(int... bytes) {
assertTrue(Utf8.isWellFormed(toByteArray(bytes)));
}
代码示例来源:origin: google/guava
public void testEncodedLength_validStrings2() {
HashMap<Integer, Integer> utf8Lengths = new HashMap<>();
utf8Lengths.put(0x00, 1);
utf8Lengths.put(0x7f, 1);
utf8Lengths.put(0x80, 2);
utf8Lengths.put(0x7ff, 2);
utf8Lengths.put(0x800, 3);
utf8Lengths.put(MIN_SUPPLEMENTARY_CODE_POINT - 1, 3);
utf8Lengths.put(MIN_SUPPLEMENTARY_CODE_POINT, 4);
utf8Lengths.put(MAX_CODE_POINT, 4);
Integer[] codePoints = utf8Lengths.keySet().toArray(new Integer[] {});
StringBuilder sb = new StringBuilder();
Random rnd = new Random();
for (int trial = 0; trial < 100; trial++) {
sb.setLength(0);
int utf8Length = 0;
for (int i = 0; i < 6; i++) {
Integer randomCodePoint = codePoints[rnd.nextInt(codePoints.length)];
sb.appendCodePoint(randomCodePoint);
utf8Length += utf8Lengths.get(randomCodePoint);
if (utf8Length != Utf8.encodedLength(sb)) {
StringBuilder repro = new StringBuilder();
for (int j = 0; j < sb.length(); j++) {
repro.append(" ").append((int) sb.charAt(j)); // GWT compatible
}
assertEquals(repro.toString(), utf8Length, Utf8.encodedLength(sb));
}
}
}
}
代码示例来源:origin: org.kill-bill.billing/killbill-platform-osgi-bundles-logger
/**
* Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by
* {@link #isWellFormed(byte[])}. Note that this can be false even when {@code
* isWellFormed(bytes)} is true.
*
* @param bytes the input buffer
* @param off the offset in the buffer of the first byte to read
* @param len the number of bytes to read from the buffer
*/
public static boolean isWellFormed(byte[] bytes, int off, int len) {
int end = off + len;
checkPositionIndexes(off, end, bytes.length);
// Look for the first non-ASCII character.
for (int i = off; i < end; i++) {
if (bytes[i] < 0) {
return isWellFormedSlowPath(bytes, i, end);
}
}
return true;
}
内容来源于网络,如有侵权,请联系作者删除!