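I have a dataset that is mostly a 2D table, but one column (Field), called Attributes, contains a List of Structs in each cell. Each Struct has three Fields: attribute tag, attribute type, and attribute value.
The Attributes Field is defined as: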
/**
 * Attribute Tag - Two character tag.
 */
public static final Field ATTRIBUTE_TAG_FIELD =
    new Field("AttributeTag", FieldType.notNullable(new ArrowType.FixedSizeBinary(2)), null);
/**
 * Attribute Type - One character type.
 */
// todo this could be dictionary encoded but would require building a
// dictionary which requires access to the allocator
public static final Field ATTRIBUTE_TYPE_FIELD =
    new Field(
        "AttributeType",
        new FieldType(false, new ArrowType.FixedSizeBinary(1), null),
        null
    );
/**
 * String representation of the Attribute value.
 */
public static final Field ATTRIBUTE_VALUE_FIELD =
    new Field("AttributeValue", FieldType.notNullable(new ArrowType.Utf8()), null);
/**
 * The field is a nullable List of Structs each with an attribute tag,
 * type and value.
 */
public static final Field ATTRIBUTES_FIELD =
    new Field("Attributes", FieldType.nullable(new ArrowType.List()), List.of(
        new Field("Attribute", FieldType.nullable(new ArrowType.Struct()), List.of(
            ATTRIBUTE_TAG_FIELD, ATTRIBUTE_TYPE_FIELD, ATTRIBUTE_VALUE_FIELD))));
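I have the following code, which tries to populate the attributes from some source data. It runs without errors, but it does not produce any values in the attributes vector.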
final ListVector attributes = (ListVector) ATTRIBUTES_FIELD.createVector(allocator);
// this is the source of the attributes that I will populate into the attributes vector
final List<SAMRecord.SAMTagAndValue> recordAttributes = samRecord.getAttributes();
if (recordAttributes != null && recordAttributes.size() > 0) {
    final UnionListWriter listWriter = attributes.getWriter();
    listWriter.allocate();
    IntStream.range(0, recordAttributes.size()).forEachOrdered(attributeIndex -> {
        listWriter.setPosition(attributeIndex);
        listWriter.startList();
        // put the values of the attribute in the arrow struct
        final SAMRecord.SAMTagAndValue samTagAndValue = recordAttributes.get(attributeIndex);
        // I think the problem is here. In a debugger this seems to create a new writer not related to my Vector??
        final BaseWriter.StructWriter structWriter = listWriter.struct("Attribute");
        structWriter.start();
        final byte[] tagBytes = samTagAndValue.tag.getBytes(StandardCharsets.UTF_8);
        // todo find out the type from the value
        final byte[] typeBytes = "S".getBytes(StandardCharsets.UTF_8);
        final byte[] valueBytes = samTagAndValue.value.toString().getBytes(StandardCharsets.UTF_8);
        ArrowBuf tempBuf = allocator.buffer(tagBytes.length);
        tempBuf.setBytes(0, tagBytes);
        structWriter.varChar("AttributeTag").writeVarChar(0, tagBytes.length, tempBuf);
        tempBuf.close();
        tempBuf = allocator.buffer(typeBytes.length);
        structWriter.varChar("AttributeType").writeVarChar(0, typeBytes.length, tempBuf);
        tempBuf.close();
        tempBuf = allocator.buffer(valueBytes.length);
        structWriter.varChar("AttributeValue").writeVarChar(0, valueBytes.length, tempBuf);
        tempBuf.close();
        structWriter.end();
    });
    listWriter.setValueCount(recordAttributes.size());
    listWriter.end();
}
Why are there no values in the attributes ListVector, and what is the correct way to do this?
1 Answer
rxztt3cl #1
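It looks like the problem may be with how the list writer is used. When you call listWriter.struct("Attribute"), it creates a new struct writer instance that is not associated with the vector's struct field. Instead, you should use listWriter.struct() to get the struct writer instance that is associated with the vector's struct field. Here is how you can modify the code to fix this: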
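A minimal sketch of that change, not the answerer's original listing. It assumes Apache Arrow Java is on the classpath. It builds the vector with ListVector.empty and lets the writer create three VarChar children, which is a simplification of the FixedSizeBinary schema in the question. It writes all attributes of one record into a single list at position 0. The class name AttributesWriterSketch and the helpers writeString and writeAndCount are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.writer.BaseWriter;

public class AttributesWriterSketch {

    /** Copies the UTF-8 bytes of value into a temporary ArrowBuf and writes them to a VarChar child. */
    static void writeString(BufferAllocator allocator, BaseWriter.StructWriter struct,
                            String fieldName, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        try (ArrowBuf buf = allocator.buffer(bytes.length)) {
            // The buffer must hold the bytes before it is handed to the writer.
            buf.setBytes(0, bytes);
            struct.varChar(fieldName).writeVarChar(0, bytes.length, buf);
        }
    }

    /**
     * Writes one row whose list holds one struct per {tag, type, value} triple,
     * then returns the number of structs read back from that row.
     */
    public static int writeAndCount(List<String[]> attributes) {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             ListVector vector = ListVector.empty("Attributes", allocator)) {
            vector.allocateNew();
            UnionListWriter listWriter = vector.getWriter();
            // struct() with no name returns the writer for the list's element struct.
            BaseWriter.StructWriter struct = listWriter.struct();

            listWriter.setPosition(0);
            listWriter.startList();            // one list for the whole record
            for (String[] attribute : attributes) {
                struct.start();
                writeString(allocator, struct, "AttributeTag", attribute[0]);
                writeString(allocator, struct, "AttributeType", attribute[1]);
                writeString(allocator, struct, "AttributeValue", attribute[2]);
                struct.end();
            }
            listWriter.endList();              // closes the list opened by startList()
            vector.setValueCount(1);           // one row was written

            Object row = vector.getObject(0);
            return ((List<?>) row).size();
        }
    }

    public static void main(String[] args) {
        int count = writeAndCount(List.of(
            new String[] {"NM", "S", "0"},
            new String[] {"RG", "S", "group1"}));
        System.out.println("structs in row 0: " + count);
    }
}
```

Note that the sketch calls startList() once per record and closes it with endList(), so each attribute becomes one struct inside the same list. The loop in the question instead moves to a new position and opens a new list for every attribute, which may also contribute to the unexpected result.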
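In this modified code, listWriter.struct() is used to get the struct writer instance that is associated with the vector's struct field. The rest of the code is similar to the original, with a few additional changes to make sure each ArrowBuf instance is initialized with the correct byte array.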