How do I write a numpy array to an avro file?

8oomwypt posted 12 months ago in Other

I want to write a numpy array to an avro file. Here is a small example of such an array:

import numpy as np
import random
np_array = np.zeros((4,3), dtype=np.float32)
for i in range(4):
    for j in range(3):
        np_array[i, j] = random.gauss(0, 1)
print(np_array)

Output:

[[ 0.6490377   0.29544145 -1.109375  ]
 [ 1.0881975  -0.39123887 -0.36691198]
 [-1.2226632   0.8332004   0.2686829 ]
 [ 1.5417658   0.4520132  -0.03081623]]


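In my use case the numpy array has 5 million rows and 128 columns, so if possible I would like to write the array to avro directly, without spending memory on converting it to a dictionary and/or a Pandas DataFrame.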

gmol1639


I answered my own question! This solution writes a 2D numpy array to avro without any conversion.

import numpy as np
import random
np_array = np.zeros((4,3), dtype=np.float32)
for i in range(4):
    for j in range(3):
        np_array[i, j] = random.gauss(0, 1)
print(np_array)

Output:

[[ 0.6490377   0.29544145 -1.109375  ]
 [ 1.0881975  -0.39123887 -0.36691198]
 [-1.2226632   0.8332004   0.2686829 ]
 [ 1.5417658   0.4520132  -0.03081623]]
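Then write the array with fastavro and read it back:

import fastavro

# Top-level array schema: each avro record is one row of floats.
schema_dict = {
    "doc": "test",
    "name": "test",
    "namespace": "test",
    "type": "array",
    "items": "float"
}
schema = fastavro.parse_schema(schema_dict)

# Replace <filepath> with the path to your avro file.
# Iterating over the 2D array yields one row per record.
with open(<filepath>, "wb") as f:
    fastavro.writer(f, schema, np_array)

# Read the records back, one row at a time.
with open(<filepath>, "rb") as f:
    reader = fastavro.reader(f)
    for record in reader:
        print(record)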

Output:

[ 0.6490377   0.29544145 -1.109375  ]
[ 1.0881975  -0.39123887 -0.36691198]
[-1.2226632   0.8332004   0.2686829 ]
[ 1.5417658   0.4520132  -0.03081623]
