Python: inserting a numpy array into an sqlite3 database

lb3vh1jj posted on 2023-03-08 in Python

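I'm trying to store a numpy array of about 1000 floats in an sqlite3 database, but I keep getting the error "InterfaceError: Error binding parameter 1 - probably unsupported type".
I was under the impression that a BLOB column could hold anything, but it definitely doesn't work with a numpy array.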

import sqlite3 as sql
import numpy as np
con = sql.connect('test.bd',isolation_level=None)
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None,np.arange(0,500,0.5)))
con.commit()

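Is there another module I can use to get the numpy array into the table? Or can I convert the numpy array into a Python form that sqlite will accept (such as a list, or a string that I can split)? Performance isn't a priority; I just want it to work!
Thanks!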

Answer #1 (7rfyedvj)

You can register a new `array` data type with sqlite3:

import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(blob):
    out = io.BytesIO(blob)
    out.seek(0)
    return np.load(out)

# Converts np.ndarray to a BLOB (bytes in .npy format) when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts the BLOB back to np.ndarray when selecting columns declared as "array"
sqlite3.register_converter("array", convert_array)

x = np.arange(12).reshape(2,6)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (arr array)")

With this setup, you can insert NumPy arrays directly, with no change to the query syntax:

cur.execute("insert into test (arr) values (?)", (x, ))

and retrieve them from sqlite directly as NumPy arrays:

cur.execute("select arr from test")
data = cur.fetchone()[0]

print(data)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]
print(type(data))
# <class 'numpy.ndarray'>
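A quick way to check that this adapter/converter pair preserves more than the values: the `.npy` format written by `np.save` records the dtype and shape in its header, so arrays of different types and ranks can share the same column. A minimal sketch (in-memory database; the sample arrays are arbitrary choices):

```python
import io
import sqlite3

import numpy as np


def adapt_array(arr):
    # Serialize with the .npy format, which stores dtype and shape in a header.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())


def convert_array(blob):
    # Parse the .npy bytes back into an ndarray.
    return np.load(io.BytesIO(blob))


sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")

# Arrays with different dtypes and dimensionality go into the same column.
samples = [
    np.arange(12).reshape(2, 6),
    np.linspace(0.0, 1.0, 24, dtype=np.float32).reshape(2, 3, 4),
    np.array([True, False, True]),
]
for a in samples:
    con.execute("insert into test (arr) values (?)", (a,))

restored = [row[0] for row in con.execute("select arr from test order by rowid")]
for original, back in zip(samples, restored):
    print(back.dtype, back.shape, np.array_equal(original, back))
```

Object-dtype arrays are the exception: `np.save` pickles them, and since NumPy 1.16.3 `np.load` refuses pickled data unless `allow_pickle=True` is passed.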

Answer #2 (j9per5c4)

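I think the Matlab format is a really convenient way to store and retrieve numpy arrays. It is really fast, and the disk and memory footprints are about the same.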

(Image from the mverleg benchmarks.)
But if for some reason you need to store numpy arrays in SQLite, I suggest adding some compression.
The extra lines on top of unutbu's code are quite simple:

import codecs
import io
import sqlite3

import numpy as np

compressor = 'zlib'  # zlib, bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib uses about the same disk size as Matlab v5 .mat files
    # bz2 compresses 4 times more than zlib, but storing is 20 times slower.
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    # bytes-to-bytes codecs go through codecs.encode/decode (bytes has no .encode in Python 3)
    return sqlite3.Binary(codecs.encode(out.read(), compressor))  # zlib, bz2

def convert_array(blob):
    out = io.BytesIO(codecs.decode(blob, compressor))
    return np.load(out)

Results of testing with the MNIST database:

$ ./test_MNIST.py
[69900]:  99% remain: 0 secs   
Storing 70000 images in 379.9 secs
Retrieve 6990 images in 9.5 secs
$ ls -lh example.db 
-rw-r--r-- 1 agp agp 69M sep 22 07:27 example.db
$ ls -lh mnist-original.mat 
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

with `zlib`, and

$ ./test_MNIST.py
[69900]: 99% remain: 12 secs
Storing 70000 images in 8536.2 secs
Retrieve 6990 images in 37.4 secs
$ ls -lh example.db
-rw-r--r-- 1 agp agp 19M sep 22 03:33 example.db
$ ls -lh mnist-original.mat
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat


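with `bz2`.
Comparing `Matlab V5` with `bz2` on SQLite, bz2 achieves a compression ratio of about 2.8, but its access time is long compared with the Matlab format (almost instantaneous versus more than 30 seconds). It is probably only worthwhile for really huge databases, where the learning process takes much longer than the access time, or where the database footprint has to be as small as possible.
Finally, note that the `bz2/zlib` compression ratio is about 3.7, and that `zlib` needs about 30% more space than `matlab`.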
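The ratios quoted above follow from the file sizes in the two `ls -lh` listings. A small check; note that `ls -lh` rounds to whole megabytes, so these figures are approximate (from the rounded sizes the zlib/bz2 ratio comes out nearer 3.6):

```python
# File sizes as reported by `ls -lh` above, in MB (already rounded by ls).
sizes_mb = {"sqlite_zlib": 69, "sqlite_bz2": 19, "matlab_v5": 53}

# How much smaller the bz2 database is than the Matlab v5 file.
matlab_over_bz2 = sizes_mb["matlab_v5"] / sizes_mb["sqlite_bz2"]
# How much smaller the bz2 database is than the zlib database.
zlib_over_bz2 = sizes_mb["sqlite_zlib"] / sizes_mb["sqlite_bz2"]
# Extra space the zlib database needs compared with the .mat file.
zlib_over_matlab = sizes_mb["sqlite_zlib"] / sizes_mb["matlab_v5"]

print(f"matlab/bz2  = {matlab_over_bz2:.2f}")   # matlab/bz2  = 2.79
print(f"zlib/bz2    = {zlib_over_bz2:.2f}")     # zlib/bz2    = 3.63
print(f"zlib/matlab = {zlib_over_matlab:.2f}")  # zlib/matlab = 1.30
```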
If you want to play with it yourself, the complete code is:

import codecs
import io
import os
import sqlite3
import sys
import time

import numpy as np
from sklearn.datasets import fetch_openml

compressor = 'zlib'  # zlib, bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib uses about the same disk size as Matlab v5 .mat files
    # bz2 compresses 4 times more than zlib, but storing is 20 times slower.
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(codecs.encode(out.read(), compressor))  # zlib, bz2

def convert_array(blob):
    out = io.BytesIO(codecs.decode(blob, compressor))
    return np.load(out)

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

dbname = 'example.db'

def test_save_sqlite_arrays():
    "Load MNIST database (70000 samples) and store in a compressed SQLite db"
    if os.path.exists(dbname):
        os.unlink(dbname)
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    cur.execute("create table test (idx integer primary key, X array, y integer );")

    # The original used fetch_mldata('MNIST original'), which has been removed
    # from scikit-learn; fetch_openml('mnist_784') provides the same 70000 images.
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)

    X, y = mnist.data, mnist.target
    m = X.shape[0]
    t0 = time.time()
    for i, x in enumerate(X):
        # store the image x itself (the original passed y here by mistake)
        cur.execute("insert into test (idx, X, y) values (?,?,?)",
                    (i, x, int(y[i])))
        if not i % 100 and i > 0:
            elapsed = time.time() - t0
            remain = float(m - i) / i * elapsed
            print("\r[%5d]: %3d%% remain: %d secs" % (i, 100 * i / m, remain), end='')
            sys.stdout.flush()

    con.commit()
    con.close()
    elapsed = time.time() - t0
    print()
    print("Storing %d images in %0.1f secs" % (m, elapsed))

def test_load_sqlite_arrays():
    "Query MNIST SQLite database and load some samples"
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()

    # select all images labeled as '2'
    t0 = time.time()
    cur.execute('select idx, X, y from test where y = 2')
    data = cur.fetchall()
    elapsed = time.time() - t0
    con.close()
    print("Retrieve %d images in %0.1f secs" % (len(data), elapsed))

if __name__ == '__main__':
    test_save_sqlite_arrays()
    test_load_sqlite_arrays()
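The `codecs`-based compression above can also be written with the `zlib` module directly, which makes the compression level explicit. A minimal sketch under that assumption; the level 6, the hypothetical `zarray` type name and the synthetic 28x28 image are illustrative choices, not taken from the benchmark:

```python
import io
import sqlite3
import zlib

import numpy as np

LEVEL = 6  # zlib compression level (0-9); an arbitrary middle setting


def adapt_compressed(arr):
    # .npy bytes (keeps dtype and shape), then zlib-compressed.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(zlib.compress(out.getvalue(), LEVEL))


def convert_compressed(blob):
    # Decompress first, then parse the .npy payload.
    return np.load(io.BytesIO(zlib.decompress(blob)))


sqlite3.register_adapter(np.ndarray, adapt_compressed)
sqlite3.register_converter("zarray", convert_compressed)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (idx integer primary key, X zarray)")

# A mostly-zero image, similar in spirit to an MNIST digit, compresses well.
x = np.zeros((28, 28), dtype=np.uint8)
x[10:18, 12:16] = 255
con.execute("insert into test (idx, X) values (?, ?)", (0, x))

# length() is an expression without a declared type, so no converter runs on it.
stored_bytes = con.execute("select length(X) from test").fetchone()[0]
restored = con.execute("select X from test").fetchone()[0]
print(stored_bytes, "bytes stored for", x.nbytes, "bytes of pixels")
```

For the bz2 variant, `bz2.compress` and `bz2.decompress` can be swapped in the same way.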

Answer #3 (drnojrws)

This works for me:

import sqlite3 as sql
import numpy as np
import json
con = sql.connect('test.db', isolation_level=None)
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS foobar")
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
# Serialize the array as a JSON list (stored as text)
cur.execute("INSERT INTO foobar VALUES (?,?)", (None, json.dumps(np.arange(0,500,0.5).tolist())))
con.commit()
cur.execute("SELECT * FROM foobar")
data = cur.fetchall()
print(data)
# Parse the JSON text back into a list (wrap it in np.array() for an ndarray)
my_list = json.loads(data[0][1])
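Plain `tolist()` plus JSON loses the dtype (values come back as Python floats or ints) and the shape of multi-dimensional arrays. A sketch that keeps both, assuming a hypothetical JSON layout with `dtype`, `shape` and `data` keys (an illustration, not a standard format):

```python
import json
import sqlite3

import numpy as np


def to_json(arr):
    # Hypothetical layout: dtype and shape stored next to the flat values.
    return json.dumps({"dtype": arr.dtype.str, "shape": arr.shape,
                       "data": arr.ravel().tolist()})


def from_json(text):
    d = json.loads(text)
    return np.array(d["data"], dtype=d["dtype"]).reshape(d["shape"])


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array TEXT)")

x = np.arange(0, 500, 0.5, dtype=np.float32).reshape(20, 50)
con.execute("INSERT INTO foobar VALUES (?, ?)", (None, to_json(x)))

(text,) = con.execute("SELECT array FROM foobar").fetchone()
y = from_json(text)
print(y.dtype, y.shape)  # float32 (20, 50)
```

JSON text is much larger than the binary approaches in the other answers, which matters little for a 1000-element array like the one in the question.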

Answer #4 (aelbi1ox)

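Happy Leap Second got close, but I kept getting an automatic conversion to string. Also, if you look at this other post, a fun debate on using buffer or Binary to push non text data into sqlite, you can see that the documented approach is to avoid buffer altogether and use this piece of code: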

import io
import sqlite3
import numpy as np

def adapt_array(arr):
    # Serialize to .npy bytes and wrap them as a BLOB
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

I haven't tested it much in Python 3, but it seems to work in Python 2.7.
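To confirm that this adapter stores a BLOB rather than a string, SQLite's built-in `typeof()` function can be queried on the stored value. A minimal sketch (in-memory database, hypothetical table `t`):

```python
import io
import sqlite3

import numpy as np


def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())


sqlite3.register_adapter(np.ndarray, adapt_array)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (arr BLOB)")
con.execute("INSERT INTO t VALUES (?)", (np.arange(5),))

# typeof() reports SQLite's storage class for the value actually stored.
storage_class, size = con.execute("SELECT typeof(arr), length(arr) FROM t").fetchone()
print(storage_class, size)  # storage class should be 'blob'
```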

Answer #5 (r7knjye2)

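None of the other methods worked for me. There now seems to be a numpy.tobytes method and a numpy.fromstring (which works on byte strings), but the latter is deprecated and the recommended method is numpy.frombuffer.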

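import sqlite3
import numpy as np

# Adapters are looked up by the value's type, so register np.ndarray (np.array is a function)
sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
# Only raw bytes are stored: frombuffer reads them back as a flat float64 array
sqlite3.register_converter("array", np.frombuffer)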

I've tested this in my application, and it works well with Python 3.7.3 and numpy 1.16.2.
numpy.fromstring gives the same output, along with `DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead`.
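Because `tobytes()` writes only the raw element bytes, the converter must know the dtype in advance. `np.frombuffer` assumes `float64` unless told otherwise, and it returns a read-only array backed by the fetched bytes. A sketch assuming a hypothetical declared type `i32array` whose converter passes the dtype explicitly:

```python
import sqlite3

import numpy as np

# Raw bytes only: neither dtype nor shape is stored.
sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
# Hypothetical declared type "i32array": its converter fixes the dtype to int32.
sqlite3.register_converter("i32array", lambda b: np.frombuffer(b, dtype=np.int32))

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table t (arr i32array)")

x = np.arange(10, dtype=np.int32)
con.execute("insert into t (arr) values (?)", (x,))
y = con.execute("select arr from t").fetchone()[0]

# With the default dtype, the same 40 bytes would be misread as 5 float64 values.
wrong = np.frombuffer(x.tobytes())
print(y, y.flags.writeable, wrong.shape)
```

Call `.copy()` on the fetched array if you need to modify it in place.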

Answer #6 (qgzx9mmu)

Ready-to-use code based on @unutbu's answer (cleaned up a little, no `seek` needed, etc.), tested with a 2D ndarray:

import sqlite3, numpy as np, io

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", lambda x: np.load(io.BytesIO(x)))

x = np.random.rand(100, 100)
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")
con.execute("insert into test (arr) values (?)", (x, ))
for r in con.execute("select arr from test"):
    print(r[0])
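If the column is already declared as plain `BLOB` (as in the question), the converter can instead be chosen per query with `sqlite3.PARSE_COLNAMES` and a `[type]` suffix in the column alias. A sketch reusing the same adapter and converter (the table and alias names are illustrative):

```python
import io
import sqlite3

import numpy as np


def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())


sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", lambda x: np.load(io.BytesIO(x)))

# PARSE_COLNAMES: the converter is picked from "name [type]" in the result column name.
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_COLNAMES)
con.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")

x = np.random.rand(100, 100)
con.execute("INSERT INTO foobar VALUES (?, ?)", (None, x))

raw = con.execute("SELECT array FROM foobar").fetchone()[0]  # no alias: plain bytes
arr = con.execute('SELECT array AS "arr [array]" FROM foobar').fetchone()[0]
print(type(raw), type(arr), arr.shape)
```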

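If and *only* if you work exclusively with 1D arrays (of dtype float64, which `np.frombuffer` assumes by default), you can use this instead (see @gavin's answer):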

sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)
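The 1D restriction exists because `tobytes()` flattens the array and discards its shape, so a 2D array comes back as a flat vector. A sketch showing this, assuming the caller happens to know the original shape and reshapes by hand:

```python
import sqlite3

import numpy as np

sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")

x = np.random.rand(3, 4)  # float64, so the default frombuffer dtype matches
con.execute("insert into test (arr) values (?)", (x,))
flat = con.execute("select arr from test").fetchone()[0]

print(flat.shape)              # (12,): the 2D shape is gone
restored = flat.reshape(3, 4)  # only possible because the shape is known here
```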
