我如何创建一个带有JSON对象的字典的sqlite3表,其中的键可能不匹配每一个项目?

6fe3ivhb  于 12个月前  发布在  SQLite
关注(0)|答案(1)|浏览(99)

我目前正在尝试创建一系列表。前两个表填充良好,因为每个条目都有与创建表时匹配的键。
第一个表:1行,包含3列(object、has_more、data)
第二个表:几行,11列(对象,id,类型,updated_at,uri,名称,描述,大小,download_uri,内容类型,内容编码)
第三张table:使用与前面相同的方法,我提取了键并连接为字符串,用于“CREATE TABLE”语句。这对第一对条目有效,但接下来的JSON对象有不同的键,这导致了错误,“table有64列,但提供了60个值”。我假设这可能会发生在更多条目,无论是更多还是更少的列/值。
代码如下:

import requests
import urllib
import json
import sqlite3
from os import path, replace

BulkDATAurl = "https://api.scryfall.com/bulk-data"      # Store URL path as (str)
BulkDataJSON = json.load(urllib.request.urlopen(BulkDATAurl))

#OPEN DATABASE CONNECTION AND SET CURSOR
connection = sqlite3.connect("JSONTestDB")
cursor = connection.cursor()

#CREATE MAIN BULKDATA TABLE - this works but open to other suggestions
columns = ", ".join('"' + str(x) + '"' for x in BulkDataJSON.keys())
values = ", ".join('"' + str(x) + '"' for x in BulkDataJSON.values())
CreateTableCmd = "CREATE TABLE IF NOT EXISTS TestTable (" + columns + ")"
InsertTableCmd = "INSERT INTO %s (%s) VALUES (%s);" % ('TestTable', columns, values) 
cursor.execute(CreateTableCmd)
cursor.execute(InsertTableCmd)
connection.commit()

#CREATE BULKDATA DATA TABLE - also works like previous
columns = ", ".join('"' + str(x) + '"' for x in  BulkDataJSON['data'][0].keys())
CreateTableCmd = "CREATE TABLE IF NOT EXISTS TestTableData (" + columns + ")"
cursor.execute(CreateTableCmd)
i=0
for item in BulkDataJSON['data']:    
    values = ", ".join('"' + str(x) + '"' for x in  BulkDataJSON['data'][i].values())
    i += 1
    InsertTableCmd = "INSERT INTO %s (%s) VALUES (%s);" % ('TestTableData', columns, values) 
    cursor.execute(InsertTableCmd)
connection.commit()

#CREATE TABLE FROM EXTRACTED URL - this is where my problem happens, 3rd item has less keys, some of which dont even match
oracleURL = BulkDataJSON['data'][2]['download_uri']
oracleJSON = json.load(urllib.request.urlopen(oracleURL))
columns = ", ".join('"' + str(x) + '"' for x in oracleJSON[00000].keys())
CreateTableCmd = "CREATE TABLE IF NOT EXISTS TestTableOracle (" + columns + ")"
cursor.execute(CreateTableCmd)

i=0
for item in oracleJSON[00000]:    
    values =  ", ".join('"' + str(x).replace('"', '-') + '"' for x in  oracleJSON[i].values())
    i += 1
    InsertTableCmd = "INSERT INTO %s (%s) VALUES (%s);" % ('TestTableOracle', columns, values) 
    cursor.execute(InsertTableCmd)
    connection.commit()

connection.commit()
connection.close()

字符串
从这里我不完全确定要去哪里,或者如果有一个更有效的方法来执行我试图做什么,任何帮助将不胜感激.也道歉,如果有帖子,涵盖这一点,我可能错过了它或误解了它.

qv7cva1a

qv7cva1a1#

现在我已经看到了一个实际的终端,下面是对代码的重构,它运行得很好。
无论如何,我们的想法是,一旦我们从API下载了一个文件,我们就可以通过它来获得完整的列列表,基于此创建一个表,然后使用相同的列列表来生成行(其中包含None s,用于特定原始行中不存在的列;并且在默认情况下,SQLite无法处理的数据将被转换回JSON存储)。

import json
import sqlite3

import requests

connection = sqlite3.connect("JSONTestDB.sqlite3")

def jsonify_complex_value(val):
    if isinstance(val, (dict, list)):
        return json.dumps(val)
    return val

def create_table_from_list(table_name, list_url):
    print(f"Downloading {table_name} from {list_url}")
    data_resp = requests.get(list_url)
    data_resp.raise_for_status()
    data = data_resp.json()
    assert isinstance(data, list)
    # Get all the columns
    all_columns = set()
    for row in data:
        all_columns.update(row.keys())

    # Some columns may be reserved words in SQLite, so we need to quote them
    # (but here's hoping there's nothing that needs more escaping)
    columns_for_sql = ",".join(f'"{col}"' for col in all_columns)
    qmarks_for_sql = ", ".join("?" * len(all_columns))

    print(f"{table_name} has {len(all_columns)} columns and {len(data)} rows")

    cursor = connection.cursor()
    # Do everything here in a transaction for speed
    cursor.execute("BEGIN TRANSACTION")
    # Create the table
    create_table_sql = (
        f"CREATE TABLE IF NOT EXISTS {table_name} ({columns_for_sql})"
    )
    cursor.execute(create_table_sql)

    # Create the insert statement
    insert_sql = f"INSERT INTO {table_name} ({columns_for_sql}) VALUES ({qmarks_for_sql})"

    # Define a generator that will replace missing columns with None;
    # also, convert complex values to JSON.
    row_gen = (
        tuple(jsonify_complex_value(row.get(col)) for col in all_columns)
        for row in data
    )

    # Insert the data by running `insert_sql` in bulk
    cursor.executemany(insert_sql, row_gen)

    # Commit the transaction, we're done
    connection.commit()

def main():
    bulk_data_resp = requests.get("https://api.scryfall.com/bulk-data")
    bulk_data_resp.raise_for_status()
    for obj in bulk_data_resp.json()["data"]:
        table_name = obj["type"]
        url = obj["download_uri"]
        if table_name not in ("oracle_cards", "default_cards"):
            # Skip things we don't care about for this example
            print("Skipping", table_name)
            continue
        create_table_from_list(table_name, url)

if __name__ == "__main__":
    main()

字符串
要验证数据是否存在,并且可能包含蛇发女妖:

$ sqlite3 JSONTestDB.sqlite3 'select name from oracle_cards order by random() limit 1;'
Vraska the Unseen

相关问题