如何保存utf-8json值并将其正确写入python3.8.2中的utf-8txt文件?

mbyulnm0  于 2021-07-24  发布在  Java
关注(0)|答案(1)|浏览(303)

我最近编写了一个python脚本,从json文件中提取一些数据,并使用它为以下语句生成一些sql插入值:

INSERT INTO `card`(`artist`,`class_pid`,`collectible`,`cost`, `dbfid`, `api_db_id`, `name`, `rarity`, `cardset_pid`, `cardtype`, `attack`, `health`, `race`, `durability`, `armor`,`multiclassgroup`, `text`) VALUES ("generated entry goes here")

在我的sql表中,某些属性的名称不同,但使用相同的值(json文件/python脚本中的示例cardclass在sql表中称为class\u pid)。从脚本生成的值是有效的sql,可以成功地插入到数据库中,但是我注意到,在生成的export.txt文件中,一些值与原来的值不同。例如,utf-8编码的json文件中的以下json条目:

[{"artist":"Arthur Bozonnet","attack":3,"cardClass":8,"collectible":1,"cost":2,"dbfId":2545,"flavor":"And he can't get up.","health":2,"id":"AT_003","mechanics":["HEROPOWER_DAMAGE"],"name":"Fallen Hero","rarity":"RARE","set":1,"text":"Your Hero Power deals 1 extra damage.","type":"MINION"},{"artist":"Ivan Fomin","attack":2,"cardClass":11,"collectible":1,"cost":2,"dbfId":54256,"flavor":"Were you expectorating another bad pun?","health":4,"id":"ULD_182","mechanics":["TRIGGER_VISUAL"],"name":"Spitting Camel","race":"BEAST","rarity":"COMMON","set":22,"text":"[x]At the end of your turn,\n  deal 1 damage to another  \nrandom friendly minion.","type":"MINION"}]

生成此输出:

('Arthur Bozonnet',8,1,2,'2545','AT_003','Fallen Hero','RARE',1,'MINION',3,2,'NULL',0,0,'NULL','Your Hero Power deals 1\xa0extra damage.'),('Ivan Fomin',11,1,2,'54256','ULD_182','Spitting Camel','COMMON',22,'MINION',2,4,'BEAST',0,0,'NULL','[x]At the end of your turn,\n\xa0\xa0deal 1 damage to another\xa0\xa0\nrandom friendly minion.')

如您所见,json条目中的一些值以某种方式被更改,就好像文本编码在某个地方被更改一样,尽管在我的脚本中,我确保json文件是用utf-8编码打开的,并且生成的文本文件也是用utf-8打开并写入的,以匹配json文件。我的目标是保持值与json文件中的值完全相同,并将这些值与json中的值完全相同地传输到生成的sql值条目中。例如,在生成的sql中,我希望第二个条目的“text”值为:

"[x]At the end of your turn,\n  deal 1 damage to another  \nrandom friendly minion."

而不是:

"[x]At the end of your turn,\n\xa0\xa0deal 1 damage to another\xa0\xa0\nrandom friendly minion."

我尝试使用unicodedata.normalize()等函数,但不幸的是,它似乎没有以任何方式更改输出。这是我编写的用于生成sql值的脚本:

import json
import io

chosen_keys = ['artist','cardClass','collectible','cost',
'dbfId','id','name','rarity','set','type','attack','health',
'race','durability','armor',
'multiClassGroup','text']

defaults = ['NULL','0','0','0',
'NULL','NULL','NULL','NULL','0','NULL','0','0',
'NULL','0','0',
'NULL','NULL']

def saveChangesString(dataList, filename):
  with io.open(filename, 'w', encoding='utf-8') as f:
    f.write(dataList)
    f.close()

def generateSQL(json_dict):
    count = 0
    endCount = 1
    records = ""
    finalState = ""
    print('\n'+str(len(json_dict))+' records will be processed\n')
    for i in json_dict:
        entry = "("
        jcount = 0
        for j in chosen_keys:
            if j in i.keys():
                if str(i.get(j)).isdigit() and j != 'dbfId':
                    entry = entry + str(i.get(j))
                else:
                    entry = entry + repr(str(i.get(j)))
            else:
                if str(defaults[jcount]).isdigit() and j != 'dbfId':
                    entry = entry + str(defaults[jcount])
                else:
                    entry = entry + repr(str(defaults[jcount]))
            if jcount != len(chosen_keys)-1:
                entry = entry+","
            jcount = jcount + 1
        entry = entry + ")"
        if count != len(json_dict)-1:
                entry = entry+","
        count = count + 1
        if endCount % 100 == 0 and endCount >= 100 and endCount < len(json_dict):
            print('processed records '+str(endCount - 99)+' - '+str(endCount))
            if endCount + 100 > len(json_dict):
                finalState = 'processed records '+str(endCount+1)+' - '+str(len(json_dict))
        if endCount == len(json_dict):
            print(finalState)
        records = records + entry
        endCount = endCount + 1
    saveChangesString(records,'export.txt')
    print('done')

with io.open('cards.collectible.sample.example.json', 'r', encoding='utf-8') as f:
    json_to_dict = json.load(f)
    f.close()

generateSQL(json_to_dict)

任何帮助都将不胜感激,因为我实际使用的json文件包含2000多个条目,所以我更希望避免手动编辑。谢谢您。
sql表结构代码也是:

-- phpMyAdmin SQL Dump
CREATE TABLE `card` (
  `pid` int(10) NOT NULL,
  `api_db_id` varchar(50) NOT NULL,
  `dbfid` varchar(50) NOT NULL,
  `name` varchar(50) NOT NULL,
  `cardset_pid` int(10) NOT NULL,
  `cardtype` varchar(50) NOT NULL,
  `rarity` varchar(20) NOT NULL,
  `cost` int(3) NOT NULL,
  `attack` int(10) NOT NULL,
  `health` int(10) NOT NULL,
  `artist` varchar(50) NOT NULL,
  `collectible` tinyint(1) NOT NULL,
  `class_pid` int(10) NOT NULL,
  `race` varchar(50) NOT NULL,
  `durability` int(10) NOT NULL,
  `armor` int(10) NOT NULL,
  `multiclassgroup` varchar(50) NOT NULL,
  `text` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

ALTER TABLE `card`
  ADD PRIMARY KEY (`pid`);

ALTER TABLE `card`
  MODIFY `pid` int(10) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=1;
COMMIT;
f5emj3cl

f5emj3cl1#

\xa0 是太空的变种。它是来自文字吗?
但是,更相关的是,它不是utf8;它是 latin1 或其他非utf8编码。你需要回到它的来源,把它改成utf8。
或者,如果下一步只是将其放入mysql表中,那么您需要告诉客户机的真相——即它是用拉丁文1(而不是utf8)编码的。一旦您完成了这一点,mysql将在测试期间为您处理转换 INSERT .

相关问题