MySQL: manual CSV import fails with column data type errors

Posted by 2izufjch on 2022-12-15 in MySQL

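I am trying to import a COVID-19 dataset into MySQL 8.0 by hand from MySQL Workbench: first CREATE TABLE, then LOAD DATA INFILE.
In columns such as hospital patients per million, the values vary in the number of digits (up to 8) and decimal places (up to 3).
Other columns, such as new cases, contain negative values (the number of new cases started to fall compared with the previous record).
When I run the CREATE TABLE query below, the table is created with all of its columns. But when I then import the CSV file with LOAD DATA INFILE, columns typed as "decimal" keep failing with errors such as Error 1366, incorrect decimal value at row 1.
So I changed them to "float", which gave Error 1265, data truncated at row 1. I have now changed them to "double", as another Stack Overflow user suggested, and I still get Error 1265, data truncated at row 1.
Here is the CREATE TABLE query: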

CREATE TABLE covid_deaths (
    iso_code text,
    continent text,
    location text,
    date date,
    population bigint,
    total_cases int,
    new_cases int,
    new_cases_smoothed double,
    total_deaths int,
    new_deaths int,
    new_deaths_smoothed double,
    total_cases_per_million double,
    new_cases_per_million double,
    new_cases_smoothed_per_million double,
    total_deaths_per_million double,
    new_deaths_per_million double,
    new_deaths_smoothed_per_million double,
    reproduction_rate double,
    icu_patients int,
    icu_patients_per_million double,
    hosp_patients int,
    hosp_patients_per_million double,
    weekly_icu_admissions double,
    weekly_icu_admissions_per_million double,
    weekly_hosp_admissions double,
    weekly_hosp_admissions_per_million double
);

Here is the LOAD DATA INFILE query:

LOAD DATA INFILE 'CovidDeaths.csv' 
    INTO TABLE covid_deaths 
    FIELDS TERMINATED BY ',' 
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 ROWS;
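
Errors 1366 and 1265 reported for row 1 usually mean that the first data line holds something the target column cannot parse. Two common causes are an empty field landing in a numeric column and a trailing carriage return left by Windows line endings. Neither is confirmed for this file, so the statements below are only a diagnostic sketch. The second LOAD DATA differs from the original query in the line terminator alone, on the assumption that the CSV was saved with `\r\n` endings.

```sql
-- Check whether strict mode is active. With STRICT_TRANS_TABLES set,
-- a value that would otherwise only raise a warning aborts the load.
SELECT @@SESSION.sql_mode;

-- Run this right after the failing LOAD DATA statement to see the
-- column name and row number attached to each message.
SHOW WARNINGS LIMIT 10;

-- Assumption: the file uses Windows line endings (\r\n). With '\n' as
-- the terminator, the \r stays glued to the last field of every row.
LOAD DATA INFILE 'CovidDeaths.csv'
    INTO TABLE covid_deaths
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\r\n'
    IGNORE 1 ROWS;
```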

The first 6 columns and first 3 lines of the CSV (the full file has 26 columns and 85,171 rows):

iso_code, continent, location, date, population, total_cases
    AFG, Asia, Afghanistan, 2020-02-24, 38928341, 1
    AFG, Asia, Afghanistan, 2020-02-25, 38928341, 1

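Additional information:
I tried importing all of the "double" columns above as "varchar" instead, and that worked. However, ORDER BY ... DESC then returns, for example, 99.4 followed by 887.2, because the values are sorted character by character (9 comes before 8 in descending order) rather than by numeric value (887.2 is greater than 99.4).
So I need a numeric data type rather than varchar in order to analyze the data. I mention this only because with "varchar" all 85,171 rows were imported cleanly and I was able to query the data.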
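
While the columns are still stored as text, the sort order described above can be worked around by casting inside ORDER BY. This is a stopgap sketch rather than a replacement for numeric columns. It assumes the all-varchar version of `covid_deaths` and that missing values were loaded as empty strings. The `DECIMAL(12,3)` precision is a guess based on the "up to 8 digits, up to 3 decimals" description.

```sql
-- Sort a text column by its numeric value instead of character by character.
SELECT location, date, hosp_patients_per_million
FROM covid_deaths
WHERE hosp_patients_per_million <> ''   -- skip empty fields (assumed to mark missing data)
ORDER BY CAST(hosp_patients_per_million AS DECIMAL(12,3)) DESC
LIMIT 10;
```

The cast runs for every row and cannot use an ordinary index on the column, so it suits a quick check more than regular analysis.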
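My CSV file is already in the working directory.
The Table Data Import Wizard finished with 0 rows affected (that is, nothing was imported apart from the column headers). That is why I went the CREATE TABLE then LOAD DATA INFILE route.
My data has a header row (column titles), so I thought that might also be a problem. I included IGNORE 1 ROWS, but it did not help. I also deleted the header row from the CSV file, leaving only the data. That did not help either.
secure-file-priv is turned off, and I have permission to edit my.ini if necessary.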
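
The file-location settings can be confirmed from the client before changing my.ini. The sketch below only reads server variables and assumes nothing about the actual paths on this machine.

```sql
-- Empty value: the server may read files from any directory.
-- A path:      LOAD DATA INFILE is limited to that directory.
-- NULL:        server-side import and export are disabled.
SHOW VARIABLES LIKE 'secure_file_priv';

-- A bare file name such as 'CovidDeaths.csv' is looked up in the
-- default database's directory, which sits under this data directory.
SHOW VARIABLES LIKE 'datadir';
```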
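I am new to Stack Overflow. I did some research, but I apologize if my question is still not up to standard. Please let me know if any additional information is needed. Thank you, and have a nice day.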

91zkwejq #1

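My first observation is that the table you posted does not contain all of the columns in the file (at least in today's version of it), and it needs to. Second, it is a bit naughty of the file provider not to tell you which data types to use. I would download the xlsx file and examine it to determine appropriate data types.
If you do not want every column in the data, you can discard the unwanted ones by reading them into user-defined variables. This is also a useful way to look up and/or transform the columns being read; see the Input Preprocessing section of https://dev.mysql.com/doc/refman/8.0/en/load-data.html.
The code below loads the columns I asked for: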

DROP TABLE IF EXISTS covid_deaths;
CREATE TABLE covid_deaths (
iso_code    varchar(20),
continent   varchar(20),
location    varchar(50),
date    varchar(20),
total_cases varchar(20),
new_cases   varchar(20),
new_cases_smoothed  varchar(20),
total_deaths    varchar(20),
new_deaths  varchar(20),
new_deaths_smoothed varchar(20),
total_cases_per_million varchar(20),
new_cases_per_million   varchar(20),
new_cases_smoothed_per_million  varchar(20),
total_deaths_per_million    varchar(20),
new_deaths_per_million  varchar(20),
new_deaths_smoothed_per_million varchar(20),
reproduction_rate   varchar(20),
icu_patients    varchar(20),
icu_patients_per_million    varchar(20),
hosp_patients   varchar(20),
hosp_patients_per_million   varchar(20),
weekly_icu_admissions   varchar(20),
weekly_icu_admissions_per_million   varchar(20),
weekly_hosp_admissions  varchar(20),
weekly_hosp_admissions_per_million  varchar(20),
total_tests varchar(20),
new_tests   varchar(20),
total_tests_per_thousand    varchar(20),
new_tests_per_thousand  varchar(20),
new_tests_smoothed  varchar(20),
new_tests_smoothed_per_thousand varchar(20),
positive_rate   varchar(20),
tests_per_case  varchar(20),
tests_units varchar(20),
total_vaccinations  varchar(20),
people_vaccinated   varchar(20),
people_fully_vaccinated varchar(20),
total_boosters  varchar(20),
new_vaccinations    varchar(20),
new_vaccinations_smoothed   varchar(20),
total_vaccinations_per_hundred  varchar(20),
people_vaccinated_per_hundred   varchar(20),
people_fully_vaccinated_per_hundred varchar(20),
total_boosters_per_hundred  varchar(20),
new_vaccinations_smoothed_per_million   varchar(20),
new_people_vaccinated_smoothed  varchar(20),
new_people_vaccinated_smoothed_per_hundred  varchar(20),
stringency_index    varchar(20),
population_density  varchar(20),
median_age  varchar(20),
aged_65_older   varchar(20),
aged_70_older   varchar(20),
gdp_per_capita  varchar(20),
extreme_poverty varchar(20),
cardiovasc_death_rate   varchar(20),
diabetes_prevalence varchar(20),
female_smokers  varchar(20),
male_smokers    varchar(20),
handwashing_facilities  varchar(20),
hospital_beds_per_thousand  varchar(20),
life_expectancy varchar(20),
human_development_index varchar(20),
population  varchar(20),
excess_mortality_cumulative_absolute    varchar(20),
excess_mortality_cumulative varchar(20),
excess_mortality    varchar(20),
excess_mortality_cumulative_per_million varchar(20)
);

LOAD DATA INFILE 'C:\\Program Files\\MariaDB 10.4\\data\\sandbox\\owid-covid-data.csv' 
    INTO TABLE covid_deaths 
    FIELDS TERMINATED BY ',' 
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 ROWS
     (iso_code,
    continent,
    location,
    date,
    population ,
    total_cases,
     @new_cases ,
    @new_cases_smoothed ,
    @total_deaths ,
    @new_deaths ,
    @new_deaths_smoothed ,
    @total_cases_per_million ,
    @new_cases_per_million ,
    @new_cases_smoothed_per_million ,
    @total_deaths_per_million ,
    @new_deaths_per_million ,
    @new_deaths_smoothed_per_million ,
    @reproduction_rate ,
    @icu_patients ,
    @icu_patients_per_million ,
    @hosp_patients ,
    @hosp_patients_per_million ,
    @weekly_icu_admissions ,
    @weekly_icu_admissions_per_million ,
    @weekly_hosp_admissions ,
    @weekly_hosp_admissions_per_million ,
    @total_tests    ,
    @new_tests  ,
    @total_tests_per_thousand   ,
    @new_tests_per_thousand ,
    @new_tests_smoothed ,
    @new_tests_smoothed_per_thousand    ,
    @positive_rate  ,
    @tests_per_case ,
    @tests_units    ,
    @total_vaccinations ,
    @people_vaccinated  ,
    @people_fully_vaccinated    ,
    @total_boosters ,
    @new_vaccinations   ,
    @new_vaccinations_smoothed  ,
    @total_vaccinations_per_hundred ,
    @people_vaccinated_per_hundred  ,
    @people_fully_vaccinated_per_hundred    ,
    @total_boosters_per_hundred ,
    @new_vaccinations_smoothed_per_million  ,
    @new_people_vaccinated_smoothed ,
    @new_people_vaccinated_smoothed_per_hundred ,
    @stringency_index   ,
    @population_density ,
    @median_age ,
    @aged_65_older  ,
    @aged_70_older  ,
    @gdp_per_capita ,
    @extreme_poverty    ,
    @cardiovasc_death_rate  ,
    @diabetes_prevalence    ,
    @female_smokers ,
    @male_smokers   ,
    @handwashing_facilities ,
    @hospital_beds_per_thousand ,
    @life_expectancy    ,
    @human_development_index    ,
    @population ,
    @excess_mortality_cumulative_absolute   ,
    @excess_mortality_cumulative    ,
    @excess_mortality   ,
    @excess_mortality_cumulative_per_million    
     );
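
The same user-variable mechanism can also transform a field before it is stored, which is one way to load straight into numeric columns. The sketch below is an illustration only. The table `covid_sample` and the file name `covid_sample.csv` are hypothetical, modelled on the six-column sample header in the question. It also assumes that missing values appear as empty fields and that the last field may carry a trailing `\r`.

```sql
-- Hypothetical six-column table with real date and numeric types.
CREATE TABLE covid_sample (
    iso_code    VARCHAR(20),
    continent   VARCHAR(20),
    location    VARCHAR(50),
    date        DATE,
    population  BIGINT,
    total_cases INT
);

-- Read the numeric fields into variables first, then clean them in SET:
-- NULLIF turns an empty string into NULL so strict mode does not reject it.
LOAD DATA INFILE 'covid_sample.csv'
    INTO TABLE covid_sample
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 ROWS
    (iso_code, continent, location, date, @population, @total_cases)
    SET population  = NULLIF(TRIM(@population), ''),
        total_cases = NULLIF(TRIM(TRAILING '\r' FROM @total_cases), '');
```

Fields that need no cleaning are named directly in the column list, so only the columns that can be empty or malformed pay for the extra expressions.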
    

select count(*) from covid_deaths;
+----------+
| count(*) |
+----------+
|   242447 |
+----------+
1 row in set (0.339 sec)
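
Once the rows are in a text staging table like the one above, the data itself can suggest a type for each column. The sketch below profiles one loaded column, `total_cases`, and then converts it. The regular expression accepts plain signed decimals only, so values in scientific notation would be counted as non-numeric. `DECIMAL(15,3)` is a placeholder to be adjusted to what the profile query reports.

```sql
-- Profile the column: empty fields, longest value, and non-empty values
-- that do not look like a plain signed decimal number.
SELECT
    COUNT(*)                       AS total_rows,
    SUM(total_cases = '')          AS empty_values,
    MAX(CHAR_LENGTH(total_cases))  AS longest_value,
    SUM(total_cases <> ''
        AND total_cases NOT REGEXP '^-?[0-9]+(\\.[0-9]+)?$') AS non_numeric
FROM covid_deaths;

-- If non_numeric is 0, empty strings are the only obstacle: turn them
-- into NULL, then change the column type in place.
UPDATE covid_deaths
SET total_cases = NULL
WHERE total_cases = '';

ALTER TABLE covid_deaths
    MODIFY total_cases DECIMAL(15,3) NULL;
```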
