I am trying to query the HDFS file system from Apache Drill. I can successfully query Hive tables and CSV files, but part files do not work.
hadoop fs -cat BANK_FINAL/2015-11-02/part-r-00000 | head -1
gives:
028|S80306432|2015-11-02|BRN CLG CHQ paid to a Bandra co-operative bank|485|zone serial no [485]|L|I|Maharashtra State Co-operative Bank Ltd|3320.0|inward CLG|D11528|SBPRM
select * from dfs.`/user/ituser1/e.csv` limit 10
works fine and returns the expected results. But when I try to query
select * from dfs.`/user/ituser1/BANK_FINAL/2015-11-02/part-r-00000` limit 10
it gives the error:
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs./user/ituser1/BANK_FINAL/2015-11-02/part-r-00000' not found [Error Id: 6f80392a-51af-4b61-94d8-335b33b0048c on genome-dev13.axs:31010]
The Apache Drill dfs storage plugin JSON is as follows:
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://10.9.1.33:8020/",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": ["psv"],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": ["tsv"],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": ["seq"]
    },
    "csvh": {
      "type": "text",
      "extensions": ["csvh"],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}
1 Answer
Drill uses the file extension to determine the file type, except for Parquet files, where it tries to read a magic number from the file itself. In your case, you need to set "defaultInputFormat" to indicate that, by default, any file without an extension should be treated as a CSV file. You can find more information here:
https://drill.apache.org/docs/drill-default-input-format/
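For example, since the part files shown in the question are pipe-delimited, a sketch of the change (assuming you query through the root workspace; "psv" is likely the right default here rather than "csv", since "psv" is the pipe-delimited format already defined in your plugin config):

```json
"workspaces": {
  "root": {
    "location": "/",
    "writable": true,
    "defaultInputFormat": "psv"
  }
}
```

After updating and saving the storage plugin, extension-less files should be readable with the original query:

select * from dfs.`/user/ituser1/BANK_FINAL/2015-11-02/part-r-00000` limit 10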