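I created a table in Hive and loaded data into it from an external CSV file. When I try to print the data from Python, I get output like `['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']`. When I query the table from the Hive GUI, the results are correct. How can I get the same results from a Python program?
My Python code: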
import pyhs2

with pyhs2.connect(host='192.168.56.101',
                   port=10000,
                   authMechanism='PLAIN',
                   user='hiveuser',
                   password='password',
                   database='anuvrat') as conn:
    with conn.cursor() as cur:
        cur.execute('SELECT message FROM ABC_NEWS LIMIT 5')
        print cur.fetchone()
The output is:
/usr/bin/python2.7 /home/anuvrattiku/SPRING_2017/CMPE239/Facebook_Fake_news_detection/code_fake_news/code.py
['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']
Process finished with exit code 0
When I query the same table in Hive, I get the following output:
This is how I created the table:
CREATE TABLE ABC_NEWS(
ID STRING,
PAGE_ID INT,
NAME STRING,
MESSAGE STRING,
DESCRIPTION STRING,
CAPTION STRING,
POST_TYPE STRING,
STATUS_TYPE STRING,
LIKES_COUNT SMALLINT,
COMMENTS SMALLINT,
SHARES_COUNT SMALLINT,
LOVE_COUNT SMALLINT,
WOW_COUNT SMALLINT,
HAHA_COUNT SMALLINT,
SAD_COUNT SMALLINT,
THANKFUL_COUNT SMALLINT,
ANGRY_COUNT SMALLINT,
LINK STRING,
IMAGE_LINK STRING,
POSTED_AT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';
The CSV file used to load the table is available at: https://www.dropbox.com/s/fiwygyqt8u9eo5s/-news-86680728811.csv?dl=0
1 Answer
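Since the text is qualified (`"`) and the delimiter (`,`) appears inside the qualified text, you should use the CSV SerDe. You are also printing `cur.fetchone()`, which is a list rather than a string, so what you see is the list's representation with the raw bytes escaped. Print the first element of the list instead: `cur.fetchone()[0]`.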
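The list-versus-element point can be sketched without a Hive connection. In the snippet below, `row` is copied from the output shown in the question and merely stands in for what `cur.fetchone()` returned; it is an illustrative value, not a call into pyhs2.

```python
# Stand-in for the value returned by cur.fetchone() in the question:
# a one-element list holding the `message` column of the first row.
row = ['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']

# Printing the list shows the repr of each element, so the NUL
# characters appear as literal \x00 escape sequences.
as_list = repr(row)
print(as_list)

# Indexing gives the string itself; print writes its characters
# directly instead of their escaped representation.
first = row[0]
print(first)

# The NUL characters are still part of the value: 19 characters,
# although only 9 of them are visible.
print(len(first))
```

Indexing only changes how the value is displayed; the `\x00` characters remain in the string. That interleaved pattern is typical of UTF-16 text read one byte at a time, so checking the encoding of the CSV file may also be worthwhile. This is an inference from the byte pattern alone, not something confirmed in the thread.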
With the CSV SerDe, the table definition becomes:

```
-- Note: OpenCSVSerde exposes every column as string, whatever type is declared.
create external table abc_news
(
    id              string
   ,page_id         int
   ,name            string
   ,message         string
   ,description     string
   ,caption         string
   ,post_type       string
   ,status_type     string
   ,likes_count     smallint
   ,comments        smallint
   ,shares_count    smallint
   ,love_count      smallint
   ,wow_count       smallint
   ,haha_count      smallint
   ,sad_count       smallint
   ,thankful_count  smallint
   ,angry_count     smallint
   ,link            string
   ,image_link      string
   ,posted_at       string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties
(
    'separatorChar' = ','
   ,'quoteChar'     = '"'
)
stored as textfile
;
```
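Why a quote-aware SerDe matters can be illustrated in plain Python, using the standard `csv` module as a rough stand-in for the quote handling that OpenCSVSerde provides. The sample line is hypothetical and was made up for this sketch; it is not taken from the actual file.

```python
import csv

# Hypothetical CSV line: the second field is quoted and contains a comma.
line = 'id1,"Breaking news, live updates",link'

# Splitting on every comma (a rough analogue of
# FIELDS TERMINATED BY ",") cuts the quoted field in two.
naive_fields = line.split(',')
print(naive_fields)

# A quote-aware parser keeps the quoted field intact and drops the quotes.
parsed_fields = next(csv.reader([line], delimiter=',', quotechar='"'))
print(parsed_fields)
```

The naive split yields four fields, so every column after the quoted one shifts by one position, whereas the quote-aware parse yields the expected three.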