How to set max partitions for a Hive query in an IPython notebook

Posted by x6h2sr28 on 2021-05-29 in Hadoop


I am writing a script in an IPython notebook.

import pandas as pd
import pyhs2
import os
import datetime

q1 = """set hive.query.max.partition = 3000 ;
select 'Device_id' as key,
'All Time' as type,
count(distinct a.dev_id) as count
from (select distinct dev_id from DevID
where dev_type = '*****'
union all
    select distinct
    key_value_lookup(raw_url, '*****',  '&', '=') as dev_id
    from actions
    where raw_url like '%*****%'
    and raw_url like '%*****%'
    and data_date >= '20150901' and data_date <= '20151231') a"""

def read_hive(query):
    conn = pyhs2.connect(host='*****',
                         port=*****,
                         authMechanism="*****",
                         user='*****',
                         password='*****',
                         database='*****')
    cur = conn.cursor()
    cur.execute(query)
    # Return None if the query produced no result schema
    if cur.getSchema() is None:
        cur.close()
        conn.close()
        return None

    # Column info from the query
    columnNames = [a['columnName'] for a in cur.getSchema()]
    print(columnNames)
    columnNamesStrings = [a['columnName'] for a in cur.getSchema() if a['type'] == 'STRING_TYPE']
    output = pd.DataFrame(cur.fetch(), columns=columnNames)

    cur.close()
    conn.close()
    return output

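When I call read_hive(q1), I get the following error:
failed because hive.query.max.partition expects an INT value
I think this is because I am storing the query in a string, but I'm not entirely sure. The same query runs fine in Hue.
Does anyone have any intuition about the best way to change the maximum number of partitions? Can this be done within my function?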

hadoop Hive python ipython ipython-notebook

Source: https://stackoverflow.com/questions/34666732/how-set-max-partitions-for-hive-query-in-ipython-notebook

1 Answer


0ejtzxu11#

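Hive configuration settings should be passed to the pyhs2 connection object as a dictionary, not as part of the query string being executed.
In your case: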

conn = pyhs2.connect(host='*****',
               port=*****,
               authMechanism="*****",
               user='*****',
               password='*****',
               database='*****',
               configuration={'hive.query.max.partition': '3000'})
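If a query string already carries leading `set key = value;` statements (as q1 does), one option is to pull them out into a dictionary before connecting. The following is only a minimal sketch: `split_hive_settings` is a hypothetical helper name, not part of pyhs2, and it assumes the settings appear only at the start of the string, each terminated by a semicolon.

```python
import re

# Matches one leading "set key = value;" statement (case-insensitive).
_SET_STATEMENT = re.compile(r"\s*set\s+([\w.]+)\s*=\s*([^;]*?)\s*;", re.IGNORECASE)


def split_hive_settings(query):
    """Split leading `set key = value;` statements off a query string.

    Hypothetical helper: returns (settings, sql), where settings maps
    setting names to string values and sql is the remaining query text.
    """
    settings = {}
    position = 0
    while True:
        match = _SET_STATEMENT.match(query, position)
        if match is None:
            break
        settings[match.group(1)] = match.group(2)
        position = match.end()
    return settings, query[position:].strip()


# Shortened stand-in for q1; the table name `actions` comes from the question.
example = "set hive.query.max.partition = 3000 ; select count(*) from actions"
settings, sql = split_hive_settings(example)
print(settings)
print(sql)
```

The resulting `settings` dictionary could then be passed as `configuration=settings` to `pyhs2.connect(...)`, and only `sql` handed to `cur.execute(sql)`, so the setting never travels as part of the query text.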

Answered on 2021-05-30
