Automating Hive with Python

Posted by pwuypxnk on 2021-06-02 in Hadoop


I'm running Hive 0.12, and I want to run several queries and get the results back as Python arrays.
For example:

result=[]
for col in columns:
  sql='select {c} as cat,count(*) as cnt from {t} group by {c} having cnt > 100;'.format(t=table,c=col)
  result.append(hive.query(sql))
result=dict(result)

What I'm missing is the hive class that runs the SQL queries.
How can I do this?

hadoop Hive python python-2.7

Source: https://stackoverflow.com/questions/29938124/automating-hive-with-python

3 Answers


osh3o9ms1#

You can also access Hive through Thrift: https://cwiki.apache.org/confluence/display/hive/hiveclient#hiveclient-Python
It looks like pyhs2 is mostly a wrapper that uses Thrift directly.

Answered 2021-06-03

2uluyalo2#

A quick and dirty way is to automate Hive from the command line:

hive -e "sql command"

Something like this should work:

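import subprocess

def query(self, cmd):
    """Run a Hive expression and return its output rows."""
    cmd = 'hive -e "' + cmd + '"'
    prc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           shell=True, universal_newlines=True)
    # Wait for hive to finish and collect its output as text
    stdout, stderr = prc.communicate()
    # Drop empty lines
    ret = [r for r in stdout.split('\n') if len(r)]
    if len(ret) == 0:
        return []
    # Tab-separated output means several columns: split each row into fields
    if '\t' in ret[0]:
        return [[t.strip() for t in r.split('\t')] for r in ret]
    return ret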
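
If the SQL itself contains double quotes, the string concatenation above will break. A minimal sketch of the same idea without a shell follows, as a variation rather than the code above: the names run_hive and parse_rows are made up for illustration, the runner parameter exists only so the parsing can be tried without a Hive installation, and -S is the Hive CLI's silent flag, which keeps log chatter out of the captured output.

```python
import subprocess


def parse_rows(text):
    """Split Hive CLI output into rows of tab-separated, stripped fields."""
    return [[f.strip() for f in line.split('\t')]
            for line in text.splitlines() if line.strip()]


def run_hive(sql, runner=subprocess.check_output):
    """Run `hive -S -e <sql>` and return the parsed rows.

    The command is passed as a list, so no shell is involved and the
    SQL needs no extra quoting. `runner` is a hypothetical hook that
    lets a fake be swapped in for testing.
    """
    out = runner(['hive', '-S', '-e', sql])
    if isinstance(out, bytes):  # check_output returns bytes by default
        out = out.decode('utf-8')
    return parse_rows(out)
```

For a two-column query such as the one in the question, dict(run_hive(sql)) would give a mapping from category to count. The counts come back as strings, because the CLI only prints text.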

Answered 2021-06-03

uurity8g3#

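Another approach is to use the pyhs2 library to open a native connection to Hive from within the Python process. Below is some sample code I cobbled together to test different use cases, but hopefully it illustrates how the library is used.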


# Python 2.7
import pyhs2
from pyhs2.error import Pyhs2Exception

hql = "SELECT * FROM my_table"
# Use your own credentials and connection info here, of course
with pyhs2.connect(
    host='localhost', port=10000, authMechanism="PLAIN",
    user="root", database="default"
) as db:
    with db.cursor() as cursor:
        try:
            print "Trying default database"
            cursor.execute(hql)
            for row in cursor.fetch():
                print row
        except Pyhs2Exception as error:
            print str(error)

Depending on what is or isn't already installed on your machine, you may also need to install the development headers for both libpython and libsasl2.
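
To tie this back to the loop in the question, here is a rough sketch of how a pyhs2 cursor might be used to collect the per-column counts into a dictionary. The helper name counts_by_column and its min_count parameter are made up for illustration. It only assumes a cursor object with the execute() and fetch() methods used in the sample above, and it interpolates table and column names directly into the SQL, so they must come from a trusted source.

```python
def counts_by_column(cursor, table, columns, min_count=100):
    """Return {column: {value: count}} for values seen more than min_count times.

    `cursor` is anything with execute(sql) and fetch(), e.g. a pyhs2 cursor.
    Table and column names are inserted as-is: do not pass untrusted input.
    """
    result = {}
    for col in columns:
        sql = ('SELECT {c} AS cat, COUNT(*) AS cnt FROM {t} '
               'GROUP BY {c} HAVING COUNT(*) > {n}').format(
                   c=col, t=table, n=int(min_count))
        cursor.execute(sql)
        # Each fetched row is expected to be a [value, count] pair
        result[col] = dict((row[0], row[1]) for row in cursor.fetch())
    return result
```

Inside the with db.cursor() as cursor: block it could be called as counts_by_column(cursor, 'my_table', ['col_a', 'col_b']), where the table and column names are placeholders.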

Answered 2021-06-03
