Multiple query execution in Cloudera Impala

icnyk63a posted on 2021-06-26 in Impala

Is it possible to execute multiple queries at the same time in Impala? If so, how does Impala handle them?

hts6caw3 #1

Impala can execute multiple queries at the same time, as long as it has not reached its memory limit.

uhry853o #2

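I did some testing of my own, but I could not execute multiple queries in one call. I connect with impyla and read the queries from a .sql file. This works for a single command.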

from impala.dbapi import connect

# actual server and port changed for this post for security

conn=connect(host='impala server', port=11111,auth_mechanism="GSSAPI")
cursor = conn.cursor()
cursor.execute((open("sandbox/z_temp.sql").read()))

This is the error I received.

HiveServer2Error: AnalysisException: Syntax error in line 2:

This is what the SQL in the .sql file looks like.

Select * FROM database1.table1;
Select * FROM database1.table2;

I was able to run multiple SQL commands by putting each one in its own .sql file and iterating over all the .sql files in a given folder.


import glob

import pandas as pd

# Build the list of .sql files. The numbers at the beginning of each file name
# matter: sorting on them makes the files run in the correct order.
file_names = sorted(glob.glob('folder/*.sql'))

# Collect one PASS/FAIL record per file, to print or write to a file at the end of the job.
log_rows = []
for file_name in file_names:
    print(file_name)
    with open(file_name) as f:
        query = f.read()

    cursor = conn.cursor()
    try:
        # Each SQL command must be executed separately.
        cursor.execute(query)
        status = 'PASS'
    except Exception:
        status = 'FAIL'
    log_rows.append({'test_name': file_name[-40:], 'test_status': status})

df_log = pd.DataFrame(log_rows)

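Another approach is to keep all the SQL statements in one .sql file, separated by ;, then read the file, split it into individual statements, and run them one at a time.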

from impala.dbapi import connect
from impala.util import as_pandas

conn = connect(host='impalaserver', port=11111, auth_mechanism='GSSAPI')
cursor = conn.cursor()

# Read the file and split the SQL statements on ';'.
with open("sandbox/temp.sql") as f:
    sql_file = f.read()

for cmd in sql_file.split(';'):
    # Strip surrounding whitespace, including '\r' and '\n'.
    cmd = cmd.strip()
    # Skip the empty piece left over after a trailing semicolon.
    if not cmd:
        continue
    # Run the SQL commands one at a time.
    cursor.execute(cmd)
    print(cmd)
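
One caveat with splitting on every ; is that a semicolon inside a string literal also gets treated as a statement boundary. Below is a minimal sketch of a quote-aware splitter. The name split_sql_statements is hypothetical, and the sketch assumes the script contains no SQL comments, backtick-quoted identifiers, or backslash-escaped quotes; it is an illustration, not a full SQL parser.

```python
def split_sql_statements(script):
    """Split a SQL script on ';', ignoring semicolons inside quoted strings."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        if quote:
            # Inside a literal: only the matching quote character closes it.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ';':
            # Statement boundary: keep the statement unless it is blank.
            statement = ''.join(current).strip()
            if statement:
                statements.append(statement)
            current = []
            continue
        current.append(ch)
    # Keep a final statement that has no trailing semicolon.
    tail = ''.join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned string can then be passed to cursor.execute() on its own, in the same way as the loop above.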

hrirmatl #3

You can issue a command such as impala-shell -f <<file_name>>, where the file contains multiple queries and each complete query is terminated by a semicolon (;).

g2ieeal7 #4

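If you are a Python geek, you can even try using the impyla package to create multiple connections and run all the queries at once. Install it with pip install impyla.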
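
A rough sketch of that idea, assuming one connection per query and a thread pool to drive them. The helper names run_query and run_concurrently are hypothetical, and connect_fn stands for any zero-argument callable that returns a DB API connection (for impyla, a small wrapper around impala.dbapi.connect). It also assumes every statement returns rows, since it calls fetchall().

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(connect_fn, sql):
    """Open a dedicated connection, run one statement, and return its rows."""
    conn = connect_fn()
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


def run_concurrently(connect_fn, statements, max_workers=4):
    """Run each statement on its own connection in a worker thread.

    Results are returned in the same order as `statements`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda sql: run_query(connect_fn, sql), statements))


# Possible wiring with impyla (host and port are placeholders):
# from impala.dbapi import connect
# make_conn = lambda: connect(host='impalaserver', port=11111, auth_mechanism='GSSAPI')
# results = run_concurrently(make_conn, ["Select * FROM database1.table1",
#                                        "Select * FROM database1.table2"])
```

Whether the queries actually run in parallel on the cluster still depends on the memory available to Impala, as noted in the first answer.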
