I am using a SQLAlchemy session to run SQL queries against a database in Azure Databricks. My query contains a user-defined function, but when I run it the call returns:

infoMessages=["*org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Undefined function: my_method
Example below:
class DatabaseQuery(DatabaseLibrary):
    def __init__(self):
        self.table_model = None
        self.column_names = []
        self.conn = None
        self.meta = None
        self.engine = None
        self.session = None
        self.query_list = []

    def connect_database(self, region, token, database, http_path):
        try:
            dbfs_engine = create_engine(
                "databricks+pyhive://token:"
                + token
                + "@"
                + region
                + "xxxxxx/"
                + database,
                connect_args={"http_path": http_path},
                echo=True,
            )
            self._set_metadata_databricks(dbfs_engine)
            Session = sessionmaker(bind=dbfs_engine)
            self.session = Session()
            self.engine = dbfs_engine
            self.conn = dbfs_engine.connect()
        except Exception as e:
            traceback.print_exc()
            raise

    def my_method(self, name):
        return name.upper()

query = "select my_method(names.NAME) from db.names"
result = self.session.execute(query).fetchall()
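To illustrate the underlying issue, here is a minimal sketch, using an in-memory SQLite engine rather than Databricks: a function called from SQL has to be known to the database driver or server, not merely defined as a Python method on the client. SQLite's DBAPI happens to expose a registration hook (`create_function`); the pyhive Databricks dialect has no equivalent, which is why `my_method` above is reported as undefined.

```python
# Sketch, assuming an in-memory SQLite engine (not Databricks): some DBAPIs let
# you attach a Python function to each new connection, making it callable from SQL.
from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")

@event.listens_for(engine, "connect")
def register_udf(dbapi_conn, connection_record):
    # SQLite-specific hook: expose a Python callable to SQL as my_method
    dbapi_conn.create_function("my_method", 1, lambda name: name.upper())

with engine.connect() as conn:
    result = conn.execute(text("select my_method('alice')")).scalar()
    print(result)  # ALICE
```

This only works because the SQLite driver runs in-process; with a remote engine like Databricks, the function must exist on the server side instead.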
With PySpark, I can do this simply by registering the UDF with the spark object:
convert_maximo_date = udf(common.convert_maximo_date)
self.spark.udf.register("convert_maximo_date", convert_maximo_date)
Is something similar possible with a SQLAlchemy connection, so that queries containing user-defined functions can be executed?
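One possible server-side direction (an assumption, not something stated in the question): Databricks supports SQL UDFs created with CREATE FUNCTION, which would make a function like my_method resolvable for any client, including a SQLAlchemy session. The schema name db and the signature below are illustrative only.

```sql
-- Sketch, assuming Databricks SQL: define the function on the server so every
-- client can call it, instead of registering it per Spark session.
CREATE FUNCTION IF NOT EXISTS db.my_method(name STRING)
RETURNS STRING
RETURN upper(name);
```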