I've created some Hive tables using JDBC from a Python notebook on Databricks, in the Data Science and Engineering UI. I'm able to query the tables in a Databricks notebook and use SQL directly with the %sql magic command.
When switching to the Databricks SQL UI, I can still see the tables in the Hive metastore explorer. However, I'm not able to read the data: a very clear message says that only csv, parquet and so on are supported.
Still, I find this surprising: since I can use the data in the Data Science and Engineering UI, why isn't that the case in Databricks SQL? Is there any solution to overcome this?
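For illustration, the tables were created via JDBC, i.e. roughly the SQL equivalent of this (connection details and names are placeholders, not values from the actual setup):

-- A metastore table backed by a JDBC source rather than by files;
-- it queries fine from a notebook, but it is not a file-based format.
CREATE TABLE orders_jdbc
USING JDBC
OPTIONS (
  url 'jdbc:postgresql://db-host:5432/mydb',
  dbtable 'public.orders',
  user 'reader',
  password '***'
);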
2 Answers

flvlnr44 · 1#
Update Nov 2023: Databricks SQL now supports query federation, which allows connecting to external databases directly.
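For reference, a federation setup is configured roughly as follows; the connection name, host, credentials, and catalog/database names below are placeholders:

CREATE CONNECTION pg_conn TYPE postgresql
OPTIONS (
  host 'db-host',
  port '5432',
  user 'reader',
  password '***'
);

-- Expose the remote database as a catalog queryable from Databricks SQL.
CREATE FOREIGN CATALOG pg USING CONNECTION pg_conn
OPTIONS (database 'mydb');

SELECT * FROM pg.public.orders LIMIT 10;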
Yes, it's a known limitation that Databricks SQL currently supports only file-based formats. As I remember, it's related to the security model, plus the fact that DBSQL uses Photon under the hood, where JDBC integration might not be very performant. You can reach out to your solution architect or customer success engineer for information on whether it will be supported in the future.
The current workaround is to have a job that periodically reads all the data from the database via JDBC and dumps it into a Delta table. Querying that table could even be more performant than going through JDBC; the only issue is the freshness of the data.
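A minimal sketch of that workaround, assuming a JDBC-backed source table like the one illustrated in the question (all names are placeholders); scheduled as a Databricks job, the Delta copy is only as fresh as the last run:

-- Snapshot the JDBC source into a Delta table that Databricks SQL can read.
-- On Databricks, CREATE TABLE defaults to Delta, so no USING clause is needed.
CREATE OR REPLACE TABLE orders_snapshot
AS SELECT * FROM orders_jdbc;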
tf7tbtn2 · 2#
You can import a Hive table from cloud storage into Databricks as an external table and query it with Databricks SQL.
Step 1: Show the CREATE TABLE statement
Issue a

SHOW CREATE TABLE <tablename>

command on your Hive command line to see the statement that created the table. Refer to the example below:
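For example, for a table named wikicc the output might look like this (the column and storage details are illustrative and depend on your table):

SHOW CREATE TABLE wikicc;

CREATE TABLE `wikicc` (
  `country` STRING,
  `count` INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/wikicc'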
Step 2: Issue a CREATE EXTERNAL TABLE statement
If the statement that is returned uses a CREATE TABLE command, copy the statement and replace CREATE TABLE with CREATE EXTERNAL TABLE. EXTERNAL ensures that Spark SQL does not delete your data if you later drop the table. If you want to reuse the same table name, drop the original managed table first:

DROP TABLE wikicc
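Continuing the illustrative wikicc example, the recreated external table would look like this, with LOCATION pointing at the existing data:

CREATE EXTERNAL TABLE `wikicc` (
  `country` STRING,
  `count` INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-table>'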
Step 3: Issue SQL commands on your data
SELECT * FROM wikicc
Source: https://docs.databricks.com/data/data-sources/hive-tables.html