Databricks SQL doesn't seem to support SQL Server

pbpqsu0x  asked 12 months ago  in SQL Server

I've created some Hive tables using a JDBC connection in a Python notebook on Databricks. This was in the Data Science and Engineering UI. I'm able to query the tables in a Databricks notebook and use direct SQL with the %sql magic command.
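Roughly, the setup looks like this, shown here as Spark SQL run from a %sql cell (the server, database, table, and credential values are all placeholders):

-- Register a metastore table backed by the Spark JDBC data source;
-- all connection details below are placeholders.
CREATE TABLE sales_jdbc
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:sqlserver://myserver.example.com:1433;databaseName=mydb',
  dbtable 'dbo.sales',
  user 'my_user',
  password 'my_password'
);

-- Queryable from a notebook cell:
SELECT * FROM sales_jdbc LIMIT 10;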

When switching to the Databricks SQL UI, I'm still able to see the tables in the Hive metastore explorer. However, I'm not able to read the data: a very clear message says that only file-based formats such as CSV and Parquet are supported.

I find this surprising: since I can use the data in the Data Science and Engineering UI, why isn't that the case in Databricks SQL? Is there any way to work around this?


flvlnr44 #1

Update Nov 2023: Databricks SQL now supports query federation (Lakehouse Federation), which allows connecting to external databases, including SQL Server.
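A minimal sketch of the federation setup (requires Unity Catalog; the host, credentials, and all names below are placeholders):

-- Create a connection to the external SQL Server instance
CREATE CONNECTION sqlserver_conn TYPE sqlserver
OPTIONS (
  host 'myserver.example.com',
  port '1433',
  user 'my_user',
  password 'my_password'
);

-- Expose a SQL Server database as a foreign catalog
CREATE FOREIGN CATALOG sqlserver_catalog
USING CONNECTION sqlserver_conn
OPTIONS (database 'mydb');

-- Its tables can then be queried directly from Databricks SQL
SELECT * FROM sqlserver_catalog.dbo.sales LIMIT 10;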

Yes, it's a known limitation that Databricks SQL currently supports only file-based formats. As I remember, it's related to the security model, plus the fact that DBSQL uses Photon under the hood, where a JDBC integration might not be very performant. You can reach out to your solution architect or customer success engineer for information on whether it will be supported in the future.

The current workaround is to have a job that periodically reads all data from the database via JDBC and dumps it into a Delta table; querying the Delta table could even be more performant than going through JDBC, the only issue being the freshness of the data. A sketch of this is shown below.
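A minimal sketch of that workaround in SQL, assuming a SQL Server source (all names and credentials are placeholders); schedule the final statement as a periodic job to control freshness:

-- Staging table backed by the Spark JDBC data source
-- (placeholders throughout)
CREATE TABLE IF NOT EXISTS sales_jdbc
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:sqlserver://myserver.example.com:1433;databaseName=mydb',
  dbtable 'dbo.sales',
  user 'my_user',
  password 'my_password'
);

-- Overwrite the Delta copy with a full snapshot on each run;
-- this is the table that Databricks SQL queries.
CREATE OR REPLACE TABLE sales_delta AS
SELECT * FROM sales_jdbc;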


tf7tbtn2 #2

You can import a Hive table from cloud storage into Databricks using an external table and query it using Databricks SQL.

Step 1: Show the CREATE TABLE statement

Issue a SHOW CREATE TABLE <tablename> command on your Hive command line to see the statement that created the table.

Refer to the example below:

hive> SHOW CREATE TABLE wikicc;
OK
CREATE  TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'
TBLPROPERTIES (
  'totalSize'='2335',
  'numRows'='240',
  'rawDataSize'='2095',
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'transient_lastDdlTime'='1418173653')

Step 2: Issue a CREATE EXTERNAL TABLE statement

If the statement that is returned uses a CREATE TABLE command, copy the statement and replace CREATE TABLE with CREATE EXTERNAL TABLE.

  • EXTERNAL ensures that Spark SQL does not delete your data if you drop the table.
  • You can omit the TBLPROPERTIES field.

DROP TABLE wikicc

CREATE EXTERNAL TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'

Step 3: Issue SQL commands on your data

SELECT * FROM wikicc

Source: https://docs.databricks.com/data/data-sources/hive-tables.html
