My task is to list all the column names for every table in a database named "trial_db".
I can get all the table names with the code below, but when I try to include the column names as well, I get this error:
AnalysisException: Table or view not found: countrycurrency_csv; line 1 pos 9;
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<command-1607350> in <module>
16 # For each table, get list of columns
17 for table in tables:
---> 18 columns = [c.name for c in spark.sql(f"describe {table}").collect()]
19 # Create a dataframe with database, table, and columns information
20 df = spark.createDataFrame([(db.databaseName, table, columns)], schema=['database', 'table', 'columns'])
/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
775 [Row(f1=1, f2='row1'), Row(f1=2, f2='row2'), Row(f1=3, f2='row3')]
776 """
--> 777 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
778
779 def table(self, tableName):
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
# Create an empty DataFrame to union results into (this is a bit of a hack...)
tbl_df = spark.sql("show tables in trial_db like 'xxx'")
# Loop through all matching databases
for db in spark.sql("show databases like 'trial_db'").collect():
    # Create a DataFrame with the list of tables in the database
    df = spark.sql(f"show tables in {db.databaseName}")
    # Union the table list with the main DataFrame
    tbl_df = tbl_df.union(df)
# After the loop, show the results
tbl_df.show()
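As a side note, the empty-DataFrame hack above can be avoided. A sketch of an alternative using the PySpark Catalog API (`spark.catalog.listTables`) plus `createDataFrame` with an explicit schema, assuming a live `spark` session; the `table_rows` helper is purely illustrative:

```python
def table_rows(db_name, table_names):
    # Shape plain table names into (database, table) tuples,
    # ready for spark.createDataFrame with an explicit schema.
    return [(db_name, name) for name in table_names]

def list_tables(spark, db_name="trial_db"):
    # Catalog API alternative to the `show tables in ...` SQL query.
    names = [t.name for t in spark.catalog.listTables(db_name)]
    return spark.createDataFrame(table_rows(db_name, names),
                                 schema=["database", "table"])
```

With an explicit schema there is no need to seed `tbl_df` with a dummy `show tables ... like 'xxx'` query just to get the right columns.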
I want to include the column names for each table.
I tried the following, but it raises the error shown above:
def create_df(table_df):
    # Cast the three columns to string; bracket access is used for "columns"
    # so it does not clash with the DataFrame's own `columns` attribute
    table_df = table_df.withColumn("database", table_df["database"].cast("string"))
    table_df = table_df.withColumn("table", table_df["table"].cast("string"))
    table_df = table_df.withColumn("columns", table_df["columns"].cast("string"))
    return table_df
# Loop through all matching databases
for db in spark.sql("show databases like 'trial_db'").collect():
    # Get the list of tables in the database
    tables = spark.sql(f"show tables in {db.databaseName}").rdd.map(lambda row: row.tableName).collect()
    # For each table, get the list of columns
    for table in tables:
        columns = [c.name for c in spark.sql(f"describe {table}").collect()]
        # Create a DataFrame with database, table, and columns information
        df = spark.createDataFrame([(db.databaseName, table, columns)], schema=['database', 'table', 'columns'])
        # Union the DataFrame with the main DataFrame
        tbl_df = tbl_df.union(df)
# After the loop, show the results
tbl_df.show()
1 Answer
You are not far from a solution. You get the error because you do not specify which database to use when you run the DESCRIBE query on each table. In addition, your first "hack" query produces a different schema from the target one: in the target schema, the columns column is an array of strings holding the column names.
My solution is to qualify each table with its database name in the DESCRIBE query.
Below is an example of the result using the public "databricks" database.
Also note that you never actually call the function you defined, create_df.
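A sketch of what that fix might look like, assuming a live `spark` session: qualify each table with its database in the DESCRIBE statement, then collect (database, table, [column names]) rows. The helper names below are illustrative, not from the original answer:

```python
def describe_stmt(db_name, table_name):
    # Qualify the table with its database so DESCRIBE can resolve it
    # regardless of the session's current database.
    return f"describe {db_name}.{table_name}"

def collect_table_columns(spark, db_like="trial_db"):
    rows = []
    for db in spark.sql(f"show databases like '{db_like}'").collect():
        tables = [r.tableName
                  for r in spark.sql(f"show tables in {db.databaseName}").collect()]
        for table in tables:
            # DESCRIBE output exposes the column name in the `col_name` field
            columns = [c.col_name
                       for c in spark.sql(describe_stmt(db.databaseName, table)).collect()]
            rows.append((db.databaseName, table, columns))
    # An explicit schema keeps `columns` as an array of strings
    return spark.createDataFrame(rows, schema=["database", "table", "columns"])
```

For example, `describe_stmt("trial_db", "countrycurrency_csv")` yields `describe trial_db.countrycurrency_csv`, which resolves even when `trial_db` is not the current database.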