Unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on an Apache Spark dataframe

emeijp43 posted on 2023-06-21 in Spark

I get an error when trying to cast a StringType column to IntegerType on a PySpark dataframe:

joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
    .select(aggregates.year,'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")

I get:
TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 joint = aggregates.join(df_data_3, aggregates.year==df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
            .select(aggregates.year,'Production')\
            .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
            .drop("Production")\
            .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>


5kgi1eie 1#

PySpark SQL data types are no longer singletons (they were before 1.3). You have to create an instance:

from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>

As opposed to:

col("foo").cast(IntegerType)
TypeError  
   ...
TypeError: unexpected type: <class 'type'>

The cast method can also be used with a string description:

col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>
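
A quick way to confirm that either form produces the intended type is to inspect the schema of a throwaway DataFrame. The sketch below is only an illustration: the SparkSession setup, the sample rows, and the column name foo are assumptions, not part of the question.

from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
# a single string column "foo" to cast
df = spark.createDataFrame([("1",), ("2",)], ["foo"])
df.select(col("foo").cast(IntegerType()).alias("foo")).printSchema()
# root
#  |-- foo: integer (nullable = true)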

For an overview of the data types supported in Spark SQL and DataFrames, see this link.
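
Applied to the code from the question, the only change needed is the pair of parentheses after IntegerType. This is a sketch reusing the question's own dataframe and column names (aggregates, df_data_3, Production), so it is untested here:

from pyspark.sql.types import IntegerType

# cast the Production column via an IntegerType *instance*, then swap it in
joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
joint2 = joint.filter(joint.CountyCode == 999).filter(joint.CropName == 'WOOL')\
    .select(aggregates.year, 'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType()))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")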
