I get an error when trying to cast StringType to IntType on a PySpark DataFrame:
joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
joint2 = joint.filter(joint.CountyCode == 999).filter(joint.CropName == 'WOOL')\
    .select(aggregates.year, 'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")
I get:
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')        .select(aggregates.year,'Production')        .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))        .drop("Production")        .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type:
1 Answer
PySpark SQL data types are no longer singletons (that was the case before 1.3). You have to create an instance.
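A minimal sketch reusing the joint DataFrame and the Production column from the question, with only the cast changed:

from pyspark.sql.types import IntegerType

# IntegerType() with parentheses builds an instance of the type
joint.withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType()))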
as opposed to passing the class itself:
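# the class itself is passed here, not an instance, which is what raises the TypeError above
df_data_3.Production.cast(IntegerType)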
The cast method can also be used with a string description of the type. For an overview of the data types supported in Spark SQL and DataFrames, you can follow this link.
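A sketch of the same cast using the string form, again with the column from the question:

# "integer" names IntegerType; "int" also works
joint.withColumn("ProductionTmp", df_data_3.Production.cast("integer"))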