When I try to create a DeltaTable in AWS Glue using the delta.io library, I get this error:
Execution error : field item_size: ArrayType(DoubleType,true) can not accept object array([19. , 5. , 6.5]) in type <class 'numpy.ndarray'>
Here is some of my code:
df = wr.athena.read_sql_query(sql=query, database="db_name")
registration_schema = --more code--
StructField("item_size",DoubleType(),nullable=True)
--more code--
df_delta = spark.createDataFrame(df,schema=registration_schema)
df contains:
     item_id          item_size  ...              acquiredate       acquiredate_tz
74      3041    [7.0, 5.0, 5.0]  ...  2022-10-25 10:30:15.974  2022-10-25 12:30:15
152     3142   [19.0, 7.0, 5.0]  ...  2022-10-25 10:29:47.985  2022-10-25 12:29:47
154     2696  [31.0, 2.5, 10.0]  ...  2022-10-25 10:29:50.838  2022-10-25 12:29:50
158     2198   [22.1, 6.1, 6.1]  ...  2022-10-25 10:29:54.353  2022-10-25 12:29:54
251     2593  [4.0, 15.0, 20.0]  ...  2022-10-25 10:29:51.636  2022-10-25 12:29:51
df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12069 entries, 0 to 12068
Data columns (total 12 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   item_id    12068 non-null  Int64
 1   item_size  324 non-null    object
 2   box_size   178 non-null    object
How can I fix this?
1 Answer
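I'm no pandas expert, but it looks like you are trying to put an array into a scalar type.
So I think you need to change the item_size field from StructField("item_size",DoubleType(),nullable=True) to one declared with an array type, e.g. ArrayType(DoubleType()).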
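Note that the error message already reports the field as ArrayType(DoubleType,true) and still rejects the value, because the cells are numpy.ndarray objects rather than plain Python lists. A minimal sketch of one possible workaround follows, assuming the item_size cells are 1-D arrays and that missing cells are None or NaN. The helper names to_plain_list and to_spark_frame are hypothetical, and the schema covers only item_size because the rest of the original schema was not shown.

```python
from array import array  # stdlib stand-in for numpy.ndarray in the demo below


def to_plain_list(cell):
    """Return a plain list of Python floats for a 1-D array-like cell, else None."""
    if cell is None:
        return None
    if hasattr(cell, "tolist"):  # numpy.ndarray and array.array both provide tolist()
        return [float(x) for x in cell.tolist()]
    if isinstance(cell, (list, tuple)):
        return [float(x) for x in cell]
    return None  # assumption: anything else (e.g. NaN) marks a missing cell


def to_spark_frame(spark, df):
    """Sketch: build a Spark DataFrame for the item_size column only."""
    # Imported lazily so to_plain_list can be tried without Spark installed.
    from pyspark.sql.types import ArrayType, DoubleType, StructField, StructType

    schema = StructType([
        StructField("item_size", ArrayType(DoubleType()), nullable=True),
    ])
    converted = df[["item_size"]].copy()
    # Replace every ndarray cell with a list that ArrayType(DoubleType()) accepts.
    converted["item_size"] = converted["item_size"].map(to_plain_list)
    return spark.createDataFrame(converted, schema=schema)


print(to_plain_list(array("d", [19.0, 5.0, 6.5])))  # prints [19.0, 5.0, 6.5]
```

In the Glue job this would be called as to_spark_frame(spark, df) with the pandas DataFrame returned by wr.athena.read_sql_query. The box_size column is also of dtype object, so it may need the same conversion if it holds arrays too.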