I have a DataFrame with the following data and schema:
df.show()
circleCenter  circleInfo  biggerCircleCenter  biggerCircleInfo  smallCircleCenter  smallCircleInfo
5.3           [4.2, 1.1]  2.3                 [5.3, 3.1]        6.3                [44.2, 1.1]
...           ...         ...                 ...               ...                ...
df.printSchema()
root
|-- circleCenter: double (nullable = true)
|-- circleInfo: array (nullable = true)
| |-- element: double (containsNull = true)
|-- biggerCircleCenter: double (nullable = true)
|-- biggerCircleInfo: array (nullable = true)
| |-- element: double (containsNull = true)
|-- smallCircleCenter: double (nullable = true)
|-- smallCircleInfo: array (nullable = true)
| |-- element: double (containsNull = true)
I want to create a new column that contains the following structure:
{
name: 'circle'
center: 5.3
info: [4.2, 1.1]
}
{
name: 'biggerCircle'
center: 2.3
info: [5.3, 3.1]
}
{
name: 'smallCircle'
center: 6.3
info: [44.2, 1.1]
}
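Basically, I want to split each column name in two: the first part (circle, biggerCircle, smallCircle) becomes the name, and the remaining part (Center, Info) becomes the key for that column's value, i.e. center and info.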
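A minimal sketch of just the name-splitting step in plain Python, assuming every relevant column ends in exactly `Center` or `Info`. The helpers `split_column_name` and `group_columns` are hypothetical names for illustration; they only work on the list of column names (as returned by `df.columns`), not on the data.

```python
from collections import defaultdict

# Assumption: these are the only suffixes used by the relevant columns.
SUFFIXES = ("Center", "Info")


def split_column_name(column):
    """Split e.g. 'biggerCircleInfo' into ('biggerCircle', 'info').

    Returns None if the column does not end in one of SUFFIXES
    (or consists of the suffix alone).
    """
    for suffix in SUFFIXES:
        if column.endswith(suffix) and len(column) > len(suffix):
            return column[: -len(suffix)], suffix.lower()
    return None


def group_columns(columns):
    """Map each name prefix to {key: original column name}, keeping column order."""
    groups = defaultdict(dict)
    for column in columns:
        parts = split_column_name(column)
        if parts is not None:
            prefix, key = parts
            groups[prefix][key] = column
    return dict(groups)


columns = ["circleCenter", "circleInfo", "biggerCircleCenter",
           "biggerCircleInfo", "smallCircleCenter", "smallCircleInfo"]
print(group_columns(columns))
```

Each prefix then maps to the two source columns that would fill its `center` and `info` fields.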
So far I have tried the following, without success:
from itertools import chain
from pyspark.sql import functions as sql_fn
from pyspark.sql.functions import col, lit

relevant_names = ['circlecenter', 'biggercirclecenter', 'smallcirclecenter']
final_df = sql_fn.create_map(list(chain(*(
    (lit('name'),
     col(name.partition(name)),
     col(name.partition(name)[2] + 'Center'),
     col(name.partition(name)[2] + 'Info')) for name in
    df.columns if name in relevant_names
)))).alias("circleInformation")
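One possible direction, sketched under the assumption that the goal is a single `array<struct<name, center, info>>` column: build one `struct` per name prefix and wrap the structs in `array`. `create_map` is a poor fit here because all values of a MapType column must share one type, and `center` (double) and `info` (array<double>) do not. The helpers `circle_prefixes` and `circle_info_column` are hypothetical names, and the sketch has not been run against real data.

```python
def circle_prefixes(columns):
    """Return prefixes such as 'circle' or 'biggerCircle' that have both a
    <prefix>Center and a <prefix>Info column, in the order they appear."""
    present = set(columns)
    prefixes = []
    for column in columns:
        if column.endswith("Center"):
            prefix = column[: -len("Center")]
            if prefix and prefix + "Info" in present and prefix not in prefixes:
                prefixes.append(prefix)
    return prefixes


def circle_info_column(columns):
    """Build an array<struct<name, center, info>> Column from the paired columns."""
    # Imported here so circle_prefixes can be used without PySpark installed.
    from pyspark.sql import functions as F

    return F.array(*[
        F.struct(
            F.lit(prefix).alias("name"),              # e.g. 'biggerCircle'
            F.col(prefix + "Center").alias("center"),  # double
            F.col(prefix + "Info").alias("info"),      # array<double>
        )
        for prefix in circle_prefixes(columns)
    ])


# Usage (requires an existing DataFrame `df` with the schema above):
# df = df.withColumn("circleInformation", circle_info_column(df.columns))
```

Using `struct` instead of a map lets each field keep its own type, which is what the desired output shows.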