I have a DataFrame with the schema below. I want all columns, including nested fields, to be sorted alphabetically, and I need to do this in Scala Spark.
root
|-- metadata2: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- attribute2: string (nullable = true)
| | |-- attribute1: string (nullable = true)
|-- metadata3: string (nullable = true)
|-- metadata1: struct (nullable = true)
| |-- attribute2: string (nullable = true)
| |-- attribute1: string (nullable = true)
When I sort with schema.sortBy(_.name), I get the schema below (the fields nested inside the array and struct types are not sorted):
root
|-- metadata1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- attribute2: string (nullable = true)
| | |-- attribute1: string (nullable = true)
|-- metadata2: struct (nullable = true)
| |-- attribute2: string (nullable = true)
| |-- attribute1: string (nullable = true)
|-- metadata3: string (nullable = true)
The schema I want is shown below (even the fields inside metadata1 (ArrayType) and metadata2 (StructType) should be sorted):
root
|-- metadata1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- attribute1: string (nullable = true)
| | |-- attribute2: string (nullable = true)
|-- metadata2: struct (nullable = true)
| |-- attribute1: string (nullable = true)
| |-- attribute2: string (nullable = true)
|-- metadata3: string (nullable = true)
Thanks in advance.
1 Answer
7lrncoxx #1
A version for StructType:
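What follows is a minimal sketch of one recursive approach, not necessarily the answerer's original code. It assumes Spark 3.0 or later (for `functions.transform` on array columns), and the names `SortNestedColumns`, `sortType`, `sortColumn` and `sortDataFrame` are hypothetical helpers introduced only for illustration. The columns are rebuilt field by field rather than cast to the sorted schema because, as far as I know, Spark casts one struct to another by position, which would relabel the fields instead of reordering them. Map types are left untouched in this sketch.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, struct, transform, when}
import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

object SortNestedColumns {

  // Recursively sort the fields of every struct by name,
  // descending into array element types as well.
  def sortType(dt: DataType): DataType = dt match {
    case s: StructType =>
      StructType(s.fields.sortBy(_.name).map(f => f.copy(dataType = sortType(f.dataType))))
    case a: ArrayType =>
      a.copy(elementType = sortType(a.elementType))
    case other => other
  }

  // Rebuild a column expression so that its nested fields are emitted in sorted order.
  def sortColumn(c: Column, dt: DataType): Column = dt match {
    case s: StructType =>
      val fields = s.fields.sortBy(_.name).map { f =>
        sortColumn(c.getField(f.name), f.dataType).as(f.name)
      }
      // Keep a null struct null instead of turning it into a struct of nulls.
      when(c.isNotNull, struct(fields: _*))
    case ArrayType(elementType, _) =>
      // Apply the same rewrite to every element of the array.
      transform(c, (x: Column) => sortColumn(x, elementType))
    case _ => c
  }

  // Sort the top-level columns and rewrite each one recursively.
  def sortDataFrame(df: DataFrame): DataFrame = {
    val cols = df.schema.fields.sortBy(_.name).map { f =>
      // Backticks guard against top-level names that contain dots.
      sortColumn(col(s"`${f.name}`"), f.dataType).as(f.name)
    }
    df.select(cols: _*)
  }
}
```

Usage would look like `val sorted = SortNestedColumns.sortDataFrame(df)` followed by `sorted.printSchema()`. Because of the `when` wrapper, rebuilt struct columns are reported as nullable even if the original was declared non-nullable.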