python - getting the max value of each specified column after converting the table headers into a column

798qvoo8 · posted 2021-05-27 in Spark

I need a pointer/clue on the problem statement below.

Problem statement: I need to convert all the table headers into a column (col_name) and get the max value of each of those columns. I tried the logic below but got stuck; any suggestions/ideas would be helpful.


```
from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import col, lit, max

df = sc.parallelize([
    Row(name='Alice', age=5, height=80),
    Row(name='Mujkesh', age=10, height=90),
    Row(name='Ganesh', age=15, height=100)]).toDF().createOrReplaceTempView("Test")
df3 = spark.sql("Describe Test")
# stuck here: df3 only has col_name/data_type/comment columns,
# so there is no "age" column to aggregate
df4 = df3.withColumn("Max_val", max(col("age"))).show()
```

Given input:

```
+---+------+-------+
|age|height|   name|
+---+------+-------+
|  5|    80|  Alice|
| 10|    90|Mujkesh|
| 15|   100| Ganesh|
+---+------+-------+
```

Expected output:

```
+--------+---------+-------+-------+
|col_name|data_type|comment|Max_val|
+--------+---------+-------+-------+
|     age|   bigint|   null|     15|
|  height|   bigint|   null|    100|
|    name|   string|   null|   null|
+--------+---------+-------+-------+
```
sczxawaw 1#

Try the `stack` function, then group by to get the max of each group, and join the result back to the `desc` DataFrame.

Example:

```
from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import *

df = sc.parallelize([
    Row(name='Alice', age=5, height=80),
    Row(name='Mujkesh', age=10, height=90),
    Row(name='Ganesh', age=15, height=100)]).toDF()
df.createOrReplaceTempView("Test")
df3 = spark.sql("desc Test")

# unpivot the three columns into (col_name, data) rows; name is cast to
# bigint so every data value shares one type
df4 = df.selectExpr("stack(3, 'name', bigint(name), 'age', age, 'height', height) as (col_name, data)") \
        .groupBy(col("col_name")) \
        .agg(max(col("data")).alias("Max_val"))

# attach the per-column max to the describe output
df5 = df3.join(df4, ['col_name'], 'inner').orderBy("col_name")
df5.show()
```
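The `bigint(name)` cast inside the stack is what makes the string column come out as null in Max_val: casting a non-numeric string to bigint yields null, while age and height pass through unchanged. Joining on col_name then lines the maxima up with the `desc Test` output: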

```
+--------+---------+-------+-------+
|col_name|data_type|comment|Max_val|
+--------+---------+-------+-------+
|     age|   bigint|   null|     15|
|  height|   bigint|   null|    100|
|    name|   string|   null|   null|
+--------+---------+-------+-------+
```
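As an alternative sketch (not part of the original answer), assuming the same `df` and `df3` as above, the per-column maxima can also be computed in a single `agg` pass over just the numeric columns and left-joined, so non-numeric columns such as name naturally end up null:

```
from pyspark.sql.functions import max as max_

# numeric columns only; string columns are left out so they join as null
numeric_cols = [f.name for f in df.schema.fields
                if f.dataType.typeName() in ("long", "integer", "double")]

# single aggregate pass: one max per numeric column
agg_row = df.agg(*[max_(c).alias(c) for c in numeric_cols]).first()

# rebuild (col_name, Max_val) pairs and left-join to the describe output
max_df = spark.createDataFrame([(c, agg_row[c]) for c in numeric_cols],
                               ["col_name", "Max_val"])
df3.join(max_df, ["col_name"], "left").orderBy("col_name").show()
```

This produces the same table as above while keeping Max_val as a bigint, at the cost of hard-coding which type names count as numeric.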
