I am reading a CSV file into a Spark DataFrame. The CSV has blank spaces " " in many columns, and I want to remove them. The CSV has 500 columns, so I cannot list the columns manually in the code.
Sample data:
```
ADVANCE_TYPE CHNG_DT  BU_IN
A            20190718 1
             20190728 2
             20190714
B            20190705
             20190724 4
```
Code:
```python
from pyspark.sql.functions import col, when, regexp_replace, trim

df_csv = spark.read.options(header='true').options(delimiter=',') \
    .options(inferSchema='true').options(nullValue="None").csv("test41.csv")

for col_name in df_csv.columns:
    df_csv = df_csv.select(trim(col(col_name)))
```
But this code does not remove the blank spaces. Please help!
1 Answer
You can use a list comprehension to apply trim to all the required columns.
Example:
```python
from pyspark.sql.functions import col, length, trim

# Leading/trailing spaces in the sample tuple were collapsed in the page
# rendering; the values below are reconstructed to match the lengths shown.
df = spark.createDataFrame([("   ", "12343", "   ", "9  ", "   0")])

# Find the length of each column
expr = [length(col(col_name)).name('length' + col_name) for col_name in df.columns]
df.select(expr).show()
#+--------+--------+--------+--------+--------+
#|length_1|length_2|length_3|length_4|length_5|
#+--------+--------+--------+--------+--------+
#|       3|       5|       3|       3|       4|
#+--------+--------+--------+--------+--------+

# Apply trim to all the df columns
expr = [trim(col(col_name)).name(col_name) for col_name in df.columns]
df1 = df.select(expr)
df1.show()
#+---+-----+---+---+---+
#| _1|   _2| _3| _4| _5|
#+---+-----+---+---+---+
#|   |12343|   |  9|  0|
#+---+-----+---+---+---+

# Check the length of the df1 columns again
expr = [length(col(col_name)).name('length' + col_name) for col_name in df1.columns]
df1.select(expr).show()
#+--------+--------+--------+--------+--------+
#|length_1|length_2|length_3|length_4|length_5|
#+--------+--------+--------+--------+--------+
#|       0|       5|       0|       1|       1|
#+--------+--------+--------+--------+--------+
```