PySpark: create testing data with string type

Posted by 3df52oht on 2021-07-13 in Spark

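I am trying to create a test DataFrame with one column of int type and another of string type. The expected output is shown below. I think we can start with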

data = spark.range(1, 6)  # ids 1..5; the upper bound is exclusive
output = data.withColumnRenamed('id', 'myid')

How do we handle the string column? Any help is much appreciated!
Expected output:

id      ordernum
1       0032
2       0033
3       0034
4       0035
5       0036

python DataFrame Dataset apache-spark pyspark

Source: https://stackoverflow.com/questions/66329474/pyspark-create-testing-data-with-string-type

2 Answers

xeufq47z1#

You can create a Spark DataFrame from a list of lists. For example:

data = [[i, '%04d' % (i+31)] for i in range(1,6)]

# [[1, '0032'], [2, '0033'], [3, '0034'], [4, '0035'], [5, '0036']]

df = spark.createDataFrame(data, ['id', 'ordernum'])
df.show()
+---+--------+
| id|ordernum|
+---+--------+
|  1|    0032|
|  2|    0033|
|  3|    0034|
|  4|    0035|
|  5|    0036|
+---+--------+
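
To make sure both columns get the intended types instead of relying on type inference, the same rows can be passed together with an explicit schema. The following is a minimal sketch and not part of the original answer: the helper names `make_rows` and `make_test_df` and their parameters are hypothetical, and `make_test_df` assumes an existing `SparkSession` is passed in.

```python
def make_rows(start, stop, offset=31, width=4):
    """Build (id, ordernum) tuples; ordernum is a zero-padded string."""
    return [(i, f"{i + offset:0{width}d}") for i in range(start, stop)]


def make_test_df(spark, start=1, stop=6):
    """Create the test DataFrame with an explicit schema (hypothetical helper)."""
    # Imported here so make_rows stays usable without PySpark installed.
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    schema = StructType([
        StructField("id", LongType(), nullable=False),
        StructField("ordernum", StringType(), nullable=False),
    ])
    return spark.createDataFrame(make_rows(start, stop), schema)
```

With an active session, `make_test_df(spark).printSchema()` should report `id` as long and `ordernum` as string.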

If you prefer spark.range, you can use format_string:

import pyspark.sql.functions as F
df = spark.range(1, 6).withColumn(
    'ordernum',
    F.format_string('%04d', F.col('id') + 31)
)

df.show()
+---+--------+
| id|ordernum|
+---+--------+
|  1|    0032|
|  2|    0033|
|  3|    0034|
|  4|    0035|
|  5|    0036|
+---+--------+

Posted 2021-07-13

n3ipq98p2#

You can use the lpad function to create the ordernum column from id + 31, left-padded with zeros to get a 4-digit number as a string:

from pyspark.sql import functions as F

# Left-pad (id + 31) with '0' to a width of 4 characters
output = spark.range(1, 6).withColumn("ordernum", F.lpad(F.col("id") + 31, 4, '0'))

output.show()
# +---+--------+
# | id|ordernum|
# +---+--------+
# |  1|    0032|
# |  2|    0033|
# |  3|    0034|
# |  4|    0035|
# |  5|    0036|
# +---+--------+
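
One difference between the two answers is worth knowing when the test data grows: Spark's lpad shortens a value that is longer than the target length, while a `%04d` format only sets a minimum width. The sketch below imitates both behaviors in plain Python so it runs without a Spark session. `lpad_like` is a hypothetical helper written for illustration, not a PySpark function, and it only handles a single-character pad.

```python
def lpad_like(value, length, pad="0"):
    """Imitate lpad for a single-character pad (hypothetical helper)."""
    s = str(value)
    if len(s) >= length:
        # Like lpad, cut the string down to `length` characters.
        return s[:length]
    return s.rjust(length, pad)


def format_like(value, width=4):
    """Imitate a '%04d'-style format: width is a minimum, nothing is cut."""
    return "%0*d" % (width, value)


for n in (32, 12345):
    print(n, lpad_like(n, 4), format_like(n))
```

For values that fit in 4 digits both approaches agree. For larger values the lpad-style result loses digits, so the format-based approach is the safer choice if order numbers may exceed the padded width.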

Posted 2021-07-13