How do I remove specific values in Spark?

o8x7eapl  posted on 2021-05-27  in  Spark

I have already tried dropping the values, but they are still there. As a workaround, I created another DataFrame like this:

df_trans_new = df_transactional.filter("Quantity>=0")
df_trans_new.show()

But I want to remove the negative entries from that column. Thanks.
Python:

# Parentheses let the chained reader options span several lines
df_transactional = (
    spark.read.option("sep", ",")
    .option("inferSchema", "true")
    .option("header", "true")
    .csv("dbfs:/FileStore/tables/transactional_dataset.csv")
)
df_trans_new = df_transactional.filter("Quantity>=0")
df_trans_new.show()

+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
|InvoiceNo|StockCode|         Description|Quantity|   InvoiceDate|UnitPrice|CustomerID|       Country|
+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
|   536365|   85123A|WHITE HANGING HEA...|       6|12/1/2010 8:26|     2.55|     17850|United Kingdom|
|   536365|    71053| WHITE METAL LANTERN|       6|12/1/2010 8:26|     3.39|     17850|United Kingdom|
|   536365|   84406B|CREAM CUPID HEART...|       8|12/1/2010 8:26|     2.75|     17850|United Kingdom|
|   536365|   84029G|KNITTED UNION FLA...|       6|12/1/2010 8:26|     3.39|     17850|United Kingdom|
|   536365|   84029E|RED WOOLLY HOTTIE...|       6|12/1/2010 8:26|     3.39|     17850|United Kingdom|
|   536365|    22752|SET 7 BABUSHKA NE...|      -2|12/1/2010 8:26|     7.65|

I need to remove all the negative numbers from the Quantity column.

1tuwyuhd #1

I suspect you are using Python. I tried it, and it works in both PySpark and Scala:
Python:

df_transactional = spark.createDataFrame([("a", -1), ("b", 1), ("c", 0)], ["Name", "Quantity"])
df_trans_new = df_transactional.filter("Quantity>=0")
df_trans_new.show()

Scala:

val df_transactional = Seq(("a", -1), ("b", 1), ("c", 0)).toDF("Name", "Quantity")
val df_trans_new = df_transactional.filter("Quantity>=0") 
df_trans_new.show()

Both produce the same result:

+----+--------+
|Name|Quantity|
+----+--------+
|   b|       1|
|   c|       0|
+----+--------+

niwlg2el #2

I tried it in Scala with your data (it works the same way in Python), and it works fine:

import spark.implicits._ // needed for .toDS(); spark-shell imports it automatically

// Sample rows copied from the question, pipe-delimited
val data1 =
  """
    |InvoiceNo|StockCode|         Description|Quantity|   InvoiceDate|UnitPrice|CustomerID|       Country
    |   536365|   85123A|WHITE HANGING HEA...|       6|12/1/2010 8:26|     2.55|     17850|United Kingdom
    |   536365|    71053| WHITE METAL LANTERN|       6|12/1/2010 8:26|     3.39|     17850|United Kingdom
    |   536365|   84406B|CREAM CUPID HEART...|       8|12/1/2010 8:26|     2.75|     17850|United Kingdom
    |   536365|   84029G|KNITTED UNION FLA...|       6|12/1/2010 8:26|     3.39|     17850|United Kingdom
    |   536365|   84029E|RED WOOLLY HOTTIE...|       6|12/1/2010 8:26|     3.39|     17850|United Kingdom
    |   536365|    22752|SET 7 BABUSHKA NE...|      -2|12/1/2010 8:26|     7.65|     17850|United Kingdom
  """.stripMargin

// Turn each pipe-delimited line into a trimmed, comma-separated CSV line
val stringDS = data1.split(System.lineSeparator())
  .map(_.split("\\|").map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString(","))
  .toSeq.toDS()
val df = spark.read
  .option("sep", ",")
  .option("inferSchema", "true")
  .option("header", "true")
  .csv(stringDS)
df.show(false)
df.printSchema()

// Keep only the rows with a non-negative Quantity
df.filter("Quantity>=0").show(false)

Output:

+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
|InvoiceNo|StockCode|Description         |Quantity|InvoiceDate   |UnitPrice|CustomerID|Country       |
+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
|536365   |85123A   |WHITE HANGING HEA...|6       |12/1/2010 8:26|2.55     |17850     |United Kingdom|
|536365   |71053    |WHITE METAL LANTERN |6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
|536365   |84406B   |CREAM CUPID HEART...|8       |12/1/2010 8:26|2.75     |17850     |United Kingdom|
|536365   |84029G   |KNITTED UNION FLA...|6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
|536365   |84029E   |RED WOOLLY HOTTIE...|6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
|536365   |22752    |SET 7 BABUSHKA NE...|-2      |12/1/2010 8:26|7.65     |17850     |United Kingdom|
+---------+---------+--------------------+--------+--------------+---------+----------+--------------+

root
 |-- InvoiceNo: integer (nullable = true)
 |-- StockCode: string (nullable = true)
 |-- Description: string (nullable = true)
 |-- Quantity: integer (nullable = true)
 |-- InvoiceDate: string (nullable = true)
 |-- UnitPrice: double (nullable = true)
 |-- CustomerID: integer (nullable = true)
 |-- Country: string (nullable = true)

+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
|InvoiceNo|StockCode|Description         |Quantity|InvoiceDate   |UnitPrice|CustomerID|Country       |
+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
|536365   |85123A   |WHITE HANGING HEA...|6       |12/1/2010 8:26|2.55     |17850     |United Kingdom|
|536365   |71053    |WHITE METAL LANTERN |6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
|536365   |84406B   |CREAM CUPID HEART...|8       |12/1/2010 8:26|2.75     |17850     |United Kingdom|
|536365   |84029G   |KNITTED UNION FLA...|6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
|536365   |84029E   |RED WOOLLY HOTTIE...|6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
+---------+---------+--------------------+--------+--------------+---------+----------+--------------+
