I have a DataFrame df with the following format:
+----------------------+-----------------+----------------------------------------------------------------------------------------------+
|constraint            |constraint_status|constraint_msg                                                                                |
+----------------------+-----------------+----------------------------------------------------------------------------------------------+
|CompletenessConstraint|Success          |Value: 1.0 Notnull condition should be satisfied                                              |
|UniquenessConstraint  |Success          |Value: 1.0 Uniqueness condition should be satisfied                                           |
|PatternMatchConstraint|Failure          |Expected type of column CHD_ACCOUNT_NUMBER to be StringType                                   |
|MinimumConstraint     |Success          |Value: 5.1210650000005 Minimum value should be greater than 10.000000                         |
|HistogramConstraint   |Failure          |Can't execute the assertion: key not found: 1242.0!Percentage should be greater than 10.000000|
+----------------------+-----------------+----------------------------------------------------------------------------------------------+
I want to extract the numeric value that follows the Value: string and put it in a new column named Value.
Expected output:
+----------------------+-----------------+----------------------------------------------------------------------------------------------+---------------+
|constraint            |constraint_status|constraint_msg                                                                                |Value          |
+----------------------+-----------------+----------------------------------------------------------------------------------------------+---------------+
|CompletenessConstraint|Success          |Value: 1.0 Notnull condition should be satisfied                                              |1.0            |
|UniquenessConstraint  |Success          |Value: 1.0 Uniqueness condition should be satisfied                                           |1.0            |
|PatternMatchConstraint|Failure          |Expected type of column CHD_ACCOUNT_NUMBER to be StringType                                   |null           |
|MinimumConstraint     |Success          |Value: 5.1210650000005 Minimum value should be greater than 10.000000                         |5.1210650000005|
|HistogramConstraint   |Failure          |Can't execute the assertion: key not found: 1242.0!Percentage should be greater than 10.000000|null           |
+----------------------+-----------------+----------------------------------------------------------------------------------------------+---------------+
I tried the following code:
df = df.withColumn("Value",split(df("constraint_msg"), "Value\\: (\\d+)").getItem(0))
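But it fails with an error. Any help is appreciated!
org.apache.spark.sql.AnalysisException: cannot resolve 'split(`constraint_msg`, 'Value\: (\d+)')' due to data type mismatch: argument 1 requires string type, however, '`constraint_msg`' is of array type.;;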
2 Answers
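1zmg4dgp #1
when..otherwise will help you first filter out the rows that do not contain Value:. Assuming the constraint message always starts with Value:, I pick the second element after splitting as the desired value.
k4ymrczo #2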
Check the following code.
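A minimal sketch of the two approaches described in the answers, not necessarily the code the answerers posted. It assumes constraint_msg is a plain string column; the error in the question reports it as an array type, in which case it would first have to be turned into a string, for example with concat_ws(" ", col("constraint_msg")). The object name ExtractValue, the local SparkSession, the three sample rows, and the regex pattern are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, regexp_extract, split, when}

// Hypothetical wrapper object so the sketch can be run on its own.
object ExtractValue {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("extract-value")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question; constraint_msg is assumed to be a string.
    val df = Seq(
      ("CompletenessConstraint", "Success",
        "Value: 1.0 Notnull condition should be satisfied"),
      ("PatternMatchConstraint", "Failure",
        "Expected type of column CHD_ACCOUNT_NUMBER to be StringType"),
      ("MinimumConstraint", "Success",
        "Value: 5.1210650000005 Minimum value should be greater than 10.000000")
    ).toDF("constraint", "constraint_status", "constraint_msg")

    // Approach 1: when..otherwise plus split.
    // Rows that start with "Value:" are split on spaces and the second
    // token (index 1) is kept; every other row gets null.
    val viaSplit = df.withColumn(
      "Value",
      when(col("constraint_msg").startsWith("Value:"),
        split(col("constraint_msg"), " ").getItem(1))
        .otherwise(lit(null).cast("string"))
    )

    // Approach 2: regexp_extract with a capture group for the number.
    // regexp_extract yields an empty string when the pattern does not match,
    // so the when without otherwise maps those rows to null.
    val pattern = "^Value: (\\d+(?:\\.\\d+)?)"
    val extracted = regexp_extract(col("constraint_msg"), pattern, 1)
    val viaRegex = df.withColumn("Value", when(extracted =!= "", extracted))

    viaSplit.show(false)
    viaRegex.show(false)

    spark.stop()
  }
}
```

Both variants leave Value as a string column; appending .cast("double") to the expression would be one way to get a numeric column if that is what is needed.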