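I recently started learning Optimus and am trying to use it to clean tweets.
I loaded my tweets from a CSV file into a dataframe and then saved the data to Optimus.
I came across the following code for cleaning tweets and ran it in Jupyter: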
clean_tweets = df.cols.remove_accents("tweet") \
.cols.remove_special_chars("tweet")
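The two chained calls are meant to strip accents and then special characters from the `tweet` column. The following is a minimal plain-Python sketch of that per-string logic, independent of Optimus. It is not Optimus's implementation, and it assumes that "special characters" means anything other than ASCII letters, digits and whitespace; the helper names are hypothetical.

```python
import re
import unicodedata


def remove_accents(text: str) -> str:
    # NFKD decomposition splits "é" into "e" plus a combining accent;
    # dropping the combining marks leaves the base letter.
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def remove_special_chars(text: str) -> str:
    # Assumption: keep only ASCII letters, digits and whitespace.
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def clean_tweet(text: str) -> str:
    # Same order as the chained Optimus calls: accents first, then special chars.
    return remove_special_chars(remove_accents(text))


print(clean_tweet("Café déjà vu! #python @user"))
```

Running it prints `Cafe deja vu python user`. Note that under this assumption, non-Latin scripts would be removed entirely.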
Running that code gives me the following error message:
ValueError                                Traceback (most recent call last)
<ipython-input-32-156da3e90955> in <module>
----> 1 clean_tweets = df.cols.remove_accents("tweet") \
2 .cols.remove_special_chars("tweet")
~\AppData\Roaming\Python\Python36\site-packages\optimus\helpers\decorators.py in wrapper(*args, **kwargs)
47 def wrapper(*args, **kwargs):
48 start_time = timeit.default_timer()
---> 49 f = func(*args, **kwargs)
50 _time = round(timeit.default_timer() - start_time, 2)
51 if log_time:
~\AppData\Roaming\Python\Python36\site-packages\optimus\dataframe\columns.py in remove_accents(input_cols, output_cols)
954 return with_out_accents
955
--> 956 df = apply(input_cols, _remove_accents, "string", output_cols=output_cols, meta=Actions.REMOVE_ACCENTS.value)
957 return df
958
~\AppData\Roaming\Python\Python36\site-packages\optimus\dataframe\columns.py in apply(input_cols, func, func_return_type, args, func_type, when, filter_col_by_dtypes, output_cols, skip_output_cols_processing, meta)
240
241 for input_col, output_col in zip(input_cols, output_cols):
--> 242 df = df.withColumn(output_col, expr(when))
243 df = df.preserve_meta(self, meta, output_col)
244
~\AppData\Roaming\Python\Python36\site-packages\optimus\dataframe\columns.py in expr(_when)
232
233 def expr(_when):
--> 234 main_query = audf(input_col, func, func_return_type, args, func_type)
235 if when is not None:
236 # Use the data type to filter the query
~\AppData\Roaming\Python\Python36\site-packages\optimus\audf.py in abstract_udf(col, func, func_return_type, attrs, func_type)
30 types = ["column_exp", "udf", "pandas_udf"]
31 if func_type not in types:
---> 32 RaiseIt.value_error(func_type, types)
33
34 # It handle if func param is a plain expression or a function returning and expression
~\AppData\Roaming\Python\Python36\site-packages\optimus\helpers\raiseit.py in value_error(var, data_values)
76 type=divisor.join(map(
77 lambda x: "'" + x + "'",
---> 78 data_values)), var_type=one_list_to_val(var)))
79
80 @staticmethod
ValueError: 'func_type' must be 'column_exp', 'udf', 'pandas_udf', received 'None'
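Reading the traceback from the bottom up: at line 956 of `columns.py`, `remove_accents` calls `apply(...)` without passing `func_type`. That value appears to reach `abstract_udf` in `audf.py` as `None`, where it fails the membership check on lines 30 to 32. The sketch below reproduces only that guard, using a hypothetical helper name, to show why `None` produces exactly this message. It is a reconstruction from the traceback, not Optimus's source.

```python
# Allowed values, copied from line 30 of optimus/audf.py in the traceback.
ALLOWED_FUNC_TYPES = ["column_exp", "udf", "pandas_udf"]


def check_func_type(func_type):
    # Hypothetical stand-in for the guard in abstract_udf: anything outside
    # the allowed list (including None) raises ValueError.
    if func_type not in ALLOWED_FUNC_TYPES:
        allowed = ", ".join("'" + t + "'" for t in ALLOWED_FUNC_TYPES)
        raise ValueError(f"'func_type' must be {allowed}, received '{func_type}'")
    return func_type


try:
    check_func_type(None)
except ValueError as err:
    print(err)
```

If this reading is right, the failure comes from how the installed Optimus version calls `apply` internally, not from the two lines of calling code.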
How can I fix this error? Based on all the documentation and pages I have read, this step should not raise any error.