I am trying to remove the special characters from each word in an RDD:
special_characters = '~!@#$%^&*()_+-=[]{};:,<.>/?'
def remove_special_characters(word):
    for character in special_characters[0: len(special_characters)]:
        word = word.replace(character, '')
        return word
words = lines.flatMap(lambda line: line.split(" "))
words_lower = words.map(lambda word: word.lower())
clean_words_1 = words_lower.map(lambda word: remove_special_characters(word))
clean_words_2 = words_lower.map(remove_special_characters)
Only the first special character is replaced in each word.
1 Answer
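The return needs to be outside the for loop. With the return inside the loop, the function exits during the first iteration, so only the first character of special_characters (the ~) is ever removed from a word.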
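A minimal sketch of the corrected function, with the return moved out of the loop. It runs without Spark so the behavior can be checked on plain strings. The second function, remove_special_characters_translate, is a hypothetical alternative that is not part of the original question: it uses Python's built-in str.maketrans and str.translate to delete all listed characters in one pass.

```python
special_characters = '~!@#$%^&*()_+-=[]{};:,<.>/?'


def remove_special_characters(word):
    # Strip every character listed in special_characters from the word.
    for character in special_characters:
        word = word.replace(character, '')
    # Return only after the loop has processed all characters.
    return word


# Hypothetical alternative (not from the question): build a deletion
# table once and remove all listed characters in a single pass.
_delete_table = str.maketrans('', '', special_characters)


def remove_special_characters_translate(word):
    return word.translate(_delete_table)


print(remove_special_characters('~he!!o,'))  # prints "heo"
```

Either function can be passed to the RDD as in the question, for example words_lower.map(remove_special_characters).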