Python: cannot keep my datetime data and the word "no" in my Pandas DataFrame

Posted by rsl1atfo on 2022-12-02 in Python

I have a Pandas DataFrame loaded from a CSV file, and I want to clean it with regular expressions in Python. The data looks like this:
| Name | Date | Status | Number |
|---|---|---|---|
| A/b c def | 2022-07-11 | yes | IO123-07 |
| G/h i jkl | 2022-07-12 | no | IO456-08 |
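I am trying to clean the DataFrame so that it is easier to work with, but my code removes the dates, the word 'no', and the hyphens.
This is the data I currently get: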
| Name | Date | Status | Number |
|---|---|---|---|
| abc def | | yes | io |
| ghi jkl | | no | io |
This is the code I found on the internet and tried on my DataFrame:

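import re
import string

import nltk
from nltk.corpus import stopwords

def regex_values(cols):
    nltk.download("stopwords")
    stemmer = nltk.SnowballStemmer('english')  # created but never used
    stopword = set(stopwords.words('english'))  # 'no' is in this set

    cols = str(cols).lower()
    cols = re.sub(r'\[.*?\]', '', cols)  # drop [bracketed] text
    cols = re.sub(r'https?://\S+|www\.\S+', '', cols)  # drop URLs
    cols = re.sub(r'<.*?>+/', '', cols)  # drop '<...>' followed by '/'
    cols = re.sub('[%s]' % re.escape(string.punctuation), '', cols)  # strips ALL punctuation, including '-' and '/'
    cols = re.sub('\n', '', cols)
    cols = re.sub(r'\w*\d\w*', '', cols)  # deletes every token that contains a digit (dates, IDs)
    cols = re.sub(r'^\s+|\s+$', '', cols)  # trim leading/trailing whitespace
    cols = re.sub(' +', ' ', cols)  # collapse runs of spaces
    cols = re.sub(r'\b(\w+)(?:\W\1\b)+', r'\1', cols, flags=re.IGNORECASE)  # collapse immediately repeated words
    cols = [word for word in cols.split(' ') if word not in stopword]  # drops stop words such as 'no'
    cols = " ".join(cols)

    return cols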

This is the Pandas DataFrame I would like to end up with:
| Name | Date | Status | Number |
|---|---|---|---|
| abc def | 2022-07-11 | yes | io123-07 |
| ghi jkl | 2022-07-12 | no | io456-08 |
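I am new to regex, so I would appreciate any help writing the correct code. If there is a simpler way to clean my data, I would be grateful for that as well. Thanks in advance.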
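
For what it's worth, two of the substitutions in `regex_values` seem enough to account for the missing dates and hyphens: stripping `string.punctuation` turns `2022-07-11` into `20220711`, and the `\w*\d\w*` pattern then deletes any token that contains a digit. The word 'no' is on NLTK's English stop-word list, so the final filter drops it as well. A minimal sketch isolating the first two steps (the helper name `strip_punct_and_digit_tokens` is made up for illustration and is not part of the original code):

```python
import re
import string


def strip_punct_and_digit_tokens(text):
    """Apply only the punctuation and digit-token substitutions from regex_values."""
    # Step 1: remove every punctuation character, including '-' and '/'.
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    # Step 2: remove every run of word characters that contains a digit.
    return re.sub(r'\w*\d\w*', '', text)


for value in ["2022-07-11", "io123-07", "yes"]:
    # repr() makes empty strings visible in the printed result.
    print(value, "->", repr(strip_punct_and_digit_tokens(value)))
```

Values made only of letters pass through these two steps unchanged, which suggests the function is simply too aggressive to apply to every column.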

Source: https://stackoverflow.com/questions/74636944/cannot-keep-my-datetime-data-and-no-word-in-my-pandas-dataframe

1 Answer

olqngx591#

Can you try this:

df = df.applymap(lambda s: s.lower() if isinstance(s, str) else s)  # lowercase string values (renamed DataFrame.map in pandas 2.1+)
df.columns = df.columns.str.lower()  # lowercase the column names
df['name'] = df['name'].str.replace(r'\W+', '', regex=True)  # remove non-word characters; regex=True is needed in pandas 2.0+

# output
'''
     name        date status    number
0  abcdef  2022-07-11    yes  io123-07
1  ghijkl  2022-07-12     no  io456-08
'''
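
As a possible variant of the snippet above, here is a self-contained sketch that lowercases every column but strips non-word characters only from `name`, so the dates, the word 'no', and the hyphens in `number` survive. The sample rows are assumptions reconstructed from the output shown above, and `clean_frame` is a hypothetical helper name; parsing `date` with `pd.to_datetime` is an optional extra step, assuming the dates are in `YYYY-MM-DD` form.

```python
import pandas as pd

# Hypothetical sample rows, reconstructed from the output shown above.
df = pd.DataFrame({
    "Name": ["A/b c def", "G/h i jkl"],
    "Date": ["2022-07-11", "2022-07-12"],
    "Status": ["yes", "no"],
    "Number": ["IO123-07", "IO456-08"],
})


def clean_frame(frame):
    """Return a cleaned copy; only the 'name' column loses non-word characters."""
    out = frame.copy()
    out.columns = out.columns.str.lower()
    # Lowercase string cells one column at a time with Series.map.
    for col in out.columns:
        out[col] = out[col].map(lambda s: s.lower() if isinstance(s, str) else s)
    # Remove anything that is not a letter, digit, or underscore from the names.
    out["name"] = out["name"].str.replace(r"\W+", "", regex=True)
    # Optional: convert the date strings to real datetime values.
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d")
    return out


cleaned = clean_frame(df)
print(cleaned)
```

Cleaning column by column keeps each rule local to the data it was meant for, rather than running one text-normalisation function over the whole frame.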

Answered 2022-12-02
