pandas: replace all strings in a DataFrame using a for loop

Posted by 2lpgd968 on 2023-02-02 in Other


| Price |
| ------ |
| $ C145 |
| + $22.34 |
| From USA |

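Hi, above is my df. I basically want to remove all the special characters ($, + and spaces) and then move the values so they look like the table below, as numeric values, so that I can process them into a CSV and analyze prices across more than 50 rows.
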
| Price | Shipping | Origin |
| ------ | ------ | ------ |
| 145 | 22.34 | From USA |

I was thinking maybe I need to put it in a DataFrame and then shift the values over with iloc?

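import pandas as pd

Apples=["$ 145", "+ 22.34", "From USA"]
df=pd.DataFrame({'Price': Apples})

# Attempt: split alternate rows into two columns (a stride of 2 cannot separate three fields)
new_df=pd.DataFrame({'Price':df['Price'].iloc[::2].values, 'Shipping':df['Price'].iloc[1::2].values})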

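At this point it picks up the price and shipping, but it only moves them into two columns, and I need it done for 3. How can I get the data into columns like the new table above, and also strip all the non-numeric text from the "Price" and "Shipping" columns, maybe with something like the line below, but replicated for both the Price and Shipping columns? Thanks for any help, I'm really just getting started with Pandas and Python!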

new_df['Price']=new_df.Price.str.extract(r'(\d+[.\d]*)')
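A minimal sketch of the "replicate it for Price and Shipping" idea, assuming the data has already been reshaped into one row per record (the frame `new_df` and its values below are illustrative, taken from the tables above). A plain `for` loop applies the same `str.extract` pattern to each numeric column, and `pd.to_numeric` converts the extracted strings:

```python
import pandas as pd

# Illustrative, already-reshaped frame: one record per row
new_df = pd.DataFrame({
    "Price": ["$ C145"],
    "Shipping": ["+ $22.34"],
    "Origin": ["From USA"],
})

# Loop over the numeric columns and keep only the first number in each cell.
# expand=False makes str.extract return a Series instead of a one-column DataFrame.
for col in ["Price", "Shipping"]:
    extracted = new_df[col].str.extract(r"(\d+[.\d]*)", expand=False)
    new_df[col] = pd.to_numeric(extracted)

print(new_df)
```

`pd.to_numeric` picks an integer dtype for whole numbers such as 145 and a float dtype for values like 22.34, so the Origin column is the only one left as text.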

pandas

Source: https://stackoverflow.com/questions/75279858/replace-all-strings-in-a-dataframe-using-for-loop

3 Answers


xxe27gdn1#

Changing the orientation from vertical to horizontal can be done with a transpose:

df = df.T
df.columns = ['price', 'shipping', 'origin']
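To see what the transpose does, here is a small sketch using the question's sample values (the `$ C145` / `+ $22.34` strings are assumed from the multi-column example further down this answer). It also illustrates a limitation: `.T` only yields one record per row when each record starts out as its own column.

```python
import pandas as pd

# One record stored vertically, as in the question
df = pd.DataFrame({"Price": ["$ C145", "+ $22.34", "From USA"]})

# .T turns the 3 rows into 3 columns of a single row
wide = df.T
wide.columns = ["price", "shipping", "origin"]
print(wide)

# With two records stacked in one column, .T gives one row of six cells,
# so transposing alone only fits data where each record is its own column
stacked = pd.DataFrame({"Price": ["$ C145", "+ $22.34", "From USA",
                                  "$ C197", "+ $18.46", "From Canada"]})
print(stacked.T.shape)
```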

Then you can process each column as needed to convert the strings to int or float:

import re  # regular expressions

# Replace anything not a digit ('\D') with an empty string, then convert to 
# int
df.price = df.price.apply(lambda x: int(re.sub(r'\D', '', x)))

# Replace anything not a digit or decimal ('[^0-9.]') with an empty string, 
# then convert to float
df.shipping = df.shipping.apply(
    lambda x: float(re.sub(r'[^0-9.]', '', x)))
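The same cleaning can also be written without `apply` and `re`, using pandas' vectorized `.str` string methods. A sketch, assuming the frame has already been transposed and renamed as above (the two example rows are illustrative):

```python
import pandas as pd

# Frame already in one-row-per-record shape (column names from the answer)
df = pd.DataFrame({
    "price": ["$ C145", "$ C197"],
    "shipping": ["+ $22.34", "+ $18.46"],
    "origin": ["From USA", "From Canada"],
})

# Same regexes as the answer, applied column-wide via the .str accessor
df["price"] = df["price"].str.replace(r"\D", "", regex=True).astype(int)
df["shipping"] = df["shipping"].str.replace(r"[^0-9.]", "", regex=True).astype(float)

print(df)
```

The results match the `apply` version; the vectorized form just avoids calling a Python lambda once per cell.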

# put it all in a single function:
def convert_df(df):
    df = df.T
    df.columns = ['price', 'shipping', 'origin']
    df.price = df.price.apply(lambda x: int(re.sub(r'\D', '', x)))
    df.shipping = df.shipping.apply(
        lambda x: float(re.sub(r'[^0-9.]', '', x)))
    return df

This works even if your initial input contains multiple columns, for example:

apples = ['$ C145', '+ $22.34', 'From USA']
corn = ['$ C197', '+ $18.46', 'From Canada']
df2 = pd.DataFrame({'apples': apples, 'corn': corn})

convert_df(df2)
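A self-contained run of the function on that two-column input, as a sketch. After the transpose, the original column names (`apples`, `corn`) become the row labels; moving them into a regular column keeps them when the result is saved to CSV. The column name `product` is a hypothetical choice:

```python
import re
import pandas as pd

def convert_df(df):
    # Same steps as the answer: transpose, rename, strip non-numeric characters
    df = df.T
    df.columns = ["price", "shipping", "origin"]
    df["price"] = df["price"].apply(lambda x: int(re.sub(r"\D", "", x)))
    df["shipping"] = df["shipping"].apply(lambda x: float(re.sub(r"[^0-9.]", "", x)))
    return df

apples = ["$ C145", "+ $22.34", "From USA"]
corn = ["$ C197", "+ $18.46", "From Canada"]
df2 = pd.DataFrame({"apples": apples, "corn": corn})

result = convert_df(df2)
# Turn the row labels (former column names) into a hypothetical "product" column
result = result.rename_axis("product").reset_index()
print(result)
```

From here, something like `result.to_csv("prices.csv", index=False)` would write the CSV the question asks for (the filename is a placeholder).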

0 votes, 2023-02-02

7fhtutme2#

So your data looks like this: price, shipping, origin, price, shipping, origin, price...? In that case, you can extract every third row with syntax like:

# start is 0 for price rows, 1 for shipping rows, 2 for origin rows
df.iloc[list(range(start, len(df), 3)), :].reset_index(drop=True)

...then concatenate them and apply the numeric extraction described above.
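A sketch of the slice-and-concatenate approach this answer describes, assuming the rows always repeat in price, shipping, origin order. It uses the slice `iloc[start::3]`, which selects the same rows as an explicit `range(start, len(df), 3)` list, and borrows the two-record input from the third answer:

```python
import pandas as pd

# Stacked input: price, shipping, origin repeating every 3 rows
df = pd.DataFrame({"Price": ["$ 145", "+ 22.34", "From USA",
                             "£ 123", "+ 12.45", "From UK"]})

# Slice out rows start, start+3, start+6, ... for each field
names = ["price", "shipping", "origin"]
parts = [df["Price"].iloc[start::3].reset_index(drop=True).rename(name)
         for start, name in enumerate(names)]

# Put the three slices side by side as columns
wide = pd.concat(parts, axis=1)

# Apply the numeric extraction to the two numeric columns
for col in ["price", "shipping"]:
    wide[col] = pd.to_numeric(wide[col].str.extract(r"(\d+(?:\.\d+)?)", expand=False))

print(wide)
```

`reset_index(drop=True)` matters here: without it each slice keeps its original row labels (0, 3 versus 1, 4 and so on), and `concat` would misalign them instead of placing them side by side.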

0 votes, 2023-02-02

insrf1ej3#

Another possible solution:

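import numpy as np
import pandas as pd

(df['Price'].str.replace(r'^\W+', '', regex=True)  # strip leading symbols and spaces
 # id numbers each block of 3 rows; name labels the field within the block
 .to_frame().assign(id = np.repeat(np.arange(len(df) // 3), 3),
                    name = ['Price', 'shipping', 'origin'] * (len(df) // 3))
 .pivot(index='id', columns='name', values = 'Price')
 .reset_index(drop=True).rename_axis(None, axis=1))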

Output:

Price    origin shipping
0   145  From USA    22.34
1   123   From UK    12.45

Input:

Price
0     $ 145
1   + 22.34
2  From USA
3     £ 123
4   + 12.45
5   From UK
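Since the records come in fixed blocks of three, the pivot can also be replaced by a NumPy reshape. This is a swapped-in technique, not part of the answer above, and a sketch assuming `len(df)` is an exact multiple of 3 with the fields always in the same order:

```python
import pandas as pd

df = pd.DataFrame({"Price": ["$ 145", "+ 22.34", "From USA",
                             "£ 123", "+ 12.45", "From UK"]})

# Strip leading currency/plus signs and spaces, then take the raw ndarray
values = df["Price"].str.replace(r"^\W+", "", regex=True).to_numpy()

# reshape(-1, 3): each group of 3 consecutive cells becomes one row
wide = pd.DataFrame(values.reshape(-1, 3), columns=["Price", "shipping", "origin"])

# Convert the numeric columns from strings to numbers
wide[["Price", "shipping"]] = wide[["Price", "shipping"]].apply(pd.to_numeric)

print(wide)
```

Unlike `pivot`, which sorts the new columns alphabetically (hence `Price origin shipping` in the output above), `reshape` keeps the fields in their original order.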

0 votes, 2023-02-02
