Python multiprocessing over DataFrame rows

Posted by wrrgggsh on 2023-03-21 in Python


def main():
    df_master = read_bb_csv(file)
    p = Pool(2)
    if len(df_master.index) >= 1:
        for row in df_master.itertuples(index=True, name='Pandas'):
            p.map((partial(check_option, arg1=row), df_master))


def check_option(row):
    get_price(row)

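I am using Pandas to read a CSV file, iterate over its rows, and process each one. Because get_price() makes several HTTP calls, I would like to use multiprocessing to process the rows in parallel (as many at a time as there are CPU cores) to speed things up.
The problem is that I am new to multiprocessing and do not know how to use p.map((check_option, arg1=row), df_master) to process every row of the DataFrame. The row value does not need to be returned to the caller; the processes only need to handle the rows.
Thanks for your help.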


Source: https://stackoverflow.com/questions/69487692/python-multiprocessing-dataframe-rows

1 answer


axr492tv1#

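You can use the Python 3 version below; I use it everywhere and it works like a charm. There is also a Python 3 package called mpire that I have found really useful; its usage is similar to Python's multiprocessing package.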

from multiprocessing import Pool
import pandas as pd

def get_price(idx, row):
    # logic to fetch price goes here
    price = None  # placeholder so the function runs; replace with the fetched value
    return idx, price

def main():
    df = pd.read_csv("path to file")
    NUM_OF_WORKERS = 2 
    with Pool(NUM_OF_WORKERS) as pool:
        results = [pool.apply_async(get_price, [idx, row]) for idx, row in df.iterrows()]
        for result in results:
            idx, price = result.get()
            df.loc[idx, 'Price'] = price
    # do whatever you want to do with df, save it to same file.

if __name__ == "__main__":
    # Call main() only when this file is run as a script.
    # This guard is required on Windows when using multiprocessing and is good practice elsewhere. More info: https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
    main()
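
Since the question says get_price() spends its time on HTTP calls, the work is likely I/O-bound, so a thread pool may be a simpler alternative to processes. The following is only a sketch of that swapped-in technique, not the approach from the answer above: `process_rows`, `fake_get_price`, and the sample rows are hypothetical names made up for illustration, and a namedtuple stands in for what `df.itertuples(index=True, name='Pandas')` would yield so the example runs without pandas.

```python
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor


def process_rows(rows, worker, max_workers=8):
    """Call worker(row) once per row using a pool of threads.

    Results come back in input order; ignore them if the worker only
    has side effects.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map passes one row to each call and preserves input order.
        return list(executor.map(worker, rows))


# Hypothetical stand-in for df.itertuples(index=True, name='Pandas').
Row = namedtuple("Pandas", ["Index", "symbol"])
sample_rows = [Row(0, "AAA"), Row(1, "BBB"), Row(2, "CCC")]


def fake_get_price(row):
    # Hypothetical placeholder for the real HTTP lookup.
    return row.Index, len(row.symbol)


if __name__ == "__main__":
    for idx, price in process_rows(sample_rows, fake_get_price, max_workers=2):
        print(idx, price)
```

With a real DataFrame, the rows argument would be `df.itertuples(index=True, name='Pandas')` and the worker would be the real check_option or get_price. Threads avoid pickling each row and sending it to another process, but they only help while the worker is waiting on the network; CPU-heavy work would still call for processes.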

Answered 2023-03-21
