Paddle: How to limit the GPU memory used by Fluid

wfsdck30  posted on 2022-04-21  in Java

System information

- PaddlePaddle version: 1.3
- CPU: i7-6700
- GPU: NVIDIA 1080 Ti, CUDA 9.2
- OS platform: Ubuntu 16.04
- Python version: 3.5

When I run the PyramidBox model (from widerface_eval.py) on a 1080 Ti GPU, it fails with an out-of-memory error, even on a single image. I lowered the numbers in the 'get_shrink' function so that it fits in GPU memory (by resizing the image to a smaller size). Is there another way, such as an environment variable, to control the total GPU memory consumed by Fluid? Someone in another thread ran this network on an 8GB GPU without problems, so I assume this is possible. I'm looking for something similar to TensorFlow's per_process_gpu_memory_fraction, which lets a TF process run on part of the GPU. Suppose I want to use 80% of the GPU memory: how can I do that in Fluid?

hi3rlvi2 1#

@AmitRozner You can remove this code: https://github.com/PaddlePaddle/models/blob/4dc42a621ec5b2f9c369dc8f6b6e9da18bf932e6/fluid/PaddleCV/face_detection/widerface_eval.py#L47-L53

Or use this code:

import argparse
import functools
import os
import time
import numpy as np
import paddle.fluid as fluid
from PIL import Image

import reader
from utility import add_arguments, print_arguments
from visualize import draw_bboxes

use_gpu = True

# Create the executor
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Path to the saved inference model
save_path = 'pyramidbox_model/'

# Load the inference program, the list of input variable names, and the classifier (fetch targets) from the model
[infer_program, feeded_var_names, target_var] = fluid.io.load_inference_model(
    dirname=save_path,
    executor=exe,
    model_filename='model',
    params_filename='params')

def infer(image_path, confs_threshold):
    image = Image.open(image_path)
    if image.mode == 'L':
        image = image.convert('RGB')
    shrink, max_shrink = get_shrink(image.size[1], image.size[0])

    start = time.time()
    det0 = detect_face(image, shrink)

    dets = det0
    end = time.time()
    print("infer time: %f" % (end - start))

    keep_index = np.where(dets[:, 4] >= confs_threshold)[0]
    dets = dets[keep_index, :]
    draw_bboxes(image_path, dets[:, 0:4])

def detect_face(image, shrink):
    image_shape = [3, image.size[1], image.size[0]]
    if shrink != 1:
        h, w = int(image_shape[1] * shrink), int(image_shape[2] * shrink)
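        # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias,
        # which newer Pillow releases no longer provide.
        image = image.resize((w, h), Image.LANCZOS)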
        image_shape = [3, h, w]

    img = np.array(image)
    img = reader.to_chw_bgr(img)
    mean = [104., 117., 123.]
    scale = 0.007843
    img = img.astype('float32')
    img -= np.array(mean)[:, np.newaxis, np.newaxis].astype('float32')
    img = img * scale
    img = [img]
    img = np.array(img)

    detection, = exe.run(infer_program,
                         feed={feeded_var_names[0]: img},
                         fetch_list=target_var,
                         return_numpy=False)
    detection = np.array(detection)
    # layout: xmin, ymin, xmax, ymax, score
    if np.prod(detection.shape) == 1:
        print("No face detected")
        return np.array([[0, 0, 0, 0, 0]])
    det_conf = detection[:, 1]
    det_xmin = image_shape[2] * detection[:, 2] / shrink
    det_ymin = image_shape[1] * detection[:, 3] / shrink
    det_xmax = image_shape[2] * detection[:, 4] / shrink
    det_ymax = image_shape[1] * detection[:, 5] / shrink

    det = np.column_stack((det_xmin, det_ymin, det_xmax, det_ymax, det_conf))
    return det

def get_shrink(height, width):
    """
    Args:
        height (int): image height.
        width (int): image width.
    """
    # avoid out of memory
    max_shrink_v1 = (0x7fffffff / 577.0 / (height * width))**0.5
    max_shrink_v2 = ((678 * 1024 * 2.0 * 2.0) / (height * width))**0.5

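    def get_round(x, loc):
        # Truncate (not round) x to `loc` decimal places when it has 3 or more.
        str_x = str(x)
        if '.' in str_x:
            str_before, str_after = str_x.split('.')
            len_after = len(str_after)
            if len_after >= 3:
                str_final = str_before + '.' + str_after[0:loc]
                return float(str_final)
        # Fewer than 3 decimals, or no decimal point at all: return x unchanged
        # so the caller never receives None.
        return x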

    max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3
    if max_shrink >= 1.5 and max_shrink < 2:
        max_shrink = max_shrink - 0.1
    elif max_shrink >= 2 and max_shrink < 3:
        max_shrink = max_shrink - 0.2
    elif max_shrink >= 3 and max_shrink < 4:
        max_shrink = max_shrink - 0.3
    elif max_shrink >= 4 and max_shrink < 5:
        max_shrink = max_shrink - 0.4
    elif max_shrink >= 5:
        max_shrink = max_shrink - 0.5

    shrink = max_shrink if max_shrink < 1 else 1
    return shrink, max_shrink

if __name__ == '__main__':
    confs_threshold = 0.15
    image_path = 'images/000001.jpg'
    infer(image_path, confs_threshold)
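
The constants in get_shrink appear to act as a budget on the number of pixels fed to the network, which would explain why lowering them reduces GPU memory use. Below is a minimal sketch of the same idea with an explicit budget. The helper name shrink_for_pixel_budget and the max_pixels parameter are hypothetical and not part of the PyramidBox code, and a suitable budget for a given GPU would have to be found by experiment.

```python
import math


def shrink_for_pixel_budget(height, width, max_pixels):
    """Return a resize factor <= 1 so the resized image has at most max_pixels pixels.

    Hypothetical helper for illustration; max_pixels is an assumed tuning knob.
    """
    if height <= 0 or width <= 0 or max_pixels <= 0:
        raise ValueError("height, width and max_pixels must be positive")
    # Scaling both sides by s scales the area by s**2, so s = sqrt(budget / area).
    factor = math.sqrt(max_pixels / float(height * width))
    # Never upscale: images already within the budget keep their original size.
    return min(1.0, factor)


if __name__ == "__main__":
    # A 1000x1000 image with a budget of 250,000 pixels is halved on each side.
    print(shrink_for_pixel_budget(1000, 1000, 250000))
```

The returned factor could be passed to detect_face(image, shrink) in place of the shrink value computed by get_shrink.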

s4n0splo 2#

First, try using export FLAGS_fraction_of_gpu_memory_to_use= to set a smaller value, but note that this won't change the total GPU memory used.
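
One way to apply this flag from Python instead of the shell is to set it in the process environment before paddle.fluid is imported. The sketch below assumes that Paddle reads FLAGS_* environment variables at import time. The helper name set_gpu_memory_fraction is hypothetical, and 0.8 is only the example value from the question.

```python
import os

FLAG_NAME = "FLAGS_fraction_of_gpu_memory_to_use"


def set_gpu_memory_fraction(fraction):
    """Hypothetical helper: export the flag for the current process."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    os.environ[FLAG_NAME] = str(fraction)
    return os.environ[FLAG_NAME]


set_gpu_memory_fraction(0.8)

# Import Paddle only after the flag is set
# (assumption: the flag is read when the package is imported):
# import paddle.fluid as fluid
```

As the answer notes, this setting does not cap total usage, so the process may still grow beyond the requested fraction.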

Can you find out exactly how much memory this model needs? And can you try reducing the test set of images?
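
One way to measure this is to poll nvidia-smi while the model runs on a single image. The sketch below is a suggested workflow, not a Paddle feature: it shells out to the nvidia-smi query interface and parses the CSV output. The function names are hypothetical.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (in MiB) as printed with csv,noheader,nounits."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return a list of (used_mib, total_mib), one entry per GPU.

    Requires nvidia-smi to be available on PATH.
    """
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"])
    return parse_memory_csv(output.decode("utf-8"))


if __name__ == "__main__":
    # Sample text in the same format, so the parser can be tried without a GPU.
    print(parse_memory_csv("10240, 11178\n"))
```

Because Fluid may reserve a memory pool up front, the number reported this way can be higher than what the model strictly needs for a given input.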

4ngedf3f 3#

@yeyupiaoling I already tried this code, but it uses more than 10GB of memory on my GPU. Based on your experience with an 8GB GPU, there must be a way to lower the GPU memory consumption.

@Xreki I tried FLAGS_fraction_of_gpu_memory_to_use, but it seems to work only in some cases and does not control the total GPU memory used. If I lower it a little, it seems to work, but if I lower it below a certain value, it is completely ignored and the process takes more than 90% of the GPU. Could you explain how it works?
How can I find out how much memory the model needs? I think it is input dependent.
