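- Version and environment information:
1) PaddlePaddle version: 1.5
2) CPU:
3) GPU: P40, cuDNN 7.1
4) System environment: Python 2.7
- Training information
1) Single machine, single GPU
2) 22G
- Problem description:
Training a network that uses depthwise conv turned out to be extremely slow. The test code and results are below (in this experiment, stride=1 and kernel_size=3).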
    import time

    import paddle.fluid as fluid
    from paddle.fluid.layer_helper import LayerHelper


    class SepConv(fluid.dygraph.Layer):
        def __init__(self, namescope, C_in, C_out, kernel_size, stride, padding):
            super(SepConv, self).__init__(namescope)
            self.layer_helper = LayerHelper(self.full_name(), act='relu')
            # conv1 / conv3 are depthwise (groups == C_in); conv2 / conv4 are 1x1 pointwise.
            self.conv1 = fluid.dygraph.Conv2D(self.full_name(), C_in, filter_size=kernel_size, stride=stride, padding=padding, groups=C_in, bias_attr=False)
            self.conv2 = fluid.dygraph.Conv2D(self.full_name(), C_in, filter_size=1, padding=0, bias_attr=False)
            self.bn1 = fluid.dygraph.BatchNorm(self.full_name(), C_in, act='relu')
            self.conv3 = fluid.dygraph.Conv2D(self.full_name(), C_in, filter_size=kernel_size, stride=1, padding=padding, groups=C_in, bias_attr=False)
            self.conv4 = fluid.dygraph.Conv2D(self.full_name(), C_out, filter_size=1, padding=0, bias_attr=False)
            self.bn2 = fluid.dygraph.BatchNorm(self.full_name(), C_out)

        def forward(self, x):
            x = self.layer_helper.append_activation(x)
            # Print the wall-clock time (in seconds) spent in each conv call.
            start = time.time()
            x = self.conv1(x)
            print("conv1 {}".format(time.time()-start))
            start = time.time()
            x = self.conv2(x)
            print("conv2 {}".format(time.time()-start))
            x = self.bn1(x)
            start = time.time()
            x = self.conv3(x)
            print("conv3 {}".format(time.time()-start))
            start = time.time()
            x = self.conv4(x)
            print("conv4 {}".format(time.time()-start))
            x = self.bn2(x)
            return x
conv1 0.00126600265503
conv2 0.000811815261841
conv3 0.00108098983765
conv4 0.000757932662964
conv1 0.00116205215454
conv2 0.000720977783203
conv3 0.0011100769043
conv4 0.000762939453125
conv1 0.00119805335999
conv2 0.000770807266235
conv3 0.00117683410645
conv4 0.000722885131836
conv1 0.00113987922668
conv2 0.000788927078247
conv3 0.00113391876221
conv4 0.000740051269531
conv1 0.00114893913269
conv2 0.000794887542725
conv3 0.00111103057861
conv4 0.000723123550415
conv1 0.00114107131958
conv2 0.000730037689209
conv3 0.00116300582886
conv4 0.000748157501221
conv1 0.00130414962769
conv2 0.000818967819214
conv3 0.00115299224854
conv4 0.000853061676025
conv1 0.00120306015015
conv2 0.00074291229248
conv3 0.00116586685181
conv4 0.000756978988647
conv1 0.00118207931519
conv1 0.0135319232941
conv2 0.000817060470581
conv3 0.0081729888916
conv4 0.000773906707764
conv1 0.00786113739014
conv2 0.000820159912109
conv3 0.00811100006104
conv4 0.000760078430176
conv1 0.00880193710327
conv2 0.000787973403931
conv3 0.00756812095642
conv4 0.000771999359131
conv1 0.00728583335876
conv2 0.00105094909668
conv3 0.00719094276428
conv4 0.00080680847168
conv1 0.00528907775879
conv2 0.000849962234497
conv3 0.00715088844299
conv4 0.000745058059692
conv1 0.0129702091217
conv2 0.000772953033447
conv3 0.00745916366577
conv4 0.000847101211548
conv1 0.00710797309875
conv2 0.000824928283691
conv3 0.00743889808655
conv4 0.000769138336182
conv1 0.00778412818909
conv2 0.000773906707764
conv3 0.00764393806458
conv4 0.000722169876099
conv1 0.00678586959839
As the channel count grows and the feature maps shrink, the depthwise conv gets slower and slower. In theory, a depthwise conv should need far less computation than a 1x1 pointwise conv.
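To put a number on that expectation, here is a minimal sketch that counts multiply-accumulate operations (MACs) for a depthwise 3x3 conv and a 1x1 pointwise conv. The helper name `conv_macs` and the 16x16x256 feature-map shape are illustrative assumptions, not values from the experiment above.

```python
def conv_macs(h_out, w_out, c_in, c_out, kernel_size, groups=1):
    """Multiply-accumulate count for one conv layer (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output element sums over (c_in / groups) * k * k input values.
    return h_out * w_out * c_out * (c_in // groups) * kernel_size * kernel_size


# Hypothetical shape: a 16x16 feature map with 256 channels.
h = w = 16
c = 256
depthwise = conv_macs(h, w, c, c, kernel_size=3, groups=c)  # groups == channels
pointwise = conv_macs(h, w, c, c, kernel_size=1)            # ordinary 1x1 conv

print("depthwise MACs: {}".format(depthwise))
print("pointwise MACs: {}".format(pointwise))
print("pointwise / depthwise: {:.1f}".format(pointwise / float(depthwise)))
```

With kernel_size=3 and equal input and output channels, the ratio is C / 9, so the arithmetic gap widens as the channel count grows. That is the opposite of the trend in the timings above.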
4 answers
gdx19jrr1#
@Gaffey By default, conv uses the cuDNN conv implementation, but group conv in cuDNN v7.1 is slow. Paddle has its own optimized depthwise conv; to use it, set use_cudnn=False.
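A minimal sketch of what that setting could look like for the depthwise layers in the question. `depthwise_conv_kwargs` is a hypothetical helper, and the keyword names (`num_filters`, `filter_size`, `groups`, `use_cudnn`, `bias_attr`) are assumed to match the Paddle 1.5 `fluid.dygraph.Conv2D` signature; it has not been run against that version.

```python
def depthwise_conv_kwargs(channels, kernel_size, stride=1, padding=0):
    """Hypothetical helper: keyword arguments for a depthwise Conv2D layer."""
    return dict(
        num_filters=channels,     # output channels == input channels
        filter_size=kernel_size,
        stride=stride,
        padding=padding,
        groups=channels,          # one group per channel makes it depthwise
        use_cudnn=False,          # opt out of the cuDNN group-conv path
        bias_attr=False,
    )


kwargs = depthwise_conv_kwargs(channels=64, kernel_size=3, padding=1)
print(kwargs)

# Assumed usage inside a fluid.dygraph.Layer (Paddle 1.5 style, untested here):
#   self.conv1 = fluid.dygraph.Conv2D(self.full_name(), **kwargs)
```

Only the depthwise layers (conv1 and conv3 in the question) would take these arguments; the 1x1 pointwise layers can stay on the default cuDNN path.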
eqzww0vc2#
@qingqing01 I added use_cudnn=False to the group conv layers, and it looks even slower than before. Are there any other settings I need to adjust?
nfs0ujit3#
@Gaffey Is your setup groups = input channels = output channels? All of our MobileNet models use the use_cudnn=False configuration. Could you provide complete, testable code?
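One possible shape for such a test is sketched below. GPU kernels are generally launched asynchronously, so timing a single call with `time.time()` may measure launch overhead rather than kernel time. The `benchmark` helper and its `sync` callback are hypothetical; on a GPU, `sync` would be whatever blocks until the device has finished (for example, copying the output back to the host), which is an assumption and not a specific Paddle API.

```python
import time


def benchmark(fn, sync=None, warmup=5, iters=20):
    """Return the mean wall-clock seconds per call of fn()."""
    # Warm-up runs absorb one-time costs such as allocation or kernel selection.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    start = time.time()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for all timed work to finish before stopping the clock
    return (time.time() - start) / iters


# Stand-in CPU workload so the sketch runs without a GPU.
mean_s = benchmark(lambda: sum(range(1000)))
print("mean seconds per call: {:.6f}".format(mean_s))
```

To compare the two configurations, `fn` would wrap one forward pass through a single layer built with use_cudnn=True and then with use_cudnn=False, on the same input shape.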
mec1mxoz4#
I'll contact you on Hi.