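Issue type
Bug
Have you reproduced the bug with TF nightly?
No
Source
binary
Tensorflow version
TF 2.10
Custom code
Yes
OS platform and distribution
Ubuntu 20.04.5 LTS
Mobile device
No response
Python version
3.8.10
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
CUDA 11.8
GPU model and memory
No response
Current behaviour?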
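I am trying to convert a quantized TF model using TF-TRT, but the following issues prevent me from doing so. I found a temporary workaround for Issue 1, but I could not find a possible solution for the next one. According to PR #52248, Tensorflow should support explicit Q/DQ models when TensorRT 8 is used.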
**Issue 1**
The non-deprecated way to add quantize-dequantize nodes to a Tensorflow model is `tf.quantization.quantize_and_dequantize_v2`, defined in tensorflow/tensorflow/python/ops/array_ops.py (at b6517cc):

    @tf_export("quantization.quantize_and_dequantize_v2")
    @dispatch.add_dispatch_support
    def quantize_and_dequantize_v2(
        input,  # pylint: disable=redefined-builtin

However, this adds nodes with tag `QuantizeAndDequantizeV4` (tensorflow/tensorflow/core/ops/array_ops.cc, line 2916 in 6285a27):

    REGISTER_OP("QuantizeAndDequantizeV4")

which is apparently not in the list of ops supported in explicit precision mode (tensorflow/tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.h, lines 35 to 39 in 7103c2c):

    // Operations with supported conversion to Q/DQ ops in TensorRT explicit
    // precision mode.
    constexpr std::array<const char*, 1> kExplicitQuantizationOpNames = {
        "QuantizeAndDequantizeV2",
    };

A possible workaround is to use the deprecated API `tf.quantization.quantize_and_dequantize`, which adds a `QuantizeAndDequantizeV2` node that is still supported.
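To make the mismatch concrete, here is a minimal sketch that checks a list of op types against the explicit-precision list quoted above. `EXPLICIT_QDQ_OP_NAMES` and `unsupported_qdq_ops` are hypothetical names introduced for illustration; the tuple only mirrors `kExplicitQuantizationOpNames` and is not TF-TRT's actual matching logic. The op counts are taken from the OP reports in the log output of this issue.

```python
from collections import Counter

# Assumption: mirrors kExplicitQuantizationOpNames from quantization_ops.h as
# quoted in this issue. This is an illustration, not TF-TRT's own code.
EXPLICIT_QDQ_OP_NAMES = ("QuantizeAndDequantizeV2",)


def unsupported_qdq_ops(op_types):
    """Count Q/DQ op types that are missing from the explicit-precision list."""
    return Counter(
        op_type
        for op_type in op_types
        if op_type.startswith("QuantizeAndDequantize")
        and op_type not in EXPLICIT_QDQ_OP_NAMES
    )


# Graph built with tf.quantization.quantize_and_dequantize_v2 (10 Q/DQ nodes).
ops_v2_api = ["QuantizeAndDequantizeV4"] * 10 + ["Conv2D"] * 5
# Graph built with the deprecated tf.quantization.quantize_and_dequantize.
ops_deprecated_api = ["QuantizeAndDequantizeV2"] * 10 + ["Conv2D"] * 5

print(dict(unsupported_qdq_ops(ops_v2_api)))          # {'QuantizeAndDequantizeV4': 10}
print(dict(unsupported_qdq_ops(ops_deprecated_api)))  # {}
```

On a real model, the op types could be collected from a traced function, for example with `[op.type for op in concrete_fn.graph.get_operations()]`, where `concrete_fn` is a concrete function obtained from the model (a hypothetical variable here).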
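**Issue 2**
After applying the workaround above, I ran into a second error (possibly due to incorrect usage). To convert the explicitly quantized TF saved model with TensorRT, I am following the example provided in the Nvidia TF-TRT documentation. However, some TensorRT engine conversions fail and some warnings are emitted; I have attached them in the log output.
Expected behaviour
An explicitly quantized Tensorflow saved model should be convertible with TF-TRT.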
Standalone code to reproduce the issue
**Custom quantized keras layer to build an example model**
```python
import tensorflow as tf
from tensorflow import keras


class CustomConv2D(keras.layers.Layer):
    def __init__(self, filters, kernel_size, name="CustomConv2d"):
        # NOTE: `name` is not forwarded to the base class, so Keras appears to
        # auto-generate the layer names (custom_conv2d_1, ...) seen in the logs.
        super(CustomConv2D, self).__init__()
        self.w = self.add_weight(
            shape=(kernel_size, kernel_size, filters, filters),
            initializer="random_normal",
            dtype="float32",
            name=self.name + "_weights",
            trainable=True,
        )

    def call(self, inputs):
        # Using the deprecated quantize_and_dequantize here since
        # quantize_and_dequantize_v2 is listed as unsupported-ops by TF-TRT
        q_i = tf.quantization.quantize_and_dequantize(
            inputs, 0, 1, name=self.name + "_q_i", narrow_range=True)
        q_w = tf.quantization.quantize_and_dequantize(
            self.w, -1, 1, name=self.name + "q_w", narrow_range=True)
        # Stride 2 with "SAME" padding halves the spatial size at every layer.
        return tf.nn.conv2d(q_i, q_w, 2, "SAME")


l = CustomConv2D(64, 3)
t = tf.random.normal((1, 224, 224, 64), dtype="float32")

model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(224, 224, 64)))
for i in range(5):
    model.add(CustomConv2D(64, 3, name=f'custom_conv2d_{i}'))
model.save('./saved_model_qat/')
```
**Code used for converting saved quantized TF model using TF-TRT**
```python
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='saved_model_qat',
    precision_mode=trt.TrtPrecisionMode.INT8,
    use_calibration=False
)
trt_func = converter.convert()
converter.summary()

x_test = tf.ones((2, 224, 224, 64))
MAX_BATCH_SIZE = 2


def input_fn():
    # Yield a single batch used to build the TensorRT engines.
    batch_size = MAX_BATCH_SIZE
    x = x_test[0:batch_size, :]
    yield [x]


converter.build(input_fn=input_fn)
```
### Relevant log output
Logs generated
1. When using `quantize_and_dequantize_v2` instead of `quantize_and_dequantize`, at the `trt_func = converter.convert()` step:
```shell
INFO:tensorflow:Clearing prior device assignments in loaded saved model
INFO:tensorflow:Automatic mixed precision will be used on the whole TensorFlow Graph. This behavior can be deactivated using the environment variable: TF_TRT_EXPERIMENTAL_FEATURES=deactivate_mixed_precision.
More information can be found on: https://www.tensorflow.org/guide/mixed_precision.
2023-02-16 12:12:02.652333: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:12:02.653263: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2023-02-16 12:12:02.653394: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-02-16 12:12:02.653696: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:12:02.654338: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:12:02.655044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:12:02.655721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:12:02.656447: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:12:02.657032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20554 MB memory: -> device: 0, name: NVIDIA A10, pci bus id: 0000:04:00.0, compute capability: 8.6
2023-02-16 12:12:02.668533: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2359] Running auto_mixed_precision graph optimizer
2023-02-16 12:12:02.675439: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1195] Automatic Mixed Precision Grappler Pass Summary:
Total processable nodes: 46
Recognized nodes available for conversion: 11
Total nodes converted: 6
Total FP16 Cast ops used (excluding Const and Variable casts): 10
Allowlisted nodes converted: 5
Denylisted nodes blocking conversion: 0
Nodes blocked from conversion by denylisted nodes: 0
For more information regarding mixed precision training, including how to make automatic mixed precision aware of a custom op type, please see the documentation available here:
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-user-guide/index.html#tfamp
2023-02-16 12:12:02.682088: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:952]
################################################################################
TensorRT unsupported/non-converted OP Report:
- QuantizeAndDequantizeV4 -> 10x
- Conv2D -> 5x
- NoOp -> 2x
- Identity -> 1x
- Placeholder -> 1x
--------------------------------------------------------------------------------
- Total nonconverted OPs: 19
- Total nonconverted OP Types: 5
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################
2023-02-16 12:12:02.682177: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1280] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 0 TRT Engines with at least minimum_segment_size=3 nodes.
2023-02-16 12:12:02.682195: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 0
```
2. When using `quantize_and_dequantize`, at the `trt_func = converter.convert()` step:
```shell
INFO:tensorflow:Clearing prior device assignments in loaded saved model
INFO:tensorflow:Automatic mixed precision will be used on the whole TensorFlow Graph. This behavior can be deactivated using the environment variable: TF_TRT_EXPERIMENTAL_FEATURES=deactivate_mixed_precision.
More information can be found on: https://www.tensorflow.org/guide/mixed_precision.
2023-02-16 12:14:27.966166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:14:27.966825: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2023-02-16 12:14:27.966960: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-02-16 12:14:27.967241: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:14:27.967877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:14:27.968493: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:14:27.969167: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:14:27.969781: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-16 12:14:27.970355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20554 MB memory: -> device: 0, name: NVIDIA A10, pci bus id: 0000:04:00.0, compute capability: 8.6
2023-02-16 12:14:27.981918: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2359] Running auto_mixed_precision graph optimizer
2023-02-16 12:14:27.988304: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1195] Automatic Mixed Precision Grappler Pass Summary:
Total processable nodes: 46
Recognized nodes available for conversion: 11
Total nodes converted: 6
Total FP16 Cast ops used (excluding Const and Variable casts): 10
Allowlisted nodes converted: 5
Denylisted nodes blocking conversion: 0
Nodes blocked from conversion by denylisted nodes: 0
For more information regarding mixed precision training, including how to make automatic mixed precision aware of a custom op type, please see the documentation available here:
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-user-guide/index.html#tfamp
2023-02-16 12:14:27.993404: I tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:206] [TF-TRT] Using explicit QDQ mode
2023-02-16 12:14:27.994965: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_1/custom_conv2d_1q1_w has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995142: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_2/custom_conv2d_2q1_w has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995296: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_3/custom_conv2d_3q1_w has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995449: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_4/custom_conv2d_4q1_w has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995603: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_5/custom_conv2d_5q1_w has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995626: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_1/custom_conv2d_1_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995681: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_2/custom_conv2d_2_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995710: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_3/custom_conv2d_3_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995737: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_4/custom_conv2d_4_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995764: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_5/custom_conv2d_5_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:14:27.995803: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:952]
################################################################################
TensorRT unsupported/non-converted OP Report:
- Conv2D -> 5x
- NoOp -> 2x
- Identity -> 1x
- Placeholder -> 1x
--------------------------------------------------------------------------------
- Total nonconverted OPs: 9
- Total nonconverted OP Types: 4
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################
2023-02-16 12:14:27.995933: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1280] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 5 TRT Engines with at least minimum_segment_size=3 nodes.
2023-02-16 12:14:27.995954: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 5
2023-02-16 12:14:27.997440: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:916] Replaced segment 0 consisting of 3 nodes by TRTEngineOp_000_000.
2023-02-16 12:14:27.997478: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:919] TF-TRT Warning: Cannot replace segment 1 consisting of 16 nodes by TRTEngineOp_000_001 reason: Segment has no inputs (possible constfold failure) (keeping original segment).
2023-02-16 12:14:27.997532: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:916] Replaced segment 2 consisting of 3 nodes by TRTEngineOp_000_002.
2023-02-16 12:14:27.997588: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:916] Replaced segment 3 consisting of 3 nodes by TRTEngineOp_000_003.
2023-02-16 12:14:27.997641: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:916] Replaced segment 4 consisting of 3 nodes by TRTEngineOp_000_004.
2023-02-16 12:14:28.000505: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1533] No allowlist ops found, nothing to do
2023-02-16 12:14:28.002105: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1533] No allowlist ops found, nothing to do
2023-02-16 12:14:28.003662: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1533] No allowlist ops found, nothing to do
2023-02-16 12:14:28.005172: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1533] No allowlist ops found, nothing to do
```
As shown above, the `Conv2D` nodes are, oddly, not converted by TF-TRT.
- The next step, `converter.build(input_fn=input_fn)`, raises more errors:
```shell
2023-02-16 12:16:44.399623: I tensorflow/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8700
2023-02-16 12:16:44.493304: I tensorflow/compiler/tf2tensorrt/common/utils.cc:104] Linked TensorRT version: 8.5.1
2023-02-16 12:16:44.493380: I tensorflow/compiler/tf2tensorrt/common/utils.cc:106] Loaded TensorRT version: 8.5.1
2023-02-16 12:16:46.885047: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
2023-02-16 12:16:46.886238: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_2/custom_conv2d_2_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:16:46.985131: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1103] TF-TRT Warning: Engine creation for TRTEngineOp_000_000 failed. The native segment will be used instead. Reason: INTERNAL: tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:217 TRT_ENSURE_OK failure:
INTERNAL: ./tensorflow/compiler/tf2tensorrt/convert/ops/layer_utils.h:610 TRT_ENSURE failure
2023-02-16 12:16:46.985391: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[2,112,112,64]] failed. Running native segment for TRTEngineOp_000_000
2023-02-16 12:16:49.308912: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
2023-02-16 12:16:49.310048: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_3/custom_conv2d_3_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:16:49.406425: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1103] TF-TRT Warning: Engine creation for TRTEngineOp_000_002 failed. The native segment will be used instead. Reason: INTERNAL: tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:217 TRT_ENSURE_OK failure:
INTERNAL: ./tensorflow/compiler/tf2tensorrt/convert/ops/layer_utils.h:610 TRT_ENSURE failure
2023-02-16 12:16:49.406586: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[2,56,56,64]] failed. Running native segment for TRTEngineOp_000_002
2023-02-16 12:16:51.735398: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
2023-02-16 12:16:51.736541: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_4/custom_conv2d_4_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:16:51.835118: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1103] TF-TRT Warning: Engine creation for TRTEngineOp_000_003 failed. The native segment will be used instead. Reason: INTERNAL: tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:217 TRT_ENSURE_OK failure:
INTERNAL: ./tensorflow/compiler/tf2tensorrt/convert/ops/layer_utils.h:610 TRT_ENSURE failure
2023-02-16 12:16:51.835262: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[2,28,28,64]] failed. Running native segment for TRTEngineOp_000_003
2023-02-16 12:16:54.067165: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
2023-02-16 12:16:54.068481: W tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:146] QuantizeAndDequantizeV2: StatefulPartitionedCall/sequential/custom_conv2d_5/custom_conv2d_5_q1_i has narrow_range=true, but for TensorRT conversion, narrow_range=false is recommended.
2023-02-16 12:16:54.175458: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1103] TF-TRT Warning: Engine creation for TRTEngineOp_000_004 failed. The native segment will be used instead. Reason: INTERNAL: tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops.cc:217 TRT_ENSURE_OK failure:
INTERNAL: ./tensorflow/compiler/tf2tensorrt/convert/ops/layer_utils.h:610 TRT_ENSURE failure
2023-02-16 12:16:54.175760: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[2,14,14,64]] failed. Running native segment for TRTEngineOp_000_004
2023-02-16 12:16:54.201923: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[2,112,112,64]] failed. Running native segment for TRTEngineOp_000_000
```
</details>
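To compare the two runs, the "unsupported/non-converted OP Report" sections can be pulled out of the logs with a small helper. This is a minimal sketch, assuming the report keeps the `- OpName -> Nx` line format shown above; `parse_op_report` is a hypothetical helper written for this issue, not part of TF-TRT, and the two sample strings are excerpts of the logs above.

```python
import re
from typing import Dict

# Matches report lines such as "- Conv2D -> 5x".
_OP_LINE = re.compile(r"^\s*-\s+(\w+)\s+->\s+(\d+)x\s*$")


def parse_op_report(log_text: str) -> Dict[str, int]:
    """Return {op_type: count} from the first OP report found in a TF-TRT log."""
    counts: Dict[str, int] = {}
    in_report = False
    for line in log_text.splitlines():
        if "unsupported/non-converted OP Report" in line:
            in_report = True
            continue
        if not in_report:
            continue
        match = _OP_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
        elif line.startswith("-----"):
            break  # the dashed separator ends the per-op list
    return counts


LOG_V2_API = """\
TensorRT unsupported/non-converted OP Report:
- QuantizeAndDequantizeV4 -> 10x
- Conv2D -> 5x
- NoOp -> 2x
- Identity -> 1x
- Placeholder -> 1x
--------------------------------------------------------------------------------
- Total nonconverted OPs: 19
"""

LOG_DEPRECATED_API = """\
TensorRT unsupported/non-converted OP Report:
- Conv2D -> 5x
- NoOp -> 2x
- Identity -> 1x
- Placeholder -> 1x
--------------------------------------------------------------------------------
- Total nonconverted OPs: 9
"""

report_v2 = parse_op_report(LOG_V2_API)
report_deprecated = parse_op_report(LOG_DEPRECATED_API)
# Ops that are left unconverted only when quantize_and_dequantize_v2 is used.
only_with_v2 = {op: n for op, n in report_v2.items() if op not in report_deprecated}
print(only_with_v2)  # {'QuantizeAndDequantizeV4': 10}
```

The difference between the two reports is the `QuantizeAndDequantizeV4` nodes; `Conv2D` is listed as non-converted in both runs.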
5 answers
dauxcl2d #1
Please re-run with the following defined:
Please copy the log output here. Thanks.
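9ceoxa92 #2
Hi,
Thanks for the quick reply. I used the flags you mentioned and collected the following logs.
When using `quantize_and_dequantize`, at the `trt_func = converter.convert()` step
At the `converter.build(input_fn=input_fn)` step
Setting this flag seems to suppress all the warnings I was getting without it. I also noticed something else: when I inspect the summary of the converted model, the converter reports the input and output data types as `float16`, whereas in the original model, before the TF-TRT conversion, the layer dtypes are inferred as `float32`. Is this automatic behaviour of TF-TRT? Can it be suppressed?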
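On the `float16` question: the `converter.convert()` log above states that automatic mixed precision is applied to the whole graph and that it can be deactivated with `TF_TRT_EXPERIMENTAL_FEATURES=deactivate_mixed_precision`. A minimal sketch of setting it from Python follows. Whether this restores `float32` inputs and outputs for this model is an assumption that has not been verified in this thread.

```python
import os

# Variable name and value are taken verbatim from the conversion log above.
# Setting it before TensorFlow is imported is the conservative choice; the
# exact point at which TF-TRT reads it is not confirmed here.
os.environ["TF_TRT_EXPERIMENTAL_FEATURES"] = "deactivate_mixed_precision"

# The conversion would then run as before, for example:
# from tensorflow.python.compiler.tensorrt import trt_convert as trt
# converter = trt.TrtGraphConverterV2(input_saved_model_dir='saved_model_qat', ...)
```

Comparing `converter.summary()` with and without the variable set would show whether the `float16` dtypes come from the mixed precision pass.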
6za6bjd0 #3
Any updates on this issue? @DEKHTIARJonathan, do the logs above help resolve it, or is more information needed?
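xghobddn #4
Please note that explicit quantization and dequantization in TF-TRT is still experimental and not really supported. I have just finished working through some of the issues you mentioned, including the unsupported Conv2D (in my case because it had a Tensor input).
@DEKHTIARJonathan Is anyone actively working on this? I would be happy to join in and add the fixes I have already found.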
x33g5p2x #5
I have opened another issue here: #60168
I believe the fixes I have implemented address genuine bugs and could help here as well, @codejaeger.