TensorFlow: what is tf.bfloat16, "truncated 16-bit floating point"?

Posted by igetnqfo on 2022-11-16 in Other


In https://www.tensorflow.org/versions/r0.12/api_docs/python/framework/tensor_types, what is the difference between tf.float16 and tf.bfloat16?
Also, what do they mean by "quantized integer"?

tensorflow

Source: https://stackoverflow.com/questions/44873802/what-is-tf-bfloat16-truncated-16-bit-floating-point

2 answers


8cdiaqws1#

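bfloat16 is a TensorFlow-specific format, different from IEEE's own float16, hence the new name.
Basically, bfloat16 is a float32 truncated to its upper (most significant) 16 bits. It therefore keeps the same 8 bits for the exponent but has only 7 bits for the mantissa. That makes it easy to convert to and from float32, and because it has essentially the same range as float32, it minimizes the risk of NaNs or exploding/vanishing gradients when switching from float32.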
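The truncation described above can be sketched with Python's standard `struct` module: pack a value as an IEEE 754 float32, keep the upper 16 bits, and widen back by appending 16 zero bits. The helper names are hypothetical, and the sketch uses the plain truncation the answer describes; production converters may round to nearest even instead of simply truncating.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Hypothetical helper: truncate a float32 to bfloat16 by keeping
    only its upper 16 bits (1 sign, 8 exponent, 7 mantissa bits)."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def split_fields(b: int) -> tuple[int, int, int]:
    """Split a bfloat16 bit pattern into (sign, exponent, mantissa)."""
    return (b >> 15) & 0x1, (b >> 7) & 0xFF, b & 0x7F


if __name__ == "__main__":
    for x in (1.0, -2.0, 3.14159265, 1e38):
        b = to_bfloat16_bits(x)
        print(f"{x!r:>12} -> 0x{b:04X} {split_fields(b)} -> {from_bfloat16_bits(b)!r}")
```

For example, 3.14159265 comes back as 3.140625, while 1e38 (far beyond float16's range) survives the round trip with a relative error below 2^-7: float32's range, with much less precision.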
From the source:

// Compact 16-bit encoding of floating point numbers. This representation uses
// 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa.  It
// is assumed that floats are in IEEE 754 format so the representation is just
// bits 16-31 of a single precision float.
//
// NOTE: The IEEE floating point standard defines a float16 format that
// is different than this format (it has fewer bits of exponent and more
// bits of mantissa).  We don't use that format here because conversion
// to/from 32-bit floats is more complex for that format, and the
// conversion for this format is very simple.
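To put the NOTE in the comment into numbers, the sketch below derives the largest finite value, the smallest normal value and the epsilon (gap between 1.0 and the next value) of each format from its exponent/mantissa widths. It assumes standard IEEE 754-style encoding (bias of 2^(e-1) - 1, all-ones exponent reserved for inf/NaN), which bfloat16 shares with float32; `FloatFormat` is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """Hypothetical description of a binary float format by its field widths."""
    name: str
    exponent_bits: int
    mantissa_bits: int  # stored fraction bits; the leading 1 is implicit

    @property
    def bias(self) -> int:
        return 2 ** (self.exponent_bits - 1) - 1

    @property
    def max_finite(self) -> float:
        # The all-ones exponent is reserved for inf/NaN, so the largest
        # usable biased exponent is 2**e - 2.
        max_exp = (2 ** self.exponent_bits - 2) - self.bias
        return (2 - 2.0 ** -self.mantissa_bits) * 2.0 ** max_exp

    @property
    def min_normal(self) -> float:
        # Smallest biased exponent for normal numbers is 1.
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        # Distance from 1.0 to the next representable value.
        return 2.0 ** -self.mantissa_bits


FORMATS = [
    FloatFormat("float32", 8, 23),
    FloatFormat("float16 (IEEE)", 5, 10),
    FloatFormat("bfloat16", 8, 7),
]

if __name__ == "__main__":
    for f in FORMATS:
        print(f"{f.name:<15} max={f.max_finite:.4g} "
              f"min_normal={f.min_normal:.4g} eps={f.epsilon:.4g}")
```

With these widths, IEEE float16 tops out at 65504, while bfloat16 reaches roughly 3.39e38 and shares float32's smallest normal value (2^-126); the price is a coarser epsilon of 2^-7 versus float16's 2^-10.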

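As for quantized integers, they are designed to replace floating-point numbers in trained networks to speed up processing. Basically, they are a kind of fixed-point encoding of real numbers, with an operating range chosen to represent the distribution of values observed at any given point in the network.
More on quantization here.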
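The idea of a fixed-point encoding whose operating range is chosen from the observed values can be illustrated with one common scheme, affine (min/max) 8-bit quantization. This is a sketch under that assumption, not necessarily the exact scheme TensorFlow uses; `choose_params`, `quantize` and `dequantize` are hypothetical helpers.

```python
def choose_params(values, num_bits=8):
    """Pick (scale, zero_point) so the observed [min, max] range maps onto
    the unsigned integer codes [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    # Stretch the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)  # integer code that represents 0.0
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map each real value to the nearest integer code, clipped to the valid range."""
    qmax = 2 ** num_bits - 1
    return [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]


def dequantize(codes, scale, zero_point):
    """Recover approximate real values from integer codes."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    activations = [-1.0, -0.5, 0.0, 0.25, 1.0]
    scale, zp = choose_params(activations)
    codes = quantize(activations, scale, zp)
    print(scale, zp, codes, dequantize(codes, scale, zp))
```

Each value is then stored in a single byte, and the reconstruction error stays within about one quantization step (`scale`); values outside the observed range are clipped, which is why the range is calibrated per point in the network.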

2022-11-16

nr7wwzry2#

The image below shows the internal structure of the three floating-point formats: