
Some thoughts on TensorFlow Lite quantization (brief)

Published 2021-06-30 15:51:34


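> TensorFlow Lite official site: https://tensorflow.google.cn/lite/performance/post_training_integer_quant

**This post only covers quantization applied after the model has been trained (post-training quantization).**

The TensorFlow Lite site describes three quantization methods: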

![image.png](https://oss-club.rt-thread.org/uploads/20210630/8e91c0b87436ee156e57551fc0daa8af.png)

## 1 Converting an (unquantized) model to tflite format

```python
import tensorflow as tf

# Option A: convert from a SavedModel directory.
# For TF1: tf.compat.v1.lite.TFLiteConverter.from_saved_model()
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # path to the SavedModel directory
tflite_model = converter.convert()

# Option B: convert from a Keras model loaded into memory.
model = tf.keras.models.load_model(path)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model (convert() returns the serialized model as bytes).
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```

## 2 Float16 quantization

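Converts the weights to 16-bit floating-point numbers (float16).

- The model shrinks by up to 1/2.

- Smallest accuracy loss of the quantization options.

Drawback: when a float16-quantized model runs on the CPU, the weights are "dequantized" back to float32.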

```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
```
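To get a feel for the "half the size, small loss" trade-off, here is a minimal sketch that stores a few made-up weights as float32 and as float16 using only the standard library. The weight values and helper names are illustrative assumptions; this is not what the converter does internally, just the same number format.

```python
import struct

def to_float16_bytes(weights):
    """Pack a list of Python floats as little-endian IEEE 754 half precision."""
    return struct.pack(f"<{len(weights)}e", *weights)

def from_float16_bytes(blob):
    """Unpack half-precision bytes back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))

# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 0.0005]

fp32_blob = struct.pack(f"<{len(weights)}f", *weights)
fp16_blob = to_float16_bytes(weights)
restored = from_float16_bytes(fp16_blob)

# Largest absolute error introduced by storing the weights as float16.
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {len(fp32_blob)} bytes, float16: {len(fp16_blob)} bytes")
print(f"max rounding error: {max_err:.6f}")
```

The byte count halves exactly, while the rounding error stays small for values in a typical weight range.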

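## 3 Dynamic range quantization

It only accelerates inference on the CPU.

"Dynamic range" means that, at inference time, activations are converted to 8-bit integers on the fly based on their observed range.

At conversion time only the weights are quantized, from float32 to int8; activations are not quantized ahead of time. The model shrinks by about 3/4.

At inference, operators that have no quantized kernel convert the int8 weights back to fp32, and **both inputs and outputs are floating point**.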

```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
```

![image.png](https://oss-club.rt-thread.org/uploads/20210630/7a0eb4ac4d2bc8bf8ceac2730e1463f8.png.webp)
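The weight-only int8 step can be sketched in plain Python. This assumes a simplified symmetric, per-tensor scheme (one scale for the whole tensor, zero point fixed at 0); the real converter may use per-channel scales, and the weight values and function names here are made up for illustration.

```python
def quantize_weights(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so the range stays symmetric around zero.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_weights(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

# Hypothetical float32 weights.
weights = [0.42, -1.27, 0.003, 0.9]

q_weights, scale = quantize_weights(weights)
restored = dequantize_weights(q_weights, scale)

float32_bytes = 4 * len(weights)   # 4 bytes per float32 weight
int8_bytes = 1 * len(q_weights)    # 1 byte per int8 weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q_weights, scale)
print(f"{float32_bytes} bytes -> {int8_bytes} bytes, max error {max_error:.5f}")
```

Each weight goes from 4 bytes to 1, which is where the 3/4 size reduction comes from; the price is a rounding error of at most half a quantization step per weight.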

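## 4 Full integer quantization

The range, i.e. (min, max), of every floating-point tensor in the model has to be calibrated or estimated, so a portion of the dataset is required.

A function that feeds the dataset to the converter: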

```python
def representative_dataset():
    for sample in samples:
        yield [sample.image]
```
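Why the converter needs sample data can be illustrated with a small sketch: observe (min, max) over a few batches, then derive the scale and zero point of an affine int8 mapping, `real = scale * (q - zero_point)`. The batches and helper names below are hypothetical, and the actual calibration inside the converter is more involved than a plain min/max.

```python
def calibrate(batches):
    """Track the overall (min, max) seen across all calibration batches."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    # Make sure 0.0 is inside the range so it is exactly representable.
    return min(lo, 0.0), max(hi, 0.0)

def affine_params(lo, hi, qmin=-128, qmax=127):
    """Derive scale and zero point so that [lo, hi] maps onto [qmin, qmax]."""
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# Hypothetical activation values observed on three calibration batches.
batches = [[0.0, 0.5, 1.0], [0.2, 2.0], [-1.0, 1.5]]

lo, hi = calibrate(batches)
scale, zero_point = affine_params(lo, hi)
print(f"range=({lo}, {hi}) scale={scale:.6f} zero_point={zero_point}")
```

If the calibration samples do not cover the values seen in production, anything outside (min, max) gets clamped, which is why the representative dataset should resemble real inputs.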

- Weights and biases are quantized; input and output stay floating point

```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
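# Define the representative data generator.
# `num_calibration_steps` and `input` are placeholders to fill in yourself.
def representative_dataset_gen():
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    yield [input]

# Give the converter the representative data.
converter.representative_dataset = representative_dataset_gen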
tflite_quant_model = converter.convert()
```

- Input and output are integers

```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    yield [input]
    
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()
```

You can see that the input and output tensors are now in integer format:

```python
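import tensorflow as tf

# tflite_quant_model is the bytes object returned by converter.convert() above.
interpreter = tf.lite.Interpreter(model_content=tflite_quant_model)
input_type = interpreter.get_input_details()[0]['dtype']
print('input: ', input_type)
output_type = interpreter.get_output_details()[0]['dtype']
print('output: ', output_type)

# With tf.int8 as configured above:
# input:  <class 'numpy.int8'>
# output:  <class 'numpy.int8'>
# (numpy.uint8 if tf.uint8 was chosen instead)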
```
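Once input and output are integers, the application has to do the float/integer conversion at the model boundary itself. The interpreter reports the parameters as a `(scale, zero_point)` tuple under the `'quantization'` key of each entry in `get_input_details()` and `get_output_details()`. The sketch below uses a hand-written dict as a stand-in for such an entry, so the values and the helper names `to_model_input` and `from_model_output` are assumptions made for illustration.

```python
def to_model_input(values, details, qmin=-128, qmax=127):
    """Quantize float inputs with the model's reported (scale, zero_point)."""
    scale, zero_point = details["quantization"]
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]

def from_model_output(q_values, details):
    """Turn integer outputs back into floats: real = scale * (q - zero_point)."""
    scale, zero_point = details["quantization"]
    return [(q - zero_point) * scale for q in q_values]

# Stand-in for interpreter.get_input_details()[0]; the numbers are made up.
fake_details = {"quantization": (0.0078125, 0)}

q_in = to_model_input([-1.0, 0.0, 0.5], fake_details)
print(q_in)

floats_out = from_model_output([64, -128], fake_details)
print(floats_out)
```

With a real model, the same two helpers would take `interpreter.get_input_details()[0]` and `interpreter.get_output_details()[0]`, and the range arguments would change to 0 and 255 for uint8.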