Issue #58 – Quantisation of Neural Machine Translation models

31 Oct 2019

Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

When large amounts of training data are available, the quality of Neural MT engines increases with the size of the model. However, larger models mean decoding with more parameters, which makes the engine slower at test time. Improving the trade-off between model compactness and translation quality is therefore an active research topic. One way to achieve more compact models is quantisation, that is, requiring each parameter value to occupy a fixed number of bits, thus limiting the computational cost. In this post we take a look at a paper which produces Transformer Neural MT models that are four times more compact via quantisation into 8-bit values, with no loss in translation quality according to BLEU score.

Method

Gabriele Prato et al. (2019) propose to quantise all operations which provide a computational speed gain at test time. The method consists of using a function which assigns to each parameter value an integer between 0 and 255 (8 bits), corresponding to where this value stands between the minimum and the maximum values taken by the parameters being quantised.
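To make the idea concrete, here is a minimal sketch of linear min-max quantisation to 8-bit integers, written in plain NumPy. It illustrates the general mapping described above, not the exact scheme of Prato et al. (2019), which may differ in details such as clamping, handling of activations, or how the ranges are estimated; the function names are invented for illustration.

```python
import numpy as np

def quantise_uint8(x):
    """Map float values to integers in [0, 255] relative to the tensor's
    minimum and maximum (illustrative min-max quantisation sketch)."""
    x_min, x_max = float(x.min()), float(x.max())
    # Scale so that x_min maps to 0 and x_max maps to 255.
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    q = np.round((x - x_min) / scale).astype(np.uint8)
    return q, scale, x_min

def dequantise(q, scale, x_min):
    """Recover approximate float values from the 8-bit representation."""
    return q.astype(np.float32) * scale + x_min

# Example: quantise a random weight matrix and check the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale, x_min = quantise_uint8(w)
w_approx = dequantise(q, scale, x_min)
print(np.abs(w - w_approx).max())  # small quantisation error
```

Storing each parameter as a single byte instead of a 32-bit float is what yields the four-fold reduction in model size, provided the quantisation ranges themselves add negligible overhead.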