"Amortized Noisy Channel Neural Machine Translation", 2021-12-16:
[self-distillation] Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches such as "beam search and rerank" (BSR) incur substantial computational overhead during inference, making real-world deployment impractical.
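The noisy-channel reranking that BSR performs can be sketched as follows. This is a hedged illustration, not the paper's code: the interpolation weights and the toy log-probabilities are assumptions, and the standard noisy-channel score combines the direct model log p(y|x), the channel model log p(x|y), and a language model log p(y).

```python
def noisy_channel_score(log_p_direct, log_p_channel, log_p_lm,
                        l1=1.0, l2=1.0, l3=0.3):
    """Linear interpolation of the three noisy-channel terms:
    l1 * log p(y|x) + l2 * log p(x|y) + l3 * log p(y).
    The weights l1, l2, l3 are illustrative, not from the paper."""
    return l1 * log_p_direct + l2 * log_p_channel + l3 * log_p_lm

def rerank(candidates):
    """candidates: list of (translation, log_p_direct, log_p_channel, log_p_lm)
    tuples, e.g. a beam produced by the direct model.
    Returns the candidates sorted best-first by the noisy-channel score."""
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c[1], c[2], c[3]),
        reverse=True,
    )

# Toy beam of candidate translations with made-up log-probabilities.
beam = [
    ("translation A", -1.2, -3.0, -2.0),
    ("translation B", -1.5, -1.8, -1.9),
    ("translation C", -1.1, -4.5, -2.5),
]
best = rerank(beam)[0][0]  # "translation B" wins under these toy scores
```

The overhead the abstract refers to comes from scoring every beam candidate with the extra channel and language models; the amortized model aims to make a single greedy pass recover the same winners.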
We aim to build an amortized noisy channel NMT model such that greedy decoding from it generates translations that maximize the same reward as translations produced by BSR. We attempt three approaches: knowledge distillation, one-step-deviation imitation learning, and Q-learning. The first approach obtains the noisy channel signal from a pseudo-corpus, while the latter two directly optimize a noisy-channel MT reward.
All three approaches speed up inference by 1–2 orders of magnitude. For all of them, the generated translations fail to achieve rewards comparable to BSR's, but translation quality, as approximated by BLEU, is similar to that of BSR-produced translations.