paper brief

Paper brief: Sequence to Sequence Learning with Neural Networks (arXiv:1409.3215)

This paper presents an end-to-end approach to mapping one sequence to another using multilayered LSTMs in an encoder–decoder setup, and it reports results on WMT’14 English-to-French translation with a BLEU score of 34.8 under an out-of-vocabulary penalty.

February 22, 2026•Mira Vale•llm systems

Continue in Rorobot with the source paper open and ready for chat.

Open this paper in Rorobot

What this paper is about

Deep Neural Networks (DNNs) are described as powerful models that have achieved excellent performance on difficult learning tasks. [S1] The paper states that DNNs work well whenever large labeled training sets are available. [S1] The paper also states that DNNs cannot be used to map sequences to sequences. [S1] The paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. [S1] The method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality. [S1] The method then uses another deep LSTM to decode the target sequence from that vector. [S1] The paper reports a main result on an English to French translation task from the WMT’14 dataset. [S1] The paper reports that the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set. [S1] The paper reports that this BLEU score was penalized on out-of-vocabulary words. [S1] The paper reports that the LSTM did not have difficulty on long sentences. [S1]

Core claims to remember

The paper states that DNNs are powerful models that have achieved excellent performance on difficult learning tasks. [S1] The paper states that DNNs work well whenever large labeled training sets are available. [S1] The paper states that DNNs cannot be used to map sequences to sequences. [S1] The paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. [S1] The paper’s approach uses a multilayered LSTM to map an input sequence into a vector of fixed dimensionality. [S1] The paper’s approach uses another deep LSTM to decode a target sequence from the fixed-dimensional vector. [S1] The paper reports a translation result on WMT’14 English-to-French in which the LSTM achieves a BLEU score of 34.8 on the entire test set. [S1] The paper reports that the BLEU score for the LSTM was penalized on out-of-vocabulary words. [S1] The paper reports that the LSTM did not have difficulty on long sentences. [S1]

Limitations and caveats

The paper reports that the BLEU score of 34.8 on the WMT’14 English-to-French test set was computed with a penalty on out-of-vocabulary words. [S1] The paper reports its main result on an English to French translation task from the WMT’14 dataset. [S1] The paper reports that the BLEU score of 34.8 is measured on the entire test set for that task. [S1] The paper reports that the LSTM did not have difficulty on long sentences, but it does not provide quantitative details about sentence length in the excerpted text. [S1]

How to apply this in study or projects

Read the paper’s statement that DNNs cannot be used to map sequences to sequences, and relate that statement to the paper’s proposed sequence-learning approach. [S1] Follow the described encoder step in which a multilayered LSTM maps an input sequence to a vector of fixed dimensionality. [S1] Follow the described decoder step in which another deep LSTM decodes the target sequence from the fixed-dimensional vector. [S1] Locate the reported WMT’14 English-to-French translation evaluation and note that the paper reports a BLEU score of 34.8 on the entire test set. [S1] Track how the paper reports penalizing BLEU on out-of-vocabulary words when presenting the 34.8 score. [S1] Review the paper’s reported observation that the LSTM did not have difficulty on long sentences, and connect it to the use of LSTMs for sequence learning as stated in the method description. [S1]

Sources

[S1]arxiv.org
Sequence to Sequence Learning with Neural Networks
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Open source Back to article

FAQ

What model architecture does the paper use for sequence-to-sequence mapping?

The paper uses a multilayered LSTM to map the input sequence to a fixed-dimensional vector and another deep LSTM to decode the target sequence from that vector. [S1]

What result does the paper report on WMT’14 English-to-French translation?

The paper reports a BLEU score of 34.8 on the entire test set, and it reports that this BLEU score was penalized on out-of-vocabulary words. [S1]