RNN/LSTM, Encoder-Decoder, and Attention

RNN: recurrent neural network

LSTM: long short-term memory

A language model defines a probability distribution over sequences of tokens in a natural language.
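As a toy illustration of such a distribution, here is a minimal bigram language model in Python; the tiny corpus and the bigram approximation of the chain rule P(w1, ..., wn) = P(w1) · P(w2 | w1) · ... · P(wn | w1, ..., wn-1) are assumptions made only for this sketch.

```python
from collections import Counter

# Toy corpus and a bigram approximation of the chain rule:
# P(w1, ..., wn) ~ P(w1) * P(w2 | w1) * ... * P(wn | w_{n-1})
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def sequence_prob(tokens):
    """Probability the bigram model assigns to a token sequence."""
    p = unigrams[tokens[0]] / len(corpus)
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(sequence_prob("the cat sat".split()))  # relatively high: seen in the corpus
print(sequence_prob("cat the sat".split()))  # zero: this ordering never occurs
```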

Embedding is almost a standard step whenever deep learning is applied to NLP tasks. With Word2Vec, GloVe, etc. we can learn word embeddings from unlabelled data in an unsupervised way and then reuse them across different downstream tasks; embeddings obtained this way are called pretrained embeddings. If the downstream task has plenty of training data, we can use the pretrained embeddings to initialize the model's embedding layer and then fine-tune them with the task's supervised data. If the supervised data is scarce, we can fix (freeze) the embeddings and let the model learn only the other parameters. This can also be seen as a form of transfer learning.
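A minimal PyTorch sketch of those two options, assuming a pretrained embedding matrix has already been loaded (the random tensor below is only a placeholder for real Word2Vec/GloVe vectors):

```python
import torch
import torch.nn as nn

# Stand-in for a real pretrained matrix, shaped (vocab_size, embed_dim).
pretrained = torch.randn(10000, 300)

# Plenty of task data: initialize from pretrained vectors and keep training them.
emb_finetuned = nn.Embedding.from_pretrained(pretrained, freeze=False)

# Little task data: keep the pretrained vectors fixed, learn only the rest of the model.
emb_frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)

print(emb_finetuned.weight.requires_grad)  # True  -> embeddings will be updated
print(emb_frozen.weight.requires_grad)     # False -> embeddings stay fixed
```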

Transformer

BERT stands for “Bidirectional Encoder Representations from Transformers”. To put it in simple words, BERT extracts patterns or representations from the data by passing word embeddings through an encoder. The encoder itself is a stack of Transformer encoder layers. It is bidirectional, which means that during training it considers the context on both the left and the right of each token when extracting patterns or representations.
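A quick way to see the stacked-encoder structure, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint:

```python
from transformers import AutoModel

# Inspect the stacked-encoder structure of "bert-base-uncased".
model = AutoModel.from_pretrained("bert-base-uncased")
print(model.config.num_hidden_layers)  # 12 stacked Transformer encoder layers
print(model.config.hidden_size)        # 768-dimensional token representations
```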

BERT uses two training paradigms: Pre-training and Fine-tuning.

During pre-training, the model is trained on a large dataset to extract patterns. This is a self-supervised task: the model learns from an unlabelled corpus such as Wikipedia, for example by predicting tokens that have been masked out.
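A small sketch of that masked-token objective in action, again assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the example sentence is made up):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Mask one token and let the pre-trained model fill it in.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # typically "paris"
```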

During fine-tuning, the model is trained for downstream tasks like Classification, Text-Generation, Language Translation, Question-Answering, and so forth. Essentially, you can download a pre-trained model and then fine-tune it on your own data.
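A minimal fine-tuning sketch for classification, assuming Hugging Face transformers and PyTorch; the two labelled sentences are hypothetical stand-ins for a real downstream dataset, and a real run would loop over many batches:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical labelled batch; a real run would iterate over a full dataset.
texts = ["great movie", "terrible acting"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # loss comes from the new classification head
outputs.loss.backward()
optimizer.step()
```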

Context-free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary. For example, the word “bank” would have the same representation in “bank deposit” and in “riverbank”. Contextual models instead generate a representation of each word that is based on the other words in the sentence.
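The “bank” example can be checked directly. The sketch below (assuming transformers, PyTorch and the bert-base-uncased checkpoint) extracts BERT's contextual vector for “bank” in two different sentences and compares them; a context-free model would give a cosine similarity of exactly 1.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Return the contextual vector BERT assigns to `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = word_vector("i made a deposit at the bank.", "bank")
v2 = word_vector("we sat on the river bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1: same word, different contextual vectors
```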

https://www.youtube.com/c/VenelinValkovBG/videos