
# Sequence model

Examples of sequence data include speech, text (e.g. sentences for translation), music, and video.

One way to build a neural-network model for word input is a classic MLP: map every word to a sparse one-hot vector indexed by its position in the dictionary. But there are two problems: (1) inputs (sentences) have different lengths, or say scales, which degrades model performance; (2) an MLP cannot learn relationships across positions, i.e., hidden features learned at one word position are not shared with other positions.
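As a concrete illustration of the one-hot mapping described above, here is a minimal sketch (the `one_hot` helper and the toy vocabulary are hypothetical, not from the source):

```python
import numpy as np

def one_hot(word, vocab):
    """Map a word to a sparse one-hot vector indexed by its dictionary position."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

vocab = ["a", "cat", "sat", "the"]  # toy dictionary
x = one_hot("cat", vocab)
# x is all zeros except a single 1 at the word's index
```

Note how wasteful this is for a real dictionary: a 10,000-word vocabulary means a 10,000-dimensional vector with a single non-zero entry per word.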

## Recurrent NN

The diagrams on the left and right look different but mean the same thing. An RNN feeds the word $$x^i$$ through the network to produce a prediction $$y^i$$ and an extracted feature $$a^i$$, which is then fed into the network at the next time step. At the beginning, researchers usually use a random vector or a zero vector as the initial feature $$a^0$$. One shortcoming of this RNN is that when it makes a prediction from $$x^i$$, it only considers the information (words) before $$x^i$$ and ignores $$x^{i+1}\dots x^n$$. However, when we analyze a sentence, the words from the whole sentence around the current position should be taken into account.
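A single RNN time step can be sketched as follows; this is a minimal NumPy version assuming a tanh hidden activation and a softmax output, with weight names ($$W_{aa}, W_{ax}, W_{ya}$$) following the notation used here:

```python
import numpy as np

def rnn_step(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One RNN time step: new hidden feature a_t and prediction y_t."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)  # carry information forward
    z = Wya @ a_t + by
    y_t = np.exp(z) / np.exp(z).sum()             # softmax over the vocabulary
    return a_t, y_t

rng = np.random.default_rng(0)
n_a, n_x, n_y = 5, 3, 3
a0 = np.zeros(n_a)                                # zero vector as a^0
x1 = rng.normal(size=n_x)
a1, y1 = rnn_step(x1, a0,
                  rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x)),
                  rng.normal(size=(n_y, n_a)), np.zeros(n_a), np.zeros(n_y))
# y1 is a probability distribution over the vocabulary
```

The same `rnn_step` function (with the same weights) is applied at every time step, passing `a_t` along, which is exactly the weight sharing an MLP lacks.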

$$W_{aa},W_{ax}$$ can be concatenated into a single matrix so as to vectorize and speed up the algorithm. A brief note on why NumPy vectorization is fast: everything inside NumPy is written in C, so the same statement written in an interpreted language like Python expands into many more instructions.
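The concatenation trick can be checked numerically: stacking $$[W_{aa}\,|\,W_{ax}]$$ side by side and stacking $$a$$ on top of $$x$$ turns two matrix-vector products into one larger one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_a, n_x = 4, 6
Waa = rng.normal(size=(n_a, n_a))
Wax = rng.normal(size=(n_a, n_x))
a = rng.normal(size=n_a)
x = rng.normal(size=n_x)

separate = Waa @ a + Wax @ x             # two matrix-vector products
Wa = np.concatenate([Waa, Wax], axis=1)  # [W_aa | W_ax]
ax = np.concatenate([a, x])              # a stacked on top of x
combined = Wa @ ax                       # one larger product

assert np.allclose(separate, combined)
```

One big C-level matrix multiply replaces two, which is where the speedup comes from.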

RNN architectures:

- many-to-many (input and output both consist of many words, e.g. translation of a sentence)
- many-to-one (input consists of many words but output is a single number, e.g. a 1-5 rating from a review sentence)
- one-to-one
- one-to-many (e.g. music generation)

## Model building

training

1 Tokenize the sentences: form a vocabulary by mapping each word to a one-hot vector, and add an end-of-sentence (EOS) token (e.g. the punctuation "." serves as the label marking the end of a sentence).

2 Build up the architecture in the figure and then feed in the training set and labels.

3 Build up the loss function for each label and sum them together over the time steps.
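Step 3 is the standard cross-entropy loss per time step, summed over the sequence (the source does not write it out; this is the usual formulation):

$$\mathcal{L}^{t}(\hat{y}^{t}, y^{t}) = -\sum_i y_i^{t}\log\hat{y}_i^{t}, \qquad \mathcal{L} = \sum_{t=1}^{T_y}\mathcal{L}^{t}(\hat{y}^{t}, y^{t})$$

where $$\hat{y}^{t}$$ is the softmax output at step $$t$$ and $$y^{t}$$ is the one-hot label.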

sampling

4 The difference from the training step is that $$y^1\dots y^n$$ are not the words of a sentence from the training set but words generated from the features $$a^i$$ of earlier steps, with each sampled word fed back in as the next input.

5 Set a length threshold to end generation, or end when EOS is sampled.
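Steps 4-5 can be sketched as a sampling loop; the `step_fn` interface and all names here are illustrative assumptions, not the source's code:

```python
import numpy as np

def sample(step_fn, a0, x0, eos_idx, max_len, rng):
    """Generate token indices until EOS is sampled or a length threshold is hit.

    step_fn(x, a) -> (a_next, y), where y is a probability distribution
    over the vocabulary (hypothetical interface).
    """
    a, x, out = a0, x0, []
    for _ in range(max_len):           # step 5: length threshold
        a, y = step_fn(x, a)
        idx = rng.choice(len(y), p=y)  # step 4: sample the next word from y
        out.append(idx)
        if idx == eos_idx:             # step 5: stop at EOS
            break
        x = np.zeros_like(x)
        x[idx] = 1.0                   # feed the sampled word back as input
    return out

# toy step function: uniform distribution over a 4-word vocabulary
toy = lambda x, a: (a, np.full(4, 0.25))
tokens = sample(toy, np.zeros(3), np.zeros(4), eos_idx=0,
                max_len=10, rng=np.random.default_rng(2))
```

The key contrast with training is the feedback line: at training time the next input is the ground-truth word, while at sampling time it is the model's own previous sample.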