
# Emojify!

Welcome to the second assignment of Week 2. You are going to use word vector representations to build an Emojifier.

Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that.
So rather than writing:

“Congratulations on the promotion! Let’s get coffee and talk. Love you!”

The emojifier can automatically turn this into:

“Congratulations on the promotion! 👍 Let’s get coffee and talk. ☕️ Love you! ❤️”

• You will implement a model which inputs a sentence (such as “Let’s go see the baseball game tonight!”) and finds the most appropriate emoji to be used with this sentence (⚾️).

#### Using word vectors to improve emoji lookups

• In many emoji interfaces, you need to remember that ❤️ is the “heart” symbol rather than the “love” symbol.
• In other words, you’ll have to remember to type “heart” to find the desired emoji, and typing “love” won’t bring up that symbol.
• We can make a more flexible emoji interface by using word vectors!
• When using word vectors, you’ll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate additional words in the test set to the same emoji.
• This works even if those additional words don’t even appear in the training set.
• This allows you to build an accurate classifier mapping from sentences to emojis, even using a small training set.

#### What you’ll build

1. In this exercise, you’ll start with a baseline model (Emojifier-V1) using word embeddings.
2. Then you will build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM.

## 1 – Baseline model: Emojifier-V1

### 1.1 – Dataset EMOJISET

Let’s start by building a simple baseline classifier.

You have a tiny dataset (X, Y) where:

• X contains 127 sentences (strings).
• Y contains an integer label between 0 and 4 corresponding to an emoji for each sentence.

Let’s load the dataset using the code below. We split the dataset between training (127 examples) and testing (56 examples).

### 1.2 – Overview of the Emojifier-V1

In this part, you are going to implement a baseline model called “Emojifier-V1”.

#### Inputs and outputs

• The input of the model is a string corresponding to a sentence (e.g. “I love you”).
• The output will be a probability vector of shape (1,5), because there are 5 emojis to choose from.
• The (1,5) probability vector is passed to an argmax layer, which extracts the index of the emoji with the highest probability.
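
The last two steps can be sketched in NumPy. This is only an illustration: the `scores` values are made up, and the local `softmax` is a stand-in for whichever softmax the assignment provides.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for one sentence, one score per emoji class.
scores = np.array([[0.5, 2.0, -1.0, 0.1, 0.3]])        # shape (1, 5)

probs = softmax(scores)                                # shape (1, 5), the row sums to 1
predicted_index = int(np.argmax(probs, axis=-1)[0])    # index of the most likely emoji
```

Here the largest score sits at position 1, so `predicted_index` is 1; the integer is then mapped back to an emoji.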

### 1.3 – Implementing Emojifier-V1

As shown in Figure 2 (above), the first step is to:

• Convert each word in the input sentence into its word vector representation.
• Then take an average of the word vectors.
• Similar to the previous exercise, we will use pre-trained 50-dimensional GloVe embeddings.

Run the following cell to load the word_to_vec_map, which contains all the vector representations.

    # GRADED FUNCTION: sentence_to_avg

    def sentence_to_avg(sentence, word_to_vec_map):
        """
        Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
        and averages its value into a single vector encoding the meaning of the sentence.

        Arguments:
        sentence -- string, one training example from X
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation

        Returns:
        avg -- average vector encoding information about the sentence, numpy-array of shape (50,)
        """

        ### START CODE HERE ###
        # Step 1: Split sentence into list of lower case words (≈ 1 line)
        words = sentence.lower().split()

        # Initialize the average word vector, should have the same shape as your word vectors.
        avg = np.zeros((50,))

        # Step 2: average the word vectors. You can loop over the words in the list "words".
        total = 0
        for w in words:
            total += word_to_vec_map[w]
        avg = total / len(words)

        ### END CODE HERE ###

        return avg
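
Note that the function above raises a `KeyError` for any word missing from `word_to_vec_map` and divides by zero on an empty sentence. Below is a minimal sketch of a more defensive variant. The name `sentence_to_avg_safe` and the 3-dimensional `toy_map` are hypothetical and only serve the illustration; skipping unknown words is one possible policy, not the assignment's required behavior.

```python
import numpy as np

def sentence_to_avg_safe(sentence, word_to_vec_map):
    """Average the vectors of the known words; unknown words are skipped."""
    words = sentence.lower().split()

    # Infer the embedding shape from any entry instead of hard-coding 50.
    any_vector = next(iter(word_to_vec_map.values()))
    total = np.zeros(any_vector.shape)
    count = 0

    for w in words:
        if w in word_to_vec_map:      # ignore out-of-vocabulary words
            total += word_to_vec_map[w]
            count += 1

    if count == 0:                    # empty sentence or no known words
        return total                  # all-zero vector
    return total / count

# Toy 3-dimensional "embeddings" (made up for the example).
toy_map = {
    "i": np.array([1.0, 0.0, 0.0]),
    "love": np.array([0.0, 1.0, 0.0]),
    "you": np.array([0.0, 0.0, 1.0]),
}

avg = sentence_to_avg_safe("I love you", toy_map)
avg_with_unknown = sentence_to_avg_safe("I love you xyzzy", toy_map)
```

Both calls return the same vector, because the unknown token does not contribute to the sum or to the count.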


#### Model

You now have all the pieces to finish implementing the model() function.
After using sentence_to_avg() you need to:

• Pass the average through forward propagation
• Compute the cost
• Backpropagate to update the softmax parameters

Exercise: Implement the model() function described in Figure (2).

• The equations you need to implement in the forward pass and to compute the cross-entropy cost are below:
• The variable $$Y_{oh}$$ (“Y one hot”) is the one-hot encoding of the output labels.

$$z^{(i)} = W \cdot avg^{(i)} + b$$

$$a^{(i)} = \mathrm{softmax}(z^{(i)})$$

$$\mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Y_{oh,k}^{(i)} \log(a^{(i)}_k)$$

Note: It is possible to come up with a more efficient vectorized implementation. For now, let’s use nested for loops to better understand the algorithm and to make debugging easier.

We provided the function softmax(), which was imported earlier.
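
The forward pass, cost, and parameter update can be sketched as follows. This is an illustration under stated assumptions, not the graded `model()` function: it takes precomputed sentence averages as input, the name `train_softmax_on_averages` and the toy data are hypothetical, and the local `softmax` stands in for the provided one. The update uses the standard gradient of the cross-entropy loss with respect to $$z^{(i)}$$, which is $$a^{(i)} - Y_{oh}^{(i)}$$.

```python
import numpy as np

def softmax(z):
    """Stand-in for the provided softmax(); works on a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def train_softmax_on_averages(avgs, Y, n_y, learning_rate=0.1,
                              num_iterations=200, seed=0):
    """avgs: (m, n_h) array of sentence averages; Y: (m,) integer labels."""
    rng = np.random.default_rng(seed)
    m, n_h = avgs.shape
    W = rng.standard_normal((n_y, n_h)) / np.sqrt(n_h)
    b = np.zeros(n_y)
    Y_oh = np.eye(n_y)[Y]                      # one-hot labels, shape (m, n_y)
    costs = []

    for _ in range(num_iterations):            # loop over passes through the data
        total_cost = 0.0
        for i in range(m):                     # loop over training examples
            z = W @ avgs[i] + b                # forward pass
            a = softmax(z)
            total_cost += -np.sum(Y_oh[i] * np.log(a))   # cross-entropy loss

            dz = a - Y_oh[i]                   # gradient of the loss w.r.t. z
            W -= learning_rate * np.outer(dz, avgs[i])   # stochastic gradient step
            b -= learning_rate * dz
        costs.append(total_cost / m)
    return W, b, costs

# Toy data: three perfectly separable "sentence averages", one per class.
toy_avgs = np.eye(3)
toy_Y = np.array([0, 1, 2])
W, b, costs = train_softmax_on_averages(toy_avgs, toy_Y, n_y=3)
predictions = np.argmax(toy_avgs @ W.T + b, axis=1)
```

On this toy problem the average cost decreases over the iterations and each example ends up assigned to its own class.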

### 2.1 – Overview of the model

Here is the Emojifier-V2 you will implement:

### 2.3 – The Embedding layer

• In Keras, the embedding matrix is represented as a “layer”.
• The embedding matrix maps word indices to embedding vectors.
• The word indices are positive integers.
• The embedding vectors are dense vectors of fixed size.
• When we say a vector is “dense”, in this context, it means that most of the values are non-zero. As a counter-example, a one-hot encoded vector is not “dense.”
• The embedding matrix can be derived in two ways:
    • Training a model to derive the embeddings from scratch.
    • Using a pretrained embedding.
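
To make the contrast between dense and one-hot vectors concrete, here is a small NumPy sketch with made-up sizes and random values. It shows that looking up a row of the embedding matrix gives the same result as multiplying a one-hot vector by that matrix, which is why an embedding layer can be implemented as a plain lookup.

```python
import numpy as np

vocab_size, emb_dim = 6, 4                         # toy sizes for illustration
rng = np.random.default_rng(0)
E = rng.standard_normal((vocab_size, emb_dim))     # toy embedding matrix, one row per word index

word_index = 3

# Sparse representation: a single 1, zeros everywhere else.
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

via_matmul = one_hot @ E        # shape (emb_dim,)
via_lookup = E[word_index]      # same dense vector, without the multiplication
```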

#### Using and updating pre-trained embeddings

• In this part, you will learn how to create an Embedding() layer in Keras.
• You will initialize the Embedding layer with the GloVe 50-dimensional vectors.
• In the code below, we’ll show you how Keras allows you to either train this layer or leave it fixed.
• Because our training set is quite small, we will leave the GloVe embeddings fixed instead of updating them.

#### Inputs and outputs to the embedding layer

• The Embedding() layer’s input is an integer matrix of size (batch size, max input length).
• This input corresponds to sentences converted into lists of indices (integers).
• The largest integer (the highest word index) in the input should be no larger than the vocabulary size.
• The embedding layer outputs an array of shape (batch size, max input length, dimension of word vectors).
• The figure shows the propagation of two example sentences through the embedding layer.
• Both examples have been zero-padded to a length of max_len=5.
• The word embeddings are 50 units in length.
• The final dimension of the representation is (2,max_len,50).
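
The shape bookkeeping can be checked with plain NumPy indexing. The sketch below is not the Keras layer itself: the index values, the 10-row matrix, and the choice to keep row 0 as an all-zero padding row are assumptions made for the example.

```python
import numpy as np

max_len, emb_dim, vocab_len = 5, 50, 10            # toy vocabulary of 10 rows
rng = np.random.default_rng(0)
emb_matrix = rng.standard_normal((vocab_len, emb_dim))
emb_matrix[0] = 0.0                                # assumption: index 0 is used only for padding

# Two sentences as word indices, zero-padded to max_len.
X_indices = np.array([
    [3, 7, 2, 0, 0],       # a three-word sentence
    [5, 1, 0, 0, 0],       # a two-word sentence
])

# Integer-array indexing replaces every index with its embedding row.
embedded = emb_matrix[X_indices]                   # shape (2, max_len, emb_dim)
```

Each padded position maps to the all-zero row, so the output has shape (2, 5, 50) regardless of the original sentence lengths.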

    # GRADED FUNCTION: sentences_to_indices

    def sentences_to_indices(X, word_to_index, max_len):
        """
        Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
        The output shape should be such that it can be given to Embedding() (described in Figure 4).

        Arguments:
        X -- array of sentences (strings), of shape (m, 1)
        word_to_index -- a dictionary mapping each word to its index
        max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.

        Returns:
        X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
        """

        m = X.shape[0]                                   # number of training examples

        ### START CODE HERE ###
        # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
        X_indices = np.zeros((m, max_len))

        for i in range(m):                               # loop over training examples

            # Convert the ith training sentence to lower case and split it into words. You should get a list of words.
            sentence_words = X[i].lower().split()

            # Initialize j to 0
            j = 0

            # Loop over the words of sentence_words
            for w in sentence_words:
                # Set the (i,j)th entry of X_indices to the index of the correct word.
                X_indices[i, j] = word_to_index[w]
                # Increment j to j + 1
                j = j + 1

        ### END CODE HERE ###

        return X_indices
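
A quick way to sanity-check the function is to run it on a tiny vocabulary. The block below restates the function compactly so that it runs on its own; `toy_word_to_index` and the three sentences are made up for the illustration and are not the GloVe vocabulary.

```python
import numpy as np

def sentences_to_indices(X, word_to_index, max_len):
    """Compact restatement of the function above: words -> indices, zero-padded to max_len."""
    X_indices = np.zeros((X.shape[0], max_len))
    for i, sentence in enumerate(X):
        for j, w in enumerate(sentence.lower().split()):
            X_indices[i, j] = word_to_index[w]
    return X_indices

# Hypothetical vocabulary; indices start at 1 so that 0 can mean "padding".
toy_word_to_index = {"funny": 1, "good": 2, "i": 3, "lets": 4,
                     "lol": 5, "love": 6, "play": 7, "you": 8}

X1 = np.array(["funny lol", "lets play", "I love you"])
X1_indices = sentences_to_indices(X1, toy_word_to_index, max_len=5)
```

Every row has length `max_len`; positions past the end of a sentence keep their initial value of 0.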


#### Build embedding layer

• Let’s build the Embedding() layer in Keras, using pre-trained word vectors.
• The embedding layer takes as input a list of word indices.
• sentences_to_indices() creates these word indices.
• The embedding layer will return the word embeddings for a sentence.

Exercise: Implement pretrained_embedding_layer() with these steps:

1. Initialize the embedding matrix as a numpy array of zeros.
    • The embedding matrix has a row for each unique word in the vocabulary.
    • There is one additional row to handle “unknown” words.
    • So vocab_len is the number of unique words plus one.
    • Each row will store the vector representation of one word.
    • For example, one row may be 50 positions long if using GloVe word vectors.
    • In the code below, emb_dim represents the length of a word embedding.
2. Fill in each row of the embedding matrix with the vector representation of a word.
    • Each word in word_to_index is a string.
    • word_to_vec_map is a dictionary where the keys are strings and the values are the word vectors.
3. Define the Keras embedding layer.
    • Use Embedding().
    • The input dimension is equal to the vocabulary length (number of unique words plus one).
    • The output dimension is equal to the number of positions in a word embedding.
    • Make this layer’s embeddings fixed.
        • If you were to set trainable = True, the optimization algorithm would be allowed to modify the values of the word embeddings.
        • In this case, we don’t want the model to modify the word embeddings.
4. Set the embedding weights to be equal to the embedding matrix.
    • Note that this part of the code is already completed for you and does not need to be modified.

    # GRADED FUNCTION: pretrained_embedding_layer

    def pretrained_embedding_layer(word_to_vec_map, word_to_index):
        """
        Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.

        Arguments:
        word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

        Returns:
        embedding_layer -- pretrained Keras layer instance
        """

        vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
        emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)

        ### START CODE HERE ###
        # Step 1
        # Initialize the embedding matrix as a numpy array of zeros.
        # See instructions above to choose the correct shape.
        emb_matrix = np.zeros((vocab_len, emb_dim))

        # Step 2
        # Set each row "idx" of the embedding matrix to be
        # the word vector representation of the idx'th word of the vocabulary
        for word, idx in word_to_index.items():
            emb_matrix[idx, :] = word_to_vec_map[word]

        # Step 3
        # Define Keras embedding layer with the correct input and output sizes
        # Make it non-trainable.
        embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
        ### END CODE HERE ###

        # Step 4 (already done for you; please do not modify)
        # Build the embedding layer, it is required before setting the weights of the embedding layer.
        embedding_layer.build((None,)) # Do not modify the "None".  This line of code is complete as-is.

        # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
        embedding_layer.set_weights([emb_matrix])

        return embedding_layer
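
Steps 1 and 2 can be tried in isolation with NumPy. The sketch below uses a hypothetical three-word vocabulary with 4-dimensional vectors, and assumes the word indices start at 1 (consistent with the indices being positive integers), so the largest index equals `len(word_to_index)` and the matrix needs one extra row.

```python
import numpy as np

# Hypothetical toy vocabulary; the real one comes from the GloVe files.
toy_word_to_index = {"coffee": 1, "love": 2, "you": 3}
toy_word_to_vec_map = {
    "coffee": np.array([0.1, 0.2, 0.3, 0.4]),
    "love":   np.array([0.5, 0.6, 0.7, 0.8]),
    "you":    np.array([0.9, 1.0, 1.1, 1.2]),
}

vocab_len = len(toy_word_to_index) + 1                        # rows 0..3
emb_dim = next(iter(toy_word_to_vec_map.values())).shape[0]   # 4 in this toy example

# Step 1: matrix of zeros with one row per index.
emb_matrix = np.zeros((vocab_len, emb_dim))

# Step 2: copy each word vector into the row given by its index.
for word, idx in toy_word_to_index.items():
    emb_matrix[idx, :] = toy_word_to_vec_map[word]
```

In this setup row 0 is never written and stays all zeros, which matches the zero-padding produced by sentences_to_indices().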


### 2.4 – Building the Emojifier-V2

Let’s now build the Emojifier-V2 model.

• You feed the embedding layer’s output to an LSTM network.

    # GRADED FUNCTION: Emojify_V2

    def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
        """
        Function creating the Emojify-v2 model's graph.

        Arguments:
        input_shape -- shape of the input, usually (max_len,)
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

        Returns:
        model -- a model instance in Keras
        """

        ### START CODE HERE ###
        # Define sentence_indices as the input of the graph.
        # It should be of shape input_shape and dtype 'int32' (as it contains indices, which are integers).
        sentence_indices = Input(input_shape, dtype='int32')

        # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
        embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

        # Propagate sentence_indices through your embedding layer
        # (See additional hints in the instructions).
        embeddings = embedding_layer(sentence_indices)

        # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
        # The returned output should be a batch of sequences.
        X = LSTM(128, return_sequences=True)(embeddings)
        # Add dropout with a probability of 0.5
        X = Dropout(0.5)(X)
        # Propagate X through another LSTM layer with 128-dimensional hidden state
        # The returned output should be a single hidden state, not a batch of sequences.
        X = LSTM(128, return_sequences=False)(X)
        # Add dropout with a probability of 0.5
        X = Dropout(0.5)(X)
        # Propagate X through a Dense layer with 5 units
        X = Dense(5)(X)
        # Add a softmax activation
        X = Activation('softmax')(X)

        # Create Model instance which converts sentence_indices into X.
        model = Model(inputs=sentence_indices, outputs=X)

        ### END CODE HERE ###

        return model
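
As a rough check on the size of this architecture, the number of parameters per layer can be worked out by hand. The sketch below assumes standard LSTM cells with biases (four gates, each with input weights, recurrent weights, and a bias) and an embedding matrix with 400,001 rows; the helper names are hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units

emb_dim = 50
vocab_len = 400_001      # assumption: number of rows in the embedding matrix

embedding_params = vocab_len * emb_dim         # frozen, not updated during training
lstm1_params = lstm_param_count(emb_dim, 128)  # first LSTM reads 50-dimensional embeddings
lstm2_params = lstm_param_count(128, 128)      # second LSTM reads the first LSTM's outputs
dense_params = dense_param_count(128, 5)       # 5 emoji classes

trainable_params = lstm1_params + lstm2_params + dense_params
```

Under these assumptions the two LSTM layers and the Dense layer hold 223,877 trainable parameters, while the frozen embedding accounts for 20,000,050. Dropout and the softmax activation add no parameters.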