Contents
- What is BERT?
- Model architecture
- Data Preparation
- Visualization
- Load BERT model
- Tokenize vocabulary
- Preprocess data
- Create custom model
- Call model
- Model summary
- Plot model
- Compile model
- Train the model
- Print history
- Visualize loss and accuracy
- Evaluate the model
- Predict
- Compute classification report
- Predict intent with new sentences
Bidirectional Encoder Representations from Transformers (BERT) is a technique for NLP (Natural Language Processing) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google.
BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful.
BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). On October 25, 2019, Google Search announced that they had started applying BERT models for English language search queries within the US. On December 9, 2019, it was reported that BERT had been adopted by Google Search for over 70 languages.
BERT is a multi-layer bidirectional Transformer encoder. Two models are introduced in the paper, which denotes the number of layers (i.e., Transformer blocks) as L, the hidden size as H, and the number of self-attention heads as A.
BERT Base – L=12, H=768, A=12, Total Parameters=110M
BERT Large – L=24, H=1024, A=16, Total Parameters=340M
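As a sanity check, these totals can be approximated from L and H plus the 30,522-token WordPiece vocabulary of the released English models. The sketch below is an assumption based on the standard Transformer encoder layout (Q/K/V and output projections plus a feed-forward block with inner size 4H); it ignores layer norms and the pooler, which is why it lands slightly below the quoted figures.
#rough BERT parameter count from L (layers), H (hidden size), V (vocab size)
def approx_bert_params(L, H, V = 30522, max_pos = 512):
    embeddings = V * H + max_pos * H + 2 * H        #token + position + segment embeddings
    attention = 4 * (H * H + H)                     #Q, K, V and output projections
    ffn = (H * 4 * H + 4 * H) + (4 * H * H + H)     #two dense layers with inner size 4H
    return embeddings + L * (attention + ffn)
print(approx_bert_params(12, 768))   #~109M for BERT Base
print(approx_bert_params(24, 1024))  #~334M for BERT Large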
import pandas as pd
import numpy as np
!ls /home/jupyter-thakur/xv-shared-folders/training/input/intent-recognition
inputFolder = '/home/jupyter-thakur/xv-shared-folders/training/input/intent-recognition/'
inputFolder
pandas.read_csv() reads a comma-separated values (CSV) file into a DataFrame.
train = pd.read_csv(inputFolder + "train.csv")
valid = pd.read_csv(inputFolder + "valid.csv")
test = pd.read_csv(inputFolder + "test.csv")
print(f"train: {train.shape} \n{train.head()}" )
print(f"\nvalid: {valid.shape} \n{valid.head()}" )
print(f"\ntest: {test.shape} \n{test.head()}" )
train = pd.concat([train, valid], ignore_index = True)
#print(train.shape)
print(f"train: {train.shape} \n{train.head()}" )
pandas.unique(values) - Hash table-based unique. Uniques are returned in order of appearance. This does NOT sort.
#print the unique intents
train.intent.unique()
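A tiny illustration of the order-of-appearance behavior, with made-up values:
import pandas as pd
print(pd.unique(pd.Series([2, 1, 3, 3])))  #[2 1 3] - first-seen order, not sorted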
DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False) - Return a Series containing counts of unique rows in the DataFrame.
#print the count of intent
train.intent.value_counts()
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
plt.figure(figsize = (12, 8))
chart = sns.countplot(x = 'intent', data = train, palette='Set1')
chart.set_xticklabels(chart.get_xticklabels(), rotation = 30, horizontalalignment='right', fontweight='light', fontsize='medium')
chart.set_title('Intent Distribution', fontsize = 18)
chart.set_xlabel('Intents', fontsize = 14)
chart.set_ylabel('Counts', fontsize = 14)
plt.show()
plt.figure(figsize = (12, 8))
data = train.intent.value_counts()
explode = (0.1, 0, 0, 0, 0, 0, 0)
ax = data.plot.pie(autopct = '%1.1f%%', labels = data.index, explode = explode, fontsize = 14)
ax.set_title('Intent Distribution', fontsize = 18)
plt.axis('off')
ax.legend(labels = data.index, loc = "upper left", fontsize = 14, fancybox = True,
          labelspacing = 1, framealpha = 1, shadow = True, borderpad = 1)
plt.show()
We can download the BERT model and all of its related files from the links below.
We are going to use the 12/768 (BERT-Base) model, which can be downloaded from this link: https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip
BERT-Base Uncased has 12 layers, 768 hidden units, 12 heads, and 110M parameters. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith. The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task.
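Once the vocab file and tokenizer are set up later in this section, the Uncased behavior is easy to verify. A minimal sketch, assuming vocab_file points at the uncased vocab.txt defined below (FullTokenizer lowercases by default via its do_lower_case argument):
from bert.tokenization.bert_tokenization import FullTokenizer
uncased_tokenizer = FullTokenizer(vocab_file, do_lower_case = True)
print(uncased_tokenizer.tokenize("John Smith"))  #['john', 'smith']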
We can download all 24 released BERT models from here: https://github.com/google-research/bert
We can also get them from TensorFlow Hub: https://tfhub.dev/google/collections/bert/1
!ls /home/jupyter-thakur/xv-shared-folders/training/input/uncased_L-12_H-768_A-12
!cat /home/jupyter-thakur/xv-shared-folders/training/input/uncased_L-12_H-768_A-12/bert_config.json
import os
modelInputFolder = '/home/jupyter-thakur/xv-shared-folders/training/input/'
bert_model_name="uncased_L-12_H-768_A-12"
bert_ckpt_dir = os.path.join(modelInputFolder, bert_model_name)
bert_ckpt_file = os.path.join(bert_ckpt_dir, "bert_model.ckpt")
bert_config_file = os.path.join(bert_ckpt_dir, "bert_config.json")
print(bert_ckpt_dir)
print(bert_ckpt_file)
print(bert_config_file)
!head ~/xv-shared-folders/training/input/uncased_L-12_H-768_A-12/vocab.txt
vocab_file = os.path.join(bert_ckpt_dir, "vocab.txt")
print(vocab_file)
from bert.tokenization.bert_tokenization import FullTokenizer
Tokenization is the process of dividing text into pieces such as words, keywords, phrases, symbols and other elements. These pieces are called tokens.
tokenizer = FullTokenizer(vocab_file)
print(tokenizer)
tokenizer.convert_tokens_to_ids converts a sequence of tokens into a sequence of ids (integers), using the tokenizer's vocabulary.
tokens = tokenizer.tokenize("Hello, How are you?")
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)
print(token_ids)
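Note that WordPiece also splits words that are not in the vocabulary into sub-tokens prefixed with "##"; the exact split depends on vocab.txt, so the output in the comment is indicative:
print(tokenizer.tokenize("embeddings"))  #e.g. ['em', '##bed', '##ding', '##s']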
classes = train.intent.unique().tolist()
print(classes)
class IntentDataManager:
    def __init__(self):
        print("IntentDataManager class is called")
        pass
    pass
#class IntentDataManager:
data = IntentDataManager()
print(data)
Pass train, test, tokenizer, classes, and max_seq_len as inputs to the IntentDataManager class.
max_seq_len = 192
class IntentDataManager:
    def __init__(self, train, test, tokenizer: FullTokenizer, classes, max_seq_len):
        #declare tokenizer and classes as class members
        self.tokenizer = tokenizer
        self.classes = classes
        print(f"train shape: {train.shape}")
        print(f"test shape: {test.shape}")
        print(f"tokenizer: {self.tokenizer}")
        print(f"intent_classes: {self.classes}")
        print(f"max_seq_len: {max_seq_len}")
        pass
    pass
#class IntentDataManager:
data = IntentDataManager(train, test, tokenizer, classes, max_seq_len)
print(data)
len_of_text = train['text'].str.len()
print(len_of_text)
The pandas sort_values() function sorts a Series or DataFrame in ascending or descending order by the given values.
#sort the text lengths
sorted_indexes = len_of_text.sort_values()
print(sorted_indexes)
#get the indexes that order the texts by length
sorted_indexes = len_of_text.sort_values().index
print(sorted_indexes)
#the lambda takes a dataframe as input,
#sorts it by the length of the text column,
#and returns the dataframe reindexed in that order
sort_by_length_text = lambda input_df: input_df.reindex(
    input_df['text'].str.len().sort_values().index
)
print(sort_by_length_text)
class IntentDataManager:
    def __init__(self, train, test, tokenizer: FullTokenizer, classes, max_seq_len):
        #declare tokenizer and classes as class members
        self.tokenizer = tokenizer
        self.classes = classes
        '''
        print(f"train shape: {train.shape}")
        print(f"test shape: {test.shape}")
        print(f"tokenizer: {self.tokenizer}")
        print(f"intent_classes: {self.classes}")
        print(f"max_seq_len: {max_seq_len}")
        '''
        #sort train and test data by length of text
        train, test = map(sort_by_length_text, [train, test])
        print(f"train shape: {train.shape} \n\n {train.head()}")
        print(f"\n\ntest shape: {test.shape} \n\n {test.head()}")
        pass
    pass
#class IntentDataManager:
data = IntentDataManager(train, test, tokenizer, classes, max_seq_len)
print(data)
To use a pre-trained BERT model, we need to convert the input data into an appropriate format so that each sentence can be sent to the pre-trained model to obtain the corresponding embedding.
BERT embeddings are trained with two training tasks:
1. Classification Task: to determine which category the input sentence should fall into
2. Next Sentence Prediction Task: to determine if the second sentence naturally follows the first sentence.
The [CLS] and [SEP] Tokens:
For the classification task, a single vector representing the whole input sentence needs to be fed to a classifier. In BERT, the hidden state of the first token is taken to represent the whole sentence. To achieve this, an additional token has to be added manually to the input sentence. The token [CLS] is chosen for this purpose.
In the "next sentence prediction" task, we need to inform the model where does the first sentence end, and where does the second sentence begin. Hence, another artificial token, [SEP], is introduced. If we are trying to train a classifier, each input sample will contain only one sentence (or a single text input). In that case, the [SEP] token will be added to the end of the input text.
In summary, to preprocess the input text data, the first thing we will have to do is to add the [CLS] token at the beginning, and the [SEP] token at the end of each input text.
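Before wiring this into a class, here is a minimal sketch of these two steps on a single made-up sentence, using the tokenizer loaded earlier:
text = "Play a party song"
tokens = ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]
print(tokens)                                   #e.g. ['[CLS]', 'play', 'a', 'party', 'song', '[SEP]']
print(tokenizer.convert_tokens_to_ids(tokens))  #the integer ids BERT actually consumes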
class IntentDataManager:
    def __init__(self, train, test, tokenizer: FullTokenizer, classes, max_seq_len):
        #declare tokenizer and classes as class members
        self.tokenizer = tokenizer
        self.classes = classes
        self.max_seq_len = 0
        #sort train and test data by length of text
        train, test = map(sort_by_length_text, [train, test])
        #call preprocessData function
        (train_X, train_y), (test_X, test_y) = map(self.preprocessData, [train, test])
        print(f"train_X shape: {train_X.shape}")
        print(f"train_y shape: {train_y.shape}")
        print(f"\ntrain_X: \n{train_X[:5]}")
        print(f"\ntrain_y: \n{train_y[:5]}")
        print(f"test_X shape: {test_X.shape}")
        print(f"test_y shape: {test_y.shape}")
        print(f"\ntest_X: \n{test_X[:5]}")
        print(f"\ntest_y: \n{test_y[:5]}")
        pass

    def preprocessData(self, df):
        x, y = [], []
        for idx, row in df[:5].iterrows():
            text = row['text']
            label = row['intent']
            #convert text to tokens
            tokens = self.tokenizer.tokenize(text)
            tokens = ["[CLS]"] + tokens + ["[SEP]"]
            print(f"tokens = {tokens}")
            #convert tokens to ids
            token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
            print(f"token_ids = {token_ids}")
            #append token_ids to x
            x.append(token_ids)
            #track the maximum sequence length
            self.max_seq_len = max(self.max_seq_len, len(token_ids))
            print(f"max_seq_len = {self.max_seq_len}")
            print(f"classes = {self.classes}")
            print(f"label = {label}")
            #get index of class label
            class_label_index = self.classes.index(label)
            print(f"class_label_index = {class_label_index}")
            #append index of class label to y
            y.append(class_label_index)
            pass
        arrX = np.array(x)
        arrY = np.array(y)
        print(f"\narrX = {arrX}")
        print(f"\narrY = {arrY}")
        return arrX, arrY
        pass
    pass
#class IntentDataManager:
data = IntentDataManager(train, test, tokenizer, classes, max_seq_len)
print(data)
The BERT model receives a fixed-length sequence as input. If sentences are shorter than the maximum length, we have to pad them with zeros to make up the length.
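The padding itself is simple right-padding with zeros. A minimal sketch with hypothetical token ids and a made-up maximum length:
token_ids = [101, 2377, 2283, 2299, 102]  #hypothetical ids for one short sentence
max_len = 8
padded = token_ids + [0] * (max_len - len(token_ids))
print(padded)  #[101, 2377, 2283, 2299, 102, 0, 0, 0]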
class IntentDataManager:
    def __init__(self, train, test, tokenizer: FullTokenizer, classes, max_seq_len):
        #declare tokenizer and classes as class members
        self.tokenizer = tokenizer
        self.classes = classes
        self.max_seq_len = 0
        #sort train and test data by length of text
        train, test = map(sort_by_length_text, [train, test])
        #call preprocessData function
        (train_X, train_y), (test_X, test_y) = map(self.preprocessData, [train, test])
        '''
        print(f"train_X shape: {train_X.shape}")
        print(f"train_y shape: {train_y.shape}")
        print(f"\ntrain_X: \n{train_X[:5]}")
        print(f"\ntrain_y: \n{train_y[:5]}")
        print(f"test_X shape: {test_X.shape}")
        print(f"test_y shape: {test_y.shape}")
        print(f"\ntest_X: \n{test_X[:5]}")
        print(f"\ntest_y: \n{test_y[:5]}")
        '''
        print(f"\nmax_seq_len = {self.max_seq_len}")
        #pad train_X and test_X to max_seq_len
        train_X = self.padSequences(train_X)
        test_X = self.padSequences(test_X)
        print(f"\ntrain_X: \n{train_X[:5]}")
        print(f"\ntest_X: \n{test_X[:5]}")
        pass

    def preprocessData(self, df):
        x, y = [], []
        for idx, row in df[:5].iterrows():
            text = row['text']
            label = row['intent']
            #convert text to tokens
            tokens = self.tokenizer.tokenize(text)
            tokens = ["[CLS]"] + tokens + ["[SEP]"]
            #convert tokens to ids
            token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
            #append token_ids to x
            x.append(token_ids)
            #track the maximum sequence length
            self.max_seq_len = max(self.max_seq_len, len(token_ids))
            #get index of class label
            class_label_index = self.classes.index(label)
            #append index of class label to y
            y.append(class_label_index)
            pass
        arrX = np.array(x)
        arrY = np.array(y)
        return arrX, arrY
        pass

    def padSequences(self, arr):
        #print("arr", arr)
        newArr = []
        for item in arr:
            #print("item", item)
            #calculate the shortfall of the sequence length
            shortfall = self.max_seq_len - len(item)
            #pad the shortfall with zeros
            item = item + [0] * shortfall
            #print(item)
            newArr.append(item)
            pass
        return np.array(newArr)
        pass
    pass
#class IntentDataManager:
data = IntentDataManager(train, test, tokenizer, classes, max_seq_len)
print(data)
class IntentDataManager:
    def __init__(self, train, test, tokenizer: FullTokenizer, classes, max_seq_len):
        #declare tokenizer and classes as class members
        self.tokenizer = tokenizer
        self.classes = classes
        self.max_seq_len = 0
        #sort train and test data by length of text
        train, test = map(sort_by_length_text, [train, test])
        #call preprocessData function
        (self.train_X, self.train_y), (self.test_X, self.test_y) = map(self.preprocessData, [train, test])
        #cap the observed maximum sequence length at the max_seq_len argument
        self.max_seq_len = min(self.max_seq_len, max_seq_len)
        self.train_X, self.test_X = map(self.padSequences, [self.train_X, self.test_X])
        pass

    def preprocessData(self, df):
        x, y = [], []
        for idx, row in df.iterrows():
            text = row['text']
            label = row['intent']
            #convert text to tokens
            tokens = self.tokenizer.tokenize(text)
            tokens = ["[CLS]"] + tokens + ["[SEP]"]
            #convert tokens to ids
            token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
            #append token_ids to x
            x.append(token_ids)
            #track the maximum sequence length
            self.max_seq_len = max(self.max_seq_len, len(token_ids))
            #get index of class label
            class_label_index = self.classes.index(label)
            #append index of class label to y
            y.append(class_label_index)
            pass
        arrX = np.array(x)
        arrY = np.array(y)
        return arrX, arrY
        pass

    def padSequences(self, arr):
        #print("arr", arr)
        newArr = []
        for item in arr:
            #print("item", item)
            #calculate the shortfall of the sequence length
            shortfall = self.max_seq_len - len(item)
            #pad the shortfall with zeros
            #zerosToAdd = np.zeros(shortfall, dtype = np.int32)
            #newItem = np.append(item, zerosToAdd)
            item = item + [0] * shortfall
            #print(item)
            newArr.append(np.array(item))
            pass
        return np.array(newArr)
        pass
    pass
#class IntentDataManager:
data = IntentDataManager(train, test, tokenizer, classes, max_seq_len)
#print(data)
data.train_X.shape
print(data.train_X[0])
print(data.test_X[0])
import tensorflow as tf
from bert.loader import StockBertConfig, map_stock_config_to_params
from bert import BertModelLayer
from tensorflow import keras
from bert.loader import load_stock_weights
#create customModel function
def customModel():
    print("Custom model")
    pass
model = customModel()
print(model)
#create customModel function
def customModel(max_seq_len,
                bert_config_file,
                bert_ckpt_file):
    #read config file with the special reader tf.io.gfile.GFile
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        #read data as a json string
        customConfig = StockBertConfig.from_json_string(reader.read())
        print(f"customConfig = {customConfig}")
        #load all params for our model
        #if a param is not in customConfig, the default value is used
        bert_params = map_stock_config_to_params(customConfig)
        print(f"\nbert_params = {bert_params}")
        #print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        bert_params.adapter_size = None
        print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        pass
    pass
model = customModel(data.max_seq_len,
                    bert_config_file,
                    bert_ckpt_file)
print(model)
#create customModel function
def customModel(max_seq_len,
                bert_config_file,
                bert_ckpt_file):
    #read config file with the special reader tf.io.gfile.GFile
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        #read data as a json string
        customConfig = StockBertConfig.from_json_string(reader.read())
        print(f"customConfig = {customConfig}")
        #load all params for our model
        #if a param is not in customConfig, the default value is used
        bert_params = map_stock_config_to_params(customConfig)
        print(f"\nbert_params = {bert_params}")
        #print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        bert_params.adapter_size = None
        print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        #create bert layer
        bert_layer = BertModelLayer.from_params(bert_params, name="bert_layer")
        print(f"\nbert_layer = {bert_layer}")
        pass
    pass
model = customModel(data.max_seq_len,
                    bert_config_file,
                    bert_ckpt_file)
print(model)
#create customModel
def customModel(max_seq_len,
                bert_config_file,
                bert_ckpt_file):
    #create input layer
    input_layer = keras.layers.Input(
        shape=(max_seq_len, ),
        dtype='int32',
        name="input_layer")
    #read config file with the special reader tf.io.gfile.GFile
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        #read data as a json string
        customConfig = StockBertConfig.from_json_string(reader.read())
        print(f"customConfig = {customConfig}")
        #load all params for our model
        #if a param is not in customConfig, the default value is used
        bert_params = map_stock_config_to_params(customConfig)
        print(f"\nbert_params = {bert_params}")
        #print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        bert_params.adapter_size = None
        print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        #create bert layer
        bert_layer = BertModelLayer.from_params(bert_params, name="bert_layer")
        print(f"\nbert_layer = {bert_layer}")
        pass
    #process input through bert_layer
    bert_output = bert_layer(input_layer)
    print(f"bert shape = {bert_output.shape}")
    #create model with all layers
    custom_model = keras.Model(inputs = input_layer, outputs = bert_output)
    custom_model.build(input_shape = (None, max_seq_len))
    #load the pre-trained weights into the bert layer
    load_stock_weights(bert_layer, bert_ckpt_file)
    return custom_model
    pass
    pass
model = customModel(data.max_seq_len,
                    bert_config_file,
                    bert_ckpt_file)
print(model)
max_seq_len = 192
#create customModel
def customModel(max_seq_len,
                bert_config_file,
                bert_ckpt_file):
    #create input layer
    input_layer = keras.layers.Input(
        shape=(max_seq_len, ),
        dtype='int32',
        name="input_layer")
    #read config file with the special reader tf.io.gfile.GFile
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        #read data as a json string
        customConfig = StockBertConfig.from_json_string(reader.read())
        print(f"customConfig = {customConfig}")
        #load all params for our model
        #if a param is not in customConfig, the default value is used
        bert_params = map_stock_config_to_params(customConfig)
        print(f"\nbert_params = {bert_params}")
        #print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        bert_params.adapter_size = None
        print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        #create bert layer
        bert_layer = BertModelLayer.from_params(bert_params, name="bert_layer")
        print(f"\nbert_layer = {bert_layer}")
        pass
    #process input through bert_layer
    bert_output = bert_layer(input_layer)
    print(f"bert shape = {bert_output.shape}")
    #take the hidden state of the [CLS] token (first position) as the sentence representation
    hidden_output1 = keras.layers.Lambda(lambda seq: seq[:, 0, :])(bert_output)
    print(f"hidden_output1 = {hidden_output1.shape}")
    #dropout layer 1
    dropout_1 = keras.layers.Dropout(0.5)(hidden_output1)
    print(f"dropout_output1 = {dropout_1.shape}")
    #add hidden layer 2
    hidden_output2 = keras.layers.Dense(units=768, activation="tanh")(dropout_1)
    #print(f"hidden_output2 = {hidden_output2.shape}")
    #dropout layer 2
    dropout_2 = keras.layers.Dropout(0.5)(hidden_output2)
    print(f"dropout_output2 = {dropout_2.shape}")
    #final classification layer, one softmax unit per intent class
    final_output = keras.layers.Dense(units=len(classes), activation="softmax")(dropout_2)
    print(f"final_output = {final_output.shape}")
    #create model with all layers
    model = keras.Model(inputs = input_layer, outputs = final_output)
    model.build(input_shape = (None, max_seq_len))
    #load the pre-trained weights into the bert layer
    load_stock_weights(bert_layer, bert_ckpt_file)
    return model
    pass
    pass
model = customModel(data.max_seq_len,
                    bert_config_file,
                    bert_ckpt_file)
print(model)
#create customModel
def customModel(max_seq_len,
                bert_config_file,
                bert_ckpt_file):
    #create input layer
    input_layer = keras.layers.Input(
        shape=(max_seq_len, ),
        dtype='int32',
        name="input_layer")
    #read config file with the special reader tf.io.gfile.GFile
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        #read data as a json string
        customConfig = StockBertConfig.from_json_string(reader.read())
        #load all params for our model
        #if a param is not in customConfig, the default value is used
        bert_params = map_stock_config_to_params(customConfig)
        #print(f"\nbert_params.adapter_size = {bert_params.adapter_size}")
        bert_params.adapter_size = None
        #create bert layer
        bert_layer = BertModelLayer.from_params(bert_params, name="bert_layer")
        pass
    #process input through bert_layer
    bert_output = bert_layer(input_layer)
    #take the hidden state of the [CLS] token (first position) as the sentence representation
    hidden_output1 = keras.layers.Lambda(lambda seq: seq[:, 0, :])(bert_output)
    #dropout layer 1
    dropout_1 = keras.layers.Dropout(0.5)(hidden_output1)
    #add hidden layer 2
    hidden_output2 = keras.layers.Dense(units=768, activation="tanh")(dropout_1)
    #dropout layer 2
    dropout_2 = keras.layers.Dropout(0.5)(hidden_output2)
    #final classification layer, one softmax unit per intent class
    final_output = keras.layers.Dense(units=len(classes), activation="softmax")(dropout_2)
    #create model with all layers
    model = keras.Model(inputs = input_layer, outputs = final_output)
    model.build(input_shape = (None, max_seq_len))
    #load the pre-trained weights into the bert layer
    load_stock_weights(bert_layer, bert_ckpt_file)
    return model
    pass
    pass
model = customModel(data.max_seq_len, bert_config_file, bert_ckpt_file)
print(model)
model.summary()
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='bert_model.png')
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    # Loss function to minimize; the final Dense layer applies softmax,
    # so the model outputs probabilities, not logits
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    # List of metrics to monitor
    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
)
The fit() function trains the model by slicing the data into batches of size batch_size and repeatedly iterating over the entire dataset for a given number of epochs.
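As a quick illustration of the batching arithmetic (a sketch using the data object built above): with validation_split = 0.1, fit() holds out roughly the last 10% of the samples, and each epoch then runs about ceil(0.9 * N / batch_size) training steps.
import math
n_total = len(data.train_X)
n_train = int(n_total * 0.9)               #samples left after the 10% validation split
steps_per_epoch = math.ceil(n_train / 16)  #batches of size 16 per epoch
print(n_total, n_train, steps_per_epoch)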
x = data.train_X
print(x[0])
y = data.train_y
print(y[:5])
history = model.fit(
    x,
    y,
    validation_split = 0.1,
    batch_size = 16,
    shuffle = True,
    epochs = 5
)
history.history
plt.figure(figsize = (10, 6))
plt.plot(history.history['loss'], label = 'train')
plt.plot(history.history['val_loss'], label = 'validation')
plt.legend(['train', 'validation'])
plt.title('Loss during training')
plt.show();
plt.figure(figsize = (10, 6))
plt.plot(history.history['acc'], label = 'train')
plt.plot(history.history['val_acc'], label = 'validation')
plt.legend(['train', 'validation'])
plt.title('Accuracy during training')
plt.show();
Evaluate the model on the train and test data with evaluate().
train_loss, train_accuracy = model.evaluate(data.train_X, data.train_y)
test_loss, test_accuracy = model.evaluate(data.test_X, data.test_y, batch_size = 16)
print("train_loss, train_accuracy:", train_accuracy)
print("test_loss, test_accuracy:", test_accuracy)
#predict test data
y_pred = model.predict(data.test_X).argmax(axis = -1)
y_pred.shape
y_pred[:10]
for label in y_pred[:10]:
    print(classes[label])
    pass
from sklearn.metrics import classification_report
print(classification_report(data.test_y, y_pred, target_names = classes))
sentences = [
    "Play party song",
    "Dance song",
    "How is weather today"
]
#tokenize sentences
tokens = map(tokenizer.tokenize, sentences)
#add [CLS] and [SEP] Tokens
tokens = map(lambda token: ["[CLS]"] + token + ["[SEP]"], tokens)
#convert each tokens to ids
token_ids = list(map(tokenizer.convert_tokens_to_ids, tokens))
#add padding
token_ids = map(lambda tids: tids + [0] * (data.max_seq_len-len(tids)), token_ids)
token_ids = np.array(list(token_ids))
#predict
predictions = model.predict(token_ids).argmax(axis = -1)
for text, label in zip(sentences, predictions):
    print("Text:", text, "\nIntent:", classes[label])
    print()