POS tagging using the Viterbi algorithm and n-gram models.
You can run this code on Google Colab by clicking this badge:
The dataset consists of a training set and a test set. We create a validation set from the training set to validate our model.
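A minimal sketch of how a validation set could be carved out of the training set. The function name `split_train_validation`, the 10% fraction, and the fixed seed are illustrative assumptions, not values taken from this project.

```python
import random


def split_train_validation(sentences, validation_fraction=0.1, seed=0):
    """Shuffle the training sentences and hold out a fraction for validation.

    `validation_fraction` and `seed` are assumed defaults for illustration.
    """
    shuffled = list(sentences)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    n_val = int(len(shuffled) * validation_fraction)
    # First n_val items become the validation set, the rest stay in training.
    return shuffled[n_val:], shuffled[:n_val]
```

Splitting at the sentence level (rather than the token level) keeps each tag sequence intact, which matters because the transition model depends on neighbouring tags.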
Here is part of the training set frame:
Here is part of the test set frame:
Here is the formula we need:
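As a sketch, assuming the standard HMM tagging decomposition with bigram transitions (the notation here is an assumption and may differ from the project's own formula), the most likely tag sequence for a word sequence W = w_1 ... w_n can be written as:

```latex
\hat{T} = \arg\max_{T} P(T \mid W)
        = \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
        = \arg\max_{T} P(W \mid T)\, P(T)
        \approx \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

P(W) can be dropped because it is the same for every candidate tag sequence. For a trigram model, the last factor would condition on the two previous tags instead of one.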
So, to calculate P(T|W), we need both the emission model and the transition model.
P(W|T) is the emission term, and P(Ti) is the transition term, which is estimated with an n-gram model over the tags.
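The emission and transition terms above can be illustrated with a small bigram tagger. This is a sketch under stated assumptions, not this project's implementation: the names `train_bigram_hmm`, `log_prob`, and `viterbi`, the `<s>` start marker, the add-one smoothing, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence marker


def train_bigram_hmm(tagged_sentences):
    """Count emissions (tag -> word) and transitions (previous tag -> tag)."""
    emission = defaultdict(Counter)
    transition = defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            emission[tag][word] += 1
            transition[prev][tag] += 1
            vocab.add(word)
            prev = tag
    return emission, transition, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log-probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, emission, transition, vocab):
    """Return the highest-scoring tag sequence for `words` under the bigram model."""
    if not words:
        return []
    tags = list(emission)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][t]: best log-score of a tag sequence for words[:i + 1] ending in t
    best = [{t: log_prob(transition[START], t, n_tags)
                + log_prob(emission[t], words[0], n_words) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            def score(p):
                return best[i - 1][p] + log_prob(transition[p], t, n_tags)
            prev = max(tags, key=score)
            best[i][t] = score(prev) + log_prob(emission[t], words[i], n_words)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy data for illustration only (not from the project's dataset).
toy_train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
emission, transition, vocab = train_bigram_hmm(toy_train)
print(viterbi(["the", "cat", "barks"], emission, transition, vocab))
```

Working in log-space avoids numerical underflow on long sentences, since the product of many small probabilities becomes a sum.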
Here are the results for the different n-gram models, so they can be compared:
Unigram:
Accuracy: 25.5772
Precision: 40.7354
Recall: 40.7354
F1-score: 40.7354
Bigram:
Accuracy: 92.4962
Precision: 96.1019
Recall: 96.1019
F1-score: 96.1019
Trigram:
Accuracy: 92.3538
Precision: 96.0249
Recall: 96.0249
F1-score: 96.0249
This project is licensed under the MIT License.