AsadiAhmad/POS-Tagging

POS-Tagging

POS tagging using the Viterbi algorithm and n-gram models.

Tech 🛠️ Languages and Tools:

Python  Jupyter Notebook  Google Colab  Requests  NumPy  Polars  Matplotlib  seaborn

Run the Notebook on Google Colab

You can run this notebook on Google Colab by clicking this badge: Open In Colab

Dataset

The dataset consists of a train set and a test set. We create a validation set from the train set to validate the model.
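As a rough illustration of carving a validation set out of the train set, here is a minimal sketch in plain Python. The function name `train_validation_split`, the 20% fraction, the fixed seed, and the sentence representation (a list of `(word, tag)` pairs) are all assumptions made for this example; the notebook may split the data differently.

```python
import random

def train_validation_split(sentences, validation_fraction=0.1, seed=0):
    """Shuffle the sentences and hold out a fraction of them for validation."""
    shuffled = list(sentences)             # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Toy data (hypothetical): each sentence is a list of (word, tag) pairs.
toy_sentences = [[("word%d" % i, "NOUN")] for i in range(10)]
train_part, validation_part = train_validation_split(toy_sentences, 0.2)
```

Splitting by whole sentences rather than by individual tokens keeps each tag sequence intact, which matters for the transition counts.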

Here is part of the train set frame:

Here is part of the test set frame:

Formula

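These are the formulas we need. In each case we look for the tag sequence $\hat{T}$ that maximizes $P(T|W)$:

$$\hat{T} = \arg\max_T P(T|W) = \arg\max_T \prod_i P(W_i|T_i)\,P(T_i) \quad \text{(Unigram)}$$

$$\hat{T} = \arg\max_T P(T|W) = \arg\max_T \prod_i P(W_i|T_i)\,P(T_i|T_{i-1}) \quad \text{(Bigram)}$$

$$\hat{T} = \arg\max_T P(T|W) = \arg\max_T \prod_i P(W_i|T_i)\,P(T_i|T_{i-1},T_{i-2}) \quad \text{(Trigram)}$$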

So, to maximize P(T|W), we need to compute an emission model and a transition model.
P(W|T) is the emission term. P(T_i), conditioned on the previous one or two tags in the bigram and trigram cases, is the transition term, which is estimated from n-gram counts over the tags.
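To make the emission and transition terms concrete, here is a minimal sketch of a bigram tagger decoded with the Viterbi algorithm, using only the standard library. It is not the notebook's code: the function names, the `<s>` start marker, the add-one smoothing, and the toy sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # assumed start-of-sentence marker

def train_bigram_hmm(tagged_sentences):
    """Count emissions (tag -> word) and transitions (previous tag -> tag)."""
    emission, transition = defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            emission[tag][word] += 1
            transition[previous][tag] += 1
            previous = tag
    return emission, transition

def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing is an assumption here)."""
    total = sum(counts.values())
    return math.log((counts[key] + alpha) / (total + alpha * n_outcomes))

def viterbi(words, emission, transition):
    """Return the most likely tag sequence under the bigram model."""
    if not words:
        return []
    tags = sorted(emission)
    vocab = {w for counts in emission.values() for w in counts}
    n_words, n_tags = len(vocab) + 1, len(tags)  # +1 leaves mass for unknown words
    # best[i][t] = (log score of the best path ending in tag t at position i, backpointer)
    best = [{t: (log_prob(transition[START], t, n_tags)
                 + log_prob(emission[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(transition[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_prob(emission[t], words[i], n_words), prev_tag)
        best.append(column)
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy training data (hypothetical), not taken from the repository's dataset.
toy_train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
emission, transition = train_bigram_hmm(toy_train)
predicted = viterbi(["the", "cat", "barks"], emission, transition)
print(predicted)  # ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences. A trigram version would follow the same pattern, with the dynamic-programming state holding the previous two tags instead of one.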

Conclusion

Here are the evaluation metrics for each n-gram model, so they can be compared:

Unigram:

Accuracy: 25.5772
Precision: 40.7354
Recall: 40.7354
F1-score: 40.7354

Bigram:

Accuracy: 92.4962
Precision: 96.1019
Recall: 96.1019
F1-score: 96.1019

Trigram:

Accuracy: 92.3538
Precision: 96.0249
Recall: 96.0249
F1-score: 96.0249

License

This project is licensed under the MIT License.
