Spam Classification – Bayesian Models in scikit-learn

All of my findings can be viewed in the Jupyter notebook (`Spambase.ipynb`).

### Concepts
- Machine Learning
- Bayesian Models
- Feature Extraction and Selection

### Libraries in use
- Pandas
- Scikit-learn

In this exercise I used scikit-learn to create and train a Bayesian classifier that discerns spam from ham (non-spam emails). I used pandas to import and manage the data, which came from the Spambase dataset in the UCI Machine Learning Repository. The classifier trained on all of the features had an R² score of 0.78. Intent on finding a more reliable model, I tried linear regression and additional Bayesian models with different combinations of features. The most reliable model uses the frequencies of a few of the most significant words plus the frequency of the '!' character. This model achieved an R² score of 0.88.
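The comparison described above (fit a Bayesian classifier on every feature, then on a small subset of word frequencies plus the '!' character frequency, and compare held-out scores) can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: it assumes `GaussianNB` as the Bayesian model (the README does not say which naive Bayes variant was used), a 75/25 stratified train/test split, and a label column named `spam`. The helper `compare_feature_sets`, the chosen word columns, and the synthetic data are all hypothetical stand-ins for the real Spambase file. Note that for scikit-learn classifiers, `score` returns mean accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def compare_feature_sets(df, feature_sets, label_col="spam", seed=0):
    """Hypothetical helper: fit one Gaussian naive Bayes model per feature
    subset and return each model's mean accuracy on a shared test split."""
    train, test = train_test_split(
        df, test_size=0.25, random_state=seed, stratify=df[label_col]
    )
    scores = {}
    for name, cols in feature_sets.items():
        model = GaussianNB()
        model.fit(train[cols], train[label_col])
        # For classifiers, score() is mean accuracy on the held-out rows.
        scores[name] = model.score(test[cols], test[label_col])
    return scores


# Synthetic stand-in for Spambase (NOT the real data): three columns that
# separate the classes (hypothetical picks) and two pure-noise columns.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
informative = ["word_freq_free", "word_freq_remove", "char_freq_!"]
noise = ["noise_1", "noise_2"]
data = {col: np.abs(rng.normal(0.1 + 0.9 * y, 0.3)) for col in informative}
data.update({col: np.abs(rng.normal(0.2, 0.3, size=n)) for col in noise})
data["spam"] = y
df = pd.DataFrame(data)

scores = compare_feature_sets(
    df,
    {
        "all features": informative + noise,
        "selected subset": informative,
        "noise only": noise,
    },
)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

To try it on the real data, one would load `spambase.data` with `pd.read_csv(..., header=None)`, assign the attribute names listed in `spambase.names` plus a label column, and pass the word and character columns of interest as feature subsets.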
