katjackson/spambase

Spam Classification – Bayesian Models in scikit-learn

All of my findings can be viewed in this Jupyter notebook.

### Concepts

- Machine Learning
- Bayesian Models
- Feature Extraction and Selection

### Libraries in use

- Pandas
- Scikit-learn

In this exercise I used scikit-learn to create and train a Bayesian classifier to distinguish spam from ham (non-spam emails). I used pandas to import and manage the data, which came from the UCI Machine Learning Repository. The classifier trained on all of the data had an R2 score of 0.78. Intent on finding a more reliable model, I tried linear regression and additional Bayesian models with different combinations of features. The most reliable model used a few of the most significant word counts and a character count for the '!' symbol; it achieved an R2 score of 0.88.
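The comparison described above (all features versus a hand-picked subset) can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the README does not say which scikit-learn Bayesian classifier was used, so `GaussianNB` is an assumption, and the helper names `score_feature_subset` and `make_toy_frame`, the column names, and the synthetic data are all hypothetical stand-ins for the real Spambase table.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def score_feature_subset(df, feature_cols, label_col="is_spam", seed=0):
    """Train a naive Bayes model on the chosen columns; return its held-out score.

    GaussianNB is an assumed choice. For scikit-learn classifiers,
    `score` reports mean accuracy on the data passed to it.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols],
        df[label_col],
        test_size=0.25,
        random_state=seed,
        stratify=df[label_col],  # keep the spam/ham ratio equal in both splits
    )
    model = GaussianNB().fit(X_train, y_train)
    return model.score(X_test, y_test)


def make_toy_frame(n=400, seed=0):
    """Build a synthetic stand-in table (NOT the real Spambase data)."""
    rng = np.random.default_rng(seed)
    is_spam = rng.integers(0, 2, size=n)
    return pd.DataFrame(
        {
            # Hypothetical columns: two rise with spam, one is pure noise.
            "word_freq_free": rng.poisson(0.5 + 3.0 * is_spam),
            "char_freq_!": rng.poisson(0.5 + 4.0 * is_spam),
            "word_freq_noise": rng.poisson(2.0, size=n),
            "is_spam": is_spam,
        }
    )


if __name__ == "__main__":
    df = make_toy_frame()
    all_cols = ["word_freq_free", "char_freq_!", "word_freq_noise"]
    subset = ["word_freq_free", "char_freq_!"]
    print("all features:", score_feature_subset(df, all_cols))
    print("subset      :", score_feature_subset(df, subset))
```

To try this on the real data, one would replace `make_toy_frame()` with a `pandas.read_csv` call on a local copy of the Spambase file and pass the actual column names; the scores it prints on toy data say nothing about the figures reported above.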

About

Iron Yard homework - machine learning for spam classification
