All of my findings can be viewed in this Jupyter notebook.
###--Concepts--
- Machine Learning
- Bayesian Models
- Feature Extraction and Selection
###--Libraries in use--
- Pandas
- Scikit-learn
In this exercise I used scikit-learn to create and train a Bayesian classifier to distinguish spam from ham (non-spam emails). I used pandas to import and manage the data, which came from the UCI Machine Learning Repository. The classifier trained on all of the data had an R2 score of 0.78. Intent on finding a more reliable model, I tried linear regression and additional Bayesian models with different combinations of features. The most reliable model used a few of the most significant word counts and a character count for the '!' symbol, and it achieved an R2 score of 0.88.
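
As a rough illustration of the comparison described above (not the notebook's actual code), here is a minimal sketch that trains a Bayesian classifier on every feature and then on a small hand-picked subset. Several things are assumptions: the data is a small synthetic stand-in generated with NumPy rather than the UCI data, the column names (`word_freq_free`, `word_freq_money`, `word_freq_the`, `char_freq_!`, `is_spam`) and the helper `fit_and_score` are hypothetical, and `GaussianNB` is only one of several naive Bayes variants in scikit-learn that could have been used.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data set (assumption: NOT the UCI data).
rng = np.random.default_rng(0)
n_rows = 400
is_spam = rng.integers(0, 2, size=n_rows)
df = pd.DataFrame({
    # Hypothetical features: spam rows get larger values on average.
    "word_freq_free": rng.exponential(0.1 + 0.9 * is_spam),
    "word_freq_money": rng.exponential(0.1 + 0.6 * is_spam),
    # Hypothetical uninformative feature: same distribution for both classes.
    "word_freq_the": rng.exponential(1.0, size=n_rows),
    "char_freq_!": rng.exponential(0.1 + 0.8 * is_spam),
    "is_spam": is_spam,
})


def fit_and_score(frame, feature_cols, target_col="is_spam"):
    """Train GaussianNB on the given columns and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        frame[feature_cols],
        frame[target_col],
        test_size=0.25,
        random_state=0,
        stratify=frame[target_col],
    )
    model = GaussianNB().fit(X_train, y_train)
    predictions = model.predict(X_test)
    return {
        "accuracy": model.score(X_test, y_test),  # fraction classified correctly
        "r2": r2_score(y_test, predictions),      # R2 computed on the 0/1 labels
    }


all_features = [col for col in df.columns if col != "is_spam"]
selected_features = ["word_freq_free", "word_freq_money", "char_freq_!"]

scores_all = fit_and_score(df, all_features)
scores_selected = fit_and_score(df, selected_features)
print("all features:     ", scores_all)
print("selected features:", scores_selected)
```

Note that a scikit-learn classifier's own `score` method reports accuracy, so the sketch calls `r2_score` separately to get an R2 value. The numbers it prints come from the synthetic data and say nothing about the scores reported above.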