Twitter user occupation prediction using an n-gram model.
To use the tokenizer:
- import the common_descriptors module (common_descriptors.py)
- instantiate a Tokenizer
- feed it many tweets so that it builds a reasonably sized histogram
- call end_feeding, then use the tokenizer to tokenize
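The feed-then-freeze workflow described in the steps above can be sketched as follows. This is a hypothetical stand-in, not the actual common_descriptors.Tokenizer. Only end_feeding comes from this README. The class name SketchTokenizer, the methods feed and tokenize, the min_count parameter, the <UNK> placeholder, whitespace splitting, and the ngrams helper are all assumptions made for illustration.

```python
from collections import Counter


class SketchTokenizer:
    """Hypothetical stand-in for common_descriptors.Tokenizer (API assumed)."""

    UNK = "<UNK>"  # assumed placeholder for words outside the frozen vocabulary

    def __init__(self, min_count=2):
        self.min_count = min_count  # assumed cutoff for keeping a word
        self.histogram = Counter()  # word -> number of times seen while feeding
        self.vocab = None           # set by end_feeding()

    @staticmethod
    def _split(tweet):
        # Naive lowercase whitespace split (assumption; the real tokenizer may differ).
        return tweet.lower().split()

    def feed(self, tweet):
        # Hypothetical name: accumulate word counts from one tweet.
        if self.vocab is not None:
            raise RuntimeError("cannot feed after end_feeding()")
        self.histogram.update(self._split(tweet))

    def end_feeding(self):
        # Freeze the vocabulary: keep only words seen at least min_count times.
        self.vocab = {w for w, c in self.histogram.items() if c >= self.min_count}

    def tokenize(self, tweet):
        # Hypothetical name: map rare or unseen words to UNK.
        if self.vocab is None:
            raise RuntimeError("call end_feeding() before tokenize()")
        return [w if w in self.vocab else self.UNK for w in self._split(tweet)]


def ngrams(tokens, n):
    # Contiguous n-grams as tuples: the features an n-gram model counts.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tok = SketchTokenizer(min_count=2)
    for tweet in ["I teach math", "I teach kids", "we code all day"]:
        tok.feed(tweet)
    tok.end_feeding()
    tokens = tok.tokenize("I teach python")
    print(tokens)             # ['i', 'teach', '<UNK>']
    print(ngrams(tokens, 2))  # [('i', 'teach'), ('teach', '<UNK>')]
```

Freezing the histogram before tokenizing keeps the vocabulary fixed, so every later tweet is mapped onto the same feature space. That is presumably why the README asks for end_feeding to be called before tokenizing.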