Distributed under the Eclipse Public License, the same as Clojure.

A Storm project experimenting with Vowpal Wabbit and the Twitter Streaming API.

Usage

Be sure to read up on Storm first:

http://storm-project.net/
https://github.com/nathanmarz/storm/
https://github.com/nathanmarz/storm/wiki

This is my Storm playground for experimenting with machine learning in Storm using Vowpal Wabbit.

One stream of bolts predicts sentiment using a Vowpal Wabbit regression model named sent.model.
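To sanity-check a model like sent.model outside of Storm, the Vowpal Wabbit command-line tool can be used directly. The following is a minimal sketch, assuming the `vw` binary is on the PATH. The function names and file arguments are made up for illustration, and the options that were actually used to train sent.model are not recorded in this README.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the vw command line; not part of this repo.

# train_sentiment_model TRAIN_FILE MODEL_FILE
# Fit a model with vw's default settings and save it (-f = final regressor).
train_sentiment_model() {
    vw -d "$1" -f "$2"
}

# predict_sentiment MODEL_FILE DATA_FILE PREDICTIONS_FILE
# Score examples with an existing model: -t disables learning,
# -i loads the model, -p writes one prediction per input line.
predict_sentiment() {
    vw -t -i "$1" -d "$2" -p "$3"
}
```

For example, `predict_sentiment vwtraining/sent.model vwtraining/test.vw predictions.txt` would write one score per test example to predictions.txt.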

Another stream computes whether the user is a bot or a human using an arbitrary calculation (note that this is just an experiment in making bolts compute several functions at once, not an accurate way of detecting bots). Finally, the two streams are aggregated, and the output shows the sentiment together with the bot-or-human label for each user.
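The aggregation step can be pictured as a join on the user name. The sketch below is an offline shell illustration, not the topology's actual bolts: the tab-separated file layout, the tweets-per-day cutoff, and the function names are all assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical offline illustration of the two streams and their join.

# label_bots FILE: FILE has lines "user<TAB>tweets_per_day".
# Prints "user<TAB>bot" or "user<TAB>human" using an arbitrary cutoff.
label_bots() {
    awk -F '\t' -v cutoff=100 \
        '{ print $1 "\t" (($2 + 0 > cutoff) ? "bot" : "human") }' "$1"
}

# join_streams SENTIMENT_FILE LABEL_FILE: both keyed by user in column 1.
# Prints "user<TAB>sentiment<TAB>label" for users present in both files.
join_streams() {
    awk -F '\t' '
        FNR == NR { sentiment[$1] = $2; next }   # first file: remember scores
        $1 in sentiment { print $1 "\t" sentiment[$1] "\t" $2 }
    ' "$1" "$2"
}
```

In the topology itself, the same idea would be expressed as a bolt that receives tuples from both upstream bolts and emits a combined tuple once it has seen both values for a user.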

The aclImdb corpus is used; the original training and test data can be found in ~/storm-ml-play/vwtraining/aclImdb.tar.gz.

A quick and dirty script turned the original training data into input that Vowpal Wabbit will accept; see ~/storm-ml-play/vwtraining/train.vw and test.vw for the training and test data. The sentiment analysis model can be found at ~/storm-ml-play/vwtraining/sent.model.
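For reference, Vowpal Wabbit's plain-text input puts a label first, then a `|`, then the features. The sketch below shows one way such a conversion could look; it is not the script that produced train.vw. It assumes the usual aclImdb layout of pos/ and neg/ directories with one review per .txt file, and the labels 1 and -1 are an arbitrary choice for the example.

```shell
#!/bin/sh
# Hypothetical converter: aclImdb review files -> vw input lines.

# to_vw LABEL FILE: print "LABEL | words..." for one review.
# '|' and ':' have special meaning to vw, so they are replaced with
# spaces; line breaks are flattened so each review is a single line.
to_vw() {
    text=$(tr '|:\r\n' '    ' < "$2" | tr -s ' ' | sed 's/^ //; s/ $//')
    printf '%s | %s\n' "$1" "$text"
}

# convert_split DIR: DIR contains pos/ and neg/ subdirectories.
convert_split() {
    for f in "$1"/pos/*.txt; do
        if [ -f "$f" ]; then to_vw 1 "$f"; fi
    done
    for f in "$1"/neg/*.txt; do
        if [ -f "$f" ]; then to_vw -1 "$f"; fi
    done
}
```

With the archive unpacked, something like `convert_split aclImdb/train > train.vw` would produce one example per review.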

Note: I know there are a lot of bolts, but I am playing around with large vs. small functions to see whether it makes a difference to Storm's speed.

To run on a local cluster:

    lein run -m storm-ml-play.topology/run!
    # OR
    lein run -m storm-ml-play.topology/run! debug false workers 10

To run on a distributed cluster:

    lein uberjar
    # copy jar to nimbus, and then on nimbus:
    bin/storm jar path/to/uberjar.jar storm-ml-play.TopologySubmitter workers 30 debug false

Alternatively, use [storm-deploy](https://github.com/nathanmarz/storm-deploy/wiki).

License

Distributed under the Eclipse Public License, the same as Clojure.
