Some detail of every script can be found in the beginning of the script
This script normalizes Arabic and do some Twitter related normalization e.g. hashtags, numbers, etc.
python normalize_Arabic.pl input_file output_file
Given a text file, calculate frequency of each word in it and replace the least frequent ones (occurring <= 3) with tag
python pruneVocab.py input_file > output
You can also specify your own threshold value. Note that check on threshold value is inclusive of the threshold frequency
python pruneVocab.py input_file 10 > output
python candidatePairs.py file_with_nearest_neighbor