- Takes Wikipedia article titles as arguments and saves their contents into the pages folder.
- Takes RANDOM $number to save $number random Wikipedia article(s).
python .\wikipedia-reader.py "Project I.G.I." "Dota 2"
python .\wikipedia-reader.py RANDOM 5
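The two invocation forms above can be told apart with a few lines of argument handling. This is only a minimal sketch: `parse_args` and its return values are hypothetical names chosen for illustration, not taken from wikipedia-reader.py.

```python
def parse_args(argv):
    """Classify command-line arguments (hypothetical helper).

    Returns ("random", n) for `RANDOM n`, otherwise ("titles", [title, ...]).
    """
    if len(argv) == 2 and argv[0] == "RANDOM" and argv[1].isdigit():
        return "random", int(argv[1])
    # Anything else is treated as a list of article titles.
    return "titles", list(argv)


mode, value = parse_args(["RANDOM", "5"])
titles_mode, titles = parse_args(["Project I.G.I.", "Dota 2"])
```

In a real script the list passed in would be `sys.argv[1:]`.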
Reads the files in the pages directory and, after cleaning and filtering, builds a dict(dict({float, float, list, int})) model. The outer key is the word, the inner key is the document name, and the value holds {TF-IDF, normalized TF-IDF, positional index, count}.
- To switch to boolean retrieval, run the file with the argument boolean; use vector for the vector space model (default).
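As a rough illustration of the word, document name, {TF-IDF, normalized TF-IDF, positional index, count} model, the sketch below builds such a nested dict from already-tokenized documents. The function name `build_index`, the field names, and the weighting scheme (log-scaled term frequency, log10 inverse document frequency, per-document L2 normalization) are all assumptions; the actual retrieval.py may compute these values differently.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Build {word: {doc_name: {...}}} from {doc_name: [token, ...]}.

    Field names and the weighting scheme are assumptions for illustration.
    """
    index = defaultdict(dict)
    # Pass 1: collect positions and counts for every (word, document) pair.
    for name, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            entry = index[tok].setdefault(
                name,
                {"tf_idf": 0.0, "norm_tf_idf": 0.0, "positions": [], "count": 0},
            )
            entry["positions"].append(pos)
            entry["count"] += 1

    # Pass 2: TF-IDF with log-scaled tf and log10 idf (assumed scheme).
    n_docs = len(docs)
    for postings in index.values():
        idf = math.log10(n_docs / len(postings))
        for entry in postings.values():
            entry["tf_idf"] = (1 + math.log10(entry["count"])) * idf

    # Pass 3: divide each weight by its document's vector length (L2 norm).
    lengths = defaultdict(float)
    for postings in index.values():
        for name, entry in postings.items():
            lengths[name] += entry["tf_idf"] ** 2
    for postings in index.values():
        for name, entry in postings.items():
            norm = math.sqrt(lengths[name])
            entry["norm_tf_idf"] = entry["tf_idf"] / norm if norm else 0.0
    return dict(index)


docs = {
    "Dota 2": ["valve", "dota", "valve"],
    "Project I.G.I.": ["innerloop", "game"],
}
index = build_index(docs)
valve_entry = index["valve"]["Dota 2"]  # positions [0, 2], count 2
```

Storing positions alongside the weights lets one structure serve both the vector space model and the positional boolean operators.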
The stopwords and punkt resources from nltk are required.
import nltk
nltk.download(['punkt','stopwords'])
- Takes a query and searches the models created in retrieval.py (retrieval_inverted_index.dict and doc_names.list).
- Also takes keywords (AND, OR, WITH, NEAR and NOT) for boolean retrieval.
- To switch to boolean retrieval, run the file with the argument boolean; use vector for the vector space model (default).
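For the default vector space mode, one simple way to rank documents is to sum the normalized TF-IDF of every query word a document contains. The sketch below assumes that scoring rule and a hand-written toy index; `rank`, the field name `norm_tf_idf`, the weights, and the "Steam" document are all made up for illustration and may not match what the search script does.

```python
from collections import defaultdict


def rank(query, index):
    """Rank documents by the summed normalized TF-IDF of matching query words."""
    scores = defaultdict(float)
    for word in query.lower().split():
        for doc, entry in index.get(word, {}).items():
            scores[doc] += entry["norm_tf_idf"]
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy index with invented weights.
toy_index = {
    "valve": {"Dota 2": {"norm_tf_idf": 0.6}, "Steam": {"norm_tf_idf": 0.8}},
    "dota": {"Dota 2": {"norm_tf_idf": 0.7}},
}
results = rank("valve dota", toy_index)
ranked_docs = [doc for doc, _ in results]
```

Here "Dota 2" ranks above "Steam" because it matches both query words.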
Search: valve dota
Search: valve AND dota
Search: valve OR NOT dota
Search: valve NEAR5 dota
Search: valve WITH dota
Search: valve NOT WITH dota
Search: valve NOT NEAR10 dota
Search: \exit
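A query such as valve NEAR5 dota can be answered from the positional index alone. The sketch below assumes NEARk means "both words occur in the document at most k positions apart"; the function names and the toy positions are hypothetical. How WITH differs from NEAR is not stated here, so it is left out.

```python
def near(positions_a, positions_b, k):
    """True if some occurrence of a lies within k positions of some occurrence of b."""
    return any(abs(pa - pb) <= k for pa in positions_a for pb in positions_b)


def docs_near(index, a, b, k):
    """Documents containing both words at most k positions apart."""
    postings_a = index.get(a, {})
    postings_b = index.get(b, {})
    # Only documents that contain both words can match.
    common = postings_a.keys() & postings_b.keys()
    return {doc for doc in common if near(postings_a[doc], postings_b[doc], k)}


# Toy positional index: word -> document -> word positions (invented values).
toy_positions = {
    "valve": {"Dota 2": [0, 40], "Steam": [3]},
    "dota": {"Dota 2": [4], "Steam": [30]},
}
matches = docs_near(toy_positions, "valve", "dota", 5)
```

With these positions only "Dota 2" matches, since "valve" at position 0 and "dota" at position 4 are four words apart.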