This is a simple Python project that generates word clouds from a PDF or text file. It relies on pymupdf for PDF processing, nltk for text preprocessing, and wordcloud for generating the word cloud image. The GUI version uses customtkinter and related packages.
The project script performs the following tasks:
- Extracts text from a PDF or a text file.
- Preprocesses the text by tokenizing, lemmatizing, and removing stopwords.
- Generates a word cloud image using the processed text.
- Saves the word cloud image with a timestamp.
- Provides a GUI for easy interaction and word cloud generation.
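The non-GUI tasks listed above can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: the function names, the tiny STOPWORDS set, and the output file name pattern are all assumptions, and a regular expression stands in for the NLTK tokenizing and lemmatizing steps. The PDF branch assumes a recent pymupdf release that can be imported under that name.

```python
import re
from collections import Counter
from collections.abc import Iterable
from datetime import datetime
from pathlib import Path

# Tiny illustrative stopword set; the project relies on NLTK's per-language lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "from"}


def extract_text(path: Path) -> str:
    """Return the text of a PDF (via pymupdf) or of a plain-text file."""
    if path.suffix.lower() == ".pdf":
        import pymupdf  # imported lazily so .txt input works without it

        with pymupdf.open(str(path)) as doc:
            return "\n".join(page.get_text() for page in doc)
    return path.read_text(encoding="utf-8")


def word_frequencies(text: str, exclude: Iterable[str] = ()) -> Counter:
    """Lowercase the text, keep alphabetic tokens, drop stopwords and excluded words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    skip = STOPWORDS | {word.lower() for word in exclude}
    return Counter(token for token in tokens if token not in skip)


def timestamped_name(prefix: str = "wordcloud", ext: str = ".png") -> str:
    """Build an output file name such as wordcloud_20240131_153000.png."""
    return f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}{ext}"
```

The resulting counts could then be handed to the wordcloud library, for example through WordCloud.generate_from_frequencies, and the image saved under the timestamped name.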
The project is structured into four main folders: input, output, fonts, and colors. Do not delete these folders if you are using the command line.
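If one of the folders goes missing, a small helper like the one below could detect and recreate it. This is a hypothetical sketch: the README does not say whether the scripts perform such a check themselves, and the function names are made up for illustration.

```python
from pathlib import Path

# Folder names taken from the project layout described above.
REQUIRED_FOLDERS = ("input", "output", "fonts", "colors")


def missing_folders(project_root: Path) -> list[str]:
    """Return the required folders that do not exist under project_root."""
    return [name for name in REQUIRED_FOLDERS if not (project_root / name).is_dir()]


def ensure_folders(project_root: Path) -> None:
    """Create any required folder that is missing."""
    for name in missing_folders(project_root):
        (project_root / name).mkdir(parents=True)
```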
Miniconda is a minimal installer for conda, a package and environment management system. To install Miniconda, follow these steps:
- Download the Miniconda installer for your operating system from the official Miniconda page.
- Run the installer and follow the instructions to complete the installation.
- Open the Miniconda command line.
- Navigate to the project folder using: cd C:/your/project/path
- Create the conda environment using the environment.yml file: conda env create -f environment.yml
- Activate the conda environment: conda activate wordcloud-env
NOTE: after the first installation, you can skip the step that creates the environment.
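To confirm from Python that the right environment is active before running the scripts, one option is to read the CONDA_DEFAULT_ENV variable that conda sets on activation. The helper below is a hypothetical convenience, not part of the project.

```python
import os
from collections.abc import Mapping

EXPECTED_ENV = "wordcloud-env"  # environment name used in the steps above


def in_expected_env(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when conda reports wordcloud-env as the active environment."""
    return environ.get("CONDA_DEFAULT_ENV") == EXPECTED_ENV
```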
The script requires certain NLTK data to function correctly.
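For orientation, downloading NLTK data usually amounts to one nltk.download call per resource. The sketch below is an assumption about what is needed, not a copy of the project's setup script: the resource names are guessed from the tokenizing, lemmatizing, and stopword steps, and the function name is hypothetical.

```python
# Assumed resource list: tokenizer models, stopword lists, and WordNet for
# lemmatizing. Newer NLTK releases look for punkt_tab rather than punkt.
# The project's own setup script may download a different set.
NLTK_RESOURCES = ("punkt", "punkt_tab", "stopwords", "wordnet")


def download_nltk_data(resources=NLTK_RESOURCES) -> dict:
    """Download each resource and report whether the download succeeded."""
    import nltk  # imported here so the list can be inspected without NLTK installed

    return {name: nltk.download(name, quiet=True) for name in resources}
```

Calling download_nltk_data() returns a mapping from resource name to True or False, which makes a failed download easy to spot.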
Run the script setup_nltk.py to download the necessary NLTK data:
python setup_nltk.py
- Open the Miniconda command line.
- Navigate to the project folder using: cd C:/your/project/path
- Activate the conda environment: conda activate wordcloud-env
Run the script with the desired arguments. For example:
python wordcloud_gen.py --pdf path/to/document.pdf --lang english --exclude word1 word2 word3 --color_file path/to/colors.json --width 1920 --height 1080 --background white --font path/to/font.ttf
Alternatively, starting from a text file:
python wordcloud_gen.py --txt path/to/document.txt --lang english --exclude word1 word2 word3 --color_file path/to/colors.json --width 1920 --height 1080 --background white --font path/to/font.ttf
Simply use the GUI via the command line. Run the word cloud generation script:
python wordcloud_gen_GUI.py
If the project has already been built, use the wordcloud_gen.exe executable.
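The command-line flags used in the examples above could be parsed with argparse along these lines. The flag names come from those example commands; the defaults, the help strings, and the rule that exactly one of --pdf and --txt must be given are assumptions made for this sketch, and the real script may behave differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the example commands."""
    parser = argparse.ArgumentParser(
        description="Generate a word cloud from a PDF or text file."
    )
    # Assumption: exactly one input source is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--pdf", help="path to a PDF file")
    source.add_argument("--txt", help="path to a text file")
    parser.add_argument("--lang", default="english", help="stopword language")
    parser.add_argument("--exclude", nargs="*", default=[], help="extra words to drop")
    parser.add_argument("--color_file", help="path to a JSON color file")
    parser.add_argument("--width", type=int, default=1920, help="image width in pixels")
    parser.add_argument("--height", type=int, default=1080, help="image height in pixels")
    parser.add_argument("--background", default="white", help="background color")
    parser.add_argument("--font", help="path to a .ttf font file")
    return parser
```

Because --exclude takes any number of values, the words that follow it are collected into a list until the next flag begins.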
Note:
A known error when running wordcloud_gen_GUI.py is related to the pymupdf library. To fix the issue, try running the following command:
pip install --upgrade --force-reinstall pymupdf
If you would like to build the project into a standalone .exe file, do the following:
- Show the path of each required package, along with its dependencies, using: pip show packagename
  Check the paths for customtkinter, CTkColorPicker, CTkMessagebox, CTkToolTip, CTkListbox, and wordcloud. Keep these paths, as they will be used later.
- Install pyinstaller: pip install pyinstaller
- Build the application (if you used Miniconda): pyinstaller --noconfirm --onedir --windowed --add-data "C:/Users/user/miniconda3/envs/wordcloud-env/Lib/site-packages/customtkinter;customtkinter/" --add-data ... wordcloud_gen_GUI.py
- Add the --add-data flag for each package (customtkinter, CTkColorPicker, CTkMessagebox, CTkToolTip, CTkListbox, wordcloud) using: --add-data "C:/package/path/packagename;packagename/"
- Run the command and wait for the build. After the build finishes, navigate to the dist folder. Inside the folder named wordcloud_gen_GUI, you will find the executable named wordcloud_gen_GUI.exe.
- For easy access, drag and drop the input, output, and fonts folders outside the main folder (the one where the .exe file is placed).
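Collecting the package paths and assembling the long pyinstaller command by hand is error-prone, so the steps above could also be scripted. The sketch below is one possible approach under stated assumptions: the helper names are hypothetical, the packages are assumed to be importable under the names listed above, and the default ";" separator follows the Windows form used in this README.

```python
import importlib.util
from pathlib import Path

# Packages that need an --add-data entry, as listed in the build steps.
PACKAGES = (
    "customtkinter", "CTkColorPicker", "CTkMessagebox",
    "CTkToolTip", "CTkListbox", "wordcloud",
)


def package_dir(name: str) -> str:
    """Locate an installed package's folder, as an alternative to pip show."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name} is not an installed package")
    return Path(list(spec.submodule_search_locations)[0]).as_posix()


def pyinstaller_command(package_dirs: dict, script: str = "wordcloud_gen_GUI.py",
                        sep: str = ";") -> list:
    """Assemble the pyinstaller arguments with one --add-data flag per package."""
    cmd = ["pyinstaller", "--noconfirm", "--onedir", "--windowed"]
    for name, path in package_dirs.items():
        cmd += ["--add-data", f"{path}{sep}{name}/"]
    cmd.append(script)
    return cmd
```

Inside the activated environment, pyinstaller_command({name: package_dir(name) for name in PACKAGES}) yields an argument list that can be passed to subprocess.run.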
By following these steps, you should be able to generate a word cloud from a PDF or text file with your preferred settings.
This project demonstrates basic text processing and visualization techniques using Python.
Your contributions are appreciated! 😄 If you find this project helpful or like it, don't forget to star it ⭐
Feel free to contribute to this project by providing feedback, suggesting improvements, opening issues or pull requests. For major changes, please open an issue first to discuss what you would like to change.
Check out the Contributors section to see who has contributed to this project!
WordcloudGen by biagio11 is licensed under CC BY-NC-SA 4.0