This application visualizes the internal workings of transformer-based models such as BERT, DistilBERT, RoBERTa, and ALBERT. It provides insights into gradients, attention mechanisms, hidden states, and activation distributions for a given input text.
- Model Selection: Choose from popular transformer models.
- Gradient Analysis: Visualize token-level gradients to understand model sensitivity.
- Attention Heatmap: Explore attention weights across tokens.
- Attention Heads: Compare attention patterns across multiple heads.
- Hidden States Analysis: Examine hidden state activations across layers.
- Activation Distribution: Visualize the distribution of activations across layers.
- `get_model(model_name)`: Loads and caches the selected transformer model
- `main()`: Main application logic and UI components
- `load_model(model_name)`: Loads the specified transformer model and tokenizer
- `process_text(text, model, tokenizer)`: Processes input text through the model
- `plot_gradients(embeddings, inputs, tokenizer)`: Generates gradient analysis plots
- `plot_attention_heatmap(outputs, inputs, tokenizer)`: Creates attention heatmap visualizations
- `plot_attention_heads(outputs, inputs, tokenizer)`: Visualizes multiple attention heads
- `plot_hidden_states(outputs, token_index)`: Analyzes hidden states for specific tokens
- `plot_activation_distribution(outputs)`: Shows activation distribution across layers
- Python 3.8 or higher
- Libraries listed in `requirements.txt`
- Clone the repository:

  ```bash
  git clone https://github.com/dhamu2github/transformer.git
  cd transformer
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the Streamlit app:

  ```bash
  streamlit run app.py
  ```

- Open your browser and navigate to the provided local URL (e.g., `http://localhost:8501`).
- Select a transformer model from the dropdown menu, input text, and click Analyze to generate visualizations.
├── app.py # Main Streamlit application
├── app_utils.py # Utility functions for model operations and plotting
├── requirements.txt # Project dependencies
└── readme.md # This file
Gradient Analysis
- Displays the averaged gradients for each input token
- Helps understand which tokens contribute most to the model's predictions
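The per-token gradient values can be computed along these lines; this is a minimal sketch with a toy scalar "score" standing in for a real model output, so `plot_gradients` in `app_utils.py` may differ in detail:

```python
import torch

# Toy stand-in: batch of 1, 4 tokens, 8-dim embeddings
embeddings = torch.randn(1, 4, 8, requires_grad=True)
score = embeddings.sum()  # real code would use a model output here
score.backward()          # populates embeddings.grad

# One sensitivity value per token: L2 norm of that token's gradient,
# i.e. the gradient information averaged over the embedding dimension
token_grads = embeddings.grad.norm(dim=-1).squeeze(0)
print(token_grads.shape)  # torch.Size([4])
```

The resulting vector, one value per token, is what gets plotted as a bar chart against the tokenized input.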
Attention Heatmap
- Shows the attention weights between tokens
- Visualizes how different words relate to each other in the model's attention mechanism
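Conceptually, the heatmap averages one layer's attention weights over heads to get a single token-by-token matrix. A numpy sketch, assuming Hugging Face's `(batch, heads, seq, seq)` shape convention for `outputs.attentions`:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_heads = 5, 12
# Fake attention weights shaped like outputs.attentions[-1]: (batch, heads, seq, seq)
logits = rng.normal(size=(1, n_heads, seq_len, seq_len))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax rows

# Average over heads to get one (seq, seq) matrix for the heatmap
heatmap = attn[0].mean(axis=0)
# Each row still sums to 1, so cells read as attention fractions
print(np.allclose(heatmap.sum(axis=-1), 1.0))  # True
```

That matrix is then handed to something like `seaborn.heatmap` with the tokens as axis labels.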
Attention Heads
- Provides detailed views of individual attention heads
- Displays 12 different attention patterns across heads
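The per-head view is naturally a matplotlib subplot grid, one small heatmap per head. A sketch assuming 12 heads (BERT-base's count) laid out 3x4; the app's actual grid shape is an assumption:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

attn = np.random.default_rng(1).random((12, 5, 5))  # fake (heads, seq, seq) weights

fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for head, ax in enumerate(axes.flat):
    ax.imshow(attn[head], cmap="viridis")  # one heatmap per head
    ax.set_title(f"Head {head + 1}")
fig.tight_layout()
```

Side-by-side heads make it easy to spot specialization, e.g. heads that attend mostly to the previous token versus heads that focus on a single anchor token.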
Hidden States Analysis
- Allows analysis of hidden state values for specific tokens
- Includes a slider to select different token positions
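For this view, the idea is to stack `outputs.hidden_states` and slice out one token position. A numpy sketch, with shapes assumed from the Hugging Face convention where `hidden_states` holds one `(batch, seq, hidden)` tensor per layer plus the embedding layer:

```python
import numpy as np

n_layers, seq_len, hidden = 13, 5, 768  # bert-base: 12 layers + embeddings
rng = np.random.default_rng(2)
# Stand-in for the stacked hidden states: (layers, seq, hidden)
hidden_states = rng.normal(size=(n_layers, seq_len, hidden))

token_index = 2  # the slider in the app selects this position
# Mean activation of the chosen token at each layer -> one curve to plot
per_layer_mean = hidden_states[:, token_index, :].mean(axis=-1)
print(per_layer_mean.shape)  # (13,)
```

Plotting that vector against layer index shows how a single token's representation evolves through the network.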
Activation Distribution
- Shows the distribution of activation values across different layers
- Helps understand the model's internal representations
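The distribution plot needs one flat vector of activation values per layer, which can then feed a histogram or violin plot. A numpy sketch with fake hidden states standing in for `outputs.hidden_states`:

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in for outputs.hidden_states: a tuple of (batch, seq, hidden) arrays
hidden_states = tuple(rng.normal(size=(1, 5, 64)) for _ in range(4))

# Flatten each layer's activations into one vector per layer
per_layer = [h.flatten() for h in hidden_states]
print([v.size for v in per_layer])  # [320, 320, 320, 320]
```

Comparing these distributions layer by layer can reveal shifts in activation scale or spread as information moves through the model.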
By default, the app loads with the input text: "A cow jumped over the moon." The visualizations below are generated using the model "bert-base-uncased".
- Gradient Analysis
- Attention Heatmap
- Attention Heads Analysis
- Mean Hidden State Analysis
- Activation Distribution
- `bert-base-uncased`
- `distilbert-base-uncased`
- `roberta-base`
- `albert-base-v2`
To add more models, update the `model_name` list in `app.py`:

```python
model_name = st.selectbox(
    "Choose a Transformer Model:",
    ["bert-base-uncased", "distilbert-base-uncased", "roberta-base", "albert-base-v2", "your-custom-model"]
)
```
- I would like to extend my sincere appreciation to Mr. Asif Qamar of SupportVectors Lab for his guidance in cultivating critical thinking skills in AI.
- My thought process was enhanced by claude.ai
- Built with Streamlit
- Uses Hugging Face Transformers
- Visualization powered by Matplotlib and Seaborn