UCD_Noyce

Objective

Our overarching objective is to build systems that investigate biases in social media recommendation algorithms especially in politics. To this end, we are interested in measuring political partisanship, toxicity, and hate speech on online social media platforms. Our goals are to: (1) build reliable data collection infrastructure using platform APIs or web scraping tools for Facebook, Reddit, and YouTube; (2) build machine learning models to classify political ideology and other attributes of social media posts.

Methodology

We will be using NLP techniques with deep learning including fine tuning transformers to train models that specialize in classifying text based on toxicity. Our current focus at the moment is Bidirectional Encoder Representations from Transformers (BERT). BERT is a neural network transformer model that is pre trained to be able to understand and process language. Our goal is to use our own data to apply transfer learning on a BERT model and make it specialize in detecting toxicity and hate speech on online social media platforms. One of our options is to use the Hugging Face Library which provides pretrained models (e.g. BERT) on a large corpora of text. We can then fine-tune the model for our downstream task. Our team will need to run experiments and find the parameters that work best for our purposes. We will be using web scraping techniques and libraries like Selenium to gather our own data from political websites.

Requirements

Install the requirements by running the following commands:

git clone  https://github.com/rezhv/UCD_Noyce.git
pip install -r UCD_Noyce/requirements.txt

Usage

You can run the train the classifier through the command line. First, Clone the repository:

git clone  https://github.com/rezhv/UCD_Noyce.git

Here is a list of possible arguments to pass to the trainer:

python3 ./UCD_Noyce/Noyce/train_classifier.py -h 
usage: train_classifier.py [-h] [-e EPOCHS] [-b BATCH_SIZE] [-l LOGGING_STEPS] 
[-lr LEARNING_RATE] [-v] [-ds DATASET] [-tl TOKENIZATIONLENGTH] [-m MODEL] [-s]

optional arguments:
  -h, --help show this help message and exit
  -e EPOCHS, --epochs EPOCHS Number of Training Epochs
  -b BATCH_SIZE, --batch_size BATCH_SIZE Batch Size
  -l LOGGING_STEPS, --logging_steps LOGGING_STEPS  Number of Steps to Run Evaluation
  -lr LEARNING_RATE, --learning_rate LEARNING_RATE Value of Learning Rate
  -v, --verbose         
  -ds DATASET, --dataset DATASET Path/Name of the Dataset to Train On
  -tl TOKENIZATIONLENGTH, --tokenizationlength TOKENIZATIONLENGTH Tokenization Max Length
  -m MODEL, --model MODEL  Name of the model: bert, xlmr, roberta
  -s, --scheduler Use Learning Rate Scheduler

Run the training script:

python3 ./UCD_Noyce/Noyce/train_classifier.py

Name		Name	Last commit message	Last commit date
Latest commit History 135 Commits
Noyce		Noyce
Prediction_Results		Prediction_Results
.gitignore		.gitignore
README.md		README.md
noyce_usage_example.ipynb		noyce_usage_example.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

UCD_Noyce

Objective

Methodology

Requirements

Usage

About

Uh oh!

Releases 1

Packages

Languages

rezhv/UCD_Noyce

Folders and files

Latest commit

History

Repository files navigation

UCD_Noyce

Objective

Methodology

Requirements

Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages