Stars
A large-scale open data lake for the science of science research.
TweetNLP for all the NLP enthusiasts working on Twitter! The Python library tweetnlp provides a collection of useful tools for analyzing and understanding tweets, such as sentiment analysis, emoji prediction,…
String-to-String Algorithms for Natural Language Processing
lead-ratings / gender-guesser
Forked from ferhatelmas/sexmachine
Guess gender from first name in Python 2 and 3
How random is the review outcome? A systematic study of the impact of external factors on eLife peer review
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
Contextualised Topic Coherence Metrics: A new way to evaluate neural topic models.
State-of-the-Art Text Embeddings
Code and data to produce the main results of the paper "Scientific Prizes and the Extraordinary Growth of Scientific Topics".
Systematic dataset of Covid-19 policy, from Oxford University
Back end for producing indicators and loading them into the COVIDcast API.
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
A Python wrapper around the topic modeling functions of MALLET.
Tracking Emotional Compositions of Online Discourse Before and After the COVID-19 Outbreak
A repository of data on coronavirus cases and deaths in the U.S.
Flexible calculation of moral foundation scores from textual input data based on word embedding methods.
Free and Open Source, Distributed, RESTful Search Engine
Top2Vec learns jointly embedded topic, document and word vectors.
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Interpretable data visualizations for understanding how texts differ at the word level
A large-scale COVID-19 specific geotagged global tweets dataset. Associated paper: https://doi.org/10.1016/j.asoc.2022.109603
The repository contains an ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020.
A collection of Jupyter notebooks, each walking you through a common example of bibliometric analysis using scholarly data from the OpenAlex API.