Skip to content

lukastrm/csa-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

crypto-sa - Data Retrieval

This repository holds the contents for the Neural Information Processing project Sentimental Analysis for Predicting Cryptocurrency Markets (crypto-sa) at the Technical University of Berlin in 2021.

Module Contents

Package csadata.core

Use the build_sentiment_data_set(start, end, interval, ...) method to build a joint data set of Tweet sentiment and cryptocurrency price data. The method returns a 2d-NumPy array with rows corresponding to single time intervals as well as the following column scheme:

  • base time: The start time of a single data point time interval in seconds since Epoch.
  • negativity: The negativity score (probability of negative sentiment) of the Tweets in the time intervals.
  • neutrality: The neutrality score (probability of neutral sentiment) of the Tweets in the time intervals.
  • positivity: The positivity score (probability of positive sentiment) of the Tweets in the time intervals.
  • number of Tweets: The total amount of Tweets in the time intervals.

For each cryptocurrency that is added to the data set the following columns are appended:

  • open time
  • open
  • high
  • low
  • close
  • volume
  • close time
  • quote asset volume
  • number of trades
  • taker buy base asset volume
  • taker buy quote asset volume

Please refer to the Binance API for more information on these fields.

Package csadata.twitter

tbd.

Package csadata.crypto

Use the download_crypto_price_data(price, interval, start, end, api_key, path) method to download historical cryptocurrency price data via the Binance API. Please refer to the API specification in order to determine the desire price symbol, which is passed as price parameter. For the interval parameter either specify on of the packages INTERVAL_HOURLY or INTERVAL_DAILY constants or refer to other valid interval enumeration values from the API specification. The start and end parameters refer to the start and end times of the whole search interval and are given in UNIX timestamps (milliseconds since Epoch). If omitted, both parameters default to None and search results are limited to 1000 data points. In order to make an API request with this method, the api_key parameter or the environment variable as specified by the package constant ENV_VAR_BINANCE_API_KEY has to be set to a valid API key. This key can be retrieved via a Binance account. Please refer to the API specification for further information on authorization.

The method downloads price data and writes it into a CSV file at a file path as specified with the path parameter. If no path parameter is specified, the data is stored in the current working directory.

Use the load_crypto_price_data(price, interval, start, end, path) method to load previously downloaded historical price data into a Numpy array.

Package csadata.emoji

Use the resolve_emoji_sentiments(emoji_data, emoji_sent_dict) to resolve a list of N emoji strings to an N-by-3 Numpy array in which each row refers to a single emoji string and the 3 columns represent the sentiment values for negativity, neutrality and positivity. If the emoji_sent_dict parameter is not present, this method internally calls the load_emoji_sentiment_mapping() method, which loads the corresponding emoji sentiment mapping from the module resources. This mapping is a dictionary of integer values corresponding to the unicode code of the specified emoji and unidimensional Numpy arrays of length 3 corresponding to the sentiment values. By calling the load_emoji_sentiment_mapping() method separately, it is possible to modify or extend this mapping programmatically before passing the mapping dictionary as emoji_sent_dict parameter to the resolve_emoji_sentiments(...) method.

The default emoji sentiment mapping uses the data from the Emoji Sentiment Ranking.

The extract_emojis(string) method is a helper method to extract an emoji string out of another given string (which may also contain other characters).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages