PROSSTT (PRObabilistic Simulations of ScRNA-seq Tree-like Topologies) is a package with code for the simulation of scRNAseq data for dynamic processes such as cell differentiation. PROSSTT is open source GPL-licensed software implemented in Python.
Single-cell RNAseq is revolutionizing cellular biology, and many algorithms are being developed for the analysis of scRNAseq data. PROSSTT provides an easy way to test the performance of trajectory inference methods on realistic data with a known "gold standard". The algorithm can produce datasets with user-defined topologies while simulating any number of sampled cells and genes.
PROSSTT can be installed using the pip package manager or any pip-compatible package manager:
pip install git+https://github.com/soedinglab/prosstt.git
Alternatively, clone the repository and install it manually:
git clone https://github.com/soedinglab/prosstt.git
cd prosstt
python setup.py install
PROSSTT was developed and tested in Python 3.5 and 3.6. PROSSTT requires:
- numpy, for data structures
- scipy, for probabilistic distributions and special functions
- pandas, for I/O
- newick, for the Newick tree file format
We also recommend the following libraries:
- matplotlib, for plotting
- jupyter notebooks, for demonstration and development purposes
- scanpy, for the visualization of simulations via diffusion maps. This requires anndata and Python 3.6 to work.
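A quick way to see which of the libraries listed above are available in the current environment is sketched below. It uses only the Python standard library; the helper name `missing_modules` and the two lists are illustrative and not part of PROSSTT. The import names are assumed to match the package names given above.

```python
from importlib.util import find_spec

# Import names taken from the lists above (assumed to match the package names).
# Jupyter is left out because it is a tool rather than a module PROSSTT imports.
REQUIRED = ["numpy", "scipy", "pandas", "newick"]
RECOMMENDED = ["matplotlib", "scanpy", "anndata"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing recommended:", missing_modules(RECOMMENDED))
```

An empty list for the required libraries means that all of them can be located. This checks presence only, not versions.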
In PROSSTT, topologies are described in terms of branches and their connectivity. All simulated genes are affected by the differentiation process: no uninformative ("nonsense") genes are included, although they may be added in the future.
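To illustrate what "branches and their connectivity" can look like in practice, here is a minimal, self-contained sketch. It is not PROSSTT's API: the pair-based encoding and the `describe` helper are assumptions made for this example only. Each pair names a branch and the branch that continues from its end point.

```python
from collections import defaultdict

# Hypothetical encoding (not taken from PROSSTT): each pair (parent, child)
# says that branch `child` starts where branch `parent` ends.
# This example is a single bifurcation: A splits into B and C.
topology = [("A", "B"), ("A", "C")]


def describe(topology):
    """Return the root branch, the leaf branches, and a parent -> children map."""
    children = defaultdict(list)
    parents = {}
    for parent, child in topology:
        children[parent].append(child)
        parents[child] = parent
    branches = sorted(set(children) | set(parents))
    # The root is the only branch that never appears as a child.
    roots = [b for b in branches if b not in parents]
    if len(roots) != 1:
        raise ValueError("a tree-like topology needs exactly one root branch")
    # Leaves are branches that nothing continues from.
    leaves = [b for b in branches if b not in children]
    return roots[0], leaves, dict(children)


root, leaves, children = describe(topology)
print(root, leaves, children)
```

A double bifurcation would be written the same way by adding pairs such as `("B", "D")` and `("B", "E")`. For the encoding that PROSSTT itself expects, see the notebooks and the documentation.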
We provide jupyter notebooks with a baseline example, a more involved example that explains the choice of variance parameters, and a notebook that showcases the different sampling strategies.
Alternatively, we include a Python script that can be run on the command line (examples/generate_simN.py) to produce simulations like the ones used in the MERLoT paper.
For more information please refer to the documentation.