This GitHub repository accompanies the paper "DNA shape complements sequence-based representations of transcription factor binding sites" by P. DeFord and J. Taylor, and reproduces all of the analyses and figures from that paper.
This analysis relies heavily on the StruM package. Source code, installation instructions, and documentation can be found here.
In addition, this analysis has several other dependencies: the packages and versions named in the commands below.
These can all be installed via conda. Below is an example of how to set up an appropriate conda environment to run this analysis.
WORKING_DIRECTORY="$HOME/scratch/working_draft"
cd "$WORKING_DIRECTORY"
git clone https://github.com/pdeford/strum_paper.git
cd strum_paper
mkdir src
cd src
git clone https://github.com/pdeford/StructuralMotifs.git
conda create -n strum_paper python=2.7.15
source activate strum_paper
conda install -c bioconda \
bedtools=2.27.1 biopython=1.68 meme=4.12.0 \
regex=2016.06.24 samtools=1.9
conda install -c defaults \
matplotlib=2.2.3 numpy=1.15.4 scikit-learn=0.20.1 \
scipy=1.1.0 cython=0.29.2 python-dateutil=2.7.5 \
requests=2.20.1 libiconv=1.15
conda install -c conda-forge \
rdflib=4.2.2
cd StructuralMotifs
python setup.py install
cd ../..
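Before starting the analysis, it can help to confirm that the command-line tools installed above are actually visible in the active environment. The following is a minimal sketch, not part of the repository: `check_tools` is a hypothetical helper, and the executable names passed to it (`python`, `bedtools`, `samtools`, `meme`) are assumed to be the ones provided by the corresponding conda packages.

```shell
# check_tools (hypothetical helper): report which of the named commands
# can be found on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executables assumed to come from the packages installed above.
check_tools python bedtools samtools meme \
    || echo "Some tools are missing; re-check the conda steps."
```

If any tool is reported as missing, the most likely cause is that the `strum_paper` environment is not activated in the current shell.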
Once your environment is initialized appropriately, the entire analysis can be reproduced using the command:
./do_all.sh $n_processes
where $n_processes is the number of processors that you have available to devote to the analysis.
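If you are unsure how many processors are available, one way to choose a value is sketched below. This is an illustration rather than part of the repository: it assumes `getconf _NPROCESSORS_ONLN` or `nproc` is available on your system, and leaving one core free is only a suggestion, not a requirement of `do_all.sh`.

```shell
# Count the online processors; fall back to nproc, then to 1.
n_processes=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc 2>/dev/null || echo 1)

# Leave one core free for other work, but never drop below one process.
# (This is a convenience choice, not something do_all.sh requires.)
if [ "$n_processes" -gt 1 ]; then
    n_processes=$((n_processes - 1))
fi

echo "Using $n_processes processes"

# Run the pipeline only if the script is present in the current directory.
if [ -x ./do_all.sh ]; then
    ./do_all.sh "$n_processes"
fi
```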