Loading UTA

Loading is largely driven by the Makefile in loading/Makefile. You don't
have to use it, but it documents the details of the loading process.

Loading occurs in two distinct stages: extraction and translation of
data sources into intermediate files, and loading of those intermediates
into UTA.  The extraction scripts, which are specific to each source, are
in uta/sbin/ and write one or more intermediate files in the formats
specified in uta/formats/.  The loading process depends only on the
intermediate file format and is identical across sources.  An outline of
this is in ../doc/misc-figures.pdf.

Source extraction requires dependencies that are not part of the UTA
python project dependencies.  This is not expected to be a problem for
users (who don't need to load data), but it requires special attention for
admins.  

TODO: Detail requirements (eutils, bdi, perl+ensembl)


How Reece does it
-----------------

*) Prepare a database
(These instructions are approximate.)

createuser uta_admin
createuser uta_public
createuser reece
createdb -O uta_admin uta_dev

in psql:
grant uta_admin to reece;
grant uta_public to reece;
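The preparation steps above can be collected into one sketch.  Role and
database names are taken from the commands above; `reece` stands in for
your own login role, and a local PostgreSQL server is assumed.

```shell
#!/bin/sh
# Sketch of the database-preparation steps above (approximate, as noted).
set -e

createuser uta_admin           # owns the schema and loaded data
createuser uta_public          # read-only role for clients
createuser reece               # your personal login role

createdb -O uta_admin uta_dev  # development database, owned by uta_admin

# Grant both roles to the personal login so it can act as either.
psql -d uta_dev -c "GRANT uta_admin TO reece; GRANT uta_public TO reece;"
```

This cannot run without a live PostgreSQL cluster, so no self-contained
test is given.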


*) Extraction

make main-data

*) Uncompress the resulting fasta files into the fasta directory (see
main.conf)
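As a runnable sketch of this step: the directory names below are
placeholders (the real destination comes from main.conf), gzip
compression is assumed, and a tiny demo file is created so the sketch is
self-contained.

```shell
#!/bin/sh
# Sketch: decompress *.fa.gz files into a fasta/ directory, dropping the
# .gz suffix.  Paths are hypothetical; main.conf defines the real ones.
set -e

mkdir -p downloads fasta
# Demo input so the sketch runs stand-alone:
printf '>NM_000000.0 demo\nACGT\n' | gzip > downloads/demo.fa.gz

for f in downloads/*.fa.gz; do
    gunzip -c "$f" > "fasta/$(basename "${f%.gz}")"
done
```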

*) Optional: make test data
These data are used for testing and therefore committed with the repo.
You probably don't need to rebuild them.

make test-data

*) Create and load a database

The general command is

make build-db IN=<dataset>

IN (instance name) may be main, test, or quick; test is the default.  See
the configuration in the corresponding <IN>.conf file (e.g., main.conf).
(Currently, main loads into a uta_stage database, and test and quick load
into a uta_dev database.)
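The build variants above, as a sketch:

```shell
# Build the default test instance (loads into uta_dev):
make build-db

# Equivalent, naming the instance explicitly:
make build-db IN=test

# Build the full main dataset (currently loads into uta_stage):
make build-db IN=main

# Build the smaller quick dataset (also uta_dev):
make build-db IN=quick
```

These targets require the UTA Makefile and a prepared database, so no
self-contained test is given.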

For example, to align exons for a subset of transcripts (RefSeq NM_
accessions against primary-assembly NC_00000* sequences):

uta --conf=main.conf align-exons --sql "AND tx_ac ~ '^NM_' AND alt_ac ~ '^NC_00000'"

*) Push to RDS

I build locally, then push to RDS like this:

pg_dump -d uta_dev | psql -h uta.invitae.com -U uta_admin -d uta_dev
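As a sketch of the push workflow, with a hedged alternative: the host,
roles, and database names come from the command above; the archive
filename below is hypothetical.

```shell
#!/bin/sh
# Sketch of the local-build-then-push workflow above.
set -e

# Stream a plain-SQL dump of the local build straight into the remote DB:
pg_dump -d uta_dev | psql -h uta.invitae.com -U uta_admin -d uta_dev

# Alternatively, dump to a custom-format archive first so the transfer
# can be inspected or retried without re-dumping:
# pg_dump -Fc -d uta_dev -f uta_dev.pgd
# pg_restore -h uta.invitae.com -U uta_admin -d uta_dev uta_dev.pgd
```

This requires both a local build and the remote RDS instance, so no
self-contained test is given.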
