System requirements:
- Docker CE
- Docker must be installed to build and run code (For Google workstations, see http://go/installdocker).
- IMPORTANT: be sure to allocate 12GB of memory (if possible) and 2GB of swap to the Docker Engine. See https://docs.docker.com/docker-for-mac/#advanced for screenshots and instructions for Mac.
- Ruby
- Our team's dev/ops scripts are written in Ruby. Most common operations are launched via the project.rb script at the root of each sub-project.
- Python >= 2.7.9
- Python is required by some project-specific scripts and by the Google Cloud Platform tools.
- gcloud
For local development, also install:
After you've installed gcloud, log in using your pmi-ops account:
gcloud auth login
To initialize the project, run the following:
git clone https://github.com/all-of-us/workbench
cd workbench
git submodule update --init --recursive
Then set up git-secrets and fire up the development servers. Optionally, set up IntelliJ for UI or API work.
Before doing any development, you must run:
./gradlew compileGeneratedJava appengineRun
This will generate the compiled Java and MapStruct files that are necessary for the app to compile. On Macs, this command never completes - once it reaches 97% it will hang indefinitely. It can safely be ctrl+c'd at that point.
To make changes, do:
git checkout master
git pull
git checkout -b <USERNAME>/<BRANCH_NAME>
# (make changes and git add / commit them)
git push -u origin <USERNAME>/<BRANCH_NAME>
And make a pull request in your browser at https://github.com/all-of-us/workbench based on your pushed branch.
After responding to review comments, merge the pull request in GitHub.
- Autoformat Java code via google-java-format:
./gradlew spotlessApply
(git pre-push / Circle will complain if you forget)
- Direct your editor to write swap files outside the source tree, so Webpack does not reload when they're updated. A sketch for vim is shown below.
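For example, assuming you keep swap files under ~/.vim/swapfiles (any directory outside the repo works):
mkdir -p ~/.vim/swapfiles
# then add to ~/.vimrc (the trailing // makes vim encode the full path in the swap file name):
set directory=~/.vim/swapfiles//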
From the api/ directory:
./project.rb dev-up
When the console displays "Listening for transport dt_socket at address: 8001", your local API server endpoints are available under http://localhost:8081/. You can test this by navigating to the status endpoint in your browser or executing:
curl http://localhost:8081/v1/status
Note: if you haven't loaded any data locally for the app yet, run the goal below. It will not run while dev-up is running, so stop dev-up first.
./project.rb run-local-data-migrations
Or you can run all migrations with:
./project.rb run-local-all-migrations
You can run the server (skipping config and db setup) by running:
./project.rb run-api
Other available operations may be discovered by running:
./project.rb
The above steps for starting the API server can take upwards of 8-10 minutes on macOS, most likely due to performance issues with Docker for Mac. Follow these steps to set up your developer environment to start the API server outside of Docker. A full restart takes roughly 30 seconds with this method.
All commands should be run from workbench/api
- Install Java 8
- Add the following to ~/.bash_profile. Note: your Java 8 library directory may be different, and YOUR_WORKBENCH_DIRECTORY_PATH is the path to your workbench git repo.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
export WORKBENCH_DIR=[YOUR_WORKBENCH_DIRECTORY_PATH]
source $WORKBENCH_DIR/api/db/load_vars.sh
- If you are using zsh, add the lines above to ~/.zshrc instead.
- Source your bash profile or open a new terminal:
source ~/.bash_profile
- Install the Java App Engine components:
gcloud components install app-engine-java
- Generate a Google Cloud access token:
./project.rb get-test-service-creds
- If you have schema migrations or pending configuration updates, run through the normal Docker developer startup process at least once successfully. This is typically only necessary when switching between branches.
- Start the services required for the API server:
./project.rb start-api-reqs # the counterpart command is ./project.rb stop-api-reqs
- Start the API server through Gradle:
./gradlew appengineRun
While the API is running locally, saving a .java file should cause a recompile and reload of that class. Status is logged to the console. Not all changes reload correctly (e.g., model classes do not appear to reload more than once).
The first time a server is started after a database clear (./project.rb docker-clean, ./project.rb run-local-all-migrations), it may take some time for a billing project to become available for workspace creation. Generally this takes about 5 minutes, but it has been known to take upwards of 40 in some cases. The billing project buffer size is set to 2 for local environments (config_local.json). A cron job runs every minute while the API server is running to check whether there are empty slots.
To check whether you have available billing projects:
./project.rb connect-to-db
select * from billing_project_buffer_entry;
The status column enum values can be found in org.pmiops.workbench.db.model.StorageEnums.
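For a quick summary by status (the numeric values map to the entries in StorageEnums):
./project.rb connect-to-db
select status, count(*) from billing_project_buffer_entry group by status;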
Before launching or testing the UI, Yarn must first install the necessary packages. From the ui/ directory:
yarn install
To launch the local UI:
yarn dev-up
You can view your local UI server at http://localhost:4200/.
By default, this connects to our test API server. Use --configuration=$ENV to use an alternate src/environments/environment.$ENV.ts file and connect to a different API server. To connect to your own API server running at localhost:8081, pass --configuration=local.
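For example, to run the local UI against your own local API server:
yarn dev-up --configuration=local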
To run react UI tests:
yarn test-react
Other useful yarn commands:
# To upgrade yarn packages:
yarn
# To lint the UI and automatically fix issues:
yarn lint --fix
You can also run the UI through project.rb. NOTE: this is slower and not recommended.
From the ui/ directory:
./project.rb dev-up
[legacy] UI tests in Angular can be run and viewed at http://localhost:9876/index.html.
Other available operations may be discovered by running:
./project.rb
To regenerate code from the Swagger API definitions:
./project.rb swagger-regen
To deploy your local workbench API code to a given AppEngine project, in the api directory run:
./project.rb deploy --project PROJECT --version VERSION --[no-]promote
This also migrates the SQL databases, so avoid using this when you have local SQL schema changes.
Example:
./project.rb deploy --project all-of-us-workbench-test --version dantest --no-promote
When the API is deployed, you'll be able to access it at https://VERSION-dot-api-dot-PROJECT.appspot.com. If you specify --promote, it will be the main API code served from https://api-dot-PROJECT.appspot.com. Aside from releases, this command can be used to test a topic branch in the shared test project before submitting. If possible, push to a version with your own username and use --no-promote.
To deploy your local UI code to a given AppEngine project, in the ui directory run:
./project.rb deploy-ui --project PROJECT --version VERSION --[no-]promote
Example:
./project.rb deploy-ui --project all-of-us-workbench-test --version dantest --no-promote
When the UI is deployed, you'll be able to access it at https://VERSION-dot-PROJECT.appspot.com. If you specify --promote, you can access it at https://PROJECT.appspot.com. Note that either way, it will be pointing at the live test API service (https://api-dot-PROJECT.appspot.com). (This can be overridden locally in the Chrome console).
Download the git-secrets tool. If you are on a Mac, run:
brew install git-secrets
If you are on Linux, run:
rm -rf git-secrets
git clone https://github.com/awslabs/git-secrets.git
cd git-secrets
sudo make install && sudo chmod o+rx /usr/local/bin/git-secrets
cd ..
rm -rf git-secrets
git-secrets runs by default every time you make a commit. To scan manually:
git secrets --scan
git secrets --scan /path/to/file (/other/path/to/file *)
git secrets --scan -r /path/to/directory
Spring application configs, in application.properties files, specify behavior like logging. They are static files bundled with the built Java binary.
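For illustration only (the real file lives at api/src/main/resources/application.properties and its contents differ), such a file holds standard Spring properties, e.g.:
# example Spring Boot logging property
logging.level.org.pmiops.workbench=INFO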
Database connection information is read from application-web.xml. These secrets vary per environment; Ruby setup scripts pull the values from Google Cloud Storage and generate the XML, which is then deployed with the Java binary.
Server behavior configuration is stored in the database. It may be changed without restarting the server, but is usually set only at deployment time. It's based on config_$ENV.json files (which are converted into WorkbenchConfig objects) and loaded into the database by workbench.tools.ConfigLoader.
CacheSpringConfiguration, a Spring @Configuration, provides the @RequestScoped WorkbenchConfig. It caches the values fetched from the database with a 10 minute expiration.
Loading of local tables/data for both schemas (workbench/cdr) happens in a manual goal, which creates tables in both schemas and inserts any app data needed for local development:
./project.rb run-local-all-migrations
Local tables loaded with data are:
- workbench - cdr_version
- cdr - criteria, achilles_analysis, concept, concept_relationship, vocabulary, domain, achilles_results, achilles_results_concept and db_domain
When editing database models, you must write a new changelog XML file. See Liquibase change docs, such as createTable.
You can get Hibernate to update the schema for inspection (and then backport that to Liquibase's XML files) by editing api/db/vars.env to make Hibernate run as the liquibase user and adding spring.jpa.hibernate.ddl-auto=update to api/src/main/resources/application.properties.
Then use api/project.rb connect-to-db and SHOW CREATE TABLE my_new_table.
Revert your changes or drop the db when you're done to verify the changelog
works.
Finally, write a new changelog file in api/db/changelog/ and include it in db.changelog-master.xml.
liquibase does not roll back partially failed changes.
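A minimal changelog sketch (table and column names are hypothetical; see the Liquibase docs for the full syntax):
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.5.xsd">
  <changeSet author="your_username" id="create_my_new_table">
    <createTable tableName="my_new_table">
      <column name="my_new_table_id" type="bigint" autoIncrement="true">
        <constraints primaryKey="true" nullable="false"/>
      </column>
      <column name="name" type="varchar(80)"/>
    </createTable>
  </changeSet>
</databaseChangeLog>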
Workbench schema lives in api/db --> all workbench-related activities access/persist data here.
CDR schema lives in api/db-cdr --> all CDR/cohort builder-related activities access/persist data here.
The following scripts need to be run any time a new CDR is released, or if you want all the count data for the cohort builder. Descriptions of the arguments these scripts take are as follows.
- bq-project : Project where BigQuery cdr lives. Ex: all-of-us-ehr-dev, all-of-us-workbench-test
- bq-dataset : BigQuery Dataset name of the cdr release. Ex: synthetic_cdr20180606
- workbench-project: Project where private count dataset (cdr) is generated. This must exist.
- cdr-version: Name of the cloud CDR you're creating. Ex: synth_r_2019q3_1
- bucket: A GCS bucket where CSV dumps of the generated data are stored. This must exist.
- instance: Cloud SQL instance. Ex: workbenchmaindb
The examples below need to be run in the following order. It's also very important that the prep tables are in a viable state before starting this process (check with the CB team on this).
Generate all denormalized tables (search, review, and data set) in the BigQuery CDR. This only needs to be done once when a CDR is released, or as needed:
./project.rb make-bq-denormalized-tables --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606
- The BigQuery dataset has new denormalized tables (search, review, and data set) for the cohort builder to work.
- Each of these can be run individually if needed (sequential ordering is very important here):
./project.rb make-bq-denormalized-search --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606
./project.rb generate-cb-criteria-tables --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606
./project.rb make-bq-denormalized-review --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606
./project.rb make-bq-denormalized-dataset --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606
./project.rb make-bq-dataset-linking --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606
- Info/examples for the dataset script:
- CDR BigQuery dataset: all-of-us-workbench-test:cdr20181107
- CSV dumps of tables in bucket all-of-us-workbench-private-cloudsql: cdr20181107/*.csv.gz
- Browse the CSVs in a browser here: https://console.cloud.google.com/storage/browser?project=all-of-us-workbench-test&organizationId=394551486437
- Note: cdr-version can be '' to make a dataset named cdr
The next 2 scripts are used to generate cloud CDR database instances. Generate CDR count data using the deidentified CDR release:
./project.rb generate-private-cdr-counts --bq-project all-of-us-ehr-dev --bq-dataset synthetic_cdr20180606 --workbench-project all-of-us-workbench-test --cdr-version synth_r_2019q3_1 --bucket all-of-us-workbench-private-cloudsql/synthetic_cdr20180606
- Generates csv.gz files in the specified bucket. These files will be used in the next step
./project.rb generate-cloudsql-db --project all-of-us-workbench-test --instance workbenchmaindb --database synth_r_2019q3_1 --bucket all-of-us-workbench-private-cloudsql/synthetic_cdr20180606/synth_r_2019q3_1
- Databases are live on cloudsql.
- For the environment you want, add a new object for your CDR to the array in workbench/api/config/cdr_versions_ENV.json. Its properties are as follows (an illustrative example entry follows this list):
- name: unique name
- dataAccessLevel: 1 = registered, 2 = controlled
- bigqueryProject: project where the BigQuery CDR lives
- bigqueryDataset: BigQuery dataset of the CDR
- creationTime: date string in this format: "2018-09-20 00:00:00Z"
- releaseNumber: gets incremented by 1 each time an official release is made. It has the same value for a registered and controlled cdr release.
- numParticipants: Number of participants in CDR.
- cdrDbName: name of the Cloud SQL count database used by the workbench, e.g. "synth_r_2019q3_1". CDR versioning doc: https://docs.google.com/document/d/1W8DnEN7FnnPgGW6yrvGsdzLZhQrdOtTjvgdFUL6e4oc/edit
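An illustrative entry (all values are examples only):
{
  "name": "Synthetic CDR 2019Q3",
  "dataAccessLevel": 1,
  "bigqueryProject": "all-of-us-ehr-dev",
  "bigqueryDataset": "synthetic_cdr20180606",
  "creationTime": "2018-09-20 00:00:00Z",
  "releaseNumber": 5,
  "numParticipants": 1000,
  "cdrDbName": "synth_r_2019q3_1"
}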
- Set the default cdr version for the environment in config_ENV.json.
- You probably don’t want to set your new cdr to the default before testing it.
- NOTE The cloudsql instance is set in code for each environment in /api/libproject/devstart.rb
- Make your config changes take effect:
- For non-local environments:
- Commit and merge your config files to master, and the changes will take effect on the next build.
- OR run:
./project.rb update-cloud-config --project <project>
where <project> is the project for your environment. You can find it in config_ENV.json under server.projectId.
- For local environments, run dev-up to rebuild your API.
Generate a full local MySQL test database (cdr) from the data generated above if you need to develop with a full test database.
- DO NOT do this with production data. It is not allowed.
- Make a sql dump from cloud console of the database you want.
- Run
./project.rb local-mysql-import --sql-dump-file <FILE.sql> --bucket <BUCKET>
- Update your local environment per above.
Alternatively, if you want to make a local database from CSVs in GCS:
- Run
./project.rb generate-local-count-dbs --cdr-version synth_r_2019q3_1 --bucket all-of-us-workbench-private-cloudsql
- You may want to do this if generate-cloudsql-db fails because of limited gcloud sql import csv functionality,
- or if you have some local schema changes and just need CSV data.
- Local mysql database or databases.
- cdr-version in the alternative method can be an empty string, '', to make databases named 'cdr'
Put a mysqldump of a local MySQL database in a bucket for importing into Cloud SQL. Call this once for each DB you want to dump:
./project.rb mysqldump-local-db --db-name synth_r_2019q3_1 --bucket all-of-us-workbench-private-cloudsql
- synth_r_2019q3_1.sql is uploaded to all-of-us-workbench-private-cloudsql
./project.rb cloudsql-import --project all-of-us-workbench-test --instance workbenchmaindb --bucket all-of-us-workbench-private-cloudsql --database synth_r_2019q3_1 --file synth_r_2019q3_1.sql
Note: a 3GB dump like cdr or public can take an hour or so to finish. You must wait before running another import on the same instance (a Cloud SQL limitation). You can check the status of the import at https://console.cloud.google.com/sql/instances/workbenchmaindb/operations?project=all-of-us-workbench-test or by running:
gcloud sql operations list --instance [INSTANCE_NAME] --limit 10
- databases are in cloudsql
./project.rb local-mysql-import --sql-dump-file synth_r_2019q3_1.sql --bucket all-of-us-workbench-private-cloudsql
- mysql db is in your local mysql for development. You need to alter your env per above to use it.
Elasticsearch is being integrated as an auxiliary backend on top of the BigQuery CDR for cohort building. Currently it can only be run via docker-compose on a local instance. See the full design: https://docs.google.com/document/d/1N_TDTOi-moTH6wrXn1Ix4dwUlw4j8GT9OsL9yXYXYmY/edit
./project.rb load-es-index
As of 3/4/19, you'll need to enable Elasticsearch locally to utilize it in the Cohort Builder.
sed -i 's/\("enableElasticsearchBackend": \)false/\1true/' config/config_local.json
Currently, the default setting for the indexer is to index only ~1000 records, to keep the local data size small. Some example criteria that will match this default dataset:
- Conditions ICD9: Group 250 Diabetes mellitus
- Drugs: Acetaminophen
- PPI: Anything (support for individual answers coming soon)
- Procedures CPT: 99213
Requires that Elasticsearch is running (via run-api or dev-up).
Show the top 5 standard condition concept IDs:
curl -H "Content-Type: application/json" "localhost:9200/cdr_person/_doc/_search?pretty" -d '{"size": 0, "aggs": {"aggs": {"terms": {"field": "condition_concept_ids", "size": 5 }}}}'
The above IDs can be cross-referenced against the Criteria table in SQL or BigQuery to determine cohort builder search targets.
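For example, a hypothetical lookup against the local cdr schema (exact table and column names may differ):
./project.rb connect-to-db
select * from criteria where concept_id in (CONCEPT_ID_1, CONCEPT_ID_2);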
Dump all participants matching a condition source concept ID (disclaimer: large):
curl -H "Content-Type: application/json" "localhost:9200/cdr_person/_doc/_search?pretty" -d '{"query": {"term": {"condition_source_concept_ids": "44833466"}}}' > dump.json
During ./project.rb dev-up, the schema activity is the only activity run, which only creates tables for the cdr schema.
Loading of cloud data for the criteria trees and CDR version happens in a manual goal, which deletes and inserts tree data into the criteria table:
./project.rb run-cloud-data-migrations
CDR Schema - We now have 2 activities in the api/db-cdr/build.gradle file:
liquibase {
activities {
schema {
changeLogFile "changelog/db.changelog-master.xml"
url "jdbc:mysql://${db_host}:${db_port}/cdr"
username "liquibase"
password "${liquibase_password}"
}
data {
changeLogFile "changelog-local/db.changelog-master.xml"
url "jdbc:mysql://${db_host}:${db_port}/cdr"
username "liquibase"
password "${liquibase_password}"
}
runList = project.ext.runList
}
}
CDR Schema - In api/db-cdr/run-migrations.sh for local deployments, we call the Liquibase update task with the specific activity name like so:
echo "Upgrading database..."
../gradlew update -PrunList=schema
CDR Schema - In api/libproject/devstart.rb for test deployments, we call the Liquibase update task with the specific activity name like so:
ctx.common.run_inline("#{ctx.gradlew_path} --info update -PrunList=schema")
To run both api and common api unit tests, in the api dir run:
./project.rb test
To run just api unit tests, run:
./project.rb test-api
To run bigquery tests (which run slowly and actually create and delete BigQuery datasets), run:
./project.rb bigquerytest
By default, all tests return just pass/fail output and stack traces for exceptions. To get full logging, pass --project-prop verboseTestLogging=yes on the command line when running tests, as shown below.
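For example:
./project.rb test --project-prop verboseTestLogging=yes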
To filter tests, use the --tests flag on any test command:
./project.rb bigquerytest --tests "org.pmiops.workbench.api.CohortBuilderControllerBQTest.countSubjectsNotValidMessageException"
To run tests in IntelliJ, go to your preferences, click Plugins, and make sure you have the JUnit plugin installed.
Once JUnit is installed, go to the test file you want to run, right-click on it, and select Run or Debug. To run or debug a specific test class or method, open the file; test running options appear as green triangles in the gutter. Clicking one opens a dialog that lets you run that specific class or method.
These are easiest if you need to authenticate as one of your researcher accounts.
- Firecloud
- Leo (notebook clusters)
This approach is required if you want to issue a request to a backend as a service account. This may be necessary in some cases, as the Workbench service account is an owner on all AoU billing projects.
This approach requires oauth2l to be installed (run brew install go first on macOS):
go get github.com/google/oauth2l
go install github.com/google/oauth2l
To obtain the service account credentials, run
cd api
gcloud auth login <user>@pmi-ops.org # for test environment
gcloud config set account <user>@pmi-ops.org # also works if logged in.
gcloud auth list # confirm it's what you expect
./project.rb get-test-service-creds # the dev-up command should also include this step.
You should see a file sa-key.json in the current directory.
The following shows how to make an authenticated backend request as the shared workbench test service account against Firecloud dev. It retrieves the required authorization scopes of email, profile, and cloud-billing.
# From the "api" directory, use `oauth2l` to retrieve an authorization header:
`~/go/bin/oauth2l header --json ./sa-key.json email profile cloud-billing`
Now we'll demonstrate calling Firecloud's profile/billing API with the service account credentials.
# call Firecloud Billing API, format the JSON output and view in less
curl -X GET -H "`~/go/bin/oauth2l header --json ./sa-key.json email profile cloud-billing`" \
-H "Content-Type: application/json" \
"https://firecloud-orchestration.dsde-dev.broadinstitute.org/api/profile/billing" \
| jq | less
# If you get 401 errors, you may need to clear your token cache.
oauth2l reset