Skip to content

Conversation

freemabd1
Copy link
Contributor

The pull request is for jira story RW-15. I still need to do alot to make this robust but this is the bare minimum to get connected to BigQuery.

Copy link
Contributor

@danqrodney danqrodney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I don't have any comments on the UI code. You may want @blrubenstein or @dmohs to give you feedback there.

If you're eager to get this code in, we can definitely address some of my comments later, but wanted to take a first stab.

compile 'mysql:mysql-connector-java:+'
compile 'com.google.cloud.sql:mysql-socket-factory:+'
compile 'com.google.appengine:appengine:+'
compile 'com.google.appengine:appengine-api-1.0-sdk:+'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which classes required this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i get this error without that dependency: java.lang.RuntimeException: Google App Engine runtime detected (the environment variable "com.google.appengine.runtime.version" is set), but unable to resolve appengine-sdk classes. For more details see https://github.com/GoogleCloudPlatform/google-cloud-java/blob/master/APPENGINE.md

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Thanks for the info!

common.status "Creating service account for API..."
iam_account = "[email protected]"
common.run_inline %W{
gcloud iam service-accounts keys create sa-key.json --iam-account #{iam_account}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will use whatever project and account you might already be gcloud configured to (which may not be the right ones.)

We should probably use --project all-of-us-workbench-test here (assuming that's relevant for this command?) and maybe --account .

Either way, it probably makes sense to try to reuse logic from run_with_creds() or get_service_account_creds_file() in here, as it's doing much the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess i'm still a little confused about this and the above comments. Maybe we can discuss over a call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Let's chat on Monday.

common.status "Running database migrations..."
common.run_inline %W{docker-compose run db-migration}
common.status "Creating service account for API..."
iam_account = "[email protected]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we may just want to use [email protected] here, for a couple reasons:

  • Our server will need to talk to Firecloud, not just BigQuery; we'll be using the creds for that, too.
  • We already have to configure things so that this service account has access to FireCloud and BigQuery to get our test env working; we can avoid having additional configuration by reusing it for local development.

Note that if you use run_with_creds() / get_service_account_creds_file() -- see below -- it will use that service account.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can certainly do that. I don't have access to create a service account under @appspot.gserviceaccount.com. I can currently only create under all-of-us-workbench-test.iam.gserviceaccount.com and @all-of-us-ehr-dev.iam.gserviceaccount.com

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to create a service account, it already exists. (It's the SA that our deployed AppEngine app in test uses.)

You will need a key for it, but you are an Owner on all-of-us-workbench-test, so you should have permissions to do that in cloud console or on the command line (like this).


QueryJobConfiguration queryConfig =
QueryJobConfiguration.newBuilder(
"SELECT id, " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe define constants for our queries?

"is_selectable, " +
"domain_id " +
"FROM `pmi-drc-api-test.synpuf.icd9_criteria` " +
"where parent_id = 0 " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume at some point the "0" here would be a parameter set to parentId, as described here:

https://cloud.google.com/bigquery/docs/parameterized-queries

Is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.. that will definitely change.

.setUseLegacySql(false)
.build();

// Create a job ID so that we can safely retry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want a helper class that does all this boiler plate whenever running a blocking BigQuery query.

Alternatively, maybe we could use the query() method instead of creating a job?

http://googlecloudplatform.github.io/google-cloud-java/0.2.7/apidocs/com/google/cloud/bigquery/BigQuery.html#query(com.google.cloud.bigquery.QueryRequest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.. i think the link about also uses query()

CriteriaListResponse criteriaResponse = new CriteriaListResponse();
for (List<FieldValue> row : result.iterateAll()) {
final Criteria criteria = new Criteria();
criteria.setId(row.get(0).getLongValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forsee this method of extracting results from queries as a bit brittle / hard to follow, since we're using indexes here instead of column names.

What may make sense: have a helper that runs the query, constructs a map of column name to index using this method:

http://googlecloudplatform.github.io/google-cloud-java/0.2.7/apidocs/com/google/cloud/bigquery/QueryResult.html#schema()

and then lets you retrieve values from rows by column name instead of index (by looking up the index.)

description: type of criteria
type: string
code:
description: what level of data access the user has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

description: specifies of child or parent
type: boolean
selectable:
description: specifies if use can search with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which criteria would we not be able to search on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

type: integer
format: int64
group:
description: specifies of child or parent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"child" or "parent"?

Is the idea that nodes which are "parent" are not selectable, and are just containers for children?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some are not selectable and some are depending on the specific criteria tree, so we use a combination of isGroup and IsSelectable to determine what is a parent or leaf and is selectable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "group" = "true" means there are children, and "selectable" = "true" means you can query on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@s-rubenstein s-rubenstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this from a UI side looks pretty reasonable. I left the office before I got a chance to look at it, so I haven't actually been able to pull it down and play with it.


private static final Logger log = Logger.getLogger(CohortBuilderController.class.getName());

public static final String CRTIERIA_QUERY = "SELECT id, type, code, name, est_count, is_group, is_selectable, domain_id\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some long lines in here. We should wrap them.


private static final Logger log = Logger.getLogger(CohortBuilderController.class.getName());

public static final String CRTIERIA_QUERY = "SELECT id, type, code, name, est_count, is_group, is_selectable, domain_id\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRTIERIA -> CRITERIA

return response.getResult();
}

protected Map<String, Integer> getResultMapper(QueryResult result) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we'll probably want to move this and some of the code above into some BigQuery utils class. (Fine to leave it here for now.)

@freemabd1 freemabd1 merged commit 1e248cf into master Sep 11, 2017
@freemabd1 freemabd1 deleted the freemabd1/RW-15 branch September 13, 2017 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants