Skip to content

Conversation

danqrodney
Copy link
Contributor

I don't love this code but it should speed up performance dramatically, and I don't see a lot of alternatives. :( Let me know if you have suggestions on how to make it simpler / clearer.

@danqrodney danqrodney requested a review from calbach April 18, 2018 21:23
Copy link
Contributor

@calbach calbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, mostly minor comments. Hopefully we don't need to make any more query optimizations; I'm not sure there is enough flexibility to add another non-trivial optimization like this.

Also worth noting that we'll still hit this worst case performance in the event that the user includes a column from every joined table in either the filters or order by. Depending on how aggressively users are doing that, we may need further changes

column.columnAlias);
// If the table we're retrieving this from is found in the inner query, select the column
// in both the inner and outer query.
if (column.tableAlias.equals(queryState.mainTableName) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will beforeLimitTables not include the main table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No... JoinedTableInfo doesn't really make sense for the main table, so I'm just special casing it.

outerSelectExpressions.add(columnSql);
}
}
Joiner commaJoiner = Joiner.on(", ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not used util next codeblock, would move down

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}
Joiner commaJoiner = Joiner.on(", ");
if (hasAfterLimitTables) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if should be unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// Columns that appear in the inner ORDER BY appear in the outer ORDER BY, too; make sure
// they get returned.
String columnAlias = getColumnAlias(column.columnInfo.getColumnName());
log.info("COLUMN ALIAS = " + columnAlias);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

outerSql.append(commaJoiner.join(outerOrderByExpressions));
return outerSql.toString();
} else {
return innerSql.toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: invert the if, early return this, then avoid the indent for the outerSql handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

StringBuilder endSql = handleOrderBy(queryState, tableQuery.getOrderBy());
endSql.append(" limit ");
ImmutableList<OrderByColumn> orderByColumns = handleOrderBy(queryState, tableQuery.getOrderBy());
StringBuilder endSql = new StringBuilder(" limit ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps rename to limitOffsetSql; no longer necessarily the end if there's an inner query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; and moved the generation of this SQL into buildSql. (I wanted to apply a LIMIT to the outer SQL too, just for good measure.)

columnConfig);
} else {
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState);
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this function know it should pass true for beforeLimit? This doesn't make sense to me; getColumnInfo() seems like something that could be called for any column, not just those pertaining to WHERE / ORDER BY. If it's currently only called in those cases, that seems like an implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Exposed it as a parameter.

Copy link
Contributor

@calbach calbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since all my comments were style or refactoring-type comments. I'm out today - so please add someone else if you want further feedback on this this week.

Copy link
Contributor Author

@danqrodney danqrodney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need to warn researchers not to use columns on joined tables in WHERE / ORDER BY more than necessary when dealing with large cohorts.

Hopefully a lot of joins will be to the concept table and will only show up in the SELECT clause.

I filed https://precisionmedicineinitiative.atlassian.net/browse/RW-531 to allow for cohort materialization to GCS, which can help with cases where BigQuery queries exceed 60 seconds.

columnConfig);
} else {
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState);
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Exposed it as a parameter.

column.columnAlias);
// If the table we're retrieving this from is found in the inner query, select the column
// in both the inner and outer query.
if (column.tableAlias.equals(queryState.mainTableName) ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No... JoinedTableInfo doesn't really make sense for the main table, so I'm just special casing it.

outerSelectExpressions.add(columnSql);
}
}
Joiner commaJoiner = Joiner.on(", ");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}
Joiner commaJoiner = Joiner.on(", ");
if (hasAfterLimitTables) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// Columns that appear in the inner ORDER BY appear in the outer ORDER BY, too; make sure
// they get returned.
String columnAlias = getColumnAlias(column.columnInfo.getColumnName());
log.info("COLUMN ALIAS = " + columnAlias);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

outerSql.append(commaJoiner.join(outerOrderByExpressions));
return outerSql.toString();
} else {
return innerSql.toString();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

StringBuilder endSql = handleOrderBy(queryState, tableQuery.getOrderBy());
endSql.append(" limit ");
ImmutableList<OrderByColumn> orderByColumns = handleOrderBy(queryState, tableQuery.getOrderBy());
StringBuilder endSql = new StringBuilder(" limit ");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; and moved the generation of this SQL into buildSql. (I wanted to apply a LIMIT to the outer SQL too, just for good measure.)

@danqrodney danqrodney merged commit 05e96a4 into master Apr 19, 2018
@danqrodney danqrodney deleted the danrodney/bq_opt branch April 19, 2018 15:04
freemabd1 added a commit that referenced this pull request Apr 19, 2018
* RW-391 created applicaiton test.

* RW-391 attempt at getting @SpringBootTest working.

* RW-391 more experimentation.

* RW-391 successfuly SpringBootTest!

* RW-391 removed unused properties.

* RW-391 moved test case to integration.

* RW-391 replace with master copy.

* Tweak deploy docs and logging (#731)

* Js/demo form update (#728)

* New "select boxes"

* Stylin

* Stub in ethnicity

* Tests & linting

* Small bugfix, more linting

* Adding non-standard columns to observation table in CDM schema (#729)

* Adding non-standard columns to observation table in CDM schema

* Updating comment

* Updated README

* Refactor detail tab pages to use single page (#733)

* Refactor detail tab pages to use single page

* Add tab name

* Fix "Google Bucket" link (#736)

* RW-391 finding a way to programmatically test all beans.

* Add ability to run just the tests (#735)

* Add ability to run just the tests

* Remove unused options

* fix for new notebook button being truncated (#738)

* Blrubenstein/not found page (#734)

* Not Found Page

* Fix spacing

* PR Comments

* Move to better not found pattern.

* Fix linting?

* Fix sidebar, cohort-review wrapper, persistent scrollbar (#737)

* Fix sidebar, cohort-review wrapper, persistent scrollbar

* Add comment about height

* Clean up PR (#740)

* RW-391

* Remove imports, get rid of one getme call (#741)

* Remove imports, get rid of one getme call

* Make loc public

* Add BSD 3-clause LICENSE (#743)

* [RW-514] Joining to tables after LIMIT where possible in cohort materialization. (#744)

* Start

* Fixes

* Removing imports

* PR comments

* Removing log

* RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing.

* RW-391 cleanup of unused imports.

* RW-391 removed unused inserts.

* RW-391 removed println.

* RW-391 added comment about initializing the database.

* RW-391 added maps to list for iteration.

* RW-391 created applicaiton test.

* RW-391 attempt at getting @SpringBootTest working.

* RW-391 more experimentation.

* RW-391 successfuly SpringBootTest!

* RW-391 removed unused properties.

* RW-391 moved test case to integration.

* RW-391 replace with master copy.

* RW-391 finding a way to programmatically test all beans.

* RW-391

* RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing.

* RW-391 cleanup of unused imports.

* RW-391 removed unused inserts.

* RW-391 removed println.

* RW-391 added comment about initializing the database.

* RW-391 added maps to list for iteration.
SrushtiGangireddy pushed a commit that referenced this pull request May 30, 2018
…ialization. (#744)

* Start

* Fixes

* Removing imports

* PR comments

* Removing log
SrushtiGangireddy pushed a commit that referenced this pull request May 30, 2018
* RW-391 created applicaiton test.

* RW-391 attempt at getting @SpringBootTest working.

* RW-391 more experimentation.

* RW-391 successfuly SpringBootTest!

* RW-391 removed unused properties.

* RW-391 moved test case to integration.

* RW-391 replace with master copy.

* Tweak deploy docs and logging (#731)

* Js/demo form update (#728)

* New "select boxes"

* Stylin

* Stub in ethnicity

* Tests & linting

* Small bugfix, more linting

* Adding non-standard columns to observation table in CDM schema (#729)

* Adding non-standard columns to observation table in CDM schema

* Updating comment

* Updated README

* Refactor detail tab pages to use single page (#733)

* Refactor detail tab pages to use single page

* Add tab name

* Fix "Google Bucket" link (#736)

* RW-391 finding a way to programmatically test all beans.

* Add ability to run just the tests (#735)

* Add ability to run just the tests

* Remove unused options

* fix for new notebook button being truncated (#738)

* Blrubenstein/not found page (#734)

* Not Found Page

* Fix spacing

* PR Comments

* Move to better not found pattern.

* Fix linting?

* Fix sidebar, cohort-review wrapper, persistent scrollbar (#737)

* Fix sidebar, cohort-review wrapper, persistent scrollbar

* Add comment about height

* Clean up PR (#740)

* RW-391

* Remove imports, get rid of one getme call (#741)

* Remove imports, get rid of one getme call

* Make loc public

* Add BSD 3-clause LICENSE (#743)

* [RW-514] Joining to tables after LIMIT where possible in cohort materialization. (#744)

* Start

* Fixes

* Removing imports

* PR comments

* Removing log

* RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing.

* RW-391 cleanup of unused imports.

* RW-391 removed unused inserts.

* RW-391 removed println.

* RW-391 added comment about initializing the database.

* RW-391 added maps to list for iteration.

* RW-391 created applicaiton test.

* RW-391 attempt at getting @SpringBootTest working.

* RW-391 more experimentation.

* RW-391 successfuly SpringBootTest!

* RW-391 removed unused properties.

* RW-391 moved test case to integration.

* RW-391 replace with master copy.

* RW-391 finding a way to programmatically test all beans.

* RW-391

* RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing.

* RW-391 cleanup of unused imports.

* RW-391 removed unused inserts.

* RW-391 removed println.

* RW-391 added comment about initializing the database.

* RW-391 added maps to list for iteration.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants