-
Notifications
You must be signed in to change notification settings - Fork 10
[RW-514] Joining to tables after LIMIT where possible in cohort materialization. #744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9609b35
to
80a1c44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, mostly minor comments. Hopefully we don't need to make any more query optimizations; I'm not sure there is enough flexibility to add another non-trivial optimization like this.
Also worth noting that we'll still hit this worst case performance in the event that the user includes a column from every joined table in either the filters or order by. Depending on how aggressively users are doing that, we may need further changes
column.columnAlias); | ||
// If the table we're retrieving this from is found in the inner query, select the column | ||
// in both the inner and outer query. | ||
if (column.tableAlias.equals(queryState.mainTableName) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will beforeLimitTables
not include the main table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No... JoinedTableInfo doesn't really make sense for the main table, so I'm just special casing it.
outerSelectExpressions.add(columnSql); | ||
} | ||
} | ||
Joiner commaJoiner = Joiner.on(", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not used util next codeblock, would move down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
Joiner commaJoiner = Joiner.on(", "); | ||
if (hasAfterLimitTables) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
should be unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Columns that appear in the inner ORDER BY appear in the outer ORDER BY, too; make sure | ||
// they get returned. | ||
String columnAlias = getColumnAlias(column.columnInfo.getColumnName()); | ||
log.info("COLUMN ALIAS = " + columnAlias); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
outerSql.append(commaJoiner.join(outerOrderByExpressions)); | ||
return outerSql.toString(); | ||
} else { | ||
return innerSql.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: invert the if, early return this, then avoid the indent for the outerSql handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
StringBuilder endSql = handleOrderBy(queryState, tableQuery.getOrderBy()); | ||
endSql.append(" limit "); | ||
ImmutableList<OrderByColumn> orderByColumns = handleOrderBy(queryState, tableQuery.getOrderBy()); | ||
StringBuilder endSql = new StringBuilder(" limit "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps rename to limitOffsetSql; no longer necessarily the end if there's an inner query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; and moved the generation of this SQL into buildSql. (I wanted to apply a LIMIT to the outer SQL too, just for good measure.)
columnConfig); | ||
} else { | ||
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState); | ||
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this function know it should pass true
for beforeLimit
? This doesn't make sense to me; getColumnInfo()
seems like something that could be called for any column, not just those pertaining to WHERE / ORDER BY. If it's currently only called in those cases, that seems like an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Exposed it as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving since all my comments were style or refactoring-type comments. I'm out today - so please add someone else if you want further feedback on this this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need to warn researchers not to use columns on joined tables in WHERE / ORDER BY more than necessary when dealing with large cohorts.
Hopefully a lot of joins will be to the concept table and will only show up in the SELECT clause.
I filed https://precisionmedicineinitiative.atlassian.net/browse/RW-531 to allow for cohort materialization to GCS, which can help with cases where BigQuery queries exceed 60 seconds.
columnConfig); | ||
} else { | ||
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState); | ||
TableNameAndAlias tableNameAndAlias = getTableNameAndAlias(columnParts, queryState, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Exposed it as a parameter.
column.columnAlias); | ||
// If the table we're retrieving this from is found in the inner query, select the column | ||
// in both the inner and outer query. | ||
if (column.tableAlias.equals(queryState.mainTableName) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No... JoinedTableInfo doesn't really make sense for the main table, so I'm just special casing it.
outerSelectExpressions.add(columnSql); | ||
} | ||
} | ||
Joiner commaJoiner = Joiner.on(", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
Joiner commaJoiner = Joiner.on(", "); | ||
if (hasAfterLimitTables) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Columns that appear in the inner ORDER BY appear in the outer ORDER BY, too; make sure | ||
// they get returned. | ||
String columnAlias = getColumnAlias(column.columnInfo.getColumnName()); | ||
log.info("COLUMN ALIAS = " + columnAlias); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
outerSql.append(commaJoiner.join(outerOrderByExpressions)); | ||
return outerSql.toString(); | ||
} else { | ||
return innerSql.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
StringBuilder endSql = handleOrderBy(queryState, tableQuery.getOrderBy()); | ||
endSql.append(" limit "); | ||
ImmutableList<OrderByColumn> orderByColumns = handleOrderBy(queryState, tableQuery.getOrderBy()); | ||
StringBuilder endSql = new StringBuilder(" limit "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; and moved the generation of this SQL into buildSql. (I wanted to apply a LIMIT to the outer SQL too, just for good measure.)
* RW-391 created applicaiton test. * RW-391 attempt at getting @SpringBootTest working. * RW-391 more experimentation. * RW-391 successfuly SpringBootTest! * RW-391 removed unused properties. * RW-391 moved test case to integration. * RW-391 replace with master copy. * Tweak deploy docs and logging (#731) * Js/demo form update (#728) * New "select boxes" * Stylin * Stub in ethnicity * Tests & linting * Small bugfix, more linting * Adding non-standard columns to observation table in CDM schema (#729) * Adding non-standard columns to observation table in CDM schema * Updating comment * Updated README * Refactor detail tab pages to use single page (#733) * Refactor detail tab pages to use single page * Add tab name * Fix "Google Bucket" link (#736) * RW-391 finding a way to programmatically test all beans. * Add ability to run just the tests (#735) * Add ability to run just the tests * Remove unused options * fix for new notebook button being truncated (#738) * Blrubenstein/not found page (#734) * Not Found Page * Fix spacing * PR Comments * Move to better not found pattern. * Fix linting? * Fix sidebar, cohort-review wrapper, persistent scrollbar (#737) * Fix sidebar, cohort-review wrapper, persistent scrollbar * Add comment about height * Clean up PR (#740) * RW-391 * Remove imports, get rid of one getme call (#741) * Remove imports, get rid of one getme call * Make loc public * Add BSD 3-clause LICENSE (#743) * [RW-514] Joining to tables after LIMIT where possible in cohort materialization. (#744) * Start * Fixes * Removing imports * PR comments * Removing log * RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing. * RW-391 cleanup of unused imports. * RW-391 removed unused inserts. * RW-391 removed println. * RW-391 added comment about initializing the database. * RW-391 added maps to list for iteration. * RW-391 created applicaiton test. * RW-391 attempt at getting @SpringBootTest working. * RW-391 more experimentation. * RW-391 successfuly SpringBootTest! * RW-391 removed unused properties. * RW-391 moved test case to integration. * RW-391 replace with master copy. * RW-391 finding a way to programmatically test all beans. * RW-391 * RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing. * RW-391 cleanup of unused imports. * RW-391 removed unused inserts. * RW-391 removed println. * RW-391 added comment about initializing the database. * RW-391 added maps to list for iteration.
…ialization. (#744) * Start * Fixes * Removing imports * PR comments * Removing log
* RW-391 created applicaiton test. * RW-391 attempt at getting @SpringBootTest working. * RW-391 more experimentation. * RW-391 successfuly SpringBootTest! * RW-391 removed unused properties. * RW-391 moved test case to integration. * RW-391 replace with master copy. * Tweak deploy docs and logging (#731) * Js/demo form update (#728) * New "select boxes" * Stylin * Stub in ethnicity * Tests & linting * Small bugfix, more linting * Adding non-standard columns to observation table in CDM schema (#729) * Adding non-standard columns to observation table in CDM schema * Updating comment * Updated README * Refactor detail tab pages to use single page (#733) * Refactor detail tab pages to use single page * Add tab name * Fix "Google Bucket" link (#736) * RW-391 finding a way to programmatically test all beans. * Add ability to run just the tests (#735) * Add ability to run just the tests * Remove unused options * fix for new notebook button being truncated (#738) * Blrubenstein/not found page (#734) * Not Found Page * Fix spacing * PR Comments * Move to better not found pattern. * Fix linting? * Fix sidebar, cohort-review wrapper, persistent scrollbar (#737) * Fix sidebar, cohort-review wrapper, persistent scrollbar * Add comment about height * Clean up PR (#740) * RW-391 * Remove imports, get rid of one getme call (#741) * Remove imports, get rid of one getme call * Make loc public * Add BSD 3-clause LICENSE (#743) * [RW-514] Joining to tables after LIMIT where possible in cohort materialization. (#744) * Start * Fixes * Removing imports * PR comments * Removing log * RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing. * RW-391 cleanup of unused imports. * RW-391 removed unused inserts. * RW-391 removed println. * RW-391 added comment about initializing the database. * RW-391 added maps to list for iteration. * RW-391 created applicaiton test. * RW-391 attempt at getting @SpringBootTest working. * RW-391 more experimentation. * RW-391 successfuly SpringBootTest! * RW-391 removed unused properties. * RW-391 moved test case to integration. * RW-391 replace with master copy. * RW-391 finding a way to programmatically test all beans. * RW-391 * RW-391 added comments to data.sql and schema.sql and programmtically load beans were interested in testing. * RW-391 cleanup of unused imports. * RW-391 removed unused inserts. * RW-391 removed println. * RW-391 added comment about initializing the database. * RW-391 added maps to list for iteration.
I don't love this code but it should speed up performance dramatically, and I don't see a lot of alternatives. :( Let me know if you have suggestions on how to make it simpler / clearer.