Add databricks support #737
Conversation
- Convert Snowflake TPC-H DDL to Databricks SQL syntax
- Add primary keys and foreign key constraints for all 8 tables
- Include detailed table and column descriptions based on the TPC-H specification
- Order statements so constraint dependencies are created first
- Support testing of the Databricks driver integration

Tables: REGION, NATION, PART, SUPPLIER, CUSTOMER, PARTSUPP, ORDERS, LINEITEM
Total: 65 columns with full business context documentation

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
…onships

- Add Databricks SQL driver integration to main.go with the databricks-sql-go dependency
- Create a dedicated Databricks driver in drivers/databricks/ with full schema analysis
- Implement AnalyzeDatabricks() for proper DSN handling without dburl parsing
- Support tables, views, columns, constraints, and complete foreign key relationships
- Use the REFERENTIAL_CONSTRAINTS and KEY_COLUMN_USAGE system tables for accurate FK mapping
- Handle Databricks three-level naming (catalog.schema.table) and query parameters
- Fall back gracefully for information schema features that may not be available
- Follow existing driver patterns and conventions for consistency with other databases
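The "DSN handling without dburl parsing" point boils down to translating a tbls-style URL into the host-first DSN shape that databricks-sql-go accepts. The sketch below is a hypothetical illustration of that translation — the function name `buildInternalDSN`, the `HTTPPath` query parameter, and both DSN layouts are assumptions for illustration, not the PR's exact formats:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildInternalDSN is a hypothetical sketch of the kind of translation the
// driver performs: take a tbls-style URL carrying catalog/schema in the path
// and the warehouse HTTP path as a query parameter (all assumed formats),
// and emit a token-first DSN plus the three-level-naming components.
func buildInternalDSN(tblsDSN string) (dsn, catalog, schema string, err error) {
	u, err := url.Parse(tblsDSN)
	if err != nil {
		return "", "", "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) != 2 {
		return "", "", "", fmt.Errorf("expected /catalog/schema in path, got %q", u.Path)
	}
	catalog, schema = parts[0], parts[1]
	token := ""
	if u.User != nil {
		token, _ = u.User.Password()
	}
	httpPath := u.Query().Get("HTTPPath")
	dsn = fmt.Sprintf("token:%s@%s%s", token, u.Host, httpPath)
	return dsn, catalog, schema, nil
}

func main() {
	dsn, cat, sch, err := buildInternalDSN(
		"databricks://token:dapi123@example.cloud.databricks.com:443/main/tpch?HTTPPath=/sql/1.0/warehouses/abc")
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)      // token:dapi123@example.cloud.databricks.com:443/sql/1.0/warehouses/abc
	fmt.Println(cat, sch) // main tpch
}
```

Splitting catalog and schema out of the URL path is also what lets the driver qualify every system-table query with Databricks' three-level `catalog.schema.table` naming.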
Resolves issues where table-specific markdown files and the schema.json output were missing foreign key relationships and constraint metadata. The driver now properly populates:

- Column ParentRelations/ChildRelations for foreign key relationships
- The Constraint.Columns field with column names from key_column_usage
- Constraint.ReferencedTable/ReferencedColumns for FK constraints
- Complete constraint definitions with proper SQL formatting

This ensures that:

- Table markdown files display relationships correctly in their relations sections
- Third-party tools using schema.json have complete relationship data
- Foreign key constraints show proper column mappings and references
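The fix hinges on wiring each FK relation into the columns on both sides, which is what the relations sections and schema.json are rendered from. A self-contained sketch of that idea, using minimal stand-ins for the tbls schema types (the real types live in the tbls schema package and carry more fields):

```go
package main

import "fmt"

// Minimal stand-ins for the tbls schema types the commit message refers to;
// they mirror only the fields discussed above.
type Column struct {
	Name            string
	ParentRelations []*Relation
	ChildRelations  []*Relation
}

type Table struct {
	Name    string
	Columns []*Column
}

type Relation struct {
	Table         *Table // child table holding the FK
	Columns       []*Column
	ParentTable   *Table
	ParentColumns []*Column
}

// linkRelation wires one FK relation into the columns on both sides, which
// is what makes relationship data appear in markdown output and schema.json.
// A sketch of the idea, not the PR's exact code.
func linkRelation(r *Relation) {
	for _, c := range r.Columns {
		c.ParentRelations = append(c.ParentRelations, r)
	}
	for _, c := range r.ParentColumns {
		c.ChildRelations = append(c.ChildRelations, r)
	}
}

func main() {
	nationKey := &Column{Name: "n_nationkey"}
	nation := &Table{Name: "nation", Columns: []*Column{nationKey}}
	custNationKey := &Column{Name: "c_nationkey"}
	customer := &Table{Name: "customer", Columns: []*Column{custNationKey}}

	// customer.c_nationkey references nation.n_nationkey
	r := &Relation{
		Table: customer, Columns: []*Column{custNationKey},
		ParentTable: nation, ParentColumns: []*Column{nationKey},
	}
	linkRelation(r)
	fmt.Println(len(custNationKey.ParentRelations), len(nationKey.ChildRelations)) // 1 1
}
```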
…mprehensive tests

Major improvements to the Databricks driver constraint handling:

- Replace N+1 queries with a single SQL aggregation query using COLLECT_LIST()
- Remove the intermediate constraintData struct for cleaner code flow
- Add a parseArrayString() helper to handle the Databricks array string format
- Implement comprehensive unit tests (32 test cases total):
  - TestParseArrayString: 10 test cases covering array parsing edge cases
  - TestBuildConstraintDefinition: 22 test cases covering all constraint types
- Apply standard Go formatting with go fmt

Performance improvements:

- A single query per table instead of 1 + N queries for constraints
- Direct SQL aggregation eliminates application-side grouping logic
- Identical functionality with significantly improved performance
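Aggregating with COLLECT_LIST() means each constraint's columns arrive as a single array value, which the driver then has to turn back into a `[]string` when scanned — hence the parseArrayString() helper. The sketch below assumes a simple bracketed, comma-separated rendering (optionally quoted elements); the PR's actual parsing may handle more formats:

```go
package main

import (
	"fmt"
	"strings"
)

// parseArrayString is a sketch of the helper described above: it turns a
// Databricks array rendered as a string (e.g. "[c_custkey,c_name]", as a
// COLLECT_LIST value might look when scanned into a string) back into a
// []string. The bracketed comma-separated format is an assumption here.
func parseArrayString(s string) []string {
	s = strings.TrimSpace(s)
	s = strings.TrimPrefix(s, "[")
	s = strings.TrimSuffix(s, "]")
	if s == "" {
		return nil
	}
	var out []string
	for _, p := range strings.Split(s, ",") {
		p = strings.TrimSpace(p)
		p = strings.Trim(p, `"`) // some renderings quote each element
		if p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(parseArrayString("[c_custkey, c_name]")) // [c_custkey c_name]
	fmt.Println(parseArrayString("[]"))                  // []
}
```

Doing the grouping in SQL this way is what turns the 1 + N query pattern into a single query per table: the database returns one row per constraint, already aggregated, instead of the driver looping over key_column_usage rows.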
k1LoW
left a comment
README.md (Outdated)

Flagged lines:

**Databricks:**
**Personal Access Token (PAT) Authentication:**
To ensure consistency with other data sources, I would appreciate it if you could either discontinue the use of bold formatting or add explanatory comments within the code blocks.
Co-authored-by: Ken’ichiro Oyama <[email protected]>
Thanks! Both formatting changes made 👍
k1LoW
left a comment
Looks GREAT!! Thank you!!
Firstly, thanks for the tool!

This PR adds support for Databricks SQL, based on the other driver implementations. DBx does support primary/foreign keys, although they're not enforced. As they're still extremely useful for documentation, I've pulled them through via the driver.
Some unit testing is included, but as a cloud platform, Databricks doesn't make integration testing hugely easy. I have copied the Snowflake TPC-H SF1 DDL in `./testdata`, converted it to DBx SQL syntax, and used it to carry out manual testing. The output of this testing is stored at `sample/databricks`. Databricks has recently released a free version which can be spun up with just an email (no payment details or cloud infra required), so if the maintainers wanted to, it should be relatively straightforward to set up e2e testing.

Two auth mechanisms are supported, depending on whether a user or a service principal is being used to execute the commands. This means that the DSN I've landed on for tbls use isn't exactly the same as the internal DSN used by the DBx Go library, but this felt like an acceptable tradeoff for a clearer interface.
Finally, while testing locally I wanted to avoid accidentally committing my own DBx creds, so I have added `tbls.yml` to the `.gitignore` file. I don't know if that's desirable or not; feel free to remove it if not.

I think I've got everywhere that needs updating for a new database updated, but just point me to anywhere I've missed!