@Ewan-Keith
Contributor

Firstly, thanks for the tool!

This PR adds support for Databricks SQL, based on the other implementations. DBx does support primary/foreign keys, although they're not enforced. As they're still extremely useful for documentation, I've pulled them through via the driver.

Some unit testing is included, but as a cloud platform Databricks doesn't make integration testing easy. I have copied the Snowflake TPC-H SF1 DDL in ./testdata, converted it to DBx SQL syntax, and used it to carry out manual testing. The output of this testing is stored at sample/databricks. Databricks has recently released a free version which can be spun up with just an email (no payment details or cloud infra required), so if the maintainers want to, it should be relatively straightforward to set up e2e testing.

Two auth mechanisms are supported, depending on whether a user or a service principal is being used to execute the commands. This means that the DSN I've landed on for tbls use isn't exactly the same as the internal DSN used by the dbx Go library, but this felt like an acceptable tradeoff for a clearer interface.
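To illustrate the kind of translation involved, here is a minimal sketch (not the PR's actual code) of mapping a URL-style DSN onto the string form that the databricks-sql-go driver expects. The `databricks://` scheme and the `token`, `client_id`, and `client_secret` query parameter names are assumptions made for this example, as is the helper name `toDriverDSN`. The driver-side layout (`token:<pat>@host:port/path` for PAT, `authType=OauthM2M` with `clientID` and `clientSecret` for a service principal) reflects my understanding of the driver's DSN format and should be checked against its documentation.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// toDriverDSN is a hypothetical helper that converts a URL-style DSN into
// a databricks-sql-go style DSN. Parameter names on the input side
// (token, client_id, client_secret) are assumptions for this sketch.
func toDriverDSN(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "databricks" {
		return "", fmt.Errorf("unexpected scheme %q", u.Scheme)
	}

	q := u.Query()
	token := q.Get("token")
	clientID := q.Get("client_id")
	clientSecret := q.Get("client_secret")
	// Strip the auth parameters so only driver options (catalog, schema, ...) remain.
	q.Del("token")
	q.Del("client_id")
	q.Del("client_secret")

	switch {
	case token != "":
		// User auth with a personal access token.
		// Note: a token containing reserved characters would need escaping.
		dsn := fmt.Sprintf("token:%s@%s%s", token, u.Host, u.Path)
		if enc := q.Encode(); enc != "" {
			dsn += "?" + enc
		}
		return dsn, nil
	case clientID != "" && clientSecret != "":
		// Service principal auth (OAuth machine-to-machine).
		q.Set("authType", "OauthM2M")
		q.Set("clientID", clientID)
		q.Set("clientSecret", clientSecret)
		return fmt.Sprintf("%s%s?%s", u.Host, u.Path, q.Encode()), nil
	default:
		return "", errors.New("either token or client_id and client_secret must be set")
	}
}

func main() {
	dsn, err := toDriverDSN("databricks://example.cloud.databricks.com:443/sql/1.0/warehouses/abc123?catalog=main&schema=tpch&token=dapiXXXX")
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
}
```

Keeping the credentials as ordinary query parameters means both auth modes share one URL shape, at the cost of a small translation step before the driver is opened.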

Finally, during local testing I wanted to avoid accidentally committing my own dbx creds, so I have added tbls.yml to the .gitignore file. I don't know if that's desirable or not; feel free to remove it if not.

I think I've got everywhere that needs updating for a new database updated, but just point me to any I've missed!

Ewan-Keith and others added 14 commits August 8, 2025 13:47
- Convert Snowflake TPC-H DDL to Databricks SQL syntax
- Add primary keys and foreign key constraints for all 8 tables
- Include detailed table and column descriptions based on TPC-H specification
- Proper dependency ordering for constraint creation
- Support for testing Databricks driver integration

Tables: REGION, NATION, PART, SUPPLIER, CUSTOMER, PARTSUPP, ORDERS, LINEITEM
Total: 65 columns with full business context documentation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…onships

- Add Databricks SQL driver integration to main.go with databricks-sql-go dependency
- Create dedicated Databricks driver in drivers/databricks/ with full schema analysis
- Implement AnalyzeDatabricks() function for proper DSN handling without dburl parsing
- Support tables, views, columns, constraints, and complete foreign key relationships
- Use REFERENTIAL_CONSTRAINTS and KEY_COLUMN_USAGE system tables for accurate FK mapping
- Handle Databricks three-level naming (catalog.schema.table) and query parameters
- Provide graceful fallback for information schema features that may not be available
- Follow existing driver patterns and conventions for consistency with other databases

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Resolves issues where table-specific markdown files and schema.json output
were missing foreign key relationships and constraint metadata. The driver
now properly populates:

- Column ParentRelations/ChildRelations for foreign key relationships
- Constraint.Columns field with column names from key_column_usage
- Constraint.ReferencedTable/ReferencedColumns for FK constraints
- Complete constraint definitions with proper SQL formatting

This ensures that:
- Table markdown files display relationships correctly in relations sections
- Third-party tools using schema.json have complete relationship data
- Foreign key constraints show proper column mappings and references

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
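As a rough illustration of the relation wiring described in the commit above, the sketch below attaches one foreign key relation to the columns on both sides. The `Relation`, `Column`, and `Table` types here are simplified stand-ins defined for this example only, and `linkRelation` is a hypothetical helper. They are not the tbls schema types or the driver's real code.

```go
package main

import "fmt"

// Relation is a simplified stand-in for a foreign key relationship.
type Relation struct {
	Table         string   // child table holding the foreign key
	Columns       []string // foreign key columns on the child table
	ParentTable   string   // referenced table
	ParentColumns []string // referenced columns
}

// Column records the relations it takes part in, on either side.
type Column struct {
	Name            string
	ParentRelations []*Relation // relations where this column is the referencing side
	ChildRelations  []*Relation // relations where this column is the referenced side
}

// Table is a simplified stand-in holding only a name and columns.
type Table struct {
	Name    string
	Columns []*Column
}

func (t *Table) findColumn(name string) (*Column, error) {
	for _, c := range t.Columns {
		if c.Name == name {
			return c, nil
		}
	}
	return nil, fmt.Errorf("column %s.%s not found", t.Name, name)
}

// linkRelation attaches r to every column on both sides of the foreign key.
func linkRelation(tables map[string]*Table, r *Relation) error {
	child, ok := tables[r.Table]
	if !ok {
		return fmt.Errorf("table %s not found", r.Table)
	}
	parent, ok := tables[r.ParentTable]
	if !ok {
		return fmt.Errorf("table %s not found", r.ParentTable)
	}
	for _, name := range r.Columns {
		c, err := child.findColumn(name)
		if err != nil {
			return err
		}
		c.ParentRelations = append(c.ParentRelations, r)
	}
	for _, name := range r.ParentColumns {
		c, err := parent.findColumn(name)
		if err != nil {
			return err
		}
		c.ChildRelations = append(c.ChildRelations, r)
	}
	return nil
}

func main() {
	tables := map[string]*Table{
		"ORDERS":   {Name: "ORDERS", Columns: []*Column{{Name: "O_CUSTKEY"}}},
		"CUSTOMER": {Name: "CUSTOMER", Columns: []*Column{{Name: "C_CUSTKEY"}}},
	}
	r := &Relation{Table: "ORDERS", Columns: []string{"O_CUSTKEY"}, ParentTable: "CUSTOMER", ParentColumns: []string{"C_CUSTKEY"}}
	if err := linkRelation(tables, r); err != nil {
		panic(err)
	}
	fmt.Println(len(tables["ORDERS"].Columns[0].ParentRelations), len(tables["CUSTOMER"].Columns[0].ChildRelations))
}
```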
…mprehensive tests

Major improvements to the Databricks driver constraint handling:

- Replace N+1 queries with single SQL aggregation query using COLLECT_LIST()
- Remove intermediate constraintData struct for cleaner code flow
- Add parseArrayString() helper to handle Databricks array string format
- Implement comprehensive unit tests (32 test cases total):
  - TestParseArrayString: 10 test cases covering array parsing edge cases
  - TestBuildConstraintDefinition: 22 test cases covering all constraint types
- Apply standard Go formatting with go fmt

Performance improvements:
- Single query per table instead of 1 + N queries for constraints
- Direct SQL aggregation eliminates application-side grouping logic
- Same exact functionality with significantly improved performance

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
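The commit above mentions a `parseArrayString()` helper for the array values produced by `COLLECT_LIST()`. Its implementation is not shown in this conversation, so the following is only a sketch of what such a helper could look like, assuming the aggregated column arrives as a bracketed, comma-separated string with optionally double-quoted elements. The exact format returned by the driver is an assumption here.

```go
package main

import (
	"fmt"
	"strings"
)

// parseArrayString is a hypothetical take on the helper named in the commit.
// It turns strings such as `["A","B"]` or `[A, B]` into a slice of names.
// Limitation: elements that themselves contain commas are not supported.
func parseArrayString(s string) []string {
	s = strings.TrimSpace(s)
	s = strings.TrimPrefix(s, "[")
	s = strings.TrimSuffix(s, "]")
	if strings.TrimSpace(s) == "" {
		return nil
	}
	parts := strings.Split(s, ",")
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		// Drop surrounding whitespace and double quotes from each element.
		p = strings.Trim(strings.TrimSpace(p), `"`)
		if p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(parseArrayString(`["L_ORDERKEY","L_LINENUMBER"]`))
	fmt.Println(parseArrayString("[PS_PARTKEY, PS_SUPPKEY]"))
}
```

With the column names aggregated in SQL and parsed once per constraint row, the driver needs no follow-up query per constraint, which is the N+1 pattern the commit removes.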
Owner

@k1LoW left a comment

@Ewan-Keith GREAT WORK!!

Thank you!!

I have two minor requests for revision. Please consider them.

README.md Outdated

**Databricks:**

**Personal Access Token (PAT) Authentication:**
Owner

To ensure consistency with other data sources, I would appreciate it if you could either discontinue the use of bold formatting or add explanatory comments within the code blocks.

@Ewan-Keith
Contributor Author

Thanks! Both formatting changes made 👍

Owner

@k1LoW left a comment

Looks GREAT!! Thank you!!

@k1LoW added the enhancement (New feature or request) and minor labels Sep 9, 2025
@k1LoW merged commit ef9332c into k1LoW:main Sep 9, 2025
3 checks passed
@github-actions bot mentioned this pull request Sep 9, 2025