-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Support multi-byte search for query names and descriptions #3908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* add multi_byte_support_enabled option to organization settings * add `ilike %...%` to query search conditions when the option is enabled
503780a
to
6179e43
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking care of this. Please see comments.
redash/models/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it enough to match only name
and description
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is popular, but search_vector
contains the query's SQL too, so we can find by some terms in SQL with this (only with small overhead). How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, you can add the query_text
column to the search instead. But I think it's better to leave it out, because for some search terms it can yield really bad results (like when you search for something that is also a table name). When using the regular search it has weights in place, so those are ranked lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your advise. I have removed search_vector
for multi-byte search. It still provides reasonable search results for me.
Co-Authored-By: Arik Fraimovich <[email protected]>
Test summaryRun details
View run in Cypress Dashboard ➡️ This comment has been generated by cypress-bot as a result of your project's GitHub integration settings. You can manage this integration in your project's settings in the Cypress Dashboard |
@arikfr So I'm not sure if this should block this PR, I just wondered whether the "search backends" idea would still useful for this? Looking back, I'm not super convinced the tsvector based search delivered what we hoped to get -- smooth deployment and improved results. Especially given the fact that tsvector is limited in many ways and excludes a huge part of the market for non-latin languages. My suggestion would be to:
|
Of course the additional search backends (e.g. tsvector or Elasticsearch) could even be shipped and just made optional (e.g. when a ES cluster is available). |
@jezdez this is good feedback, but will be lost once this is merged. Let's move it to https://discuss.redash.io/c/development? |
Good idea, expanded on it in https://discuss.redash.io/t/search-backend-proposal/4080 |
Merged 👍 |
cls.name.ilike(pattern), | ||
cls.description.ilike(pattern) | ||
) | ||
).order_by(Query.id).limit(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed -- maybe we should order by created_at
or updated_at
timestamps?
…#3908) * Support multi-byte search for query names and descriptions * add multi_byte_support_enabled option to organization settings * add `ilike %...%` to query search conditions when the option is enabled * Improve description for multi_byte_search_enabled option Co-Authored-By: Arik Fraimovich <[email protected]> * Remove tsvector from search when multi_byte_search_enabled * Add a multi-byte search test case
ilike %...%
to query search conditions when the option is enabledWhat type of PR is this? (check all applicable)
Description
Since postgres tsvector does not work well with multi-byte characters in CJK- (and some others) launguages , currently queries named in multi-byte characters are often fails to be searched as described #2622.
This fixes the issue by adding an option to enable multi-byte compliant search into organization settings, and if it is enabled, search for queries using
<name or description> ilike %<search_term>%
pattern in addition to tsvector.This search method is slower than tsvector search, so it is recommended only when multi-byte search is more important than speed.
Related Tickets & Documents
This closes #2622.
Mobile & Desktop Screenshots/Recordings (if there are UI changes)
(If the option is disabled, nothing is hit in
ユーザー
search results.)