Support multi-byte search for query names and descriptions #3908

sekiyama58 · 2019-06-18T07:20:51Z

add multi_byte_support_enabled option to organization settings
add ilike %...% to query search conditions when the option is enabled

What type of PR is this? (check all applicable)

Bug Fix

Description

Because PostgreSQL's tsvector does not work well with multi-byte characters in CJK (and some other) languages, queries named with multi-byte characters often fail to appear in search results, as described in #2622.

This PR fixes the issue by adding an organization setting that enables multi-byte compliant search. When the setting is enabled, queries are searched using a `<name or description> ilike %<search_term>%` pattern in addition to tsvector.
This search method is slower than tsvector search, so it is recommended only when multi-byte search is more important than speed.
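The behaviour described above can be sketched as follows. This is a minimal illustration and not the code from this PR: it uses the standard-library `sqlite3` module with `LIKE` as a stand-in for PostgreSQL's `ILIKE`, and the table layout and the names `escape_like`, `search_queries`, and `make_demo_db` are assumptions made for the example. The disabled path is a crude whole-token match that only imitates why a tokenizing search can miss CJK substrings; it is not tsvector.

```python
import sqlite3


def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so the search term is matched literally."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_queries(conn, term, multi_byte_search_enabled, limit=10):
    """Return (id, name) pairs matching `term` (hypothetical helper)."""
    if multi_byte_search_enabled:
        # Substring match on name or description, as the PR describes.
        pattern = "%" + escape_like(term) + "%"
        return conn.execute(
            "SELECT id, name FROM queries "
            "WHERE name LIKE ? ESCAPE '\\' OR description LIKE ? ESCAPE '\\' "
            "ORDER BY id LIMIT ?",
            (pattern, pattern, limit),
        ).fetchall()
    # Assumption: a whole-token match standing in for full-text search.
    rows = conn.execute(
        "SELECT id, name, description FROM queries ORDER BY id"
    ).fetchall()
    needle = term.lower()
    hits = [
        (qid, name)
        for qid, name, description in rows
        if needle in (name + " " + (description or "")).lower().split()
    ]
    return hits[:limit]


def make_demo_db():
    """Build an in-memory table with a few sample queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queries (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO queries VALUES (?, ?, ?)",
        [
            (1, "ユーザー一覧", "list of users"),
            (2, "Daily revenue", "売上の日次集計"),
            (3, "user signups", None),
        ],
    )
    return conn
```

With the flag off, a search for `ユーザー` finds nothing because the name `ユーザー一覧` is a single token; with the flag on, the substring match finds it. Escaping `%` and `_` keeps user input from acting as wildcards.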

Related Tickets & Documents

This closes #2622.

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

New organization settings for multi-byte search

Search results when multi-byte search is enabled

(If the option is disabled, a search for ユーザー returns no results.)


arikfr

Thank you for taking care of this. Please see comments.

arikfr · 2019-06-18T07:22:18Z

redash/models/__init__.py

Isn't it enough to match only name and description?

I'm not sure how popular it is, but search_vector also contains the query's SQL, so with this we can also find queries by terms in their SQL (with only a small overhead). What do you think?

In this case, you can add the query_text column to the search instead. But I think it's better to leave it out, because for some search terms it can yield really bad results (like when you search for something that is also a table name). When using the regular search it has weights in place, so those are ranked lower.

Thank you for your advice. I have removed search_vector from the multi-byte search. It still provides reasonable search results for me.
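The concern about matching on the query text without weights can be illustrated with a small pure-Python sketch. It is not PostgreSQL's ranking and not Redash code: the weights in `FIELD_WEIGHTS`, the field names, and both helper functions are assumptions chosen only to show why a match in the SQL text should rank below a match in the name.

```python
# Hypothetical per-field weights: a hit in the name counts most,
# a hit in the SQL text counts least.
FIELD_WEIGHTS = {"name": 1.0, "description": 0.4, "query_text": 0.1}


def unweighted_search(queries, term):
    """Return ids of queries where any field contains the term, in id order."""
    needle = term.lower()
    return [
        q["id"]
        for q in sorted(queries, key=lambda q: q["id"])
        if any(needle in (q.get(field) or "").lower() for field in FIELD_WEIGHTS)
    ]


def weighted_search(queries, term, weights=FIELD_WEIGHTS):
    """Return ids ordered by summed field weights, highest first."""
    needle = term.lower()
    scored = []
    for q in queries:
        score = sum(
            weight
            for field, weight in weights.items()
            if needle in (q.get(field) or "").lower()
        )
        if score > 0:
            scored.append((score, q["id"]))
    # Sort by descending score, then by id to keep ties stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [qid for _, qid in scored]


SAMPLE_QUERIES = [
    {"id": 1, "name": "Monthly report", "description": "",
     "query_text": "SELECT * FROM users"},
    {"id": 2, "name": "Active users", "description": "",
     "query_text": "SELECT count(*) FROM events"},
    {"id": 3, "name": "Orders", "description": "",
     "query_text": "SELECT * FROM orders JOIN users ON users.id = orders.user_id"},
]
```

Searching for `users`, which is also a table name, the unweighted version returns every query that touches that table in id order, while the weighted version puts the query that is actually named "Active users" first.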

client/app/pages/settings/OrganizationSettings.jsx

Co-Authored-By: Arik Fraimovich <[email protected]>

cypress · 2019-06-18T07:56:31Z

Test summary

29 • 0 • 0 • 0

Run details


Commit	`6179e43`
Started	Jun 18, 2019 7:35 AM UTC
Ended	Jun 18, 2019 9:44 AM UTC
Duration	09:07 💡
OS	linux Debian - 8.10
Browser	Electron 61.0.3163.100


jezdez · 2019-06-24T09:45:10Z

@arikfr I'm not sure whether this should block this PR; I just wondered whether the "search backends" idea would still be useful here?

Looking back, I'm not super convinced the tsvector based search delivered what we hoped to get -- smooth deployment and improved results. Especially given the fact that tsvector is limited in many ways and excludes a huge part of the market for non-latin languages.

My suggestion would be to:

* add a backend system (like the query runner registry) that can be extended with additional Python packages. It would need to have a CRUD interface, so it could be pretty simplistic.
* ship the old ILIKE based search as the default backend
* implement more complex (tsvector or Elasticsearch) based backends as extensions (either by the Redash team/contributors or 3rd parties?)

jezdez · 2019-06-24T09:46:29Z

Of course the additional search backends (e.g. tsvector or Elasticsearch) could even be shipped and just made optional (e.g. when an ES cluster is available).

redash/models/__init__.py

arikfr · 2019-07-01T05:52:51Z

@jezdez this is good feedback, but will be lost once this is merged. Let's move it to https://discuss.redash.io/c/development?

jezdez · 2019-07-01T14:57:43Z

> @jezdez this is good feedback, but will be lost once this is merged. Let's move it to https://discuss.redash.io/c/development?

Good idea, expanded on it in https://discuss.redash.io/t/search-backend-proposal/4080

arikfr · 2019-07-08T07:01:54Z

Merged 👍

arikfr · 2019-07-08T07:02:20Z

redash/models/__init__.py

+                    cls.name.ilike(pattern),
+                    cls.description.ilike(pattern)
+                )
+            ).order_by(Query.id).limit(limit)


Just noticed -- maybe we should order by created_at or updated_at timestamps?
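For illustration, ordering the substring matches by most recent update could look like the sketch below. It again uses `sqlite3` and `LIKE` as a stand-in for PostgreSQL and `ILIKE`; the table layout, the ISO-8601 text timestamps, and the function names are assumptions for the example, not the change that was made in Redash.

```python
import sqlite3


def search_recent_first(conn, term, limit=10):
    """Substring search ordered by updated_at, newest first (hypothetical)."""
    pattern = "%" + term + "%"
    return conn.execute(
        "SELECT id, name FROM queries "
        "WHERE name LIKE ? OR description LIKE ? "
        # id is a tie-breaker so rows with equal timestamps stay stable.
        "ORDER BY updated_at DESC, id LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


def make_timestamped_db():
    """Build an in-memory table whose timestamps are ISO-8601 strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queries ("
        "id INTEGER PRIMARY KEY, name TEXT, description TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO queries VALUES (?, ?, ?, ?)",
        [
            (1, "ユーザー一覧", None, "2019-06-01T00:00:00"),
            (2, "ユーザー登録数", None, "2019-07-01T00:00:00"),
            (3, "Orders", None, "2019-07-05T00:00:00"),
        ],
    )
    return conn
```

Ordering by `id` returns the oldest matching query first, whereas ordering by `updated_at` surfaces the query that was touched most recently, which is usually closer to what a user searching by name expects.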

…#3908)

* Support multi-byte search for query names and descriptions
* add multi_byte_support_enabled option to organization settings
* add `ilike %...%` to query search conditions when the option is enabled
* Improve description for multi_byte_search_enabled option

Co-Authored-By: Arik Fraimovich <[email protected]>

* Remove tsvector from search when multi_byte_search_enabled
* Add a multi-byte search test case

Support multi-byte search for query names and descriptions

6179e43

* add multi_byte_support_enabled option to organization settings
* add `ilike %...%` to query search conditions when the option is enabled

sekiyama58 force-pushed the multi-byte-query-search branch from 503780a to 6179e43 Compare June 18, 2019 07:25

arikfr reviewed Jun 18, 2019


Improve description for multi_byte_search_enabled option

f360ae4

Co-Authored-By: Arik Fraimovich <[email protected]>

weekly-digest bot mentioned this pull request Jun 24, 2019

Weekly Digest (17 June, 2019 - 24 June, 2019) #3927

Closed

Remove tsvector from search when multi_byte_search_enabled

6dc917c

arikfr requested changes Jul 1, 2019


Add a multi-byte search test case

2a4f638

arikfr approved these changes Jul 8, 2019


arikfr merged commit 261062d into getredash:master Jul 8, 2019

arikfr reviewed Jul 8, 2019


yoshiokatsuneo mentioned this pull request May 25, 2022

Use multi_byte_search_enabled option for My Queries search and Favorite List search #5761

Merged

2 tasks
