-
-
Notifications
You must be signed in to change notification settings - Fork 306
implementing search-with-markers #777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| -a, --add URL [tag, ...] | ||
| -a, --add URL [+|-] [tag, ...] | ||
| bookmark URL with comma-separated tags | ||
| (prepend tags with '+' or '-' to use fetched tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…Forgot to update readme in #775 😅
|
|
||
| strip_delim = lambda s, delim=DELIM, sub=' ': str(s).replace(delim, sub) | ||
| taglist = lambda ss: sorted(s.lower() for s in set(ss) if s) | ||
| taglist = lambda ss: sorted(set(s.lower().strip() for s in ss if (s or '').strip())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensuring taglist() works correctly with unstripped tags
| taglist = lambda ss: sorted(s.lower() for s in set(ss) if s) | ||
| taglist = lambda ss: sorted(set(s.lower().strip() for s in ss if (s or '').strip())) | ||
| like_escape = lambda s, c='`': s.replace(c, c+c).replace('_', c+'_').replace('%', c+'%') | ||
| split_by_marker = lambda s: re.split(r'\s+(?=[.:>#*])', s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically how bukubrow does it
| return self.value != other and ((self.value < other) == self.ascending) | ||
|
|
||
| def __repr__(self): | ||
| return ('+' if self.ascending else '-') + repr(self.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used to apply a sort direction to each field individually (when sorting in Python)
| valid = list(names) + list(names.values()) + ['tags'] | ||
| _fields = [(re.sub(r'^[+-]', '', s), not s.startswith('-')) for s in (fields or [])] | ||
| _fields = [(names.get(field, field), direction) for field, direction in _fields if field in valid] | ||
| return _fields or [('id', True)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing and validating an ordering description, converting it into a simple format for internal use.
| text_fields = (set() if not ignore_case else {'url', 'desc', 'title', 'tags'}) | ||
| get = lambda x, k: (getattr(x, k) if not ignore_case or k not in text_fields else str(getattr(x, k) or '').lower()) | ||
| order = self._ordering(fields, for_db=False) | ||
| return sorted(bookmark_vars(records), key=lambda x: [SortKey(get(x, k), ascending=asc) for k, asc in order]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting already fetched data… is not actually used anywhere in the code, but it might come in handy anytime.
| """Converts field list to SQL 'ORDER BY' parameters. (See also BukuDb._ordering().)""" | ||
| text_fields = (set() if not ignore_case else {'url', 'desc', 'metadata', 'tags'}) | ||
| return ', '.join(f'{field if field not in text_fields else "LOWER("+field+")"} {"ASC" if direction else "DESC"}' | ||
| for field, direction in self._ordering(fields)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generates a list of ORDER BY SQL clauses
| """ | ||
|
|
||
| return self._fetch('SELECT * FROM bookmarks', lock=lock) | ||
| return self._fetch(f'SELECT * FROM bookmarks ORDER BY {self._order(order)}', lock=lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default behaviour (when order is not specified) won't be affected here
| if not s: | ||
| return [] | ||
| tags = ([s] if regex and not markers else taglist(s.split(DELIM))) | ||
| return [('metadata', deep, s), ('url', deep, s), ('desc', deep, s)] + (tags and [('tags', deep, *tags)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each token includes the field name, its own deep value (to account for special tags behaviour), and one or more parameter (again, to account for tags since they cannot include ,).
Note that for a token to be matched, all of its parameters must be matched (i.e. when searching for foo,bar,baz in tags, all three tags must be matched for the token to match).
And for the keyword to match, any of its tokens must be matched.
| param = border(field, param[0]) + re.escape(param) + border(field, param[-1]) | ||
| args += [param] | ||
| clauses += (_clauses if len(_clauses) < 2 else [f'({" AND ".join(_clauses)})']) | ||
| return ' OR '.join(clauses), args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on the original implementation of searchdb(); regex=True means that deep is ignored and only one param is expected in a token (since comma-splitting is not done in regex mode), and otherwise we rely on \b to determine the word edge (or , when working with tags).
Note that I removed stripping / from the end of the string, since it's easy to not include a slash for the user if it's not necessary, but if it's stripped then it's impossible to actually search for it correctly (e.g. specifying .com/ in URL search may not produce the desired results).
| q0 = q0[:-3] + ' AS score FROM bookmarks WHERE score > 0 ORDER BY score DESC)' | ||
| query = ('SELECT id, url, metadata, tags, desc, flags\nFROM (SELECT *, (' + | ||
| '\n + '.join(map(_count, clauses)) + | ||
| f') AS score\n FROM bookmarks WHERE score > 0 ORDER BY score DESC, {_order})') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query-generating code itself is much more compact now.
(As for the \ns, they make no difference to the DB engine, but reading logged queries is easier when they're formatted.)
| q0 = q0[:-3] + ' AS score FROM bookmarks WHERE score > 0 ORDER BY score DESC)' | ||
| query = ('SELECT id, url, metadata, tags, desc, flags\nFROM (SELECT *, (' + | ||
| '\n + '.join(map(_count, clauses)) + | ||
| f') AS score\n FROM bookmarks WHERE score > 0 ORDER BY score DESC, {_order})') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that since score is the first clause of ORDER BY, the order value only affects search results that have the same "score".
| query = ('SELECT id, url, metadata, tags, desc, flags FROM bookmarks WHERE (' + | ||
| f' {search_operator} '.join("tags LIKE '%' || ? || '%'" for tag in qargs) + | ||
| ')' + ('' if not excluded_tags else ' AND tags NOT REGEXP ?') + | ||
| f' ORDER BY {_order}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than replacing the ORDER BY clause, the changes here don't really affect the produced SQL.
| all_keywords: bool = False, | ||
| deep: bool = False, | ||
| regex: bool = False, | ||
| stag: Optional[List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding default values allows using these as keyword parameters.
(…And yes, stag is not actually a string here from what I could tell – the code even uses .join() to convert it into a string 😅)
| keywords : list of str | ||
| Keywords to search. | ||
| without : list of str | ||
| Keywords to exclude; ignored if empty. Default is None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this parameter simplifies search invocation
| for row in resultset: | ||
| out += '- [' + title(row, 'Untitled') + '](' + row.url + ')' | ||
| _title = title(row) | ||
| out += (f'- <{row.url}>' if not _title else f'- [{_title}]({row.url})') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can handle <raw URLs> as input, it seemed like a good idea to output them as well – thus allowing to reimport the same exact data that was exported (…sans descriptions, I suppose)
| if row.tags: | ||
| out += ' TAGS="' + html.escape(row.tags).encode('ascii', 'xmlcharrefreplace').decode('utf-8') + '"' | ||
| out += '>\n<title>{}</title>\n</bookmark>\n'\ | ||
| out += '>\n <title>{}</title>\n </bookmark>\n'\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the same formatting as in the HTML export (…mostly for sake of appearances)
| if not isinstance(obj, BukuDb): | ||
| LOGERR('Not a BukuDb instance') | ||
| return | ||
| bdb = obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is just less confusing
| if nav.startswith('s '): | ||
| results = obj.searchdb(nav[2:].split(), False, deep) | ||
| keywords = (nav[2:].split() if not markers else split_by_marker(nav[2:])) | ||
| results = bdb.searchdb(keywords, deep=deep, markers=markers, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Search behaviour only changes after enabling search-with-markers
| ids |= set(range(vals[0], vals[-1] + 1)) | ||
| else: | ||
| print('Invalid input') | ||
| ids and bdb.print_rec(ids, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows to sort the printed records according to selected order
| elif args.unlock is not None: | ||
| BukuCrypt.decrypt_file(args.unlock) | ||
|
|
||
| order = [s for ss in (args.order or []) for s in re.split(r'\s*,\s*', ss.strip()) if s] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spaces and/or commas can be used to separate fields in the order description
| search_opted = True | ||
| tags_search = bool(args.stag is not None and len(args.stag)) | ||
| exclude_results = bool(args.exclude is not None and len(args.exclude)) | ||
| search_results, search_opted = None, True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"not None and not empty" is how truthiness is defined for lists anyway
| else: | ||
| LOGERR('no keyword') | ||
| search_results = bdb.search_keywords_and_filter_by_tags( | ||
| args.sany, deep=args.deep, stag=args.stag, markers=args.markers, without=args.exclude, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…Yes, all of that got replaced by a single call (which still does the same)
| if args.json is None and not args.format: | ||
| num = 10 if not args.count else args.count | ||
| prompt(bdb, search_results, oneshot, args.deep, num=num) | ||
| prompt(bdb, search_results, noninteractive=oneshot, deep=args.deep, markers=args.markers, order=order, num=num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using keyword params explicitly
| @media (min-width: 1200px) { | ||
| .filters .filter-op {width: 280px !important} | ||
| .filters .filter-val {width: 595px !important} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, flask-admin sets style="width: …" on all dropdowns in the filters (matching whatever the current width happens to be); thus the need for !important, and for reiterating the default width.
| adder.onclick = () => { | ||
| try { | ||
| let key = bukuFilters().first().val() || 'buku_search_markers_match_all'; | ||
| setTimeout(() => sync(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a buku filter is added, it's automatically matched to already existing ones (defaulting to the mode used in navbar search).
| let _value = filterInput(this).val(); | ||
| $(this).val(evt.val).triggerHandler('change', '$norecur$'); | ||
| filterInput(this).val(_value); // retaining the last value for other filters | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When one of the buku filter modes is changed, all others get updated as well (and their keywords are restored)
| {% block menu_links %} | ||
| {{ super() }} | ||
| <form class="navbar-form navbar-right" action="{{ url_for('bookmark.index_view') }}" method="GET"> | ||
| <form class="navbar-form navbar-right" action="{{ url_for('admin.search') }}" method="POST"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Search splitting is done in the admin.search handler, thus replacing the action= value.
| {{ super() }} | ||
| {{ buku.close_if_popup() }} | ||
| {{ buku.focus('form[action="/"]') }} | ||
| {{ buku.focus('main form[action="/"]') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both forms have the same action now, thus specifying which one we want focused on page load.
| this.title = this.title.replace(/'(.*?)'/g, `'<strong><tt>$1</tt></strong>'`).replace(/\bFULL\b/g, `<strong><em>full</em></strong>`); | ||
| }).attr('data-container', 'body').attr('data-trigger', 'hover').tooltip(); | ||
| </script> | ||
| <style>.tooltip-inner {text-align: left; white-space: pre; max-width: 600px}</style> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling (and prettifying) Bootstrap tooltips
| deep, all_keywords = (x and not regex for x in [form.deep.data, form.all_keywords.data]) | ||
| flt = bs_filters.BookmarkBukuFilter(deep=deep, regex=regex, markers=markers, all_keywords=all_keywords) | ||
| vals = ([('', form.keyword.data)] if not markers else enumerate(buku.split_by_marker(form.keyword.data))) | ||
| url = url_for('bookmark.index_view', **{filter_key(flt, idx): val for idx, val in vals}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old behaviour without markers, otherwise splitting into multiple keywords based on detected markers.
(…Also I removed combinations of regex with deep/all_keywords since those do nothing 😅)
| (1, "http://example.org", None, ",", "", 0), | ||
| (2, "http://google.com", "Google", ",", "", 0), | ||
| (2, "http://example.org", None, ",bar,baz,foo,", "", 0), | ||
| (3, "http://google.com", "Google", ",bar,baz,foo,", "", 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding tags to test their export as well. (…Also fixing indices since these wouldn't be encountered in actual data)
| assert split_by_marker(search_string) == [ | ||
| ' global substring', '.title substring', ':url substring', ':https', | ||
| '> description substring', '#partial,tags:', '#,exact,tags,', '*another global substring ', | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The split condition is “one or multiple spaces followed by a marker” (with the marker being retained)
| assert SortKey('foo', ascending=False) < SortKey('bar', ascending=False) | ||
| assert not SortKey('foo', ascending=False) < SortKey('foo', ascending=False) # pylint: disable=unnecessary-negation | ||
| assert not SortKey('foo', ascending=False) > SortKey('foo', ascending=False) # pylint: disable=unnecessary-negation | ||
| assert not SortKey('foo', ascending=False) > SortKey('bar', ascending=False) # pylint: disable=unnecessary-negation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing </> specifically here
| assert not SortKey('foo', ascending=False) > SortKey('bar', ascending=False) # pylint: disable=unnecessary-negation | ||
|
|
||
| custom_order = lambda s: (SortKey(len(s), ascending=False), SortKey(s, ascending=True)) | ||
| assert sorted(['foo', 'bar', 'baz', 'quux'], key=custom_order) == ['quux', 'bar', 'baz', 'foo'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“Sort longer strings first, ordering lexicographically when they have the same length”
| " OR (tags LIKE ('%' || ? || '%') AND tags LIKE ('%' || ? || '%') AND tags LIKE ('%' || ? || '%'))"), | ||
| ]) | ||
| def test_search_clause(bukuDb, regex, tokens, args, clauses): | ||
| assert bukuDb()._search_clause(tokens, regex=regex) == (clauses, args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoroughly testing search SQL generation
ff92356 to
a4c5de9
Compare
|
Thank you! |
|
Please update the ToDO list tracker. Also, is the |
Ah, no; forgot about that. …Say; are the completion lists meant to be sorted? I think the last few options that were added to them are misplaced if that's the case 😅 |
fixes #740:
.for title,>for description,:for URL,#for tags;*or no prefix for all fields)--markers), interactive shell (m), bukuserverbukufilter mode (markers*), bukuserver index page/navbar searchalso:
--order), interactive shell (v… more fitting letters are taken already 😅), bukuserver bookmarks filter (order); + aBukuDbmethod for sorting already fetched bookmarks (with matching behaviour)bukufilter modes are synchronized (without resetting them)Note: multiple order filters can be affected by an upstream bug when edited.
Screenshots
CLI
("sort records by tags in ascending order, and those with the same tags by URL in descending order")
buku --order +tags,-url --print | lessbuku --order +tags,-url --print --json | less(“search for
barin titles and.foo/in URLs”)export
interactive shell
webUI – index page
(these are default parameters now since the markers don't really interfere with most searches unless you actually include them)
(navbar search applies the same defaults so it's the same search as above)
webUI – bookmarks
(this is the search that either of these forms will submit when using the query







. bar :.foo/)webUI – filters width
(these are based on primary




@media (min-width: *)values used by Bootstrap)(the last one is the default width which is used as the fallback value when none of the above constraints are met)