Skip to content

Conversation

@bsowell
Copy link
Contributor

@bsowell bsowell commented Sep 25, 2025

The default remains "ltr", but if you pass "rtl", it will sort elements. based on that reading order.

The default remains "ltr", but if you pass "rtl", it will sort elements. based
on that reading order.
@bsowell bsowell requested review from alexaryn and Copilot September 25, 2025 03:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for reading direction ("ltr" or "rtl") to the bbox_sort and xycut sorting algorithms. The default remains "ltr" for backward compatibility, but both sorting methods now respect right-to-left reading order when "rtl" is specified.

  • Added ReadingDirection type and optional reading_direction parameter throughout the sorting pipeline
  • Modified sorting key functions to handle right-to-left ordering by negating x-coordinates
  • Updated xycut algorithm to reverse column order when splitting on x-axis in RTL mode

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
lib/sycamore/sycamore/utils/xycut.py Added reading direction support to xycut sorting with axis-aware RTL handling
lib/sycamore/sycamore/utils/element_sort.py Updated sorting interface with PageSorter protocol and reading direction parameter
lib/sycamore/sycamore/utils/bbox_sort.py Added RTL-aware sorting keys and updated two-column sorting logic
lib/sycamore/sycamore/tests/unit/utils/test_xycut_sort.py Added comprehensive RTL test cases for xycut sorting
lib/sycamore/sycamore/tests/unit/utils/test_bbox_sort.py Added RTL test cases for bbox sorting

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@@ -1,17 +1,23 @@
from typing import Optional, Union
from collections.abc import Callable
from typing import Optional, Union, Literal, Protocol
Copy link

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing import of collections.abc.Callable which was removed but may still be needed elsewhere in the codebase. The Protocol import should be grouped with other typing imports.

Copilot uses AI. Check for mistakes.
from sycamore.data import Element


ReadingDirection = Literal["ltr", "rtl"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the use of Literal, as mypy tends to complain vociferously if one writes a wrapper function that blindly passing the string through. We could consider having a boolean "left_to_right", unless we plan to support vertical stuff like old Chinese. Even then, we could make the horizontal and vertical orthogonal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

def elem_top_right(elem: Element) -> tuple:
bbox = elem.data.get("bbox")
if bbox:
return (bbox[1], -bbox[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my ideas was to pass through some sort of coordinate transform function (which could be the identity). Also, I wasn't sure if we wanted negative numbers or we wanted 1.0 - bbox[0].

The other question I had was if it was OK to falsify the data, or if it was better just to invert the sorting and comparison logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your concerns here? Happy to do it a different way, but I can't tell if there is a place where this will give incorrect output, or if this is just forward looking. I'm not sure it makes sense to generalize too much here, does it? Are there that many other transforms you would want to apply for sorting bboxes at the moment?



def sort_page(elems: list[Element], *, mode: SortSpec = None) -> None:
def sort_page(elems: list[Element], *, mode: SortSpec = None, reading_direction: ReadingDirection = "ltr") -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like if we're going to the trouble of plumbing a new parameter through, it could be more of a context/options object, rather than this ad-hoc value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it, though again I wonder how much value we are going to get by generalizing at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants