@Harold-lkk
Collaborator

Delete the category field from the metainfo of the recognition (rec) JSON:

    "metainfo": {
        "dataset_type": "TextRecogDataset",
        "task_name": "textrecog",
        "category": [
            {
                "id": 0,
                "name": "text"
            }
        ]
    }

TextRecogCropConverter now defaults to no padding when cropping images.
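
For existing recognition annotation files, the first change could be applied with something like the sketch below. It assumes the JSON snippet above shows the field being removed; `drop_category` is a hypothetical helper written for illustration, not part of MMOCR.

```python
import copy
from typing import Any, Dict


def drop_category(anno: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an annotation dict without metainfo['category'].

    Hypothetical helper for illustration only; not an MMOCR API.
    """
    cleaned = copy.deepcopy(anno)  # leave the caller's dict untouched
    # pop() with a default is a no-op when the key is already absent
    cleaned.get("metainfo", {}).pop("category", None)
    return cleaned


old = {
    "metainfo": {
        "dataset_type": "TextRecogDataset",
        "task_name": "textrecog",
        "category": [{"id": 0, "name": "text"}],
    }
}
new = drop_category(old)
print(sorted(new["metainfo"]))
```

The copy keeps the original dict intact, so the old and new formats can be compared before anything is written back to disk.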

@Harold-lkk
Collaborator Author

Merge this before #1506.

@xinke-wang
Collaborator

A question here: in the 0.x version, the pad ratios were set to 0.4 and 0.2 by default. I am not sure whether this matters for test performance. Personally, I think they should be set to 0, as in this PR, but someone should test the model to see if it works as expected.

    def crop_img(src_img, box, long_edge_pad_ratio=0.4, short_edge_pad_ratio=0.2):
        """Crop text region with their bounding box.

        Args:
            src_img (np.array): The original image.
            box (list[float | int]): Points of quadrangle.
            long_edge_pad_ratio (float): Box pad ratio for long edge
                corresponding to font size.
            short_edge_pad_ratio (float): Box pad ratio for short edge
                corresponding to font size.
        """

@Harold-lkk
Collaborator Author

I have tested it. A different image shape does affect performance.
In the 0.x version, the cropped image is only used by ocr.py, because of the poor performance of text detection.
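
To make the shape difference concrete, here is a minimal sketch of how the two pad ratios could change the crop size. `padded_crop_bounds` and `crop_size` are hypothetical stand-ins written from the docstring above, not MMOCR's actual crop_img; the sketch assumes the shorter side of the box approximates the font size and that the long-edge ratio pads along the longer side.

```python
from typing import Sequence, Tuple


def _clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return min(max(value, low), high)


def padded_crop_bounds(
    img_w: int,
    img_h: int,
    box: Sequence[float],
    long_edge_pad_ratio: float = 0.0,
    short_edge_pad_ratio: float = 0.0,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the padded axis-aligned crop.

    Hypothetical illustration only. box is a flat quadrangle
    [x1, y1, x2, y2, x3, y3, x4, y4].
    """
    if len(box) != 8:
        raise ValueError("box must hold 8 coordinates")
    xs = [_clamp(x, 0, img_w) for x in box[0::2]]
    ys = [_clamp(y, 0, img_h) for y in box[1::2]]
    box_w = max(xs) - min(xs)
    box_h = max(ys) - min(ys)
    font_size = min(box_w, box_h)  # assumption: shorter side ~ font size

    if box_h < box_w:  # horizontal text: the long edge runs along x
        pad_x = long_edge_pad_ratio * font_size
        pad_y = short_edge_pad_ratio * font_size
    else:  # vertical (or square) text: the long edge runs along y
        pad_x = short_edge_pad_ratio * font_size
        pad_y = long_edge_pad_ratio * font_size

    # Padding never extends past the image border.
    left = int(_clamp(min(xs) - pad_x, 0, img_w))
    top = int(_clamp(min(ys) - pad_y, 0, img_h))
    right = int(_clamp(max(xs) + pad_x, 0, img_w))
    bottom = int(_clamp(max(ys) + pad_y, 0, img_h))
    return left, top, right, bottom


def crop_size(bounds: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Return (width, height) of a crop given its bounds."""
    left, top, right, bottom = bounds
    return right - left, bottom - top


box = [50, 40, 150, 40, 150, 60, 50, 60]  # 100 x 20 text box
no_pad = crop_size(padded_crop_bounds(200, 100, box))
old_pad = crop_size(padded_crop_bounds(200, 100, box, 0.4, 0.2))
print(no_pad, old_pad)
```

Under these assumptions the same annotation yields crops of different width and height depending on the defaults, which would be consistent with the performance difference reported above.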

@xinke-wang
Collaborator

OK. Note that some converters, such as the totaltext converter, also call crop_img with the default settings. I am not sure whether this affects totaltext's performance.

    dst_img = crop_img(image, anno['bbox'])

@Harold-lkk
Collaborator Author

Harold-lkk commented Nov 15, 2022

Maybe it's a bug. I checked the other data converters, and they all call crop_img(image, anno['bbox'], 0, 0).

@gaotongxiao
Collaborator

gaotongxiao commented Nov 15, 2022

@Harold-lkk We need a follow-up PR to avoid BC-breaking changes.

@gaotongxiao gaotongxiao merged commit 00254f0 into open-mmlab:dev-1.x Nov 16, 2022