Skip to content

Conversation

@lydia-zheng
Copy link
Contributor

Description

Extended eraseattrib to a prototyped standalone flag "--eraseattrib_fromfile" that allows for erasure of metadata from a text file with a list of regex.
Erases all metadata attribute fields that match.

Tests

Added a new test to oiiotool-attribs which has been verified painstakenly to be working!
Also tried running the command directly from dist/bin and verified with oiiotool --info -v.

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable. (Check if there is no
    need to update the documentation, for example if this is a bug fix that
    doesn't change the API.)
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.

@lydia-zheng lydia-zheng changed the title feat_eraseattrib_fromfile [feat] eraseattrib_fromfile May 16, 2025
@lydia-zheng lydia-zheng changed the title [feat] eraseattrib_fromfile [feat() eraseattrib_fromfile May 16, 2025
@lydia-zheng lydia-zheng changed the title [feat() eraseattrib_fromfile feat(oiio): eraseattrib_fromfile May 16, 2025
@lgritz
Copy link
Collaborator

lgritz commented May 16, 2025

Hi, Lydia, can you please fix the formatting? If you look at the CI clang-format run (look for the red and green highlights) you can see it's just a few lines that you could easily fix by hand if you don't have clang-format set up to run locally.
https://github.com/AcademySoftwareFoundation/OpenImageIO/actions/runs/15058621627/job/42329426760?pr=4763
(I hope that links shows you what it shows me)

@lydia-zheng
Copy link
Contributor Author

Hi, Lydia, can you please fix the formatting? If you look at the CI clang-format run (look for the red and green highlights) you can see it's just a few lines that you could easily fix by hand if you don't have clang-format set up to run locally. https://github.com/AcademySoftwareFoundation/OpenImageIO/actions/runs/15058621627/job/42329426760?pr=4763 (I hope that links shows you what it shows me)

Yes Larry! Already working on a clang-format fix right now!

What I'm more worried about is the bleeding edge gcc14 test that failed? Digging into the log seems to imply that there are pixel differences from a test field that's completely untouched...

Signed-off-by: Lydia Zheng <[email protected]>
Comment on lines 11 to 15

command += oiiotool (OIIO_TESTSUITE_IMAGEDIR+"/tahoe-gps.jpg --eraseattrib_fromfile src/regex_list.txt -o nomakegps.jpg")
command += info_command ("nomakegps.jpg", safematch=True)


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test!

@lgritz
Copy link
Collaborator

lgritz commented May 16, 2025

The "bleeding edge" failure is unrelated. I just submitted a fix here: #4765

@lydia-zheng
Copy link
Contributor Author

Taking a look at incorporating all the comments after redo-ing the format fix!

@lgritz
Copy link
Collaborator

lgritz commented May 16, 2025

Because Lydia and I had talked about this offline, I want to fill in a bit of the motivation here for others who are reading:

It's common in a studio to have tools insert all sorts of metadata into files as they're rendered, etc. But a lot of that are things that SHOULD NOT leave the facility -- like security issues from internal pathnames or network info, or secrets like plot details revealed by sequence or character names. So generally you want to scrub the metadata from the final images before they are shipped to the client or to marketing or whatever.

oiiotool's --eraseattrib is handy for this, but it takes a single regex. But we were thinking that:

  1. You may have several patterns to fully describe the metadata you want to be sure not to ship. This could be an unwieldy regex or a cluttered series of multiple --eraseattrib commands.
  2. That list of patterns may change over time, and it would be error prone to have to change it in every script that calls oiiotool.
  3. That list of patterns may need to be different per show.

We thought a solution to facilitate this might be a version of --eraseattrib that takes a filename that contains a list of regexes, one per line of the text file. So that master list of things to scrub could be maintained in a central place and not need to update every oiiotool-calling script in order to update it. Furthermore, you could have the command say something like

$ oiiotool ready_to_ship.exr --eraseattrib_fromfile /blah/$SHOW/nometadata.txt -o clean.exr

to make a single command that allows per-show customization.

@lgritz
Copy link
Collaborator

lgritz commented May 16, 2025

Question for onlookers: Lydia implemented this as a new command, --eraseattrib_fromfile.

An alternate design we discussed was just using the existing command but with a new optional modifier: --eraseattrib:fromfile=1 that causes the subsequent argument to be interpreted as the name of a file containing the regexes, instead of as a regex itself.

Does anybody have a strong preference? Or any other feedback on the basic design?

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really close. I pointed out one line that is no longer needed.

I'd like to hold this in review for a few days, simply for the sake of allowing others the chance to comment on design of the feature.

Also, before merging, we really should make sure that the documentation mentions this feature.

@lydia-zheng
Copy link
Contributor Author

lydia-zheng commented May 16, 2025

It's really close. I pointed out one line that is no longer needed.

I'd like to hold this in review for a few days, simply for the sake of allowing others the chance to comment on design of the feature.

Also, before merging, we really should make sure that the documentation mentions this feature.

Sounds good Larry, I was checking the tests and it might have updated the branch to merge, feel free to cancel/ ignore.

Adding a list of todos since there’s a chance that I might not be able to work on this soon:

  • remove unneeded line from get_regex_list_from_file
  • add documentation for eraseattrib_fromfile
  • pending feedback for design

@JGoldstone
Copy link
Contributor

Larry, in re: whether to have a separate command-line option to take the list from the file rather than an immediate argument: is there any precedent one way or the other with existing options, which might argue for consistency in the service of least surprise?

And yeah it's an important thing, I wrote something once (as a shell script, so no one could accuse it of harboring a trojan) to strip out all sorts of stuff from the output of an exrheader command. One doesn't want a camera "notes" field that says "Quicksilver doesn't die in this take" to leak. Even with those precautions though producers wouldn't allow my contact to use it. Maybe they figured that someone could deduce a plot point from the difference between the display and data windows.

@lgritz
Copy link
Collaborator

lgritz commented May 16, 2025

Thanks for the feedback, Joseph.

I don't believe that oiiotool has any other commands at the moment that read arguments from files. But maybe it will in the future, so I agree that we want to think carefully about what we establish as the precedent.

Are there other utilities (outside of oiio) that have similar functionality? Ah, the one that popped into my mind for some reason is "rsync", and its man page includes:

            --exclude=PATTERN       exclude files matching PATTERN
            --exclude-from=FILE     read exclude patterns from FILE
            --include=PATTERN       don't exclude files matching PATTERN
            --include-from=FILE     read include patterns from FILE
            --files-from=FILE       read list of source-file names from FILE

And I checked tar (the modern gnu version), which has:

     -T filename, --files-from filename
             In x or t mode, tar will read the list of names to be extracted
             from filename.  In c mode, tar will read names to be archived
             from filename.

So they are using separate commands, and seem to follow a common "--foo-from filename" pattern.

But they don't the kind of "optional command modifiers" that we do, so maybe they didn't consider that part of the design space?

    # Existing command:
    oiiotool ... --eraseattrib pattern ...

    # Design 1: new command
    oiiotool ... --eraseattrib-from filename-containing-patterns ...

    # Design 2: optional modifier
    oiiotool ... --eraseattrib:fromfile=1 filename-containing-patterns ...

I'm mildly preferring the second option, simply because if someday we have many commands we want to operate like this, we don't have to make a completely new command for each instance. We can just make more commands use the same "fromfile" optional modifier. Use of optional modifiers in this way is already a very common oiiotool idioms that users expect.

That's my current thought, anyway. If everybody strongly prefers Design 1, I won't object.

@nrusch
Copy link
Contributor

nrusch commented May 16, 2025

I'm mildly preferring the second option, simply because if someday we have many commands we want to operate like this, we don't have to make a completely new command for each instance. We can just make more commands use the same "fromfile" optional modifier. Use of optional modifiers in this way is already a very common oiiotool idioms that users expect.

I personally prefer the second option as well, for exactly the reasons you've alluded to:

  1. Scales better with the (already large) number of oiiotool options.
  2. Consistent with prior art for command modifiers.

@lgritz lgritz changed the title feat(oiio): eraseattrib_fromfile feat(oiiotool): eraseattrib_fromfile May 16, 2025
@lgritz
Copy link
Collaborator

lgritz commented May 17, 2025

Your docs are great, thanks.

To do:

  • One very minor suggestions about how to do the string comparison on the command name (I made the suggestion via diffs, if you're ok with that, you can just click to accept the change)
  • I would like you to back out the unintended whitespace changes in the docs. There's no reason to touch so many lines in this file.
  • Let's sit on this until early next week to give people a little more time to think about the design choice and weigh in if they have a preference.

@JGoldstone
Copy link
Contributor

Amazing how far behind one can get on an issue by taking just a single day off. But: I wanted to say I really like design #2, as well.

@lgritz
Copy link
Collaborator

lgritz commented May 19, 2025

Lydia, this looks great now, the stray whitespace changes are definitely cleaned up.

It feel like consensus is leaning toward Design 2. We'll discuss in today's TSC meeting, but I suspect that is the way we'd rather go.

@lydia-zheng lydia-zheng force-pushed the feat_eraseattrib_fromfile branch from f364588 to e46de5c Compare May 20, 2025 23:39
Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I made a couple very minor suggestions about documentation wording and formatting. The diffs should allow you to click to accept them without needing to re-edit and push (assuming you agree with my suggestions).

lydia-zheng and others added 5 commits May 20, 2025 17:41
Co-authored-by: Larry Gritz <[email protected]>
Signed-off-by: Lydia Zheng <[email protected]>
Co-authored-by: Larry Gritz <[email protected]>
Signed-off-by: Lydia Zheng <[email protected]>
Co-authored-by: Larry Gritz <[email protected]>
Signed-off-by: Lydia Zheng <[email protected]>
Signed-off-by: Lydia Zheng <[email protected]>
@lydia-zheng
Copy link
Contributor Author

Looks great!

I made a couple very minor suggestions about documentation wording and formatting. The diffs should allow you to click to accept them without needing to re-edit and push (assuming you agree with my suggestions).

Hi Larry! Those all looked good, I've accepted the documentation wording updates, cheers.

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats, Lydia, LGTM!

@lgritz lgritz merged commit caa665d into AcademySoftwareFoundation:main May 21, 2025
30 checks passed
@lydia-zheng
Copy link
Contributor Author

Sweet! Thanks for the double checking, and for everyone's input on the design of the feature!

lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request May 21, 2025
)

Extended eraseattrib to be able to read regex patterns from a file,
one per line, with `--eraseattrib:fromfile=1 filename`.
Erases all metadata attribute fields that match.

Added a new test to `oiiotool-attribs` which has been verified
painstakenly to be working!
Also tried running the command directly from dist/bin and verified with
`oiiotool --info -v`.

More detail on the motivation:

It's common in a studio to have tools insert all sorts of metadata into files as they're rendered, etc. But a lot of that are things that SHOULD NOT leave the facility -- like security issues from internal pathnames or network info, or secrets like plot details revealed by sequence or character names. So generally you want to scrub the metadata from the final images before they are shipped to the client or to marketing or whatever.

oiiotool's --eraseattrib is handy for this, but it takes a single regex. But we were thinking that:

-  You may have several patterns to fully describe the metadata you want to be sure not to ship. This could be an unwieldy regex or a cluttered series of multiple --eraseattrib commands.
-  That list of patterns may change over time, and it would be error prone to have to change it in every script that calls oiiotool.
-  That list of patterns may need to be different per show.

We thought a solution to facilitate this might be a version of --eraseattrib that takes a filename that contains a list of regexes, one per line of the text file. So that master list of things to scrub could be maintained in a central place and not need to update every oiiotool-calling script in order to update it. Furthermore, you could have the command say something like

    $ oiiotool ready_to_ship.exr --eraseattrib:fromfile=1 /blah/$SHOW/nometadata.txt -o clean.exr

to make a single command that allows per-show customization.

---------

Signed-off-by: Lydia Zheng <[email protected]>
zachlewis pushed a commit to zachlewis/OpenImageIO that referenced this pull request Aug 1, 2025
)

Extended eraseattrib to be able to read regex patterns from a file,
one per line, with `--eraseattrib:fromfile=1 filename`.
Erases all metadata attribute fields that match.

Added a new test to `oiiotool-attribs` which has been verified
painstakenly to be working!
Also tried running the command directly from dist/bin and verified with
`oiiotool --info -v`.

More detail on the motivation:

It's common in a studio to have tools insert all sorts of metadata into files as they're rendered, etc. But a lot of that are things that SHOULD NOT leave the facility -- like security issues from internal pathnames or network info, or secrets like plot details revealed by sequence or character names. So generally you want to scrub the metadata from the final images before they are shipped to the client or to marketing or whatever.

oiiotool's --eraseattrib is handy for this, but it takes a single regex. But we were thinking that:

-  You may have several patterns to fully describe the metadata you want to be sure not to ship. This could be an unwieldy regex or a cluttered series of multiple --eraseattrib commands.
-  That list of patterns may change over time, and it would be error prone to have to change it in every script that calls oiiotool.
-  That list of patterns may need to be different per show.

We thought a solution to facilitate this might be a version of --eraseattrib that takes a filename that contains a list of regexes, one per line of the text file. So that master list of things to scrub could be maintained in a central place and not need to update every oiiotool-calling script in order to update it. Furthermore, you could have the command say something like

    $ oiiotool ready_to_ship.exr --eraseattrib:fromfile=1 /blah/$SHOW/nometadata.txt -o clean.exr

to make a single command that allows per-show customization.

---------

Signed-off-by: Lydia Zheng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devdays ASWF Dev Days suitable project devdays25 DevDays 2025 project enhancement Improvement of existing/working features. oiiotool oiiotool

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants