Skip to content

Conversation

@rgmz
Copy link
Contributor

@rgmz rgmz commented Dec 11, 2024

Description:

This is a follow-up to #1652.

It attempts to 'intelligently' chunk inputs on subsequent newlines \n\n, rather than indiscriminately breaking text apart. The assumption is that it's safe(r) for a chunk to end at (1), rather than (2) or (3). Of course, the buffer cannot grow indefinitely or the original issue would regress; it will only search for consecutive newlines up to a maxPeekSize.

  // TODO: optimization could be introduced here
  if mimetype, err := filetype.Match(buf[:n]); err != nil {
    return nil
  } else if mimetype.MIME.Type == "application" {
    return nil // skip binary files
  }

>>> (1) <<<  // If the chunk doesn't end in a newli>>> (2) <<<ne, peek|maxPeekSize| until we find one.
  // This hopefully avoids splitting
>>> (3) <<< // See: https://github.com/gitleaks/gitleaks/issues/1651
  var (

The original issue with 60 private keys is solved by this change:

$ ./gitleaks dir /tmp/60keys.txt --no-color

10:22PM INF scan completed in 88ms
10:22PM WRN leaks found: 60

Thoughts

  • Needs to be tested on Windows or files with CRLF.
  • Reading one byte at a time might be slow (idk, seems pretty fast)
  • It might make sense for whitespace not to reset the counter. That way empty lines can match the criteria.
      }
       
    // Foo
    

Checklist:

  • Does your PR pass tests?
  • Have you written new tests for your changes?
  • Have you lint your code locally prior to submission?

@zricethezav
Copy link
Collaborator

@rgmz

Needs to be tested on Windows or files with CRLF.

Ideally I would have a windows machine to test on. We could just merge without testing on windows and if an issue arrises patch it real quick.

It might make sense for whitespace not to reset the counter. That way empty lines can match the criteria.

Could you elaborate on this a little bit?

@rgmz
Copy link
Contributor Author

rgmz commented Dec 12, 2024

Could you elaborate on this a little bit?

It's common to have blank lines only consisting of whitespace.

Foo\n
<tab>\n
Bar

The current logic resets the counter if \n isn't followed by another \n. In my mind, the above is as good of a place for the chunk to end as an empty line.

Foo\n
\n
Bar

@zricethezav
Copy link
Collaborator

Gotcha, I agree with you that the condition for the counter should be expanded. It doesn't have to be in this PR, we can always follow up.

@zricethezav zricethezav merged commit 0bf13fc into gitleaks:master Dec 12, 2024
1 check passed
@rgmz rgmz deleted the feat/selective-chunking branch December 12, 2024 16:27
alayne222 pushed a commit to alayne222/gitleaks that referenced this pull request May 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants