Skip to content

Conversation

@Okeanos
Copy link
Contributor

@Okeanos Okeanos commented Apr 20, 2025

I use editorconfig-checker in a variety of different project setups and noticed there's always something I have to ignore on my own. As a result adopting the checker into established projects effectively requires a config file increasing friction.

This gets repetitive very fast. As a result I propose to amend the default exclusions with additional lockfiles / package manager definitions that are outside user control and shouldn't be checked by editorconfig-checker.

To make maintenance (at least in my eyes) somewhat easier I tried sorting the entries and grouping them with inline comments. I can revert this, particularly if the original order had implications I am not aware of.


I have a couple of questions here and the list might need some updates based on that:

A lot of definitions start with the ^ character, that forces an exact file name match and makes the default excludes impractical for e.g. mono-repos with nested structures. Is this intended behaviour? What is preferred here?

The node_modules exclusion specifically starts with [\\/]. What is the purpose of this? Other folders, e.g. .git are not excluded this way but just use their name directly.

Are collapsed or expanded regex definitions preferred? See \\.jpeg$ & \\.jpg$ vs. \\.jpe?g$ for example.

The main_test.go appears to have at least one obsolete exclude directive (see 1 and 2) where .git is excluded manually but it has been part of the default exclude since #427. I guess it should be removed here? Additionally, it probably makes sense to add \\.exe$ to the default exclude as its own pattern as well?

@codecov
Copy link

codecov bot commented Apr 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.59%. Comparing base (af09b21) to head (a37af1b).
Report is 49 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #462      +/-   ##
==========================================
+ Coverage   86.72%   87.59%   +0.86%     
==========================================
  Files          11       11              
  Lines        1017     1048      +31     
==========================================
+ Hits          882      918      +36     
+ Misses        102       99       -3     
+ Partials       33       31       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rasa
Copy link
Contributor

rasa commented Apr 29, 2025

@Okeanos If this PR is accepted, the readme will need to be updated too. Perhaps do that in this PR?

@klaernie
Copy link
Member

First of all: order is most definitely not important - everything on that list is excluded, ordering it "the right way" may shave off nanoseconds of the runtime on a directory with many excluded files.

A lot of definitions start with the ^ character, that forces an exact file name match and makes the default excludes impractical for e.g. mono-repos with nested structures. Is this intended behaviour? What is preferred here?

I think this is mainly caused by the different people adding to this list over the years. We should probably do for unanchored, unless there is a reason to only consider files at the top level - which I think there is none.

The node_modules exclusion specifically starts with [\\/]. What is the purpose of this? Other folders, e.g. .git are not excluded this way but just use their name directly.

Since I added node_modules, I can tell: I only wanted to excluded folders named node_modules exactly - so it is wrapped in path separators. For .git the regex actually matches every path ending in .git - which I would consider a buggy behaviour, but I do not know if anybody depends on this behaviour, so I would consider fixing this (aka: turning it into [\\/]\.git[\\/]) a breaking change.

Are collapsed or expanded regex definitions preferred? See \\.jpeg$ & \\.jpg$ vs. \\.jpe?g$ for example.

I would consider collapsed regexes beneficial - it requires less steps for the regex engine, so increases performance (theoretically) and definitely improves readability.

The main_test.go appears to have at least one obsolete exclude directive (see 1 and 2) where .git is excluded manually but it has been part of the default exclude since #427. I guess it should be removed here? Additionally, it probably makes sense to add \\.exe$ to the default exclude as its own pattern as well?

Yes to all, good catch. We should probably also exclude other known binary files that are likely to occur in source code directories - e.g. binaries produced by compilers like .so, .dll and .pyc, maybe even .jar. In the end the content-type check should catch and exclude them anyway, but running a regex against a name is way cheaper.

@rasa
Copy link
Contributor

rasa commented Apr 30, 2025

Sorry to blow up this PR, but https://github.com/sindresorhus/binary-extensions/blob/main/binary-extensions.json has a very complete collection of binary extensions. What would be the downside of simply including them all?

Also, can we filepath.ToSlash() the path before regex-ing it, so we can use / instead of [\\/]?

@klaernie
Copy link
Member

Sorry to blow up this PR, but https://github.com/sindresorhus/binary-extensions/blob/main/binary-extensions.json has a very complete collection of binary extensions. What would be the downside of simply including them all?

Oh, that is neat! We could add this repository as a git submodule, and goembed the json at compile time. This would reduce the list of excludes we need to maintain drastically, and would also be easy to communicate in the readme.

Also, can we filepath.ToSlash() the path before regex-ing it, so we can use / instead of [\\/]?

You are totally right - we already convert each file path in files.GetRelativePath() - so we can reduce the regexes to only use / as a path separator without any code changes.

@Okeanos
Copy link
Contributor Author

Okeanos commented Apr 30, 2025

Okay, let me try to summarize the discussion and ensure I got everything:

  • Regexes shouldn't start with ^ unless definitely necessary in order to better support e.g. mono-repos
  • folder regexes should start with [\\/] to prevent matches against files of the same name
    • is presumably a breaking change for existing folders in the regex list
  • there should be only one regex per match where feasible, e.g. \\.jpe?g$ instead of \.jpeg$ and \.jpg$
  • main_test.go can be cleaned up
  • binary files should be excluded broadly, where necessary via content-type checks but preferably as regex (because it is cheaper)
  • paths should be converted via filepath.ToSlash() before being matched against regexes; this makes the [\\/] construct unnecessary and allows writing / instead going forward on all platforms.

I can take care of:

  • ^ matches
  • collapsing some regexes
  • main_test.go cleanup
  • adding [\\/] to folders (what do we do about the breaking change implication?)

I propose separate PRs for:

  • use filepath.ToSlash() and simplify [\\/] to /
  • using the external binary extensions list

Does that sound alright?

@klaernie
Copy link
Member

  • paths should be converted via filepath.ToSlash() before being matched against regexes; this makes the [\\/] construct unnecessary and allows writing / instead going forward on all platforms.

This is not needed, since we already do this. Only the regexes need to be touched, so they only use / to match folders.

Okeanos added 4 commits May 1, 2025 19:57
As per editorconfig-checker#462 (comment)
the paths are normalized to unix style (using `/`) before tested against the
regexes so a test for `\` as path separator is unnecessary.
…-repos

The `^` character forces files to be at the root of a directory tree in order to
be excluded, however, in case of mono-repos for example, these files may be
nested deeper in the structure and wouldn't have been excluded despite being
e.g. lockfiles outside of user control.

See editorconfig-checker#462 (comment)
Remove multiple regexes for the same semantic target and replace them with a
unified, singular regex.

See editorconfig-checker#462 (comment)
… files

Removes unnecessary test arguments and instead uses the default excludes.

See editorconfig-checker#462 (comment)
@Okeanos
Copy link
Contributor Author

Okeanos commented May 1, 2025

  • remove leading ^
  • collapse regexes (that semantically target the same thing)
  • remove unnecessary [\\/] and replace it with / because paths are normalized to Unix style before being tested
  • remove unnecessary excludes in main_test.go (added \\.exe$ as ignore regex)

I didn't touch the leading / for folders yet and neither the "advanced binary check list".

Copy link
Member

@klaernie klaernie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks neat! Thanks a lot for the work you put in this!

@theoludwig theoludwig changed the title Updated default ignore feat: update default paths to exclude May 5, 2025
@theoludwig theoludwig merged commit 84c5c55 into editorconfig-checker:main May 5, 2025
3 checks passed
@Okeanos Okeanos deleted the feat/updated-default-ignore branch May 5, 2025 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants