-
Notifications
You must be signed in to change notification settings - Fork 106
Rebase to v2.51.1 #805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebase to v2.51.1 #805
Conversation
When using the sparse-checkout feature git should not write to the working directory for files with the skip-worktree bit on. With the skip-worktree bit on the file may or may not be in the working directory and if it is not we don't want or need to create it by calling checkout_entry. There are two callers of checkout_target. Both of which check that the file does not exist before calling checkout_target. load_current which make a call to lstat right before calling checkout_target and check_preimage which will only run checkout_taret it stat_ret is less than zero. It sets stat_ret to zero and only if !stat->cached will it lstat the file and set stat_ret to something other than zero. This patch checks if skip-worktree bit is on in checkout_target and just returns so that the entry doesn't not end up in the working directory. This is so that apply will not create a file in the working directory, then update the index but not keep the working directory up to date with the changes that happened in the index. Signed-off-by: Kevin Willford <[email protected]>
Signed-off-by: Kevin Willford <[email protected]>
As of 9e59b38 (object-file: emit corruption errors when detected, 2022-12-14), Git will loudly complain about corrupt objects. That is fine, as long as the idea isn't to re-download locally-corrupted objects. But that's exactly what we want to do in VFS for Git via the `read-object` hook, as per the `GitCorruptObjectTests` code added in microsoft/VFSForGit@2db0c030eb25 (New features: [...] - GVFS can now recover from corrupted git object files [...] , 2018-02-16). So let's support precisely that, and add a regression test that ensures that re-downloading corrupt objects via the `read-object` hook works. While at it, avoid the XOR operator to flip the bits, when we actually want to make sure that they are turned off: Use the AND-NOT operator for that purpose. Helped-by: Matthew John Cheetham <[email protected]> Helped-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
String formatting can be a performance issue when there are hundreds of thousands of trees. Change to stop using the strbuf_addf and just add the strings or characters individually. There are a limited number of modes so added a switch for the known ones and a default case if something comes through that are not a known one for git. In one scenario regarding a huge worktree, this reduces the time required for a `git checkout <branch>` from 44 seconds to 38 seconds, i.e. it is a non-negligible performance improvement. Signed-off-by: Kevin Willford <[email protected]>
The following commands and options are not currently supported when working in a GVFS repo. Add code to detect and block these commands from executing. 1) fsck 2) gc 4) prune 5) repack 6) submodule 8) update-index --split-index 9) update-index --index-version (other than 4) 10) update-index --[no-]skip-worktree 11) worktree Signed-off-by: Ben Peart <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
In earlier versions of `microsoft/git`, we found a user who had set `core.gvfs = false` in their global config. This should not have been necessary, but it also should not have caused a problem. However, it did. The reason was that `gvfs_load_config_value()` was called from `config.c` when reading config key/value pairs from all the config files. The local config should override the global config, and this is done by `config.c` reading the global config first then reading the local config. However, our logic only allowed writing the `core_gvfs` variable once. In v2.51.0, we had to adapt to upstream changes that changed way the `core.gvfs` config value is read, and the special handling is no longer necessary, yet we still want the test case that ensures that this bug does not experience a regression. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Add the ability to block built-in commands based on if the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to selectively block commands that use the GVFS protocol, but don't use VFS for Git (for example repos cloned via `scalar clone` against Azure DevOps). Signed-off-by: Matthew John Cheetham <[email protected]>
Loosen the blocking of the `repack` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `repack` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <[email protected]>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `fsck` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <[email protected]>
Loosen the blocking of the `prune` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `prune` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <[email protected]>
Replace the special casing of the `worktree` command being blocked on VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag. Signed-off-by: Matthew John Cheetham <[email protected]>
On index load, clear/set the skip worktree bits based on the virtual file system data. Use virtual file system data to update skip-worktree bit in unpack-trees. Use virtual file system data to exclude files and folders not explicitly requested. Update 2022-04-05: disable the "present-despite-SKIP_WORKTREE" file removal behavior when 'core.virtualfilesystem' is enabled. Signed-off-by: Ben Peart <[email protected]>
…x has been redirected Fixes #13 Some git commands spawn helpers and redirect the index to a different location. These include "difftool -d" and the sequencer (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) and others. In those instances we don't want to update their temporary index with our virtualization data. Helped-by: Johannes Schindelin <[email protected]> Signed-off-by: Ben Peart <[email protected]>
Add check to see if a directory is included in the virtualfilesystem before checking the directory hashmap. This allows a directory entry like foo/ to find all untracked files in subdirectories.
When our patches to support that hook were upstreamed, the hook's name was eliciting some reviewer suggestions, and it was renamed to `post-index-change`. These patches (with the new name) made it into v2.22.0. However, VFSforGit users may very well have checkouts with that hook installed under the original name. To support this, let's just introduce a hack where we look a bit more closely when we just failed to find the `post-index-change` hook, and allow any `post-indexchanged` hook to run instead (if it exists).
Emit a warning message when the `gvfs.sharedCache` option is set that the `repack` command will not perform repacking on the shared cache. In the future we can teach `repack` to operate on the shared cache, at which point we can drop this commit. Signed-off-by: Matthew John Cheetham <[email protected]>
Teach STATUS to optionally serialize the results of a status computation to a file. Teach STATUS to optionally read an existing serialization file and simply print the results, rather than actually scanning. This is intended for immediate status results on extremely large repos and assumes the use of a service/daemon to maintain a fresh current status snapshot. 2021-10-30: packet_read() changed its prototype in ec9a37d (pkt-line.[ch]: remove unused packet_read_line_buf(), 2021-10-14). 2021-10-30: sscanf() now does an extra check that "%d" goes into an "int" and complains about "uint32_t". Replacing with "%u" fixes the compile-time error. 2021-10-30: string_list_init() was removed by abf897b (string-list.[ch]: remove string_list_init() compatibility function, 2021-09-28), so we need to initialize manually. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
When using a virtual file system layer, the FSMonitor does not make sense. Signed-off-by: Johannes Schindelin <[email protected]>
Teach status serialization to take an optional pathname on
the command line to direct that cache data be written there
rather than to stdout. When used this way, normal status
results will still be written to stdout.
When no path is given, only binary serialization data is
written to stdout.
Usage:
git status --serialize[=<path>]
Signed-off-by: Jeff Hostetler <[email protected]>
When sparse-checkout is enabled, add the sparse-checkout percentage to the Trace2 data stream. This number was already computed and printed on the console in the "You are in a sparse checkout..." message. It would be helpful to log it too for performance monitoring. Signed-off-by: Jeff Hostetler <[email protected]>
Teach status deserialize code to reject status cache when printing in porcelain V2 and there are unresolved conflicts in the cache file. A follow-on task might extend the cache format to include this additiona data. See code for longer explanation. Signed-off-by: Jeff Hostetler <[email protected]>
Changes to the global or repo-local excludes files can change the results returned by "git status" for untracked files. Therefore, it is important that the exclude-file values used during serialization are still current at the time of deserialization. Teach "git status --serialize" to report metadata on the user's global exclude file (which defaults to "$XDG_HOME/git/ignore") and for the repo-local excludes file (which is in ".git/info/excludes"). Serialize will record the pathnames and mtimes for these files in the serialization header (next to the mtime data for the .git/index file). Teach "git status --deserialize" to validate this new metadata. If either exclude file has changed since the serialization-cache-file was written, then deserialize will reject the cache file and force a full/normal status run. Signed-off-by: Jeff Hostetler <[email protected]>
Teach `git status --deserialize` to either wait indefintely or immediately fail if the status serialization cache file is stale. Signed-off-by: Jeff Hostetler <[email protected]>
With the "--untracked-files=complete" option status computes a superset of the untracked files. We use this when writing the status cache. If subsequent deserialize commands ask for either the complete set or one of the "no", "normal", or "all" subsets, it can still use the cache file because of filtering in the deserialize parser. When running status with the "-uno" option, the long format status would print a "(use -u to show untracked files)" hint. When deserializing with the "-uno" option and using a cache computed with "-ucomplete", the "nothing to commit, working tree clean" message would be printed instead of the hint. It was easy to miss because the correct hint message was printed if the cache was rejected for any reason (and status did the full fallback). The "struct wt_status des" structure was initialized with the content of the status cache (and thus defaulted to "complete"). This change sets "des.show_untracked_files" to the requested subset from the command-line or config. This allows the long format to print the hint. Signed-off-by: Jeff Hostetler <[email protected]>
When using fsmonitor the CE_FSMONITOR_VALID flag should be checked when wanting to know if the entry has been updated. If the flag is set the entry should be considered up to date and the same as if the CE_UPTODATE is set. In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to be refreshed when the fsmonitor bitmap is applied to the index in tweak_fsmonitor. Since the fsmonitor data is kept up to date for every command, some tests needed to be updated to take that into account. istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the fsmonitor bitmap data was loaded and is now in refresh_fsmonitor since that is being called in tweak_fsmonitor. refresh_fsmonitor will only be called once and any other callers should be setting it when refreshing the fsmonitor data so that code can use the fsmonitor data when checking untracked files. When writing the index, fsmonitor_last_update is used to determine if the fsmonitor bitmap should be created and the extension data written to the index. When running through unpack-trees this is not copied to the result index. This makes the next time a git command is ran do all the work of lstating all files to determine what is clean since all entries in the index are marked as dirty since there wasn't any fsmonitor data saved in the index extension. Copying the fsmonitor_last_update to the result index will cause the extension data for fsmonitor to be in the index for the next git command to use. Signed-off-by: Kevin Willford <[email protected]>
The fsmonitor script that can be used for running all the git tests using watchman was causing some of the tests to fail because it wrote to stderr and created some files for debugging purposes. Add a new debug script to use with debugging and modify the other script to remove the code that would cause tests to fail. Signed-off-by: Kevin Willford <[email protected]>
Disable deserialization when verbose output requested. Verbose mode causes Git to print diffs for modified files. This requires the index to be loaded to have the currently staged OID values. Without loading the index, verbose output make it look like everything was deleted. Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
Add the `cache-local-objects` maintenance task to the list of tasks run by the `scalar run` command. It's often easier for users to run the shorter `scalar run` command than the equivalent `git maintenance` command. Signed-off-by: Matthew John Cheetham <[email protected]>
This helps t0401 pass while under SANITIZE=leak. Signed-off-by: Derrick Stolee <[email protected]>
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Fixes #716
TODO: drop this, it is a left-over from a patch that was upstreamed as 98f8854 (index-pack: allow revisiting REF_DELTA chains, 2025-04-28). This new test demonstrates some behavior where a valid packfile is being rejected by the Git client due to the order in which it is resolving REF_DELTAs. The thin packfile has a REF_DELTA chain A->B->C where C is not included in the packfile. However, the client repository contains both C and B already. Thus, 'git index-pack' is able to resolve A before resolving B. When resolving B, it then attempts to resolve any other REF_DELTAs that are pointing to B as a base. This "revisits" A and complains as if there is a cycle, but it did not actually detect a cycle. A fix will arrive in the next change. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
The microsoft/git fork includes pre- and post-command hooks, with the
initial intention of using these for VFS for Git. In that environment,
these are important hooks to avoid concurrent issues when the
virtualization is incomplete.
However, in the Office monorepo the post-command hook is used in a
different way. A custom hook is used to update the sparse-checkout, if
necessary. To avoid this hook from being incredibly slow on every Git
command, this hook checks for the existence of a "sentinel file" that is
written by a custom post-index-change hook and no-ops if that file does
not exist.
However, even this "no-op" is 200ms due to the use of two scripts (one
simple script in .git/hooks/ does some environment checking and then
calls a script from the working directory which actually contains the
logic).
Add a new config option, 'postCommand.strategy', that will allow for
multiple possible strategies in the future. For now, the one we are
adding is 'post-index-change' which states that we should write a
sentinel file instead of running the 'post-index-change' hook and then
skip the 'post-command' hook if the proper sentinel file doesn't exist.
(If it does exist, then delete it and run the hook.)
---
This fork contains changes specific to monorepo scenarios. If you are an
external contributor, then please detail your reason for submitting to
this fork:
* [ ] This is an early version of work already under review upstream.
* [ ] This change only applies to interactions with Azure DevOps and the
GVFS Protocol.
* [ ] This change only applies to the virtualization hook and VFS for
Git.
* [x] This change only applies to custom bits in the microsoft/git fork.
This is an early version of work already under review upstream: gitgitgadget#1906
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <[email protected]>
This is no longer needed. Signed-off-by: Johannes Schindelin <[email protected]>
…rimental' This is no longer needed. Signed-off-by: Johannes Schindelin <[email protected]>
…lear more files' This is no longer needed. Signed-off-by: Johannes Schindelin <[email protected]>
1f863fd to
cdb99b8
Compare
Range-diff relative to clean/vfs-2.51.0
|
This represents a cleaned up version of
vfs-2.51.0(which is tree-same), rebased onto Git for Windows v2.51.1.