Interleaved sync didn't save _fetch_times and _local_sync_state to disk.
Phased sync saved them, but incorrectly applied moving average smoothing
repeatedly when fetching submodules, and discarded historical data
during partial syncs.
Move .Save() calls to the end of main sync loops to ensure they run
once. Update _FetchTimes.Save() to merge new data with existing history,
preventing data loss.
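
The merge-on-save behaviour can be sketched as follows. This is only an
illustration, not the code in this change: the helper name
`save_fetch_times`, the JSON layout (project name to seconds), and the
smoothing factor `ALPHA` are all assumptions.

```python
import json

ALPHA = 0.5  # Assumed smoothing factor, for illustration only.


def save_fetch_times(path, new_times):
    """Merge this run's fetch times into the saved history and write once.

    Hypothetical helper: `path` is a JSON file mapping project name to a
    smoothed fetch time in seconds; `new_times` holds this run's samples.
    """
    try:
        with open(path) as f:
            history = json.load(f)
    except (OSError, ValueError):
        # No usable history yet; start from an empty one.
        history = {}

    for name, seconds in new_times.items():
        old = history.get(name)
        # Smooth exactly once per save. Projects that were not synced in
        # this run keep their previous value instead of being dropped.
        if old is None:
            history[name] = seconds
        else:
            history[name] = ALPHA * seconds + (1 - ALPHA) * old

    with open(path, "w") as f:
        json.dump(history, f, indent=2)
    return history
```

Calling the helper once at the end of the sync loop, rather than once per
submodule fetch, is what keeps the average from being applied repeatedly.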
Change-Id: I174f98a62ac86859f1eeea1daba65eb35c227852
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/519821
Commit-Queue: Gavin Mak <gavinmak@google.com>
Reviewed-by: Scott Lee <ddoman@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
When checkout errors occurred in interleaved sync, they were wrapped in
a SyncError with no message, causing blank lines in the final summary.
Refactor _SyncResult to hold a list of exceptions, ensuring the original
error messages are propagated correctly.
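
A rough sketch of the idea, with hypothetical names (`SyncResultSketch`
and `summarize` are not the real `_SyncResult` API): each result keeps
the original exceptions, so the summary can print their messages
directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncResultSketch:
    """Hypothetical stand-in for a per-project sync result."""

    project: str
    errors: List[Exception] = field(default_factory=list)

    @property
    def success(self) -> bool:
        # A result is successful only if no exception was recorded.
        return not self.errors


def summarize(results):
    """Return one summary line per original error, never a blank one."""
    lines = []
    for result in results:
        for err in result.errors:
            lines.append(f"{result.project}: {err}")
    return lines
```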
Bug: 438178765
Change-Id: Ic25e515068959829cb6290cfd9e4c2d3963bbbea
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/498342
Reviewed-by: Scott Lee <ddoman@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
Failures in deferred sync actions were not recorded because `_Later.Run`
discarded the `GitError` exception. Record the specific error using
`syncbuf.fail()` and propagate it for proper error aggregation and
reporting.
Bug: 438178765
Change-Id: Iad59e389f9677bd6b8d873ee1ea2aa6ce44c86fa
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/498141
Tested-by: Gavin Mak <gavinmak@google.com>
Reviewed-by: Scott Lee <ddoman@google.com>
Keep track of finished projects, not just successful ones, when deciding
which projects still need to be synced. Also, project errors are already
reported by sync workers so stall detection doesn't need to add failed
projects to the error list.
Bug: 438178765
Change-Id: Ibf15aad009ba7295e70c8df2ff158215085e9732
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/498062
Reviewed-by: Scott Lee <ddoman@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
This fixes two issues:
1. The progress bar could show a count greater than the total if new
   projects were discovered mid-sync. Update the progress bar total
   dynamically.
2. Make the "Stall detected" error message more actionable.
Bug: 432206932
Change-Id: Ie2a4ada5b1770cae0302fb06590641c522cbb7e7
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/491941
Tested-by: Gavin Mak <gavinmak@google.com>
Reviewed-by: Scott Lee <ddoman@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
Add support for a new hook type "post-sync" declared in the manifest using
<repo-hooks>. This allows executing a script automatically after a successful
`repo sync`.
This is useful for initializing developer environments, installing project-wide
Git hooks, generating configs, and other post-sync automation tasks.
Example manifest usage:
  <project name="myorg/repo-hooks" path="hooks" revision="main" />
  <repo-hooks in-project="myorg/repo-hooks" enabled-list="post-sync">
    <hook name="post-sync" />
  </repo-hooks>
The hook script must be named `post-sync.py` and located at the root of the
hook project.
The post-sync hook does not block `repo sync`; if the script fails, the sync
still completes successfully with a warning.
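
A minimal `post-sync.py` might look like the sketch below. The `main`
entry point with a `repo_topdir` keyword and a catch-all `**kwargs` is an
assumption modelled on how other repo hooks are invoked; check the hook
documentation for the exact arguments passed.

```python
#!/usr/bin/env python3
"""Example post-sync.py, placed at the root of the hook project."""


def main(repo_topdir=None, **kwargs):
    """Run after a successful `repo sync`.

    `repo_topdir` and `**kwargs` are assumed parameters; accepting
    `**kwargs` keeps the hook tolerant of arguments added later.
    """
    location = repo_topdir or "<unknown>"
    print(f"post-sync: workspace at {location} is up to date")
    # Real hooks would install Git hooks, generate configs, etc. here.
    return 0
```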
Test: Added `post-sync.py` in hook project and verified it runs after `repo sync`
Bug: b/421694721
Change-Id: I69f3158f0fc319d73a85028d6e90fea02c1dc8c8
Signed-off-by: Kenny Cheng <chao.shun.cheng.tw@gmail.com>
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/480581
Reviewed-by: Scott Lee <ddoman@google.com>
Reviewed-by: Gavin Mak <gavinmak@google.com>
Dedupe error reporting logic for phased and interleaved sync modes by
extracting it into _ReportErrors.
Error reporting will now distinguish between network and local failures
and list the specific repos that failed in each phase.
Bug: 421935613
Change-Id: I4604a83943dbbd71d979158d7a1c4b8c243347d2
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/484541
Tested-by: Gavin Mak <gavinmak@google.com>
Reviewed-by: Scott Lee <ddoman@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
The logic for checking for repo self-updates lives in _FetchMain, which
is part of the "phased" sync path.
Extract this logic into a new _UpdateRepoProject helper method. Call
this common helper from _ExecuteHelper before either sync mode begins,
so the repo self-update check is always performed.
Bug: 421935613
Change-Id: I9a804f43fbf6239c4146be446040be531f12fc8a
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/484041
Reviewed-by: Scott Lee <ddoman@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
For each assigned project, the worker sequentially calls
Sync_NetworkHalf and Sync_LocalHalf, respecting --local-only and
--network-only flags. To prevent scrambled progress bars, all stderr
output from the checkout phase is captured (shown with --verbose).
Result objects now carry status and timing information from the worker
for state updates.
Bug: 421935613
Change-Id: I398602e08a375e974a8914e5fa48ffae673dda9b
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/483301
Commit-Queue: Gavin Mak <gavinmak@google.com>
Reviewed-by: Scott Lee <ddoman@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
Introduce the parallel orchestration framework for `repo sync
--interleaved`.
The new logic respects project dependencies by processing them in
hierarchical levels. Projects sharing a git object directory are grouped
and processed serially. Also reuse the familiar fetch progress bar UX.
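
The grouping rule can be sketched as below. This is a simplified
illustration: the function names are hypothetical, projects are reduced
to `(name, objdir)` pairs, and a thread pool stands in for the real
worker processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_objdir(projects):
    """Group (name, objdir) pairs so projects sharing an objdir stay together."""
    groups = defaultdict(list)
    for name, objdir in projects:
        groups[objdir].append(name)
    return list(groups.values())


def sync_level(projects, sync_one, jobs=4):
    """Sync one level: groups run in parallel, each group runs serially."""

    def run_group(group):
        # Projects sharing a git object directory never run concurrently.
        return [sync_one(name) for name in group]

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        per_group = list(pool.map(run_group, group_by_objdir(projects)))
    return [result for group in per_group for result in group]
```

Levels would then be processed one after another, calling `sync_level`
for each, so that a project is only synced after its parent level is
done.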
Bug: 421935613
Change-Id: Ia388a231fa96b3220e343f952f07021bc9817d19
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/483281
Commit-Queue: Gavin Mak <gavinmak@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
Reviewed-by: Scott Lee <ddoman@google.com>
Prepare for an interleaved fetch and checkout mode for `repo sync`. The
goal of the new mode is to significantly speed up syncs by running fetch
and checkout operations in parallel for different projects, rather than
waiting for all fetches to complete before starting any checkouts.
Bug: 421935613
Change-Id: I8c66d1e790c7bba6280e409b95238c5e4e61a9c8
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/482821
Reviewed-by: Scott Lee <ddoman@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
Warn users if the effective job count specified via `-j`,
`--jobs-network`, or `--jobs-checkout` exceeds a threshold
(currently 100). This encourages users to use more reasonable
values.
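
A sketch of the check, under stated assumptions: the helper name is
hypothetical, and treating `--jobs-network` and `--jobs-checkout` as
falling back to `-j` when unset is an assumption about how the effective
value is computed.

```python
import logging

logger = logging.getLogger("sync_sketch")
JOBS_WARN_THRESHOLD = 100  # Threshold named in the text above.


def warn_on_high_jobs(jobs, jobs_network=None, jobs_checkout=None):
    """Warn for, and return, the options whose effective value is too high."""
    effective = {
        "-j": jobs,
        # Assumed fallback: unset per-phase values inherit from -j.
        "--jobs-network": jobs if jobs_network is None else jobs_network,
        "--jobs-checkout": jobs if jobs_checkout is None else jobs_checkout,
    }
    too_high = [
        name for name, value in effective.items()
        if value > JOBS_WARN_THRESHOLD
    ]
    for name in too_high:
        logger.warning(
            "%s=%d exceeds %d; consider a lower value",
            name, effective[name], JOBS_WARN_THRESHOLD,
        )
    return too_high
```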
Bug: 406868778
Bug: 254914814
Change-Id: I116e2bbaf3dc824c04d1b2fbe52cf9ca5be77b9a
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/466801
Reviewed-by: Mike Frysinger <vapier@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
Tested-by: Gavin Mak <gavinmak@google.com>
If the repo index is stale, `git reset --keep` will refuse to reset the
workspace. An index can be stale if there are any modifications to a
file's inode metadata, including mtime, atime, ownership changes, etc.
Bug: b/375423099
Change-Id: Ibef03d9d8d2babbb107041707281687342ab7a77
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/460022
Commit-Queue: Josip Sokcevic <sokcevic@chromium.org>
Tested-by: Josip Sokcevic <sokcevic@chromium.org>
Reviewed-by: Scott Lee <ddoman@google.com>
If repo sync is invoked outside the repo root, and the latest manifest
removes symlinks, repo incorrectly tries to remove the symlinks: it
starts from `cwd` instead of the repo root.
Bug: b/113935847
Bug: 40010423
Change-Id: Ia50ea70a376e38c94389880f020c80da3c3f453c
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/445901
Tested-by: Josip Sokcevic <sokcevic@chromium.org>
Reviewed-by: Gavin Mak <gavinmak@google.com>
With a large number of sync workers, the sync process may fail on
macOS due to connection errors. The root cause is that multiple
workers may attempt to connect to the multiprocessing manager server
at the same time when handling the first job. This can lead to
connection failures if there are too many pending connections, exceeding
the socket listening backlog.
Bug: 377538810
Change-Id: I1924d318d076ca3be61d75daa37bfa8d7dc23ed7
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/441541
Tested-by: Josip Sokcevic <sokcevic@google.com>
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Josip Sokcevic <sokcevic@google.com>
The command _PostRepoFetch will try to self-update
during repo sync. That is beneficial but adds
version uncertainty, failure potential, and slowdowns
in non-interactive scenarios.
Conditionally skip the update if env variable
REPO_SKIP_SELF_UPDATE is defined.
A call to selfupdate works as before, meaning even
with the variable set, it will run the update.
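
The decision can be sketched as below. The helper name and its
parameters are hypothetical; only the variable name
REPO_SKIP_SELF_UPDATE comes from this change.

```python
import os


def should_self_update(explicit_selfupdate=False, environ=None):
    """Decide whether repo should try to update itself.

    `explicit_selfupdate` models a direct `repo selfupdate` call, which
    always updates. During sync, the update is skipped when the
    REPO_SKIP_SELF_UPDATE environment variable is defined.
    """
    environ = os.environ if environ is None else environ
    if explicit_selfupdate:
        return True
    return "REPO_SKIP_SELF_UPDATE" not in environ
```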
Change-Id: Iab0ef55dc3d3db3cbf1ba1f506c57fbb58a504c3
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/439967
Tested-by: Fredrik de Groot <fredrik.de.groot@haleytek.com>
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Josip Sokcevic <sokcevic@google.com>
Background:
- The Manifest object is large (for projects like Android) in terms of
  serialization cost and size (more than 1 MB).
- Many Project objects usually share only a few Manifest objects.
Before this CL, Project objects were passed to workers via function
parameters. Function parameters are pickled separately (in chunks). In
other words, manifests are serialized again and again. The major
serialization overhead of repo sync was
O(manifest_size * projects / chunksize)
This CL uses the following tricks to reduce serialization overhead.
- All projects are pickled in one invocation. Because Project objects
  share manifests, the pickle library remembers which objects it has
  already seen and avoids the repeated serialization cost.
- Pass the Project objects to workers at worker initialization time,
  and pass the project index as the function parameter instead. The
  number of workers is much smaller than the number of projects.
- Worker init state is shared on Linux (fork based), so it requires
  zero serialization for Project objects.
On Linux (fork based), the serialization overhead is
O(projects) --- one int per project
On Windows (spawn based), the serialization overhead is
O(manifest_size * min(workers, projects))
Moreover, use chunksize=1 to avoid the chance that some workers are idle
while other workers still have more than one job in their chunk queue.
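
The effect of pickling all projects in one invocation can be shown with
a small experiment. Plain dicts stand in for Manifest and Project
objects here; the sizes and names are made up for illustration.

```python
import pickle

# Stand-in for a large manifest shared by many projects.
manifest = {"xml": "x" * 100_000}
projects = [{"name": f"p{i}", "manifest": manifest} for i in range(50)]

# Pickling each project separately serializes the manifest every time.
separate = sum(len(pickle.dumps(p)) for p in projects)

# Pickling the whole list in one call stores the shared manifest once;
# later references are emitted as short back-references.
together = len(pickle.dumps(projects))

# Sharing survives the round trip: every project points at one manifest.
restored = pickle.loads(pickle.dumps(projects))
shared = all(p["manifest"] is restored[0]["manifest"] for p in restored)
```

Passing only an integer index per job then lets each worker look the
project up in the list it received at initialization time.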
Using 2.7k projects as the baseline, a no-op "repo sync" originally
took 31s for fetch and 25s for checkout on my Linux workstation.
With this CL, it takes 12s for fetch and 1s for checkout.
Bug: b/371638995
Change-Id: Ifa22072ea54eacb4a5c525c050d84de371e87caa
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/439921
Tested-by: Kuang-che Wu <kcwu@google.com>
Reviewed-by: Josip Sokcevic <sokcevic@google.com>
Commit-Queue: Kuang-che Wu <kcwu@google.com>
With 551285fa35, the comment about the number of workers no longer
stands: the dict is shared among processes and real-time information is
available.
Using 2.7k projects as the baseline, a chunk size of 4 takes close
to 5 minutes. A chunk size of 32 takes this down to 40s, a reduction of
roughly 8 times, which matches the increase in chunk size.
R=gavinmak@google.com
Bug: b/371638995
Change-Id: Ida5fd8f7abc44b3b82c02aa0f7f7ae01dff5eb07
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/438523
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Tested-by: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Gavin Mak <gavinmak@google.com>
When using the smart sync option, we try to construct the target that
was "lunched" from the TARGET_PRODUCT and TARGET_BUILD_VARIANT envvars.
However, an Android target is now made of three parts,
{TARGET_PRODUCT}-{TARGET_RELEASE}-{TARGET_BUILD_VARIANT}.
I am leaving the option of creating a target if a TARGET_RELEASE is not
specified in case there are other consumers who depend on that option.
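
The resulting logic can be sketched as below. The function name is
hypothetical, and returning None when the product or variant is missing
is an assumption made for this illustration.

```python
def build_target(environ):
    """Build the lunch target string from environment variables."""
    product = environ.get("TARGET_PRODUCT")
    variant = environ.get("TARGET_BUILD_VARIANT")
    release = environ.get("TARGET_RELEASE")
    if not (product and variant):
        return None  # Assumed behaviour when the target is unknown.
    if release:
        return f"{product}-{release}-{variant}"
    # Fallback kept for consumers that do not set TARGET_RELEASE.
    return f"{product}-{variant}"
```

For example, with illustrative values `TARGET_PRODUCT=aosp_arm64`,
`TARGET_RELEASE=trunk_staging` and `TARGET_BUILD_VARIANT=userdebug`, the
sketch yields `aosp_arm64-trunk_staging-userdebug`.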
BUG=b:358101714
TEST=./run_tests
TEST=smart sync on android repo and manually inspecting
smart_sync_override.xml
Change-Id: I556137e33558783a86a0631f29756910b4a93d92
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/436977
Tested-by: Yiwei Zhang <yiwzhang@google.com>
Reviewed-by: Yiwei Zhang <yiwzhang@google.com>
Commit-Queue: Yiwei Zhang <yiwzhang@google.com>
The current logic to create checkout layers doesn't work in all cases.
For example, let's assume there are three projects: "foo", "foo/bar" and
"foo-bar". Sorting in lexicographical order is incorrect as foo-bar would
be placed between foo and foo/bar, breaking the layering logic.
Instead, we split file paths on the path delimiter (always /) and
then sort the resulting components lexicographically.
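
The difference between the two orderings, shown on the three example
paths (the variable names are only for illustration):

```python
paths = ["foo/bar", "foo-bar", "foo"]

# Plain string sort: "-" (0x2D) sorts before "/" (0x2F), so foo-bar
# lands between foo and its child foo/bar.
plain = sorted(paths)
# → ['foo', 'foo-bar', 'foo/bar']

# Sorting on path components keeps foo/bar directly after foo.
by_component = sorted(paths, key=lambda p: p.split("/"))
# → ['foo', 'foo/bar', 'foo-bar']
```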
BUG=b:325119758
TEST=./run_tests, manual sync on chromiumos repository
Change-Id: I76924c3cc6ba2bb860d7a3e48406a6bba8f58c10
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/412338
Tested-by: Josip Sokcevic <sokcevic@google.com>
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: George Engelbrecht <engeg@google.com>
If a repo manifest is updated so that project B is placed within a
project A, and if project A had content in new B's location in the old
checkout, then repo sync could break depending on checkout order, since
B can't be checked out before A.
This change introduces checkout levels, which enforce the right sequence
of checkouts while still allowing for parallel checkout. In the example
above, A will always be checked out before B.
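
One way to picture checkout levels is the sketch below. It is not the
implementation in this change: the function name is hypothetical, and
the level of a project is assumed to be the number of other projects
whose path is a strict ancestor of it.

```python
def checkout_levels(paths):
    """Group project paths into levels that can be checked out in order.

    All projects within one level are independent of each other and may
    be checked out in parallel; level N+1 starts after level N is done.
    """
    split = sorted(p.split("/") for p in paths)
    levels = []
    for parts in split:
        # Count projects whose path is a strict prefix of this one.
        depth = sum(
            1
            for other in split
            if len(other) < len(parts) and parts[: len(other)] == other
        )
        while len(levels) <= depth:
            levels.append([])
        levels[depth].append("/".join(parts))
    return levels
```

With projects at `A`, `A/B` and `C`, this yields `[["A", "C"], ["A/B"]]`,
so B is never checked out before A.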
BUG=b:325119758
TEST=./run_tests, manual sync on ChromeOS repository
Change-Id: Ib3b5e4d2639ca56620a1e4c6bf76d7b1ab805250
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/410421
Tested-by: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Greg Edelston <gredelston@google.com>
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Gavin Mak <gavinmak@google.com>
Prior to this change, RepoChangedException would be caught and re-raised
as a different exception. This would prevent the RepoChangedException
handler from running in main.py.
Bug: b/323232806
Change-Id: I9055ff95d439d6ff225206c5bf1755cc718bcfcc
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/407144
Tested-by: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Josip Sokcevic <sokcevic@google.com>
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Most of the time, a repo sync after some time (a week or more) results
in a bunch of messages that are not very useful for the average user:
- discarding 1 commits
- Deleting obsolete checkout.
Bug: N/A
Test: repo sync
Change-Id: I881eab61f9f261e98f3656c09e73ddd159ce288c
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/397038
Reviewed-by: Josip Sokcevic <sokcevic@google.com>
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Tested-by: Tomasz Wasilczyk <twasilczyk@google.com>
If a project is removed from the manifest and a symlink to another
project is placed at the path where the project used to exist, repo
will start to warn about partial syncs when a partial sync did not
occur.
Repro steps:
1) Create a manifest with two projects. Project a -> a/ and project b -> b/
2) Run `repo sync`
3) Remove project b from the manifest.
4) Use `link` in the manifest to link all of Project a to b/
Bug: 314161804
Change-Id: I4a4ac4f70a7038bc7e0c4e0e51ae9fc942411a34
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/395640
Reviewed-by: Gavin Mak <gavinmak@google.com>
Tested-by: Matt Schulte <matsch@google.com>
Commit-Queue: Gavin Mak <gavinmak@google.com>
When a new shared project is added to the manifest, there's a short
window where objects that are used by other projects can be deleted.
To close that window, set preciousObjects during git init. For
non-shared projects, repo should correct the state in the same execution
instance.
Bug: 288102993
Change-Id: I366f524535ac58c820d51a88599ae2108df9ab48
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/390234
Commit-Queue: Josip Sokcevic <sokcevic@google.com>
Tested-by: Josip Sokcevic <sokcevic@google.com>
Reviewed-by: Mike Frysinger <vapier@google.com>