When multiple repository operations execute concurrently on shared pool
directories, race conditions could cause .deb files to be deleted despite
appearing in repository metadata, resulting in apt 404 errors.
Three distinct but related race conditions were identified and fixed:
1. Package addition vs publish race: When packages are added to a local
repository that is already published, the publish operation could read
stale package references before the add transaction commits. Fixed by
locking all published repositories that reference the local repo during
package addition.
2. Pool file deletion race: When multiple published repositories share the
same pool directory (same storage+prefix) and publish concurrently, cleanup
operations could delete each other's newly created files. The cleanup in
thread B would:
- Query database for referenced files (not seeing thread A's uncommitted files)
- Scan pool directory (seeing thread A's files)
- Delete thread A's files as "orphaned"
Fixed by implementing pool-sibling locking: acquire locks on ALL published
repositories sharing the same storage and prefix before publish/cleanup.
3. Concurrent cleanup on same prefix: Multiple distributions publishing to the
same prefix concurrently could have cleanup operations delete shared files.
Fixed by:
- Adding prefix-level locking to serialize cleanup operations
- Removing ref subtraction that incorrectly marked shared files as orphaned
- Forcing database reload before cleanup to see recent commits
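The cleanup race in item 2 comes down to computing "orphans" as a set difference between two views taken at different moments: a pool directory scan and a database query. The toy sketch below shows this. The file names and the `orphans` helper are hypothetical and are not aptly's code.

```go
package main

import (
	"fmt"
	"sort"
)

// orphans returns the pool files that are absent from the referenced set.
// "pool" stands for a directory scan; "referenced" stands for what the
// database reported when the cleanup queried it.
func orphans(pool []string, referenced map[string]bool) []string {
	var out []string
	for _, f := range pool {
		if !referenced[f] {
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Thread A has already written b.deb into the shared pool, but its
	// database transaction referencing b.deb has not committed yet.
	pool := []string{"a.deb", "b.deb"}

	// Thread B's view of the database: only a.deb is referenced, so
	// b.deb looks orphaned and would be deleted.
	stale := map[string]bool{"a.deb": true}
	fmt.Println(orphans(pool, stale)) // [b.deb]

	// Once A has committed and B reads fresh references (what serializing
	// on the shared pool plus a reload before cleanup is meant to ensure),
	// nothing is removed.
	fresh := map[string]bool{"a.deb": true, "b.deb": true}
	fmt.Println(orphans(pool, fresh)) // []
}
```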
The existing task system serializes operations based on resource locks,
preventing these race conditions when proper lock sets are acquired.
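The following is a minimal sketch of how the lock sets described in items 1 to 3 could be derived and acquired. It assumes a hypothetical `PublishedRepo` model and plain string lock keys. aptly's actual task and resource types are not shown here and likely differ. Locks are taken in sorted order, so two tasks with overlapping sets cannot deadlock.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// PublishedRepo is a hypothetical, trimmed-down model of a published repo.
type PublishedRepo struct {
	UUID    string
	Storage string   // publishing endpoint; "" means the local filesystem
	Prefix  string   // storage + prefix identify one shared pool directory
	Sources []string // local repos this publication is built from
}

func sortedKeys(set map[string]bool) []string {
	out := make([]string, 0, len(set))
	for k := range set {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// PublishLockSet: the prefix lock plus every pool sibling (published repos
// with the same storage and prefix), including target itself.
func PublishLockSet(target PublishedRepo, all []PublishedRepo) []string {
	set := map[string]bool{}
	set["prefix:"+target.Storage+":"+target.Prefix] = true
	set["published:"+target.UUID] = true
	for _, p := range all {
		if p.Storage == target.Storage && p.Prefix == target.Prefix {
			set["published:"+p.UUID] = true
		}
	}
	return sortedKeys(set)
}

// AddLockSet: the local repo plus every published repo built from it.
func AddLockSet(localRepo string, all []PublishedRepo) []string {
	set := map[string]bool{"local:" + localRepo: true}
	for _, p := range all {
		for _, src := range p.Sources {
			if src == localRepo {
				set["published:"+p.UUID] = true
			}
		}
	}
	return sortedKeys(set)
}

// Locker hands out one mutex per resource key.
type Locker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (l *Locker) get(key string) *sync.Mutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.locks[key] == nil {
		l.locks[key] = &sync.Mutex{}
	}
	return l.locks[key]
}

// Acquire locks every key in sorted order and returns a release function.
func (l *Locker) Acquire(keys []string) (release func()) {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	held := make([]*sync.Mutex, 0, len(sorted))
	for _, k := range sorted {
		m := l.get(k)
		m.Lock()
		held = append(held, m)
	}
	return func() {
		for i := len(held) - 1; i >= 0; i-- {
			held[i].Unlock()
		}
	}
}

func main() {
	all := []PublishedRepo{
		{UUID: "stable", Prefix: "debian", Sources: []string{"main-repo"}},
		{UUID: "testing", Prefix: "debian", Sources: []string{"main-repo"}},
		{UUID: "other", Prefix: "ubuntu", Sources: []string{"ubuntu-repo"}},
	}
	fmt.Println(PublishLockSet(all[0], all)) // [prefix::debian published:stable published:testing]
	fmt.Println(AddLockSet("main-repo", all)) // [local:main-repo published:stable published:testing]

	locker := Locker{locks: map[string]*sync.Mutex{}}
	var active, overlapped int32
	var wg sync.WaitGroup
	for _, target := range all[:2] { // two distributions sharing one pool
		wg.Add(1)
		go func(t PublishedRepo) {
			defer wg.Done()
			release := locker.Acquire(PublishLockSet(t, all))
			defer release()
			// Publish + cleanup would run here; record any overlap.
			if atomic.AddInt32(&active, 1) > 1 {
				atomic.StoreInt32(&overlapped, 1)
			}
			atomic.AddInt32(&active, -1)
		}(target)
	}
	wg.Wait()
	fmt.Println("overlapping publishes:", overlapped == 1) // false
}
```

Because `stable` and `testing` share the storage and prefix, their lock sets are identical and their critical sections cannot overlap. A publication under a different prefix (`other`) would not contend with them.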
Test coverage includes concurrent publish scenarios that reliably reproduced
all three bugs before the fixes.
This was initially found by the automated repository health checks used by
Termux, reported in https://github.com/termux/termux-packages/issues/27472
The root cause was that aptly considered 4.3.5a to be lower than
4.3.5-rc1-1, whereas according to Debian "4.3.5a" > "4.3.5-rc1-1".
This is because dpkg splits off the revision at the first hyphen, whereas
aptly was splitting at the last hyphen, which differs from dpkg's
behaviour.
dpkg behaviour: https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/parsehelp.c#n242
Perhaps this went undetected because the relevant tests in the repository
had been broken since the initial commit of aptly. This change also fixes
those tests.
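The comparison itself is easy to reproduce. The sketch below is an illustrative Go port of dpkg's character ordering (`~` first, then end of string, then letters, then everything else, with digit runs compared numerically). It ignores epochs and is not aptly's implementation. The function names are hypothetical. It shows that, for this pair, the result depends entirely on which hyphen the revision is split at.

```go
package main

import (
	"fmt"
	"strings"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }
func isAlpha(c byte) bool { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') }

// order gives the dpkg-style sort weight of s[i]: '~' < end of string <
// letters < all other characters. Digits are handled by the numeric runs.
func order(s string, i int) int {
	if i >= len(s) {
		return 0
	}
	switch c := s[i]; {
	case isDigit(c):
		return 0
	case isAlpha(c):
		return int(c)
	case c == '~':
		return -1
	default:
		return int(c) + 256
	}
}

// verrevcmp compares alternating non-digit and digit runs.
func verrevcmp(a, b string) int {
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		for (i < len(a) && !isDigit(a[i])) || (j < len(b) && !isDigit(b[j])) {
			if ac, bc := order(a, i), order(b, j); ac != bc {
				return ac - bc
			}
			i++
			j++
		}
		for i < len(a) && a[i] == '0' { // leading zeros do not count
			i++
		}
		for j < len(b) && b[j] == '0' {
			j++
		}
		firstDiff := 0
		for i < len(a) && j < len(b) && isDigit(a[i]) && isDigit(b[j]) {
			if firstDiff == 0 {
				firstDiff = int(a[i]) - int(b[j])
			}
			i++
			j++
		}
		if i < len(a) && isDigit(a[i]) { // a's number is longer
			return 1
		}
		if j < len(b) && isDigit(b[j]) {
			return -1
		}
		if firstDiff != 0 {
			return firstDiff
		}
	}
	return 0
}

// splitRevision separates upstream version and revision at the first or
// the last hyphen.
func splitRevision(v string, atFirst bool) (upstream, revision string) {
	k := strings.LastIndex(v, "-")
	if atFirst {
		k = strings.Index(v, "-")
	}
	if k < 0 {
		return v, ""
	}
	return v[:k], v[k+1:]
}

// compare ignores epochs for brevity.
func compare(a, b string, atFirst bool) int {
	ua, ra := splitRevision(a, atFirst)
	ub, rb := splitRevision(b, atFirst)
	if c := verrevcmp(ua, ub); c != 0 {
		return c
	}
	return verrevcmp(ra, rb)
}

func sign(n int) int {
	switch {
	case n > 0:
		return 1
	case n < 0:
		return -1
	}
	return 0
}

func main() {
	a, b := "4.3.5a", "4.3.5-rc1-1"
	fmt.Println("split at first hyphen:", sign(compare(a, b, true)))  // 1: "4.3.5a" vs "4.3.5"
	fmt.Println("split at last hyphen: ", sign(compare(a, b, false))) // -1: "4.3.5a" vs "4.3.5-rc1"
}
```

With the first-hyphen split, the upstream parts are "4.3.5a" and "4.3.5", and a letter sorts after end of string. With the last-hyphen split, they are "4.3.5a" and "4.3.5-rc1", and a letter sorts before `-`.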
Enabling coverage nearly doubles the incremental build time and adds
overhead to individual tests on the order of **5-10x** or more. Coverage is
not essential for quick local system-test runs, so add an option to
disable it.
When using rootless podman, the *current user* gets mapped to uid 0,
which results in the aptly user being unable to write to the build
directory. We can instead map the current user to the corresponding uid
in the container via `PODMAN_USERNS=keep-id`, which matches what
docker-wrapper expects. However, this also makes podman *enter the
container as the current uid*, which interferes with setting permissions
on `/var/lib/aptly`. That can be fixed by explicitly passing `--user 0:0`,
which should be a no-op on docker (since the container's default user is
already root).
Additionally, this adds `--security-opt label=disable` to avoid
permission errors when running on systems with SELinux enforcing.
This fixes the race condition that occurs when publish is called
concurrently. It also adds a test that reproduces the error almost
deterministically: it is hard to guarantee a failure every time, but I ran
it in a loop 100 times and it reproduced the error consistently without
the patch and passed consistently with the patch applied.