Commit Graph

2719 Commits

Author SHA1 Message Date
André Roth 38ba5bbcc6 fix(snapshot): eliminate race conditions by using fresh factory inside task closures
Affected endpoints: apiSnapshotsCreate, apiSnapshotsUpdate, apiSnapshotsDrop,
apiSnapshotsMerge, apiSnapshotsPull.

All five endpoints shared the same architectural flaw as the previously fixed
repos and publish endpoints: operations were performed outside the task lock,
with stale DB state used inside the lock.

Issues Fixed:

1. apiSnapshotsCreate - Source snapshots loaded before task lock
   Problem: snapshotCollection and collectionFactory created before task lock.
   Source snapshots and destination check done with stale factory.
   Concurrent creates both load pre-task state, second overwrites first.

   Fix: Create fresh taskCollectionFactory inside task, fresh loads of all
   sources after lock acquired, pre-task duplicate check for destination,
   use fresh sources and collections for snapshot creation.

2. apiSnapshotsUpdate - Snapshot loaded before task lock
   Problem: snapshot loaded outside task, duplicate check with stale factory.
   Concurrent renames both load pre-task state, both pass check, second
   overwrites first.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of snapshot
   after lock acquired, fresh duplicate check inside lock, pre-task validation
   of new name, atomic rename with fresh copy.

3. apiSnapshotsDrop - Collections created before task lock
   Problem: snapshotCollection and publishedCollection created before task lock.
   Concurrent snapshot/published modifications not detected. Can delete snapshot
   that becomes published between pre-task and task.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of snapshot,
   fresh collections for all checks (published, source dependency), all checks
   inside lock.

4. apiSnapshotsMerge - Source snapshots loaded before task lock
   Problem: snapshotCollection created before task lock. Source snapshots
   loaded outside task, LoadComplete called on stale copies. Concurrent
   merges both load pre-task state, merge result doesn't include source changes.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of all
   sources after lock acquired, LoadComplete on fresh copies, merge using
   fresh RefLists, save using fresh factory.

5. apiSnapshotsPull - Snapshots loaded before task lock
   Problem: toSnapshot and sourceSnapshot loaded outside task,
   collectionFactory created before task. LoadComplete called on stale copies.
   Concurrent pulls load pre-task state, pull doesn't include source changes.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of both
   snapshots after lock acquired, LoadComplete on fresh copies, all filtering
   and pulling on fresh RefLists, save using fresh factory.

Root cause analysis:

The fundamental issue is the split between pre-task work and task-protected
work. Collections and objects were being loaded before lock acquisition, then
stale copies used inside the lock.

Correct pattern (from fixed publish.go and repos.go):

1. HTTP Handler (before task lock):
   - Shallow load for 404 check only
   - Extract resource keys
   - Submit task with resources

2. Task Closure (after lock acquired):
   - Create fresh collectionFactory
   - Fresh load of all objects
   - LoadComplete on fresh copies
   - All mutations on fresh state
   - All checks atomic inside lock
   - Save using fresh collections

This ensures:
- Concurrent operations are serialized by task queue
- No stale DB state used for mutations
- No lost updates from concurrent modifications
- No TOCTOU races on duplicate checks
- No DB handle issues from pre-task factory capture
2026-05-25 19:57:22 +02:00
André Roth 8477274bb0 fix(repos): eliminate race conditions by using fresh factory inside task closures
Affected endpoints: apiReposDrop, apiReposPackagesAddDelete,
apiReposPackageFromDir, apiReposCopyPackage, apiReposIncludePackageFromDir,
apiReposEdit, apiReposCreate.

All seven endpoints shared the same architectural flaw as the previously
fixed publish endpoints: operations were performed outside the task lock,
with stale DB state used inside the lock.

Issues Fixed:

1. apiReposDrop - Collections created before task lock
   Problem: snapshotCollection, publishedCollection captured from pre-task
   factory. Concurrent snapshot/published modifications not detected.

   Fix: Create fresh taskCollectionFactory inside task, re-read repo after
   lock acquired, use fresh collections for checks.

2. apiReposPackagesAddDelete - Repo and factory stale before lock
   Problem: repo loaded outside task, collectionFactory created before lock.
   Concurrent add/delete operations both load same pre-task state, last
   write wins, packages lost.

   Fix: Create fresh taskCollectionFactory inside task, re-read repo after
   lock acquired, use fresh factory for all operations.

3. apiReposPackageFromDir - Repo and factory stale before lock
   Problem: repo loaded outside task, collectionFactory created before lock.
   Concurrent file imports both load same pre-task state, last write wins.

   Fix: Create fresh taskCollectionFactory inside task, re-read repo after
   lock acquired, use fresh factory for imports.

4. apiReposCopyPackage - Both repos and factory stale before lock
   Problem: dstRepo and srcRepo loaded outside task, collectionFactory
   created before lock. Concurrent copy operations race on stale state.

   Fix: Create fresh taskCollectionFactory inside task, re-read both repos
   after lock acquired, use fresh factory for all operations.

5. apiReposIncludePackageFromDir - Repo and factory stale before lock
   Problem: repo loaded outside task, collectionFactory created before lock.
   Concurrent .changes file processing races on stale state.

   Fix: Create fresh taskCollectionFactory inside task, use fresh factory
   for import operations.

6. apiReposEdit - No serialization, concurrent modification race
   Problem: Direct update without task locking. Two concurrent renames can
   both pass duplicate check, second overwrites first.

   Fix: Convert to async task. Duplicate check and update now atomic inside
   lock, after fresh load from DB.

7. apiReposCreate - No serialization, TOCTOU on duplicate check
   Problem: Duplicate check outside task lock, add outside lock. Two
   concurrent creates with same name both pass check, second overwrites first.

   Fix: Convert to async task. Duplicate check and add now atomic inside
   lock, after fresh load from DB.

Root cause analysis:

The fundamental issue is the split between pre-task work and task-protected
work. Collections and objects were being loaded before lock acquisition, then
stale copies used inside the lock.

Correct pattern (now applied consistently across all 7 endpoints):

1. HTTP Handler (before task lock):
   - Shallow load for 404 check only
   - Extract resource keys
   - Submit task with resources

2. Task Closure (after lock acquired):
   - Create fresh collectionFactory
   - Fresh load of all objects
   - LoadComplete on fresh copies
   - All mutations on fresh state
   - All checks atomic inside lock
   - Save using fresh collections

This ensures:
- Concurrent operations are serialized by task queue
- No stale DB state used for mutations
- No lost updates from concurrent modifications
- No TOCTOU races on duplicate checks
- No DB handle issues from pre-task factory capture
2026-05-25 18:36:26 +02:00
André Roth 68814ff1f0 docs: fix typo 2026-05-25 11:57:47 +02:00
André Roth d44ae522ac fix(publish): lock source repos/snapshots on publish update endpoint 2026-05-25 09:47:22 +00:00
André Roth 8f2b335409 fix(publish): lock source repos/snapshots on publish update switch
Affected endpoint: apiPublishUpdateSwitch (PUT /api/publish/{prefix}/{distribution}).

The handler registered only the published repo key as a task resource.
The underlying source repos (for local) or snapshots (for snapshot-based
published repos) were not locked.  Concurrent updates to a source repo
or snapshot while a publish-update/switch task was running could produce
inconsistent published indexes:

  Task A: apiPublishUpdateSwitch loads published, reads source repo/snapshot
  Request B: modifies same source repo or snapshot (add/remove packages, etc)
  Task A: Update() + Publish() reads stale/modified source -> inconsistent
           published index, or partial write if source deleted mid-task.

Fix: for SourceLocalRepo, iterate published.Sources (component -> source
UUID), look up each local repo via localRepoCollection.ByUUID and append
string(repo.Key()) to resources.  For SourceSnapshot, iterate b.Snapshots,
look up each snapshot via snapshotCollection.ByName and append
string(snapshot.ResourceKey()) to resources.  Task queue now serialises
against both the published repo and all its sources.
2026-05-25 10:36:21 +02:00
André Roth 9ecbc844e7 fix(publish): warn when distribution missing and resource key cannot be pre-registered
When b.Distribution is empty, the pre-registered resource key
U<storage>:<prefix>>><distribution> cannot be constructed, so
concurrent POST requests to the same prefix are not serialized by the
task queue.  Add a log warning so operators are aware of the gap.
2026-05-25 10:36:21 +02:00
André Roth 9e91ee4c4a fix(publish): reload published inside task for create/drop endpoints
Affected endpoints: apiPublishRepoOrSnapshot (POST /api/publish/{prefix}),
apiPublishDrop (DELETE /api/publish/{prefix}/{distribution}).

Both handlers used the outer-scoped collectionFactory and collection
variables inside the task closure.  These were captured before the task
lock was acquired, so under concurrent load each task operated on a
stale DB view:

  apiPublishRepoOrSnapshot:  snapshot/localRepo LoadComplete,
  NewPublishedRepo, CheckDuplicate, Publish, and collection.Add all
  used the pre-lock collectionFactory/collection.  Two concurrent
  POST to same prefix could both pass CheckDuplicate (neither sees
  the other in the stale DB view) and race on disk writes.

  apiPublishDrop:  collection.Remove used pre-lock collection,
  potentially racing with concurrent updates/other drops.

Fix: inside the task closure create a fresh taskCollectionFactory and
taskCollection.  All DB reads (LoadComplete) and writes
(CheckDuplicate, Add, Remove, Publish) now run against the authoritative
DB state after the lock is held.
2026-05-25 10:36:16 +02:00
André Roth b7969c7a2d fix(publish): reload published inside task for update/switch endpoints
Affected endpoints: apiPublishUpdateSwitch (PUT), apiPublishUpdate (POST).

Both handlers loaded the published repo and mutated scalar fields
(Label, Origin, SkipContents, SkipBz2, AcquireByHash, SignedBy,
MultiDist, Version) outside the task closure, before the lock was
acquired.  Inside the task, LoadComplete only refreshed sourceItems —
it did not reload scalar fields or the Revision.  Two concurrent
requests therefore each operated on a stale base:

  Request A loads published (Label="old"), sets Label="A"
  Request B loads published (Label="old"), sets Label="B"
  Task A runs: Update() + Publish() + collection.Update() -> saves Label="A"
  Task B runs: Update() on B's stale copy -> saves Label="B",
               silently discarding A's Label change and potentially
               reconciling a Revision built against the pre-A state.

Fix: remove all field mutations and the LoadComplete call from the HTTP
handler.  Inside the task, a fresh taskCollectionFactory is created, the
published repo is re-read via ByStoragePrefixDistribution + LoadComplete
(obtaining the current DB state after the lock is held), and then all
field mutations are applied before Update / Publish / collection.Update.
2026-05-23 13:54:50 +02:00
André Roth 2a5992c74e fix(publish): reload published inside task for source-management endpoints
Affected endpoints: apiPublishAddSource, apiPublishSetSources,
apiPublishUpdateSource, apiPublishRemoveSource, apiPublishDropChanges.

All five handlers shared the same flawed pattern: they loaded the
published repo from the DB and mutated it (ObtainRevision / DropRevision)
outside the task closure, before the task lock was acquired.  Each task
closure then just wrote back the already-mutated, pre-lock object.

Because the task queue serialises tasks that share a resource key, two
concurrent requests appear safe — but each task closure holds a stale
copy of the object captured before the lock was taken:

  Request A loads published: revision = {}
  Request B loads published: revision = {}   <- same DB state
  A mutates: revision = {main: snap1}
  B mutates: revision = {contrib: snap2}
  Task A runs: saves {main: snap1}           OK
  Task B runs: saves {contrib: snap2}        <- clobbers A's change

Fix: perform only a shallow ByStoragePrefixDistribution outside the task
(for the early 404 response, resource key, and task name).  Inside the
task closure a dedicated taskCollectionFactory is created, the published
repo is re-read fresh from the DB (after the lock is acquired), and
LoadComplete + all mutations + Update are executed against that
authoritative copy.
2026-05-23 13:54:50 +02:00
André Roth 2827620cfe fix(publish): pre-register published repo key before task submission
apiPublishRepoOrSnapshot appended published.Key() to resources inside
the task closure, after maybeRunTaskInBackground had already been called.
The task's locked-resource set is fixed at submission time, so that append
had no effect — the published repo key was never registered as a resource.

Two concurrent POST /api/publish/{prefix} requests for the same
prefix/distribution therefore did not conflict in the task queue: both
ran in parallel, each loaded an empty PublishedRepoCollection from the DB,
both passed CheckDuplicate, and the second Add silently overwrote the first.

Fix: compute the published repo key ("U{storagePrefix}>>{distribution}")
from the already-known storage/prefix/distribution values and append it to
resources before calling maybeRunTaskInBackground, so concurrent creates
for the same destination are serialised by the task queue.  The now-dead
append inside the closure is removed.
2026-05-23 13:54:50 +02:00
André Roth 8dc61cf362 ci: use correct ubuntu 26.04 codename 2026-05-17 10:16:01 +02:00
André Roth 4a9ddbdc34 Merge pull request #1565 from muresan/fix/aptly-crash-db-recover
Crash in aptly db recover
2026-05-15 16:51:51 +02:00
André Roth c316ea9b73 Merge pull request #1567 from aptly-dev/fix/doc-typos
docs: fix typos
2026-05-15 16:49:14 +02:00
André Roth d027a251ba Merge pull request #1571 from aptly-dev/feature/ubu26.04
ci: build for ubuntu 26.04
2026-05-15 16:48:55 +02:00
André Roth 16b6348710 ci: build for ubuntu 26.04 2026-05-15 00:05:35 +02:00
Catalin Muresan 1c1abe6b10 Added tests to please codeconv 2026-05-14 23:33:27 +02:00
Catalin Muresan c4bfbe52ca Fix crash in aptly db recover 2026-05-14 23:33:27 +02:00
André Roth c723fea807 docs: fix typos 2026-05-04 11:35:55 +02:00
André Roth 0d31298f37 Merge pull request #1568 from aptly-dev/fix/launchpad-test-dependency
Fix/launchpad test dependency
2026-05-04 11:30:56 +02:00
André Roth bba6bd7db5 system tests: do not depend on launchpad.net 2026-05-04 11:05:04 +02:00
André Roth faeaad0378 config: allow setting PPA Base URL 2026-05-04 11:05:04 +02:00
André Roth a20eb6866a document prometheus API
* enable in dev and test env
* fix api/repos doc
2026-04-26 23:56:05 +02:00
André Roth 809ab47042 Merge pull request #1559 from linuxdataflow/feat/pls/api-package-count
feat(api): add NumPackages to mirrors/repos/snapshots list responses
2026-04-26 18:39:24 +02:00
André Roth 0b84009b4a tests: add new arguments 2026-04-26 18:37:36 +02:00
Pierig Le Saux 92d7561d49 test(api): add coverage for NumPackages list handlers and error paths 2026-04-26 18:37:36 +02:00
Pierig Le Saux e908531bef feat(api): add NumPackages to mirrors/repos/snapshots list responses
add API response wrappers with NumPackages derived from RefList length; keep show endpoint payloads unchanged for backward compatibility; add API tests for list endpoint NumPackages; update swagger response schemas for list endpoints
2026-04-26 18:37:36 +02:00
André Roth f8620d10b2 Merge pull request #1558 from linuxdataflow/feat/pls/gpg-list-and-delete
list and delete gpg keys
2026-04-26 18:36:30 +02:00
Pierig Le Saux 8be72b48a1 update 2026-04-26 17:44:25 +02:00
Pierig Le Saux 5655480e00 add codecoverage 2026-04-26 17:44:25 +02:00
Pierig Le Saux 3c8defa304 update 2026-04-26 17:44:25 +02:00
Pierig Le Saux 1ed50697ec fix: delete is interactive 2026-04-26 17:44:25 +02:00
Pierig Le Saux 3b432d42b5 documentation 2026-04-26 17:44:25 +02:00
Pierig Le Saux 89e3bdfa07 delete a gpg key 2026-04-26 17:44:25 +02:00
André Roth f8d2d3cb8d fix lint errors 2026-04-26 17:41:12 +02:00
André Roth 01004e19c0 Merge pull request #1546 from aptly-dev/dependabot/go_modules/google.golang.org/grpc-1.79.3
build(deps): bump google.golang.org/grpc from 1.64.1 to 1.79.3
2026-04-26 17:11:45 +02:00
dependabot[bot] 92bb28149c build(deps): bump google.golang.org/grpc from 1.64.1 to 1.79.3
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.64.1 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.64.1...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-26 14:43:41 +00:00
André Roth 652210acfa Merge pull request #1554 from aptly-dev/dependabot/go_modules/github.com/aws/aws-sdk-go-v2/service/s3-1.97.3
build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.67.1 to 1.97.3
2026-04-26 16:38:23 +02:00
André Roth 45f3da256b Merge pull request #1543 from PhilipCramer/feat/appstream-mirror-support
Add appstream support
2026-04-26 15:36:07 +02:00
dependabot[bot] 3c5e83366a build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3
Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.67.1 to 1.97.3.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.67.1...service/s3/v1.97.3)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.97.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-26 13:12:54 +00:00
Philip Cramer a7a4bb7001 test: improve test coverage for AppStream feature 2026-04-26 15:04:38 +02:00
Philip Cramer 2f7f726d4c fix: reject AppStream flag for flat repos instead of silently skipping 2026-04-26 15:04:38 +02:00
Philip Cramer 43d7284657 docs: update man page and AUTHORS for AppStream support 2026-04-26 15:04:37 +02:00
Philip Cramer 02423af931 fix: prevent db cleanup from deleting AppStream pool files 2026-04-26 15:04:17 +02:00
Philip Cramer 2a228625e2 test: add system test for AppStream publish pass-through 2026-04-26 15:04:17 +02:00
Philip Cramer 16e0302f30 test: update snapshot golden files for AppStream field 2026-04-26 15:04:17 +02:00
Philip Cramer 6ecbc9ba90 test: add system tests for AppStream mirror create, edit, and update 2026-04-26 15:04:17 +02:00
Philip Cramer 7276b9621f feat: add --with-appstream to bash/zsh shell completions 2026-04-26 15:04:17 +02:00
Philip Cramer fb7734b5b0 test: add unit tests for AppStream pass-through feature 2026-04-26 15:04:17 +02:00
Philip Cramer 29c37293b9 feat: wire AppStream support through CLI, API, and publish 2026-04-26 15:04:17 +02:00
Philip Cramer f25ba2e6b0 feat: propagate AppStreamFiles through snapshots 2026-04-26 15:04:17 +02:00