fix(snapshot): eliminate race conditions by using fresh factory inside task closures

Affected endpoints: apiSnapshotsCreate, apiSnapshotsUpdate, apiSnapshotsDrop,
apiSnapshotsMerge, apiSnapshotsPull.

All five endpoints shared the same architectural flaw as the previously fixed
repos and publish endpoints: operations were performed outside the task lock,
with stale DB state used inside the lock.

Issues Fixed:

1. apiSnapshotsCreate - Source snapshots loaded before task lock
   Problem: snapshotCollection and collectionFactory created before task lock.
   Source snapshots and destination check done with stale factory.
   Concurrent creates both load pre-task state, second overwrites first.

   Fix: Create fresh taskCollectionFactory inside task, fresh loads of all
   sources after lock acquired, pre-task duplicate check for destination,
   use fresh sources and collections for snapshot creation.

2. apiSnapshotsUpdate - Snapshot loaded before task lock
   Problem: snapshot loaded outside task, duplicate check with stale factory.
   Concurrent renames both load pre-task state, both pass check, second
   overwrites first.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of snapshot
   after lock acquired, fresh duplicate check inside lock, pre-task validation
   of new name, atomic rename with fresh copy.

3. apiSnapshotsDrop - Collections created before task lock
   Problem: snapshotCollection and publishedCollection created before task lock.
   Concurrent snapshot/published modifications not detected. Can delete snapshot
   that becomes published between pre-task and task.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of snapshot,
   fresh collections for all checks (published, source dependency), all checks
   inside lock.

4. apiSnapshotsMerge - Source snapshots loaded before task lock
   Problem: snapshotCollection created before task lock. Source snapshots
   loaded outside task, LoadComplete called on stale copies. Concurrent
   merges both load pre-task state, merge result doesn't include source changes.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of all
   sources after lock acquired, LoadComplete on fresh copies, merge using
   fresh RefLists, save using fresh factory.

5. apiSnapshotsPull - Snapshots loaded before task lock
   Problem: toSnapshot and sourceSnapshot loaded outside task,
   collectionFactory created before task. LoadComplete called on stale copies.
   Concurrent pulls load pre-task state, pull doesn't include source changes.

   Fix: Create fresh taskCollectionFactory inside task, fresh load of both
   snapshots after lock acquired, LoadComplete on fresh copies, all filtering
   and pulling on fresh RefLists, save using fresh factory.

Root cause analysis:

The fundamental issue is the split between pre-task work and task-protected
work. Collections and objects were being loaded before lock acquisition, then
stale copies used inside the lock.

Correct pattern (from fixed publish.go and repos.go):

1. HTTP Handler (before task lock):
   - Shallow load for 404 check only
   - Extract resource keys
   - Submit task with resources

2. Task Closure (after lock acquired):
   - Create fresh collectionFactory
   - Fresh load of all objects
   - LoadComplete on fresh copies
   - All mutations on fresh state
   - All checks atomic inside lock
   - Save using fresh collections

This ensures:
- Concurrent operations are serialized by task queue
- No stale DB state used for mutations
- No lost updates from concurrent modifications
- No TOCTOU races on duplicate checks
- No DB handle issues from pre-task factory capture
This commit is contained in:
André Roth
2026-05-25 12:13:41 +00:00
parent 13cf5cf76c
commit d3e9c313b1
+7
View File
@@ -182,6 +182,13 @@ func apiSnapshotsCreate(c *gin.Context) {
resources = append(resources, string(sources[i].ResourceKey()))
}
// Pre-task check for destination snapshot name
_, err = snapshotCollection.ByName(b.Name)
if err == nil {
AbortWithJSONError(c, 409, fmt.Errorf("unable to create: snapshot %s already exists", b.Name))
return
}
maybeRunTaskInBackground(c, "Create snapshot "+b.Name, resources, func(_ aptly.Progress, _ *task.Detail) (*task.ProcessReturnValue, error) {
// Phase 2: Inside task lock - create fresh factory
taskCollectionFactory := context.NewCollectionFactory()