The pre-task validation in apiSnapshotsUpdate was incorrectly rejecting
PUT requests that set the Name to the snapshot's current name. This caused
a 409 response before creating a task, which broke the system test
SnapshotsAPITestCreateUpdate that expects a task to be created and then
fail inside the task.
The fix restores the 'b.Name != name' condition in the pre-task check so
that same-name updates pass through to the task, where the in-task
duplicate check will properly fail them (returning a failed task state
instead of a direct 409).
The SnapshotsAPITestCreateUpdate test expects that PUT /api/snapshots/:name
with the same Name in the body returns a conflict error. The previous fix
added 'b.Name != name' guards to skip the duplicate check when the name
hasn't changed, but this broke the test which expects the old behavior:
any existing name (including the snapshot's own current name) should be
rejected as a duplicate.
Remove the 'b.Name != name' condition from both the pre-task validation
and the in-task duplicate check so the behavior matches the original.
Affected endpoints: apiSnapshotsCreate, apiSnapshotsUpdate, apiSnapshotsDrop,
apiSnapshotsMerge, apiSnapshotsPull.
All five endpoints shared the same architectural flaw as the previously fixed
repos and publish endpoints: operations were performed outside the task lock,
with stale DB state used inside the lock.
Issues Fixed:
1. apiSnapshotsCreate - Source snapshots loaded before task lock
Problem: snapshotCollection and collectionFactory created before task lock.
Source snapshots and destination check done with stale factory.
Concurrent creates both load pre-task state, second overwrites first.
Fix: Create fresh taskCollectionFactory inside task, fresh loads of all
sources after lock acquired, pre-task duplicate check for destination,
use fresh sources and collections for snapshot creation.
2. apiSnapshotsUpdate - Snapshot loaded before task lock
Problem: snapshot loaded outside task, duplicate check with stale factory.
Concurrent renames both load pre-task state, both pass check, second
overwrites first.
Fix: Create fresh taskCollectionFactory inside task, fresh load of snapshot
after lock acquired, fresh duplicate check inside lock, pre-task validation
of new name, atomic rename with fresh copy.
3. apiSnapshotsDrop - Collections created before task lock
Problem: snapshotCollection and publishedCollection created before task lock.
Concurrent snapshot/published modifications not detected. Can delete snapshot
that becomes published between pre-task and task.
Fix: Create fresh taskCollectionFactory inside task, fresh load of snapshot,
fresh collections for all checks (published, source dependency), all checks
inside lock.
4. apiSnapshotsMerge - Source snapshots loaded before task lock
Problem: snapshotCollection created before task lock. Source snapshots
loaded outside task, LoadComplete called on stale copies. Concurrent
merges both load pre-task state, merge result doesn't include source changes.
Fix: Create fresh taskCollectionFactory inside task, fresh load of all
sources after lock acquired, LoadComplete on fresh copies, merge using
fresh RefLists, save using fresh factory.
5. apiSnapshotsPull - Snapshots loaded before task lock
Problem: toSnapshot and sourceSnapshot loaded outside task,
collectionFactory created before task. LoadComplete called on stale copies.
Concurrent pulls load pre-task state, pull doesn't include source changes.
Fix: Create fresh taskCollectionFactory inside task, fresh load of both
snapshots after lock acquired, LoadComplete on fresh copies, all filtering
and pulling on fresh RefLists, save using fresh factory.
Root cause analysis:
The fundamental issue is the split between pre-task work and task-protected
work. Collections and objects were being loaded before lock acquisition, then
stale copies used inside the lock.
Correct pattern (from fixed publish.go and repos.go):
1. HTTP Handler (before task lock):
- Shallow load for 404 check only
- Extract resource keys
- Submit task with resources
2. Task Closure (after lock acquired):
- Create fresh collectionFactory
- Fresh load of all objects
- LoadComplete on fresh copies
- All mutations on fresh state
- All checks atomic inside lock
- Save using fresh collections
This ensures:
- Concurrent operations are serialized by task queue
- No stale DB state used for mutations
- No lost updates from concurrent modifications
- No TOCTOU races on duplicate checks
- No DB handle issues from pre-task factory capture
add API response wrappers with NumPackages derived from RefList length; keep show endpoint payloads unchanged for backward compatibility; add API tests for list endpoint NumPackages; update swagger response schemas for list endpoints
Do all relevant database reading/modifying inside `maybeRunTaskInBackground`.
Notably, `LoadComplete` will load the reflist of a repo. if this is done outside of a background operation,
the data might be outdated when the background tasks runs.
LoadComplete() modifies object, so it would cause issues if it runs
concurrently with other methods. Uprage mutex locks to write
locks when LoadComplete() is being used.