Affected endpoints: apiReposDrop, apiReposPackagesAddDelete,
apiReposPackageFromDir, apiReposCopyPackage, apiReposIncludePackageFromDir,
apiReposEdit, apiReposCreate.
All seven endpoints shared the same architectural flaw as the previously
fixed publish endpoints: operations were performed outside the task lock,
with stale DB state used inside the lock.
Issues Fixed:
1. apiReposDrop - Collections created before task lock
Problem: snapshotCollection, publishedCollection captured from pre-task
factory. Concurrent snapshot/published modifications not detected.
Fix: Create fresh taskCollectionFactory inside task, re-read repo after
lock acquired, use fresh collections for checks.
2. apiReposPackagesAddDelete - Repo and factory stale before lock
Problem: repo loaded outside task, collectionFactory created before lock.
Concurrent add/delete operations both load same pre-task state, last
write wins, packages lost.
Fix: Create fresh taskCollectionFactory inside task, re-read repo after
lock acquired, use fresh factory for all operations.
3. apiReposPackageFromDir - Repo and factory stale before lock
Problem: repo loaded outside task, collectionFactory created before lock.
Concurrent file imports both load same pre-task state, last write wins.
Fix: Create fresh taskCollectionFactory inside task, re-read repo after
lock acquired, use fresh factory for imports.
4. apiReposCopyPackage - Both repos and factory stale before lock
Problem: dstRepo and srcRepo loaded outside task, collectionFactory
created before lock. Concurrent copy operations race on stale state.
Fix: Create fresh taskCollectionFactory inside task, re-read both repos
after lock acquired, use fresh factory for all operations.
5. apiReposIncludePackageFromDir - Repo and factory stale before lock
Problem: repo loaded outside task, collectionFactory created before lock.
Concurrent .changes file processing races on stale state.
Fix: Create fresh taskCollectionFactory inside task, use fresh factory
for import operations.
6. apiReposEdit - No serialization, concurrent modification race
Problem: Direct update without task locking. Two concurrent renames can
both pass duplicate check, second overwrites first.
Fix: Convert to async task. Duplicate check and update now atomic inside
lock, after fresh load from DB.
7. apiReposCreate - No serialization, TOCTOU on duplicate check
Problem: Duplicate check outside task lock, add outside lock. Two
concurrent creates with same name both pass check, second overwrites first.
Fix: Convert to async task. Duplicate check and add now atomic inside
lock, after fresh load from DB.
Root cause analysis:
The fundamental issue is the split between pre-task work and task-protected
work. Collections and objects were being loaded before lock acquisition, then
stale copies used inside the lock.
Correct pattern (now applied consistently across all 7 endpoints):
1. HTTP Handler (before task lock):
- Shallow load for 404 check only
- Extract resource keys
- Submit task with resources
2. Task Closure (after lock acquired):
- Create fresh collectionFactory
- Fresh load of all objects
- LoadComplete on fresh copies
- All mutations on fresh state
- All checks atomic inside lock
- Save using fresh collections
This ensures:
- Concurrent operations are serialized by task queue
- No stale DB state used for mutations
- No lost updates from concurrent modifications
- No TOCTOU races on duplicate checks
- No DB handle issues from pre-task factory capture
add API response wrappers with NumPackages derived from RefList length; keep show endpoint payloads unchanged for backward compatibility; add API tests for list endpoint NumPackages; update swagger response schemas for list endpoints
Do all relevant database reading/modifying inside `maybeRunTaskInBackground`.
Notably, `LoadComplete` will load the reflist of a repo. if this is done outside of a background operation,
the data might be outdated when the background tasks runs.
For any action which is multi-step (requires updating more than 1 DB
key), use transaction to make update atomic.
Also pack big chunks of updates (importing packages for importing and
mirror updates) into single transaction to improve aptly performance and
get some isolation.
Note that still layers up (Collections) provide some level of isolation,
so this is going to shine with the future PRs to remove collection
locks.
Spin-off of #459
LoadComplete() modifies object, so it would cause issues if it runs
concurrently with other methods. Uprage mutex locks to write
locks when LoadComplete() is being used.