Commit Graph

230 Commits

Author SHA1 Message Date
Nick Bozhenko fcb9bd7bd6 Add comprehensive test coverage across all packages
- Add unit tests for API endpoints (db, error, files, gpg, graph, metrics, publish, repos, snapshot, storage, task)
- Add command-line interface tests for all cmd subcommands
- Add database tests including race condition tests for etcddb and goleveldb
- Add package management tests (collections, contents, graph, import, index files, dependencies)
- Add infrastructure tests (pgp, systemd activation, task management)
- Add utility tests (checksum, config accessor, logging, sanitize)
- Add race condition tests for concurrent operations

Test coverage improves reliability and helps catch regressions early.
2025-07-16 14:19:04 -04:00
Nick Bozhenko 1693863499 Fix S3 concurrent map writes causing pod crashes
PROBLEM:
- Pod crashes with "fatal error: concurrent map writes" during S3 publications
- Root cause: pathCache map in PublishedStorage accessed without synchronization
- Occurs during concurrent LinkFromPool operations in S3 publishing

SOLUTION:
- Add sync.RWMutex to PublishedStorage struct for thread-safe map access
- Implement double-check locking pattern for cache initialization
- Protect all map operations (read/write/delete) with appropriate locks

CHANGES:
- s3/public.go: Add pathCacheMutex field and protect all map operations
  * Cache initialization with double-check locking in LinkFromPool
  * Read operations protected with RLock/RUnlock
  * Write operations protected with Lock/Unlock
  * Delete operations in Remove() and RemoveDirs() protected

IMPACT:
- Eliminates concurrent map writes panic
- Prevents pod crashes during S3 publications
- Maintains performance with minimal synchronization overhead
- Uses read-write locks allowing concurrent reads while serializing writes
2025-07-15 22:46:37 -04:00
Nick Bozhenko 40ba104838 Add comprehensive CI/CD improvements and test coverage
This commit introduces major enhancements to the CI/CD pipeline and testing infrastructure:

CI/CD Improvements:
- Consolidated modern and legacy CI workflows into a single comprehensive pipeline
- Removed all publishing functionality from CI (no longer needed)
- Added 8 new advanced testing jobs for pull requests:
  * advanced-coverage: Detailed coverage analysis with base branch comparison
  * performance-profile: CPU and memory profiling with benchmarks
  * fuzz-test: Automated fuzz testing for supported packages
  * deep-analysis: Multiple static analysis tools (shadow, ineffassign, gosec, staticcheck)
  * mutation-test: Tests effectiveness of test suite on changed files
  * dependency-audit: Security vulnerabilities and outdated dependency checks
  * stress-test: Race detection with 100 iterations and parallel testing
  * test-report-summary: Aggregates all reports into a single PR comment
- Enabled RUN_LONG_TESTS by default for thorough testing
- Added automatic PR comment generation with all test results

Testing Infrastructure:
- Added comprehensive test files across all packages to improve coverage
- Implemented unit tests for previously untested packages
- Added race condition tests for concurrent operations
- Created integration tests for API endpoints
- Added storage backend tests (etcd, goleveldb)
- Implemented command-line interface tests

Local Testing Support:
- Added act configuration for testing GitHub Actions locally
- Created docker-compose.ci.yml for full CI environment simulation
- Updated CONTRIBUTING.md with detailed local testing instructions

Documentation Updates:
- Added comprehensive CI documentation to CONTRIBUTING.md
- Removed obsolete references to Travis CI
- Updated Go version requirements to 1.24
- Added act usage instructions and examples

Other Improvements:
- Updated .gitignore to exclude coverage reports and build artifacts
- Added test-act.yml workflow for testing act functionality
- Created CI_SUMMARY.md documenting all CI capabilities

These changes transform aptly's CI from a basic testing pipeline into a comprehensive quality assurance system that provides immediate feedback on code quality, performance, security, and test effectiveness.
2025-07-10 12:00:54 -04:00
Nick Bozhenko 463c34a38e Fix race conditions and improve etcd timeout handling
This commit addresses several critical race conditions and improves the reliability
of etcd operations through better timeout and retry handling.

## Race Condition Fixes

1. **Task Resource Management Bug**
   - Fixed incorrect variable usage in task/list.go:78
   - Was using completed task's resources instead of idle task's resources
   - This caused resource conflicts and potential deadlocks

2. **Database Channel Initialization**
   - Added sync.Once pattern to ensure thread-safe channel initialization
   - Prevents panic from concurrent access during startup
   - Created initDBRequests() function for safe initialization

3. **Published Storage Double-Checked Locking**
   - Implemented double-checked locking pattern in GetPublishedStorage
   - Reduces lock contention while preventing concurrent initialization
   - Improves performance for frequently accessed storage

4. **File Operation Synchronization**
   - Created FileLockRegistry in utils/filelock.go
   - Prevents concurrent file operations (create, rename, delete, link)
   - Implements deadlock prevention for multi-file operations
   - Critical for preventing file corruption during parallel publishes

5. **WaitGroup Miscount Prevention**
   - Added defer pattern to ensure Done() is always called
   - Protects against panics during task execution
   - Prevents "negative WaitGroup counter" errors

## etcd Improvements

1. **Timeout Protection**
   - Replaced global context.TODO() with per-operation timeout contexts
   - Default timeout: 60 seconds (configurable)
   - Prevents indefinite hangs when etcd is unresponsive

2. **Environment Variable Configuration**
   - APTLY_ETCD_TIMEOUT: Operation timeout (default: 60s)
   - APTLY_ETCD_DIAL_TIMEOUT: Connection timeout (default: 60s)
   - APTLY_ETCD_KEEPALIVE: Keep-alive timeout (default: 7200s)
   - APTLY_ETCD_MAX_MSG_SIZE: Max message size (default: 50MB)

3. **Retry Logic for Read Operations**
   - Get operations retry up to 3 times with exponential backoff
   - Only retries on temporary/network errors
   - Improves reliability without risking data inconsistency

4. **Enhanced Error Logging**
   - All etcd errors now logged with operation context
   - Replaces silent failures with actionable error messages
   - Improves debugging and monitoring capabilities

5. **Increased Message Size Limits**
   - Default increased from 10MB to 50MB
   - Configurable via environment variable
   - Prevents "message too large" errors for large operations

## Testing

- Added comprehensive tests for etcd timeout functionality
- Tests verify context timeout, retry logic, and configuration
- All existing tests pass with the new implementation

## Documentation

- Updated README.rst with etcd configuration section
- Documented all environment variables and their defaults
- Added examples and feature descriptions

These changes significantly improve the reliability and debuggability of aptly
when using etcd as the database backend, while also fixing critical race
conditions that could cause data corruption or service crashes.
2025-07-10 10:05:49 -04:00
Nick Bozhenko 660cee2ce3 Fix concurrent map access race conditions in config publish roots
This commit addresses critical race conditions that were causing "map write failed"
errors and pod crashes in production environments. The issue occurred when multiple
goroutines accessed shared configuration maps simultaneously without proper synchronization.

Root Cause:
The global utils.Config structure contains several maps (FileSystemPublishRoots,
S3PublishRoots, SwiftPublishRoots, AzurePublishRoots) that were being accessed
directly by concurrent HTTP handlers. While context.Config() uses a mutex, it
returns a pointer to the global config, leaving subsequent map access unprotected.

Changes Made:

1. Added safe accessor methods in utils/config.go:
   - GetFileSystemPublishRoots() - returns defensive copy of map
   - GetS3PublishRoots() - returns defensive copy of map
   - GetSwiftPublishRoots() - returns defensive copy of map
   - GetAzurePublishRoots() - returns defensive copy of map

2. Updated API handlers to use safe accessors:
   - api/s3.go: apiS3List() now uses GetS3PublishRoots()
   - api/router.go: reposListInAPIMode() now uses GetFileSystemPublishRoots()

3. Updated context package storage initialization:
   - context/context.go: GetPublishedStorage() now uses safe accessors for all
     storage type configurations (filesystem, s3, swift, azure)

Impact:
- Eliminates "concurrent map writes" panics that were causing service instability
- Prevents pod crashes and restarts in Kubernetes environments
- Ensures thread-safe access to configuration maps during concurrent API requests
- Minimal performance overhead (microseconds) from creating map copies

The fix is backward compatible and requires no configuration changes. The defensive
copying approach ensures that even if config maps are modified after initialization
(which shouldn't happen in production), concurrent readers remain safe.

This addresses the production issues observed in lf-aptly-* pods where multiple
parallel publish requests or API calls were triggering race conditions.
2025-07-10 01:35:09 -04:00
André Roth ad4d0c7b96 doc: add swagger doc for /api/gpg/key
- cleanup swagger validation errors
2025-06-08 14:24:27 +02:00
André Roth f7057a9517 go1.24: fix lint, unit and system tests
- development env: base on debian trixie with go1.24
- lint: run with default config
- fix lint errors
- fix unit tests
- fix system test
2025-04-26 13:29:50 +02:00
André Roth c07bf2b108 s3: add debug logs for commands
* initialize zerolog for commands
* Change default log format: remote colors and timestamp
2025-04-24 12:13:38 +02:00
André Roth e062df68c5 go1.23: update golangci-lint version
and fix warnings.
2025-04-20 20:32:55 +02:00
André Roth 9abbd74a9f improve doc
do not set default value for FromSnapshot when creating a repo
2024-12-21 20:23:52 +01:00
André Roth 93650efddb Merge pull request #1404 from schoenherrg/fix/with-sources-ignored
Fix `-with-sources` not fetching differently named source packages
2024-12-11 13:01:30 +01:00
André Roth e319f3cd14 update doc
make descrptions consistent
2024-12-11 11:19:46 +01:00
André Roth 1f469e23b5 fix optional params 2024-12-11 10:40:44 +01:00
André Roth d8b9777b40 swagger: document params 2024-12-11 10:40:44 +01:00
André Roth e5e3c49ace swagger: document async 2024-12-11 10:40:44 +01:00
André Roth c6e0a06b14 swagger: cleanup 2024-12-11 10:40:44 +01:00
André Roth 75e5f95277 task-dummy: remove internal testing API 2024-12-11 10:40:44 +01:00
André Roth 4ff3c894fa swagger: cleanup Snapshots 2024-12-11 10:40:44 +01:00
André Roth abfad37640 swagger: cleanup files doc 2024-12-11 10:40:44 +01:00
André Roth a69c00a5bc swagger: improve layout
and fix lint
2024-12-11 10:40:44 +01:00
André Roth 4f229a5bcf update doc 2024-12-11 10:40:44 +01:00
André Roth 397362bb1a fix swagger build 2024-12-11 10:40:44 +01:00
iofq d5571c41c7 Update files api docs 2024-12-11 10:40:44 +01:00
iofq 39921809ee Update db api docs 2024-12-11 10:40:44 +01:00
iofq 68fe2bc852 Update gpg, graph api docs 2024-12-11 10:40:44 +01:00
iofq 398fec13b0 Update packages api docs 2024-12-11 10:40:44 +01:00
iofq 9fc7ebdac2 Update repos, task, snapshot api docs 2024-12-11 10:40:44 +01:00
André Roth 2171c05ef8 fix lint 2024-12-11 10:40:44 +01:00
André Roth 8f8de4bd29 update 2024-12-11 10:40:44 +01:00
André Roth 9b8f6b1d56 fix conflict 2024-12-11 10:40:43 +01:00
André Roth 69a1e2561d docs: improve swagger
- use markdown files in swagger
- automate version, use swager.conf template
- embed swagger ui index.html as docs.html
2024-12-11 10:40:43 +01:00
André Roth ba86851d07 add api documentation stubs 2024-12-11 10:40:43 +01:00
Gordian Schoenherr 3b785e4165 Refactor Filter options into a struct
It was already a lot of options for one method and I am going to add
another one in the next commit.
2024-12-09 13:17:41 +09:00
André Roth 9ca9569714 fix build and golangci-lint 2024-11-17 14:09:37 +01:00
Mauro Regli 1357d246d8 rename addon files to skel files 2024-11-17 14:09:37 +01:00
Mauro Regli c75c2c7594 pass down addonpath from api and cmd context 2024-11-17 14:09:37 +01:00
André Roth eafec74c29 allow to exclude provided packages from list.Search 2024-11-04 17:02:54 +01:00
André Roth f79423a4ee update swagger documentation 2024-11-01 17:48:03 +01:00
André Roth eb94211053 fix race conditions 2024-11-01 17:48:03 +01:00
André Roth bd01cd4033 update swagger documentation 2024-11-01 17:48:03 +01:00
Christoph Fiehe 451de79666 Improve consistency between API and Swagger docs.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-11-01 17:48:03 +01:00
André Roth 755fdfaca2 update swagger documentation
- add default values
-  set default values
2024-11-01 17:48:03 +01:00
André Roth f4057850b9 fix compile and lint errors 2024-11-01 17:47:50 +01:00
André Roth 4d6688d68e sanitize archs 2024-10-22 16:58:15 +02:00
Christoph Fiehe 7a7ff1142c Minor code and documentation changes.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00
Christoph Fiehe 8cceed12f7 Fix tests.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00
Christoph Fiehe f8f28e9554 Fixing tests and fix cleanup.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00
Christoph Fiehe ac5ecf946d Cleanup improved and code redundant code removed.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00
Christoph Fiehe d87d8bac92 Fix test cases.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00
Christoph Fiehe 14c29ff912 Fixing tests.
Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00