This commit introduces major enhancements to the CI/CD pipeline and testing infrastructure:
CI/CD Improvements:
- Consolidated modern and legacy CI workflows into a single comprehensive pipeline
- Removed all publishing functionality from CI (no longer needed)
- Added 8 new advanced testing jobs for pull requests:
* advanced-coverage: Detailed coverage analysis with base branch comparison
* performance-profile: CPU and memory profiling with benchmarks
* fuzz-test: Automated fuzz testing for supported packages
* deep-analysis: Multiple static analysis tools (shadow, ineffassign, gosec, staticcheck)
* mutation-test: Tests effectiveness of test suite on changed files
* dependency-audit: Security vulnerabilities and outdated dependency checks
* stress-test: Race detection with 100 iterations and parallel testing
* test-report-summary: Aggregates all reports into a single PR comment
- Enabled RUN_LONG_TESTS by default for thorough testing
- Added automatic PR comment generation with all test results
Testing Infrastructure:
- Added comprehensive test files across all packages to improve coverage
- Implemented unit tests for previously untested packages
- Added race condition tests for concurrent operations
- Created integration tests for API endpoints
- Added storage backend tests (etcd, goleveldb)
- Implemented command-line interface tests
Local Testing Support:
- Added act configuration for testing GitHub Actions locally
- Created docker-compose.ci.yml for full CI environment simulation
- Updated CONTRIBUTING.md with detailed local testing instructions
Documentation Updates:
- Added comprehensive CI documentation to CONTRIBUTING.md
- Removed obsolete references to Travis CI
- Updated Go version requirements to 1.24
- Added act usage instructions and examples
Other Improvements:
- Updated .gitignore to exclude coverage reports and build artifacts
- Added test-act.yml workflow for testing act functionality
- Created CI_SUMMARY.md documenting all CI capabilities
These changes turn aptly's CI from a basic testing pipeline into a broader quality assurance system that reports on code quality, performance, security, and test effectiveness for every pull request.
This commit addresses several critical race conditions and improves the reliability
of etcd operations through better timeout and retry handling.
## Race Condition Fixes
1. **Task Resource Management Bug**
- Fixed incorrect variable usage in task/list.go:78
- The code used the completed task's resources instead of the idle task's resources
- This caused resource conflicts and potential deadlocks
2. **Database Channel Initialization**
- Added sync.Once pattern to ensure thread-safe channel initialization
- Prevents panic from concurrent access during startup
- Created initDBRequests() function for safe initialization
3. **Published Storage Double-Checked Locking**
- Implemented double-checked locking pattern in GetPublishedStorage
- Reduces lock contention while preventing concurrent initialization
- Improves performance for frequently accessed storage
4. **File Operation Synchronization**
- Created FileLockRegistry in utils/filelock.go
- Prevents concurrent file operations (create, rename, delete, link)
- Implements deadlock prevention for multi-file operations
- Critical for preventing file corruption during parallel publishes
5. **WaitGroup Miscount Prevention**
- Added defer pattern to ensure Done() is always called
- Protects against panics during task execution
- Prevents "sync: negative WaitGroup counter" panics caused by mismatched Add()/Done() calls
## etcd Improvements
1. **Timeout Protection**
- Replaced global context.TODO() with per-operation timeout contexts
- Default timeout: 60 seconds (configurable)
- Prevents indefinite hangs when etcd is unresponsive
2. **Environment Variable Configuration**
- APTLY_ETCD_TIMEOUT: Operation timeout (default: 60s)
- APTLY_ETCD_DIAL_TIMEOUT: Connection timeout (default: 60s)
- APTLY_ETCD_KEEPALIVE: Keep-alive timeout (default: 7200s)
- APTLY_ETCD_MAX_MSG_SIZE: Max message size (default: 50MB)
3. **Retry Logic for Read Operations**
- Get operations retry up to 3 times with exponential backoff
- Only retries on temporary/network errors
- Improves reliability without risking data inconsistency
4. **Enhanced Error Logging**
- All etcd errors now logged with operation context
- Replaces silent failures with actionable error messages
- Improves debugging and monitoring capabilities
5. **Increased Message Size Limits**
- Default increased from 10MB to 50MB
- Configurable via environment variable
- Prevents "message too large" errors for large operations
## Testing
- Added comprehensive tests for etcd timeout functionality
- Tests verify context timeout, retry logic, and configuration
- All existing tests pass with the new implementation
## Documentation
- Updated README.rst with etcd configuration section
- Documented all environment variables and their defaults
- Added examples and feature descriptions
These changes significantly improve the reliability and debuggability of aptly
when using etcd as the database backend, while also fixing critical race
conditions that could cause data corruption or service crashes.
This commit addresses critical race conditions that were causing "map write failed"
errors and pod crashes in production environments. The issue occurred when multiple
goroutines accessed shared configuration maps simultaneously without proper synchronization.
Root Cause:
The global utils.Config structure contains several maps (FileSystemPublishRoots,
S3PublishRoots, SwiftPublishRoots, AzurePublishRoots) that were being accessed
directly by concurrent HTTP handlers. While context.Config() uses a mutex, it
returns a pointer to the global config, leaving subsequent map access unprotected.
Changes Made:
1. Added safe accessor methods in utils/config.go:
- GetFileSystemPublishRoots() - returns defensive copy of map
- GetS3PublishRoots() - returns defensive copy of map
- GetSwiftPublishRoots() - returns defensive copy of map
- GetAzurePublishRoots() - returns defensive copy of map
2. Updated API handlers to use safe accessors:
- api/s3.go: apiS3List() now uses GetS3PublishRoots()
- api/router.go: reposListInAPIMode() now uses GetFileSystemPublishRoots()
3. Updated context package storage initialization:
- context/context.go: GetPublishedStorage() now uses safe accessors for all
storage type configurations (filesystem, s3, swift, azure)
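A minimal sketch of the safe-accessor idea, assuming the accessor copies the map while holding a lock that writers also take (without that, the copy itself would race with a writer). `Config`, `FileSystemPublishRoot` and `SetFileSystemPublishRoot` are simplified, hypothetical stand-ins for the types in utils/config.go. The copy is shallow, so it is only fully independent when the map values contain no pointers or nested maps.

```go
package main

import (
	"fmt"
	"sync"
)

// FileSystemPublishRoot is a hypothetical stand-in for aptly's root type.
type FileSystemPublishRoot struct {
	RootDir string
}

// Config is a cut-down, hypothetical config with one publish-root map and
// the lock that guards it.
type Config struct {
	mu                     sync.RWMutex
	FileSystemPublishRoots map[string]FileSystemPublishRoot
}

// GetFileSystemPublishRoots returns a shallow copy taken under the read lock,
// so callers can range over it while other goroutines write.
func (c *Config) GetFileSystemPublishRoots() map[string]FileSystemPublishRoot {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make(map[string]FileSystemPublishRoot, len(c.FileSystemPublishRoots))
	for k, v := range c.FileSystemPublishRoots {
		out[k] = v
	}
	return out
}

// SetFileSystemPublishRoot is a hypothetical writer; it must take the same lock.
func (c *Config) SetFileSystemPublishRoot(name string, root FileSystemPublishRoot) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.FileSystemPublishRoots == nil {
		c.FileSystemPublishRoots = make(map[string]FileSystemPublishRoot)
	}
	c.FileSystemPublishRoots[name] = root
}

func main() {
	cfg := &Config{}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(2)
		go func(i int) { // writer
			defer wg.Done()
			cfg.SetFileSystemPublishRoot(fmt.Sprintf("root%d", i), FileSystemPublishRoot{RootDir: "/srv/aptly"})
		}(i)
		go func() { // reader, like a concurrent API handler
			defer wg.Done()
			for name, root := range cfg.GetFileSystemPublishRoots() {
				_, _ = name, root
			}
		}()
	}
	wg.Wait()
	fmt.Println(len(cfg.GetFileSystemPublishRoots())) // 50
}
```

Running this with `go run -race` exercises 50 concurrent writers and readers; having the readers range over `cfg.FileSystemPublishRoots` directly instead is likely to make the race detector report the conflict.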
Impact:
- Eliminates the Go runtime's "concurrent map writes" fatal errors (which, unlike panics, cannot be recovered) that were causing service instability
- Prevents pod crashes and restarts in Kubernetes environments
- Ensures thread-safe access to configuration maps during concurrent API requests
- Minimal performance overhead (microseconds) from creating map copies
The fix is backward compatible and requires no configuration changes. The defensive
copying approach ensures that even if config maps are modified after initialization
(which shouldn't happen in production), concurrent readers remain safe.
This addresses the production issues observed in lf-aptly-* pods where multiple
parallel publish requests or API calls were triggering race conditions.
This adds support for storing packages directly on Azure, with no truly
"local" (on-disk) repo used. The existing Azure PublishedStorage
implementation was refactored to move the shared code to a separate
context struct, which can then be re-used by the new PackagePool. In
addition, the files package's mockChecksumStorage was made public so
that it could be used in the Azure PackagePool tests as well.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
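Moving shared Azure code into a context struct that both PublishedStorage and the new PackagePool reuse maps naturally onto struct embedding in Go. The sketch below is hypothetical: `azureContext`, its fields and `blobPath` are illustrative names, and the real types wrap an Azure SDK client rather than two strings.

```go
package main

import (
	"fmt"
	"path"
)

// azureContext holds state both Azure-backed components need.
// Hypothetical fields; real code would also hold an SDK client.
type azureContext struct {
	container string
	prefix    string
}

// blobPath maps a repository-relative path to a blob name in the container.
func (c *azureContext) blobPath(p string) string {
	return path.Join(c.prefix, p)
}

// Both types embed *azureContext, so blobPath (and any other shared helper)
// is written once and promoted onto each of them.
type PublishedStorage struct{ *azureContext }
type PackagePool struct{ *azureContext }

func main() {
	shared := &azureContext{container: "aptly", prefix: "debian"}
	storage := PublishedStorage{shared}
	pool := PackagePool{shared}
	fmt.Println(storage.blobPath("dists/stable/Release"))
	fmt.Println(pool.blobPath("pool/main/a/aptly/aptly_1.0_amd64.deb"))
}
```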
This adds a new configuration setting: AzurePublishEndpoints, similar
to the existing S3PublishEndpoints and SwiftPublishEndpoints.
For each endpoint, the following must be defined:
- accountName
- accountKey
- container
- prefix
Azure tests require the following environment variables to be set:
- AZURE_STORAGE_ACCOUNT
- AZURE_STORAGE_ACCESS_KEY
If either of these is not set, Azure-specific tests are skipped.
This is a spin-off of changes from #459.
Transactions are not being used yet, but batches are updated to work
with the new API.
`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
Apply retries as a global, config-level option, `downloadRetries`, so that
it can be applied to any aptly command that downloads objects.
Unwrap errors wrapped with `errors.Wrap`, which the downloader uses.
Unwrap `*url.Error`, which should be the actual error returned by the
HTTP client; this catches more cases and is more specific about failures.
* aptly can sign and verify without issues with GnuPG 1.x and 2.x
* aptly auto-detects GnuPG version and adapts accordingly
* aptly automatically finds suitable GnuPG version
The majority of the work went into making the unit tests work with both GnuPG 1.x and 2.x.
Locally I've verified that aptly supports GnuPG 1.4.x and 2.2.x. The Travis CI
environment is based on Ubuntu Trusty, so it runs the gpg2 tests with GnuPG 2.0.x.
Configuration parameter gpgProvider now supports three values for GnuPG:
* gpg (same as before, default): use GnuPG 1.x if available (checks gpg, gpg1),
otherwise uses GnuPG 2.x; for aptly users who already have GnuPG 1.x
environment (as it was the only supported version) nothing should change; new
users might start with GnuPG 2.x if that's their installed version
* gpg1 looks for GnuPG 1.x only, fails otherwise
* gpg2 looks for GnuPG 2.x only, fails otherwise
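The three `gpgProvider` modes can be expressed as a small selection function. This is a sketch: the candidate binary names for the `gpg1` and `gpg2` modes and their search order are assumptions, and the version probe is injected as a function so the logic runs without GnuPG installed (real code would run `<binary> --version` and parse the `gpg (GnuPG) X.Y.Z` line).

```go
package main

import "fmt"

// versionOf reports the GnuPG major version of a binary, or ok=false if the
// binary is missing. Injected here so the selection logic can run alone.
type versionOf func(binary string) (major int, ok bool)

// findGPG picks a binary for the given gpgProvider value.
// ASSUMPTION: candidate names and their order are illustrative.
func findGPG(provider string, version versionOf) (string, error) {
	find := func(want int, candidates ...string) (string, bool) {
		for _, bin := range candidates {
			if major, ok := version(bin); ok && major == want {
				return bin, true
			}
		}
		return "", false
	}
	switch provider {
	case "gpg": // prefer 1.x, fall back to 2.x
		if bin, ok := find(1, "gpg", "gpg1"); ok {
			return bin, nil
		}
		if bin, ok := find(2, "gpg", "gpg2"); ok {
			return bin, nil
		}
	case "gpg1": // 1.x only
		if bin, ok := find(1, "gpg", "gpg1"); ok {
			return bin, nil
		}
	case "gpg2": // 2.x only
		if bin, ok := find(2, "gpg", "gpg2"); ok {
			return bin, nil
		}
	default:
		return "", fmt.Errorf("unknown gpgProvider %q", provider)
	}
	return "", fmt.Errorf("no suitable GnuPG binary for gpgProvider %q", provider)
}

func main() {
	// A hypothetical system where only GnuPG 2.x is installed, as "gpg".
	installed := map[string]int{"gpg": 2}
	version := func(bin string) (int, bool) {
		major, ok := installed[bin]
		return major, ok
	}
	for _, p := range []string{"gpg", "gpg1", "gpg2"} {
		bin, err := findGPG(p, version)
		fmt.Println(p, "->", bin, err)
	}
}
```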
Init is never actually called, and it is unclear why it exists if it is
not called.
Take this opportunity to introduce a New function that only does the
helper lookup and panics if that fails. Panicking may be a bit too aggressive,
but it seems the most reliable way to bail out when no suitable gpg1
binary can be found.
There are two fixes here:
1. Abort package download immediately as ^C is pressed.
2. Import all the already-downloaded files into the package pool,
so that the next time the mirror is updated, aptly won't download them
again.
Allow the database to be initialized without being opened, and unify all
the open paths so that they retry on failure.
In the API router, make sure open requests are explicitly matched with
acks.
This also enables re-open attempts in all aptly commands, which should
hopefully make running the aptly CLI much easier.
Fix up system tests for oldoldstable ;)