The cleanup phase needs to list out all the files in each component in
order to determine what's still in use. When there's a large number of
sources (e.g. from having many snapshots), the time spent just loading
the package information becomes substantial. However, in many cases,
most of the packages being loaded are actually shared across the
sources; if you're taking frequent snapshots, for instance, most of the
packages in each snapshot will be the same as other snapshots. In these
cases, re-reading the packages repeatedly is just a waste of time.
To improve this, we maintain a list of refs that we know were processed
for each component. When listing the refs from a source, only the ones
that have not yet been processed will be examined. Some tests were also
added specifically to check listing the files in a component.
With this change, listing the files in components on a copy of our
production database went from >10 minutes to ~10 seconds, and the newly
added benchmark went from ~300ms to ~43ms.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
For any action which is multi-step (requires updating more than 1 DB
key), use transaction to make update atomic.
Also pack big chunks of updates (importing packages for importing and
mirror updates) into single transaction to improve aptly performance and
get some isolation.
Note that still layers up (Collections) provide some level of isolation,
so this is going to shine with the future PRs to remove collection
locks.
Spin-off of #459
This is spin-off of changes from #459.
Transactions are not being used yet, but batches are updated to work
with the new API.
`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
Apply retries as global, config-level option `downloadRetries` so that
it can be applied to any aptly command which downloads objects.
Unwrap `errors.Wrap` which is used in downloader.
Unwrap `*url.Error` which should be the actual error returned from the
HTTP client, catch more cases, be more specific around failures.
See #765, #761
Collections were relying on keeping in-memory list of all the objects
for any kind of operation which doesn't scale well the number of
objects in the database.
With this rewrite, objects are loaded only on demand which might
be pessimization in some edge cases but should improve performance
and memory footprint signifcantly.
Allow database to be initialized without opening, unify all the
open paths to retry on failure.
In API router make sure open requests are matched with acks in explicit
way.
This also enables re-open attempts in all the aptly commands, so it
should make running aptly CLI much easier now hopefully.
Fix up system tests for oldoldstable ;)
Break up URL into base part and relative path. Match checksum against relative path
and never against full URL.
This might be fixing security issue if aptly was incorrectly matching against
wrong part of Release file.
`PackageDownloadTask` is just a reference to file now. Whole process
was rewritten to follow pattern: download to temp location inside the pool,
verify/update checksums, import into pool as final step.
This removes a lot of edge cases when aptly internal state might be broken
if updating from rogue mirror.
Also this changes whole memory model: package list/files are kept in memory
now during the duration of `mirror update` command and saved to disk
only in the end.