updates with contents generation were super syscall-heavy. for each path
in a package (so at least 2-4, but ordinarily >4) we'd do a db.Put in
ContentsIndex which results in one syscall.Write. so, for every package in
a published repo we'd have to do *at least* 2 but ordinarily >4 syscalls.
this gets abysmally slow very quickly depending on the available system
specs.
instead, start a batch inside each package and finish it when we are done
with the package. this should keep the memory footprint negligible, but
reduce the write() calls from N to 1.
on one of KDE's servers I have seen update publishing of 7600 packages go
from ~28s to ~9s when using batch putting on an HDD.
on my local system the same set of packages go from ~14s to ~6s on an SSD.
(all inodes in cache in both cases)
the logic here was wrong.
if we managed to find the link target (the physical index file) pointed to
by our old symlink we want to remove it (this is basically "cleaning up old
index" logic).
previously we'd try to only delete it when the ReadLink came back with
error. which had two serious issues with it:
a) linkTarget was empty, so we basically called Remove("") which would
delete the storage -> root <- directory if the root is a symlink!
b) we'd leak old indexes as the cleanup logic only ran if there was en
error which would ordinarily never be
new code correctly cleans up unless there was an error.
this relates to a previous bugfix of readLink which incorrectly returned
absolute paths ultimately rendering the Remove call also broken.
- new publish calls can now enable AcquireByHash by right away (previously
one would have had to create a new publishing endpoint and then
explicitly switch it to AcquireByHash)
- all json marshals of PublishedRepo now contain AcquireByHash (allows
inspecting if a given endpoint has AcquireByHash enabled already; also
enables verification that a switch/update actually applied a
potential AcquireByHash change
- update all tests to reflect that default state of AcquireByHash
- update creation and switch testing to explicitly toggle AcquireByHash to
make sure state mutation works as expected
The added "aptly publish repo" option "-access-by-hash" publishes
the index files (Packages*, Sources*) also as hardlinked hashes.
Example:
/dists/yakkety/main/binary-amd64/by-hash/SHA512/31833ec39acc...
The Release files indicate this with the option "Acquire-By-Hash: yes"
This is used by apt >= 1.2.0 and prevents the "Hash sum mismatch" race
condition between a server side "aptly publish repo" and "apt-get update"
on a client.
See: http://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html
This implementation uses symlinks in the by-hash/*/ directory for keeping
only two versions of the index files and deleting older files
automatically.
Note: this only works with aptly.FileSystemPublishedStorage
Closes: #536
Signed-off-by: André Roth <neolynx@gmail.com>
There are two fixes here:
1. Abort package download immediately as ^C is pressed.
2. Import all the already downloaded files into package pool,
so that next time mirror is updated, aptly won't download them
once again.
upstream switched the alignment check backend and in doing so fails to run
if the old backend is defined in the config.
also skip alignment linting on a struct we use for byte decoding as we have
no choice in its member order.
Sometimes source packages reference files already present in the pool.
Allow for those file to be omitted when importing packages either via
`repo add` or `repo include`. If file is missing, aptly would make
an attempt to look up file in the package pool (by checksum) and
use it.
Fixes: #278
When `-dep-follow-all-variants` option is enabled, dependency resolving
process shouldn't stop even if dependency is already satisfied - there
mgiht be other ways to satisfy dependency.
Also fix issue with parsing multiarch specs like
`python:any`.
When searching for packages which might satisfy given dependency,
aptly was first returning packages which `Provides` mentioned
name. By default aptly is picking up only first match (unless
follow all variants options is enabled), so `Provides:` takes
precedence over exact package name match.
Invert this logic by searching first for package name match.
Allow skipping unreferenced files cleanup on publish switch/update/drop
via the -skip-cleanup command line option.
Also support API SkipCleanup parameter.
Fixes#570.
While updating mirror, if package file is already in pool path,
field `PoolPath` was left as empty which results in package file
being unavailable later on while publishing.
Allow database to be initialized without opening, unify all the
open paths to retry on failure.
In API router make sure open requests are matched with acks in explicit
way.
This also enables re-open attempts in all the aptly commands, so it
should make running aptly CLI much easier now hopefully.
Fix up system tests for oldoldstable ;)
Break up URL into base part and relative path. Match checksum against relative path
and never against full URL.
This might be fixing security issue if aptly was incorrectly matching against
wrong part of Release file.
This requires splitting up import file phase as separate step in then end,
it should be pretty fast, as it only does file move (hardlink) and
DB update for new checksums.