This adds support for storing packages directly on Azure, with no truly
"local" (on-disk) repo used. The existing Azure PublishedStorage
implementation was refactored to move the shared code to a separate
context struct, which can then be re-used by the new PackagePool. In
addition, the files package's mockChecksumStorage was made public so
that it could be used in the Azure PackagePool tests as well.
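As a rough sketch of the refactoring (the names below are hypothetical
and may not match the identifiers actually used in the code), the shared
Azure state lives in one context struct that both storage types hold:
    // Hypothetical sketch; real field and type names may differ.
    type azContext struct {
        accountName string // storage account
        container   string // blob container name
        prefix      string // key prefix inside the container
    }
    // PublishedStorage keeps its existing behaviour, delegating the
    // shared plumbing to the context.
    type PublishedStorage struct {
        az *azContext
    }
    // PackagePool stores package files directly in Azure via the same
    // shared context.
    type PackagePool struct {
        az *azContext
    }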
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Several sections of the code *required* a LocalPackagePool, but they
could still perform their operations with a standard PackagePool.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
The information returned by `os.Stat` is rather specific to local
package pools, yet the method sat in the generic PackagePool interface.
This moves it to LocalPackagePool; the common use case of simply finding
a file's size is delegated to a new, more generic PackagePool.Size()
method.
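A minimal sketch of how the interfaces might be split (the exact
signatures below are assumptions, not the code as committed):
    import "os"
    // Generic pool: no filesystem-specific details leak out.
    type PackagePool interface {
        // Size returns the size in bytes of a stored file.
        Size(path string) (int64, error)
    }
    // Local pool: filesystem-specific operations stay here.
    type LocalPackagePool interface {
        PackagePool
        // Stat exposes the full os.FileInfo, which only makes sense on disk.
        Stat(path string) (os.FileInfo, error)
    }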
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Before, a "partial" URL (either "localhost:port" or an endpoint URL
*without* the account name as the subdomain) would be specified, and the
full one would automatically be inferred. Although this is somewhat
nice, it means that the endpoint string doesn't match the official Azure
syntax:
https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string
This also complicates the functional tests for Azure, as the code to
determine the endpoint string would need to be duplicated there as
well.
Instead, it's easiest to just follow Azure's own standard, which also
sidesteps the need for any custom logic in the functional tests.
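To illustrate (a sketch only; the function and defaults below are
assumptions, not the committed code), the old inference roughly amounted
to the following, whereas the new behaviour takes the endpoint verbatim
from the configuration, e.g. "https://myaccount.blob.core.windows.net":
    import "fmt"
    // Old behaviour (sketch): build the full endpoint from a partial value.
    func inferEndpoint(accountName, partial string) string {
        if partial == "" {
            // default public endpoint for the account
            return fmt.Sprintf("https://%s.blob.core.windows.net", accountName)
        }
        // e.g. a local emulator given as "localhost:10000"
        return fmt.Sprintf("http://%s/%s", partial, accountName)
    }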
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
read_path() can read in binary mode, which the S3 tests don't support
(simply because they haven't needed it), but the method still needs to
accept the `mode` argument.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
This commit blocks concurrent calls to RunTaskInBackground, which is
intended to fix the quirky behaviour where concurrent PUT calls to
api/publish/<prefix>/<distribution> would immediately return an error.
The solution proposed in this commit is not elegant and probably has
unintended side-effects. The intention of this commit is to highlight
the area that actually needs to be addressed.
Ideally this patch is amended or dropped entirely in favor of a better
fixup.
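As a rough sketch of the stopgap (names and structure are illustrative,
not the actual implementation), serializing the calls amounts to
guarding the task launch with a lock so concurrent callers wait instead
of failing:
    import "sync"
    var taskMu sync.Mutex // hypothetical; guards task creation
    // runTaskSerialized makes concurrent callers queue up rather than
    // immediately returning an error while another task is in flight.
    func runTaskSerialized(run func() error) error {
        taskMu.Lock()
        defer taskMu.Unlock()
        return run() // placeholder for the real task-creation logic
    }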
This commit introduces a test which runs concurrent publishes (made
parallel with multiprocessing; Python is fun).
The test purposely fails (at the point in history at which this patch is
written) in order to make it as easy as possible to verify later patches,
which hopefully address the concurrency problems.
The same behaviour can easily be tested outside of the system tests with
the following (or similar) shell commands:
$ aptly api serve -listen=:8080 -no-lock
$ aptly repo create -distributions=testing local-repo
$ aptly publish repo -architectures=amd64 local-repo
$ apt download aptly
$ aptly repo add local-repo ./aptly*.deb
$ for _ in $(seq 10); do curl -X PUT 127.0.0.1:8080/api/publish//testing; done
In the local testing performed (on a dual-core VM), the first 1-4 jobs
would typically succeed and the rest would error out.
This change imports downloaded packages into the pool immediately after
download. If mirroring is aborted and later resumed, already downloaded
packages will not be downloaded again.
This change makes it possible to publish multiple distributions
containing packages with the same name but different content, by
changing the structure of the generated pool hierarchy. The option is
not enabled by default, as it changes the structure of the output,
which could break the expectations of other tools.
On hosts which have wildcard DNS domains in their local domain search
list, builds failed because "nosuch.host" could actually be resolved.
Since ".host" isn't one of the TLDs reserved by RFC 2606, we now use
".invalid". And since this alone is not enough to fix the problem, we
now use absolute domain names (with a trailing '.').
The previous reflist logic would early-exit the loop body if one of the
lists was empty, but that skips the compacting logic entirely.
Instead of doing the early-exit, we can leave a list's ref as nil when
the list end is reached and then flip the comparison result, which will
essentially treat it as being greater than all others. This should
preserve the general behavior without omitting the compaction.
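A hedged sketch of the merge described above (simplified to plain sorted
strings; the real code operates on package refs):
    // mergeSorted merges two sorted lists, treating a nil (exhausted)
    // ref as greater than everything so the compaction of duplicates
    // still runs for the remaining list. Sketch only.
    func mergeSorted(a, b []string) []string {
        var out []string
        i, j := 0, 0
        for i < len(a) || j < len(b) {
            var ra, rb *string
            if i < len(a) {
                ra = &a[i]
            }
            if j < len(b) {
                rb = &b[j]
            }
            switch {
            case rb == nil || (ra != nil && *ra < *rb):
                out = append(out, *ra)
                i++
            case ra == nil || *rb < *ra:
                out = append(out, *rb)
                j++
            default: // equal refs: emit a single, compacted entry
                out = append(out, *ra)
                i++
                j++
            }
        }
        return out
    }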
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
The output doesn't actually depend on the reflists, and loading them for
every published repo starts to take substantial time and memory.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Reflists are basically stored as arrays of strings, which are quite
space-efficient in MessagePack. Thus, using zero-copy decoding results
in nice performance and memory savings, because the overhead of separate
allocations ends up far exceeding the overhead of the original slice.
With the included benchmark run for 20s with -benchmem, the runtime,
memory usage, and allocations go from ~740us/op, ~192KiB/op, and 4100
allocs/op to ~240us/op, ~97KiB/op, and 13 allocs/op, respectively.
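For context, a small self-contained illustration of the allocation
pattern (this is not the actual decoder, which goes through
MessagePack): a copying decode allocates one string per ref, while a
zero-copy decode returns sub-slices that alias the single underlying
buffer.
    import "bytes"
    // copyDecode allocates a fresh string for every ref.
    func copyDecode(buf []byte) []string {
        parts := bytes.Split(buf, []byte{'\n'})
        out := make([]string, len(parts))
        for i, p := range parts {
            out[i] = string(p) // one allocation per ref
        }
        return out
    }
    // zeroCopyDecode returns sub-slices sharing buf's memory; only the
    // outer slice is allocated.
    func zeroCopyDecode(buf []byte) [][]byte {
        return bytes.Split(buf, []byte{'\n'})
    }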
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
The cleanup phase needs to list out all the files in each component in
order to determine what's still in use. When there's a large number of
sources (e.g. from having many snapshots), the time spent just loading
the package information becomes substantial. However, in many cases,
most of the packages being loaded are actually shared across the
sources; if you're taking frequent snapshots, for instance, most of the
packages in each snapshot will be the same as other snapshots. In these
cases, re-reading the packages repeatedly is just a waste of time.
To improve this, we maintain a list of refs that we know were processed
for each component. When listing the refs from a source, only the ones
that have not yet been processed will be examined. Some tests were also
added specifically to check listing the files in a component.
With this change, listing the files in components on a copy of our
production database went from >10 minutes to ~10 seconds, and the newly
added benchmark went from ~300ms to ~43ms.
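A hedged sketch of the dedup idea (simplified; the real code tracks
refs per component rather than plain strings): refs already processed
via an earlier source are skipped for later ones.
    // processNewRefs calls handle only for refs not seen before for
    // this component. Names here are illustrative.
    func processNewRefs(seen map[string]struct{}, refs []string, handle func(string)) {
        for _, ref := range refs {
            if _, done := seen[ref]; done {
                continue // already processed via an earlier source
            }
            seen[ref] = struct{}{}
            handle(ref)
        }
    }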
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
When merging reflists with ignoreConflicting set to true and
overrideMatching set to false, the individual ref components are never
examined, but the refs are still split anyway. Avoiding the split when
we never use the components brings a massive speedup: on my system, the
included benchmark goes from ~1500 us/it to ~180 us/it.
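Roughly speaking (a sketch, not the committed code), the fast path boils
down to skipping the split when neither flag needs the components:
    // needComponents reports whether the merge has to split each ref
    // into its individual components. With ignoreConflicting=true and
    // overrideMatching=false the refs are only compared as whole
    // strings, so the (relatively expensive) split can be skipped.
    func needComponents(ignoreConflicting, overrideMatching bool) bool {
        return !ignoreConflicting || overrideMatching
    }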
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
In some local tests with a slowed-down filesystem, this cut the time to
clean up a repository by ~3x, bringing the total 'publish update' time
from ~16s to ~13s.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
- use s3 mirror instead of internet download
- reduce download verbosity
- do not use venv in docker-system-tests
- be more verbose on test output
- do not run golangci-lint in system-tests