Commit Graph

2363 Commits

Author SHA1 Message Date
André Roth
d8c1e432c6 add test 2024-07-03 18:08:58 +02:00
André Roth
3a286ae07f fix unit tests 2024-07-03 18:08:58 +02:00
André Roth
a93ccd4100 fix tests 2024-07-03 18:08:58 +02:00
André Roth
c1f7e5fe96 handle GpgDisableVerify and ignore-signatures consistently
and be less verbose
2024-07-03 18:08:58 +02:00
André Roth
d16110068c allow not signed mirrors without InRelease file 2024-07-03 18:08:58 +02:00
André Roth
4661913265 add system test for maximumVersion filter 2024-06-24 17:44:40 +02:00
hudeng
ecc88e7a40 feat: repo and snapshots packages filter api add 'maximumVersion' query parameter support
example: `curl http://localhost:8080/api/repos/test/packages\?maximumVersion\=1`

Change-Id: Ie9ffd36146bf017bbb353737f32360f7b73d6b0a
2024-06-24 17:44:40 +02:00
André Roth
5cf8c54cb2 fix test 2024-06-20 23:40:46 +02:00
André Roth
b758033ccb fix compilation 2024-06-20 23:40:46 +02:00
Kevin Martin
13f4bb441d Check if S3 bucket is encrypted by default.
Adds check to see if the S3 bucket is encrypted by default. If so this
uses the existing workaround for object etags not matching file MD5s.
2024-06-20 23:40:46 +02:00
Kevin Martin
1af09069f7 Check both MD5 locations for S3 KMS support.
If the S3 bucket used to house a repo has KMS encryption enabled then
the etag of an object may not match the MD5 of the file. This may
cause an incorrect error to be reported stating the file already
exists and is different.

A mechanism exists to work around this issue by using the MD5 stored
in object metadata. This check doesn't always cover the case where KMS
is enabled as the fallback is only used if the etag is not 32
characters long.

This commit changes the fallback mechanism so that it is used in any
case where the object's etag does not match the source MD5. This will
incur a performance penalty of an extra head request for each object
with a mismatch.
2024-06-20 23:40:46 +02:00
Ryan Gonzalez
b5bf2cbcda Fix functional tests' '--capture' on Python 3
None of the commands' output is ever treated as binary, so we can just
always decode it as text.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
André Roth
fb3436b23d fix golangci-lint errors 2024-06-17 11:51:18 +02:00
André Roth
1a3cfea348 replace io/ioutil
fixes golangci-lint errors
2024-06-17 11:51:18 +02:00
André Roth
c33141d49b fix golangci-lint and compilation errors 2024-06-17 11:51:18 +02:00
Ryan Gonzalez
f9325fbc91 Add support for Azure package pools
This adds support for storing packages directly on Azure, with no truly
"local" (on-disk) repo used. The existing Azure PublishedStorage
implementation was refactored to move the shared code to a separate
context struct, which can then be re-used by the new PackagePool. In
addition, the files package's mockChecksumStorage was made public so
that it could be used in the Azure PackagePool tests as well.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
810df17009 Clean up temporary files when mirroring
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
19255debb9 Reduce required usage of LocalPackagePool
Several sections of the code *required* a LocalPackagePool, but they
could still perform their operations with a standard PackagePool.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
1ebd37f9ad Move Stat() into LocalPackagePool
The contents of `os.Stat` are rather fitted towards local package pools,
but the method is in the generic PackagePool interface. This moves it to
LocalPackagePool, and the use case of simply finding a file's size is
delegated to a new, more generic PackagePool.Size() method.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
8e37813129 Add support for custom package pool locations
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
91574b53d9 Add functional tests for Azure publishing
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
b8e0aba3cf Document Azure configuration
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
0eccaf2b60 Change Azure endpoint syntax to require a full URL
Before, a "partial" URL (either "localhost:port" or an endpoint URL
*without* the account name as the subdomain) would be specified, and the
full one would automatically be inferred. Although this is somewhat
nice, it means that the endpoint string doesn't match the official Azure
syntax:

https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string

This also raises issues for the creation of functional tests for Azure,
as the code to determine the endpoint string needs to be duplicated
there as well.

Instead, it's just easiest to follow Azure's own standard, and then
sidestep the need for any custom logic in the functional tests.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
Ryan Gonzalez
859edb10cd Fix S3 tests on Python 3
read_path() can read in binary, which the S3 tests don't support (simply
because they don't need it)...but it needs to be able to take the `mode`
argument anyway.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-06-17 11:51:18 +02:00
André Roth
f0bf519d36 cleanup 2024-06-15 19:18:14 +02:00
André Roth
d8cfafccc9 system tests: improve log output 2024-06-15 19:18:14 +02:00
André Roth
34c1c1f83a system-tests: make docker abortable and use colors 2024-06-15 19:18:14 +02:00
André Roth
3e1485faf5 queue sync calls 2024-06-15 19:18:14 +02:00
André Roth
efc5573b8e Makefile: run unit tests in docker 2024-06-15 19:18:14 +02:00
André Roth
45035802be implement task queue waiting for resources 2024-06-15 19:18:14 +02:00
André Roth
2d97ba2bbd fix flake8 error 2024-06-15 19:18:14 +02:00
André Roth
05fed16f6d fix golangci-lint error 2024-06-15 19:18:14 +02:00
Ramón N.Rodriguez
9b1f4272c0 AUTHORS: claiming the fame and glory 2024-06-15 19:18:14 +02:00
Ramón N.Rodriguez
1987220f1e api: publish: block on concurrent calls
This commit blocks concurrent calls to RunTaskInBackground which is
intended to fix the quirky behaviour where concurrent PUT calls to
api/publish/<prefix>/<distribution> would immedietly reuturn an error.

The solution proposed in this commit is not elegant and probaly has
unintended side-effects. The intention of this commit is to highlight
the area that actually needs to be addressed.
Ideally this patch is amended or dropped entierly in favor of a better
fixup.
2024-06-15 19:18:14 +02:00
Ramón N.Rodriguez
47696a3303 api:publish: test concurrency
This commit introduces a test which runs concurrent publishes (which
could be parallell with multiproccessing, python is fun).
The test purposly fails (at the point in history that this patch is
written) in order to make it as easy as possible to verify later patches,
which hopefully addresses concurrency problems.

The same behaviour can easily be tested outside of the system tests with
the following (or similar) shell

$ aptly serve -listen=:8080 -no-lock
$ aptly repo create create -distributions=testing local-repo
$ atply publish repo -architectures=amd64 local-repo
$ apt download aptly
$ aptly repo add local-repo ./aptly*.deb
$ for _ in $(seq 10); do curl -X PUT 127.0.0.1:8080/api/publish//testing

In the local testing perfomed (on a dual core vm) the first 1-4 jobs
would typically succeed and the rest would error out.
2024-06-15 19:18:14 +02:00
André Roth
787f954833 mirror: do not download already downloaded packages
this change imports downloaded packages into the pool immediately after download.
in case mirroring is aborted and later resumed, already downloaded packages will not be downloaded anymore.
2024-06-15 16:15:23 +02:00
André Roth
e9bdb983c8 tasks: improve log level 2024-06-15 16:15:23 +02:00
Noa Resare
b4cd86aa14 Introduce option multi-dist to the publish commands
This change makes it possible to publish multiple distributions
with packages named the same but with different content by changing
structure of the generated pool hierarchy. The option not enabled
by default as this changes the structure of the output which could
break the expectations of other tools.
2024-06-15 11:27:26 +02:00
Cal Jurgella
4bd26f5977 Enable SkipArchitectureCheck and IgnoreSignatures in mirror API 2024-06-14 14:30:29 +02:00
André Roth
57ff7c69cd mirror api: add test for SkipArchitectureCheck and SkipComponentCheck 2024-06-14 14:30:29 +02:00
André Roth
c843709eab add github sponsors 2024-06-12 13:01:40 +02:00
Jens Reinsberger
4bc2180eed fix failing build on hosts with wildcard dns
on hosts which have wildcard dns domains in their local domain search
list, builds failed because "nosuch.host" could actually be resolved.

Since ".host" isn't a recommended TLD by RFC2606, we use ".invalid" now.
And since this is not enough to fix the problem, we use now absoulte
domain names (having a '.' at the end)
2024-05-15 16:42:37 +02:00
Paul Cacheux
5353890b24 use tagged version of ProtonMail/go-crypto in go.mod 2024-05-15 16:41:35 +02:00
Ryan Gonzalez
79975bf2b6 Fix reflist diffs failing to compact when one of the inputs ends
The previous reflist logic would early-exit the loop body if one of the
lists was empty, but that skips the compacting logic entirely.

Instead of doing the early-exit, we can leave a list's ref as nil when
the list end is reached and then flip the comparison result, which will
essentially treat it as being greater than all others. This should
preserve the general behavior without omitting the compaction.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-04-24 17:36:36 +02:00
Ryan Gonzalez
8d09c202db Skip loading reflists when listing published repos
The output doesn't actually depend on the reflists, and loading them for
every published repo starts to take substantial time and memory.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-04-24 17:35:44 +02:00
André Roth
27013c0b2b apply go mod tidy 2024-04-24 16:46:16 +02:00
André Roth
756c499314 fix go mod tidy and use go 1.19
let's be compatible with debian/bookworm
2024-04-24 16:46:16 +02:00
Ryan Gonzalez
6c6d1b18ba Use zero-copy decoding for reflists
Reflists are basically stored as arrays of strings, which are quite
space-efficient in MessagePack. Thus, using zero-copy decoding results
in nice performance and memory savings, because the overhead of separate
allocations ends up far exceeding the overhead of the original slice.

With the included benchmark run for 20s with -benchmem, the runtime,
memory usage, and allocations go from ~740us/op, ~192KiB/op, and 4100
allocs/op to ~240us/op, ~97KiB/op, and 13 allocs/op, respectively.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-04-24 16:46:16 +02:00
Ryan Gonzalez
8cb1236a8c Improve publish cleanup perf when sources share most of their packages
The cleanup phase needs to list out all the files in each component in
order to determine what's still in use. When there's a large number of
sources (e.g. from having many snapshots), the time spent just loading
the package information becomes substantial. However, in many cases,
most of the packages being loaded are actually shared across the
sources; if you're taking frequent snapshots, for instance, most of the
packages in each snapshot will be the same as other snapshots. In these
cases, re-reading the packages repeatedly is just a waste of time.

To improve this, we maintain a list of refs that we know were processed
for each component. When listing the refs from a source, only the ones
that have not yet been processed will be examined. Some tests were also
added specifically to check listing the files in a component.

With this change, listing the files in components on a copy of our
production database went from >10 minutes to ~10 seconds, and the newly
added benchmark went from ~300ms to ~43ms.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-04-24 16:46:16 +02:00
Ryan Gonzalez
5636a9990b Improve performance of simple reflist merges
When merging reflists with ignoreConflicting set to true and
overrideMatching set to false, the individual ref components are never
examined, but the refs are still split anyway. Avoiding the split when
we never use the components brings a massive speedup: on my system, the
included benchmark goes from ~1500 us/it to ~180 us/it.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2024-04-24 16:46:16 +02:00