Commit Graph

420 Commits

Author SHA1 Message Date
William Manley
86dc10028f Stanza.WriteTo: Sort extra fields alphabetically
This makes the output deterministic.  This is important to me as I am
using `Packages` index files as a kind of lockfile and committing it
to my git repository.  Without this we get a lot of noise in the diff
whenever the file is regenerated because
[go randomises map iteration order][1].

[1]: https://nathanleclaire.com/blog/2014/04/27/a-surprising-feature-of-golang-that-colored-me-impressed/
2019-01-08 15:12:34 +00:00
Oliver Sauder
e23e30eb44 Merge branch 'master' into with_installer 2018-09-21 13:26:15 +02:00
Andrey Smirnov
699323e2e0 Reimplement DB collections for mirrors, repos and snapshots
See #765, #761

Collections were relying on keeping in-memory list of all the objects
for any kind of operation which doesn't scale well the number of
objects in the database.

With this rewrite, objects are loaded only on demand which might
be pessimization in some edge cases but should improve performance
and memory footprint signifcantly.
2018-08-21 01:08:14 +03:00
Andrey Smirnov
de38011dd2 Add simple benchmark for SnapshotCollection.ForEach() 2018-08-14 00:56:15 +03:00
Andrey Smirnov
0f4bbc4752 Implement lazy iteration (ForEach) over collections
See #761

aptly had a concept of loading small amount of info per each object
into memory once collection is accessed for the first time.

This might have simplified some operations, but it doesn't scale well
with huge aptly databases.

This is just intermediate step towards better memory management -
list of objects is not loaded unless some method is called.
`ForEach` method (mainly used in cleanup) is reimplemented to
iterate over database without ever loading all the objects into memory.

Memory was even worse with previous approach, as for each item usually
`LoadComplete()` is called, which pulls even more data into memory
and item stays in memory till the end of the iteration as it is referenced
from `collection.list`.

For the subsequent PR: reimplement `ByUUID()` and probably other methods
to avoid loading all the items into memory, at least for all the collecitons
except for published repos. When published repository is being loaded, it
might pull source local repo which in turn would trigger loading for all the
local repos which is not acceptable.
2018-08-04 00:26:02 +03:00
Andrey Smirnov
747b9752ce Keep checksum of not compressed index file even if it's not uploaded
Fixes: #756
2018-07-14 00:17:36 +03:00
Oliver Sauder
b1a2523ef0 Add unit test for remote and http 2018-07-06 15:02:37 +02:00
Oliver Sauder
b7323db31b Add detached signature to installer hashsum file 2018-07-06 15:02:37 +02:00
Oliver Sauder
0075ead526 Simplify package function signature LinkFromPool 2018-07-06 15:02:37 +02:00
Oliver Sauder
6df4a746f1 Clarify doc strings 2018-07-06 15:02:37 +02:00
Oliver Sauder
108b0ea226 Add support to mirror non package installer files 2018-07-06 15:02:37 +02:00
aviau
7dfc12d138 switch to packaged lzma package 2018-06-22 12:44:23 -04:00
aviau
814ac6c28c dep: use official uuid package 2018-06-21 16:12:45 -04:00
Oliver Sauder
9509629bcf Add changes test to increase coverage 2018-06-19 15:40:38 +02:00
Oliver Sauder
f1882cfe2c Expose repo include through API 2018-06-19 15:39:09 +02:00
Strajan Sebastian Ioan
d31144b9ae Buffer increase (#738)
Increase Scanner buffer size for Stanza reader
2018-05-14 17:41:33 +03:00
Andrey Smirnov
b8c5303fdb Fix paths after repository transfer to aptly-dev 2018-04-18 21:19:43 +03:00
Andrey Smirnov
5b85522400 Implement 'legacy' Contents indexes to match Ubuntu <=16.04
Another index is created which unifies data for all the components.

This certainly requires more resources as we have to build yet another
index.
2018-04-11 00:57:15 +03:00
Harald Sitter
9125745416 batch updates to the temporary db when publishing
updates with contents generation were super syscall-heavy. for each path
in a package (so at least 2-4, but ordinarily >4) we'd do a db.Put in
ContentsIndex which results in one syscall.Write. so, for every package in
a published repo we'd have to do *at least* 2 but ordinarily >4 syscalls.
this gets abysmally slow very quickly depending on the available system
specs.

instead, start a batch inside each package and finish it when we are done
with the package. this should keep the memory footprint negligible, but
reduce the write() calls from N to 1.

on one of KDE's servers I have seen update publishing of 7600 packages go
from ~28s to ~9s when using batch putting on an HDD.
on my local system the same set of packages go from ~14s to ~6s on an SSD.
(all inodes in cache in both cases)
2018-02-26 16:19:15 +01:00
Harald Sitter
00bb0ca8f3 fix a serious file leak in the by-index publishing
the logic here was wrong.
if we managed to find the link target (the physical index file) pointed to
by our old symlink we want to remove it (this is basically "cleaning up old
index" logic).
previously we'd try to only delete it when the ReadLink came back with
error. which had two serious issues with it:

a) linkTarget was empty, so we basically called Remove("") which would
   delete the storage -> root <- directory if the root is a symlink!
b) we'd leak old indexes as the cleanup logic only ran if there was en
   error which would ordinarily never be

new code correctly cleans up unless there was an error.

this relates to a previous bugfix of readLink which incorrectly returned
absolute paths ultimately rendering the Remove call also broken.
2018-02-19 17:32:06 +01:00
Harald Sitter
2d0baef3b1 make code less repetitive and more readable
by using the power of variables!
2018-02-19 17:22:36 +01:00
Harald Sitter
75c4d6da3b properly expose AcquireByHash through the api
- new publish calls can now enable AcquireByHash by right away (previously
  one would have had to create a new publishing endpoint and then
  explicitly switch it to AcquireByHash)
- all json marshals of PublishedRepo now contain AcquireByHash (allows
  inspecting if a given endpoint has AcquireByHash enabled already; also
  enables verification that a switch/update actually applied a
  potential AcquireByHash change
- update all tests to reflect that default state of AcquireByHash
- update creation and switch testing to explicitly toggle AcquireByHash to
  make sure state mutation works as expected
2018-01-15 17:04:05 +01:00
Andrey Smirnov
9cb2a302f8 Merge pull request #683 from smira/545-download-contxt
Use Go context to abort gracefully mirror updates
2017-12-01 00:27:26 +03:00
Andrey Smirnov
d836334767 Merge pull request #682 from tirolerstefan/remove-buildinfo
#679: added *.buildinfo file to processedFile list (will be removed)
2017-12-01 00:23:49 +03:00
Oliver Sauder
b2bf4f7884 Adjust FileExists to differentiate between error and actual file existence 2017-11-30 09:46:02 +01:00
Oliver Sauder
3efa1052fa Implement FileExists in files storage as simple stat to improve performance 2017-11-30 09:46:02 +01:00
Oliver Sauder
2e488608ca Simplify packaging indexing by hash and stop when there is an error 2017-11-30 09:46:02 +01:00
Oliver Sauder
d6b4b795a5 Fix linting errors 2017-11-30 09:46:02 +01:00
Oliver Sauder
092a7ed8f3 Rename AccessByHash to AcquireByHash for consistency with other flags 2017-11-30 09:46:02 +01:00
Oliver Sauder
7498fd8fc8 Extend s3 storage with link and file exists methods 2017-11-30 09:46:02 +01:00
André Roth
e07912770e Extend PublishedStorage interface for Acquire-By-Hash
Signed-off-by: André Roth <neolynx@gmail.com>
2017-11-30 09:46:02 +01:00
André Roth
bb2db7e500 Support Acquire-By-Hash for index files
The added "aptly publish repo" option "-access-by-hash" publishes
the index files (Packages*, Sources*) also as hardlinked hashes.
Example:
 /dists/yakkety/main/binary-amd64/by-hash/SHA512/31833ec39acc...
The Release files indicate this with the option "Acquire-By-Hash: yes"

This is used by apt >= 1.2.0 and prevents the "Hash sum mismatch" race
condition between a server side "aptly publish repo" and "apt-get update"
on a client.
See: http://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html

This implementation uses symlinks in the by-hash/*/ directory for keeping
only two versions of the index files and deleting older files
automatically.

Note: this only works with aptly.FileSystemPublishedStorage

Closes: #536

Signed-off-by: André Roth <neolynx@gmail.com>
2017-11-30 09:46:02 +01:00
Stefan
c94e048198 Merge branch 'master' into remove-buildinfo 2017-11-30 06:34:50 +01:00
Stefan Felkel
3b4c06d28d gofmt 2017-11-30 06:31:50 +01:00
Andrey Smirnov
15618c8ea8 Use Go context to abort gracefully mirror updates
There are two fixes here:

1. Abort package download immediately as ^C is pressed.
2. Import all the already downloaded files into package pool,
so that next time mirror is updated, aptly won't download them
once again.
2017-11-30 00:49:37 +03:00
Oliver Sauder
5d301fb1b7 Prepare archive root when editing it 2017-11-27 11:08:31 +01:00
Stefan Felkel
8a4d866810 #679: added *.buildinfo file to processedFile list (will be removed, afterwards) 2017-11-24 14:23:26 +01:00
Felix Abecassis
e682639b20 Handle SHA512 in Release files
Fix: #656
2017-11-08 11:54:19 +03:00
Harald Sitter
1885cbd6a2 make deb reader handle new control.tar options introduced in dpkg 1.17.6
newly supported is uncompressed control.tar and xz compressed
control.tar.xz. latter is used by ubuntu for dbgsym ddebs.

Fixes #655
2017-10-31 14:43:00 +01:00
Harald Sitter
46c2182ade fix linting by using new maligned linter instead of aligncheck
upstream switched the alignment check backend and in doing so fails to run
if the old backend is defined in the config.

also skip alignment linting on a struct we use for byte decoding as we have
no choice in its member order.
2017-10-31 12:24:31 +01:00
Andrey Smirnov
0d94f29c27 Allow using files from the pool while importing source packages
Sometimes source packages reference files already present in the pool.

Allow for those file to be omitted when importing packages either via
`repo add` or `repo include`. If file is missing, aptly would make
an attempt to look up file in the package pool (by checksum) and
use it.

Fixes: #278
2017-09-29 22:39:51 +03:00
Andrey Smirnov
b4deedda01 Merge branch 'master' into skipCleanup 2017-09-27 00:14:24 +03:00
Andrey Smirnov
e5198178a5 Fix incomplete dependencies with follow-all-variants
When `-dep-follow-all-variants` option is enabled, dependency resolving
process shouldn't stop even if dependency is already satisfied - there
mgiht be other ways to satisfy dependency.

Also fix issue with parsing multiarch specs like
`python:any`.
2017-09-26 00:09:15 +03:00
Andrey Smirnov
6d2f265980 Prefer exact match on package name over provides match
When searching for packages which might satisfy given dependency,
aptly was first returning packages which `Provides` mentioned
name. By default aptly is picking up only first match (unless
follow all variants options is enabled), so `Provides:` takes
precedence over exact package name match.

Invert this logic by searching first for package name match.
2017-09-25 18:24:45 +03:00
Andrey Smirnov
31f4af5722 Merge branch 'master' into skipCleanup 2017-09-20 17:53:24 +03:00
Andrey Smirnov
9ca81ff3bc Fix lint warning & add Go 1.9 to the mix 2017-09-15 22:54:39 +03:00
Andrey Smirnov
a27b489ba2 Allow package queries to return duplicate entries on PackageCollection
Fixes #446
2017-08-17 00:40:34 +03:00
Ludovico Cavedon
d6a3917141 Add -skip-cleanup option for publish commands.
Allow skipping unreferenced files cleanup on publish switch/update/drop
via the -skip-cleanup command line option.
Also support API SkipCleanup parameter.

Fixes #570.
2017-08-15 19:08:17 -07:00
Andrey Smirnov
a584b2e058 Fix bug with PoolPath field being overwritten on mirror update
While updating mirror, if package file is already in pool path,
field `PoolPath` was left as empty which results in package file
being unavailable later on while publishing.
2017-08-11 20:05:55 +03:00
Andrey Smirnov
84ef963d7d Trim slashes while parsing publish prefix
Fixes: #607
2017-08-09 01:26:47 +03:00