Commit Graph

429 Commits

Author SHA1 Message Date
Andrey Smirnov 67e38955ae Refactor database code to support standalone batches, transactions.
This is spin-off of changes from #459.

Transactions are not being used yet, but batches are updated to work
with the new API.

`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
2019-08-09 00:46:40 +03:00
Andrey Smirnov f0a370db24 Rework HTTP downloader retry logic
Apply retries as global, config-level option `downloadRetries` so that
it can be applied to any aptly command which downloads objects.

Unwrap `errors.Wrap` which is used in downloader.

Unwrap `*url.Error` which should be the actual error returned from the
HTTP client, catch more cases, be more specific around failures.
2019-08-07 20:23:05 +03:00
Shengjing Zhu 906cbf1e6f Fix time.Time msgpack decoding backwards compatibility
See https://github.com/ugorji/go-codec/issues/269
2019-07-15 21:51:09 +03:00
Shengjing Zhu 5aefc741f2 Add codec tag to fields which are ignored in new codec package
github.com/ugorji/go/codec 1.1.4 ignores field with json:"-" tag
2019-07-15 21:51:09 +03:00
Andrey Smirnov 2c91bcdc30 Bump Go versions for Travis, fix tests
Replace gometalinter with golangci-lint.

Fix system tests (wheezy is gone, replace with stretch).

Fix linter warnings.
2019-07-04 00:16:12 +03:00
Andrey Smirnov 89537b1521 Merge branch 'master' into deterministic-stanza-WriteTo 2019-01-25 01:27:31 +03:00
Andrey Smirnov f104e53fd4 Ignore 'NoSuchBucket' error when deleting S3 objects
Also ignore any removal errors when `-force-drop` is used.
2019-01-23 18:17:08 +03:00
William Manley fd99ae0e59 Merge branch 'master' into deterministic-stanza-WriteTo 2019-01-21 13:48:07 +00:00
Andrey Smirnov 3b5840e248 Fix linter list and fix errors discovered by new staticcheck 2019-01-20 00:01:17 +03:00
William Manley 86dc10028f Stanza.WriteTo: Sort extra fields alphabetically
This makes the output deterministic.  This is important to me as I am
using `Packages` index files as a kind of lockfile and committing it
to my git repository.  Without this we get a lot of noise in the diff
whenever the file is regenerated because
[go randomises map iteration order][1].

[1]: https://nathanleclaire.com/blog/2014/04/27/a-surprising-feature-of-golang-that-colored-me-impressed/
2019-01-08 15:12:34 +00:00
Oliver Sauder e23e30eb44 Merge branch 'master' into with_installer 2018-09-21 13:26:15 +02:00
Andrey Smirnov 699323e2e0 Reimplement DB collections for mirrors, repos and snapshots
See #765, #761

Collections were relying on keeping in-memory list of all the objects
for any kind of operation which doesn't scale well the number of
objects in the database.

With this rewrite, objects are loaded only on demand which might
be pessimization in some edge cases but should improve performance
and memory footprint signifcantly.
2018-08-21 01:08:14 +03:00
Andrey Smirnov de38011dd2 Add simple benchmark for SnapshotCollection.ForEach() 2018-08-14 00:56:15 +03:00
Andrey Smirnov 0f4bbc4752 Implement lazy iteration (ForEach) over collections
See #761

aptly had a concept of loading small amount of info per each object
into memory once collection is accessed for the first time.

This might have simplified some operations, but it doesn't scale well
with huge aptly databases.

This is just intermediate step towards better memory management -
list of objects is not loaded unless some method is called.
`ForEach` method (mainly used in cleanup) is reimplemented to
iterate over database without ever loading all the objects into memory.

Memory was even worse with previous approach, as for each item usually
`LoadComplete()` is called, which pulls even more data into memory
and item stays in memory till the end of the iteration as it is referenced
from `collection.list`.

For the subsequent PR: reimplement `ByUUID()` and probably other methods
to avoid loading all the items into memory, at least for all the collecitons
except for published repos. When published repository is being loaded, it
might pull source local repo which in turn would trigger loading for all the
local repos which is not acceptable.
2018-08-04 00:26:02 +03:00
Andrey Smirnov 747b9752ce Keep checksum of not compressed index file even if it's not uploaded
Fixes: #756
2018-07-14 00:17:36 +03:00
Oliver Sauder b1a2523ef0 Add unit test for remote and http 2018-07-06 15:02:37 +02:00
Oliver Sauder b7323db31b Add detached signature to installer hashsum file 2018-07-06 15:02:37 +02:00
Oliver Sauder 0075ead526 Simplify package function signature LinkFromPool 2018-07-06 15:02:37 +02:00
Oliver Sauder 6df4a746f1 Clarify doc strings 2018-07-06 15:02:37 +02:00
Oliver Sauder 108b0ea226 Add support to mirror non package installer files 2018-07-06 15:02:37 +02:00
aviau 7dfc12d138 switch to packaged lzma package 2018-06-22 12:44:23 -04:00
aviau 814ac6c28c dep: use official uuid package 2018-06-21 16:12:45 -04:00
Oliver Sauder 9509629bcf Add changes test to increase coverage 2018-06-19 15:40:38 +02:00
Oliver Sauder f1882cfe2c Expose repo include through API 2018-06-19 15:39:09 +02:00
Strajan Sebastian Ioan d31144b9ae Buffer increase (#738)
Increase Scanner buffer size for Stanza reader
2018-05-14 17:41:33 +03:00
Andrey Smirnov b8c5303fdb Fix paths after repository transfer to aptly-dev 2018-04-18 21:19:43 +03:00
Andrey Smirnov 5b85522400 Implement 'legacy' Contents indexes to match Ubuntu <=16.04
Another index is created which unifies data for all the components.

This certainly requires more resources as we have to build yet another
index.
2018-04-11 00:57:15 +03:00
Harald Sitter 9125745416 batch updates to the temporary db when publishing
updates with contents generation were super syscall-heavy. for each path
in a package (so at least 2-4, but ordinarily >4) we'd do a db.Put in
ContentsIndex which results in one syscall.Write. so, for every package in
a published repo we'd have to do *at least* 2 but ordinarily >4 syscalls.
this gets abysmally slow very quickly depending on the available system
specs.

instead, start a batch inside each package and finish it when we are done
with the package. this should keep the memory footprint negligible, but
reduce the write() calls from N to 1.

on one of KDE's servers I have seen update publishing of 7600 packages go
from ~28s to ~9s when using batch putting on an HDD.
on my local system the same set of packages go from ~14s to ~6s on an SSD.
(all inodes in cache in both cases)
2018-02-26 16:19:15 +01:00
Harald Sitter 00bb0ca8f3 fix a serious file leak in the by-index publishing
the logic here was wrong.
if we managed to find the link target (the physical index file) pointed to
by our old symlink we want to remove it (this is basically "cleaning up old
index" logic).
previously we'd try to only delete it when the ReadLink came back with
error. which had two serious issues with it:

a) linkTarget was empty, so we basically called Remove("") which would
   delete the storage -> root <- directory if the root is a symlink!
b) we'd leak old indexes as the cleanup logic only ran if there was en
   error which would ordinarily never be

new code correctly cleans up unless there was an error.

this relates to a previous bugfix of readLink which incorrectly returned
absolute paths ultimately rendering the Remove call also broken.
2018-02-19 17:32:06 +01:00
Harald Sitter 2d0baef3b1 make code less repetitive and more readable
by using the power of variables!
2018-02-19 17:22:36 +01:00
Harald Sitter 75c4d6da3b properly expose AcquireByHash through the api
- new publish calls can now enable AcquireByHash by right away (previously
  one would have had to create a new publishing endpoint and then
  explicitly switch it to AcquireByHash)
- all json marshals of PublishedRepo now contain AcquireByHash (allows
  inspecting if a given endpoint has AcquireByHash enabled already; also
  enables verification that a switch/update actually applied a
  potential AcquireByHash change
- update all tests to reflect that default state of AcquireByHash
- update creation and switch testing to explicitly toggle AcquireByHash to
  make sure state mutation works as expected
2018-01-15 17:04:05 +01:00
Andrey Smirnov 9cb2a302f8 Merge pull request #683 from smira/545-download-contxt
Use Go context to abort gracefully mirror updates
2017-12-01 00:27:26 +03:00
Andrey Smirnov d836334767 Merge pull request #682 from tirolerstefan/remove-buildinfo
#679: added *.buildinfo file to processedFile list (will be removed)
2017-12-01 00:23:49 +03:00
Oliver Sauder b2bf4f7884 Adjust FileExists to differentiate between error and actual file existence 2017-11-30 09:46:02 +01:00
Oliver Sauder 3efa1052fa Implement FileExists in files storage as simple stat to improve performance 2017-11-30 09:46:02 +01:00
Oliver Sauder 2e488608ca Simplify packaging indexing by hash and stop when there is an error 2017-11-30 09:46:02 +01:00
Oliver Sauder d6b4b795a5 Fix linting errors 2017-11-30 09:46:02 +01:00
Oliver Sauder 092a7ed8f3 Rename AccessByHash to AcquireByHash for consistency with other flags 2017-11-30 09:46:02 +01:00
Oliver Sauder 7498fd8fc8 Extend s3 storage with link and file exists methods 2017-11-30 09:46:02 +01:00
André Roth e07912770e Extend PublishedStorage interface for Acquire-By-Hash
Signed-off-by: André Roth <neolynx@gmail.com>
2017-11-30 09:46:02 +01:00
André Roth bb2db7e500 Support Acquire-By-Hash for index files
The added "aptly publish repo" option "-access-by-hash" publishes
the index files (Packages*, Sources*) also as hardlinked hashes.
Example:
 /dists/yakkety/main/binary-amd64/by-hash/SHA512/31833ec39acc...
The Release files indicate this with the option "Acquire-By-Hash: yes"

This is used by apt >= 1.2.0 and prevents the "Hash sum mismatch" race
condition between a server side "aptly publish repo" and "apt-get update"
on a client.
See: http://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html

This implementation uses symlinks in the by-hash/*/ directory for keeping
only two versions of the index files and deleting older files
automatically.

Note: this only works with aptly.FileSystemPublishedStorage

Closes: #536

Signed-off-by: André Roth <neolynx@gmail.com>
2017-11-30 09:46:02 +01:00
Stefan c94e048198 Merge branch 'master' into remove-buildinfo 2017-11-30 06:34:50 +01:00
Stefan Felkel 3b4c06d28d gofmt 2017-11-30 06:31:50 +01:00
Andrey Smirnov 15618c8ea8 Use Go context to abort gracefully mirror updates
There are two fixes here:

1. Abort package download immediately as ^C is pressed.
2. Import all the already downloaded files into package pool,
so that next time mirror is updated, aptly won't download them
once again.
2017-11-30 00:49:37 +03:00
Oliver Sauder 5d301fb1b7 Prepare archive root when editing it 2017-11-27 11:08:31 +01:00
Stefan Felkel 8a4d866810 #679: added *.buildinfo file to processedFile list (will be removed, afterwards) 2017-11-24 14:23:26 +01:00
Felix Abecassis e682639b20 Handle SHA512 in Release files
Fix: #656
2017-11-08 11:54:19 +03:00
Harald Sitter 1885cbd6a2 make deb reader handle new control.tar options introduced in dpkg 1.17.6
newly supported is uncompressed control.tar and xz compressed
control.tar.xz. latter is used by ubuntu for dbgsym ddebs.

Fixes #655
2017-10-31 14:43:00 +01:00
Harald Sitter 46c2182ade fix linting by using new maligned linter instead of aligncheck
upstream switched the alignment check backend and in doing so fails to run
if the old backend is defined in the config.

also skip alignment linting on a struct we use for byte decoding as we have
no choice in its member order.
2017-10-31 12:24:31 +01:00
Andrey Smirnov 0d94f29c27 Allow using files from the pool while importing source packages
Sometimes source packages reference files already present in the pool.

Allow for those file to be omitted when importing packages either via
`repo add` or `repo include`. If file is missing, aptly would make
an attempt to look up file in the package pool (by checksum) and
use it.

Fixes: #278
2017-09-29 22:39:51 +03:00