Commit Graph

48 Commits

Author SHA1 Message Date
Ryan Gonzalez 19a705f80d Split reflists to share their contents across snapshots
In current aptly, each repository and snapshot has its own reflist in
the database. This brings a few problems with it:

- Given a sufficiently large repositories and snapshots, these lists can
  get enormous, reaching >1MB. This is a problem for LevelDB's overall
  performance, as it tends to prefer values around the confiruged block
  size (defaults to just 4KiB).
- When you take these large repositories and snapshot them, you have a
  full, new copy of the reflist, even if only a few packages changed.
  This means that having a lot of snapshots with a few changes causes
  the database to basically be full of largely duplicate reflists.
- All the duplication also means that many of the same refs are being
  loaded repeatedly, which can cause some slowdown but, more notably,
  eats up huge amounts of memory.
- Adding on more and more new repositories and snapshots will cause the
  time and memory spent on things like cleanup and publishing to grow
  roughly linearly.

At the core, there are two problems here:

- Reflists get very big because there are just a lot of packages.
- Different reflists can tend to duplicate much of the same contents.

*Split reflists* aim at solving this by separating reflists into 64
*buckets*. Package refs are sorted into individual buckets according to
the following system:

- Take the first 3 letters of the package name, after dropping a `lib`
  prefix. (Using only the first 3 letters will cause packages with
  similar prefixes to end up in the same bucket, under the assumption
  that packages with similar names tend to be updated together.)
- Take the 64-bit xxhash of these letters. (xxhash was chosen because it
  relatively good distribution across the individual bits, which is
  important for the next step.)
- Use the first 6 bits of the hash (range [0:63]) as an index into the
  buckets.

Once refs are placed in buckets, a sha256 digest of all the refs in the
bucket is taken. These buckets are then stored in the database, split
into roughly block-sized segments, and all the repositories and
snapshots simply store an array of bucket digests.

This approach means that *repositories and snapshots can share their
reflist buckets*. If a snapshot is taken of a repository, it will have
the same contents, so its split reflist will point to the same buckets
as the base repository, and only one copy of each bucket is stored in
the database. When some packages in the repository change, only the
buckets containing those packages will be modified; all the other
buckets will remain unchanged, and thus their contents will still be
shared. Later on, when these reflists are loaded, each bucket is only
loaded once, short-cutting loaded many megabytes of data. In effect,
split reflists are essentially copy-on-write, with only the changed
buckets stored individually.

Changing the disk format means that a migration needs to take place, so
that task is moved into the database cleanup step, which will migrate
reflists over to split reflists, as well as delete any unused reflist
buckets.

All the reflist tests are also changed to additionally test out split
reflists; although the internal logic is all shared (since buckets are,
themselves, just normal reflists), some special additions are needed to
have native versions of the various reflist helper methods.

In our tests, we've observed the following improvements:

- Memory usage during publish and database cleanup, with
  `GOMEMLIMIT=2GiB`, goes down from ~3.2GiB (larger than the memory
  limit!) to ~0.7GiB, a decrease of ~4.5x.
- Database size decreases from 1.3GB to 367MB.

*In my local tests*, publish times had also decreased down to mere
seconds but the same effect wasn't observed on the server, with the
times staying around the same. My suspicions are that this is due to I/O
performance: my local system is an M1 MBP, which almost certainly has
much faster disk speeds than our DigitalOcean block volumes. Split
reflists include a side effect of requiring more random accesses from
reading all the buckets by their keys, so if your random I/O
performance is slower, it might cancel out the benefits. That being
said, even in that case, the memory usage and database size advantages
still persist.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2025-02-15 23:49:21 +01:00
Mauro Regli 1357d246d8 rename addon files to skel files 2024-11-17 14:09:37 +01:00
Mauro Regli c75c2c7594 pass down addonpath from api and cmd context 2024-11-17 14:09:37 +01:00
Christoph Fiehe bd64232eb6 Allow management of components
This commit allows to add, remove and update components of published repositories without the need to recreate them.

Signed-off-by: Christoph Fiehe <c.fiehe@eurodata.de>
2024-10-22 16:58:15 +02:00
André Roth 861260198a publish: persist multidist flag 2024-10-08 22:28:12 +02:00
Noa Resare b4cd86aa14 Introduce option multi-dist to the publish commands
This change makes it possible to publish multiple distributions
with packages named the same but with different content by changing
structure of the generated pool hierarchy. The option not enabled
by default as this changes the structure of the output which could
break the expectations of other tools.
2024-06-15 11:27:26 +02:00
André Roth e61a4dd53c fix golangci-lint errors 2024-02-06 20:49:35 +01:00
Steven Stone a59cad6f20 Enable the ability to pass in a custom codename
While testing out Aptly, the `apt-get` client complains with the following error, since the `codename` was switched from the InRelease files that are baked out by Aptly:

```
E: Repository 'http://debianrepo.example.com/bionic testing InRelease' changed its 'Codename' value from '' to 'testing'
```
2022-08-29 15:54:29 +02:00
Sjoerd Simons f61514edaf Allow disabling bzip2 compression for index files
Using bzip2 generates smaller index files (roughly 20% smaller Packages
files) but it comes with a big performance penalty.  When publishing a
debian mirror snapshot (amd64, arm64, armhf, source) without contents
skipping bzip speeds things up around 1.8 times.

```
$ hyperfine -w 1 -L skip-bz2 true,false  -m 3 -p "aptly -config aptly.conf publish drop bullseye || true" "aptly -config aptly.conf  publish snapshot  --skip-bz2={skip-bz2} --skip-contents --skip-signing bullseye"
Benchmark 1: aptly -config aptly.conf  publish snapshot  --skip-bz2=true --skip-contents --skip-signing bullseye
  Time (mean ± σ):     35.567 s ±  0.307 s    [User: 39.366 s, System: 10.075 s]
  Range (min … max):   35.311 s … 35.907 s    3 runs

Benchmark 2: aptly -config aptly.conf  publish snapshot  --skip-bz2=false --skip-contents --skip-signing bullseye
  Time (mean ± σ):     64.740 s ±  0.135 s    [User: 68.565 s, System: 10.129 s]
  Range (min … max):   64.596 s … 64.862 s    3 runs

Summary
  'aptly -config aptly.conf  publish snapshot  --skip-bz2=true --skip-contents --skip-signing bullseye' ran
    1.82 ± 0.02 times faster than 'aptly -config aptly.conf  publish snapshot  --skip-bz2=false --skip-contents --skip-signing bullseye'
```

Allow skipping bz2 creation for setups where faster publishing is more
important then Package file size.

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
2022-06-22 11:25:45 +02:00
Oliver Sauder 208a2151c1 every go routine needs to have its own collection factory
this is needed so concurrent reads and writes are possible.
2022-01-27 09:30:14 +01:00
Raúl Benencia ae61cbb4c0 Allow definition of custom Suite 2019-09-06 23:42:56 +03:00
Stephan Eicher aa02c5cbe9 Fix #827 - passhprase typos 2019-09-02 23:26:37 +03:00
Andrey Smirnov b8c5303fdb Fix paths after repository transfer to aptly-dev 2018-04-18 21:19:43 +03:00
Oliver Sauder 092a7ed8f3 Rename AccessByHash to AcquireByHash for consistency with other flags 2017-11-30 09:46:02 +01:00
André Roth bb2db7e500 Support Acquire-By-Hash for index files
The added "aptly publish repo" option "-access-by-hash" publishes
the index files (Packages*, Sources*) also as hardlinked hashes.
Example:
 /dists/yakkety/main/binary-amd64/by-hash/SHA512/31833ec39acc...
The Release files indicate this with the option "Acquire-By-Hash: yes"

This is used by apt >= 1.2.0 and prevents the "Hash sum mismatch" race
condition between a server side "aptly publish repo" and "apt-get update"
on a client.
See: http://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html

This implementation uses symlinks in the by-hash/*/ directory for keeping
only two versions of the index files and deleting older files
automatically.

Note: this only works with aptly.FileSystemPublishedStorage

Closes: #536

Signed-off-by: André Roth <neolynx@gmail.com>
2017-11-30 09:46:02 +01:00
Oliver Sauder e3f1880ad4 Added support for NotAutomatic, ButAutomaticUpgrades and Origin fields 2017-07-05 15:08:02 +02:00
Andrey Smirnov 470165a419 Enable goconst & interfacer linters 2017-05-17 00:53:10 +03:00
Clemens Rabe 25f9c29f00 Implemented filesystem endpoint with support for hardlinks, symlinks and copy. 2017-04-13 20:25:40 +02:00
Andrey Smirnov 516dd7b044 Switch to gometalinter
Only small amount of required checks is enabled,
plan is to enable more linters as issues are fixed in the code.
2017-03-23 01:51:08 +03:00
Andrey Smirnov f50e008763 Make 'SkipContents' configurable in API. #345
Also add global configuration to disable 'skipContents' by
default for all new published repos/snapshots.
2016-02-14 14:49:16 +03:00
Andrey Smirnov 933b019f71 Fix -skip-contents + system tests. #142 2015-04-05 21:55:41 +03:00
Andrey Smirnov 6293ca3206 Add -skip-contents flag. #142 2015-04-05 21:27:35 +03:00
Andrey Smirnov d586f31247 Move ParsePrefix into common code. #116 2014-12-23 00:50:28 +03:00
Andrey Smirnov a85aa11ecd Update flag description/include it everywhere. #122 2014-10-10 17:50:08 +04:00
Andrey Smirnov 8a787d2c35 Refactor by separating AptlyContext into separate package. #116 2014-10-06 21:54:15 +04:00
Andrey Smirnov 97158ef37b Support for --passphrase & --passphrase-file arguments on publishing. #94 2014-09-01 15:13:07 +04:00
Andrey Smirnov bb6593d21e Add -force-overwrite flag to publish update, switch, snapshot and repo commands. #90
Includes new and updated system tests.
2014-08-05 17:01:18 +04:00
Andrey Smirnov d558791070 Add -force-overwrite command flag. #90 2014-08-05 15:47:38 +04:00
Andrey Smirnov bf91744078 <endpoint> in command usage. #15 2014-07-28 15:03:55 +04:00
Andrey Smirnov 915b0d1697 Integrate PublishedRepos with storages & context. #15 2014-07-21 17:43:12 +04:00
Andrey Smirnov 96e878a2e0 Separate out LocalPublishedStorage interface. #15 2014-07-18 17:44:54 +04:00
Andrey Smirnov 2dae9b01a1 Grammar fix. #36 2014-06-06 02:09:00 +04:00
Andrey Smirnov 9a34b4ff1f Update commands to handle multiple components repositories. #36 2014-06-04 17:43:16 +04:00
Andrey Smirnov 7192049c16 Update to new PublishedRepo with multiple components. #36
Multiple component publishing doesn't work yet, but old features are working.
2014-06-03 17:09:00 +04:00
Andrey Smirnov e1dbab6988 Allow publishing of empty snapshots and local repos. #55 2014-05-31 21:13:30 +04:00
Andrey Smirnov 05a42f4cba aptly exits with 2 on command/flag parse error. #52 2014-05-16 00:22:51 +04:00
Andrey Smirnov b85f46547b Allow to customize Origin/Label during publishing. #29 2014-04-15 11:47:21 +04:00
Andrey Smirnov 470571c7db Fix variable shadowing. 2014-04-08 11:53:02 +04:00
Andrey Smirnov ff045f9a48 Fixups after renaming debian -> deb. #21 2014-04-07 21:22:58 +04:00
Andrey Smirnov 2c3553ef0b Major refactoring: access to context happens in methods. #13 2014-04-05 16:10:51 +04:00
Andrey Smirnov f648c9547c Support for switching to smira/commander with free placement of flags. #17 2014-04-03 00:16:18 +04:00
Andrey Smirnov d84226a054 Switch to own fork of commander/flag. 2014-03-28 23:05:54 +04:00
Andrey Smirnov 37ea845fd9 Command aptly publish repo. #10 2014-03-25 18:42:56 +04:00
Andrey Smirnov 32717e92ba First round of support for localRepos as source for publishing. Also more intelligent algo to get publishing defaults. #10 #12 2014-03-19 16:43:42 +04:00
Andrey Smirnov 4c81f0f52a Update integrated help. 2014-03-10 19:42:27 +04:00
Andrey Smirnov c55733fc05 Progress during publishing. 2014-03-07 17:24:45 +04:00
Andrey Smirnov 91a45a2fde Display deb-src line if snapshot contains sources. 2014-02-27 21:10:52 +04:00
Andrey Smirnov e72f178ca9 Split publish into subcommands. 2014-02-19 15:12:16 +04:00