Commit Graph

2547 Commits

Author SHA1 Message Date
André Roth
2a07494910 fix unit tests 2025-02-15 23:49:21 +01:00
André Roth
174bdc2b5e fix golangci-lint errors 2025-02-15 23:49:21 +01:00
André Roth
1a3346a8fa fix golangci-lint error 2025-02-15 23:49:21 +01:00
Ryan Gonzalez
19a705f80d Split reflists to share their contents across snapshots
In current aptly, each repository and snapshot has its own reflist in
the database. This brings a few problems with it:

- Given sufficiently large repositories and snapshots, these lists can
  get enormous, reaching >1MB. This is a problem for LevelDB's overall
  performance, as it tends to prefer values around the configured block
  size (defaults to just 4KiB).
- When you take these large repositories and snapshot them, you have a
  full, new copy of the reflist, even if only a few packages changed.
  This means that having a lot of snapshots with a few changes causes
  the database to basically be full of largely duplicate reflists.
- All the duplication also means that many of the same refs are being
  loaded repeatedly, which can cause some slowdown but, more notably,
  eats up huge amounts of memory.
- Adding on more and more new repositories and snapshots will cause the
  time and memory spent on things like cleanup and publishing to grow
  roughly linearly.

At the core, there are two problems here:

- Reflists get very big because there are just a lot of packages.
- Different reflists can tend to duplicate much of the same contents.

*Split reflists* aim to solve this by separating reflists into 64
*buckets*. Package refs are sorted into individual buckets according to
the following system:

- Take the first 3 letters of the package name, after dropping a `lib`
  prefix. (Using only the first 3 letters will cause packages with
  similar prefixes to end up in the same bucket, under the assumption
  that packages with similar names tend to be updated together.)
- Take the 64-bit xxhash of these letters. (xxhash was chosen because it
  has relatively good distribution across the individual bits, which is
  important for the next step.)
- Use the first 6 bits of the hash (range [0:63]) as an index into the
  buckets.
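The bucketing rule above can be sketched in Go as follows. This is a minimal, self-contained illustration, not aptly's code: it inlines the short-input path of XXH64 (seed 0) so no third-party module is needed, takes a bare package name as input (real reflists store full package refs), and reads "first 6 bits" as the 6 most significant bits of the hash, which is an assumption. The names `xxh64Short`, `bucketPrefix` and `bucketIndex` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"strings"
)

// XXH64 prime constants from the xxHash specification.
const (
	prime1 uint64 = 0x9E3779B185EBCA87
	prime2 uint64 = 0xC2B2AE3D27D4EB4F
	prime3 uint64 = 0x165667B19E3779F9
	prime4 uint64 = 0x85EBCA77C2B2AE63
	prime5 uint64 = 0x27D4EB2F165667C5
)

// bucketCount is 2^6: six hash bits select one of 64 buckets.
const bucketCount = 64

// xxh64Short computes XXH64 with seed 0 for inputs shorter than 32 bytes.
// Longer inputs need the 4-lane stripe loop, which a 3-letter prefix never does.
func xxh64Short(b []byte) uint64 {
	if len(b) >= 32 {
		panic("xxh64Short: input must be shorter than 32 bytes")
	}
	h := prime5 + uint64(len(b))
	for len(b) >= 8 { // consume 8-byte words
		k := binary.LittleEndian.Uint64(b) * prime2
		k = bits.RotateLeft64(k, 31) * prime1
		h ^= k
		h = bits.RotateLeft64(h, 27)*prime1 + prime4
		b = b[8:]
	}
	if len(b) >= 4 { // consume one 4-byte word
		h ^= uint64(binary.LittleEndian.Uint32(b)) * prime1
		h = bits.RotateLeft64(h, 23)*prime2 + prime3
		b = b[4:]
	}
	for _, c := range b { // consume the remaining single bytes
		h ^= uint64(c) * prime5
		h = bits.RotateLeft64(h, 11) * prime1
	}
	// Final avalanche so every input bit affects every output bit.
	h ^= h >> 33
	h *= prime2
	h ^= h >> 29
	h *= prime3
	h ^= h >> 32
	return h
}

// bucketPrefix drops a leading "lib" and keeps at most the first 3 bytes.
func bucketPrefix(name string) string {
	name = strings.TrimPrefix(name, "lib")
	if len(name) > 3 {
		name = name[:3]
	}
	return name
}

// bucketIndex uses the 6 most significant hash bits as the bucket index (0..63).
func bucketIndex(name string) int {
	return int(xxh64Short([]byte(bucketPrefix(name))) >> (64 - 6))
}

func main() {
	for _, name := range []string{"libssl3", "ssl-cert", "openssl", "python3", "python3-yaml"} {
		fmt.Printf("%-14s prefix=%-5q bucket=%d\n", name, bucketPrefix(name), bucketIndex(name))
	}
}
```

With this rule, `libssl3` and `ssl-cert` share the prefix `ssl` and therefore always land in the same bucket, which is the "updated together" grouping the commit message describes.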

Once refs are placed in buckets, a sha256 digest of all the refs in the
bucket is taken. These buckets are then stored in the database, split
into roughly block-sized segments, and all the repositories and
snapshots simply store an array of bucket digests.
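The storage scheme above amounts to a content-addressed store. The Go sketch below is a hypothetical in-memory model, not aptly's database layout: it assumes refs inside a bucket are sorted before hashing (so the digest does not depend on insertion order), assumes a 4KiB segment size, and splits serialized data on raw byte boundaries, which is a simplification. `Store`, `Put`, `SplitRefList` and `segment` are illustrative names, and the bucket indices used in `main` are arbitrary.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

const bucketCount = 64

// segmentSize approximates LevelDB's default 4KiB block size (an assumption).
const segmentSize = 4096

// Digest is the sha256 of a bucket's contents; it doubles as the bucket's key.
type Digest [sha256.Size]byte

// SplitRefList is what a repository or snapshot stores: one digest per bucket.
type SplitRefList [bucketCount]Digest

// Store is a hypothetical content-addressed bucket store (digest -> segments).
type Store map[Digest][][]byte

// sortedCopy returns the refs in sorted order so the digest ignores input order.
func sortedCopy(refs []string) []string {
	out := append([]string(nil), refs...)
	sort.Strings(out)
	return out
}

// bucketDigest hashes the sorted refs, NUL-terminating each so boundaries are unambiguous.
func bucketDigest(sorted []string) Digest {
	h := sha256.New()
	for _, r := range sorted {
		h.Write([]byte(r))
		h.Write([]byte{0})
	}
	var d Digest
	copy(d[:], h.Sum(nil))
	return d
}

// segment splits serialized bucket data into chunks of at most segmentSize bytes.
func segment(data []byte) [][]byte {
	var out [][]byte
	for len(data) > segmentSize {
		out = append(out, data[:segmentSize])
		data = data[segmentSize:]
	}
	return append(out, data)
}

// Put stores any buckets not already present and returns the split reflist.
func (s Store) Put(buckets [bucketCount][]string) SplitRefList {
	var list SplitRefList
	for i, refs := range buckets {
		sorted := sortedCopy(refs)
		d := bucketDigest(sorted)
		if _, ok := s[d]; !ok { // identical buckets are stored only once
			s[d] = segment([]byte(strings.Join(sorted, "\n")))
		}
		list[i] = d
	}
	return list
}

func main() {
	store := Store{}
	var repo [bucketCount][]string
	repo[3] = []string{"curl_8.5.0_amd64", "libcurl4_8.5.0_amd64"}
	repo[17] = []string{"python3_3.12.3_amd64"}

	snapA := store.Put(repo)
	snapB := store.Put(repo) // same contents: nothing new is stored
	repo[17] = []string{"python3_3.12.4_amd64"}
	snapC := store.Put(repo) // only bucket 17 changed

	// prints: 4 true true false
	fmt.Println(len(store), snapA == snapB, snapA[3] == snapC[3], snapA[17] == snapC[17])
}
```

The store ends up with four entries: one shared digest for all empty buckets, the two original non-empty buckets, and the single changed bucket.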

This approach means that *repositories and snapshots can share their
reflist buckets*. If a snapshot is taken of a repository, it will have
the same contents, so its split reflist will point to the same buckets
as the base repository, and only one copy of each bucket is stored in
the database. When some packages in the repository change, only the
buckets containing those packages will be modified; all the other
buckets will remain unchanged, and thus their contents will still be
shared. Later on, when these reflists are loaded, each bucket is only
loaded once, avoiding loading many megabytes of duplicate data. In effect,
split reflists are essentially copy-on-write, with only the changed
buckets stored individually.
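The "each bucket is only loaded once" property can be illustrated with a loader that caches buckets by digest. This is a hypothetical Go sketch, not aptly's loader: digests are plain strings to keep it short, and the backing database is an in-memory map. It loads one base reflist plus ten snapshots that each differ from it in a single bucket.

```go
package main

import "fmt"

// Digest stands in for a bucket's sha256 digest; a string keeps the sketch short.
type Digest string

// SplitRefList holds one digest per bucket (64 in the scheme described above).
type SplitRefList []Digest

// BucketLoader is a hypothetical loader that reads each distinct bucket from the
// backing store at most once, no matter how many reflists reference it.
type BucketLoader struct {
	db    map[Digest][]string
	cache map[Digest][]string
	Reads int
}

func NewBucketLoader(db map[Digest][]string) *BucketLoader {
	return &BucketLoader{db: db, cache: map[Digest][]string{}}
}

// Bucket returns the refs for a digest, hitting the database only on a cache miss.
func (l *BucketLoader) Bucket(d Digest) []string {
	if refs, ok := l.cache[d]; ok {
		return refs
	}
	l.Reads++
	refs := l.db[d]
	l.cache[d] = refs
	return refs
}

// Refs flattens a split reflist into its full list of refs.
func (l *BucketLoader) Refs(list SplitRefList) []string {
	var out []string
	for _, d := range list {
		out = append(out, l.Bucket(d)...)
	}
	return out
}

func main() {
	db := map[Digest][]string{}
	base := make(SplitRefList, 64)
	for i := range base {
		d := Digest(fmt.Sprintf("bucket-%02d-v1", i))
		db[d] = []string{fmt.Sprintf("pkg%02d_1.0", i)}
		base[i] = d
	}
	// Ten snapshots, each differing from the base in exactly one bucket.
	var snapshots []SplitRefList
	for s := 0; s < 10; s++ {
		snap := append(SplitRefList(nil), base...)
		d := Digest(fmt.Sprintf("bucket-%02d-v2-snap%d", s, s))
		db[d] = []string{fmt.Sprintf("pkg%02d_2.%d", s, s)}
		snap[s] = d
		snapshots = append(snapshots, snap)
	}

	loader := NewBucketLoader(db)
	total := 0
	for _, list := range append([]SplitRefList{base}, snapshots...) {
		total += len(loader.Refs(list))
	}
	// prints: refs returned: 704 bucket reads: 74
	fmt.Println("refs returned:", total, "bucket reads:", loader.Reads)
}
```

All 704 refs across the 11 reflists are returned, but only 74 bucket reads reach the database (64 for the base plus one per snapshot), compared with 704 if every reflist carried its own full copy.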

Changing the disk format means that a migration needs to take place, so
that task is moved into the database cleanup step, which will migrate
reflists over to split reflists, as well as delete any unused reflist
buckets.
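The "delete any unused reflist buckets" half of the cleanup step is essentially mark-and-sweep garbage collection over digests. The Go sketch below covers only that half (not the migration of old reflists) and uses a hypothetical in-memory `DB` type; aptly's actual cleanup code and key layout are not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// Digest stands in for a bucket's sha256 digest.
type Digest string

// DB is a hypothetical view of the database: split reflists keyed by
// their owner (repository or snapshot) and buckets keyed by digest.
type DB struct {
	RefLists map[string][]Digest
	Buckets  map[Digest][]string
}

// CleanupBuckets marks every bucket referenced by some reflist, then sweeps
// (deletes) the rest. It returns the deleted digests in sorted order.
func CleanupBuckets(db *DB) []Digest {
	live := make(map[Digest]bool)
	for _, list := range db.RefLists { // mark phase
		for _, d := range list {
			live[d] = true
		}
	}
	var removed []Digest
	for d := range db.Buckets { // sweep phase: collect unreferenced buckets
		if !live[d] {
			removed = append(removed, d)
		}
	}
	for _, d := range removed {
		delete(db.Buckets, d)
	}
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	return removed
}

func main() {
	db := &DB{
		RefLists: map[string][]Digest{
			"repo/main":     {"a1", "b1"},
			"snapshot/2025": {"a1", "b2"}, // shares bucket a1 with the repo
		},
		Buckets: map[Digest][]string{
			"a1": {"curl_8.5.0_amd64"},
			"b1": {"python3_3.12.4_amd64"},
			"b2": {"python3_3.12.3_amd64"},
			"b0": {"python3_3.12.2_amd64"}, // left behind by a dropped snapshot
		},
	}
	// prints: removed: [b0] remaining: 3
	fmt.Println("removed:", CleanupBuckets(db), "remaining:", len(db.Buckets))
}
```

Because buckets are shared, a bucket can only be deleted once no repository or snapshot references it, which is why the mark phase must scan every reflist before anything is removed.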

All the reflist tests are also changed to additionally test out split
reflists; although the internal logic is all shared (since buckets are,
themselves, just normal reflists), some special additions are needed to
have native versions of the various reflist helper methods.

In our tests, we've observed the following improvements:

- Memory usage during publish and database cleanup, with
  `GOMEMLIMIT=2GiB`, goes down from ~3.2GiB (larger than the memory
  limit!) to ~0.7GiB, a decrease of ~4.5x.
- Database size decreases from 1.3GB to 367MB, roughly 3.5x smaller.

*In my local tests*, publish times also decreased to mere seconds, but
the same effect wasn't observed on the server, where times stayed
around the same. I suspect this is due to I/O performance: my local
system is an M1 MBP, which almost certainly has much faster disk speeds
than our DigitalOcean block volumes. Split reflists have the side effect
of requiring more random accesses, since every bucket is read by its
key, so if your random I/O performance is slow, it might cancel out the
benefits. That being said, even in that case, the memory usage and
database size advantages still persist.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2025-02-15 23:49:21 +01:00
Ryan Gonzalez
4be09fd407 Use github.com/saracen/walker for file walk operations
In some local tests w/ a slowed-down filesystem, this cut the time to
clean up a repository by ~3x, bringing the total 'publish update' time
from ~16s to ~13s.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2025-02-15 23:49:21 +01:00
André Roth
ab18da351d ci: add release notes
and update Releasing.md
v1.6.1
2025-02-15 22:25:56 +01:00
André Roth
1abb735bfa Merge pull request #1430 from aptly-dev/release/1.6.1
Release/1.6.1
2025-02-15 19:10:42 +01:00
André Roth
9397d8ab36 add releasing doc 2025-02-15 16:23:53 +01:00
André Roth
82300d6944 update changelog 2025-02-15 16:17:37 +01:00
André Roth
cf3841e35c Merge pull request #1425 from aptly-dev/fix/debian-compliance
postrm: remove aptly-api user and home directory on purge
2025-01-24 00:49:15 +01:00
Sébastien Delafond
1a0bffdc51 postrm: remove aptly-api user and home directory on purge 2025-01-22 21:48:02 +01:00
André Roth
666b5c9700 Merge pull request #1422 from aptly-dev/fix/empty-mirror-snapshot
Allow snapshotting empty mirrors
2025-01-13 12:36:01 +01:00
André Roth
2eabc6045f go mod tidy 2025-01-12 00:05:00 +01:00
André Roth
cc32e79f2a Merge pull request #1423 from mikelolasagasti/google-uuid
Switch to google/uuid module
2025-01-11 23:56:23 +01:00
Mikel Olasagasti Uranga
7074fc8856 Switch to google/uuid module
The currently used github.com/pborman/uuid module hasn't seen any updates in years.

Signed-off-by: Mikel Olasagasti Uranga <mikel@olasagasti.info>
2025-01-11 23:18:50 +01:00
André Roth
a7d85e5905 Merge pull request #1187 from aptly-dev/dependabot/go_modules/github.com/gin-gonic/gin-1.9.1
Bump github.com/gin-gonic/gin from 1.7.7 to 1.9.1
2025-01-11 22:15:59 +01:00
André Roth
cad4233d0d Bump github.com/gin-gonic/gin from 1.7.7 to 1.9.1
Bumps [github.com/gin-gonic/gin](https://github.com/gin-gonic/gin) from 1.7.7 to 1.9.1.
- [Release notes](https://github.com/gin-gonic/gin/releases)
- [Changelog](https://github.com/gin-gonic/gin/blob/master/CHANGELOG.md)
- [Commits](https://github.com/gin-gonic/gin/compare/v1.7.7...v1.9.1)

---
updated-dependencies:
- dependency-name: github.com/gin-gonic/gin
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
# Conflicts:
#	go.mod
#	go.sum
2025-01-11 21:48:14 +01:00
André Roth
9b9894c07d update README 2025-01-11 21:33:40 +01:00
André Roth
8546cf31ce add test: snapshot empty mirror 2025-01-11 20:00:42 +01:00
André Roth
aa0830ff0c Revert "fix empty mirror check"
This reverts commit 09a44ba409.
2025-01-11 19:17:28 +01:00
dependabot[bot]
4076941bd7 Bump golang.org/x/net from 0.28.0 to 0.33.0
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.28.0 to 0.33.0.
- [Commits](https://github.com/golang/net/compare/v0.28.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-11 15:58:10 +01:00
André Roth
4170c9e995 update README 2025-01-11 15:58:10 +01:00
André Roth
a862192bc4 ci: more relaxed aptly upload 2025-01-11 15:58:10 +01:00
André Roth
5a18428666 aptly.conf: fix s3 example 2025-01-11 15:25:53 +01:00
August Feng
0b5a627c84 update goleveldb dependency 2025-01-11 14:35:28 +01:00
André Roth
65820cdf7a update man page 2024-12-24 19:02:38 +01:00
André Roth
2c3a107e00 update changelog 2024-12-24 18:57:40 +01:00
André Roth
e028db585f fix man page 2024-12-21 22:32:50 +01:00
André Roth
d523ca8186 update Makefile PHONY 2024-12-21 22:13:26 +01:00
André Roth
f008f245dc update man page 2024-12-21 21:35:06 +01:00
André Roth
f2f3196368 fix AUTHORS for man page
only US ASCII seems to be supported
2024-12-21 21:34:46 +01:00
Karol Swiderski
29eccc9226 improve doc
add instructions for macos users
2024-12-21 21:29:26 +01:00
André Roth
9abbd74a9f improve doc
do not set default value for FromSnapshot when creating a repo
2024-12-21 20:23:52 +01:00
André Roth
846fe5e08a update changelog 2024-12-21 19:41:59 +01:00
André Roth
da29961052 Revert "debian: do not conflict with gnupg1"
This reverts commit 2f540a8026.
2024-12-21 18:55:49 +01:00
André Roth
e5b8315859 Merge pull request #1411 from schoenherrg/feature/filter-using-file
Feature: Support Reading Filter Expressions from a File
2024-12-21 18:54:44 +01:00
André Roth
c6bb5f76f7 cmd filter: add comment and cleanup 2024-12-21 11:37:15 +01:00
André Roth
fea7acb56e Merge pull request #1407 from aptly-dev/dependabot/go_modules/golang.org/x/crypto-0.31.0
Bump golang.org/x/crypto from 0.26.0 to 0.31.0
2024-12-20 11:29:07 +01:00
Gordian Schoenherr
50d3676847 Update man page 2024-12-20 12:55:56 +09:00
Gordian Schoenherr
8830354027 Extend system tests for @file filter syntax 2024-12-20 10:59:29 +09:00
Gordian Schoenherr
2467674fca Update system tests 2024-12-19 16:05:21 +09:00
Gordian Schoenherr
9691b0f518 Refactor query reading from file, update docs
Add support for @file syntax in more places.
2024-12-19 15:02:10 +09:00
Christof Warlich
005114839a Generalize to read filter from file or stdin. 2024-12-13 11:24:54 +09:00
Christof Warlich
a5d322252a Allow reading package query for -filter option from a file. 2024-12-13 11:24:47 +09:00
dependabot[bot]
b49630d6fc Bump golang.org/x/crypto from 0.26.0 to 0.31.0
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.26.0 to 0.31.0.
- [Commits](https://github.com/golang/crypto/compare/v0.26.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-12 00:24:55 +00:00
André Roth
93650efddb Merge pull request #1404 from schoenherrg/fix/with-sources-ignored
Fix `-with-sources` not fetching differently named source packages
2024-12-11 13:01:30 +01:00
André Roth
d87327835e Merge pull request #1401 from aptly-dev/feature/yaml-config
Feature/yaml config
2024-12-11 12:38:47 +01:00
André Roth
0d90ff96b9 debian: add build dependency for yaml 2024-12-11 12:02:52 +01:00
André Roth
b14595cb2d cleanup makefile 2024-12-11 12:02:52 +01:00
André Roth
e50a5e175f update documentation and man page 2024-12-11 12:02:52 +01:00