Commit Graph

34 Commits

Author SHA1 Message Date
Ryan Gonzalez
19a705f80d Split reflists to share their contents across snapshots
In current aptly, each repository and snapshot has its own reflist in
the database. This brings a few problems with it:

- Given sufficiently large repositories and snapshots, these lists can
  get enormous, reaching >1MB. This is a problem for LevelDB's overall
  performance, as it tends to prefer values around the configured block
  size (which defaults to just 4KiB).
- When you snapshot these large repositories, you get a full, new copy
  of the reflist, even if only a few packages changed. This means that
  having a lot of snapshots with few changes between them fills the
  database with largely duplicate reflists.
- All the duplication also means that many of the same refs are being
  loaded repeatedly, which can cause some slowdown but, more notably,
  eats up huge amounts of memory.
- Adding more and more new repositories and snapshots will cause the
  time and memory spent on things like cleanup and publishing to grow
  roughly linearly.

At the core, there are two problems here:

- Reflists get very big because there are just a lot of packages.
- Different reflists can tend to duplicate much of the same contents.

*Split reflists* aim to solve this by separating reflists into 64
*buckets*. Package refs are sorted into individual buckets according to
the following system (see the sketch after this list):

- Take the first 3 letters of the package name, after dropping a `lib`
  prefix. (Using only the first 3 letters will cause packages with
  similar prefixes to end up in the same bucket, under the assumption
  that packages with similar names tend to be updated together.)
- Take the 64-bit xxhash of these letters. (xxhash was chosen because it
  has relatively good distribution across the individual bits, which is
  important for the next step.)
- Use the first 6 bits of the hash (range [0:63]) as an index into the
  buckets.
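
A minimal Go sketch of this bucketing scheme, assuming the
`github.com/cespare/xxhash/v2` package and taking "first 6 bits" to
mean the most significant bits (both are assumptions of the sketch,
not necessarily what aptly's implementation does):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/cespare/xxhash/v2" // assumed xxhash implementation
)

// bucketIndex maps a package name to one of 64 buckets: drop a "lib"
// prefix, keep the first 3 characters, hash them with 64-bit xxhash,
// and use the top 6 bits of the hash as the bucket index.
func bucketIndex(pkg string) uint64 {
	name := strings.TrimPrefix(pkg, "lib")
	if len(name) > 3 {
		name = name[:3]
	}
	return xxhash.Sum64String(name) >> 58 // 64 - 6 = 58, range [0:63]
}

func main() {
	// "libssl3" and "libssl-dev" both reduce to "ssl", so they land
	// in the same bucket and are likely to change together.
	fmt.Println(bucketIndex("libssl3"), bucketIndex("libssl-dev"))
}
```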

Once refs are placed in buckets, a sha256 digest of all the refs in
each bucket is taken. These buckets are then stored in the database,
split into roughly block-sized segments, and all the repositories and
snapshots simply store an array of bucket digests.
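
A sketch of the digest step; the exact byte layout aptly hashes and
the package and type names here are assumptions:

```go
package reflist // hypothetical package name for this sketch

import "crypto/sha256"

// bucketDigest computes a sha256 over all the refs in one bucket; the
// digest then identifies the bucket in the database. The separator
// byte between refs is an assumption of this sketch.
func bucketDigest(refs [][]byte) []byte {
	h := sha256.New()
	for _, ref := range refs {
		h.Write(ref)
		h.Write([]byte{0})
	}
	return h.Sum(nil)
}

// A split reflist is then just an array of 64 bucket digests; each
// repository and snapshot stores these digests rather than the full
// list of refs inline.
type SplitRefList struct {
	BucketDigests [64][]byte
}
```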

This approach means that *repositories and snapshots can share their
reflist buckets*. If a snapshot is taken of a repository, it will have
the same contents, so its split reflist will point to the same buckets
as the base repository, and only one copy of each bucket is stored in
the database. When some packages in the repository change, only the
buckets containing those packages will be modified; all the other
buckets will remain unchanged, and thus their contents will still be
shared. Later on, when these reflists are loaded, each bucket is only
loaded once, avoiding re-loading many megabytes of data. In effect,
split reflists are essentially copy-on-write, with only the changed
buckets stored individually.
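
The copy-on-write behaviour at store time could look roughly like this
sketch, where the map stands in for LevelDB and the key scheme is made
up for illustration:

```go
// storeBucket writes a bucket only when no bucket with the same digest
// already exists, so snapshotting an unchanged repository adds no new
// bucket data and the buckets stay shared.
func storeBucket(db map[string][]byte, digest, contents []byte) {
	key := "reflist-bucket/" + string(digest) // hypothetical key scheme
	if _, exists := db[key]; !exists {
		db[key] = contents
	}
}
```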

Changing the disk format means that a migration needs to take place, so
that task is moved into the database cleanup step, which will migrate
reflists over to split reflists, as well as delete any unused reflist
buckets.

All the reflist tests are also changed to additionally test out split
reflists; although the internal logic is all shared (since buckets are,
themselves, just normal reflists), some special additions are needed to
have native versions of the various reflist helper methods.

In our tests, we've observed the following improvements:

- Memory usage during publish and database cleanup, with
  `GOMEMLIMIT=2GiB`, goes down from ~3.2GiB (larger than the memory
  limit!) to ~0.7GiB, a decrease of ~4.5x.
- Database size decreases from 1.3GB to 367MB.

*In my local tests*, publish times also decreased to mere seconds, but
the same effect wasn't observed on the server, where the times stayed
around the same. My suspicion is that this is due to I/O performance:
my local system is an M1 MBP, which almost certainly has much faster
disk speeds than our DigitalOcean block volumes. A side effect of
split reflists is that they require more random accesses to read all
the buckets by their keys, so if your random I/O performance is
slower, that might cancel out the benefits. That being said, even in
that case, the memory usage and database size advantages still
persist.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
2025-02-15 23:49:21 +01:00
André Roth
93650efddb Merge pull request #1404 from schoenherrg/fix/with-sources-ignored
Fix `-with-sources` not fetching differently named source packages
2024-12-11 13:01:30 +01:00
André Roth
e319f3cd14 update doc
make descriptions consistent
2024-12-11 11:19:46 +01:00
André Roth
c6e0a06b14 swagger: cleanup 2024-12-11 10:40:44 +01:00
André Roth
ba86851d07 add api documentation stubs 2024-12-11 10:40:43 +01:00
Gordian Schoenherr
3b785e4165 Refactor Filter options into a struct
This method already had a lot of options, and I am going to add
another one in the next commit.
2024-12-09 13:17:41 +09:00
André Roth
37a9fbe530 api: fix OOM with sync tasks
Since sync API calls also use tasks internally, this led to out-of-memory errors because aptly never removed them.
2024-08-03 14:36:04 +02:00
hudeng
ecc88e7a40 feat: add 'maximumVersion' query parameter support to the repo and snapshot packages filter API
example: `curl http://localhost:8080/api/repos/test/packages\?maximumVersion\=1`

Change-Id: Ie9ffd36146bf017bbb353737f32360f7b73d6b0a
2024-06-24 17:44:40 +02:00
André Roth
3e1485faf5 queue sync calls 2024-06-15 19:18:14 +02:00
André Roth
e9bdb983c8 tasks: improve log level 2024-06-15 16:15:23 +02:00
Markus Muellner
8e62195eb5 implement structured logging 2023-02-20 13:42:50 +01:00
Markus Muellner
ecc41f0c0f replace AbortWithError calls by custom function that sets the content type correctly 2023-01-23 10:42:57 +01:00
Markus Muellner
2020ca9971 add ready and healthy probe endpoints 2022-12-12 13:39:07 +01:00
Lorenzo Bolla
b281819cba Make truthy function less surprising 2022-01-27 09:30:14 +01:00
Lorenzo Bolla
ff51c46915 More informative return value for task.Process 2022-01-27 09:30:14 +01:00
Lorenzo Bolla
0914cd16af Use global async flag as fallback on per-request flag
This way, if no per-request flag is specified, the globally configured default
is used.
2022-01-27 09:30:14 +01:00
Lorenzo Bolla
9b28d8984f Configurable background task execution 2022-01-27 09:30:14 +01:00
Oliver Sauder
6ab5e60833 Add task api and resource locking ability 2022-01-27 09:30:14 +01:00
Oliver Sauder
208a2151c1 every goroutine needs to have its own collection factory
This is needed so that concurrent reads and writes are possible.
2022-01-27 09:30:14 +01:00
Andrey Smirnov
b8c5303fdb Fix paths after repository transfer to aptly-dev 2018-04-18 21:19:43 +03:00
Andrey Smirnov
3756db2491 Upgrade gin-gonic to latest master, fix compatibility issues 2017-09-28 00:33:59 +03:00
Andrey Smirnov
211ac0501f Rework the way database is open/re-open in aptly
Allow database to be initialized without opening, unify all the
open paths to retry on failure.

In the API router, make sure open requests are matched with acks in an
explicit way.

This also enables re-open attempts in all the aptly commands, so it
should hopefully make running the aptly CLI much easier now.

Fix up system tests for oldoldstable ;)
2017-07-05 00:17:48 +03:00
Andrey Smirnov
11d828b3b1 Add govet/golint into Travis CI build
Fix current issues
2017-03-22 21:49:16 +03:00
Andrey Smirnov
18d04c7977 Fix failure not being reported from API. #290 2016-03-01 12:52:54 +03:00
Andrey Smirnov
d6c7a9a89c Flush collection contents on each DB unlock in API.
See #343
2016-02-13 13:36:35 +03:00
Vincent Bernat
7f6a52019f Add a flag to unlock database after each API request
After the first API request, the database remained locked for as long
as the API server was running. This prevented a user from also using
the command-line client. This commit adds a new flag `-no-lock` that
will close the database after each API request.

Closes #234
2015-10-02 20:04:48 +02:00
Vincent Bernat
16101b56fe Fix lock handling in cache flusher for API
Unlocking the different elements in the cache flusher was deferred to
the end of the function. Unfortunately, since the function is a for
loop wrapped in a goroutine, the deferred calls were never executed.
2015-10-02 19:59:47 +02:00
Andrey Smirnov
c737b8c544 Flush CollectionFactory every 15 minutes. #116 2015-02-16 00:46:31 +03:00
Andrey Smirnov
76ee53e9f8 Eliminate data races by using API without Progress. #116 2015-02-16 00:32:45 +03:00
Andrey Smirnov
9250479846 Extract common part of show and search packages from snapshots and repos. #168 2015-01-24 22:23:16 +03:00
Andrey Smirnov
d489694ea9 Refactoring: simplify version generation. Rename API to /api/version. #167 2015-01-13 18:47:41 +03:00
Sylvain Baubeau
6c7f3b3bbd Add /api route to show API version #116 2015-01-12 10:56:54 +01:00
Andrey Smirnov
93e8e18ca6 Document lock order acquisition. #116 [ci skip] 2014-12-23 00:59:29 +03:00
Andrey Smirnov
10056b8571 Add first /repos/ API, command api serve. #116 2014-10-08 16:19:15 +04:00