This fixes the race condition that happens when you call publish
concurrently. It adds a valuable test that reproduces the error almost
deterministically, it's hard to say always but I have run this in loop
100 times and it reproduces the error consistently without the patch and
after the patch it works consistently.
Several sections of the code *required* a LocalPackagePool, but they
could still perform their operations with a standard PackagePool.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
In some local tests w/ a slowed down filesystem, this massively cut down
on the time to clean up a repository by ~3x, bringing a total 'publish
update' time from ~16s to ~13s.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
When a publishing uses a publish prefix, instead of listing the contents
of the whole bucket under the storage prefix, only list the contents of
the bucket under the storage prefix and publish prefix, and cache it by
publish prefix.
This speeds up publish operations under a prefix.
instead of caching the whole s3 bucket, cache only the pool path. this
requires an additional parameter, and since this is an interface, all
implementations need to follow. might help in other backends too.
closes#1181
presently there is no use case where we need this. on the other hand,
passing empty paths into any of the remove methods is indicative of a bug.
this is particularly dangerous as this can temporarily smash the publish
root but later restore it again when actually publishing. this makes
for super nasty and hard to track down problems.
to guard against this simply disallow root dir removal using empty
strings. should we find a use case for this in the future we can always
revisit this (FTR: I think very explicitly API should be used so everyone
knows what is going on and you can't accidentally run it)
previously it'd return an absolute path which makes the path absolutely
useless as all other functions of PublishedStorage need relative input
and will prepend them with the rootPath, so getting an absolute ReadLink
and then trying to remove that'd would ultimately try to remove the
absolute path `$root/AbsoluteRoot/LinkTarget` instead of `$root/LinkTarget`
add a unit test to actually verify readlink
Local package pool now implements more generic package pool API.
The base idea is to never expose full paths to files, so that other
kinds of package pools (e.g. package pool in S3) could be used to implement
the same interface.
Files get into the pool only using `Import` method. `Import` method is
now more smart, it supports moving files into the pool, it can detect if
files reside on the same filesystem and use hardlinking instead of copying.
This will make direct mirror downloads still as fast as they were with previous
version which was performing download directly to package pool.
New package pool doesn't have two things implemented yet:
1. New file placement according to SHA256 or other configured hash
2. Calculate at least SHA256/MD5 for each imported files.
MD5 would be required for S3/Swift publishing
This is related to #506
As a first step, don't pass MD5 explicitly, pass checksum info object,
so that as a next step we can choose which hash to use.
There should be no functional changes so far.
Next step: stop returning explicit paths from public package pool.