Commit 24334d3

New "Caching" doc page (#1615)
With the help of @predat

---------

Signed-off-by: brycegbrazen <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Co-authored-by: Jean-Christophe Morin <[email protected]>
1 parent 6bdba31 commit 24334d3

4 files changed

+262
-182
lines changed

docs/rez_sphinxext.py

Lines changed: 2 additions & 0 deletions
@@ -93,6 +93,8 @@ def convert_rez_config_to_rst() -> list[str]:

     for section in settings:
         rst.append('')
+        rst.append(".. _config-{}:".format(section.replace(' ', '-').lower()))
+        rst.append("")
         rst.append(section)
         rst.append("-" * len(section))
         rst.append('')
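To illustrate what the two added lines produce, here is a minimal sketch that isolates the label-building expression from the hunk above. The helper name ``config_label`` and the sample section titles are assumptions made for illustration; they are not part of ``rez_sphinxext.py``.

```python
def config_label(section: str) -> str:
    """Build the RST label line for a config section title (hypothetical helper)."""
    # Same expression as the added line: spaces become hyphens, then lowercase.
    return ".. _config-{}:".format(section.replace(' ', '-').lower())


print(config_label("Caching"))          # .. _config-caching:
print(config_label("Package Caching"))  # .. _config-package-caching:
```

The first label is the kind of target that a reference such as ``:ref:`caching <config-caching>``` resolves to.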

docs/source/caching.rst

Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
1+
=======
2+
Caching
3+
=======
4+
5+
Resolve Caching
6+
===============
7+
8+
Resolve caching is a feature that caches resolves to a `memcached <https://memcached.org/>`_ database.

Memcached is widely used, easy to deploy (it runs as a single process/executable and needs no
storage), and is very fast because the data resides in memory.

In a studio environment (with many machines), a machine that requests a solve that is already in the
resolve cache simply receives the cached result rather than performing a re-solve. This can significantly
decrease the time it takes to resolve environments: slow solves become almost instantaneous.
16+
17+
Resolve caching has almost no downsides. Issues arise only in rare edge cases where you have to "hack" a
released package into production. Because resolves are cached, you may then receive a different package than
you expect. In that situation, however, it's better to manually invalidate the cache anyway.
20+
21+
Cache contents
22+
--------------
23+
24+
The following information is stored to the memcached server for each solve:

* Solver information about previously cached solves.
* Timestamps of packages seen in previous solves.
* Variant states: information about the state of each variant. For example, in the 'filesystem' repository type,
  the 'state' is the last modified date of the file associated with the variant (perhaps a package.py).
  If the state of any variant has changed from a cached resolve (e.g. if a file has been modified), the cached
  resolve is discarded.
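The variant-state check described above can be pictured with a small toy model. This is only a sketch of the idea, not rez's implementation: the function names ``variant_state`` and ``is_cached_resolve_valid`` are hypothetical, and the "cached resolve" is reduced to a dictionary of recorded file modification times.

```python
import os
import tempfile


def variant_state(path: str) -> float:
    """Return the toy 'state' of a filesystem variant: its file's mtime."""
    return os.stat(path).st_mtime


def is_cached_resolve_valid(cached_states: dict) -> bool:
    """A cached resolve is usable only if no recorded variant state changed."""
    return all(variant_state(p) == state for p, state in cached_states.items())


with tempfile.TemporaryDirectory() as tmp:
    package_py = os.path.join(tmp, "package.py")
    with open(package_py, "w") as f:
        f.write("name = 'foo'\n")
    os.utime(package_py, (1_000_000, 1_000_000))  # pin the mtime for the demo

    # States recorded alongside the (toy) cached resolve.
    cached_states = {package_py: variant_state(package_py)}
    valid_before = is_cached_resolve_valid(cached_states)

    # Simulate someone modifying package.py after the resolve was cached.
    os.utime(package_py, (2_000_000, 2_000_000))
    valid_after = is_cached_resolve_valid(cached_states)

print(valid_before, valid_after)  # True False
```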
31+
32+
Setup
33+
-----
34+
35+
To enable memcached caching, you need to configure the :data:`memcached_uri` config variable.
This variable accepts a list of URIs of your memcached servers, or None. Example with memcached running on
localhost on its default port:

.. code-block:: python

   memcached_uri = ["127.0.0.1:11211"]

This is the only setting you need to configure to enable caching of package definition files (their content
and location) and of resolves in Rez.

Please refer to the :ref:`caching <config-caching>` configuration section for a complete list of settings.
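As a slightly fuller sketch, a ``rezconfig.py`` for a site with more than one memcached server might look like the fragment below. The second hostname is a made-up placeholder, and the three boolean settings are shown only to make the behaviour explicit; check the configuration reference for their actual defaults before relying on them.

```python
# rezconfig.py (sketch): values are placeholders for illustration.

# Memcached servers; the second hostname is a made-up example.
memcached_uri = [
    "127.0.0.1:11211",
    "memcached.example.com:11211",
]

# Cache resolves in memcached (has no effect unless memcached_uri is set).
resolve_caching = True

# Cache package definition file reads and directory listings as well.
cache_package_files = True
cache_listdir = True
```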
46+
47+
Cache invalidation
48+
------------------
49+
50+
Cache entries will automatically be invalidated when a newer package version is released that would change the result
51+
of an existing resolve.
52+
53+
For example, let's say you are running rez-env with the package request ``foo-1+<2``, and originally the only available
``foo`` package version is ``1.0.0``, so the cached resolve points to ``1.0.0``. If at some point afterwards
you release a new version ``1.0.1``, the cache entry for the request ``foo-1+<2`` is invalidated and the next resolve
correctly retrieves package version ``1.0.1``.
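The example above can be mimicked with a toy model. This is a sketch under simplifying assumptions, not rez's cache logic: versions are plain dotted integers, timestamps are small integers, every name here is hypothetical, and the toy invalidates on *any* newer in-range release rather than checking whether the result would actually change.

```python
from dataclasses import dataclass


def parse(version: str) -> tuple:
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(tok) for tok in version.split("."))


@dataclass
class CachedResolve:
    version: str    # version the cached resolve picked
    cached_at: int  # toy timestamp of when the resolve was cached


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper (a range of the form '1+<2')."""
    return parse(lower) <= parse(version) < parse(upper)


def resolve(releases: dict, lower: str, upper: str) -> str:
    """Pick the highest released version inside the range."""
    candidates = [v for v in releases if in_range(v, lower, upper)]
    return max(candidates, key=parse)


def is_stale(entry: CachedResolve, releases: dict, lower: str, upper: str) -> bool:
    """Stale if a matching version was released after the entry was cached."""
    return any(
        released_at > entry.cached_at and in_range(v, lower, upper)
        for v, released_at in releases.items()
    )


releases = {"1.0.0": 100}  # version -> toy release timestamp
entry = CachedResolve(resolve(releases, "1", "2"), cached_at=150)

releases["1.0.1"] = 200    # a new release inside the requested range
if is_stale(entry, releases, "1", "2"):
    entry = CachedResolve(resolve(releases, "1", "2"), cached_at=250)

print(entry.version)  # 1.0.1
```

A release outside the range (for example ``2.0.0``) leaves the toy entry untouched, because it could not affect this request.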
57+
58+
Validating operation
59+
--------------------
60+
61+
To print debugging information about memcached usage, you can set the :envvar:`REZ_DEBUG_MEMCACHE` environment
62+
variable or you can use the :data:`debug_memcache` setting.
63+
64+
Show stats from memcached server
65+
--------------------------------
66+
67+
Rez provides a command-line tool :ref:`rez-memcache` that can be used to see stats about cache misses/hits and to
68+
reset the memcached cache.
69+
70+
.. code-block:: console

   $ rez-memcache

   CACHE SERVER               UPTIME      HITS      MISSES  HIT RATIO  MEMORY  USED
   ------------               ------      ----      ------  ---------  ------  ----
   127.0.0.1:11211            20 hours    27690     5205    84%        119 Gb  10 Mb (0%)
   central.example.com:11211  6.2 months  19145089  456     99%        64 Mb   1.9 Mb (2%)
78+
79+
.. _package-caching:
80+
81+
Package Caching
82+
===============
83+
84+
Package caching is a feature that copies package payloads onto local disk in
order to speed up runtime environments. For example, if your released packages
reside on shared storage (which is common), then running, say, a Python process
will load all source from the shared storage across your network. The point of
the cache is to copy that content locally instead, and avoid the network cost.

.. note::
   Package caching does **NOT** cache package definitions,
   only their payloads (i.e. the package root directory).
93+
94+
Build behavior
95+
--------------
96+
97+
Package caching during a package build is disabled by default. To enable caching during
98+
a package build, you can set :data:`package_cache_during_build` to True.
99+
100+
.. _enabling-package-caching:
101+
102+
Enabling Package Caching
103+
========================
104+
105+
Package caching is not enabled by default. To enable it, you need to configure
:data:`cache_packages_path` to specify a path to
store the cache in.

You also have granular control over whether an individual package will or will
not be cached. To make a package cachable, you can set :attr:`cachable`
to True in its package definition file. Reasons you may *not* want to do this include
packages that are large, or that aren't relocatable because other compiled packages are
linked to them in a way that doesn't support library relocation.
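As a minimal sketch, a package definition that opts in to caching could look like this. The package name and version are placeholders; only the ``cachable`` attribute is the point of the example.

```python
# package.py (sketch) for a hypothetical package named "foo"

name = "foo"

version = "1.0.0"

# Opt this package in to package caching explicitly.
# Use False instead for payloads that are very large or not relocatable.
cachable = True
```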
114+
115+
There are also config settings that affect cachability in the event that :attr:`cachable`
116+
is not defined in a package's definition. For example, see
117+
:data:`default_cachable`, :data:`default_cachable_per_package`
118+
and :data:`default_cachable_per_repository`.
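Put together, a site configuration might look like the following sketch. All paths and package names are placeholders, and the dictionary shapes (package name or repository path mapped to a boolean) reflect my understanding of these settings; confirm them against the configuration reference.

```python
# rezconfig.py (sketch): paths and package names are placeholders.

# Where variant payloads are copied to; setting this enables package caching.
cache_packages_path = "/var/tmp/rez_package_cache"

# Used when a package does not define `cachable` itself.
default_cachable = False

# Per-package overrides, keyed by package name (assumed shape).
default_cachable_per_package = {
    "python": True,
    "huge_dataset": False,
}

# Per-repository overrides, keyed by repository path (assumed shape).
default_cachable_per_repository = {
    "/svr/packages": True,
}
```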
119+
120+
Note that you can also disable package caching on the command line, using
121+
:option:`rez-env --no-pkg-cache`.
122+
123+
Verifying
124+
---------
125+
126+
When you resolve an environment, you can see which variants have been cached by
127+
noting the ``cached`` label in the right-hand column of the :ref:`rez-context` output,
128+
as shown below:
129+
130+
.. code-block:: console

   $ rez-env Flask

   You are now in a rez-configured environment.

   requested packages:
   Flask
   ~platform==linux   (implicit)
   ~arch==x86_64      (implicit)
   ~os==Ubuntu-16.04  (implicit)

   resolved packages:
   Flask-1.1.2         /home/ajohns/package_cache/Flask/1.1.2/d998/a  (cached)
   Jinja2-2.11.2       /home/ajohns/package_cache/Jinja2/2.11.2/6087/a  (cached)
   MarkupSafe-1.1.1    /svr/packages/MarkupSafe/1.1.1/d9e9d80193dcd9578844ec4c2c22c9366ef0b88a
   Werkzeug-1.0.1      /home/ajohns/package_cache/Werkzeug/1.0.1/fe76/a  (cached)
   arch-x86_64         /home/ajohns/package_cache/arch/x86_64/6450/a  (cached)
   click-7.1.2         /home/ajohns/package_cache/click/7.1.2/0da2/a  (cached)
   itsdangerous-1.1.0  /home/ajohns/package_cache/itsdangerous/1.1.0/b23f/a  (cached)
   platform-linux      /home/ajohns/package_cache/platform/linux/9d4d/a  (cached)
   python-3.7.4        /home/ajohns/package_cache/python/3.7.4/ce1c/a  (cached)
152+
153+
For reference, cached packages also have their original payload location stored to
154+
an environment variable like so:
155+
156+
.. code-block:: console

   $ echo $REZ_FLASK_ORIG_ROOT
   /svr/packages/Flask/1.1.2/88a70aca30cb79a278872594adf043dc6c40af99
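From Python, the same variable can be read with ``os.environ``. The helper below is a hypothetical convenience, and its naming pattern is inferred only from the ``REZ_FLASK_ORIG_ROOT`` example above, so treat it as an assumption that holds for simple alphanumeric package names.

```python
import os


def original_root(package_name: str, environ=os.environ):
    """Return the pre-cache payload location of a package, or None if unset."""
    # Variable name pattern inferred from REZ_FLASK_ORIG_ROOT; an assumption.
    return environ.get("REZ_{}_ORIG_ROOT".format(package_name.upper()))


# Demo with a fake environment so the sketch runs outside a rez context.
fake_env = {
    "REZ_FLASK_ORIG_ROOT":
        "/svr/packages/Flask/1.1.2/88a70aca30cb79a278872594adf043dc6c40af99",
}
print(original_root("Flask", fake_env))
print(original_root("MarkupSafe", fake_env))  # None: the variable is not set
```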
160+
161+
How it Works
162+
------------
163+
164+
Package caching actually caches :doc:`variants`, not entire packages. When you perform
165+
a resolve, or source an existing context, the variants required are copied to
166+
local disk asynchronously (if they are cachable), in a separate process called
167+
:ref:`rez-pkg-cache`. This means that a resolve will not necessarily use the cached
168+
variants that it should, the first time around. Package caching is intended to have
169+
a cumulative effect, so that more cached variants will be used over time. This is
170+
a tradeoff to avoid blocking resolves while variant payloads are copied across
171+
your network (and that can be a slow process).
172+
173+
Note that a package cache is **not** a package repository. It is simply a store
174+
of variant payloads, structured in such a way as to be able to store variants from
175+
any package repository, into the one shared cache.
176+
177+
Variants that are cached are assumed to be immutable. No check is done to see if
178+
a variant's payload has changed, and needs to replace an existing cache entry. So
179+
you should **not** enable caching on package repositories where packages may get
180+
overwritten. It is for this reason that caching is disabled for local packages by
181+
default (see :data:`package_cache_local`).
182+
183+
Commandline Tool
184+
----------------
185+
186+
Inspection
187+
++++++++++
188+
189+
Use the :ref:`rez-pkg-cache` tool to view the state of the cache, and to perform
190+
warming and deletion operations. Example output follows:
191+
192+
.. code-block:: console

   $ rez-pkg-cache
   Package cache at /home/ajohns/package_cache:

   status   package             variant uri                                     cache path
   ------   -------             -----------                                     ----------
   cached   Flask-1.1.2         /svr/packages/Flask/1.1.2/package.py[0]         /home/ajohns/package_cache/Flask/1.1.2/d998/a
   cached   Jinja2-2.11.2       /svr/packages/Jinja2/2.11.2/package.py[0]       /home/ajohns/package_cache/Jinja2/2.11.2/6087/a
   cached   Werkzeug-1.0.1      /svr/packages/Werkzeug/1.0.1/package.py[0]      /home/ajohns/package_cache/Werkzeug/1.0.1/fe76/a
   cached   arch-x86_64         /svr/packages/arch/x86_64/package.py[]          /home/ajohns/package_cache/arch/x86_64/6450/a
   cached   click-7.1.2         /svr/packages/click/7.1.2/package.py[0]         /home/ajohns/package_cache/click/7.1.2/0da2/a
   cached   itsdangerous-1.1.0  /svr/packages/itsdangerous/1.1.0/package.py[0]  /home/ajohns/package_cache/itsdangerous/1.1.0/b23f/a
   cached   platform-linux      /svr/packages/platform/linux/package.py[]       /home/ajohns/package_cache/platform/linux/9d4d/a
   copying  python-3.7.4        /svr/packages/python/3.7.4/package.py[0]        /home/ajohns/package_cache/python/3.7.4/ce1c/a
   stalled  MarkupSafe-1.1.1    /svr/packages/MarkupSafe/1.1.1/package.py[1]    /home/ajohns/package_cache/MarkupSafe/1.1.1/724c/a
208+
209+
Each variant is stored into a directory based on a partial hash of that variant's
unique identifier (its "handle"). The package cache is thread- and multiprocess-safe,
and uses a file lock to control access where necessary.
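The idea of a directory named after a partial hash can be sketched as follows. The hash function (SHA-1), the four-character prefix length and the handle string used here are assumptions chosen for illustration; this sketch is not expected to reproduce the hashes shown in the listing above.

```python
import hashlib
import posixpath


def cache_dir_for(cache_root: str, name: str, version: str, handle: str) -> str:
    """Derive a cache directory from a short prefix of the handle's hash."""
    digest = hashlib.sha1(handle.encode("utf-8")).hexdigest()
    # Only the first few hex characters are used, hence a "partial" hash.
    return posixpath.join(cache_root, name, version, digest[:4])


# Stand-in for a variant handle; the real handle format is not shown here.
handle = "/svr/packages/Flask/1.1.2/package.py[0]"
path = cache_dir_for("/home/ajohns/package_cache", "Flask", "1.1.2", handle)
print(path)
```

Because the directory name depends only on the handle, the same variant always maps to the same location, and variants from different repositories can share one cache without colliding on package name and version alone.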
212+
213+
Cached variants have one of the following statuses at any given time:

* **copying**: The variant is in the process of being copied into the cache, and is not
  yet available for use;
* **cached**: The variant has been cached and is ready for use;
* **stalled**: The variant was getting copied, but something went wrong and there is
  now a partial copy present (but unused) in the cache.
220+
221+
Logging
222+
+++++++
223+
224+
Caching operations are logged to log files within the cache directory. To view them:

.. code-block:: console

   $ rez-pkg-cache --logs
   rez-pkg-cache 2020-05-23 16:17:45,194 PID-29827 INFO Started daemon
   rez-pkg-cache 2020-05-23 16:17:45,201 PID-29827 INFO Started caching of variant /home/ajohns/packages/Werkzeug/1.0.1/package.py[0]...
   rez-pkg-cache 2020-05-23 16:17:45,404 PID-29827 INFO Cached variant to /home/ajohns/package_cache/Werkzeug/1.0.1/fe76/a in 0.202576 seconds
   rez-pkg-cache 2020-05-23 16:17:45,404 PID-29827 INFO Started caching of variant /home/ajohns/packages/python/3.7.4/package.py[0]...
   rez-pkg-cache 2020-05-23 16:17:46,006 PID-29827 INFO Cached variant to /home/ajohns/package_cache/python/3.7.4/ce1c/a in 0.602037 seconds
234+
235+
Cleaning The Cache
236+
++++++++++++++++++
237+
238+
Cleaning the cache refers to deleting variants that are stalled or no longer in use.
It isn't really possible to know whether a variant is in use, so there is a
configurable :data:`package_cache_max_variant_days`
setting: variants that have not been used (i.e. that have not appeared
in a created or sourced context) for more than N days are deleted.
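The "more than N days" rule can be pictured with a short toy example. It is a sketch of the selection logic only: the function name ``expired_variants`` and the last-used dates are made up, and rez's own bookkeeping of when a variant was last used is not modelled.

```python
from datetime import datetime, timedelta


def expired_variants(last_used: dict, max_days: int, now: datetime) -> list:
    """Return variants whose last use is more than max_days before now."""
    cutoff = now - timedelta(days=max_days)
    return sorted(name for name, used in last_used.items() if used < cutoff)


now = datetime(2020, 6, 30)
last_used = {
    "Flask-1.1.2": datetime(2020, 6, 29),   # used yesterday
    "click-7.1.2": datetime(2020, 5, 1),    # unused for 60 days
    "python-3.7.4": datetime(2020, 5, 31),  # unused for exactly 30 days
}
print(expired_variants(last_used, max_days=30, now=now))  # ['click-7.1.2']
```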
243+
244+
You can also manually remove variants from the cache using :option:`rez-pkg-cache -r`.
Note that when you do this, the variant is no longer available in the cache,
but it is still stored on disk. You must perform a clean (:option:`rez-pkg-cache --clean`)
to purge unused cache files from disk.
248+
249+
You can use the :data:`package_cache_clean_limit`
setting to asynchronously perform some cleanup every time the cache is updated. If
you do not use this setting, it is recommended that you set up a cron job or other
scheduler to run :option:`rez-pkg-cache --clean` periodically. Otherwise,
your cache will grow indefinitely.
254+
255+
Lastly, note that rez will not attempt to re-cache a stalled variant until it is
removed by a clean operation. Using :data:`package_cache_clean_limit` will not clean
stalled variants either, as that could result in a problematic variant getting
cached, then stalled, then deleted, then cached again and so on. You must run
:option:`rez-pkg-cache --clean` to delete stalled variants.

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ Welcome to rez's documentation!
    context_bundles
    suites
    managing_packages
+   caching
    pip

 .. toctree::
