Description
Is your feature request related to a problem? Please describe.
Accumulo caches uncompressed rfile blocks. This can lead to data in cache taking up much more space than data on disk.
Describe the solution you'd like
Optionally allow storing compressed rfile blocks in the cache.
Storing compressed rfile blocks in the cache would likely lead to more CPU usage at query time and would likely disable the features that allow random lookups in cached blocks.
Describe alternatives you've considered
This could potentially be implemented without any changes to Accumulo. The only drawback to that is we would always uncompress the data when reading from disk and then recompress it when storing in the cache. This could be expensive. To allow taking compressed rfile blocks directly from disk and storing them in cache would require a change in Accumulo because it always uncompresses before caching.
Maybe it's best to leave the on-heap primary cache uncompressed and have a secondary cache (possibly off-heap) that compresses blocks. This could likely be done without any changes to Accumulo, as it could be done completely by plugins. This may look like the following.
- Read rfile block and uncompress it.
- Store the uncompressed rfile block in the primary cache.
- When the primary cache offloads a block to the secondary cache, it compresses the block.
- When the primary cache loads a block from the secondary cache, it uncompresses the block.
Maybe that could provide good CPU and memory utilization.
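As a rough illustration of the steps above, here is a minimal sketch of a two-tier cache. It is not Accumulo's block cache API: the class `TwoTierBlockCache` and all of its methods are hypothetical names, the secondary tier is a plain on-heap map standing in for a possibly off-heap store, and `java.util.zip` DEFLATE is an assumed stand-in for whatever codec would really be used.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical two-tier cache sketch; none of these names are Accumulo APIs. */
class TwoTierBlockCache {
  // Secondary tier: compressed blocks (assumption: on-heap map standing in for off-heap storage).
  private final Map<String, byte[]> secondary = new HashMap<>();
  // Primary tier: uncompressed blocks in LRU (access) order.
  private final LinkedHashMap<String, byte[]> primary;

  TwoTierBlockCache(final int maxPrimaryBlocks) {
    primary = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        if (size() > maxPrimaryBlocks) {
          // Offload: compress the evicted block into the secondary tier.
          secondary.put(eldest.getKey(), compress(eldest.getValue()));
          return true;
        }
        return false;
      }
    };
  }

  /** Stores an uncompressed block in the primary tier. */
  void put(String blockName, byte[] uncompressed) {
    primary.put(blockName, uncompressed);
  }

  /** Returns the uncompressed block, promoting it from the secondary tier if needed. */
  byte[] get(String blockName) {
    byte[] block = primary.get(blockName);
    if (block != null) {
      return block;
    }
    byte[] compressed = secondary.remove(blockName);
    if (compressed == null) {
      return null; // miss in both tiers; the caller would read the rfile
    }
    block = decompress(compressed);
    primary.put(blockName, block);
    return block;
  }

  boolean inPrimary(String blockName) {
    return primary.containsKey(blockName);
  }

  boolean inSecondary(String blockName) {
    return secondary.containsKey(blockName);
  }

  static byte[] compress(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] data) {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && inflater.needsInput()) {
          throw new IllegalStateException("truncated compressed block");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

In this sketch the CPU cost of compression is paid only on eviction from and promotion to the primary tier, so hot blocks stay uncompressed and keep supporting random lookups. A real plugin would also need size-based limits for both tiers and thread safety, which are omitted here.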
Additional context
Noticed the difference in compressed vs uncompressed cache data when working on #6010. That change emits data about compressed and uncompressed data read per scan.