Optionally cache compressed data #6023

@keith-turner

Is your feature request related to a problem? Please describe.

Accumulo caches uncompressed rfile blocks. This can lead to data in cache taking up much more space than data on disk.
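
To give a rough feel for the gap, here is a small sketch that compresses a synthetic, repetitive block with gzip and compares sizes. The `BlockFootprint` class and the made-up entry format are assumptions for illustration only; they are not Accumulo code, and the ratio for real rfile blocks depends on the data and the configured codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Accumulo.
class BlockFootprint {

  // Builds a repetitive byte array standing in for a block of sorted key/value data.
  static byte[] syntheticBlock(int entries) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < entries; i++) {
      sb.append(String.format("row_%08d cf:cq [] value_%d%n", i, i % 10));
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Compresses the bytes with gzip, standing in for whatever codec the rfile uses.
  static byte[] gzip(byte[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] block = syntheticBlock(2000);
    byte[] compressed = gzip(block);
    System.out.printf("uncompressed=%d bytes, compressed=%d bytes%n",
        block.length, compressed.length);
  }
}
```

A cache sized for N bytes holds N bytes of the uncompressed form, so the same cache budget covers correspondingly less of the on-disk data.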

Describe the solution you'd like

Optionally allow storing compressed rfile blocks in the cache.

Storing compressed rfile blocks in the cache would likely lead to more CPU usage at query time and would likely disable the feature that allows random lookups in cached blocks.

Describe alternatives you've considered

This could potentially be implemented without any changes to Accumulo. The drawback is that the data would always be uncompressed when read from disk and then recompressed when stored in the cache, which could be expensive. Taking compressed rfile blocks directly from disk and storing them in the cache would require a change to Accumulo, because it always uncompresses before caching.

Maybe it's best to leave the on-heap primary cache uncompressed and have a secondary cache (possibly off-heap) that compresses blocks. This could likely be done without any changes to Accumulo, as it could be implemented entirely by plugins. It may look like the following.

  1. Read the rfile block and uncompress it.
  2. Store the uncompressed rfile block in the primary cache.
  3. When the primary cache offloads a block to the secondary cache, it compresses the block.
  4. When the primary cache loads a block from the secondary cache, it uncompresses the block.

Maybe that could provide good CPU and memory utilization.
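
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: `TwoTierBlockCache` is a hypothetical class and does not use Accumulo's block cache plugin interfaces, the secondary tier is a plain on-heap map with no size limit, eviction is LRU by block count rather than by bytes, and `java.util.zip` deflate stands in for the rfile's codec.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical two-tier cache; not Accumulo's cache implementation or SPI.
class TwoTierBlockCache {
  private final int primaryCapacity; // max number of uncompressed blocks in the primary tier
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<String, byte[]> primary = new LinkedHashMap<>(16, 0.75f, true);
  // Secondary tier holds compressed blocks (assumption: unbounded, on heap, for brevity).
  private final Map<String, byte[]> secondary = new HashMap<>();

  TwoTierBlockCache(int primaryCapacity) {
    this.primaryCapacity = primaryCapacity;
  }

  // Steps 1-2: the caller has read and uncompressed the block; store it in the primary tier.
  void put(String blockId, byte[] uncompressed) {
    primary.put(blockId, uncompressed);
    offloadIfNeeded();
  }

  // Step 4: on a primary miss, load from the secondary tier and uncompress.
  byte[] get(String blockId) {
    byte[] block = primary.get(blockId);
    if (block != null) {
      return block;
    }
    byte[] compressed = secondary.remove(blockId);
    if (compressed == null) {
      return null; // full miss: the caller would read the block from the rfile
    }
    block = decompress(compressed);
    primary.put(blockId, block);
    offloadIfNeeded();
    return block;
  }

  // Step 3: compress least recently used blocks while moving them to the secondary tier.
  private void offloadIfNeeded() {
    Iterator<Map.Entry<String, byte[]>> it = primary.entrySet().iterator();
    while (primary.size() > primaryCapacity && it.hasNext()) {
      Map.Entry<String, byte[]> eldest = it.next();
      secondary.put(eldest.getKey(), compress(eldest.getValue()));
      it.remove();
    }
  }

  boolean inPrimary(String blockId) {
    return primary.containsKey(blockId);
  }

  boolean inSecondary(String blockId) {
    return secondary.containsKey(blockId);
  }

  static byte[] compress(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] data) {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && !inflater.finished()) {
          throw new IllegalStateException("truncated or invalid compressed block");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

In this sketch the compression cost is paid only on offload and the decompression cost only on a primary miss, so hot blocks stay uncompressed and keep supporting random lookups.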

Additional context

Noticed the difference in compressed vs uncompressed cache data when working on #6010. That change emits data about compressed and uncompressed data read per scan.

Labels: enhancement (This issue describes a new feature, improvement, or optimization.)