
[Java] DictionaryProvider leaks memory while adding dictionaries with duplicate encoding #313

@asfimport

Description


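DictionaryProvider leaks memory when dictionaries with a duplicate encoding id are added. The provider keeps one dictionary per encoding id, so the second `put` replaces the first, and the replaced dictionary's vector can no longer be reached through `getDictionaryIds`/`lookup` to be cleared. Is this expected? Should the provider release the memory of the existing dictionary vector when it accepts another dictionary with the same encoding id?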

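Sample code (ScalaTest; the imports assume Arrow Java and Scala 2.13):

import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.complex.ListVector
import org.apache.arrow.vector.complex.impl.UnionListWriter
import org.apache.arrow.vector.dictionary.{Dictionary, DictionaryProvider}
import org.apache.arrow.vector.types.pojo.DictionaryEncoding
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers
import scala.jdk.CollectionConverters._ // Scala 2.12: scala.collection.JavaConverters._

// Spec class name is illustrative; the original report only showed the test case.
class DictionaryProviderLeakSpec extends AnyFlatSpec with Matchers {

  "dictionaryProvider" should "not leak memory while adding dictionaries with duplicate encoding" in {

    val allocator: RootAllocator = new RootAllocator()

    val vector: ListVector = ListVector.empty("vector", allocator)
    val dictionaryVector1: ListVector = ListVector.empty("dict1", allocator)
    val dictionaryVector2: ListVector = ListVector.empty("dict2", allocator)

    // Allocate buffers for each vector so that each one holds memory from the allocator.
    val writer1: UnionListWriter = vector.getWriter
    writer1.allocate
    writer1.setValueCount(1)

    val dictWriter1: UnionListWriter = dictionaryVector1.getWriter
    dictWriter1.allocate
    dictWriter1.setValueCount(1)

    val dictWriter2: UnionListWriter = dictionaryVector2.getWriter
    dictWriter2.allocate
    dictWriter2.setValueCount(1)

    // Both dictionaries use encoding id 1L.
    val dictionary1: Dictionary = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, None.orNull))
    val dictionary2: Dictionary = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, None.orNull))

    // The second put replaces dictionary1 in the provider without releasing dictionaryVector1.
    val provider = new DictionaryProvider.MapDictionaryProvider
    provider.put(dictionary1)
    provider.put(dictionary2)

    vector.clear()
    // Only dictionary2 is still reachable here, so dictionaryVector1 is never cleared.
    provider.getDictionaryIds.asScala.foreach(id => provider.lookup(id).getVector.clear())

    // Fails: dictionaryVector1's buffers are still allocated.
    allocator.getAllocatedMemory shouldBe 0
  }
}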
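The question above is whether `put` should release the vector it displaces. A minimal, self-contained sketch of that policy follows. `CountingAllocator`, `FakeVector` and `ClosingProvider` are hypothetical stand-ins invented for illustration, not Arrow classes, and the sketch does not describe how `MapDictionaryProvider` currently behaves or how it would be fixed upstream.

```scala
import scala.collection.mutable

// Hypothetical allocator that only counts outstanding bytes (not Arrow's RootAllocator).
final class CountingAllocator {
  private var outstanding = 0L
  def allocate(bytes: Long): Unit = outstanding += bytes
  def release(bytes: Long): Unit = outstanding -= bytes
  def allocatedMemory: Long = outstanding
}

// Hypothetical stand-in for a dictionary vector that owns `bytes` of memory.
final class FakeVector(allocator: CountingAllocator, bytes: Long) extends AutoCloseable {
  private var closed = false
  allocator.allocate(bytes)

  // Idempotent close: releases the memory at most once.
  override def close(): Unit =
    if (!closed) { allocator.release(bytes); closed = true }
}

// Hypothetical provider keyed by encoding id. Unlike a plain map put, it closes
// the vector it displaces, so a duplicate id cannot orphan the old buffers.
final class ClosingProvider {
  private val dictionaries = mutable.Map.empty[Long, FakeVector]

  def put(id: Long, vector: FakeVector): Unit =
    // mutable.Map.put returns the previous value for the key, if any.
    dictionaries.put(id, vector).foreach { previous =>
      // Skip closing when the same vector is put twice under one id.
      if (previous ne vector) previous.close()
    }

  def lookup(id: Long): Option[FakeVector] = dictionaries.get(id)

  def closeAll(): Unit = {
    dictionaries.values.foreach(_.close())
    dictionaries.clear()
  }
}
```

Putting two 64-byte vectors under id `1L` leaves 64 bytes outstanding rather than 128, and `closeAll()` brings the count back to zero. The trade-off of this policy is that a caller still holding the displaced vector must not use it after the replacing `put`, which is why ownership transfer on `put` would need to be documented if a provider adopted it.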

Reporter: Vimal Varghese

Note: This issue was originally created as ARROW-16920. Please see the migration documentation for further details.

Metadata

Assignees: none

Labels: Type: bug (Something isn't working)
