Conversation

@wkalt
Contributor

@wkalt wkalt commented Dec 29, 2025

Prior to this commit, FixedSizeList was only supported with primitive element types (e.g., FSL<Float32> for vectors). This adds structural encoding support for FSL<Struct>, enabling use cases like fixed-size arrays of bounding boxes, coordinate tuples, or other structured data.

Key changes:

  • New FixedSizeListStructuralEncoder that encodes FSL validity to rep/def and delegates child encoding to the struct encoder
  • New StructuralFixedSizeListScheduler that scales row ranges by the FSL dimension when scheduling reads
  • New StructuralFixedSizeListDecoder that reconstructs FSL arrays from child data and rep/def validity
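
To illustrate the range scaling the scheduler performs, here is a minimal sketch. It is not the code from this PR: `scale_row_ranges` is a hypothetical name, and the only assumption is that each FSL row owns exactly `dimension` consecutive child items.

```rust
use std::ops::Range;

/// Hypothetical helper (not from the PR): map ranges of FSL rows to ranges
/// of child items, assuming every FSL row owns exactly `dimension`
/// consecutive child items.
fn scale_row_ranges(ranges: &[Range<u64>], dimension: u64) -> Vec<Range<u64>> {
    ranges
        .iter()
        // Row r covers child items [r * dimension, (r + 1) * dimension),
        // so both ends of each half-open range scale by `dimension`.
        .map(|r| (r.start * dimension)..(r.end * dimension))
        .collect()
}
```

For example, with a dimension of 3, requesting rows `0..2` and `5..6` would translate to child items `0..6` and `15..18`.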

A key challenge is "garbage filtering": unlike variable-length lists, which can omit children under null entries, FSL children always exist. When an FSL row is null, any nested list-like types within its children contain undefined "garbage" data. The encoder normalizes these to empty null lists before encoding.
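
The idea can be shown on a toy model. This is a sketch only, not the PR's `filter_nested_fsl_garbage`: the nested list child is modelled as a slice of `Option<Vec<i32>>` rather than an Arrow array, `None` stands in for a null list with no items, and `normalize_garbage` is a hypothetical name.

```rust
/// Toy model (not the PR's implementation): `children` holds the list-typed
/// items under an FSL, `dimension` items per FSL row. `None` models a null
/// list with no items.
fn normalize_garbage(
    fsl_validity: &[bool],
    dimension: usize,
    children: &mut [Option<Vec<i32>>],
) {
    // FSL children always exist, even under null rows.
    assert_eq!(children.len(), fsl_validity.len() * dimension);
    for (row, valid) in fsl_validity.iter().enumerate() {
        if !valid {
            // Whatever sits under a null FSL row is undefined, so replace
            // every item in that row's slot with an empty null list.
            for child in &mut children[row * dimension..(row + 1) * dimension] {
                *child = None;
            }
        }
    }
}
```

With a dimension of 2 and validity `[true, false]`, the first two child lists are left untouched and the last two are overwritten with `None`, whatever they held before.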

@github-actions github-actions bot added the enhancement and breaking-change labels Dec 29, 2025
@wkalt wkalt changed the title from "feat!: support FixedSizeList<Struct>" to "feat: support FixedSizeList<Struct>" Dec 29, 2025
@wkalt
Contributor Author

wkalt commented Dec 29, 2025

sorry, mislabeled this breaking. Should not be breaking.

@codecov

codecov bot commented Dec 29, 2025

@wkalt
Contributor Author

wkalt commented Dec 31, 2025

I will update this patch once #5591 is merged - there are some related tests I want to add here that currently fail.

@wkalt wkalt force-pushed the task/complex-fsl branch from 10ef8c2 to 23ba011 Compare January 1, 2026 14:57
@westonpace
Member

Should this encoder require 2.2? Our general rule of thumb has been "any file written by any version of 2.1 should be readable by any version of 2.1", which is perhaps stricter than normal backwards compatibility rules. Since this allows you to create files that old 2.1 readers will not be able to read, I think this will need to be a 2.2 feature.

Or maybe you have that check and I just missed it.

Also, does this handle FixedSizeList<List<...>> or is that a todo still?

@wkalt
Contributor Author

wkalt commented Jan 6, 2026

FixedSizeList<List> is still unimplemented - there is some existing todo note about List<List> somewhere I believe, and the same constraints still apply.

Sounds like this should require 2.2, thanks for catching that. I'll push an update shortly.

@wkalt
Contributor Author

wkalt commented Jan 6, 2026

@westonpace thanks, this is updated, I think the failures are unrelated.

@wjones127
Contributor

> I think the failures are unrelated.

👍 #5646

Contributor

@wjones127 wjones127 left a comment

I have one concern about handling corrupt schemas, but otherwise this looks good.

Comment on lines +170 to +175
let size: i32 =
    lt.0.split(':')
        .next_back()
        .expect("fixed_size_list:struct logical type missing size suffix")
        .parse()
        .expect("fixed_size_list:struct logical type has invalid size");
Contributor

Does this mean we can panic if we read a dataset that has a corrupt schema?

Member

This is the lance schema. I'm not entirely sure users are able to create their own lance schema. So I think the only way this could happen is if there was some kind of corrupt protobuf. Also, there is a significant panic potential down below at lt => DataType::try_from(lt).unwrap().

I suppose it is technically a valid concern, but this method has many, many callsites, and changing it to return a Result should probably be a PR of its own or else this one is going to get really confusing.
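
For reference, a Result-returning version of the size parsing might look roughly like the sketch below. It is illustrative only: `parse_fsl_struct_size` is a hypothetical name, `String` stands in for whatever error type the real method would use, and the `fixed_size_list:struct:<size>` layout is assumed from the snippet under review.

```rust
/// Hypothetical helper (not from the PR): parse the trailing size of a
/// logical type string assumed to look like "fixed_size_list:struct:<size>".
/// `String` is a stand-in for the crate's real error type.
fn parse_fsl_struct_size(logical_type: &str) -> Result<i32, String> {
    let suffix = logical_type
        .split(':')
        .next_back()
        // `split` always yields at least one piece, so this branch is only
        // a safeguard; a missing suffix surfaces as a parse error below.
        .ok_or_else(|| format!("logical type '{logical_type}' is missing a size suffix"))?;
    suffix
        .parse::<i32>()
        .map_err(|e| format!("logical type '{logical_type}' has an invalid size: {e}"))
}
```

A corrupt value such as `fixed_size_list:struct:x` would then surface as an `Err` for the caller to handle instead of aborting the read, which is why every callsite would need to change.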

Member

@westonpace westonpace left a comment

Nicely done. A few minor thoughts but looks good.

}

#[derive(Debug)]
struct RandomMapGenerator {
Member

Can you document this (or the rand_map method)? Just a brief comment somewhere to describe that it will randomly generate maps with 0-4 items.

let total_entries = lengths.values().iter().sum::<i32>() as u64;
let offsets = OffsetBuffer::from_lengths(lengths.values().iter().map(|v| *v as usize));

let keys = self.keys_gen.generate(RowCount::from(total_entries), rng)?;
Member

Not really relevant but I wonder if keys need to be unique within a map? I guess not.

let child = field
    .children
    .first()
    .expect("FixedSizeList field must have a child");
Member

Can you use expect_ok here so we get a result and not a panic?
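
The general shape of the change, sketched with plain standard-library calls rather than the `expect_ok` helper itself: the `Field` struct here is a simplified stand-in for the real schema type, `fsl_child` is a hypothetical name, and `String` is a placeholder error type.

```rust
/// Simplified stand-in for a schema field (not the real type).
#[derive(Debug)]
struct Field {
    name: String,
    children: Vec<Field>,
}

/// Hypothetical helper: return the FSL's child field, or an error (instead
/// of a panic) when the field has no children.
fn fsl_child(field: &Field) -> Result<&Field, String> {
    field
        .children
        .first()
        .ok_or_else(|| format!("FixedSizeList field '{}' must have a child", field.name))
}
```

Callers can then propagate the failure with `?` rather than unwinding.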

let child = field
    .children
    .first()
    .expect("FixedSizeList should have a child");
Member

expect_ok

}

/// Filters garbage from nested FSL arrays that contain list-like children.
fn filter_nested_fsl_garbage(
Member

Nice work here.

Comment on lines +635 to +651
async fn test_fsl_struct_random(
    #[case] struct_fields: Fields,
    #[case] dimension: i32,
    #[case] min_version: LanceFileVersion,
    #[values(STRUCTURAL_ENCODING_MINIBLOCK, STRUCTURAL_ENCODING_FULLZIP)]
    structural_encoding: &str,
) {
    let data_type = make_fsl_struct_type(struct_fields, dimension);
    let mut field_metadata = HashMap::new();
    field_metadata.insert(
        STRUCTURAL_ENCODING_META_KEY.to_string(),
        structural_encoding.into(),
    );
    let field = Field::new("", data_type, true).with_metadata(field_metadata);
    let test_cases = TestCases::basic().with_min_file_version(min_version);
    check_specific_random(field, test_cases).await;
}
Member

Nice compact and comprehensive test

}

#[test]
#[should_panic(expected = "Unsupported logical type: map")]
Member

It would be nice if an unsupported type returned an error instead of panicking. I'll make a ticket and we can handle it in a follow-up.

wkalt added 6 commits January 7, 2026 20:52
@wkalt wkalt force-pushed the task/complex-fsl branch from 690cac0 to 3d0064b Compare January 8, 2026 05:01
Labels

breaking-change, enhancement

3 participants