GH-45522: [Parquet][C++] Parquet GEOMETRY and GEOGRAPHY logical type implementations #45459
Merged: paleolimbot merged 142 commits into apache:main from paleolimbot:Kontinuation-kontinuation-parquet-geometry on Apr 30, 2025
c3531f7
Update thrift
paleolimbot 086e52c
Updated parquet.thrift and re-generated cpp sources
Kontinuation c5d01e1
update so that it all builds
paleolimbot e9d5180
simplify geometry utility
paleolimbot 8487f71
get projjson from metadata
paleolimbot 983d6b6
Merge branch 'main' into Kontinuation-kontinuation-parquet-geometry
paleolimbot 2a80461
format
paleolimbot 2c4f7e3
Apply suggestions from code review
paleolimbot 03dbac4
fix merge
paleolimbot c460eb0
move all parsing logic to the same file
paleolimbot 2d6d5cb
update geometry type/dims enum value min/max names
paleolimbot 568e76c
make EncodedGeospatialStatistics a struct instead of a class
paleolimbot 0630edd
remove unused null_count
paleolimbot a158be4
fix two more null count references
paleolimbot 3e5c097
remove get_ prefix from getters in geospatial_statistics.h/cc
paleolimbot 78c4cb2
revert is_set() definition
paleolimbot ab1a0c0
add clarifying note for is_empty()
paleolimbot 014eb06
remove WKBBuffer::Init()
paleolimbot c62393f
ReadXXX -> MergeXXX in geospatial_util_internal.h/cc
paleolimbot a6ebbee
Use structured binding for geometry type ane dimensions
paleolimbot d8bb4f6
format
paleolimbot 06c13f5
simpler Equals() for GeoStatisitics
paleolimbot 26329ac
add comment clarifying why record_wkb_type is false when recursing in…
paleolimbot 1990275
remove GeoCrsContext from ArrowWriterProperties
paleolimbot ceffbc9
add comment about why there are no null counts for Geometry/Geography
paleolimbot edcf971
Better C++ idioms in types.cc
paleolimbot f728191
Revert checked_cast usage in types.cc
paleolimbot 790b6af
explicit NaN handling
paleolimbot 0b686a7
Include cmath
paleolimbot adc2b3d
add test for xyz and xym to geospatial_statistics_test
paleolimbot 4008b8f
Ensure LogicalType JSON output is valid for arbitrary crs values
paleolimbot f0e019e
clean up diff
paleolimbot fcc7af6
Add big-endian WKB and error check to parameterized test
paleolimbot f4aedf6
don't define kNaN or kInf in tests for Windows
paleolimbot 6224fd4
Update cpp/src/parquet/test_util.cc
paleolimbot b76541b
internal_json -> json_internal
paleolimbot 9040c3b
check that regular statistics are missing for geometry output
paleolimbot 2780df0
spacing of includes in geospatial_statistics.cc
paleolimbot ae4e1b8
no nullptr in geospatial_util_internal.cc
paleolimbot e2cbc4f
fix error for incomplete coordinate sequence
paleolimbot 32346b9
Update cpp/src/parquet/geospatial_util_internal.cc
paleolimbot 80b2328
Update cpp/src/parquet/test_util.cc
paleolimbot a060b51
Update cpp/src/parquet/test_util.cc
paleolimbot 0014cf9
Update python/pyarrow/parquet/core.py
paleolimbot 774fe19
Update cpp/src/parquet/metadata.cc
paleolimbot d15bd2a
Update cpp/src/parquet/geospatial_util_internal.cc
paleolimbot 7657bd3
Update cpp/src/parquet/arrow/schema.cc
paleolimbot dfe15e7
fix signed/unsigned comparison
paleolimbot a391819
Move BoundingBox::ToString() out of line
paleolimbot 8415d3b
remove ByteSwap helper and don't predict endian swaps as unlikely
paleolimbot 21e1484
use string_view in geospatial_util_json
paleolimbot 0d06228
use Invalid instead of SerializationError
paleolimbot 13d7435
minimize includes of test_util.h and test_util.cc
paleolimbot a28b611
handle non-printable characters in LogicalType JSON output
paleolimbot 66cadb8
propagate fix to arrow_extensions_enabled Python documentation
paleolimbot 942a591
fix geostatistics accessors for Python
paleolimbot d350a39
make sure GeoStatistics constructor throws if called from Python
paleolimbot 801aad0
test geography in reader_test.cc
paleolimbot b263b16
fix indentation in Python documentation
paleolimbot 124a0f5
Add GeoStatistics::has_dimension() and use it in tests
paleolimbot 6da241b
check geostatistics inequality
paleolimbot 4d5e539
attempt requiring ARROW_JSON
paleolimbot bb5f9bb
Merge branch 'main' into Kontinuation-kontinuation-parquet-geometry
paleolimbot 7b97725
move geospatial files to geospatial/
paleolimbot 90609c2
make sure geospatial/ headers are installed
paleolimbot e5163b0
remove conditional ARROW_JSON code
paleolimbot 8c591c9
Use exceptions instead of Status in geospatial/util_internal.h
paleolimbot 1513336
fix unreferenced variable
paleolimbot 3f17099
ensure GeoStatistics are written for the all null case but are empty
paleolimbot 0578ae9
Merge remote-tracking branch 'upstream/main' into Kontinuation-kontin…
paleolimbot 7b4ec55
fix macros
paleolimbot a14fb07
include array in util_internal
paleolimbot 246d071
Update cpp/src/parquet/column_writer.cc
paleolimbot cd0b1cc
Update cpp/src/parquet/geospatial/statistics.cc
paleolimbot 80f9e81
Update cpp/src/parquet/geospatial/statistics.h
paleolimbot b76563d
sort test files ascening in CMakeLists.txt
paleolimbot 28a1dd1
consistently apply the parquet::geometry namespace
paleolimbot 0da26d1
parquet::geometry -> parquet::geospatial
paleolimbot a7b3c4b
update statistics for completely null and completely empty
paleolimbot cf4481f
use kMaxDimensions in more places, update pyarrow
paleolimbot c9c8b9d
Consolidate validity and existence of geostatistics
paleolimbot 85959a4
remove remaining conditional ARROW_JSON logic
paleolimbot 699eddd
add back in trailing whitespace to parquet.thrift from main
paleolimbot a2ab3b6
add mutex to protect possible_stats_ and possible_geo_stats_ modifica…
paleolimbot c917d33
fix python build
paleolimbot 6ceca04
don't use encoded geo statistics in Cython
paleolimbot d11413b
handle bounding box validity and emptiness separately
paleolimbot a586ca8
fix python build
paleolimbot 11df2d1
Update cpp/src/parquet/geospatial/statistics.cc
paleolimbot e01ec4e
better names and initializers for EncodedGeoStatistics fields
paleolimbot 9f6baf2
Update cpp/src/parquet/metadata.cc
paleolimbot ee13896
aquire lock before checking for geo stats == nullptr
paleolimbot b6b96da
fix clang-format of parquet-types.h generated file
paleolimbot c386fd2
Differentiate between the bound being present and the a dimension bei…
paleolimbot 7aed73f
undo clang-format of parquet_types.cpp
paleolimbot 4c18e94
Merge remote-tracking branch 'upstream/main' into Kontinuation-kontin…
paleolimbot d438e3d
add diagnostics for failure
paleolimbot 465308b
fix writing of geoarrow.wkb resulting from merge
paleolimbot a86feee
Add issue reference, remove unneeded header addition, fix outdated co…
paleolimbot bdbdcd1
clang-format
paleolimbot 0295268
update CMake to build rapidjson when parquet is turned on
paleolimbot 749fd63
rename writer_calculated_geospatial_types to geospatial_types_present
paleolimbot 0af1de8
Update cpp/src/parquet/types.cc
paleolimbot e0adea3
Update cpp/src/parquet/geospatial/statistics.h
paleolimbot e7da4fd
Update cpp/src/parquet/column_writer_test.cc
paleolimbot 32d6e31
Update cpp/src/parquet/geospatial/util_json_internal.cc
paleolimbot df8adf7
Update cpp/src/parquet/arrow/arrow_schema_test.cc
paleolimbot 96682c8
Update cpp/src/parquet/test_util.h
paleolimbot 7614ff9
Update cpp/src/parquet/thrift_internal.h
paleolimbot b0381b7
Update cpp/src/parquet/types.cc
paleolimbot fd7f722
Update cpp/src/parquet/types.cc
paleolimbot bac47dc
Update cpp/src/parquet/thrift_internal.h
paleolimbot 91b0872
Update cpp/src/parquet/arrow/arrow_schema_test.cc
paleolimbot 9358f90
revert shared_ptr change (doesn't compile because constructor is priv…
paleolimbot c1c5dc3
one more make_pair
paleolimbot 8c48088
initialize encoded statistics presence flags to false explicitly
paleolimbot c6ae22b
cleaner bbox init in thrift_initernal.h
paleolimbot bc57c46
add reader_test with geography + arrow
paleolimbot 4611d4b
add test for extensions enabled = true but without geoarrow extension…
paleolimbot bea82e4
remove mention of ARROW_JSON
paleolimbot 5b8d654
accept string_view in MakeGeoArrowCrsMetadata()
paleolimbot 5af1471
also sanitize JSON when the value comes from a Parquet file metadata …
paleolimbot b3261b2
also support "EPSG" + "4326" when 4326 is a string
paleolimbot e529480
remove stale line in arrow_schema_test.cc
paleolimbot 90202bd
document GeoStatisticsImpl::is_wraparound
paleolimbot 9b51ca3
single-line string representation for GeoStatistics
paleolimbot 0cbe770
add span overload for WKBGeometryBounder::MergeGeometry()
paleolimbot 7a7ceb1
move bounding box stringifier to util_internal.h
paleolimbot 995df92
test BoundingBox operator==
paleolimbot c38a476
ReadDoubles() -> ReadCoords()
paleolimbot 2b87131
test WKBGeometryBounder with too many bytes
paleolimbot e2f2044
add more tests for GeoStatistics equality
paleolimbot 516a053
use rapidjson to escape json
paleolimbot 73a47dc
undo format change
paleolimbot 92d655d
clarify comments about emtpiness and validity in statistics.h
paleolimbot 43dd64c
rename GeoStatisticsImpl::Update to Decode and move the Reset() call …
paleolimbot 81567b9
remove geospatial/statistics.h from api/reader.h
paleolimbot 94f1908
cleaner cython method implementations
paleolimbot 759ab2e
document Python GeoStatistics and properties
paleolimbot 4ce6803
Mark X values as invalid when merging wraparound box instead of throw…
paleolimbot 4bca1c0
remove unneded CMake check for ARROW_JSON now that RapidJSON is autom…
paleolimbot e1b7061
clarify comment regarding dimension_valid()
paleolimbot