Merged
142 commits
c3531f7
Update thrift
paleolimbot Jun 29, 2024
086e52c
Updated parquet.thrift and re-generated cpp sources
Kontinuation Sep 3, 2024
c5d01e1
update so that it all builds
paleolimbot Feb 6, 2025
e9d5180
simplify geometry utility
paleolimbot Feb 13, 2025
8487f71
get projjson from metadata
paleolimbot Feb 17, 2025
983d6b6
Merge branch 'main' into Kontinuation-kontinuation-parquet-geometry
paleolimbot Mar 20, 2025
2a80461
format
paleolimbot Mar 20, 2025
2c4f7e3
Apply suggestions from code review
paleolimbot Mar 20, 2025
03dbac4
fix merge
paleolimbot Mar 20, 2025
c460eb0
move all parsing logic to the same file
paleolimbot Mar 20, 2025
2d6d5cb
update geometry type/dims enum value min/max names
paleolimbot Mar 20, 2025
568e76c
make EncodedGeospatialStatistics a struct instead of a class
paleolimbot Mar 20, 2025
0630edd
remove unused null_count
paleolimbot Mar 20, 2025
a158be4
fix two more null count references
paleolimbot Mar 20, 2025
3e5c097
remove get_ prefix from getters in geospatial_statistics.h/cc
paleolimbot Mar 20, 2025
78c4cb2
revert is_set() definition
paleolimbot Mar 20, 2025
ab1a0c0
add clarifying note for is_empty()
paleolimbot Mar 20, 2025
014eb06
remove WKBBuffer::Init()
paleolimbot Mar 20, 2025
c62393f
ReadXXX -> MergeXXX in geospatial_util_internal.h/cc
paleolimbot Mar 20, 2025
a6ebbee
Use structured binding for geometry type ane dimensions
paleolimbot Mar 20, 2025
d8bb4f6
format
paleolimbot Mar 20, 2025
06c13f5
simpler Equals() for GeoStatisitics
paleolimbot Mar 20, 2025
26329ac
add comment clarifying why record_wkb_type is false when recursing in…
paleolimbot Mar 20, 2025
1990275
remove GeoCrsContext from ArrowWriterProperties
paleolimbot Mar 21, 2025
ceffbc9
add comment about why there are no null counts for Geometry/Geography
paleolimbot Mar 21, 2025
edcf971
Better C++ idioms in types.cc
paleolimbot Mar 21, 2025
f728191
Revert checked_cast usage in types.cc
paleolimbot Mar 21, 2025
790b6af
explicit NaN handling
paleolimbot Mar 21, 2025
0b686a7
Include cmath
paleolimbot Mar 21, 2025
adc2b3d
add test for xyz and xym to geospatial_statistics_test
paleolimbot Mar 22, 2025
4008b8f
Ensure LogicalType JSON output is valid for arbitrary crs values
paleolimbot Mar 22, 2025
f0e019e
clean up diff
paleolimbot Mar 22, 2025
fcc7af6
Add big-endian WKB and error check to parameterized test
paleolimbot Mar 22, 2025
f4aedf6
don't define kNaN or kInf in tests for Windows
paleolimbot Mar 22, 2025
6224fd4
Update cpp/src/parquet/test_util.cc
paleolimbot Mar 27, 2025
b76541b
internal_json -> json_internal
paleolimbot Mar 27, 2025
9040c3b
check that regular statistics are missing for geometry output
paleolimbot Mar 27, 2025
2780df0
spacing of includes in geospatial_statistics.cc
paleolimbot Mar 27, 2025
ae4e1b8
no nullptr in geospatial_util_internal.cc
paleolimbot Mar 27, 2025
e2cbc4f
fix error for incomplete coordinate sequence
paleolimbot Mar 27, 2025
32346b9
Update cpp/src/parquet/geospatial_util_internal.cc
paleolimbot Mar 27, 2025
80b2328
Update cpp/src/parquet/test_util.cc
paleolimbot Mar 27, 2025
a060b51
Update cpp/src/parquet/test_util.cc
paleolimbot Mar 27, 2025
0014cf9
Update python/pyarrow/parquet/core.py
paleolimbot Mar 27, 2025
774fe19
Update cpp/src/parquet/metadata.cc
paleolimbot Mar 27, 2025
d15bd2a
Update cpp/src/parquet/geospatial_util_internal.cc
paleolimbot Mar 27, 2025
7657bd3
Update cpp/src/parquet/arrow/schema.cc
paleolimbot Mar 27, 2025
dfe15e7
fix signed/unsigned comparison
paleolimbot Mar 27, 2025
a391819
Move BoundingBox::ToString() out of line
paleolimbot Mar 27, 2025
8415d3b
remove ByteSwap helper and don't predict endian swaps as unlikely
paleolimbot Mar 27, 2025
21e1484
use string_view in geospatial_util_json
paleolimbot Mar 27, 2025
0d06228
use Invalid instead of SerializationError
paleolimbot Mar 27, 2025
13d7435
minimize includes of test_util.h and test_util.cc
paleolimbot Mar 27, 2025
a28b611
handle non-printable characters in LogicalType JSON output
paleolimbot Mar 27, 2025
66cadb8
propagate fix to arrow_extensions_enabled Python documentation
paleolimbot Mar 27, 2025
942a591
fix geostatistics accessors for Python
paleolimbot Mar 27, 2025
d350a39
make sure GeoStatistics constructor throws if called from Python
paleolimbot Mar 27, 2025
801aad0
test geography in reader_test.cc
paleolimbot Mar 28, 2025
b263b16
fix indentation in Python documentation
paleolimbot Mar 28, 2025
124a0f5
Add GeoStatistics::has_dimension() and use it in tests
paleolimbot Mar 28, 2025
6da241b
check geostatistics inequality
paleolimbot Mar 28, 2025
4d5e539
attempt requiring ARROW_JSON
paleolimbot Apr 1, 2025
bb5f9bb
Merge branch 'main' into Kontinuation-kontinuation-parquet-geometry
paleolimbot Apr 1, 2025
7b97725
move geospatial files to geospatial/
paleolimbot Apr 2, 2025
90609c2
make sure geospatial/ headers are installed
paleolimbot Apr 2, 2025
e5163b0
remove conditional ARROW_JSON code
paleolimbot Apr 2, 2025
8c591c9
Use exceptions instead of Status in geospatial/util_internal.h
paleolimbot Apr 2, 2025
1513336
fix unreferenced variable
paleolimbot Apr 2, 2025
3f17099
ensure GeoStatistics are written for the all null case but are empty
paleolimbot Apr 3, 2025
0578ae9
Merge remote-tracking branch 'upstream/main' into Kontinuation-kontin…
paleolimbot Apr 10, 2025
7b4ec55
fix macros
paleolimbot Apr 10, 2025
a14fb07
include array in util_internal
paleolimbot Apr 10, 2025
246d071
Update cpp/src/parquet/column_writer.cc
paleolimbot Apr 11, 2025
cd0b1cc
Update cpp/src/parquet/geospatial/statistics.cc
paleolimbot Apr 11, 2025
80f9e81
Update cpp/src/parquet/geospatial/statistics.h
paleolimbot Apr 11, 2025
b76563d
sort test files ascening in CMakeLists.txt
paleolimbot Apr 11, 2025
28a1dd1
consistently apply the parquet::geometry namespace
paleolimbot Apr 11, 2025
0da26d1
parquet::geometry -> parquet::geospatial
paleolimbot Apr 11, 2025
a7b3c4b
update statistics for completely null and completely empty
paleolimbot Apr 11, 2025
cf4481f
use kMaxDimensions in more places, update pyarrow
paleolimbot Apr 11, 2025
c9c8b9d
Consolidate validity and existence of geostatistics
paleolimbot Apr 11, 2025
85959a4
remove remaining conditional ARROW_JSON logic
paleolimbot Apr 11, 2025
699eddd
add back in trailing whitespace to parquet.thrift from main
paleolimbot Apr 11, 2025
a2ab3b6
add mutex to protect possible_stats_ and possible_geo_stats_ modifica…
paleolimbot Apr 11, 2025
c917d33
fix python build
paleolimbot Apr 12, 2025
6ceca04
don't use encoded geo statistics in Cython
paleolimbot Apr 12, 2025
d11413b
handle bounding box validity and emptiness separately
paleolimbot Apr 17, 2025
a586ca8
fix python build
paleolimbot Apr 17, 2025
11df2d1
Update cpp/src/parquet/geospatial/statistics.cc
paleolimbot Apr 18, 2025
e01ec4e
better names and initializers for EncodedGeoStatistics fields
paleolimbot Apr 18, 2025
9f6baf2
Update cpp/src/parquet/metadata.cc
paleolimbot Apr 18, 2025
ee13896
aquire lock before checking for geo stats == nullptr
paleolimbot Apr 18, 2025
b6b96da
fix clang-format of parquet-types.h generated file
paleolimbot Apr 18, 2025
c386fd2
Differentiate between the bound being present and the a dimension bei…
paleolimbot Apr 18, 2025
7aed73f
undo clang-format of parquet_types.cpp
paleolimbot Apr 18, 2025
4c18e94
Merge remote-tracking branch 'upstream/main' into Kontinuation-kontin…
paleolimbot Apr 21, 2025
d438e3d
add diagnostics for failure
paleolimbot Apr 21, 2025
465308b
fix writing of geoarrow.wkb resulting from merge
paleolimbot Apr 22, 2025
a86feee
Add issue reference, remove unneeded header addition, fix outdated co…
paleolimbot Apr 22, 2025
bdbdcd1
clang-format
paleolimbot Apr 22, 2025
0295268
update CMake to build rapidjson when parquet is turned on
paleolimbot Apr 24, 2025
749fd63
rename writer_calculated_geospatial_types to geospatial_types_present
paleolimbot Apr 24, 2025
0af1de8
Update cpp/src/parquet/types.cc
paleolimbot Apr 24, 2025
e0adea3
Update cpp/src/parquet/geospatial/statistics.h
paleolimbot Apr 24, 2025
e7da4fd
Update cpp/src/parquet/column_writer_test.cc
paleolimbot Apr 24, 2025
32d6e31
Update cpp/src/parquet/geospatial/util_json_internal.cc
paleolimbot Apr 24, 2025
df8adf7
Update cpp/src/parquet/arrow/arrow_schema_test.cc
paleolimbot Apr 24, 2025
96682c8
Update cpp/src/parquet/test_util.h
paleolimbot Apr 24, 2025
7614ff9
Update cpp/src/parquet/thrift_internal.h
paleolimbot Apr 24, 2025
b0381b7
Update cpp/src/parquet/types.cc
paleolimbot Apr 24, 2025
fd7f722
Update cpp/src/parquet/types.cc
paleolimbot Apr 24, 2025
bac47dc
Update cpp/src/parquet/thrift_internal.h
paleolimbot Apr 24, 2025
91b0872
Update cpp/src/parquet/arrow/arrow_schema_test.cc
paleolimbot Apr 24, 2025
9358f90
revert shared_ptr change (doesn't compile because constructor is priv…
paleolimbot Apr 24, 2025
c1c5dc3
one more make_pair
paleolimbot Apr 24, 2025
8c48088
initialize encoded statistics presence flags to false explicitly
paleolimbot Apr 24, 2025
c6ae22b
cleaner bbox init in thrift_initernal.h
paleolimbot Apr 24, 2025
bc57c46
add reader_test with geography + arrow
paleolimbot Apr 24, 2025
4611d4b
add test for extensions enabled = true but without geoarrow extension…
paleolimbot Apr 24, 2025
bea82e4
remove mention of ARROW_JSON
paleolimbot Apr 24, 2025
5b8d654
accept string_view in MakeGeoArrowCrsMetadata()
paleolimbot Apr 24, 2025
5af1471
also sanitize JSON when the value comes from a Parquet file metadata …
paleolimbot Apr 24, 2025
b3261b2
also support "EPSG" + "4326" when 4326 is a string
paleolimbot Apr 24, 2025
e529480
remove stale line in arrow_schema_test.cc
paleolimbot Apr 24, 2025
90202bd
document GeoStatisticsImpl::is_wraparound
paleolimbot Apr 24, 2025
9b51ca3
single-line string representation for GeoStatistics
paleolimbot Apr 24, 2025
0cbe770
add span overload for WKBGeometryBounder::MergeGeometry()
paleolimbot Apr 24, 2025
7a7ceb1
move bounding box stringifier to util_internal.h
paleolimbot Apr 24, 2025
995df92
test BoundingBox operator==
paleolimbot Apr 24, 2025
c38a476
ReadDoubles() -> ReadCoords()
paleolimbot Apr 24, 2025
2b87131
test WKBGeometryBounder with too many bytes
paleolimbot Apr 24, 2025
e2f2044
add more tests for GeoStatistics equality
paleolimbot Apr 24, 2025
516a053
use rapidjson to escape json
paleolimbot Apr 24, 2025
73a47dc
undo format change
paleolimbot Apr 25, 2025
92d655d
clarify comments about emtpiness and validity in statistics.h
paleolimbot Apr 25, 2025
43dd64c
rename GeoStatisticsImpl::Update to Decode and move the Reset() call …
paleolimbot Apr 25, 2025
81567b9
remove geospatial/statistics.h from api/reader.h
paleolimbot Apr 25, 2025
94f1908
cleaner cython method implementations
paleolimbot Apr 25, 2025
759ab2e
document Python GeoStatistics and properties
paleolimbot Apr 25, 2025
4ce6803
Mark X values as invalid when merging wraparound box instead of throw…
paleolimbot Apr 25, 2025
4bca1c0
remove unneded CMake check for ARROW_JSON now that RapidJSON is autom…
paleolimbot Apr 25, 2025
e1b7061
clarify comment regarding dimension_valid()
paleolimbot Apr 29, 2025
5 changes: 1 addition & 4 deletions cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -374,16 +374,13 @@ target_include_directories(arrow::flatbuffers
# ----------------------------------------------------------------------
# Some EP's require other EP's

if(PARQUET_REQUIRE_ENCRYPTION)
set(ARROW_JSON ON)
endif()

if(ARROW_WITH_OPENTELEMETRY)
set(ARROW_WITH_NLOHMANN_JSON ON)
set(ARROW_WITH_PROTOBUF ON)
endif()

if(ARROW_PARQUET)
set(ARROW_WITH_RAPIDJSON ON)
set(ARROW_WITH_THRIFT ON)
endif()

16 changes: 13 additions & 3 deletions cpp/src/parquet/CMakeLists.txt
@@ -172,6 +172,9 @@ set(PARQUET_SRCS
exception.cc
file_reader.cc
file_writer.cc
geospatial/statistics.cc
geospatial/util_internal.cc
geospatial/util_json_internal.cc
level_comparison.cc
level_conversion.cc
metadata.cc
@@ -260,6 +263,10 @@ endif()
if(NOT PARQUET_MINIMAL_DEPENDENCY)
list(APPEND PARQUET_SHARED_LINK_LIBS arrow_shared)

# Add RapidJSON libraries
list(APPEND PARQUET_SHARED_PRIVATE_LINK_LIBS RapidJSON)
list(APPEND PARQUET_STATIC_LINK_LIBS RapidJSON)

# These are libraries that we will link privately with parquet_shared (as they
# do not need to be linked transitively by other linkers)
list(APPEND PARQUET_SHARED_PRIVATE_LINK_LIBS thrift::thrift)
@@ -357,6 +364,7 @@ endif()
add_subdirectory(api)
add_subdirectory(arrow)
add_subdirectory(encryption)
add_subdirectory(geospatial)

arrow_install_all_headers("parquet")

@@ -367,15 +375,17 @@ install(FILES "${CMAKE_CURRENT_BINARY_DIR}/parquet_version.h"

add_parquet_test(internals-test
SOURCES
bloom_filter_test.cc
bloom_filter_reader_test.cc
properties_test.cc
statistics_test.cc
bloom_filter_test.cc
encoding_test.cc
geospatial/statistics_test.cc
geospatial/util_internal_test.cc
metadata_test.cc
page_index_test.cc
properties_test.cc
public_api_test.cc
size_statistics_test.cc
statistics_test.cc
types_test.cc)

set_source_files_properties(public_api_test.cc PROPERTIES SKIP_PRECOMPILE_HEADERS ON
56 changes: 56 additions & 0 deletions cpp/src/parquet/arrow/arrow_reader_writer_test.cc
@@ -43,6 +43,7 @@
#include "arrow/scalar.h"
#include "arrow/table.h"
#include "arrow/testing/builder.h"
#include "arrow/testing/extension_type.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/testing/random.h"
#include "arrow/testing/util.h"
@@ -1481,6 +1482,61 @@ TEST_F(TestJsonParquetIO, JsonExtension) {
this->RoundTripSingleColumn(json_large_array, json_large_array, writer_properties);
}

using TestGeoArrowParquetIO = TestParquetIO<test::GeoArrowWkbExtensionType>;

TEST_F(TestGeoArrowParquetIO, GeoArrowExtension) {
::arrow::ExtensionTypeGuard guard(test::geoarrow_wkb());

// Build a binary WKB array with at least one null value
::arrow::BinaryBuilder builder;

for (int k = 0; k < 10; k++) {
std::string item = test::MakeWKBPoint(
{static_cast<double>(k), static_cast<double>(k + 1)}, false, false);
ASSERT_OK(builder.Append(item));
}
ASSERT_OK(builder.AppendNull());
for (int k = 0; k < 5; k++) {
std::string item = test::MakeWKBPoint(
{static_cast<double>(k), static_cast<double>(k + 1)}, false, false);
ASSERT_OK(builder.Append(item));
}

ASSERT_OK_AND_ASSIGN(const auto binary_array, builder.Finish());
const auto wkb_type = test::geoarrow_wkb_lonlat();
const auto wkb_array = ::arrow::ExtensionType::WrapArray(wkb_type, binary_array);

const auto large_wkb_type = test::geoarrow_wkb_lonlat(::arrow::large_binary());
ASSERT_OK_AND_ASSIGN(const auto large_binary_array,
::arrow::compute::Cast(binary_array, ::arrow::large_binary()));
const auto large_wkb_array =
::arrow::ExtensionType::WrapArray(large_wkb_type, large_binary_array.make_array());

// When the original Arrow schema isn't stored and Arrow extensions are disabled,
// LogicalType::GEOMETRY is read as binary.
auto writer_properties = default_arrow_writer_properties();
ASSERT_NO_FATAL_FAILURE(
this->RoundTripSingleColumn(wkb_array, binary_array, writer_properties));
ASSERT_NO_FATAL_FAILURE(
this->RoundTripSingleColumn(large_wkb_array, binary_array, writer_properties));

// When the original Arrow schema isn't stored and Arrow extensions are enabled,
// LogicalType::GEOMETRY is read as geoarrow.wkb with binary storage.
::parquet::ArrowReaderProperties reader_properties;
reader_properties.set_arrow_extensions_enabled(true);
ASSERT_NO_FATAL_FAILURE(this->RoundTripSingleColumn(
wkb_array, wkb_array, writer_properties, reader_properties));
ASSERT_NO_FATAL_FAILURE(this->RoundTripSingleColumn(
large_wkb_array, wkb_array, writer_properties, reader_properties));

// When the original Arrow schema is stored, the stored Arrow type is respected.
writer_properties = ::parquet::ArrowWriterProperties::Builder().store_schema()->build();
ASSERT_NO_FATAL_FAILURE(
this->RoundTripSingleColumn(wkb_array, wkb_array, writer_properties));
ASSERT_NO_FATAL_FAILURE(
this->RoundTripSingleColumn(large_wkb_array, large_wkb_array, writer_properties));
}
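
Note on the test input above: each item appended to the builder is a WKB-encoded point. The definition of `test::MakeWKBPoint` is not part of this excerpt, so the two trailing `false` arguments are only presumed to select the Z and M dimensions. As a minimal sketch of what a little-endian XY point looks like in WKB, with every name in `sketch::` being hypothetical and not taken from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helpers for illustration only; this is NOT test::MakeWKBPoint.
namespace sketch {

// Appends a 32-bit unsigned integer in little-endian byte order,
// regardless of the byte order of the host.
inline void AppendUInt32LE(std::string* out, uint32_t value) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<char>((value >> shift) & 0xFF));
  }
}

// Appends a 64-bit double in little-endian byte order.
inline void AppendDoubleLE(std::string* out, double value) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  for (int shift = 0; shift < 64; shift += 8) {
    out->push_back(static_cast<char>((bits >> shift) & 0xFF));
  }
}

// Encodes an XY point as WKB: byte-order flag, geometry type, then x and y.
inline std::string MakeWkbPointXY(double x, double y) {
  std::string wkb;
  wkb.push_back('\x01');    // byte-order flag: 1 = little endian
  AppendUInt32LE(&wkb, 1);  // geometry type code 1 = Point (XY)
  AppendDoubleLE(&wkb, x);
  AppendDoubleLE(&wkb, y);
  return wkb;  // 1 + 4 + 8 + 8 = 21 bytes
}

}  // namespace sketch
```

Writing the bytes explicitly keeps the output identical on big-endian hosts, which matters because the reader side of this PR is also tested against big-endian WKB.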

using TestNullParquetIO = TestParquetIO<::arrow::NullType>;

TEST_F(TestNullParquetIO, NullColumn) {
152 changes: 151 additions & 1 deletion cpp/src/parquet/arrow/arrow_schema_test.cc
@@ -19,7 +19,6 @@
#include <vector>

#include "gmock/gmock-matchers.h"
#include "gmock/gmock.h"
#include "gtest/gtest.h"

#include "parquet/arrow/reader.h"
@@ -35,6 +34,7 @@
#include "arrow/extension/json.h"
#include "arrow/extension/uuid.h"
#include "arrow/ipc/writer.h"
#include "arrow/testing/extension_type.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/type.h"
#include "arrow/util/base64.h"
@@ -237,6 +237,10 @@ TEST_F(TestConvertParquetSchema, ParquetAnnotatedFields) {
::arrow::int64()},
{"json", LogicalType::JSON(), ParquetType::BYTE_ARRAY, -1, ::arrow::utf8()},
{"bson", LogicalType::BSON(), ParquetType::BYTE_ARRAY, -1, ::arrow::binary()},
{"geometry", LogicalType::Geometry(), ParquetType::BYTE_ARRAY, -1,
::arrow::binary()},
{"geography", LogicalType::Geography(), ParquetType::BYTE_ARRAY, -1,
::arrow::binary()},
{"interval", LogicalType::Interval(), ParquetType::FIXED_LEN_BYTE_ARRAY, 12,
::arrow::fixed_size_binary(12)},
{"uuid", LogicalType::UUID(), ParquetType::FIXED_LEN_BYTE_ARRAY, 16,
@@ -1090,6 +1094,62 @@ TEST_F(TestConvertParquetSchema, ParquetSchemaArrowUuidExtension) {
}
}

TEST_F(TestConvertParquetSchema, ParquetSchemaGeoArrowExtensions) {
std::vector<NodePtr> parquet_fields;
parquet_fields.push_back(PrimitiveNode::Make("geometry", Repetition::OPTIONAL,
LogicalType::Geometry(),
ParquetType::BYTE_ARRAY));
parquet_fields.push_back(PrimitiveNode::Make("geography", Repetition::OPTIONAL,
LogicalType::Geography(),
ParquetType::BYTE_ARRAY));

{
// Parquet file does not contain Arrow schema.
// By default, both fields should be treated as binary() fields in Arrow.
auto arrow_schema = ::arrow::schema({::arrow::field("geometry", BINARY, true),
::arrow::field("geography", BINARY, true)});
std::shared_ptr<KeyValueMetadata> metadata{};
ASSERT_OK(ConvertSchema(parquet_fields, metadata));
CheckFlatSchema(arrow_schema);
}

{
// Parquet file does not contain Arrow schema.
// If Arrow extensions are enabled and extensions are registered,
// fields will be interpreted as geoarrow_wkb(binary()) extension fields.
::arrow::ExtensionTypeGuard guard(test::geoarrow_wkb());

ArrowReaderProperties props;
props.set_arrow_extensions_enabled(true);
auto arrow_schema = ::arrow::schema(
{::arrow::field(
"geometry",
test::geoarrow_wkb(R"({"crs": "OGC:CRS84", "crs_type": "authority_code"})"),
true),
::arrow::field(
"geography",
test::geoarrow_wkb(
R"({"crs": "OGC:CRS84", "crs_type": "authority_code", "edges": "spherical"})"),
true)});
std::shared_ptr<KeyValueMetadata> metadata{};
ASSERT_OK(ConvertSchema(parquet_fields, metadata, props));
CheckFlatSchema(arrow_schema);
}

{
// Parquet file does not contain Arrow schema.
// If Arrow extensions are enabled and extensions are NOT registered,
// fields will be interpreted as binary().
ArrowReaderProperties props;
props.set_arrow_extensions_enabled(true);
auto arrow_schema = ::arrow::schema({::arrow::field("geometry", BINARY, true),
::arrow::field("geography", BINARY, true)});
std::shared_ptr<KeyValueMetadata> metadata{};
ASSERT_OK(ConvertSchema(parquet_fields, metadata, props));
CheckFlatSchema(arrow_schema);
}
}
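
Note on the expectations above: with extensions enabled and registered, a default `LogicalType::Geometry()` is expected to surface as `geoarrow.wkb` metadata with `"crs": "OGC:CRS84"`, and `LogicalType::Geography()` additionally carries `"edges": "spherical"`. The following sketch only reproduces those two expected strings. `DefaultGeoArrowMetadata` is a hypothetical name; the PR's own metadata builder is not shown in this excerpt and may be structured differently.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical: returns the extension metadata that the test above expects
// for a GEOMETRY or GEOGRAPHY column whose CRS is left at the type default.
inline std::string DefaultGeoArrowMetadata(bool is_geography) {
  std::string metadata = R"({"crs": "OGC:CRS84", "crs_type": "authority_code")";
  if (is_geography) {
    // Geography columns are expected to declare spherical edges.
    metadata += R"(, "edges": "spherical")";
  }
  metadata += "}";
  return metadata;
}

}  // namespace sketch
```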

class TestConvertArrowSchema : public ::testing::Test {
public:
virtual void SetUp() {}
@@ -1343,6 +1403,96 @@ TEST_F(TestConvertArrowSchema, ParquetFlatPrimitivesAsDictionaries) {
ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(parquet_fields));
}

TEST_F(TestConvertArrowSchema, ParquetGeoArrowCrsLonLat) {
// All the Arrow Schemas below should convert to the type defaults for GEOMETRY
// and GEOGRAPHY when GeoArrow extension types are registered and the appropriate
// writer option is set.
::arrow::ExtensionTypeGuard guard(test::geoarrow_wkb());

std::vector<NodePtr> parquet_fields;
parquet_fields.push_back(PrimitiveNode::Make("geometry", Repetition::OPTIONAL,
LogicalType::Geometry(),
ParquetType::BYTE_ARRAY));
parquet_fields.push_back(PrimitiveNode::Make("geography", Repetition::OPTIONAL,
LogicalType::Geography(),
ParquetType::BYTE_ARRAY));

// There are several ways that longitude/latitude could be specified when coming from
// GeoArrow, which allows null, missing, arbitrary strings (e.g., Authority:Code), and
// PROJJSON.
std::vector<std::string> geoarrow_lonlat = {
"null", R"("OGC:CRS84")", R"("EPSG:4326")",
// Purely the parts of the PROJJSON that we inspect to check the lon/lat case
R"({"id": {"authority": "OGC", "code": "CRS84"}})",
R"({"id": {"authority": "EPSG", "code": "4326"}})",
R"({"id": {"authority": "EPSG", "code": 4326}})"};

for (const auto& geoarrow_lonlatish_crs : geoarrow_lonlat) {
ARROW_SCOPED_TRACE("crs = ", geoarrow_lonlatish_crs);
std::vector<std::shared_ptr<Field>> arrow_fields = {
::arrow::field("geometry",
test::geoarrow_wkb(R"({"crs": )" + geoarrow_lonlatish_crs + "}"),
true),
::arrow::field("geography",
test::geoarrow_wkb(R"({"crs": )" + geoarrow_lonlatish_crs +
R"(, "edges": "spherical"})"),
true)};

ASSERT_OK(ConvertSchema(arrow_fields));
ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(parquet_fields));
}
}
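
Note on the lon/lat cases above: the test treats `OGC:CRS84` and `EPSG:4326` as equivalent to the Parquet type default, whether they arrive as an `Authority:Code` string or as a PROJJSON `id` object. The detection logic itself is not in this excerpt. A simplified sketch of the comparison, assuming the caller has already parsed any JSON and converted a numeric `code` such as `4326` to a string (all names here are hypothetical):

```cpp
#include <cassert>
#include <string_view>

namespace sketch {

// Hypothetical: true when an (authority, code) pair names longitude/latitude
// on WGS84, i.e. the cases the test above maps to the type default.
inline bool IsLonLatAuthorityCode(std::string_view authority,
                                  std::string_view code) {
  return (authority == "OGC" && code == "CRS84") ||
         (authority == "EPSG" && code == "4326");
}

// Hypothetical: the same check for an unquoted "Authority:Code" string.
inline bool IsLonLatCrsString(std::string_view crs) {
  const auto colon = crs.find(':');
  if (colon == std::string_view::npos) {
    return false;
  }
  return IsLonLatAuthorityCode(crs.substr(0, colon), crs.substr(colon + 1));
}

}  // namespace sketch
```

A missing or `null` CRS is also expected to map to the default; this sketch leaves that case to the caller, and a string such as `srid:1234` deliberately does not match.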

TEST_F(TestConvertArrowSchema, ParquetGeoArrowCrsSrid) {
// Checks that it is possible to write the srid:xxxx recommendation from GeoArrow
::arrow::ExtensionTypeGuard guard(test::geoarrow_wkb());

std::vector<NodePtr> parquet_fields;
parquet_fields.push_back(PrimitiveNode::Make("geometry", Repetition::OPTIONAL,
LogicalType::Geometry("srid:1234"),
ParquetType::BYTE_ARRAY));
parquet_fields.push_back(PrimitiveNode::Make("geography", Repetition::OPTIONAL,
LogicalType::Geography("srid:5678"),
ParquetType::BYTE_ARRAY));

std::vector<std::shared_ptr<Field>> arrow_fields = {
::arrow::field("geometry", test::geoarrow_wkb(R"({"crs": "srid:1234"})"), true),
::arrow::field("geography",
test::geoarrow_wkb(R"({"crs": "srid:5678", "edges": "spherical"})"),
true)};

ASSERT_OK(ConvertSchema(arrow_fields));
ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(parquet_fields));
}

TEST_F(TestConvertArrowSchema, ParquetGeoArrowCrsProjjson) {
// Checks the conversion from GeoArrow that contains non-lon/lat PROJJSON
// to Parquet. Almost all GeoArrow types that arrive at the Parquet writer
// will have their CRS expressed in this way.
::arrow::ExtensionTypeGuard guard(test::geoarrow_wkb());

std::vector<std::shared_ptr<Field>> arrow_fields = {
::arrow::field("geometry", test::geoarrow_wkb(R"({"crs": {"key0": "value0"}})"),
true),
::arrow::field(
"geography",
test::geoarrow_wkb(R"({"crs": {"key1": "value1"}, "edges": "spherical"})"),
true)};

auto arrow_properties = default_arrow_writer_properties();
ASSERT_OK(ConvertSchema(arrow_fields, arrow_properties));

std::vector<NodePtr> parquet_fields;
parquet_fields.push_back(PrimitiveNode::Make(
"geometry", Repetition::OPTIONAL, LogicalType::Geometry(R"({"key0":"value0"})"),
ParquetType::BYTE_ARRAY));
parquet_fields.push_back(PrimitiveNode::Make(
"geography", Repetition::OPTIONAL, LogicalType::Geography(R"({"key1":"value1"})"),
ParquetType::BYTE_ARRAY));

ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(parquet_fields));
}
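
Note on the expected values above: the Arrow-side CRS `{"key0": "value0"}` is expected on the Parquet side as `{"key0":"value0"}`, so the PROJJSON object is written in compact form. The PR links RapidJSON and presumably re-serializes the parsed value; that is an assumption, since the conversion code is not in this excerpt. As a dependency-free stand-in, the sketch below removes insignificant whitespace outside string literals. `CompactJson` is a hypothetical name and the function does not validate its input.

```cpp
#include <cassert>
#include <string>
#include <string_view>

namespace sketch {

// Hypothetical: drops JSON whitespace that appears outside string literals.
// Escaped quotes inside strings are respected; input is assumed to be valid.
inline std::string CompactJson(std::string_view json) {
  std::string out;
  out.reserve(json.size());
  bool in_string = false;
  bool escaped = false;
  for (char c : json) {
    if (in_string) {
      out.push_back(c);  // everything inside a string is preserved verbatim
      if (escaped) {
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '"') {
        in_string = false;
      }
    } else if (c == '"') {
      in_string = true;
      out.push_back(c);
    } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
      out.push_back(c);
    }
  }
  return out;
}

}  // namespace sketch
```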

TEST_F(TestConvertArrowSchema, ParquetLists) {
std::vector<NodePtr> parquet_fields;
std::vector<std::shared_ptr<Field>> arrow_fields;