Conversation

wesm (Member) commented Nov 30, 2016

Automatically generating test files from Python.

wesm (Member Author) commented Nov 30, 2016

@julienledem there is some problem on the Java side causing record batches to be written to the Arrow file in the wrong order. To reproduce, see the file in this gist: https://gist.github.com/wesm/459d9d53983c7eb29df2bd4fa2cc5219

Now run these commands:

java -cp /home/wesm/code/arrow/java/tools/target/arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -a sample.arrow -j sample.json -c JSON_TO_ARROW

java -cp /home/wesm/code/arrow/java/tools/target/arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -a sample.arrow -j sample-roundtrip.json -c ARROW_TO_JSON

The returned sample-roundtrip.json has the record batches written in the opposite order:

https://gist.github.com/wesm/3e3403ae09ac97770fc6e5aa9bce95bc
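
To see the reversal concretely, here is a minimal sketch of how one might diff batch order between the two files (assuming the integration JSON layout, with a top-level "batches" list whose entries carry a "count"; the file names come from the commands above):

```py
import json

with open('sample.json') as f:
    original = json.load(f)
with open('sample-roundtrip.json') as f:
    roundtrip = json.load(f)

# Each entry in "batches" describes one record batch; compare the order of
# their row counts.
orig_counts = [batch['count'] for batch in original['batches']]
rt_counts = [batch['count'] for batch in roundtrip['batches']]
print(orig_counts)
print(rt_counts)
# With the bug described above, the round-tripped order comes back reversed:
assert rt_counts == list(reversed(orig_counts))
```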

wesm changed the title from "ARROW-394: [Integration] JSON generation code to fuzz test numeric types" to "ARROW-394: [Integration] JSON generation code to fuzz test more types" on Nov 30, 2016
wesm (Member Author) commented Nov 30, 2016

@julienledem I'm running into a difference between our implementations for arrays that have no nulls. In C++, we write buffer metadata with length 0. In Java, even when there are no nulls, a validity bitmap with all bits set to 1 is written out. Since I'm randomly generating data in this PR, it occasionally generates vectors without nulls. I would call this a bug, but I want to confirm that you agree.
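
For reference, here is a minimal sketch of the kind of randomized column generation involved (a hypothetical helper, not the actual integration_test.py code):

```py
import random

def generate_primitive_case(name, length, null_probability=0.2):
    # Draw a random validity mask; with short columns it is easy to end up
    # with no nulls at all.
    is_valid = [random.random() >= null_probability for _ in range(length)]
    values = [random.randint(-1000, 1000) if v else 0 for v in is_valid]
    # When every entry is valid, the C++ writer records a 0-length validity
    # buffer in the metadata, while Java still materializes an all-ones
    # bitmap -- the mismatch described above.
    return {
        'name': name,
        'count': length,
        'VALIDITY': [int(v) for v in is_valid],
        'DATA': values,
    }
```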

wesm (Member Author) commented Nov 30, 2016

Also cc @jacques-n.

wesm (Member Author) commented Nov 30, 2016

Rebased after ARROW-395. I'm going to add some more data types to the integration tests, but the empty bitmap issue is a blocker to moving forward.

wesm (Member Author) commented Dec 1, 2016

I created ARROW-398 for this.

wesm changed the title from "ARROW-394: [Integration] JSON generation code to fuzz test more types" to "ARROW-394: [Integration] Generate test cases for numeric types, lists, structs" on Dec 1, 2016
wesm force-pushed the ARROW-394 branch 2 times, most recently from 217bc60 to 4bace7f on December 2, 2016 15:48
wesm changed the title from "ARROW-394: [Integration] Generate test cases for numeric types, lists, structs" to "ARROW-394: [Integration] Generate test cases for numeric types, strings, lists, structs" on Dec 2, 2016
wesm (Member Author) commented Dec 5, 2016

I'll try to take a crack at one of these today if I can. Would be great to get this all closed out this week and plan for the 0.2 release.

wesm (Member Author) commented Dec 7, 2016

I tried to fix ARROW-400 myself but very quickly ran into issues I don't understand -- I commented in the JIRA.

…nicely. Add integration tests to Travis CI build matrix. Add ApproxEquals method for floating point comparisons. Add boolean, string, struct, list to generated JSON test cases.
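
The floating point comparison is the interesting piece there; exact equality is too strict once doubles round-trip through JSON text. A hypothetical helper in the spirit of that ApproxEquals method:

```py
import math

def approx_equal(left, right, epsilon=1e-5):
    # Nulls only compare equal to nulls; otherwise compare within a tolerance.
    if left is None or right is None:
        return left is right
    return math.isclose(left, right, rel_tol=epsilon, abs_tol=epsilon)

assert approx_equal(0.1 + 0.2, 0.3)
```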
wesm (Member Author) commented Dec 9, 2016

@julienledem unfortunately, the fixes to the above JIRAs have introduced more issues:

-- Java producing, C++ consuming
Testing with /home/wesm/code/arrow/integration/data/struct_example.json
Testing with /home/wesm/code/arrow/integration/data/simple.json
Testing with /tmp/tmp3pcm6k6f/dae57b606e734452b29c1179f6557fd7.json
Testing with /tmp/tmp3pcm6k6f/7571e15523864e2bbab476457f2cb733.json
Testing with /tmp/tmp3pcm6k6f/6a1cf54c8019480a8eef1fdfe6432a6e.json
Testing with /tmp/tmp3pcm6k6f/ce4f7bcee205441cb26930b894888e9f.json
Testing with /tmp/tmp3pcm6k6f/0480cd10809245f29cc89e226e4f21f3.json
Testing with /tmp/tmp3pcm6k6f/cc9deadcf69041cbb4e090d345b03990.json
Testing with /tmp/tmp3pcm6k6f/5c6577ab9c0e41ad8a688cad662981c8.json
Testing with /tmp/tmp3pcm6k6f/48e38ce2cbf64eb9bbd2a16a2719ae0f.json
Testing with /tmp/tmp3pcm6k6f/2a2aa1fd9a434f97aa2a1090a3352012.json
Testing with /tmp/tmp3pcm6k6f/455b9448600a4183aad2145ff53a9d0e.json
Testing with /tmp/tmp3pcm6k6f/5ee0455f245541d19d2ee88be9df463d.json
-- C++ producing, Java consuming
Testing with /home/wesm/code/arrow/integration/data/struct_example.json
Command failed: java -cp /home/wesm/code/arrow/java/tools/target/arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -a /tmp/tmp7bv1snbc/7762a6d2179a4934bc20a37a60c968db -j /home/wesm/code/arrow/integration/data/struct_example.json -c VALIDATE
With output:
--------------
13:41:19.100 [main] DEBUG i.n.u.i.l.InternalLoggerFactory - Using SLF4J as the default logging framework
13:41:19.108 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
13:41:19.108 [main] DEBUG i.n.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
13:41:19.109 [main] DEBUG i.n.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
13:41:19.109 [main] DEBUG i.n.util.internal.PlatformDependent0 - direct buffer constructor: available
13:41:19.109 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: available, true
13:41:19.109 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.DirectByteBuffer.<init>(long, int): available
13:41:19.110 [main] DEBUG io.netty.util.internal.Cleaner0 - java.nio.ByteBuffer.cleaner(): available
13:41:19.110 [main] DEBUG i.n.util.internal.PlatformDependent - Java version: 7
13:41:19.110 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noUnsafe: false
13:41:19.110 [main] DEBUG i.n.util.internal.PlatformDependent - sun.misc.Unsafe: available
13:41:19.110 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noJavassist: false
13:41:19.111 [main] DEBUG i.n.util.internal.PlatformDependent - Javassist: unavailable
13:41:19.111 [main] DEBUG i.n.util.internal.PlatformDependent - You don't have Javassist in your class path or you don't have enough permission to load dynamically generated classes.  Please check the configuration for better performance.
13:41:19.111 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
13:41:19.111 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
13:41:19.111 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
13:41:19.111 [main] DEBUG i.n.util.internal.PlatformDependent - io.netty.maxDirectMemory: 7481065472 bytes
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 16
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 16
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 11
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 16777216
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.tinyCacheSize: 512
13:41:19.111 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
13:41:19.112 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
13:41:19.112 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
13:41:19.112 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
13:41:19.125 [main] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.bytebuf.checkAccessible: true
13:41:19.127 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.level: simple
13:41:19.127 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.maxRecords: 4
13:41:19.131 [main] DEBUG i.n.util.ResourceLeakDetectorFactory - Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@73dfc95e
13:41:19.292 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Footer starts at 1408, length: 408
13:41:19.298 [main] DEBUG org.apache.arrow.tools.Integration - Arrow Input file size: 1826
13:41:19.299 [main] DEBUG org.apache.arrow.tools.Integration - ARROW schema: Schema<struct_nullable: Struct<f1: Int(32, true), f2: Utf8>>
13:41:19.299 [main] DEBUG org.apache.arrow.tools.Integration - JSON Input file size: 4846
13:41:19.299 [main] DEBUG org.apache.arrow.tools.Integration - JSON schema: Schema<struct_nullable: Struct<f1: Int(32, true), f2: Utf8>>
13:41:19.330 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacity.default: 32768
13:41:19.330 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxSharedCapacityFactor: 2
13:41:19.330 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.linkCapacity: 16
13:41:19.330 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.ratio: 8
13:41:19.332 [main] DEBUG i.n.util.internal.PlatformDependent - org.jctools-core.MpscChunkedArrayQueue: available
13:41:19.343 [main] DEBUG o.a.arrow.vector.file.ArrowReader - RecordBatch at 64, metadata: 256, body: 384
13:41:19.343 [main] DEBUG o.a.arrow.vector.file.ArrowReader - allocated buffer ArrowBuf[9], udle: [9 0..1024]
13:41:19.344 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 0, length: 64
13:41:19.344 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 64, length: 64
13:41:19.344 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 128, length: 64
13:41:19.344 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 192, length: 64
13:41:19.344 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 256, length: 64
13:41:19.344 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 320, length: 64
13:41:19.345 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 64
13:41:19.345 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 64
13:41:19.345 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 128, length: 64
13:41:19.345 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 192, length: 64
13:41:19.345 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 256, length: 64
13:41:19.345 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 320, length: 64
13:41:19.345 [main] DEBUG o.a.arrow.vector.file.ArrowReader - released buffer ArrowBuf[9], udle: [9 0..1024]
13:41:19.352 [main] DEBUG o.a.arrow.vector.file.ArrowReader - RecordBatch at 704, metadata: 256, body: 448
13:41:19.352 [main] DEBUG o.a.arrow.vector.file.ArrowReader - allocated buffer ArrowBuf[31], udle: [17 0..1024]
13:41:19.352 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 0, length: 64
13:41:19.352 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 64, length: 64
13:41:19.352 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 128, length: 64
13:41:19.353 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 192, length: 64
13:41:19.353 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 256, length: 128
13:41:19.353 [main] DEBUG o.a.arrow.vector.file.ArrowReader - Buffer in RecordBatch at 384, length: 64
13:41:19.353 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 64
13:41:19.353 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 64
13:41:19.353 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 128, length: 64
13:41:19.353 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 192, length: 64
13:41:19.353 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 256, length: 128
13:41:19.353 [main] DEBUG o.a.a.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 384, length: 64
13:41:19.353 [main] DEBUG o.a.arrow.vector.file.ArrowReader - released buffer ArrowBuf[31], udle: [17 0..1024]
Incompatible files
Could not load buffers for field f2: Utf8. error message: Buffer too large to resize to 44: 128
13:41:19.356 [main] ERROR org.apache.arrow.tools.Integration - Incompatible files
java.lang.IllegalArgumentException: Could not load buffers for field f2: Utf8. error message: Buffer too large to resize to 44: 128
	at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:84) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:94) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:63) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.tools.Integration$Command$3.execute(Integration.java:156) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.tools.Integration.run(Integration.java:212) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.tools.Integration.main(Integration.java:61) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
Caused by: java.lang.IllegalArgumentException: Buffer too large to resize to 44: 128
	at org.apache.arrow.vector.BaseDataValueVector.truncateBufferBasedOnSize(BaseDataValueVector.java:55) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.vector.NullableVarCharVector.loadFieldBuffers(NullableVarCharVector.java:142) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:82) ~[arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar:na]
	... 5 common frames omitted

--------------
Traceback (most recent call last):
  File "integration_test.py", line 667, in <module>
    run_all_tests(debug=args.debug)
  File "integration_test.py", line 657, in run_all_tests
    runner.run()
  File "integration_test.py", line 564, in run
    consumer.validate(json_path, arrow_path)
  File "integration_test.py", line 607, in validate
    return self._run(arrow_path, json_path, 'VALIDATE')
  File "integration_test.py", line 604, in _run
    return run_cmd(cmd)
  File "integration_test.py", line 78, in run_cmd
    raise e
  File "integration_test.py", line 70, in run_cmd
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/home/wesm/anaconda3/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/wesm/anaconda3/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-cp', '/home/wesm/code/arrow/java/tools/target/arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar', 'org.apache.arrow.tools.Integration', '-a', '/tmp/tmp7bv1snbc/7762a6d2179a4934bc20a37a60c968db', '-j', '/home/wesm/code/arrow/integration/data/struct_example.json', '-c', 'VALIDATE']' returned non-zero exit status 1

I'll create a JIRA with the offending Arrow file.
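
For context, my reading of the failure above, sketched out (the 44 and 128 come from the stack trace; the 64-byte padding rule is the C++ writer's alignment behavior, and this is a guess at the intent of the Java check, not its actual code):

```py
def pad_to_64(nbytes):
    # The C++ writer pads each buffer to a multiple of 64 bytes.
    return ((nbytes + 63) // 64) * 64

needed = 44    # bytes the Java loader computes for the Utf8 data buffer
written = 128  # size of the buffer actually present in the file
print(pad_to_64(needed))  # 64 -- a minimally padded buffer would be accepted
# truncateBufferBasedOnSize refuses to shrink a buffer this much larger than
# needed, hence "Buffer too large to resize to 44: 128".
```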

Change-Id: I751ca9ef598087a1e32bcb20f20d0b1da6ff3515
wesm (Member Author) commented Dec 9, 2016

After removing the max buffer padding check, the integration test suite passes locally for me:

$ python integration_test.py
-- Java producing, C++ consuming
Testing with /home/wesm/code/arrow/integration/data/struct_example.json
Testing with /home/wesm/code/arrow/integration/data/simple.json
Testing with /tmp/tmpqtpj5ole/0ea11fef218c49e4800942847eae40e4.json
Testing with /tmp/tmpqtpj5ole/b83aa62969d54107a15a4a9e04fb16b1.json
Testing with /tmp/tmpqtpj5ole/5868b1813dcb49ada7c9b50b2a9e437a.json
Testing with /tmp/tmpqtpj5ole/10af29c62bd8454b93977d62a2b19d1c.json
Testing with /tmp/tmpqtpj5ole/11ee04e8bdae4d4bafc6aeef199fd94f.json
Testing with /tmp/tmpqtpj5ole/04dbb647812e461aa00d20c013a6fe10.json
Testing with /tmp/tmpqtpj5ole/15fc3fecb37a42da9c313b5fddf33c04.json
Testing with /tmp/tmpqtpj5ole/9e304d6c653245a5bc329d620c060dba.json
Testing with /tmp/tmpqtpj5ole/85ecca29f44b4a0c977e8294b2de4a49.json
Testing with /tmp/tmpqtpj5ole/95d179be346348179e5b0403ab7dd8f7.json
Testing with /tmp/tmpqtpj5ole/3b1a276c292142158b20d34f064d3ec7.json
-- C++ producing, Java consuming
Testing with /home/wesm/code/arrow/integration/data/struct_example.json
Testing with /home/wesm/code/arrow/integration/data/simple.json
Testing with /tmp/tmpqtpj5ole/0ea11fef218c49e4800942847eae40e4.json
Testing with /tmp/tmpqtpj5ole/b83aa62969d54107a15a4a9e04fb16b1.json
Testing with /tmp/tmpqtpj5ole/5868b1813dcb49ada7c9b50b2a9e437a.json
Testing with /tmp/tmpqtpj5ole/10af29c62bd8454b93977d62a2b19d1c.json
Testing with /tmp/tmpqtpj5ole/11ee04e8bdae4d4bafc6aeef199fd94f.json
Testing with /tmp/tmpqtpj5ole/04dbb647812e461aa00d20c013a6fe10.json
Testing with /tmp/tmpqtpj5ole/15fc3fecb37a42da9c313b5fddf33c04.json
Testing with /tmp/tmpqtpj5ole/9e304d6c653245a5bc329d620c060dba.json
Testing with /tmp/tmpqtpj5ole/85ecca29f44b4a0c977e8294b2de4a49.json
Testing with /tmp/tmpqtpj5ole/95d179be346348179e5b0403ab7dd8f7.json
Testing with /tmp/tmpqtpj5ole/3b1a276c292142158b20d34f064d3ec7.json
-- All tests passed!

Let's see if I can get a green build on Travis CI.
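
For anyone following along, the rough shape of that producer/consumer matrix (a sketch, not the real integration_test.py):

```py
import itertools

def run_matrix(json_files, testers):
    # Each implementation produces an Arrow file from every JSON case and
    # every other implementation validates it, in both directions.
    for producer, consumer in itertools.permutations(testers, 2):
        print('-- {0} producing, {1} consuming'.format(producer.name, consumer.name))
        for json_path in json_files:
            print('Testing with {0}'.format(json_path))
            arrow_path = producer.json_to_arrow(json_path)
            consumer.validate(json_path, arrow_path)
```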

Change-Id: I061b5446f9f61a463a887b2737b0e143ee064c14
wesm (Member Author) commented Dec 9, 2016

@xhochy this is finally all working properly. Would you mind giving it a quick review?

wesm added 3 commits December 9, 2016 17:43
Change-Id: I38063b7e20bc63777b86b1a29090475d8d8037e3
Change-Id: I186b230da232b938821755f8ec9c909be3152876
Change-Id: I19064bedc9d5e47defde1071ae7980ba98895ffe
wesm (Member Author) commented Dec 10, 2016

Green build on my fork: https://travis-ci.org/wesm/arrow/builds/182761224. I'm going to merge and we can chase up more integration test improvements in subsequent JIRAs.

asfgit closed this in 45ed7e7 on Dec 10, 2016
wesm deleted the ARROW-394 branch on December 10, 2016 00:50
xhochy (Member) left a comment

LGTM

wesm added a commit to wesm/arrow that referenced this pull request Sep 8, 2018
…and metadata args

I also slightly refactored the test suite to use OpenFile rather than using the `ParquetFileReader` ctor directly (`OpenFile` wasn't being used in the test suite).

Needed for ARROW-471

Author: Wes McKinney <[email protected]>

Closes apache#219 from wesm/PARQUET-830 and squashes the following commits:

bd17192 [Wes McKinney] Add parquet::arrow::OpenFile with additional properties and metadata arguments

Change-Id: Ib00d04a9284b2108377a9ffac9faf8514b9e46cf
paddyroddy pushed a commit to rok/arrow that referenced this pull request Jul 19, 2025
* fix: `binary` overlap

* fix: Simplify list constructors, `_Ordered`

* refactor: Use `_Tz` default
rok added a commit to rok/arrow that referenced this pull request Jul 24, 2025
* Initial commit

* init project

* complete most of the annotations

* fix FixedSizeBufferWriter init annotation

* bump 10.0.1.2

* complete parquet core annotations

* bump 10.0.1.3

* re-export modules

* fix: add return type for foreign_buffer

* fix output_stream and read_message annotations

* ci: add release job

* pre-commit specify flake8 version to 5.0.4

* flake8 ignore F821 for private files

* optimize annotations

* bump 10.0.1.4

* if param supports IOBase, it should also support NativeFile

* bump 10.0.1.5

* pre-commit adds mypy lint

* bump 10.0.1.6

* fix ci name

* Remove version restrictions for Python.

* release 10.0.1.7

* update poetry ci

* Fix stubs for Table factory methods

The main problem was that these were annotated as instance methods rather than static/class methods, but I've added some detail, too.

* update pre-commit

* update

* fix: make fs.FileSystem.from_uri and hdfs.HadoopFileSystem.from_uri as classmethod

* fix: fix read_metadata and read_schema wrong annotations (#11)

* fix: typo S3FileSystem schema -> scheme (#12)

* bump version 10.0.1.8 (#13)

* . (#16)

* make DataType hashable (#22)

* pa.table support recordbatch (#20)

* RecordBatchStreamReader supports next (#18)

* add RecordBatch.to_pylist (#23)

* precise return types for to_pandas (#25)

* bump version 10.0.1.9 (#26)

* [pre-commit.ci] pre-commit autoupdate (#27)

* [pre-commit.ci] pre-commit autoupdate (#28)

* Fix types in FlightDescriptor class (#29)

* Fix types in FlightDescriptor class

* Add argument types

* chore: update pre-commit config (#30)

* build: use `pixi` to manage project (#31)

* chore: add taplo config (#32)

* chore: update LICENSE date (#33)

* doc: add CODE_OF_CONDUCT.md (#34)

* [pre-commit.ci] pre-commit autoupdate (#38)

* [pre-commit.ci] pre-commit autoupdate (#39)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.5.7 → v0.6.1](astral-sh/ruff-pre-commit@v0.5.7...v0.6.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (apache#48)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.6.1 → v0.6.2](astral-sh/ruff-pre-commit@v0.6.1...v0.6.2)
- [github.com/pre-commit/mirrors-mypy: v1.11.1 → v1.11.2](pre-commit/mirrors-mypy@v1.11.1...v1.11.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* refactor: rewrite type annotations by hand. (#35)

* chore: restart

* update ruff config

* build: add extra dependencies

* update mypy config

* feat: add util.pyi

* feat: add types.pyi

* feat: impl lib.pyi

* update

* feat: add acero.pyi

* feat: add compute.pyi

* add benchmark.pyi

* add cffi

* feat: add csv.pyi

* disable isort single line

* reformat

* update compute.pyi

* add _auzurefs.pyi

* add _cuda.pyi

* add _dataset.pyi

* rename _stub_typing.pyi -> _stubs_typing.pyi

* add _dataset_orc.pyi

* add pyarrow-stubs/_dataset_parquet_encryption.pyi

* add _dataset_parquet.pyi

* add _feather.pyi

* feat: add _flight.pyi

* add _fs.pyi

* add _gcsfs.pyi

* add _hdfs.pyi

* add _json.pyi

* add _orc.pyi

* add _parquet_encryption.pyi

* add _parquet.pyi

* update

* add _parquet.pyi

* add _s3fs.pyi

* add _substrait.pyi

* update

* update

* add parquet/core.pyi

* add parquet/encryption.pyi

* add BufferProtocol

* impl _filesystemdataset_write

* add dataset.pyi

* add feather.pyi

* add flight.pyi

* add fs.pyi

* add gandiva.pyi

* add json.pyi

* add orc.pyi

* add pandas_compat.pyi

* add substrait.pyi

* update util.pyi

* add interchange

* add __lib_pxi

* update __lib_pxi

* update

* update

* add types.pyi

* feat: add scalar.pyi

* update types.pyi

* update types.pyi

* update scalar.pyi

* update

* update

* update

* update

* update

* update

* feat: impl array

* feat: add builder.pyi

* add scipy

* add tensor.pyi

* feat: impl NativeFile

* update io.pyi

* complete io.pyi

* add ipc.pyi

* mv benchmark.pyi into __lib_pxi

* add table.pyi

* do re-export in lib.pyi

* fix io.pyi

* update

* optimize scalar.pyi

* optimize indices

* complete ipc.pyi

* update

* fix NullableIterable

* fix string array

* ignore overload-overlap error

* fix _Tabular.__getitem__

* remove additional_dependencies

* remove check-mypy.sh (apache#49)

* release 20240828 (apache#50)

* fix release tag (apache#51)

* ci: install hatch by pip (apache#52)

* ci: fix hatch keyring (apache#53)

* ci: use Release environment (apache#54)

* remove Scalar generic type var _IsValid (apache#56)

* remove Scalar generic type var _IsValid

* make Array, Scalar, Types generic type var as covariant type (apache#57)

* remove Field generic type var _Nullable (apache#58)

* remove Field generic type var _Nullable

* fix: pa.dictionary and pa.schema annotation (apache#59)

* fix pa.dictionary annotation

* fix: schema annotation

* release new version (apache#60)

* [pre-commit.ci] pre-commit autoupdate (apache#62)

* release: 2024.9.3 (apache#63)

use new date release format %Y.%m.%d

* support pyarrow compute funcs (apache#61)

* update compute.pyi

* impl Aggregation funcs

* impl arithmetic

* impl bit-wise functions

* impl rounding functions

* optimize annotation

* impl logarithmic functions

* update

* impl comparisons funcs

* impl logical funcs

* impl string predicates and transforms

* impl string padding

* impl string trimming

* impl string splitting and component extraction

* impl string joining and slicing

* impl Containment tests

* impl Categorizations

* impl Structural transforms

* impl Conversions

* impl Temporal component extraction

* impl random, Timezone handling

* impl Array-wise functions

* fix timestamp scalar

* support build array with list of scalar (apache#64)

* release 2024.9.4 (apache#65)

* Version follows the version of pyarrow (apache#66)

* import parquet.core into parquet __init__.py (apache#67)

Update __init__.pyi

* release 17.1 (apache#69)

* fix: add missing submodule benchmark, csv and cuda (apache#71)

* release 17.2 (apache#72)

* fix: from_pylist covariance (apache#73)

* [pre-commit.ci] pre-commit autoupdate (apache#74)

* Fix return type for middleware factory's start_call (apache#75)

It can return None if middleware is not needed for a given call.

* release 17.3 (apache#76)

* fix: add missing return type in FlightDescriptor static methods (apache#80)

* Support Tabular filter with Expression (apache#81)

support Tabular filter with Expression

* Support compute functions to accept Expression as parameter (apache#82)

* fix: Fix the return value of Expression comparison (apache#83)

* release 17.4 (apache#84)

* fix: fix the array return type (apache#89)

* a few type improvements, mostly flight related (apache#90)

* FlightError.extra_info -> bytes

* annotate FlightStreamReader.cancel return

* BasicAuth serialize/deserialize

* RecordBatchFileReader.schema

* actually str | bytes

* add_type_to_Field (apache#87)

* add_type_to_Field

* Field.type should return the covariant DataType

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* Support fsspec.AbstractFileSystem (apache#88)

* supported_filesystem

* fixes

* remove unused import

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* release 17.5 (apache#91)

* [pre-commit.ci] pre-commit autoupdate (apache#95)

* fix: parquet not accepting NativeFile (apache#98)

* feat: support pa.Buffer buffer protocol (apache#99)

* feat: Support `compute` functions to accept ChunkedArray. (apache#100)

* release 17.6 (apache#101)

* [pre-commit.ci] pre-commit autoupdate (apache#102)

* working towards making return signatures only have one type (mean and exp) (apache#105)

* group_by_returns_TableGroupBy

* return_single_type_for_mean_exp

* revert table.pyi

* compute.mean does not support BinaryScalar or BinaryArray

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* a table group_by was returning Self but should return TableGroupBy (apache#104)

group_by_returns_TableGroupBy

* [pre-commit.ci] pre-commit autoupdate (apache#106)

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.6.0...v5.0.0)
- [github.com/astral-sh/ruff-pre-commit: v0.6.7 → v0.6.9](astral-sh/ruff-pre-commit@v0.6.7...v0.6.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: RecordBatch missing `from_arrays` and `from_pandas` (apache#108)

* release 17.7 (apache#109)

* fix_combine_chunks (apache#110)

* make Self backward compatible (apache#115)

* fix: update ConvertOptions (apache#114)

* add type property to Array (apache#112)

* add type property to Array

* Array.type should return covariant

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* release 17.8 (apache#117)

* Add include_columns parameter in ConvertOptions (apache#118)

* add list[str] overload to rename_columns (apache#119)

* release 17.9 (apache#120)

* [pre-commit.ci] pre-commit autoupdate (apache#124)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.6.9 → v0.7.0](astral-sh/ruff-pre-commit@v0.6.9...v0.7.0)
- [github.com/pre-commit/mirrors-mypy: v1.11.2 → v1.12.1](pre-commit/mirrors-mypy@v1.11.2...v1.12.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* improve type annotations for parquet writer (apache#125)

Add support for per-field compression specification
Add missing none compression value.

* Add missing return type for Schema.serialize (apache#123)

* Add `Schema.field(int)` (apache#122)

* Change various io related functions to support `StrPath` as a path input (apache#121)

* Change various io related functions to support StrPath as a path input

* fmt

* Added StrPath | IO for feather types

* fix type hint for sort_by (apache#130)

sort_by takes str or list[tuple(name, order)] as its argument, where str is a field name, not a sort order

* metadata on a schema can be passed as str (apache#128)

For details see https://github.com/apache/arrow/blob/apache-arrow-17.0.0/python/pyarrow/types.pxi\#L2053-L2056

* Correct typevars for DictionaryType, MapType, RunEncodedType (apache#126)

Correct type hints for Dictionary, RunEndEncoded and Map

Signed-off-by: Jonas Dedden <[email protected]>
Co-authored-by: ZhengYu, Xu <[email protected]>

* Add some more StrPath io parts that were overlooked. (apache#131)

* Add some more StrPath io parts that were overlooked.

Additionally, add the utility typealias `SingleOrList` that can be used in places where we want a concise type declaration but there is a large union of types.

* write_dataset(base_dir = ) can also take Path

* Support ChunkedArray in add/append methods in Table (apache#129)

* Add missing partitioning typing case (apache#132)

This should now support the examples in the docstring for partitioning.

* fix: typo 'permissive' instead of 'premissive' (apache#133)

* release 17.10 (apache#134)

* fix incorrect type hints for compute.sort_indices (apache#135)

* disallow passing `names` as an argument to table when using dictionaries (apache#137)

* [pre-commit.ci] pre-commit autoupdate (apache#138)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.0 → v0.7.1](astral-sh/ruff-pre-commit@v0.7.0...v0.7.1)
- [github.com/pre-commit/mirrors-mypy: v1.12.1 → v1.13.0](pre-commit/mirrors-mypy@v1.12.1...v1.13.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add missing type for FlightEndpoint (apache#136)

* release 17.11 (apache#139)

* [pre-commit.ci] pre-commit autoupdate (apache#140)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.1 → v0.7.2](astral-sh/ruff-pre-commit@v0.7.1...v0.7.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (apache#142)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.2 → v0.7.3](astral-sh/ruff-pre-commit@v0.7.2...v0.7.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* chore: Create FUNDING.yml (apache#143)

Create FUNDING.yml

* fix: `read_schema` should return Schema (apache#145)

fix: read_schema should return Schema

* release 17.12 (apache#146)

* [pre-commit.ci] pre-commit autoupdate (apache#147)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.3 → v0.7.4](astral-sh/ruff-pre-commit@v0.7.3...v0.7.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: `to_table` argument `columns` can be a dict of expressions (apache#149)

* [pre-commit.ci] pre-commit autoupdate (apache#148)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.4 → v0.8.1](astral-sh/ruff-pre-commit@v0.7.4...v0.8.1)

* ruff: ignore PYI063

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZhengYu, Xu <[email protected]>

* release 17.13 (apache#151)

* fix: FileSystem metadata value should be str (apache#152)

* fix: FileSystemHandler metadata value should be str (apache#153)

* [pre-commit.ci] pre-commit autoupdate (apache#154)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.1 → v0.8.2](astral-sh/ruff-pre-commit@v0.8.1...v0.8.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* improve coverage for pyarrow.struct typehint (apache#157)

* fix: ipc typing (apache#159)

* release 17.14 (apache#160)

* fix: add missing param 'nbytes' to NativeFile.read (apache#163)

* release 17.15 (apache#164)

* [pre-commit.ci] pre-commit autoupdate (apache#161)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.2 → v0.8.3](astral-sh/ruff-pre-commit@v0.8.2...v0.8.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add 'None' as a valid argument for partitioning to the various parquet reading functions (apache#166)

* [pre-commit.ci] pre-commit autoupdate (apache#165)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](astral-sh/ruff-pre-commit@v0.8.3...v0.8.6)
- [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](pre-commit/mirrors-mypy@v1.13.0...v1.14.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: should use Collection[Array] instead list[Array] (apache#170)

"List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
Consider using "Sequence" instead, which is covariant

* fix: update type hints for path_or_paths and source parameters in ParquetDataset and read_table (apache#171)

* [pre-commit.ci] pre-commit autoupdate (apache#167)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.6 → v0.9.1](astral-sh/ruff-pre-commit@v0.8.6...v0.9.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 17.16 (apache#172)

* Fixed pa.fixed_shape_tensor (apache#175)

* [pre-commit.ci] pre-commit autoupdate (apache#173)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.9.4](astral-sh/ruff-pre-commit@v0.9.1...v0.9.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: Preserve generic in `ChunkedArray.type` (apache#177)

* release 17.17 (apache#178)

* [pre-commit.ci] pre-commit autoupdate (apache#176)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.4 → v0.9.6](astral-sh/ruff-pre-commit@v0.9.4...v0.9.6)
- [github.com/pre-commit/mirrors-mypy: v1.14.1 → v1.15.0](pre-commit/mirrors-mypy@v1.14.1...v1.15.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: support to construct ListArray with primitive type (apache#179)

* fix: Avoid `chunked_array` overlapping overloads (apache#183)

* fix: Add placeholder annotations to `pc.if_else` (apache#182)

* fix: Widen `Array` to `Array | ChunkedArray` (apache#181)

* fix: add `pc.fill_null` (apache#185)

- https://arrow.apache.org/docs/python/generated/pyarrow.compute.fill_null.html
- https://github.com/narwhals-dev/narwhals/blob/05e47b27ebe27b24196cee5956d07748d65a62ee/narwhals/_arrow/series.py#L675

* fix: Allow Table.from_arrays to take a list containing a mix of Array and ChunkedArray (apache#187)

Update table.pyi

* release 17.18 (apache#188)

* [pre-commit.ci] pre-commit autoupdate (apache#180)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.6 → v0.9.10](astral-sh/ruff-pre-commit@v0.9.6...v0.9.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: from_arrays for both Table and RecordBatch (apache#189)

* fix: resolve some `pa.compute` overlaps (apache#184)

* fix: resolve overlapping `compute.(add|divide)`

* fix: copy from non-cloned signature

* fix: resolve overlapping `compute.exp`

* fix: resolve overlapping `compute.power`

* fix: resolve overlapping `compute.equal`

* fix: resolve overlapping `compute.and_`

* fix: Include `Array` in `chunked_array` overload (apache#190)

narwhals-dev/narwhals@0237f7a

* release 17.19 (apache#191)

* Add Scalar, Array and Type classes for Json & Uuid (apache#194)

* Add Scalar, Array and Type classes for Json & Uuid

* Formatting fixes

* [pre-commit.ci] pre-commit autoupdate (apache#192)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.10 → v0.11.2](astral-sh/ruff-pre-commit@v0.9.10...v0.11.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add Scalar, Array and Type classes for Json & Uuid" (apache#195)

Revert "Add Scalar, Array and Type classes for Json & Uuid (apache#194)"

This reverts commit 8f77909.

* fix: Add missing `pc.equal` overload (apache#196)

* feat: support pyarrow 19.0 (apache#198)

* build: upgrade pyarrow min version to 19.0

* feat: support pyarrow 19.0

* omit mypy bool8 override error

* fix: reexport new types (apache#199)

* feat: override new patterns for func repeat and nulls (apache#200)

* fix: reexport decimal64 array and decimal128 array

* feat: override new patterns for func `repeat` and `nulls`

* release: 19.1 (apache#201)

* fix: Allow `Iterable[Table]` in `concat_tables` (apache#203)

https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html

> tables : iterable of pyarrow.Table objects

* fix: Allow `ChunkedArray[BooleanScalar]` in `pc.invert` (apache#204)

Fixes https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L298-L299

* feat: Fully spec `TableGroupBy.aggregate` (apache#197)

## Related
- https://arrow.apache.org/docs/python/compute.html#grouped-aggregations
- https://arrow.apache.org/docs/python/generated/pyarrow.TableGroupBy.html#pyarrow.TableGroupBy.aggregate
- https://github.com/apache/arrow/blob/34a984c842db42b409a1359e6e2cf167a2365a48/python/pyarrow/table.pxi#L6578-L6604

* fix: Add missing return type to `ChunkedArray.filter` (apache#205)

* fix: Add relaxed final overload to logical functions (apache#206)

Covers all of `pc.(and_ | and_kleene | and_not | and_not_kleene | or_ | or_kleene | xor)`

Resolves:
- https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L219-L233
- https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L662

* fix: Allow `ChunkedArray` in `Table.set_column` (apache#211)

Also being more consistent with `ArrayOrChunkedArray[Any]` everywhere

Discovered in
- https://github.com/vega/vega-datasets/blob/343b7101391a81190ba24e1e8d62a381d2fef3bd/scripts/species.py#L798-L799

* chore: Ignore `fsspec` `[import-untyped]` (apache#210)

```py
_fs.pyi:18: error: Skipping analyzing "fsspec": module is installed, but missing library stubs or py.typed marker  [import-untyped]
_fs.pyi:18: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 64 source files)
```

- fsspec/filesystem_spec#625
- fsspec/filesystem_spec#1676

* feat: Convert `types.is_*` into `TypeIs` guards (apache#215)

* chore: Add `types.__all__`

* feat: Convert `types._is_*` into `TypeIs` guards

I've been using this for a little while, but makes more sense to live in the stubs
https://github.com/narwhals-dev/narwhals/blob/16427440e6d74939c403083b52ce3fb0af7d63c7/narwhals/_arrow/utils.py#L44-L67

* fix: Resolve `bit_wise_and` overlaps (apache#214)

Fixes 3 errors:
```py
compute.pyi:608:5 - error: Overload 1 for "bit_wise_and" overlaps overload 4 and returns an incompatible type (reportOverlappingOverload)
compute.pyi:608:5 - error: Overload 1 for "bit_wise_and" overlaps overload 5 and returns an incompatible type (reportOverlappingOverload)
compute.pyi:620:5 - error: Overload 3 for "bit_wise_and" will never be used because its parameters overlap overload 1 (reportOverlappingOverload)
```

* fix: Resolve `list_*` overlapping overloads (apache#213)

* fix: Resolve `list_value_length` overlaps

* fix: Resolve `list_element` overlaps

* fix: Resolve `list_(flatten|slice|parent_indices)` overlaps

An improvement, but still not that accurate

* fix: Include `VarianceOptions` in `TableGroupBy.aggregate` (apache#212)

- Follow-up to apache#197
- Noticed while writing up (narwhals-dev/narwhals#2385)
  - We already use it for `std`, `var` in https://github.com/narwhals-dev/narwhals/blob/16427440e6d74939c403083b52ce3fb0af7d63c7/narwhals/_arrow/group_by.py#L81-L82

* [pre-commit.ci] pre-commit autoupdate (apache#202)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.2 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.2...v0.11.5)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: Resolve `Scalar.as_py` warnings for `DictionaryType` (apache#207)

> scalar.pyi:75:20 - warning: TypeVar "_AsPyTypeK" appears only once in generic function signature
>     Use "object" instead (reportInvalidTypeVarUse)
> scalar.pyi:85:20 - warning: TypeVar "_AsPyTypeK" appears only once in generic function signature
>     Use "object" instead (reportInvalidTypeVarUse)

Instead just using `int`, which should be all that is possible from:
https://github.com/zen-xu/pyarrow-stubs/blob/02552b81161d19d4aa71d8656b028eefac84612b/pyarrow-stubs/__lib_pxi/types.pyi#L154-L164

https://github.com/zen-xu/pyarrow-stubs/blob/02552b81161d19d4aa71d8656b028eefac84612b/pyarrow-stubs/__lib_pxi/types.pyi#L63-L70

* fix: Add default to `pc.sort_indices` (apache#216)

* fix: Add default to `pc.sort_indices`

Fixes narwhals-dev/narwhals#2390 (comment)

Default is specified in https://arrow.apache.org/docs/python/generated/pyarrow.compute.sort_indices.html

* refactor: Reuse some aliases

* fix: Allow `list_size` with `Field` in `pa.list_` (apache#218)

Closes apache#217

* allow `Table` or `RecordBatch` for dataset (apache#222)

allow source argument pyarrow.dataset.dataset() to be RecordBatch | Table

* refactor: Simplify `types` overloads (apache#219)

* fix: `binary` overlap

* fix: Simplify list constructors, `_Ordered`

* refactor: Use `_Tz` default

* fix: iter ChunkedArray should return scalar value (apache#224)

* release: 19.2 (apache#225)

* fix: Add missing `DictionaryArray` methods/properties (apache#226)

## Docs
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.indices
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary_decode
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary_encode

## Fixes
- https://github.com/narwhals-dev/narwhals/blob/c23e56c56630761f0fbc58b575a1c987e57d58d5/narwhals/_arrow/series.py#L787-L798
- https://github.com/narwhals-dev/narwhals/blob/c23e56c56630761f0fbc58b575a1c987e57d58d5/narwhals/_arrow/series_cat.py#L14-L18

* chore: use pyright as static type checker (apache#227)

* use pyright as static type checker

* make pyright happy

* fix: fix pyright action (apache#229)

fix github ci

* fix: Match runtime behavior of `(Table|RecordBatch).select` (apache#221)

* fix: Match runtime behavior of `(Table|RecordBatch).select`

## Resolves
- https://github.com/MarcoGorelli/narwhals/blob/5b02b592183b8d39e2d32e0aedd6c234bb22d405/narwhals/_arrow/dataframe.py#L305-L307
- https://github.com/MarcoGorelli/narwhals/blob/5b02b592183b8d39e2d32e0aedd6c234bb22d405/narwhals/_arrow/dataframe.py#L285-L294

## Description
Following up on what I thought was a simple stub issue, but we're both *too strict* and *too permissive* in different ways

## Examples
{placeholder}

## Related
- https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/python/pyarrow/table.pxi#L4367-L4374
- https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/python/pyarrow/table.pxi#L1721-L1739

* update select

* update select

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* [pre-commit.ci] pre-commit autoupdate (apache#220)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.5...v0.11.8)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat: narrow scalar when type is given (apache#230)

* rename Uint -> UInt

* feat: narrow scalar when type is given

* release 19.3 (apache#231)

* chore: pyright use strict mode (apache#233)

* fix types

* update array.pyi

* update scalar.pyi

* update

* update array

* update array

* optimize chunked_array

* optimizer iterchunks

* update

* update pyproject.toml

* fix: pa.nulls accept type rather than types (apache#234)

* [pre-commit.ci] pre-commit autoupdate (apache#232)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.8 → v0.11.9](astral-sh/ruff-pre-commit@v0.11.8...v0.11.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 19.4 (apache#235)

* lint(pyright): disable reportUnknownMemberType (apache#239)

* [pre-commit.ci] pre-commit autoupdate (apache#236)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.9 → v0.11.13](astral-sh/ruff-pre-commit@v0.11.9...v0.11.13)
- [github.com/RobertCraigie/pyright-python: v1.1.400 → v1.1.401](RobertCraigie/pyright-python@v1.1.400...v1.1.401)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat: support pyarrow 20.0 (apache#240)

* [pre-commit.ci] pre-commit autoupdate (apache#241)

updates:
- [github.com/RobertCraigie/pyright-python: v1.1.401 → v1.1.402](RobertCraigie/pyright-python@v1.1.401...v1.1.402)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* support docstring (apache#242)

* doc: complete tensor doc

* doc: complete table doc

* doc: complete scalar doc

* doc: complete orc doc

* doc: complete memory doc

* doc: complete lib doc

* doc: complete json doc

* doc: complete hdfs doc

* doc: complete gcsfs doc

* doc: complete fs doc

* doc: complete flight doc

* doc: complete dataset doc

* doc: complete dataset parquet doc

* doc: complete dataset parquet encryption doc

* doc: complete cuda doc

* doc: complete csv doc

* doc: complete azurefs doc

* doc: complete core doc

* doc: complete interchange doc

* doc: complete array doc

* doc: complete builder doc

* doc: complete device doc

* doc: complete io doc

* doc: complete ipc doc

* doc: complete types doc

* mark deprecated apis

* doc: complete _compute doc

* doc: complete compute doc

* doc: update compute doc

* lint code

* release 20.0.0.20250618 (apache#243)

* fix: make ParquetFileFormat constructor args optional (apache#244)

* fix: Field.remove_metadata should return Self (apache#246)

* [pre-commit.ci] pre-commit autoupdate (apache#245)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.13 → v0.12.0](astral-sh/ruff-pre-commit@v0.11.13...v0.12.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 20.0.0.20250627 (apache#247)

* fix: chunked_array with type should be specified (apache#250)

* [pre-commit.ci] pre-commit autoupdate (apache#248)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.0 → v0.12.3](astral-sh/ruff-pre-commit@v0.12.0...v0.12.3)
- [github.com/RobertCraigie/pyright-python: v1.1.402 → v1.1.403](RobertCraigie/pyright-python@v1.1.402...v1.1.403)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 20.0.0.20250715 (apache#251)

* fix: The type parameter of array should be covariant (apache#253)

* release 20.0.0.20250716 (apache#254)

* Add py.typed file to signify that the library is typed

See the relevant PEP https://peps.python.org/pep-0561

* Prepare `pyarrow-stubs` for history merging

MINOR: [Python] Prepare `pyarrow-stubs` for history merging

Co-authored-by: ZhengYu, Xu <[email protected]>

* Add `ty` configuration and suppress error codes

* One line per rule

* Add licence header from original repo for all `.pyi` files

* Revert "Add licence header from original repo for all `.pyi` files"

This reverts commit 1631f39.

* Prepare for licence merging

* Exclude `stubs` from `rat` test

* Add Apache licence clause to `py.typed`

* Reduce list

* Resolve merge conflict

---------

Signed-off-by: Jonas Dedden <[email protected]>
Co-authored-by: ZhengYu, Xu <[email protected]>
Co-authored-by: Jim Bosch <[email protected]>
Co-authored-by: Oliver Mannion <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eugene Toder <[email protected]>
Co-authored-by: fvankrieken <[email protected]>
Co-authored-by: Ilia Ablamonov <[email protected]>
Co-authored-by: Mathias Beguin <[email protected]>
Co-authored-by: Dylan Scott <[email protected]>
Co-authored-by: deanm0000 <[email protected]>
Co-authored-by: Jan Moravec <[email protected]>
Co-authored-by: Marius van Niekerk <[email protected]>
Co-authored-by: Jonas Dedden <[email protected]>
Co-authored-by: Fábio D. Batista <[email protected]>
Co-authored-by: ben-freist <[email protected]>
Co-authored-by: Jiahao Yuan <[email protected]>
Co-authored-by: Pim de Haan <[email protected]>
Co-authored-by: Dan Redding <[email protected]>
Co-authored-by: Tom Crasset <[email protected]>
Co-authored-by: Tom McTiernan <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
rok added a commit to rok/arrow that referenced this pull request Jul 24, 2025
* Initial commit

* init project

* complete most of the annotations

* fix FixedSizeBufferWriter init annotation

* bump 10.0.1.2

* complete parquet core annotations

* bump 10.0.1.3

* re-export modules

* fix: add return type for foreign_buffer

* fix output_stream and read_message annotations

* ci: add release job

* pre-commit specify flake8 version to 5.0.4

* flake8 ignore F821 for private files

* optimize annotations

* bump 10.0.1.4

* if param supports IOBase, it should also support NativeFile

* bump 10.0.1.5

* pre-commit adds mypy lint

* bump 10.0.1.6

* fix ci name

* Remove version restrictions for Python.

* release 10.0.1.7

* update poetry ci

* Fix stubs for Table factory methods

The main problem was that these were annotated as instance methods rather than static/class methods, but I've added some detail, too.

* update pre-commit

* update

* fix: make fs.FileSystem.from_uri and hdfs.HadoopFileSystem.from_uri as classmethod

* fix: fix read_metadata and read_schema wrong annotations (#11)

* fix: typo S3FileSystem schema -> scheme (#12)

* bump version 10.0.1.8 (#13)

* . (#16)

* make DataType hashable (#22)

* pa.table support recordbatch (#20)

* RecordBatchStreamReader supports next (#18)

* add RecordBatch.to_pylist (#23)

* precise return types for to_pandas (#25)

* bump version 10.0.1.9 (#26)

* [pre-commit.ci] pre-commit autoupdate (#27)

* [pre-commit.ci] pre-commit autoupdate (#28)

* Fix types in FlightDescriptor class (#29)

* Fix types in FlightDescriptor class

* Add argument types

* chore: update pre-commit config (#30)

* build: use `pixi` to manage project (#31)

* chore: add taplo config (#32)

* chore: update LICENSE date (#33)

* doc: add CODE_OF_CONDUCT.md (#34)

* [pre-commit.ci] pre-commit autoupdate (#38)

* [pre-commit.ci] pre-commit autoupdate (#39)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.5.7 → v0.6.1](astral-sh/ruff-pre-commit@v0.5.7...v0.6.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (apache#48)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.6.1 → v0.6.2](astral-sh/ruff-pre-commit@v0.6.1...v0.6.2)
- [github.com/pre-commit/mirrors-mypy: v1.11.1 → v1.11.2](pre-commit/mirrors-mypy@v1.11.1...v1.11.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* refactor: rewrite type annotations by hand. (#35)

* chore: restart

* update ruff config

* build: add extra dependencies

* update mypy config

* feat: add util.pyi

* feat: add types.pyi

* feat: impl lib.pyi

* update

* feat: add acero.pyi

* feat: add compute.pyi

* add benchmark.pyi

* add cffi

* feat: add csv.pyi

* disable isort single line

* reformat

* update compute.pyi

* add _auzurefs.pyi

* add _cuda.pyi

* add _dataset.pyi

* rename _stub_typing.pyi -> _stubs_typing.pyi

* add _dataset_orc.pyi

* add pyarrow-stubs/_dataset_parquet_encryption.pyi

* add _dataset_parquet.pyi

* add _feather.pyi

* feat: add _flight.pyi

* add _fs.pyi

* add _gcsfs.pyi

* add _hdfs.pyi

* add _json.pyi

* add _orc.pyi

* add _parquet_encryption.pyi

* add _parquet.pyi

* update

* add _parquet.pyi

* add _s3fs.pyi

* add _substrait.pyi

* update

* update

* add parquet/core.pyi

* add parquet/encryption.pyi

* add BufferProtocol

* impl _filesystemdataset_write

* add dataset.pyi

* add feather.pyi

* add flight.pyi

* add fs.pyi

* add gandiva.pyi

* add json.pyi

* add orc.pyi

* add pandas_compat.pyi

* add substrait.pyi

* update util.pyi

* add interchange

* add __lib_pxi

* update __lib_pxi

* update

* update

* add types.pyi

* feat: add scalar.pyi

* update types.pyi

* update types.pyi

* update scalar.pyi

* update

* update

* update

* update

* update

* update

* feat: impl array

* feat: add builder.pyi

* add scipy

* add tensor.pyi

* feat: impl NativeFile

* update io.pyi

* complete io.pyi

* add ipc.pyi

* mv benchmark.pyi into __lib_pxi

* add table.pyi

* do re-export in lib.pyi

* fix io.pyi

* update

* optimize scalar.pyi

* optimize indices

* complete ipc.pyi

* update

* fix NullableIterable

* fix string array

* ignore overload-overlap error

* fix _Tabular.__getitem__

* remove additional_dependencies

* remove check-mypy.sh (apache#49)

* release 20240828 (apache#50)

* fix release tag (apache#51)

* ci: install hatch by pip (apache#52)

* ci: fix hatch keyring (apache#53)

* ci: use Release environment (apache#54)

* remove Scalar generic type var _IsValid (apache#56)

* remove Scalar generic type var _IsValid

* make Array, Scalar, Types generic type var as covariant type (apache#57)

* remove Field generic type var _Nullable (apache#58)

* remove Field generic type var _Nullable

* fix: pa.dictionary and pa.schema annotation (apache#59)

* fix pa.dictionary annotation

* fix: schema annotation

* release new version (apache#60)

* [pre-commit.ci] pre-commit autoupdate (apache#62)

* release: 2024.9.3 (apache#63)

use new date release format %Y.%m.%d

* support pyarrow compute funcs (apache#61)

* update compute.pyi

* impl Aggregation funcs

* impl arithmetic

* imit bit-wise functions

* imit rounding functions

* optimize annotation

* impl logarithmic functions

* update

* impl comparisons funcs

* impl logical funcs

* impl string predicates and transforms

* impl string padding

* impl string trimming

* impl string splitting and component extraction

* impl string joining and slicing

* impl Containment tests

* impl Categorizations

* impl Structural transforms

* impl Conversions

* impl Temporal component extraction

* impl random, Timezone handling

* impl Array-wise functions

* fix timestamp scalar

* support build array with list of scalar (apache#64)

* release 2024.9.4 (apache#65)

* Version follows the version of pyarrow (apache#66)

* import parquet.core into parquet __init__.py (apache#67)

Update __init__.pyi

* release 17.1 (apache#69)

* fix: add missing submodule benchmark, csv and cuda (apache#71)

* release 17.2 (apache#72)

* fix: from_pylist covariance (apache#73)

* [pre-commit.ci] pre-commit autoupdate (apache#74)

* Fix return type for middleware factory's start_call (apache#75)

It can return None if middleware is not needed for a given call.

* release 17.3 (apache#76)

* fix: add missing return type in FlightDescriptor static methods (apache#80)

* Support Tabular filter with Expression (apache#81)

support Tabular filter with Expression

* Support compute functions to accept Expression as parameter (apache#82)

* fix: Fix the return value of Expression comparison (apache#83)

* release 17.4 (apache#84)

* fix: fix the array return type (apache#89)

* a few type improvements, mostly flight related (apache#90)

* FlightError.extra_info -> bytes

* annotate FlightStreamReader.cancel return

* BasicAuth serialize/deserialize

* RecordBatchFileReader.schema

* actually str | bytes

* add_type_to_Field (apache#87)

* add_type_to_Field

* Field.type should return the covariant DataType

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* Support fsspec.AbstractFileSystem (apache#88)

* supported_filesystem

* fixes

* remove unused import

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* release 17.5 (apache#91)

* [pre-commit.ci] pre-commit autoupdate (apache#95)

* fix: parquet not accepting NativeFile (apache#98)

* feat: support pa.Buffer buffer protocol (apache#99)

* feat: Support `compute` functions to accept ChunkedArray. (apache#100)

* release 17.6 (apache#101)

* [pre-commit.ci] pre-commit autoupdate (apache#102)

* working towards making return signatures only have one type (mean and exp) (apache#105)

* group_by_returns_TableGroupBy

* return_single_type_for_mean_exp

* revert table.pyi

* compute.mean does not support BinaryScalar or BinaryArray

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* a table group_by was returning Self but should return TableGroupBy (apache#104)

group_by_returns_TableGroupBy

* [pre-commit.ci] pre-commit autoupdate (apache#106)

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.6.0...v5.0.0)
- [github.com/astral-sh/ruff-pre-commit: v0.6.7 → v0.6.9](astral-sh/ruff-pre-commit@v0.6.7...v0.6.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: RecordBatch missing `from_arrays` and `from_pandas` (apache#108)

* release 17.7 (apache#109)

* fix_combine_chunks (apache#110)

* make Self backward compatible (apache#115)

* fix: update ConvertOptions (apache#114)

* add type property to Array (apache#112)

* add type property to Array

* Array.type should return covariant

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* release 17.8 (apache#117)

* Add include_columns parameter in ConvertOptions (apache#118)

* add list[str] overload to rename_columns (apache#119)

* release 17.9 (apache#120)

* [pre-commit.ci] pre-commit autoupdate (apache#124)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.6.9 → v0.7.0](astral-sh/ruff-pre-commit@v0.6.9...v0.7.0)
- [github.com/pre-commit/mirrors-mypy: v1.11.2 → v1.12.1](pre-commit/mirrors-mypy@v1.11.2...v1.12.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* improve type annotations for parquet writer (apache#125)

Add support for per-field compression specification
Add missing none compression value.

* Add missing return type for Schema.serialize (apache#123)

* Add `Schema.field(int)` (apache#122)
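
A minimal sketch of the overload this adds; both lookups are valid at runtime:

```py
import pyarrow as pa

schema = pa.schema([("a", pa.int64()), ("b", pa.string())])
schema.field("a")  # lookup by name (already typed)
schema.field(0)    # lookup by position (the overload this entry adds)
```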

* Change various io related functions to support `StrPath` as a path input (apache#121)

* Change various io related functions to support StrPath as a path input

* fmt

* Added StrPath | IO for feather types

* fix type hint for sort_by (apache#130)

sort_by takes a str or list[tuple(name, order)] as its argument, where the str is a field name, not a sort order
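
For illustration, both accepted forms on a plain in-memory table:

```py
import pyarrow as pa

table = pa.table({"x": [3, 1, 2]})
table.sort_by("x")                    # bare field name, ascending
table.sort_by([("x", "descending")])  # (field name, order) pairs
```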

* metadata on a schema can be passed as str (apache#128)

For details see https://github.com/apache/arrow/blob/apache-arrow-17.0.0/python/pyarrow/types.pxi\#L2053-L2056
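
A small example of what the widened annotation permits; pyarrow coerces str keys and values to bytes internally:

```py
import pyarrow as pa

# str metadata is accepted and stored as bytes.
schema = pa.schema([("a", pa.int64())], metadata={"owner": "analytics"})
assert schema.metadata == {b"owner": b"analytics"}
```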

* Correct typevars for DictionaryType, MapType, RunEncodedType (apache#126)

Correct type hints for Dictionary, RunEndEncoded and Map

Signed-off-by: Jonas Dedden <[email protected]>
Co-authored-by: ZhengYu, Xu <[email protected]>

* Add some more StrPath io parts that were overlooked. (apache#131)

* Add some more StrPath io parts that were overlooked.

Additionally, add the utility typealias `SingleOrList` that can be used in places where we want a concise type declaration but there is a large union of types.

* write_dataset(base_dir = ) can also take Path

* Support ChunkedArray in add/append methods in Table (apache#129)
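
Sketch of the call this makes type-check (it already worked at runtime):

```py
import pyarrow as pa

table = pa.table({"a": [1, 2, 3]})
chunked = pa.chunked_array([[4, 5], [6]])
table = table.append_column("b", chunked)  # ChunkedArray, not just Array
```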

* Add missing partitioning typing case (apache#132)

This should now support the examples in the docstring for partitioning.

* fix: typo 'permissive' instead of 'premissive' (apache#133)

* release 17.10 (apache#134)

* fix incorrect type hints for compute.sort_indices (apache#135)

* disallow passing `names` as an argument to table when using dictionaries (apache#137)

* [pre-commit.ci] pre-commit autoupdate (apache#138)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.0 → v0.7.1](astral-sh/ruff-pre-commit@v0.7.0...v0.7.1)
- [github.com/pre-commit/mirrors-mypy: v1.12.1 → v1.13.0](pre-commit/mirrors-mypy@v1.12.1...v1.13.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add missing type for FlightEndpoint (apache#136)

* release 17.11 (apache#139)

* [pre-commit.ci] pre-commit autoupdate (apache#140)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.1 → v0.7.2](astral-sh/ruff-pre-commit@v0.7.1...v0.7.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (apache#142)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.2 → v0.7.3](astral-sh/ruff-pre-commit@v0.7.2...v0.7.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* chore: Create FUNDING.yml (apache#143)

Create FUNDING.yml

* fix: `read_schema` should return Schema (apache#145)

fix: read_schema should return Schema

* release 17.12 (apache#146)

* [pre-commit.ci] pre-commit autoupdate (apache#147)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.3 → v0.7.4](astral-sh/ruff-pre-commit@v0.7.3...v0.7.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: `to_table` argument `columns` can be a dict of expressions (apache#149)

* [pre-commit.ci] pre-commit autoupdate (apache#148)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.7.4 → v0.8.1](astral-sh/ruff-pre-commit@v0.7.4...v0.8.1)

* ruff: ignore PYI063

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZhengYu, Xu <[email protected]>

* release 17.13 (apache#151)

* fix: FileSystem metadata value should be str (apache#152)

* fix: FileSystemHandler metadata value should be str (apache#153)

* [pre-commit.ci] pre-commit autoupdate (apache#154)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.1 → v0.8.2](astral-sh/ruff-pre-commit@v0.8.1...v0.8.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* improve coverage for pyarrow.struct typehint (apache#157)

* fix: ipc typing (apache#159)

* release 17.14 (apache#160)

* fix: add missing param 'nbytes' to NativeFile.read (apache#163)

* release 17.15 (apache#164)

* [pre-commit.ci] pre-commit autoupdate (apache#161)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.2 → v0.8.3](astral-sh/ruff-pre-commit@v0.8.2...v0.8.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add 'None' as a valid argument for partitioning to the various parquet reading functions (apache#166)

* [pre-commit.ci] pre-commit autoupdate (apache#165)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](astral-sh/ruff-pre-commit@v0.8.3...v0.8.6)
- [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](pre-commit/mirrors-mypy@v1.13.0...v1.14.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: should use Collection[Array] instead list[Array] (apache#170)

"List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
Consider using "Sequence" instead, which is covariant
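
A minimal sketch of the variance problem (the `combine` helper is illustrative, not part of the stubs):

```py
from collections.abc import Collection

import pyarrow as pa

def combine(arrays: Collection[pa.Array]) -> pa.ChunkedArray:
    # Collection is covariant in its element type, so a list[pa.Int64Array]
    # is accepted here; a list[pa.Array] parameter would reject it, because
    # list is invariant.
    return pa.chunked_array(arrays)

ints: list[pa.Int64Array] = [pa.array([1, 2]), pa.array([3])]
combine(ints)
```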

* fix: update type hints for path_or_paths and source parameters in ParquetDataset and read_table (apache#171)

* [pre-commit.ci] pre-commit autoupdate (apache#167)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.6 → v0.9.1](astral-sh/ruff-pre-commit@v0.8.6...v0.9.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 17.16 (apache#172)

* Fixed pa.fixed_shape_tensor (apache#175)

* [pre-commit.ci] pre-commit autoupdate (apache#173)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.9.4](astral-sh/ruff-pre-commit@v0.9.1...v0.9.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: Preserve generic in `ChunkedArray.type` (apache#177)

* release 17.17 (apache#178)

* [pre-commit.ci] pre-commit autoupdate (apache#176)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.4 → v0.9.6](astral-sh/ruff-pre-commit@v0.9.4...v0.9.6)
- [github.com/pre-commit/mirrors-mypy: v1.14.1 → v1.15.0](pre-commit/mirrors-mypy@v1.14.1...v1.15.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: support constructing a ListArray with a primitive type (apache#179)

* fix: Avoid `chunked_array` overlapping overloads (apache#183)

* fix: Add placeholder annotations to `pc.if_else` (apache#182)

* fix: Widen `Array` to `Array | ChunkedArray` (apache#181)

* fix: add `pc.fill_null` (apache#185)

- https://arrow.apache.org/docs/python/generated/pyarrow.compute.fill_null.html
- https://github.com/narwhals-dev/narwhals/blob/05e47b27ebe27b24196cee5956d07748d65a62ee/narwhals/_arrow/series.py#L675

* fix: Allow Table.from_arrays to take a list containing a mix of Array and ChunkedArray (apache#187)

Update table.pyi

* release 17.18 (apache#188)

* [pre-commit.ci] pre-commit autoupdate (apache#180)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.6 → v0.9.10](astral-sh/ruff-pre-commit@v0.9.6...v0.9.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: from_arrays for both Table and RecordBatch (apache#189)

* fix: resolve some `pa.compute` overlaps (apache#184)

* fix: resolve overlapping `compute.(add|divide)`

* fix: copy from non-cloned signature

* fix: resolve overlapping `compute.exp`

* fix: resolve overlapping `compute.power`

* fix: resolve overlapping `compute.equal`

* fix: resolve overlapping `compute.and_`

* fix: Include `Array` in `chunked_array` overload (apache#190)

narwhals-dev/narwhals@0237f7a

* release 17.19 (apache#191)

* Add Scalar, Array and Type classes for Json & Uuid (apache#194)

* Add Scalar, Array and Type classes for Json & Uuid

* Formatting fixes

* [pre-commit.ci] pre-commit autoupdate (apache#192)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.10 → v0.11.2](astral-sh/ruff-pre-commit@v0.9.10...v0.11.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add Scalar, Array and Type classes for Json & Uuid" (apache#195)

Revert "Add Scalar, Array and Type classes for Json & Uuid (apache#194)"

* fix: Add missing `pc.equal` overload (apache#196)

* feat: support pyarrow 19.0 (apache#198)

* build: upgrade pyarrow min version to 19.0

* feat: support pyarrow 19.0

* omit mypy bool8 override error

* fix: reexport new types (apache#199)

* feat: override new patterns for func repeat and nulls (apache#200)

* fix: reexport decimal64 array and decimal128 array

* feat: override new patterns for func `repeat` and `nulls`

* release: 19.1 (apache#201)

* fix: Allow `Iterable[Table]` in `concat_tables` (apache#203)

https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html

> tables : iterable of pyarrow.Table objects
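
For example, a generator now type-checks, matching the documented behavior:

```py
import pyarrow as pa

parts = (pa.table({"a": [i]}) for i in range(3))  # an Iterable[Table], not a list
combined = pa.concat_tables(parts)
assert combined.num_rows == 3
```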

* fix: Allow `ChunkedArray[BooleanScalar]` in `pc.invert` (apache#204)

Fixes https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L298-L299

* feat: Fully spec `TableGroupBy.aggregate` (apache#197)

- https://arrow.apache.org/docs/python/compute.html#grouped-aggregations
- https://arrow.apache.org/docs/python/generated/pyarrow.TableGroupBy.html#pyarrow.TableGroupBy.aggregate
- https://github.com/apache/arrow/blob/34a984c842db42b409a1359e6e2cf167a2365a48/python/pyarrow/table.pxi#L6578-L6604
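
A short usage sketch of the now fully-specced signature:

```py
import pyarrow as pa

table = pa.table({"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})
# Aggregations are (column, function) pairs; the accepted function names
# are now spelled out in the stubs rather than typed as plain str.
result = table.group_by("key").aggregate([("value", "sum"), ("value", "mean")])
```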

* fix: Add missing return type to `ChunkedArray.filter` (apache#205)

* fix: Add relaxed final overload to logical functions (apache#206)

Covers all of `pc.(and_ | and_kleene | and_not | and_not_kleene | or_ | or_kleene | xor)`

Resolves:
- https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L219-L233
- https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L662

* fix: Allow `ChunkedArray` in `Table.set_column` (apache#211)

Also being more consistent with `ArrayOrChunkedArray[Any]` everywhere

Discovered in
- https://github.com/vega/vega-datasets/blob/343b7101391a81190ba24e1e8d62a381d2fef3bd/scripts/species.py#L798-L799

* chore: Ignore `fsspec` `[import-untyped]` (apache#210)

```py
_fs.pyi:18: error: Skipping analyzing "fsspec": module is installed, but missing library stubs or py.typed marker  [import-untyped]
_fs.pyi:18: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports Found 1 error in 1 file (checked 64 source files)
```

- fsspec/filesystem_spec#625
- fsspec/filesystem_spec#1676

* feat: Convert `types.is_*` into `TypeIs` guards (apache#215)

* chore: Add `types.__all__`

* feat: Convert `types._is_*` into `TypeIs` guards

I've been using this for a little while, but makes more sense to live in the stubs
https://github.com/narwhals-dev/narwhals/blob/16427440e6d74939c403083b52ce3fb0af7d63c7/narwhals/_arrow/utils.py#L44-L67
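
A standalone sketch of the pattern (the wrapper below is hypothetical; the stubs annotate `pa.types.is_*` directly):

```py
import pyarrow as pa
from typing_extensions import TypeIs

def is_timestamp(t: pa.DataType) -> TypeIs[pa.TimestampType]:
    return pa.types.is_timestamp(t)

dt: pa.DataType = pa.timestamp("us", tz="UTC")
if is_timestamp(dt):
    print(dt.tz)  # narrowed: the checker now knows dt is a TimestampType
```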

* fix: Resolve `bit_wise_and` overlaps (apache#214)

Fixes 3 errors:
```py
compute.pyi:608:5 - error: Overload 1 for "bit_wise_and" overlaps overload 4 and returns an incompatible type (reportOverlappingOverload) compute.pyi:608:5 - error: Overload 1 for "bit_wise_and" overlaps overload 5 and returns an incompatible type (reportOverlappingOverload) compute.pyi:620:5 - error: Overload 3 for "bit_wise_and" will never be used because its parameters overlap overload 1 (reportOverlappingOverload)
```

* fix: Resolve `list_*` overlapping overloads (apache#213)

* fix: Resolve `list_value_length` overlaps

* fix: Resolve `list_element` overlaps

* fix: Resolve `list_(flatten|slice|parent_indices)` overlaps

An improvement, but still not that accurate

* fix: Include `VarianceOptions` in `TableGroupBy.aggregate` (apache#212)

- Follow-up to apache#197
- Noticed while writing up (narwhals-dev/narwhals#2385)
  - We already use it for `std`, `var` in https://github.com/narwhals-dev/narwhals/blob/16427440e6d74939c403083b52ce3fb0af7d63c7/narwhals/_arrow/group_by.py#L81-L82

* [pre-commit.ci] pre-commit autoupdate (apache#202)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.2 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.2...v0.11.5)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix: Resolve `Scalar.as_py` warnings for `DictionaryType` (apache#207)

> scalar.pyi:75:20 - warning: TypeVar "_AsPyTypeK" appears only once in generic function signature
>     Use "object" instead (reportInvalidTypeVarUse)
> scalar.pyi:85:20 - warning: TypeVar "_AsPyTypeK" appears only once in generic function signature
>     Use "object" instead (reportInvalidTypeVarUse)

Instead just using `int`, which should be all that is possible from:
https://github.com/zen-xu/pyarrow-stubs/blob/02552b81161d19d4aa71d8656b028eefac84612b/pyarrow-stubs/__lib_pxi/types.pyi#L154-L164

https://github.com/zen-xu/pyarrow-stubs/blob/02552b81161d19d4aa71d8656b028eefac84612b/pyarrow-stubs/__lib_pxi/types.pyi#L63-L70

* fix: Add default to `pc.sort_indices` (apache#216)

* fix: Add default to `pc.sort_indices`

Fixes narwhals-dev/narwhals#2390 (comment)

Default is specified in https://arrow.apache.org/docs/python/generated/pyarrow.compute.sort_indices.html

* refactor: Reuse some aliases

* fix: Allow `list_size` with `Field` in `pa.list_` (apache#218)

Closes apache#217
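
Sketch of the combination this enables:

```py
import pyarrow as pa

# A fixed-size list whose value type is given as a Field rather than a DataType.
fixed = pa.list_(pa.field("item", pa.float32()), 3)
```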

* allow `Table` or `RecordBatch` for dataset (apache#222)

allow the source argument of pyarrow.dataset.dataset() to be RecordBatch | Table
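
For illustration, wrapping an in-memory table:

```py
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.table({"a": [1, 2, 3]})
in_memory = ds.dataset(table)  # no files involved; wraps the table directly
assert in_memory.to_table().num_rows == 3
```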

* refactor: Simplify `types` overloads (apache#219)

* fix: `binary` overlap

* fix: Simplify list constructors, `_Ordered`

* refactor: Use `_Tz` default

* fix: iter ChunkedArray should return scalar value (apache#224)
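
A quick sketch of the runtime behavior the annotation now matches:

```py
import pyarrow as pa

ca = pa.chunked_array([[1, 2], [3]])
for scalar in ca:  # iteration yields Int64Scalar values, not chunk Arrays
    print(scalar.as_py())
```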

* release: 19.2 (apache#225)

* fix: Add missing `DictionaryArray` methods/properties (apache#226)

- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.indices
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary_decode
- https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary_encode

- https://github.com/narwhals-dev/narwhals/blob/c23e56c56630761f0fbc58b575a1c987e57d58d5/narwhals/_arrow/series.py#L787-L798
- https://github.com/narwhals-dev/narwhals/blob/c23e56c56630761f0fbc58b575a1c987e57d58d5/narwhals/_arrow/series_cat.py#L14-L18
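
Sketch of the newly annotated members:

```py
import pyarrow as pa

arr = pa.array(["a", "b", "a"]).dictionary_encode()
arr.dictionary                   # unique values: ["a", "b"]
arr.indices                      # integer codes into the dictionary: [0, 1, 0]
plain = arr.dictionary_decode()  # back to a plain StringArray
```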

* chore: use pyright as static type checker (apache#227)

* use pyright as static type checker

* make pyright happy

* fix: fix pyright action (apache#229)

fix github ci

* fix: Match runtime behavior of `(Table|RecordBatch).select` (apache#221)

* fix: Match runtime behavior of `(Table|RecordBatch).select`

- https://github.com/MarcoGorelli/narwhals/blob/5b02b592183b8d39e2d32e0aedd6c234bb22d405/narwhals/_arrow/dataframe.py#L305-L307
- https://github.com/MarcoGorelli/narwhals/blob/5b02b592183b8d39e2d32e0aedd6c234bb22d405/narwhals/_arrow/dataframe.py#L285-L294

Following up on what I thought was a simple stub issue, but we're both *too strict* and *too permissive* in different ways
{placeholder}

- https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/python/pyarrow/table.pxi#L4367-L4374
- https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/python/pyarrow/table.pxi#L1721-L1739

* update select

* update select

---------

Co-authored-by: ZhengYu, Xu <[email protected]>

* [pre-commit.ci] pre-commit autoupdate (apache#220)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.5...v0.11.8)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat: narrow scalar when type is given (apache#230)

* rename Uint -> UInt

* feat: narrow scalar when type is given

* release 19.3 (apache#231)

* chore: pyright use strict mode (apache#233)

* fix types

* update array.pyi

* update scalar.pyi

* update

* update array

* update array

* optimize chunked_array

* optimize iterchunks

* update

* update pyproject.toml

* fix: pa.nulls accepts a type rather than types (apache#234)

* [pre-commit.ci] pre-commit autoupdate (apache#232)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.8 → v0.11.9](astral-sh/ruff-pre-commit@v0.11.8...v0.11.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 19.4 (apache#235)

* lint(pyright): disable reportUnknownMemberType (apache#239)

* [pre-commit.ci] pre-commit autoupdate (apache#236)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.9 → v0.11.13](astral-sh/ruff-pre-commit@v0.11.9...v0.11.13)
- [github.com/RobertCraigie/pyright-python: v1.1.400 → v1.1.401](RobertCraigie/pyright-python@v1.1.400...v1.1.401)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat: support pyarrow 20.0 (apache#240)

* [pre-commit.ci] pre-commit autoupdate (apache#241)

updates:
- [github.com/RobertCraigie/pyright-python: v1.1.401 → v1.1.402](RobertCraigie/pyright-python@v1.1.401...v1.1.402)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* support docstring (apache#242)

* doc: complete tensor doc

* doc: complete table doc

* doc: complete scalar doc

* doc: complete orc doc

* doc: complete memory doc

* doc: complete lib doc

* doc: complete json doc

* doc: complete hdfs doc

* doc: complete gcsfs doc

* doc: complete fs doc

* doc: complete flight doc

* doc: complete dataset doc

* doc: complete dataset parquet doc

* doc: complete dataset parquet encryption doc

* doc: complete cuda doc

* doc: complete csv doc

* doc: complete azurefs doc

* doc: complete core doc

* doc: complete interchange doc

* doc: complete array doc

* doc: complete builder doc

* doc: complete device doc

* doc: complete io doc

* doc: complete ipc doc

* doc: complete types doc

* mark deprecated apis

* doc: complete _compute doc

* doc: complete compute doc

* doc: update compute doc

* lint code

* release 20.0.0.20250618 (apache#243)

* fix: make ParquetFileFormat constructor args optional (apache#244)

* fix: Field.remove_metadata should return Self (apache#246)

* [pre-commit.ci] pre-commit autoupdate (apache#245)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.13 → v0.12.0](astral-sh/ruff-pre-commit@v0.11.13...v0.12.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 20.0.0.20250627 (apache#247)

* fix: chunked_array with type should be specified (apache#250)

* [pre-commit.ci] pre-commit autoupdate (apache#248)

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.0 → v0.12.3](astral-sh/ruff-pre-commit@v0.12.0...v0.12.3)
- [github.com/RobertCraigie/pyright-python: v1.1.402 → v1.1.403](RobertCraigie/pyright-python@v1.1.402...v1.1.403)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* release 20.0.0.20250715 (apache#251)

* fix: The type parameter of array should be covariant (apache#253)

* release 20.0.0.20250716 (apache#254)

* Add py.typed file to signify that the library is typed

See the relevant PEP https://peps.python.org/pep-0561

* Prepare `pyarrow-stubs` for history merging

MINOR: [Python] Prepare `pyarrow-stubs` for history merging

Co-authored-by: ZhengYu, Xu <[email protected]>

* Add `ty` configuration and suppress error codes

* One line per rule

* Add licence header from original repo for all `.pyi` files

* Revert "Add licence header from original repo for all `.pyi` files"

* Prepare for licence merging

* Exclude `stubs` from `rat` test

* Add Apache licence clause to `py.typed`

* Reduce list

* Resolve merge conflict

---------

Signed-off-by: Jonas Dedden <[email protected]>
Co-authored-by: ZhengYu, Xu <[email protected]>
Co-authored-by: Jim Bosch <[email protected]>
Co-authored-by: Oliver Mannion <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eugene Toder <[email protected]>
Co-authored-by: fvankrieken <[email protected]>
Co-authored-by: Ilia Ablamonov <[email protected]>
Co-authored-by: Mathias Beguin <[email protected]>
Co-authored-by: Dylan Scott <[email protected]>
Co-authored-by: deanm0000 <[email protected]>
Co-authored-by: Jan Moravec <[email protected]>
Co-authored-by: Marius van Niekerk <[email protected]>
Co-authored-by: Jonas Dedden <[email protected]>
Co-authored-by: Fábio D. Batista <[email protected]>
Co-authored-by: ben-freist <[email protected]>
Co-authored-by: Jiahao Yuan <[email protected]>
Co-authored-by: Pim de Haan <[email protected]>
Co-authored-by: Dan Redding <[email protected]>
Co-authored-by: Tom Crasset <[email protected]>
Co-authored-by: Tom McTiernan <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…ngs, lists, structs

Automatically generating testing files from Python.

Author: Wes McKinney <[email protected]>

Closes apache#219 from wesm/ARROW-394 and squashes the following commits:

7807f48 [Wes McKinney] OS X doesn't have std::fabs
c0c804c [Wes McKinney] abs -> fabs
8cd1902 [Wes McKinney] Fix compiler warning in OS X from incorrect type declaration
d51581a [Wes McKinney] Add missing apache license
527622d [Wes McKinney] ARROW-414: remove check for maximum buffer padding
2a7b0fc [Wes McKinney] Add JSON generation code to fuzz test numeric types, print integers more nicely. Add integration tests to Travis CI build matrix. Add ApproxEquals method for floating point comparisons. Add boolean, string, struct, list to generated json test case