Commit e77e51a

Add note on security concerns with virtual chunks (#1277)
* add note on security concerns with virtual chunks
* pre-commit
* concerns -> considerations
1 parent 4f7c7ac commit e77e51a

1 file changed: +6 -0 lines changed

docs/docs/virtual.md

Lines changed: 6 additions & 0 deletions
@@ -12,6 +12,12 @@ While Icechunk works wonderfully with native chunks managed by Zarr, there is lo
 
 Currently, Icechunk supports virtual references to data stored in `s3`-compatible, `gcs`, `http/https`, and `local` storage backends. Support for [`azure`](https://github.com/earth-mover/icechunk/issues/602) is on the roadmap.
 
+!!! warning "Security considerations with virtual chunks"
+
+    Virtual chunks let Icechunk point to external locations (s3://, http://, file://, etc.), which means a malicious repo could try to trick your code into reading sensitive data from your machine or other sources.
+
+    To protect you, Icechunk is safe by default: it won't read from these locations unless you explicitly allow it. This requires (1) defining trusted virtual chunk containers when writing data, and (2) passing ``authorize_virtual_chunk_access`` when opening a repo, so you stay in control of what external paths get accessed.
+
 ## Creating a virtual dataset with VirtualiZarr
 
 We are going to create a virtual dataset pointing to all of the [OISST](https://www.ncei.noaa.gov/products/optimum-interpolation-sst) data for August 2024. This data is distributed publicly as netCDF files on AWS S3, with one netCDF file containing the Sea Surface Temperature (SST) data for each day of the month. We are going to use `VirtualiZarr` to combine all of these files into a single virtual dataset spanning the entire month, then write that dataset to Icechunk for use in analysis.
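
The "safe by default" behaviour described in the added warning can be pictured as an allow-list check on chunk locations. The following is a minimal sketch of that idea only, not Icechunk's implementation: the function `is_authorized`, the bucket name, and the prefix-matching rule are all hypothetical and chosen for illustration.

```python
from collections.abc import Iterable


def is_authorized(chunk_url: str, authorized_prefixes: Iterable[str]) -> bool:
    """Return True only if chunk_url lies under an explicitly authorized prefix.

    Hypothetical helper for illustration; Icechunk's real check may differ.
    """
    for prefix in authorized_prefixes:
        # Normalize to a trailing slash so "s3://bucket/data" does not
        # accidentally authorize "s3://bucket/data-private/...".
        normalized = prefix if prefix.endswith("/") else prefix + "/"
        if chunk_url.startswith(normalized):
            return True
    # Nothing matched: deny by default.
    return False


# The caller opts in to exactly one external location (made-up bucket name).
authorized = ["s3://example-public-bucket/data/"]

print(is_authorized("s3://example-public-bucket/data/2024/08/01.nc", authorized))  # True
print(is_authorized("file:///etc/passwd", authorized))  # False
print(is_authorized("s3://example-public-bucket/data/2024/08/01.nc", []))  # False
```

With an empty allow-list every location is rejected, which mirrors the default described above: a reference planted by a malicious repo (for example one pointing at `file://` paths) is never followed unless the reader has explicitly authorized that location.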

0 commit comments