Description
OS: Windows 10
NetCDF version: 4.9.1
I am trying to read a 3D double variable (2000 x 512 x 512) from a netCDF4 file with the following parameters:
start = {0, 0, 0}
count = {1000, 256, 256}
stride = {2, 2, 2}
chunk size: {20, 10, 10}
shuffle: no
deflate: yes
deflate_level: 6
I time the call to nc_get_vars.
On Debian 11, it takes ~25 seconds.
On Windows 10, it takes ~130 seconds.
I would expect Windows to be slightly slower, but a slowdown of more than 5x is unexpected.
I see similar slowdowns with 'nc_get_vars_double'.
In contrast, using 'nc_get_var_double' or 'nc_get_var' to read the whole variable is significantly faster (~3 sec on Linux, and ~1 sec on Windows).
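As a rough sanity check on how much of the file the strided read has to touch, the sketch below counts the chunks hit by a strided selection along one dimension. The helper names `chunks_touched` and `chunks_total` are made up for this illustration and are not part of the netCDF API. It assumes chunks are aligned to index 0 of each dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical helper: number of distinct chunks along ONE dimension that a
// strided selection (start, count, stride) touches, for a given chunk length.
std::size_t chunks_touched(std::size_t start, std::size_t count,
                           std::size_t stride, std::size_t chunk)
{
    assert(chunk > 0 && stride > 0);
    std::set<std::size_t> ids;
    for (std::size_t i = 0; i < count; ++i) {
        // Index of the selected element, mapped to the chunk containing it.
        ids.insert((start + i * stride) / chunk);
    }
    return ids.size();
}

// Hypothetical helper: total number of chunks along ONE dimension
// (the last chunk may be only partially filled).
std::size_t chunks_total(std::size_t dim, std::size_t chunk)
{
    assert(chunk > 0);
    return (dim + chunk - 1) / chunk;
}
```

With the parameters above, `chunks_touched(0, 1000, 2, 20)` gives 100 of `chunks_total(2000, 20)` = 100 chunks along the first dimension, and `chunks_touched(0, 256, 2, 10)` gives 52 of `chunks_total(512, 10)` = 52 along each of the other two. So the strided selection touches all 100 x 52 x 52 = 270,400 chunks. Because a deflated chunk has to be decompressed as a whole, the strided read cannot skip any decompression work compared with reading the full variable. This does not by itself explain the gap between Windows and Linux.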
- Is there a way to optimize the performance of 'nc_get_vars' or 'nc_get_vars_double' so that Windows performance is closer to Linux performance?
- Is reading the whole variable into memory using 'nc_get_var' and then slicing it later an option? I have seen that there were some discussions regarding this (Make netcdf-4 use the the stride > 1 facilities of hdf5 #908) and that a submission was made to make strided reads faster. But for my variable, reading the whole variable still seems to be significantly faster than strided reads (especially on Windows).
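To make the second question concrete, here is a minimal sketch of the "read everything, slice in memory" approach. It covers only the slicing step and assumes the full variable has already been read into a row-major buffer, for example with `nc_get_var_double`. The function name `subsample3d` is hypothetical, and `start`, `count`, and `stride` are meant to have the same meaning as in `nc_get_vars`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a strided subset out of a full row-major 3D array.
// dims holds the full extents, slowest-varying dimension first.
std::vector<double> subsample3d(const std::vector<double>& full,
                                const std::size_t dims[3],
                                const std::size_t start[3],
                                const std::size_t count[3],
                                const std::size_t stride[3])
{
    assert(full.size() == dims[0] * dims[1] * dims[2]);
    for (int d = 0; d < 3; ++d) {
        // The last selected index must lie inside the array.
        assert(count[d] == 0 || start[d] + (count[d] - 1) * stride[d] < dims[d]);
    }

    std::vector<double> out;
    out.reserve(count[0] * count[1] * count[2]);
    for (std::size_t i = 0; i < count[0]; ++i) {
        const std::size_t z = start[0] + i * stride[0];
        for (std::size_t j = 0; j < count[1]; ++j) {
            const std::size_t y = start[1] + j * stride[1];
            for (std::size_t k = 0; k < count[2]; ++k) {
                const std::size_t x = start[2] + k * stride[2];
                // Row-major offset of element (z, y, x) in the full array.
                out.push_back(full[(z * dims[1] + y) * dims[2] + x]);
            }
        }
    }
    return out;
}
```

One caveat with this approach for my variable: the full array is 2000 x 512 x 512 doubles, which is roughly 3.9 GiB, while the strided subset is only about 500 MiB. Reading the variable in slabs along the first dimension with `nc_get_vara_double` and subsampling each slab might keep the peak memory lower, but I have not measured that.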
Please find the link to the nc file here.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <netcdf.h>
#include <cstdlib>
#include <iostream>
#include <chrono>

int
main()
{
    int status;
    int ncid;
    int varid;
    size_t elems_x = 256;
    size_t elems_y = 256;
    size_t elems_z = 1000;

    // output buffer for the strided subset (1000 x 256 x 256 doubles)
    double* outData = (double*)malloc(elems_x * elems_y * elems_z * sizeof(double));
    if (outData == NULL) {
        printf("Could not allocate output buffer.\n");
        return 1;
    }

    size_t start[] = {0, 0, 0};
    size_t count[] = {1000, 256, 256};
    ptrdiff_t stride[] = {2, 2, 2};

    // open the NetCDF-4 file
    status = nc_open("repro_nc4file.nc", NC_NOWRITE, &ncid);
    if (status != NC_NOERR) {
        printf("Could not open file: %s\n", nc_strerror(status));
        free(outData);
        return 1;
    }

    // get the varid
    status = nc_inq_varid(ncid, "my_var", &varid);
    printf("status after inq var = %d\n", status);
    printf("varid = %d\n", varid);

    // get the strided subset (only this call is timed)
    auto timestart = std::chrono::high_resolution_clock::now();
    status = nc_get_vars(ncid, varid, start, count, stride, outData);
    auto timeend = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>(timeend - timestart);
    std::cout << "Execution time: " << duration.count() << " seconds" << std::endl;
    printf("status after getting strided subset = %d\n", status);

    // close the file
    status = nc_close(ncid);
    printf("status after close = %d\n", status);

    free(outData);
    printf("End of test.\n\n");
    return 0;
}