nc_get_vars incredibly slow in Windows compared to Linux #2721

@abhibaruah

OS: Windows 10
NetCDF version: 4.9.1

I am trying to read a 3D double variable (2000 x 512 x 512) from a netCDF4 file with the following parameters:
start = {0, 0, 0}
count = {1000, 256, 256}
stride = {2, 2, 2}
chunk size: {20, 10, 10}
shuffle: no
deflate: yes
deflate_level: 6
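
To put these parameters in perspective, the following minimal sketch counts how many chunks a strided read touches along one dimension. The helper names `chunks_in_dim` and `chunks_touched` are hypothetical and are not part of the netCDF API; the sketch assumes only that a chunk has to be read and decompressed whenever at least one requested index falls inside it.

```cpp
#include <cstddef>

// Total number of chunks along one dimension (the last chunk may be partial).
std::size_t chunks_in_dim(std::size_t dim_len, std::size_t chunk_len)
{
    return (dim_len + chunk_len - 1) / chunk_len;
}

// Number of distinct chunks hit along one dimension by a strided read that
// visits indices start, start + stride, ..., start + (count - 1) * stride.
std::size_t chunks_touched(std::size_t start, std::size_t count,
                           std::size_t stride, std::size_t chunk_len)
{
    std::size_t touched = 0;
    std::size_t prev_chunk = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Chunk index that contains the i-th requested element.
        std::size_t chunk = (start + i * stride) / chunk_len;
        // Indices increase monotonically, so a new chunk shows up as a change.
        if (i == 0 || chunk != prev_chunk) {
            ++touched;
            prev_chunk = chunk;
        }
    }
    return touched;
}
```

For the values above, `chunks_touched(0, 1000, 2, 20)` gives 100 and `chunks_touched(0, 256, 2, 10)` gives 52, which match `chunks_in_dim(2000, 20)` and `chunks_in_dim(512, 10)`. Under the stated assumption, the strided read therefore touches all 100 x 52 x 52 = 270,400 chunks while keeping only one value in eight. This says nothing about why Windows and Linux differ; it only shows that the strided request cannot skip any compressed chunk.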

I time the call to nc_get_vars.
On Debian 11, it takes ~25 seconds.
On Windows 10, it takes ~130 seconds.

I would expect Windows to be slightly slower, but a >5x slowdown is unexpected.
I see similar slowdowns with 'nc_get_vars_double'.

In contrast, using 'nc_get_var_double' or 'nc_get_var' to read the whole variable is significantly faster (~3 sec on Linux, and ~1 sec on Windows).

  1. Is there a way to optimize the performance of 'nc_get_vars' or 'nc_get_vars_double' so that Windows performance is closer to Linux performance?

  2. Is reading the whole variable into memory with 'nc_get_var' and slicing it afterwards a reasonable option? I have seen earlier discussion of this (Make netcdf-4 use the the stride > 1 facilities of hdf5 #908), and a change was submitted to make strided reads faster. For my variable, however, reading the whole variable is still significantly faster than a strided read, especially on Windows.
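
The workaround raised in question 2 could look roughly like the sketch below. It assumes the full variable has already been read into a contiguous buffer (for example with 'nc_get_var_double') in the row-major order used by the netCDF C API, where the last dimension varies fastest. The function name `strided_subset` is hypothetical. For this variable the full buffer would need 2000 x 512 x 512 x 8 bytes, about 3.9 GiB, so this only applies when that much memory is available.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copy a strided subset out of a row-major 3D array that is already in memory.
// dims   : full extent of the array along each dimension
// start  : first index to take along each dimension
// count  : number of elements to take along each dimension
// stride : step between taken elements along each dimension
std::vector<double> strided_subset(const std::vector<double>& full,
                                   const std::array<std::size_t, 3>& dims,
                                   const std::array<std::size_t, 3>& start,
                                   const std::array<std::size_t, 3>& count,
                                   const std::array<std::size_t, 3>& stride)
{
    if (full.size() != dims[0] * dims[1] * dims[2]) {
        throw std::invalid_argument("buffer size does not match dims");
    }
    for (std::size_t d = 0; d < 3; ++d) {
        // The last requested index along each dimension must be in range.
        if (count[d] > 0 && start[d] + (count[d] - 1) * stride[d] >= dims[d]) {
            throw std::out_of_range("strided request exceeds array bounds");
        }
    }

    std::vector<double> out;
    out.reserve(count[0] * count[1] * count[2]);
    for (std::size_t i = 0; i < count[0]; ++i) {
        std::size_t z = start[0] + i * stride[0];
        for (std::size_t j = 0; j < count[1]; ++j) {
            std::size_t y = start[1] + j * stride[1];
            // Pointer to the first element of row (z, y) in the full array.
            const double* row = full.data() + (z * dims[1] + y) * dims[2];
            for (std::size_t k = 0; k < count[2]; ++k) {
                out.push_back(row[start[2] + k * stride[2]]);
            }
        }
    }
    return out;
}
```

With the parameters from this report, the call would use dims {2000, 512, 512}, start {0, 0, 0}, count {1000, 256, 256} and stride {2, 2, 2}, and the result has the same element order that 'nc_get_vars' writes into its output buffer.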

Please find the link to the nc file here.
Here is my code:

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <netcdf.h>

int
main()
{
    int status;
    int ncid;
    int varid;

    // strided hyperslab: every 2nd element along each dimension
    size_t start[] = {0, 0, 0};
    size_t count[] = {1000, 256, 256};
    ptrdiff_t stride[] = {2, 2, 2};

    // output buffer sized from count, computed in size_t
    size_t nelems = count[0] * count[1] * count[2];
    double* outData = (double*)malloc(nelems * sizeof(double));
    if (outData == NULL) {
        printf("Could not allocate output buffer.\n");
        return 1;
    }

    // open the NetCDF-4 file
    status = nc_open("repro_nc4file.nc", NC_NOWRITE, &ncid);
    if (status != NC_NOERR) {
        printf("Could not open file: %s\n", nc_strerror(status));
        free(outData);
        return 1;
    }

    // get the varid
    status = nc_inq_varid(ncid, "my_var", &varid);
    printf("status after inq var = %d\n", status);
    if (status != NC_NOERR) {
        nc_close(ncid);
        free(outData);
        return 1;
    }
    printf("varid = %d\n", varid);

    // get the strided subset and time the call
    auto timestart = std::chrono::high_resolution_clock::now();
    status = nc_get_vars(ncid, varid, start, count, stride, outData);
    auto timeend = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>(timeend - timestart);
    std::cout << "Execution time: " << duration.count() << " seconds" << std::endl;
    printf("status after getting strided subset = %d\n", status);

    // close the file
    status = nc_close(ncid);
    printf("status after close = %d\n", status);

    free(outData);
    printf("End of test.\n\n");

    return 0;
}
