Commit 67485c4

Merge branch 'benchmarks-v0.5.0' into 'main'
release benchmarks v0.5.0
See merge request cuda-hpc-libraries/cuquantum-sdk/cuquantum-public!37
2 parents 2fef4a9 + e376d0a commit 67485c4

45 files changed: 825 additions and 307 deletions

benchmarks/LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Copyright (c) 2021-2024 NVIDIA CORPORATION & AFFILIATES.
+Copyright (c) 2021-2025 NVIDIA CORPORATION & AFFILIATES.
 
 BSD-3-Clause
 

benchmarks/README.md

Lines changed: 12 additions & 12 deletions
@@ -1,4 +1,4 @@
-# cuquantum-benchmarks
+# nv-quantum-benchmarks
 
 ## Installing
 
@@ -24,12 +24,12 @@ and `pip` would not install any extra package for you.
 
 ## Running
 
-After installation, a new command `cuquantum-benchmarks` is installed to your Python environment. You can see the help message via `cuquantum-benchmarks --help`:
+After installation, a new command `nv-quantum-benchmarks` is installed to your Python environment. You can see the help message via `nv-quantum-benchmarks --help`:
 
 ```
-usage: cuquantum-benchmarks [-h] {circuit,api} ...
+usage: nv-quantum-benchmarks [-h] {circuit,api} ...
 
-=============== NVIDIA cuQuantum Performance Benchmark Suite ===============
+=============== NVIDIA Quantum Performance Benchmark Suite ===============
 
 positional arguments:
 {circuit,api}
@@ -40,23 +40,23 @@ optional arguments:
 -h, --help show this help message and exit
 ```
 
-Starting v0.2.0, we offer subcommands for performing benchmarks at different levels, as shown above. For details, please refer to the help message of each subcommand, ex: `cuquantum-benchmarks circuit --help`.
+Starting v0.2.0, we offer subcommands for performing benchmarks at different levels, as shown above. For details, please refer to the help message of each subcommand, ex: `nv-quantum-benchmarks circuit --help`.
 
-Alternatively, you can launch the benchmark program via `python -m cuquantum_benchmarks`. This is equivalent to the standalone command, and is useful when, say, `pip` installs this package to the user site-package (so that the `cuquantum-benchmarks` command may not be available without modifying `$PATH`).
+Alternatively, you can launch the benchmark program via `python -m nv_quantum_benchmarks`. This is equivalent to the standalone command, and is useful when, say, `pip` installs this package to the user site-package (so that the `nv-quantum-benchmarks` command may not be available without modifying `$PATH`).
 
 For GPU backends, it is preferred that `--ngpus N` is explicitly set. On a multi-GPU system, the first `N` GPUs would be used. To limit which GPUs can be accessed by the CUDA runtime, use the environment variable `CUDA_VISIBLE_DEVICES` following the CUDA documentation.
 
-For backends that support MPI parallelism, it is assumed that `MPI_COMM_WORLD` is the communicator, and that `mpi4py` is installed. You can run the benchmarks as you would normally do to launch MPI processes: `mpiexec -n N cuquantum-benchmarks ...`. It is preferred if you fully specify the problem (explicitly set `--benchmark` & `--nqubits`).
+For backends that support MPI parallelism, it is assumed that `MPI_COMM_WORLD` is the communicator, and that `mpi4py` is installed. You can run the benchmarks as you would normally do to launch MPI processes: `mpiexec -n N nv-quantum-benchmarks ...`. It is preferred if you fully specify the problem (explicitly set `--benchmark` & `--nqubits`).
 
 Examples:
-- `cuquantum-benchmarks api --benchmark apply_matrix --targets 4,5 --controls 2,3 --nqubits 16`: Apply a random gate matrix controlled by qubits 2 & 3 to qubits 4 & 5 of a 16-qubit statevector using cuStateVec's `apply_matrix()` API
-- `cuquantum-benchmarks circuit --frontend qiskit --backend cutn --compute-mode statevector --benchmark qft --nqubits 8 --ngpus 1`: Construct a 8-qubit QFT circuit in Qiskit and compute the statevector with cuTensorNet on GPU. Note that the `--compute-mode` can be specified only for `cutn` backend and supports `amplitude` (default), `statevector`, and `expectation`.
-- `cuquantum-benchmarks circuit --frontend cirq --backend qsim-mgpu --benchmark qaoa --nqubits 16 --ngpus 2`: Construct a 16-qubit QAOA circuit in Cirq and run it with the (multi-GPU) `qsim-mgpu` backend on 2 GPUs (requires cuQuantum Appliance)
-- `mpiexec -n 4 cuquantum-benchmarks circuit --frontend qiskit --backend cusvaer --benchmark quantum_volume --nqubits 32 --ngpus 1 --cusvaer-global-index-bits 1,1 --cusvaer-p2p-device-bits 1`: Construct a 32-qubit Quantum Volume circuit in Qiskit and run it with the (multi-GPU-multi-node) `cusvaer` backend on 2 nodes. Each node runs 2 MPI processes, each of which controls 1 GPU (requires cuQuantum Appliance)
+- `nv-quantum-benchmarks api --benchmark apply_matrix --targets 4,5 --controls 2,3 --nqubits 16`: Apply a random gate matrix controlled by qubits 2 & 3 to qubits 4 & 5 of a 16-qubit statevector using cuStateVec's `apply_matrix()` API.
+- `nv-quantum-benchmarks circuit --frontend qiskit --backend cutn --compute-mode statevector --benchmark qft --nqubits 8 --ngpus 1`: Construct a 8-qubit QFT circuit using Qiskit and compute the statevector with cuTensorNet on GPU. The `--compute-mode` option determines the type of computation performed, and can be set to `amplitude`, `statevector`, `sampling`, or `expectation`, depending on the backend used. However, not all backends support all compute modes, and each backend has its own default mode.<br> When the `compute-mode` is set to `expectation`, it is allowed to specify the following options to define the Pauli operators for which the expectation value is computed: `--pauli-string`, `--pauli-seed`, or `--pauli-identity-fraction`.
+- `mpirun -np 2 nv-quantum-benchmarks circuit --frontend cudaq --backend cudaq-mgpu --compute-mode expectation --benchmark qaoa --nqubits 16 --ngpus 2`: Construct a 16-qubit QAOA circuit in NVIDIA CUDA-Q and compute the expectation with the (multi-GPU) `cudaq-mgpu` backend on 2 GPUs.
+- `mpiexec -n 4 nv-quantum-benchmarks circuit --frontend qiskit --backend cusvaer --benchmark quantum_volume --nqubits 32 --ngpus 1 --cusvaer-global-index-bits 1,1 --cusvaer-p2p-device-bits 1`: Construct a 32-qubit Quantum Volume circuit in Qiskit and run it with the (multi-GPU-multi-node) `cusvaer` backend on 2 nodes. Each node runs 2 MPI processes, each of which controls 1 GPU (requires cuQuantum Appliance).
 
 ## Known issues
 
-- Due to Qiskit Aer's design, it'd initialize the CUDA contexts for all GPUs installed on the system at import time. While we can defer the import, it might have an impact to the (multi-GPU) system performance when any `aer*` backend is in use. For the time being, we recommend to work around it by limiting the visible devices. For example, `CUDA_VISIBLE_DEVICES=0,1 cuquantum-benchmarks ...` would only use GPU 0 & 1.
+- Due to Qiskit Aer's design, it'd initialize the CUDA contexts for all GPUs installed on the system at import time. While we can defer the import, it might have an impact to the (multi-GPU) system performance when any `aer*` backend is in use. For the time being, we recommend to work around it by limiting the visible devices. For example, `CUDA_VISIBLE_DEVICES=0,1 nv-quantum-benchmarks ...` would only use GPU 0 & 1.
 
 ## Output data
 
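
The README changes above describe three launch details: the `python -m nv_quantum_benchmarks` entry point, the `CUDA_VISIBLE_DEVICES` workaround, and the `mpiexec -n N` prefix. The following is a minimal sketch of how a driver script might assemble such a launch. `build_launch` is a hypothetical helper, not part of the package, and prefixing the `python -m` form with `mpiexec` is an assumption based on the README's statement that both entry points are equivalent. The sketch only builds the command and environment; it does not run anything.

```python
import os
import sys


def build_launch(subcommand_args, visible_gpus=None, nprocs=None):
    """Assemble (argv, env) for a benchmark run without executing it.

    Hypothetical helper for illustration only; not part of the package.
    """
    # The README describes `python -m nv_quantum_benchmarks` as equivalent
    # to the standalone `nv-quantum-benchmarks` command.
    argv = [sys.executable, "-m", "nv_quantum_benchmarks", *subcommand_args]
    if nprocs is not None:
        # MPI launch as in the README: `mpiexec -n N ...`
        argv = ["mpiexec", "-n", str(nprocs), *argv]

    env = dict(os.environ)
    if visible_gpus is not None:
        # Restrict the devices the CUDA runtime can see, e.g. "0,1".
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in visible_gpus)
    return argv, env


argv, env = build_launch(
    ["circuit", "--frontend", "qiskit", "--backend", "cutn",
     "--compute-mode", "statevector", "--benchmark", "qft",
     "--nqubits", "8", "--ngpus", "1"],
    visible_gpus=[0, 1],
)
# To actually run it (requires the package and its dependencies):
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

Keeping the command construction separate from execution makes it easy to log or inspect the exact invocation before launching a long benchmark.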

benchmarks/cuquantum_benchmarks/__init__.py

Lines changed: 0 additions & 5 deletions
This file was deleted.

benchmarks/cuquantum_benchmarks/backends/backend_cirq.py

Lines changed: 0 additions & 52 deletions
This file was deleted.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Copyright (c) 2021-2025, NVIDIA CORPORATION & AFFILIATES
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+__version__ = '0.5.0'

benchmarks/cuquantum_benchmarks/_utils.py renamed to benchmarks/nv_quantum_benchmarks/_utils.py

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES
+# Copyright (c) 2021-2025, NVIDIA CORPORATION & AFFILIATES
 #
 # SPDX-License-Identifier: BSD-3-Clause
 
@@ -17,16 +17,16 @@
 import time
 from typing import Iterable, Optional, Union
 import warnings
-
 import cupy as cp
 import numpy as np
 import nvtx
 import psutil
 
+from .constants import LOGGER_NAME
+
 
 # set up a logger
-logger_name = "cuquantum-benchmarks"
-logger = logging.getLogger(logger_name)
+logger = logging.getLogger(LOGGER_NAME)
 
 
 def wrap_with_nvtx(func, msg):
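
The hunk above replaces a module-local logger name with a shared `LOGGER_NAME` constant. A minimal sketch of why that works: `logging.getLogger` returns the same logger object for the same name, so every module that imports the constant ends up configuring and writing to one logger. The value assigned to `LOGGER_NAME` below is a placeholder, since the contents of the package's `constants` module are not shown in this diff.

```python
import logging

# Placeholder value for illustration; the real constant is defined in the
# package's `constants` module, which this diff does not show.
LOGGER_NAME = "example-benchmarks"

# Two modules that each call getLogger with the shared constant...
logger_in_utils = logging.getLogger(LOGGER_NAME)
logger_in_backend = logging.getLogger(LOGGER_NAME)

# ...receive the very same object, so configuration applied once is
# visible everywhere.
logger_in_utils.setLevel(logging.DEBUG)
same_object = logger_in_utils is logger_in_backend
shared_level = logger_in_backend.level
```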

benchmarks/cuquantum_benchmarks/backends/__init__.py renamed to benchmarks/nv_quantum_benchmarks/backends/__init__.py

Lines changed: 5 additions & 1 deletion
@@ -1,8 +1,9 @@
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES
+# Copyright (c) 2021-2025, NVIDIA CORPORATION & AFFILIATES
 #
 # SPDX-License-Identifier: BSD-3-Clause
 
 from .backend_cirq import Cirq
+from .backend_cudaq import CudaqCusv, CudaqMgpu, CudaqCpu
 from .backend_cutn import cuTensorNet
 from .backend_pny import Pny, PnyLightningGpu, PnyLightningCpu, PnyLightningKokkos
 from .backend_qsim import Qsim, QsimCuda, QsimCusv, QsimMgpu
@@ -16,6 +17,9 @@
     'aer-cusv': AerCusv,
     'cusvaer': CusvAer,
     'cirq': Cirq,
+    'cudaq-cusv': CudaqCusv,
+    'cudaq-mgpu': CudaqMgpu,
+    'cudaq-cpu': CudaqCpu,
     'cutn': cuTensorNet,
     'qsim': Qsim,
     'qsim-cuda': QsimCuda,
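
The second hunk above registers the three new CUDA-Q backends in a dictionary keyed by the names accepted on the command line (`cudaq-cusv`, `cudaq-mgpu`, `cudaq-cpu`). As a rough sketch of how such a name-to-class registry is typically consumed, the example below uses empty stand-in classes and two hypothetical names, `BACKENDS` and `create_backend`; neither the dictionary's real name nor its lookup code appears in this diff.

```python
class CudaqCusv:
    """Stand-in for the real backend class (illustration only)."""


class CudaqMgpu:
    """Stand-in for the real backend class (illustration only)."""


class CudaqCpu:
    """Stand-in for the real backend class (illustration only)."""


# Hypothetical registry name; keys match the entries added in the diff.
BACKENDS = {
    'cudaq-cusv': CudaqCusv,
    'cudaq-mgpu': CudaqMgpu,
    'cudaq-cpu': CudaqCpu,
}


def create_backend(name, *args, **kwargs):
    """Look up a backend by its command-line name and instantiate it."""
    try:
        backend_cls = BACKENDS[name]
    except KeyError:
        choices = ", ".join(sorted(BACKENDS))
        raise ValueError(f"unknown backend {name!r}; choose from: {choices}") from None
    return backend_cls(*args, **kwargs)


backend = create_backend('cudaq-mgpu')
```

With this layout, adding a backend only requires an import and one dictionary entry, which is the shape of the change shown in the hunk.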
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# Copyright (c) 2021-2025, NVIDIA CORPORATION & AFFILIATES
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+import functools
+import warnings
+import logging
+try:
+    import cirq
+except ImportError:
+    cirq = None
+
+try:
+    from .. import _internal_utils
+except ImportError:
+    _internal_utils = None
+from .backend import Backend
+from ..constants import LOGGER_NAME
+
+
+# set up a logger
+logger = logging.getLogger(LOGGER_NAME)
+
+
+class _Cirq(Backend):
+
+    def __init__(self, ngpus, ncpu_threads, precision, *args, identifier=None, **kwargs):
+        if cirq is None:
+            raise RuntimeError("cirq is not installed")
+        if ngpus > 0:
+            raise ValueError("the cirq backend only runs on CPU")
+        if ncpu_threads > 1:
+            warnings.warn("cannot set the number of CPU threads for the cirq backend")
+        if precision != 'single':
+            raise ValueError("the cirq backend only supports single precision")
+
+        self.backend = cirq.Simulator()
+        self.identifier = identifier
+        self.version = cirq.__version__
+        self.meta = {}
+        self.meta['ncputhreads'] = ncpu_threads
+
+    def preprocess_circuit(self, circuit, *args, **kwargs):
+        if _internal_utils is not None:
+            _internal_utils.preprocess_circuit(self.identifier, circuit, *args, **kwargs)
+
+        self.compute_mode = kwargs.pop('compute_mode')
+        valid_choices = ['statevector', 'sampling']
+        if self.compute_mode not in valid_choices:
+            raise ValueError(f"The '{self.compute_mode}' computation mode is not supported for this backend. Supported modes are: {valid_choices}")
+
+        self.updated_circuit = circuit
+        if self.compute_mode == 'statevector':
+            self.updated_circuit = cirq.drop_terminal_measurements(circuit)
+
+        self.meta['compute-mode'] = f'{self.compute_mode}()'
+        logger.info(f'data: {self.meta}')
+
+        pre_data = self.meta
+        return pre_data
+
+    def run(self, circuit, nshots=1024):
+        if self.compute_mode == 'sampling':
+            results = self.backend.run(self.updated_circuit, repetitions=nshots)
+            samples = results.histogram(key='result')
+            post_res = results.measurements['result']
+        elif self.compute_mode == 'statevector':
+            results = self.backend.simulate(self.updated_circuit)
+            sv = results.final_state_vector
+            post_res = None
+
+        return {'results': None, 'post_results': post_res, 'run_data': {}}
+
+
+Cirq = functools.partial(_Cirq, identifier='cirq')
