CUDA-Q ML: Enabling End-to-End Differentiable Quantum-Classical Computing


Wednesday, June 24, 2026 3:45 PM to 5:15 PM · 1 hr. 30 min. (Europe/Berlin)
Foyer D-G - 2nd Floor
Project Poster
Quantum Machine Learning

Information

Poster is on display.
While classical deep learning frameworks such as PyTorch benefit from seamless end-to-end automatic differentiation, the development of hybrid quantum-classical models with NVIDIA's CUDA-Q currently faces a critical operational bottleneck: invoking the CUDA-Q compiler directly from a classical model creates a "black box" that severs the backpropagation chain. This phenomenon, which we term "gradient severing," prevents the joint optimization of classical and quantum parameters.
To resolve this disconnect, we present CUDA Quantum ML (CUDA-Q ML), an encapsulation framework that bridges state-of-the-art quantum simulation and mature classical deep learning ecosystems. By implementing custom forward-computation and backward-propagation protocols, CUDA-Q ML wraps the native CUDA-Q engine, restoring quantum automatic differentiation and allowing quantum circuits to function seamlessly as fully differentiable nodes within the PyTorch autograd graph.
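The custom forward/backward protocol can be illustrated with a minimal pure-Python sketch. Here `quantum_expectation` is a hypothetical stand-in for an opaque CUDA-Q kernel (we use cos(θ), the exact ⟨Z⟩ expectation of a single-qubit RY(θ) circuit, so the result is checkable); the `QuantumNode` class and its method names are illustrative, not the CUDA-Q ML API.

```python
import math

# Hypothetical stand-in for a CUDA-Q kernel's expectation value:
# a black box that an autodiff engine cannot differentiate through.
def quantum_expectation(theta):
    # For a single-qubit RY(theta) circuit measuring <Z>, the exact
    # expectation is cos(theta); a real kernel would be opaque.
    return math.cos(theta)

class QuantumNode:
    """Minimal analogue of a custom forward/backward protocol.

    forward() calls the opaque kernel; backward() supplies the gradient
    via the parameter-shift rule, so an outer autodiff graph can chain
    through the quantum block instead of being severed by it.
    """
    SHIFT = math.pi / 2

    def forward(self, theta):
        self.theta = theta
        return quantum_expectation(theta)

    def backward(self, upstream_grad=1.0):
        plus = quantum_expectation(self.theta + self.SHIFT)
        minus = quantum_expectation(self.theta - self.SHIFT)
        return upstream_grad * (plus - minus) / 2.0

node = QuantumNode()
value = node.forward(0.3)
grad = node.backward()   # equals -sin(0.3) for this particular circuit
```

In the real framework this role is played by a custom autograd node registered with PyTorch, so upstream classical layers receive a gradient exactly as if the quantum block were a native differentiable operation.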
To accommodate diverse algorithmic architectures, the framework formally defines three positional integration paradigms for the quantum components. (1) The Quantum Intermediate Layer functions as a core trainable feature extractor, projecting compressed classical representations into a high-dimensional Hilbert space before returning them to subsequent classical layers. (2) The Quantum Output Layer positions the circuit at the network's terminus, deriving predictions directly from quantum measurement expectations. (3) The Quantum Parameter Input paradigm reconfigures the computational flow so that a classical backbone directly predicts the variational parameters of the quantum gates, enabling sophisticated optimization strategies for algorithms such as the Variational Quantum Eigensolver.
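The three paradigms differ only in where the quantum block sits in the data flow, which a toy sketch makes concrete. All names here are illustrative assumptions: `quantum_layer` is a stand-in expectation function, and the "classical" parts are trivial functions rather than real network layers.

```python
import math

TRAINABLE_ANGLES = [0.1, 0.2]  # the circuit's own variational parameters

# Stand-in quantum block: encodes classical data and applies variational
# angles, returning per-qubit expectation values (toy formula).
def quantum_layer(data, angles):
    return [math.cos(d + a) for d, a in zip(data, angles)]

def classical_encoder(x):          # classical layers before the circuit
    return [0.5 * v for v in x]

def classical_head(features):      # classical layers after the circuit
    return sum(features)

# (1) Quantum intermediate layer: classical -> quantum -> classical.
def intermediate_paradigm(x):
    feats = quantum_layer(classical_encoder(x), TRAINABLE_ANGLES)
    return classical_head(feats)

# (2) Quantum output layer: the measurement expectations are the
#     network's final predictions.
def output_paradigm(x):
    return quantum_layer(classical_encoder(x), TRAINABLE_ANGLES)

# (3) Quantum parameter input: the backbone's output *is* the set of
#     gate angles; the circuit runs on fixed input data, as in a
#     VQE-style ansatz.
def parameter_input_paradigm(x):
    predicted_angles = classical_encoder(x)
    fixed_data = [0.0] * len(predicted_angles)
    return quantum_layer(fixed_data, predicted_angles)
```

Note the distinction between (2) and (3): in (2) the classical features are encoded as circuit *data* while the circuit keeps its own trainable angles, whereas in (3) the classical backbone's output replaces those angles entirely.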
At the system level, this flexibility is governed by a robust Connector mechanism. The Dynamic Circuit Connector manages translation between paradigms, orchestrating data encoding and dynamic circuit construction to map classical tensors accurately onto the quantum state space. Concurrently, the Gradient Connector addresses the core optimization challenge during the forward and backward passes: it automatically selects context-appropriate differentiation algorithms, computes precise gradients, reliably reinjects them into the PyTorch optimizer, and exposes two output modes (SamplerMode and EstimatorMode). Three gradient-computation methods are provided: Adjoint Differentiation for maximum simulation speed, Parameter-Shift Rules for strict quantum-hardware compatibility, and Density-Matrix Gradients for native cuQuantum integration. We also provide a Parameter Input mode that treats quantum circuits outside of Torch layers, enabling seamless end-to-end optimization within classical neural networks as well as fixed-input hybrid inference.
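The first two gradient methods can be contrasted on a one-qubit example. The sketch below is an illustrative assumption, not the framework's implementation: it simulates RY(θ)|0⟩ measured in Z with NumPy, computes the gradient once adjoint-style (reusing the forward statevector, one extra matrix application) and once via the parameter-shift rule (two extra circuit evaluations, but hardware-executable); both recover d⟨Z⟩/dθ = −sin θ.

```python
import numpy as np

def ry(theta):
    # Single-qubit RY rotation gate.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def dry(theta):
    # Analytic derivative d/dtheta RY(theta), used by the adjoint method.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return 0.5 * np.array([[-s, -c], [c, -s]], dtype=complex)

Z = np.diag([1.0, -1.0]).astype(complex)
KET0 = np.array([1.0, 0.0], dtype=complex)

def expectation(theta):
    # Forward pass: <0| RY(theta)^dag Z RY(theta) |0> = cos(theta).
    psi = ry(theta) @ KET0
    return float(np.real(np.conj(psi) @ Z @ psi))

def adjoint_gradient(theta):
    # Adjoint-style gradient: d<Z>/dtheta = 2 Re <psi| Z dU/dtheta |0>.
    # Reuses the simulated statevector, so it is cheap on a simulator
    # but requires direct state access (not available on hardware).
    psi = ry(theta) @ KET0
    return float(2.0 * np.real(np.conj(psi) @ Z @ (dry(theta) @ KET0)))

def parameter_shift_gradient(theta):
    # Parameter-shift rule: only needs expectation values at shifted
    # angles, so it runs unchanged on real quantum hardware.
    s = np.pi / 2
    return (expectation(theta + s) - expectation(theta - s)) / 2.0
```

The trade-off mirrors the framework's selection logic: adjoint differentiation scales with one extra pass per parameter on a simulator, while the parameter-shift rule costs two circuit executions per parameter but makes no assumptions beyond measurable expectations.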
We conduct preliminary evaluations to validate the framework's usability and performance on representative test cases. End-to-end training experiments on MNIST classification verify that CUDA-Q ML enables stable, gradient-based training across the different integration paradigms. Additionally, efficiency comparisons among the implemented gradient-computation methods offer practical guidance on method selection, while benchmarking against existing quantum machine learning frameworks demonstrates CUDA-Q ML's computational efficiency in the cases studied.
Moving forward, we prioritize scaling these validations to more complex architectures, notably Transformer-based models, and incorporating rigorous system-level metrics such as wall-clock time and memory consumption. We further intend to implement robust multi-GPU scaling, validate the framework's physical viability through direct deployment on real quantum hardware, and ultimately perform the necessary compliance reviews to release CUDA-Q ML as an open-source asset for the community.
Format: on-demand, on-site
