Hybrid Quantum-Classical Architecture for Single-Cell Gene Expression Prediction
Virtual Cell Challenge - Predicting gene perturbation effects using parameterized quantum circuits
Background
The Arc Institute launched the Virtual Cell Challenge, a global computational competition that challenges researchers to build artificial intelligence models able to accurately simulate how cells respond to unseen genetic perturbations. This project was developed as an early prototype for this challenge.
Abstract
Building maps of cellular function depends heavily on predicting how genetic modifications shift gene activity. High-throughput screens exist, but the sheer number of possible perturbations limits how far they can scale. Existing computational frameworks offer insight, yet they still fall short of capturing the full complexity of the underlying biology.
This project presents a hybrid quantum-classical architecture for predicting single-cell gene expression profiles following gene perturbations. We embedded parameterized quantum circuits (PQCs) inside a frozen classical State Transition (STATE) transformer core, where they act as trainable nonlinear transformations. The system encodes basal gene expression and perturbation features into a shared latent representation, which a linear projection maps to the quantum circuit parameters.
The quantum component used layers of adjustable single-qubit rotations and entangling gates. The Pauli-Z expectation values were then collected into a compact encoding, which a final linear layer decoded into gene expression predictions. Optimization focused on the quantum parameters and the decoder, while the classical backbone stayed frozen.
The hybrid design augmented the classical encoder with additional nonlinear processing. The circuit combined local single-qubit rotations with entanglement-induced correlations to predict expression for unseen perturbations.
The work shows that trainable quantum circuits can serve as nonlinear operators inside biological foundation models. The hybrid strategy offers a practical route for applying quantum learning techniques to the high dimensionality inherent in gene expression data.
Methods
To isolate the effects of genetic modifications, the model prototype was designed using a "Delta Prediction" architecture. Instead of predicting the entire gene expression profile from scratch, the model was engineered to predict only the perturbation shift.
Frozen Classical Encoding: Single-cell gene expression profiles (18,080 genes) and perturbation features (5,120-dimensional ESM2 embeddings) are fed into a pre-trained STATE transformer. The transformer's weights are kept entirely frozen, reducing the data into a dense, 672-dimensional shared latent representation without losing previously learned biological rules.
Quantum Projection & Data Re-uploading: A linear projection layer reduces the 672-dimensional vector into rotation angles suitable for an 8-qubit quantum circuit. The classical inputs are re-embedded at every variational layer using Angle Embedding. This data re-uploading increases expressivity because the circuit output then behaves like a multi-frequency Fourier series in the inputs.
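The frequency argument behind data re-uploading can be illustrated on a single qubit. The sketch below is a toy statevector calculation in plain Python, not the project's circuit: it assumes an RX data-encoding gate followed by a trainable RY gate in each layer, and the function name `reupload_expval_z` is hypothetical.

```python
import math


def rx(theta):
    """Matrix of a single-qubit RX rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(theta):
    """Matrix of a single-qubit RY rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply(gate, state):
    """Multiply a 2x2 gate into a 2-element statevector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def reupload_expval_z(x, weights):
    """<Z> after len(weights) layers of RX(x) (data) then RY(w) (trainable).

    The RX/RY layer layout is an assumption made for this illustration.
    """
    state = [1 + 0j, 0j]  # start in |0>
    for w in weights:
        state = apply(rx(x), state)  # the same input x is uploaded again
        state = apply(ry(w), state)  # trainable rotation
    return abs(state[0]) ** 2 - abs(state[1]) ** 2
```

With all trainable angles set to zero, one layer gives cos(x) and two layers give cos(2x), so each additional upload makes a higher input frequency available. Nonzero trainable angles mix these frequencies with different weights.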
Entanglement & Multi-Basis Measurement: The circuit uses adjustable single-qubit rotations (RX, RY, RZ) and CNOT gates in a ring topology for entanglement. Crucially, the quantum state is measured in three bases (Pauli-X, Pauli-Y, and Pauli-Z), expanding the output to 24 expectation values (3 × 8 qubits).
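The circuit layout described above can be sketched with a small pure-Python statevector simulator. This is an illustration under stated assumptions, not the project's implementation: the data angles are assumed to be encoded with RX gates, the output is assumed to be ordered as all X values, then all Y, then all Z, and the names `circuit`, `apply_1q`, `apply_cnot`, and `expval` are hypothetical.

```python
import cmath
import math

N_QUBITS = 8  # matches the 8-qubit circuit in the text


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


PAULIS = {"X": [[0, 1], [1, 0]], "Y": [[0, -1j], [1j, 0]], "Z": [[1, 0], [0, -1]]}


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q (bit q of the basis-state index)."""
    bit = 1 << q
    out = list(state)
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            out[i] = gate[0][0] * a + gate[0][1] * b
            out[i | bit] = gate[1][0] * a + gate[1][1] * b
    return out


def apply_cnot(state, control, target):
    """Flip the target bit on every basis state whose control bit is 1."""
    cb, tb = 1 << control, 1 << target
    out = list(state)
    for i in range(len(state)):
        if i & cb and not i & tb:
            out[i], out[i | tb] = state[i | tb], state[i]
    return out


def expval(state, pauli, q):
    """Expectation value <psi| P_q |psi> of a single-qubit Pauli on qubit q."""
    transformed = apply_1q(state, PAULIS[pauli], q)
    return sum((a.conjugate() * b).real for a, b in zip(state, transformed))


def circuit(angles, weights):
    """angles: 8 data angles; weights[layer][qubit] = (rx, ry, rz) trainable angles."""
    state = [0j] * (2 ** N_QUBITS)
    state[0] = 1 + 0j
    for layer in weights:
        for q in range(N_QUBITS):
            state = apply_1q(state, rx(angles[q]), q)  # data re-uploading (RX assumed)
        for q, (a, b, c) in enumerate(layer):
            for gate in (rx(a), ry(b), rz(c)):  # trainable rotations
                state = apply_1q(state, gate, q)
        for q in range(N_QUBITS):
            state = apply_cnot(state, q, (q + 1) % N_QUBITS)  # CNOT ring
    # 3 bases x 8 qubits = 24 expectation values
    return [expval(state, p, q) for p in "XYZ" for q in range(N_QUBITS)]
```

Measuring X and Y in addition to Z triples the number of features handed to the decoder without adding qubits. For each qubit the three values form a Bloch vector whose length cannot exceed 1.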
Decoding & Skip Connection: A linear decoder maps the quantum output to the 18,080-dimensional perturbation delta. A skip connection mathematically adds the original basal (control) cell data back to this predicted delta to generate the final cellular state.
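The decoding step and skip connection amount to a linear map followed by an addition. The following is a minimal sketch with tiny toy dimensions; the function names and the list-of-rows weight layout are assumptions for illustration, and a real pipeline would use a tensor library rather than Python lists.

```python
def linear_decode(features, weight, bias):
    """Linear layer: weight has one row per gene, each row len(features) long."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weight, bias)
    ]


def predict_perturbed(basal, features, weight, bias):
    """Skip connection: predicted state = basal (control) expression + predicted delta."""
    delta = linear_decode(features, weight, bias)
    if len(delta) != len(basal):
        raise ValueError("decoder output and basal profile must cover the same genes")
    return [x + d for x, d in zip(basal, delta)]
```

One consequence of this design is that a decoder with all-zero weights and biases returns the control cell unchanged, so the trainable parameters only need to model the shift.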
Testing Conditions: During evaluation, instead of using a single mean control profile, the algorithm samples 200 distinct, individual control cells. This engineered variance mimics natural biology and provides the statistical rigor required for accurate Wilcoxon rank-sum testing.
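The evaluation idea can be sketched in plain Python as follows. The sketch assumes control cells are drawn without replacement and uses the normal approximation to the rank-sum statistic without a tie correction in the variance; `sample_controls` and `rank_sum_test` are hypothetical names. In practice a library routine such as SciPy's `scipy.stats.ranksums` would normally be used.

```python
import math
import random


def sample_controls(control_cells, n=200, seed=0):
    """Draw n distinct control cells (sampling without replacement is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(control_cells, n)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test for one gene, normal approximation.

    Returns (z, p). Ties get average ranks; the variance is not tie-corrected.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

A single mean control profile would give the test one value per gene and therefore no within-group spread to rank against, which is why individual cells are sampled instead.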
Architecture Diagrams

Graphic created by Venkata Shashish using draw.io, 2026.

Graphic created by Venkata Shashish using draw.io, 2026.
Results
The hybrid prototype showed a strong ability to identify differentially expressed genes across 51 diverse genetic perturbations.
Primary Endpoint Achieved: The architecture achieved a highly competitive Differential Expression Score (DES) of 0.8532 (Scaled: 0.8455).
Engineering Efficiency: By relying on the delta-prediction skip connection, all 9.29 million trainable parameters were devoted to the genetic shift, rather than spent on reconstructing the underlying cell.
1. EVALUATION METRICS
Differential Expression Score (DES): 0.8532
Perturbation Discrimination Score (PDS): 0.5076
Mean Absolute Error (MAE): 0.1433
Scaled Scores (vs baseline):
DES_scaled: 0.8455
PDS_scaled: 0.0000
MAE_scaled: 0.0000
OVERALL SCORE: 28.18/100
2. QUANTUM LAYER CONTRIBUTION ANALYSIS
Variance explained by first 4 components: 25.8%
Variance explained by first 10 components: 29.3%
Effective dimensionality (participation ratio): 2.58
(Expected: ~4 if quantum layer dominates, higher otherwise)
Mean between-perturbation similarity: 0.9993
(Higher = more uniform predictions = less differentiation)

Graphic created by Venkata Shashish using Matplotlib, 2026.
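The overall score in the evaluation log above is consistent with a simple average of the three scaled scores. The sketch below assumes the overall score is 100 times their unweighted mean; this assumption reproduces the logged value but is not confirmed by the log itself, and `overall_score` is a hypothetical helper.

```python
def overall_score(des_scaled, pds_scaled, mae_scaled):
    """Overall score as 100 x the unweighted mean of the scaled metrics (assumed)."""
    return 100.0 * (des_scaled + pds_scaled + mae_scaled) / 3.0
```

Plugging in the logged values (0.8455, 0.0000, 0.0000) gives 28.18 after rounding. Under this assumption, two of the three terms contribute nothing, so any gain in scaled PDS or MAE would move the total directly.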
Analysis
Although the model excelled at identifying which genes changed, Principal Component Analysis (PCA) was conducted to interpret the limitations of the current prototype.
Secondary Metrics: The model achieved a Perturbation Discrimination Score (PDS) of 0.5076 and a Mean Absolute Error (MAE) of 0.1433.
The Quantum Bottleneck: PCA showed that the first 4 principal components captured 25.8% of the data's variance, and the participation ratio indicated a highly restricted effective dimensionality of only 2.58.
Interpretation: Because the 672-dimensional biological latent space was compressed into just an 8-qubit space, the model exhibited "mode collapse," resulting in a between-perturbation similarity of 0.9993. The prototype predicted highly accurate, but ultimately uniform, shifts for different targets.
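Both diagnostics used in this analysis are short to compute. The participation ratio is (sum of eigenvalues)² / (sum of squared eigenvalues) of the covariance matrix, which equals tr(C)² / tr(C²) and so needs no eigendecomposition. The sketch below is plain Python for small inputs; it assumes the between-perturbation similarity is a mean pairwise cosine similarity, which the log does not state, and both function names are hypothetical.

```python
import math


def participation_ratio(rows):
    """Effective dimensionality tr(C)^2 / tr(C^2) of the sample covariance C."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    trace = sum(cov[i][i] for i in range(d))
    # C is symmetric, so tr(C^2) is the sum of its squared entries.
    trace_sq = sum(v * v for row in cov for v in row)
    return trace * trace / trace_sq


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
```

A participation ratio of k means the variance behaves as if spread evenly over about k directions, and a mean similarity near 1 means the per-perturbation vectors point in nearly the same direction.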

Graphic created by Venkata Shashish using Matplotlib, 2026.
Future Plans
The current prototype achieved a strong DES of 0.8532, but PDS and MAE metrics indicate room for improvement. The following phases outline the systematic plan to enhance model performance toward competitive standings.
Phase 1: Fix the Quantum Bottleneck (Architecture)
- ✓ 1A. Classical bypass residual path: Add a parallel classical MLP path from the STATE hidden state directly to the output, so the model isn't forced to route 100% of information through the quantum bottleneck. The quantum circuit becomes an additive correction to a classical baseline prediction.
- ✓ 1B. Transfer STATE's decoder weights: Initialize the classical decoder with STATE's project_out weights for immediate strong baseline predictions.
- ○ 1C. Increase quantum circuit capacity: Scale to 24+ qubits, with a deeper decoder MLP if needed.
Phase 2: Metric-Aware Loss Function
- ○ 2A. PDS-inspired contrastive loss: Implement a differentiable L1 ranking proxy to push the model to make different perturbations distinguishable.
- ○ 2B. DES-aware ranking loss: Penalize cases where the ranking of gene-level fold changes doesn't match the ground truth.
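As a reference point for item 2A, the kind of L1 ranking that such a loss would approximate can be written down directly. The sketch below is a simplified, non-differentiable form chosen for illustration and may differ from the challenge's official PDS definition; `discrimination_score` and its dictionary inputs are hypothetical.

```python
def l1(u, v):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def discrimination_score(predicted, truth):
    """Mean normalized rank of each prediction's own target under L1 distance.

    predicted, truth: dicts mapping perturbation name -> delta vector.
    Returns 1.0 when every prediction is closest to its own true profile.
    """
    names = list(truth)
    n = len(names)
    total = 0.0
    for p in names:
        own = l1(predicted[p], truth[p])
        # Count other true profiles that sit strictly closer than the correct one.
        closer = sum(
            1 for t in names if t != p and l1(predicted[p], truth[t]) < own
        )
        total += 1.0 - closer / n
    return total / n
```

When every perturbation receives the same predicted vector, most predictions cannot be closest to their own target, so a score of this form drops well below 1. A training loss would need a smooth surrogate for the hard count, since the comparison inside `closer` has no useful gradient.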
Phase 3: Training Regime Improvements
- ○ 3A. Pseudo-bulk training: Train on cells averaged per perturbation rather than on noisy single-cell data, to focus on the true perturbation signal.
- ○ 3B. Extended training with scheduling: Cosine annealing LR scheduler with warmup, 20-50 epochs on pseudo-bulk data.
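Both Phase 3 items are simple to state in code. The sketch below is a plain-Python illustration, assuming linear warmup followed by cosine decay and per-perturbation arithmetic means; the function names, the step-based parameterization, and the `min_lr` default are assumptions, not the project's settings.

```python
import math
from collections import defaultdict


def pseudo_bulk(cells, labels):
    """Average expression vectors per perturbation label."""
    groups = defaultdict(list)
    for vec, lab in zip(cells, labels):
        groups[lab].append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }


def lr_at(step, total_steps, base_lr, warmup_steps, min_lr=0.0):
    """Learning rate with linear warmup, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Averaging reduces cell-to-cell noise in the training targets at the cost of fewer training examples, one per perturbation, which is one reason a longer schedule over more epochs is paired with it.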
Phase 4: Post-Processing and Scaling
- ○ 4A. Global prediction scaling: Apply a cross-validated scale factor to the predicted delta for improved L1-based discrimination.
- ○ 4B. DEG-aware gene weighting: Amplify predicted changes for genes that frequently appear as differentially expressed.
Phase 5: Backbone Fine-Tuning (Advanced)
- ○ Selective STATE fine-tuning: Unfreeze project_out, final_down_then_up, and the last transformer layers with a 10x lower learning rate.
Expected Trajectory:
- Phase 1 alone should lift PDS from 0.50 to 0.60-0.70 and improve MAE
- Adding Phase 2-3 should push PDS toward 0.75-0.85 while maintaining DES
- Phase 4 scaling can provide another 5-10% PDS boost
- Target: Overall score 60-80/100 (competitive with top-5 VCC teams)
Project Status
Key Contributions
- Engineering PyTorch pipelines integrating Parameterized Quantum Circuits (PQCs) as trainable nonlinear transformations
- Developing hybrid quantum-classical architecture for the Virtual Cell Challenge
- Working on single-cell gene expression prediction following gene perturbations
- Collaborating with PhD student Samuel Yue Yu on cutting-edge quantum biology research
- Debugging CUBLAS issues to ensure stable GPU-accelerated quantum simulations
- Making the pipeline faster and more efficient by implementing a delta-prediction architecture with a skip connection