Sparse Hyper-Connections Documentation

Sparse Selective Hyper-Connections (SHC) is a practical efficiency framework for multi-stream residual architectures that achieves substantial computational and memory improvements while maintaining equivalent accuracy.

Key Features

🎯 Guaranteed Stability

The routing matrix has a spectral norm bounded by ρ ≤ 1 by construction, via the closed-form Cayley transform, which keeps training stable at any depth.
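To make the bound concrete, here is a minimal NumPy sketch of the Cayley transform. It assumes the routing matrix is built from an unconstrained square matrix W through its skew-symmetric part A = W - W^T; the library's actual parameterization may differ. The result is orthogonal, so its spectral norm is exactly 1, which satisfies ρ ≤ 1. The loop applies a fresh mix at each hypothetical layer and shows that the signal norm does not drift with depth. `cayley` and `n_streams` are illustrative names, not part of the `shc` API.

```python
import numpy as np


def cayley(W: np.ndarray) -> np.ndarray:
    """Closed-form Cayley transform: Q = (I + A)^-1 (I - A) with A = W - W^T.

    A is skew-symmetric, so I + A is always invertible and Q is orthogonal.
    """
    A = W - W.T                           # skew-symmetric parameterization (assumed)
    I = np.eye(W.shape[0])
    return np.linalg.solve(I + A, I - A)  # one linear solve, no iterations


rng = np.random.default_rng(0)
n_streams = 4                             # hypothetical number of residual streams

Q = cayley(rng.standard_normal((n_streams, n_streams)))
spectral_norm = np.linalg.norm(Q, 2)      # largest singular value; 1 for orthogonal Q

# Apply a fresh Cayley mix at each of 1000 "layers": the signal norm is preserved,
# so it neither explodes nor vanishes as depth grows.
x = rng.standard_normal(n_streams)
x0_norm = np.linalg.norm(x)
for _ in range(1000):
    x = cayley(rng.standard_normal((n_streams, n_streams))) @ x
print(f"spectral norm: {spectral_norm:.6f}, |x| before: {x0_norm:.6f}, "
      f"after 1000 mixes: {np.linalg.norm(x):.6f}")
```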

⚡ 16× Faster Routing

Replaces iterative Sinkhorn normalization with closed-form orthogonal matrix generation via the Cayley transform.
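The contrast with Sinkhorn can be sketched as follows. Sinkhorn-Knopp repeatedly rescales rows and columns until a matrix is approximately doubly stochastic, so its cost grows with the iteration count, while the Cayley transform produces an orthogonal matrix in a single linear solve. This is an illustrative sketch with hypothetical helper names (`sinkhorn`, `cayley`). Note that the two methods target different constraint sets (doubly stochastic vs. orthogonal), and the 16× figure above is the document's own measurement, not something this snippet reproduces.

```python
import numpy as np


def sinkhorn(logits: np.ndarray, tol: float = 1e-6, max_iters: int = 1000):
    """Sinkhorn-Knopp: alternately normalize rows and columns of exp(logits)
    until the matrix is approximately doubly stochastic. Returns (matrix, iterations)."""
    M = np.exp(logits - logits.max())
    for it in range(1, max_iters + 1):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
        if np.abs(M.sum(axis=1) - 1.0).max() < tol:
            return M, it
    return M, max_iters


def cayley(W: np.ndarray) -> np.ndarray:
    """Closed-form orthogonal matrix from a skew-symmetric parameterization: one solve."""
    A = W - W.T
    I = np.eye(W.shape[0])
    return np.linalg.solve(I + A, I - A)


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))

S, n_iters = sinkhorn(W)  # iterative: cost scales with the number of iterations
Q = cayley(W)             # closed form: a single n x n linear solve
print(f"Sinkhorn converged in {n_iters} iterations; Cayley used 1 solve")
```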

💾 3.3× Less Memory

Factorized KV cache compression reduces KV cache memory from 4× to ~1.2× the baseline through learned low-rank projections.
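A minimal sketch of factorized caching follows. It assumes four residual streams whose key/value vectors are concatenated per token (which is where a 4× uncompressed cache would come from), compressed to a latent of rank r chosen as roughly 1.2× the per-stream width. Random matrices stand in for the learned projections, so the reconstruction here is not meaningful; only the storage arithmetic is. All names and sizes are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 4 residual streams, per-stream KV width d, latent rank r.
n_streams, d, seq_len = 4, 64, 128
r = int(1.2 * d)  # chosen so the cache is ~1.2x a single-stream baseline

rng = np.random.default_rng(0)
# Stand-ins for learned low-rank projections (random here; trained in practice).
W_down = rng.standard_normal((n_streams * d, r)) / np.sqrt(n_streams * d)
W_up = rng.standard_normal((r, n_streams * d)) / np.sqrt(r)

# Per-token key/value vectors from all streams, concatenated: (seq_len, n_streams * d).
kv_full = rng.standard_normal((seq_len, n_streams * d))

latent_cache = kv_full @ W_down    # what gets stored: (seq_len, r)
kv_restored = latent_cache @ W_up  # reconstructed on read: (seq_len, n_streams * d)

baseline = seq_len * d             # single-stream cache entries
naive = kv_full.size               # uncompressed multi-stream cache
factorized = latent_cache.size     # compressed multi-stream cache
print(f"naive: {naive / baseline:.1f}x baseline, "
      f"factorized: {factorized / baseline:.1f}x baseline")
```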

📈 O(L) Inference

Optional distillation into a state-space model (SSM) enables linear-time generation without a KV cache, trading ~1% accuracy for a 4.4× memory reduction.
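Why an SSM removes the need for a KV cache can be illustrated with a generic diagonal linear state-space recurrence: each decoding step updates a fixed-size state, so per-token cost and memory stay constant and total generation is O(L), whereas an attention layer appends a key and a value for every token. This sketch assumes a simple diagonal SSM with illustrative sizes and the hypothetical helper `ssm_generate_step`; it does not show the distillation procedure or SHC's actual SSM architecture.

```python
import numpy as np

# Hypothetical diagonal linear SSM: h_t = a * h_{t-1} + B @ x_t,  y_t = C @ h_t.
d_model, d_state = 16, 32
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 0.99, size=d_state)  # per-channel decay with |a| < 1 (stable)
B = rng.standard_normal((d_state, d_model)) / np.sqrt(d_model)
C = rng.standard_normal((d_model, d_state)) / np.sqrt(d_state)


def ssm_generate_step(h, x):
    """One O(1) decoding step: update the fixed-size state and emit an output."""
    h = a * h + B @ x
    return h, C @ h


h = np.zeros(d_state)
kv_cache_entries = 0
for t in range(1000):
    x_t = rng.standard_normal(d_model)
    h, y_t = ssm_generate_step(h, x_t)
    kv_cache_entries += 2 * d_model  # what attention would append per token (K and V)

print(f"SSM state: {h.size} floats; "
      f"attention KV cache after 1000 tokens: {kv_cache_entries} floats")
```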

Quick Installation

pip install sparse-hyper-connections

Quick Start

import torch
from shc.models import SHCTransformer, get_config

# Create a model from a predefined configuration
config = get_config('500m')  # Options: '500m', '1b', '3b', '7b'
model = SHCTransformer(config)

# Forward pass on a random batch: 2 sequences of 512 token IDs in [0, 32000)
input_ids = torch.randint(0, 32000, (2, 512))
logits = model(input_ids)

# Generate text
output = model.generate(
    input_ids[:, :10],  # prompt
    max_new_tokens=100,
    temperature=0.7,
)

Citation

If you use SHC in your research, please cite:

@inproceedings{shc2026,
  title={Sparse Selective Hyper-Connections: A Unified Framework for 
         Stable and Efficient Deep Residual Learning},
  author={SHC Research Team},
  booktitle={IEEE Conference},
  year={2026}
}