
PIUnet

PIUnet (Permutation-Invariant Uncertainty Network) is the primary MFSR neural network used in the Ceres LWIR super-resolution project. It was originally developed by Valsesia & Magli at Politecnico di Torino for the ESA PROBA-V Kelvin Competition (satellite multi-image SR), then adapted for our LWIR thermal aerial imagery use case.

Repository: /home/geoff/projects/ceres/superrez/piunet/
Status: last active November 2025; could not consistently beat the bicubic baseline on LWIR data.
Paper: Valsesia & Magli, "Permutation invariance and uncertainty in multitemporal image super-resolution," IEEE TGRS 2022.

Architecture Overview

See PIUnet Architecture for full technical details.

The core architecture has three main components:

  1. TEFA blocks (Temporal Enhancement with Feature Attention) -- the main feature extraction backbone. Stack of 16 blocks, each containing 3D convolutions, multi-head self-attention across the temporal dimension, and channel-wise squeeze-excitation gating.
  2. TERN (Temporal Enhancement with Registration Network) -- implicit alignment module that predicts a spatially-invariant 5x5 convolution kernel per frame to compensate for sub-pixel misalignment.
  3. Reconstruction head -- pixel-shuffle upsampling (3x) with global residual learning (output = network_residual + bicubic_upsample(mean_LR)).

The network outputs both a super-resolved image (mu_sr) and an uncertainty map (sigma_sr).

Key hyperparameters (piunet/config/config.py, config_lwir.py):

  • N_feat = 42 feature channels
  • R_bneck = 8 attention bottleneck ratio
  • N_tefa = 16 TEFA blocks
  • N_heads = 1 attention head
  • patch_size = 32 (LR), mapping to 96 (HR) at 3x scale

Model Variants

Five model files exist, reflecting the evolution of approaches tried:

1. Original PIUNET (piunet/models/piunet.py)

The unmodified Valsesia/Magli architecture. Takes (B, T, H, W) LR stack, returns (mu_sr, sigma_sr). Uses BatchNorm3d, bilinear upsampling, global residual from mean of all input frames. Trained on PROBA-V (NIR/RED satellite bands).

2. PIUNETResidual (piunet/models/piunet_residual.py)

First LWIR adaptation. Explicit residual learning with a learned gain parameter. Takes a designated reference frame x_ref plus the LR stack. Reconstruction formula:

hr_pred = bicubic_upsample(denormalize(x_ref)) + gain * zero_mean_residual

Gain is constrained to [20, 80] via sigmoid. Uses BatchNorm3d. This was the model used in the November 2025 boss meeting results.
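The reconstruction can be sketched in a few lines (function and parameter names here are illustrative, not the actual piunet_residual.py API):

```python
import torch

def residual_reconstruction(x_ref_up, residual, gain_raw,
                            gain_min=20.0, gain_max=80.0):
    """Sketch of the V1 residual reconstruction.

    x_ref_up : bicubic-upsampled, denormalized reference frame (B, 1, H, W)
    residual : raw network residual (B, 1, H, W)
    gain_raw : unconstrained learnable scalar
    """
    # Sigmoid squashes the raw parameter into the [20, 80] gain range.
    gain = gain_min + (gain_max - gain_min) * torch.sigmoid(gain_raw)
    # Zero-mean the residual so the gain controls amplitude only.
    zero_mean = residual - residual.mean(dim=(-2, -1), keepdim=True)
    return x_ref_up + gain * zero_mean
```

Because the residual is zero-meaned, the bicubic reference alone determines the output's mean brightness.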

3. PIUNETResidual V2 (piunet/models/piunet_residual_v2.py)

Modernized residual model. Key changes from V1:

  • GroupNorm (6 groups of 7 channels) replaces BatchNorm3d for stability at small batch sizes
  • Unfrozen gain parameter (learnable, range [0.05, 0.5] as a fraction of HR std)
  • Statistics-matched bicubic baseline -- during training, the bicubic is histogram-matched to HR target statistics before residual addition
  • Accepts mu_hr/sigma_hr for proper training/inference normalization separation

Source: piunet/PIUNET_V2_IMPROVEMENTS.md, piunet/STATISTICS_MATCHED_TRAINING.md
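The normalization swap itself is a one-line change; a minimal sketch (the 6 x 7 grouping matches N_feat = 42):

```python
import torch
import torch.nn as nn

N_FEAT = 42  # feature channels from config

# BatchNorm3d statistics get noisy at very small batch sizes; GroupNorm
# normalizes over (channel group, spatial/temporal) dims per sample, so
# it is independent of batch size.
norm = nn.GroupNorm(num_groups=6, num_channels=N_FEAT)  # 6 groups of 7 channels

# GroupNorm accepts any (B, C, *) tensor, so it drops into the 3D
# feature maps (B, C, T, H, W) without reshaping.
x = torch.randn(2, N_FEAT, 9, 32, 32)
y = norm(x)
```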

4. PIUNETLRFusion (piunet/models/piunet_lr_fusion.py)

Alternative approach: instead of doing SR directly, learn to fuse 9 noisy LR frames into a single clean LR image (matching downsampled HR mosaic). No upsampling -- stays at LR resolution. Intended as Stage 1 of a two-stage pipeline (fuse, then SR separately).

5. PIUNETLRFusion V2 (piunet/models/piunet_lr_fusion_v2.py)

Optimized fusion model with:

  • Flash Attention 2 (F.scaled_dot_product_attention) replacing nn.MultiheadAttention -- fixed a bug where attention ran in FP32 under mixed precision, saving 31% VRAM
  • Uncertainty-aware temporal pooling -- frames weighted by attention_scores * confidence^2 instead of a plain mean
  • Identity-initialized TERN -- conv1 bias set to a delta kernel so registration starts as identity
  • Batch size increased from 8 to 10 on a 16 GB GPU

Source: piunet/LR_FUSION_V2_IMPROVEMENTS.md
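The attention swap can be sketched like this (shapes follow the config: T = 9 frames, N_feat = 42 channels, one head):

```python
import torch
import torch.nn.functional as F

B, T, C = 2, 9, 42
# SDPA expects (batch, heads, seq_len, head_dim); temporal attention
# treats the T frames as the sequence dimension.
q = k = v = torch.randn(B, 1, T, C)

# Unlike nn.MultiheadAttention in this setup, F.scaled_dot_product_attention
# respects the autocast dtype and dispatches to fused (Flash-Attention)
# kernels when available.
with torch.autocast("cuda", dtype=torch.float16,
                    enabled=torch.cuda.is_available()):
    out = F.scaled_dot_product_attention(q, k, v)  # temporal self-attention
```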

Training Pipeline

ProbaV Training (piunet/training/train.py)

The original training loop. Adam optimizer, MultiStepLR scheduler (step at 150k iterations, gamma=0.2). Two-phase loss: epoch 0 uses L1-registered loss, then switches to L1-registered-with-uncertainty loss. Gradient clipping at 15. Trains for 750 epochs.

LWIR Training Scripts (scripts/train_lwir_residual*.py, scripts/train_lwir_lr_fusion*.py)

Multiple training scripts reflecting different experiments. Common pattern:

  • Per-sequence normalization (mean=0, std=1 per tile)
  • AdamW optimizer (V2) or Adam (V1)
  • Mixed precision training (torch.cuda.amp)
  • WandB logging
  • Data augmentation (flips, rotations)
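The shared AMP training-step shape looks roughly like this (the model and optimizer below are stand-in placeholders, not the project's classes):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # placeholder network
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

def train_step(lr_batch, hr_batch):
    opt.zero_grad(set_to_none=True)
    with torch.autocast("cuda", enabled=use_cuda):
        pred = model(lr_batch)               # forward in FP16 where supported
        loss = F.l1_loss(pred, hr_batch)
    scaler.scale(loss).backward()  # scale loss so FP16 grads don't underflow
    scaler.step(opt)               # unscales grads, then steps the optimizer
    scaler.update()
    return loss.item()
```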

Loss Functions (piunet/training/losses.py)

Registered L1 loss -- the signature PIUnet loss. Since LR-to-HR alignment is imperfect, the loss searches over a 7x7 grid of pixel shifts (border=3, yielding 49 candidate offsets) and picks the minimum L1 error. Also applies per-shift brightness bias correction. This makes training tolerant of small registration errors but is computationally expensive.
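A simplified sketch of the shift search (the actual losses.py version also handles masking and other details):

```python
import torch

def registered_l1(pred, target, border=3):
    """L1 loss minimized over a (2*border+1)^2 grid of integer shifts,
    with per-shift brightness bias correction."""
    b = border
    pred_c = pred[..., b:-b, b:-b]           # central crop of the prediction
    h, w = pred_c.shape[-2:]
    best = None
    for dy in range(2 * b + 1):              # 7x7 grid -> 49 candidate offsets
        for dx in range(2 * b + 1):
            tgt = target[..., dy:dy + h, dx:dx + w]
            # Brightness bias correction: remove the mean offset per shift.
            bias = (tgt - pred_c).mean(dim=(-2, -1), keepdim=True)
            err = (tgt - pred_c - bias).abs().mean(dim=(-2, -1))
            best = err if best is None else torch.minimum(best, err)
    return best.mean()
```

The 49-offset search is what makes the loss tolerant of small misregistrations, and also what makes it expensive.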

Registered uncertainty loss -- same shift-search, but uses Laplacian NLL: sigma + |y - mu| * exp(-sigma) instead of plain L1.

cPSNR/cSSIM metrics -- same shift-search approach applied to PSNR/SSIM computation. Finds the shift that maximizes PSNR, with brightness correction.

Dataset Configs

| Config          | Dataset       | Tiles | Frames   | Patch | Batch | Learning rate | Epochs |
|-----------------|---------------|-------|----------|-------|-------|---------------|--------|
| Config (ProbaV) | NIR satellite | 393   | variable | 32    | 24    | 1e-4          | 750    |
| ConfigLWIR      | LWIR aerial   | 210   | 9        | 32    | 16    | 5e-5          | 2000   |

LWIR normalization constants: mu = 31951.56, sigma = 543.50 (16-bit DN values).
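Helper names below are illustrative, but the arithmetic follows directly from the constants above:

```python
# Fixed LWIR normalization constants (16-bit digital numbers).
MU_LWIR = 31951.56
SIGMA_LWIR = 543.50

def normalize(dn):
    """Map raw 16-bit DN values to zero-mean, unit-std network inputs."""
    return (dn - MU_LWIR) / SIGMA_LWIR

def denormalize(z):
    """Map network outputs back to 16-bit DN values."""
    return z * SIGMA_LWIR + MU_LWIR
```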

Source: piunet/config/config.py lines 1-42, piunet/config/config_lwir.py lines 1-42.

Results History

November 3, 2025 -- Boss Meeting (CORRECT results)

Inference on 3 test sequences from flight 21052 vs flight 21051 mosaic ground truth. Model: Results_Residual/model_best_20251031_0058.pt.

| Sequence | PIUnet PSNR | Bicubic PSNR | Delta    |
|----------|-------------|--------------|----------|
| seq000   | 21.64 dB    | 22.64 dB     | -0.99 dB |
| seq001   | 19.40 dB    | 19.75 dB     | -0.35 dB |
| seq002   | 19.86 dB    | 20.27 dB     | -0.41 dB |

PIUnet performed worse than bicubic on all sequences. SSIM also worse (0.570-0.597 vs 0.593-0.612).

Source: piunet/BOSS_MEETING_SUMMARY.md

Training vs Inference PSNR Discrepancy

Training reported 53-54 dB PSNR, but inference showed only 46-48 dB on raw metrics. Investigation (piunet/TRAINING_VS_INFERENCE_ANALYSIS.md) attributed the gap to:

  1. Edge artifacts (-2.9 dB) -- boundary pixels from warping
  2. Parallax regions (-3+ dB) -- trees/buildings at different altitudes
  3. Brightness bias correction (-2 to -3 dB) -- training used cpsnr() with per-shift bias correction; inference did not
  4. Normalization strategy differences -- negligible (0.01 dB)

With proper masking (10px edge erosion + 90th percentile parallax exclusion), inference achieved 54.44 dB, matching training.

Results Directories

| Directory                                    | Date           | Description                       |
|----------------------------------------------|----------------|-----------------------------------|
| Results_Residual/                            | Oct-Nov 2025   | V1 residual model checkpoints     |
| Results_Residual_v2/                         | Nov 2025       | V2 (GroupNorm) residual checkpoints |
| Results_Residual_MosaicA/                    | Nov 9, 2025    | MosaicA-only training             |
| Results_Residual_MosaicA_Bicubic/            | Nov 2025       | Bicubic-target training variant   |
| Results_Residual_MosaicA_HR_Ref/             | Nov 10, 2025   | HR reference frame variant        |
| Results_Residual_MosaicA_HR_Ref_LearnedGain/ | Nov 2025       | Learned gain variant              |
| Results_Residual_Multiscale_HR_Ref/          | Nov 12, 2025   | Multi-scale with HR ref           |
| Results_LR_Fusion_V2/                        | Nov 10-11, 2025 | LR fusion V2 training            |

Known Issues and Struggles

Struggle: PIUnet Could Not Beat Bicubic

  • Hypothesis: MFSR network should improve over bicubic by fusing temporal information.
  • Failure mode: PIUnet consistently 0.3-1.0 dB worse than bicubic across all test sequences.
  • Root causes (inferred):
      • TERN's spatially-invariant 5x5 kernel cannot handle the per-pixel misalignment introduced by altitude parallax -- see PIUnet Architecture for details
      • End-to-end training creates gradient competition between alignment (TERN) and SR (TEFA + reconstruction)
      • No base-frame priority -- all frames are treated equally even though the reference frame is the best-registered
      • Small LWIR dataset (210 tiles) with noisy ground truth (parallax artifacts in the mosaic)
  • Anti-pattern: Do not assume that a model achieving high training PSNR (53+ dB) will generalize to real inference when ground truth has systematic alignment errors.

Struggle: Mixed Precision Not Working (V1)

  • Hypothesis: torch.cuda.amp should reduce VRAM via FP16.
  • Failure mode: Hit a mysterious 13.5 GB wall regardless of batch size.
  • Root cause: nn.MultiheadAttention ignores autocast context in PyTorch 2.x, running in FP32.
  • Fix: Replaced with F.scaled_dot_product_attention in V2 models, saving 31% VRAM.
  • Anti-pattern: Always profile actual precision of each layer when using mixed precision -- do not assume all operations respect autocast.
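One generic way to run that audit (a hook-based sketch, not project code) is to record each leaf module's output dtype during a forward pass:

```python
import torch
import torch.nn as nn

def output_dtypes(model, x):
    """Return {module_name: output_dtype} for every leaf module.
    FP32 outputs inside an autocast region flag ops that ignore it."""
    seen = {}
    def hook(name):
        def fn(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            seen[name] = out.dtype
        return fn
    handles = [m.register_forward_hook(hook(name))
               for name, m in model.named_modules()
               if not list(m.children())]       # leaf modules only
    with torch.autocast("cuda", dtype=torch.float16,
                        enabled=torch.cuda.is_available()):
        model(x)
    for h in handles:
        h.remove()
    return seen
```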

Source: piunet/LR_FUSION_V2_IMPROVEMENTS.md

Struggle: Normalization/Denormalization Errors

Multiple inference runs produced incorrect results due to normalization bugs:

  • PIUnet_Inference_Results_FIXED used the wrong normalization (PSNR = 8 dB, 4560 DN offset)
  • PIUnet_Inference_Results_16bit_FIXED was incomplete
  • PIUnet_Inference_Results_CORRECT was the first correct run

Per-sequence vs fixed normalization was tested and found nearly equivalent (0.01 dB difference).

Source: piunet/BOSS_MEETING_SUMMARY.md, piunet/TRAINING_VS_INFERENCE_ANALYSIS.md

Proposed But Unimplemented Improvements

From piunet/TODO_NEXT_IMPROVEMENTS.md (Nov 10, 2025):

  1. Frequency-domain loss -- weight loss by empirical frequency spectrum of ground truth residuals (59.7% energy in mid-freq band). Expected +0.5-1.0 dB.
  2. Deformable convolution in TERN -- replace fixed 5x5 kernel with learnable offsets for per-pixel alignment. Expected +0.5-0.7 dB.
  3. GeLU activation -- trivial change, expected +0.2-0.3 dB.
  4. MS-SSIM loss -- perceptual quality improvement.
  5. Altitude conditioning via FiLM layers -- inject GSD/altitude metadata.
  6. Self-supervised pretraining (MAE) -- pretrain on unlabeled LWIR frames.

None of these were implemented before the project paused.

Current State (as of Nov 2025)

The project reached an impasse: despite extensive experimentation with five model variants, multiple normalization strategies, and loss function modifications, PIUnet could not consistently outperform bicubic upsampling on real LWIR data. The decision was made to research modern architectures -- see RASD+Restormer, QMambaBSR -- rather than continue iterating on PIUnet.

Key artifacts preserved:

  • Multiple trained checkpoints across all variants
  • Complete inference pipeline (piunet/inference/run_piunet_inference_and_compare.py)
  • 26-tile interactive results website (piunet/browser.html)
  • Comprehensive documentation of every approach tried

File Map

| Path                                  | Purpose                                       |
|---------------------------------------|-----------------------------------------------|
| piunet/models/piunet.py               | Original PIUNET (Valsesia/Magli)              |
| piunet/models/piunet_residual.py      | V1 residual with learned gain                 |
| piunet/models/piunet_residual_v2.py   | V2 with GroupNorm, stats-matched bicubic      |
| piunet/models/piunet_lr_fusion.py     | LR fusion (Stage 1 of 2-stage pipeline)       |
| piunet/models/piunet_lr_fusion_v2.py  | LR fusion V2 with Flash Attention             |
| piunet/config/config.py               | ProbaV hyperparameters                        |
| piunet/config/config_lwir.py          | LWIR hyperparameters                          |
| piunet/training/train.py              | Original ProbaV training loop                 |
| piunet/training/losses.py             | Registered L1/uncertainty/evidential losses, cPSNR |
| piunet/data/datasets.py               | ProbaV dataset loader                         |
| scripts/train_lwir_residual*.py       | LWIR training scripts (many variants)         |
| scripts/train_lwir_lr_fusion*.py      | LR fusion training scripts                    |