
PIUnet

PIUnet (Permutation-Invariant Uncertainty Network) is the primary MFSR neural network used in the Ceres LWIR super-resolution project. It was originally developed by Valsesia & Magli at Politecnico di Torino for the ESA PROBA-V Kelvin Competition (satellite multi-image SR), then adapted for our LWIR thermal aerial imagery use case.

Repository: /home/geoff/projects/ceres/superrez/piunet/
Status: last active November 2025; could not consistently beat the bicubic baseline on LWIR data.
Paper: Valsesia & Magli, "Permutation invariance and uncertainty in multitemporal image super-resolution," IEEE TGRS 2022.

Architecture Overview

See PIUnet Architecture for full technical details.

The core architecture has three main components:

  1. TEFA blocks (Temporal Enhancement with Feature Attention) -- the main feature extraction backbone. Stack of 16 blocks, each containing 3D convolutions, multi-head self-attention across the temporal dimension, and channel-wise squeeze-excitation gating.
  2. TERN (Temporal Enhancement with Registration Network) -- implicit alignment module that predicts a spatially-invariant 5x5 convolution kernel per frame to compensate for sub-pixel misalignment.
  3. Reconstruction head -- pixel-shuffle upsampling (3x) with global residual learning (output = network_residual + bicubic_upsample(mean_LR)).

The network outputs both a super-resolved image (mu_sr) and an uncertainty map (sigma_sr).

Key hyperparameters (piunet/config/config.py, config_lwir.py):

  • N_feat = 42 feature channels
  • R_bneck = 8 attention bottleneck ratio
  • N_tefa = 16 TEFA blocks
  • N_heads = 1 attention head
  • patch_size = 32 (LR), mapping to 96 (HR) at 3x scale

Model Variants

Five model files exist, reflecting the evolution of approaches tried:

1. Original PIUNET (piunet/models/piunet.py)

The unmodified Valsesia/Magli architecture. Takes (B, T, H, W) LR stack, returns (mu_sr, sigma_sr). Uses BatchNorm3d, bilinear upsampling, global residual from mean of all input frames. Trained on PROBA-V (NIR/RED satellite bands).

2. PIUNETResidual (piunet/models/piunet_residual.py)

First LWIR adaptation. Explicit residual learning with a learned gain parameter. Takes a designated reference frame x_ref plus the LR stack. Reconstruction formula:

hr_pred = bicubic_upsample(denormalize(x_ref)) + gain * zero_mean_residual

Gain is constrained to [20, 80] via sigmoid. Uses BatchNorm3d. This was the model used in the November 2025 boss meeting results.
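The reconstruction can be sketched in a few lines (function and parameter names here are illustrative, not the actual piunet_residual.py API):

```python
import torch

def residual_reconstruction(x_ref_up, residual, gain_raw,
                            gain_min=20.0, gain_max=80.0):
    """Sketch of the V1 residual reconstruction.

    x_ref_up : bicubic-upsampled, denormalized reference frame (B, 1, H, W)
    residual : raw network residual (B, 1, H, W)
    gain_raw : unconstrained learnable scalar
    """
    # Sigmoid squashes the raw parameter into the [20, 80] gain range.
    gain = gain_min + (gain_max - gain_min) * torch.sigmoid(gain_raw)
    # Zero-mean the residual so the gain controls amplitude only.
    zero_mean = residual - residual.mean(dim=(-2, -1), keepdim=True)
    return x_ref_up + gain * zero_mean
```

Because the residual is zero-meaned, the bicubic reference alone determines the output's mean brightness.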

3. PIUNETResidual V2 (piunet/models/piunet_residual_v2.py)

Modernized residual model. Key changes from V1:

  • GroupNorm (6 groups of 7 channels) replaces BatchNorm3d for stability at small batch sizes
  • Unfrozen gain parameter (learnable, range [0.05, 0.5] as a fraction of HR std)
  • Statistics-matched bicubic baseline -- during training, the bicubic is histogram-matched to HR target statistics before residual addition
  • Accepts mu_hr/sigma_hr for proper training/inference normalization separation

Source: piunet/PIUNET_V2_IMPROVEMENTS.md, piunet/STATISTICS_MATCHED_TRAINING.md
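The normalization swap itself is a one-line change; a minimal sketch (the 6 x 7 grouping matches N_feat = 42):

```python
import torch
import torch.nn as nn

N_FEAT = 42  # feature channels from config

# BatchNorm3d statistics get noisy at very small batch sizes; GroupNorm
# normalizes over (channel group, spatial/temporal) dims per sample, so
# it is independent of batch size.
norm = nn.GroupNorm(num_groups=6, num_channels=N_FEAT)  # 6 groups of 7 channels

# GroupNorm accepts any (B, C, *) tensor, so it drops into the 3D
# feature maps (B, C, T, H, W) without reshaping.
x = torch.randn(2, N_FEAT, 9, 32, 32)
y = norm(x)
```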

4. PIUNETLRFusion (piunet/models/piunet_lr_fusion.py)

Alternative approach: instead of doing SR directly, learn to fuse 9 noisy LR frames into a single clean LR image (matching downsampled HR mosaic). No upsampling -- stays at LR resolution. Intended as Stage 1 of a two-stage pipeline (fuse, then SR separately).

5. PIUNETLRFusion V2 (piunet/models/piunet_lr_fusion_v2.py)

Optimized fusion model with:

  • Flash Attention 2 (F.scaled_dot_product_attention) replacing nn.MultiheadAttention -- fixed a bug where attention ran in FP32 under mixed precision, saving 31% VRAM
  • Uncertainty-aware temporal pooling -- frames weighted by attention_scores * confidence^2 instead of a plain mean
  • Identity-initialized TERN -- conv1 bias set to a delta kernel so registration starts as identity
  • Batch size increased from 8 to 10 on a 16 GB GPU

Source: piunet/LR_FUSION_V2_IMPROVEMENTS.md
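The attention swap can be sketched like this (shapes follow the config: T = 9 frames, N_feat = 42 channels, one head):

```python
import torch
import torch.nn.functional as F

B, T, C = 2, 9, 42
# SDPA expects (batch, heads, seq_len, head_dim); temporal attention
# treats the T frames as the sequence dimension.
q = k = v = torch.randn(B, 1, T, C)

# Unlike nn.MultiheadAttention in this setup, F.scaled_dot_product_attention
# respects the autocast dtype and dispatches to fused (Flash-Attention)
# kernels when available.
with torch.autocast("cuda", dtype=torch.float16,
                    enabled=torch.cuda.is_available()):
    out = F.scaled_dot_product_attention(q, k, v)  # temporal self-attention
```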

Training Pipeline

ProbaV Training (piunet/training/train.py)

The original training loop. Adam optimizer, MultiStepLR scheduler (step at 150k iterations, gamma=0.2). Two-phase loss: epoch 0 uses L1-registered loss, then switches to L1-registered-with-uncertainty loss. Gradient clipping at 15. Trains for 750 epochs.

LWIR Training Scripts (scripts/train_lwir_residual*.py, scripts/train_lwir_lr_fusion*.py)

Multiple training scripts reflecting different experiments. Common pattern:

  • Per-sequence normalization (mean=0, std=1 per tile)
  • AdamW optimizer (V2) or Adam (V1)
  • Mixed precision training (torch.cuda.amp)
  • WandB logging
  • Data augmentation (flips, rotations)
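The shared AMP training-step shape looks roughly like this (the model and optimizer below are stand-in placeholders, not the project's classes):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # placeholder network
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

def train_step(lr_batch, hr_batch):
    opt.zero_grad(set_to_none=True)
    with torch.autocast("cuda", enabled=use_cuda):
        pred = model(lr_batch)               # forward in FP16 where supported
        loss = F.l1_loss(pred, hr_batch)
    scaler.scale(loss).backward()  # scale loss so FP16 grads don't underflow
    scaler.step(opt)               # unscales grads, then steps the optimizer
    scaler.update()
    return loss.item()
```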

Loss Functions (piunet/training/losses.py)

Registered L1 loss -- the signature PIUnet loss. Since LR-to-HR alignment is imperfect, the loss searches over a 7x7 grid of pixel shifts (border=3, yielding 49 candidate offsets) and picks the minimum L1 error. Also applies per-shift brightness bias correction. This makes training tolerant of small registration errors but is computationally expensive.
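A simplified sketch of the shift search (the actual losses.py version also handles masking and other details):

```python
import torch

def registered_l1(pred, target, border=3):
    """L1 loss minimized over a (2*border+1)^2 grid of integer shifts,
    with per-shift brightness bias correction."""
    b = border
    pred_c = pred[..., b:-b, b:-b]           # central crop of the prediction
    h, w = pred_c.shape[-2:]
    best = None
    for dy in range(2 * b + 1):              # 7x7 grid -> 49 candidate offsets
        for dx in range(2 * b + 1):
            tgt = target[..., dy:dy + h, dx:dx + w]
            # Brightness bias correction: remove the mean offset per shift.
            bias = (tgt - pred_c).mean(dim=(-2, -1), keepdim=True)
            err = (tgt - pred_c - bias).abs().mean(dim=(-2, -1))
            best = err if best is None else torch.minimum(best, err)
    return best.mean()
```

The 49-offset search is what makes the loss tolerant of small misregistrations, and also what makes it expensive.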

Registered uncertainty loss -- same shift-search, but uses Laplacian NLL: sigma + |y - mu| * exp(-sigma) instead of plain L1.

cPSNR/cSSIM metrics -- same shift-search approach applied to PSNR/SSIM computation. Finds the shift that maximizes PSNR, with brightness correction.

Dataset Configs

| Config          | Dataset       | Tiles | Frames   | Patch | Batch | Learning rate | Epochs |
|-----------------|---------------|-------|----------|-------|-------|---------------|--------|
| Config (ProbaV) | NIR satellite | 393   | variable | 32    | 24    | 1e-4          | 750    |
| ConfigLWIR      | LWIR aerial   | 210   | 9        | 32    | 16    | 5e-5          | 2000   |

LWIR normalization constants: mu = 31951.56, sigma = 543.50 (16-bit DN values).
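Helper names below are illustrative, but the arithmetic follows directly from the constants above:

```python
# Fixed LWIR normalization constants (16-bit digital numbers).
MU_LWIR = 31951.56
SIGMA_LWIR = 543.50

def normalize(dn):
    """Map raw 16-bit DN values to zero-mean, unit-std network inputs."""
    return (dn - MU_LWIR) / SIGMA_LWIR

def denormalize(z):
    """Map network outputs back to 16-bit DN values."""
    return z * SIGMA_LWIR + MU_LWIR
```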

Source: piunet/config/config.py lines 1-42, piunet/config/config_lwir.py lines 1-42.

Results History

November 3, 2025 -- Boss Meeting (CORRECT results)

Inference on 3 test sequences from flight 21052 vs flight 21051 mosaic ground truth. Model: Results_Residual/model_best_20251031_0058.pt.

| Sequence | PIUnet PSNR | Bicubic PSNR | Delta    |
|----------|-------------|--------------|----------|
| seq000   | 21.64 dB    | 22.64 dB     | -0.99 dB |
| seq001   | 19.40 dB    | 19.75 dB     | -0.35 dB |
| seq002   | 19.86 dB    | 20.27 dB     | -0.41 dB |

PIUnet performed worse than bicubic on all sequences. SSIM also worse (0.570-0.597 vs 0.593-0.612).

Source: piunet/BOSS_MEETING_SUMMARY.md

Training vs Inference PSNR Discrepancy

Training reported 53-54 dB PSNR, but inference showed only 46-48 dB on raw metrics. Investigation (piunet/TRAINING_VS_INFERENCE_ANALYSIS.md) attributed the gap to:

  1. Edge artifacts (-2.9 dB) -- boundary pixels from warping
  2. Parallax regions (-3+ dB) -- trees/buildings at different altitudes
  3. Brightness bias correction (-2 to -3 dB) -- training used cpsnr() with per-shift bias correction; inference did not
  4. Normalization strategy differences -- negligible (0.01 dB)

With proper masking (10px edge erosion + 90th percentile parallax exclusion), inference achieved 54.44 dB, matching training.

Results Directories

| Directory                                    | Date           | Description                       |
|----------------------------------------------|----------------|-----------------------------------|
| Results_Residual/                            | Oct-Nov 2025   | V1 residual model checkpoints     |
| Results_Residual_v2/                         | Nov 2025       | V2 (GroupNorm) residual checkpoints |
| Results_Residual_MosaicA/                    | Nov 9, 2025    | MosaicA-only training             |
| Results_Residual_MosaicA_Bicubic/            | Nov 2025       | Bicubic-target training variant   |
| Results_Residual_MosaicA_HR_Ref/             | Nov 10, 2025   | HR reference frame variant        |
| Results_Residual_MosaicA_HR_Ref_LearnedGain/ | Nov 2025       | Learned gain variant              |
| Results_Residual_Multiscale_HR_Ref/          | Nov 12, 2025   | Multi-scale with HR ref           |
| Results_LR_Fusion_V2/                        | Nov 10-11, 2025 | LR fusion V2 training            |

Known Issues and Struggles

Struggle: PIUnet Could Not Beat Bicubic

  • Hypothesis: MFSR network should improve over bicubic by fusing temporal information.
  • Failure mode: PIUnet consistently 0.3-1.0 dB worse than bicubic across all test sequences.
  • Root causes (inferred):
      • TERN's spatially-invariant 5x5 kernel cannot handle the per-pixel misalignment introduced by altitude parallax -- see PIUnet Architecture for details
      • End-to-end training creates gradient competition between alignment (TERN) and SR (TEFA + reconstruction)
      • No base-frame priority -- all frames are treated equally even though the reference frame is the best-registered
      • Small LWIR dataset (210 tiles) with noisy ground truth (parallax artifacts in the mosaic)
  • Anti-pattern: Do not assume that a model achieving high training PSNR (53+ dB) will generalize to real inference when ground truth has systematic alignment errors.

Struggle: Mixed Precision Not Working (V1)

  • Hypothesis: torch.cuda.amp should reduce VRAM via FP16.
  • Failure mode: Hit a mysterious 13.5 GB wall regardless of batch size.
  • Root cause: nn.MultiheadAttention ignores autocast context in PyTorch 2.x, running in FP32.
  • Fix: Replaced with F.scaled_dot_product_attention in V2 models, saving 31% VRAM.
  • Anti-pattern: Always profile actual precision of each layer when using mixed precision -- do not assume all operations respect autocast.
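One generic way to run that audit (a hook-based sketch, not project code) is to record each leaf module's output dtype during a forward pass:

```python
import torch
import torch.nn as nn

def output_dtypes(model, x):
    """Return {module_name: output_dtype} for every leaf module.
    FP32 outputs inside an autocast region flag ops that ignore it."""
    seen = {}
    def hook(name):
        def fn(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            seen[name] = out.dtype
        return fn
    handles = [m.register_forward_hook(hook(name))
               for name, m in model.named_modules()
               if not list(m.children())]       # leaf modules only
    with torch.autocast("cuda", dtype=torch.float16,
                        enabled=torch.cuda.is_available()):
        model(x)
    for h in handles:
        h.remove()
    return seen
```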

Source: piunet/LR_FUSION_V2_IMPROVEMENTS.md

Struggle: Normalization/Denormalization Errors

Multiple inference runs produced incorrect results due to normalization bugs:

  • PIUnet_Inference_Results_FIXED used the wrong normalization (PSNR = 8 dB, 4560 DN offset)
  • PIUnet_Inference_Results_16bit_FIXED was incomplete
  • PIUnet_Inference_Results_CORRECT was the first correct run

Per-sequence vs fixed normalization was tested and found nearly equivalent (0.01 dB difference).

Source: piunet/BOSS_MEETING_SUMMARY.md, piunet/TRAINING_VS_INFERENCE_ANALYSIS.md

Proposed But Unimplemented Improvements

From piunet/TODO_NEXT_IMPROVEMENTS.md (Nov 10, 2025):

  1. Frequency-domain loss -- weight loss by empirical frequency spectrum of ground truth residuals (59.7% energy in mid-freq band). Expected +0.5-1.0 dB.
  2. Deformable convolution in TERN -- replace fixed 5x5 kernel with learnable offsets for per-pixel alignment. Expected +0.5-0.7 dB.
  3. GeLU activation -- trivial change, expected +0.2-0.3 dB.
  4. MS-SSIM loss -- perceptual quality improvement.
  5. Altitude conditioning via FiLM layers -- inject GSD/altitude metadata.
  6. Self-supervised pretraining (MAE) -- pretrain on unlabeled LWIR frames.

None of these were implemented before the project paused.

Current State (as of Nov 2025)

The project reached an impasse: despite extensive experimentation with five model variants, multiple normalization strategies, and loss function modifications, PIUnet could not consistently outperform bicubic upsampling on real LWIR data. The decision was made to research modern architectures -- see RASD+Restormer, QMambaBSR -- rather than continue iterating on PIUnet.

Key artifacts preserved:

  • Multiple trained checkpoints across all variants
  • Complete inference pipeline (piunet/inference/run_piunet_inference_and_compare.py)
  • 26-tile interactive results website (piunet/browser.html)
  • Comprehensive documentation of every approach tried

File Map

| Path                                  | Purpose                                       |
|---------------------------------------|-----------------------------------------------|
| piunet/models/piunet.py               | Original PIUNET (Valsesia/Magli)              |
| piunet/models/piunet_residual.py      | V1 residual with learned gain                 |
| piunet/models/piunet_residual_v2.py   | V2 with GroupNorm, stats-matched bicubic      |
| piunet/models/piunet_lr_fusion.py     | LR fusion (Stage 1 of 2-stage pipeline)       |
| piunet/models/piunet_lr_fusion_v2.py  | LR fusion V2 with Flash Attention             |
| piunet/config/config.py               | ProbaV hyperparameters                        |
| piunet/config/config_lwir.py          | LWIR hyperparameters                          |
| piunet/training/train.py              | Original ProbaV training loop                 |
| piunet/training/losses.py             | Registered L1/uncertainty/evidential losses, cPSNR |
| piunet/data/datasets.py               | ProbaV dataset loader                         |
| scripts/train_lwir_residual*.py       | LWIR training scripts (many variants)         |
| scripts/train_lwir_lr_fusion*.py      | LR fusion training scripts                    |