HighRes-net

HighRes-net was the first MFSR architecture explored for the Ceres LWIR super-resolution project. It is a fork of the ESA Kelvin Competition-winning model by team Rarefin (ElementAI + Mila), originally designed for PROBA-V satellite multi-frame super-resolution. The project was active from February 2025 through April 2025, with a late-stage LWIR evaluation effort in October 2025. It was ultimately abandoned in favor of PIUnet.

Repository: /home/geoff/projects/ceres/superrez/HighRes-net/
Original authors: Zhichao Lin, Michel Deudon, Alfredo Kalaitzis, Julien Cornebise (ElementAI); Israel Goytom, Kris Sankaran, Md Rifat Arefin, Samira E. Kahou, Vincent Michalski (Mila)
Paper: "HighRes-net: Recursive Fusion for Multi-Frame Super Resolution" (2019)
Status: Abandoned. Last commit April 1, 2025. A brief LWIR evaluation in October 2025 used existing weights, but no new training was done.

Original Architecture

The upstream HighRes-net (src/DeepNetworks/HRNet.py) has three components:

  1. Encoder -- Per-frame feature extraction. Each LR frame is concatenated with a reference frame (median of all views) to form a 2-channel input, then passed through a Conv2d init layer + N residual blocks (Conv-PReLU-Conv-PReLU with skip connection) + final Conv2d. Output: 64-channel feature maps at LR resolution.

  2. RecursiveNet (fusion) -- Pairwise recursive fusion. Views are paired (first half with reversed second half), concatenated across channels (128-ch), fused through a residual block + Conv2d back to 64 channels. Repeat log2(N) times until a single fused representation remains. Supports alpha_residual skip connections for padded views.

  3. Decoder -- ConvTranspose2d with stride=3 for 3x upsampling + PReLU + final Conv2d producing 1-channel output. No output activation -- values are unbounded, clipped only at evaluation time.
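The pairwise scheme in step 2 can be sketched in a few lines of plain Python. The `fuse` callback here is a toy stand-in for the learned residual block + Conv2d, and the power-of-two view count is an assumption (the real model pads short batches and uses alpha_residual skips for padded views):

```python
import numpy as np

def recursive_fuse(views, fuse):
    """Halve the number of views each step by pairing the first half
    with the reversed second half, until one fused map remains.
    Runs log2(N) fusion steps for N views (N a power of two here)."""
    while len(views) > 1:
        half = len(views) // 2
        pairs = zip(views[:half], views[half:][::-1])
        views = [fuse(a, b) for a, b in pairs]
    return views[0]

# Toy stand-in for the learned fusion: average each pair.
views = [np.full((4, 4), float(i)) for i in range(8)]
fused = recursive_fuse(views, lambda a, b: (a + b) / 2.0)
```

With 8 views this performs 3 fusion steps (8 → 4 → 2 → 1), which is the log2(N) depth noted above.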

Key design choice: the reference frame is the median of up to 9 views, providing implicit robustness to outliers without explicit alignment.
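As a sketch of that input construction (array shapes and values are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(42)
views = rng.random((9, 128, 128), dtype=np.float32)  # 9 LR views of one tile

# Reference frame: per-pixel median across views. A transient outlier
# (cloud, sensor noise) in any single view barely moves the median,
# which is what gives the scheme robustness without explicit alignment.
reference = np.median(views, axis=0)

# Each view is stacked with the reference along the channel dimension,
# producing the 2-channel per-frame input the Encoder expects.
inputs = np.stack([np.stack([v, reference]) for v in views])
```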

Provenance: HighRes-net/src/DeepNetworks/HRNet.py, lines 172-211.

How It Differs from PIUnet

| Aspect | HighRes-net | PIUnet |
| --- | --- | --- |
| Alignment | None (implicit via reference frame concat) | TERN module (learned 5x5 registration kernels per frame) |
| Fusion | Recursive pairwise (log2 depth) | Permutation-invariant (mean pooling across temporal dim) |
| Feature extraction | Simple residual blocks, 64 channels | TEFA blocks with 3D convolutions + self-attention, 42 channels |
| Upsampling | ConvTranspose2d (stride 3) | PixelShuffle (3x) |
| Uncertainty | None | Predicts sigma_sr uncertainty map |
| Output | Direct prediction (unbounded) | Global residual learning (output = residual + bicubic(mean_LR)) |
| Complexity | ~200 lines of model code | ~1000+ lines across TEFA, TERN, reconstruction head |
| ProbaV score | Won Kelvin competition (team Rarefin) | Top contender (Valsesia & Magli, IEEE TGRS 2022) |

HighRes-net is architecturally much simpler. Its recursive fusion is elegant but treats all frames identically after pairing. PIUnet's TEFA blocks provide richer per-frame processing through 3D convolutions and temporal attention, and TERN adds explicit (though spatially-invariant) alignment.

LWIR Adaptation Work

Custom Architecture: EnhancedProgressiveMFSRNet

Rather than using the original HRNet directly on LWIR, significant effort went into building a custom enhanced architecture (src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py). This was a substantially different model:

  • Encoder: EncoderBlock with instance normalization, SiLU activations, CBAM attention, 5 residual blocks, 128 channels
  • Fusion: ProgressiveFusionModule with cross-frame attention (query/key/value transforms), 5 fusion layers
  • Decoder: DecoderBlock with PixelShuffle upsampling, 4 refinement blocks with attention, CBAM output attention
  • RRDB blocks: Optional Residual-in-Residual Dense Blocks (from ESRGAN) at pre-fusion, post-fusion, and pre-decoder stages
  • Residual learning: Predicted a residual to add to an upsampled normalized reference

This was dramatically more complex than the original HRNet -- essentially a new architecture that shared only the codebase infrastructure.

Provenance: HighRes-net/src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py, lines 56-458.

Other Architectures Explored

The repo contains evidence of multiple architectural experiments (Feb-Mar 2025):

  • HRNetResidual/ -- Residual learning variant of the original
  • NCCShiftNet/ -- Normalized cross-correlation based shift estimation
  • TRNet/ -- Transformer-based architecture inspired by TR-MISR
  • stablewindowedfusion/ and stablewindowedfusion2/ -- Windowed fusion approaches
  • AutoBlock/ -- Auto-configured blocks
  • MultiScaleHighResNet/ -- Multi-scale variant

Commit messages reveal the experimentation mood: "trying to take the quality of the shifts into account", "i am tired of these NCC blocks being modified", "well i trained this for several hours and did not get great results with it".

Provenance: HighRes-net/src/DeepNetworks/ directory listing; git log.

LWIR Dataset Preparation

A ProbaV-compatible dataset structure was created via prepare_lwir_dataset.py:

  • Source: tiles from lwir_tile_validator/probav_exports/ (Mosaic A: 126 tiles, Mosaic B: 84 tiles)
  • Structure: 8 LR views (128x128, uint16) + 1 HR target (384x384) per tile
  • 3x scale factor
  • Mosaic A for training, Mosaic B for testing
  • Symlinks to original tile directories for zero-copy dataset creation
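The zero-copy symlink idea can be sketched as follows. The directory and file names here are illustrative, not the exact layout produced by prepare_lwir_dataset.py:

```python
from pathlib import Path

def link_tile(src_tile: Path, dst_root: Path, split: str, idx: int) -> Path:
    """Build a ProbaV-style imgset directory whose files are symlinks
    back to the original tile, so no pixel data is copied."""
    imgset = dst_root / split / f"imgset{idx:04d}"
    imgset.mkdir(parents=True, exist_ok=True)
    for i in range(8):  # 8 LR views per tile
        lr = imgset / f"LR{i:03d}.png"
        if not lr.exists():
            lr.symlink_to(src_tile / f"lr_{i}.png")
    hr = imgset / "HR.png"
    if not hr.exists():
        hr.symlink_to(src_tile / "hr.png")
    return imgset
```

A DataLoader pointed at dst_root/train then reads through the symlinks as if the tiles had been copied into place.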

A custom DataLoader_LWIR.py was written to handle the LWIR tile structure, distinct from the ProbaV DataLoader.

Provenance: HighRes-net/prepare_lwir_dataset.py; HighRes-net/src/DataLoaders/DataLoader_LWIR.py.

Training Results

ProbaV Training (Baseline Validation)

To validate the training pipeline, a model was first trained on PROBA-V; the best checkpoint achieved -53.35 cPSNR on PROBA-V validation. Note that this run used the EnhancedProgressiveMFSRNet, not the original HRNet architecture.

Provenance: Model checkpoint models/weights/EnhancedProgressiveMFSRNet_NoShiftModel_batch8_views8_fp32_epoch0_cPSNR_time_2025-03-22-11-05-13/best_model_score_-53.3525_EnhancedProgressiveMFSRNet.pth.

LWIR Baseline Evaluation (October 6, 2025)

The ProbaV-trained EnhancedProgressiveMFSRNet was evaluated on 210 LWIR tiles:

| Metric | ML Model | Baseline (Bicubic) | Improvement |
| --- | --- | --- | --- |
| Mean cPSNR | -65.92 dB | -65.92 dB | 0.00 dB |

The model produced output identical to bicubic upsampling -- it predicted near-zero residuals for out-of-distribution thermal data. This was deemed "expected and desirable" behavior showing the model was stable and well-regularized, but confirmed that optical-domain pretraining does not transfer to LWIR.
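For reference, the cPSNR numbers in these results are a bias-corrected PSNR; the training code minimizes its negation, hence scores like -53.35 in the checkpoint names. A simplified sketch, assuming images normalized to [0, 1] and omitting the spatial shift search and cloud masking of the full ProbaV metric:

```python
import numpy as np

def cpsnr(sr: np.ndarray, hr: np.ndarray) -> float:
    """Bias-corrected PSNR: remove the mean brightness offset between
    SR and HR before computing MSE (simplified; no shift search)."""
    bias = float((hr - sr).mean())
    mse = float(((hr - (sr + bias)) ** 2).mean())
    return -10.0 * np.log10(mse)

hr = np.zeros((2, 2))
sr = np.array([[0.1, -0.1], [0.1, -0.1]])
score = cpsnr(sr, hr)  # MSE 0.01 -> 20 dB
```

The bias correction is why a prediction that is merely brighter or darker than the target is not penalized; only structural error counts.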

Provenance: HighRes-net/LWIR_BASELINE_RESULTS.md.

LWIR Training Attempts (October 2025)

Multiple training attempts on LWIR data were made:

  • Fine-tuning from ProbaV weights crashed with AttributeError: module 'train_helpers' has no attribute 'train'
  • From-scratch training on 673 tiles with n_views=1 (simplified for debugging) -- training proceeded, but diagnostics showed the SR output range exploding to [0.46, 4.51] while HR targets were in [0.46, 0.50]
  • The config_lwir_scratch.json set val_proportion=0.0 (a pure overfitting test) and n_views=1, with the comment "SIMPLIFIED TO 1 VIEW"

The SR output explosion was the same residual learning instability identified in LWIR_TRAINING_STRATEGY.md: the unbounded residual from the decoder was added to a normalized reference, and denormalization amplified the already-unbounded values.

Provenance: HighRes-net/lwir_training.log, lwir_training_REALLY_fixed.log, config/config_lwir_scratch.json.

Saved Weights

ProbaV-trained weights (models/weights/)

| Model | Date | Score |
| --- | --- | --- |
| EnhancedProgressiveMFSRNet | 2025-03-22 | -53.35 cPSNR |
| HRNetResidualWithAttention | 2025-03-06 | (not recorded) |
| HRNetBaseline + ShiftNet | 2025-02-27 | (not recorded) |
| NCCShiftNet | 2025-02-15 | (not recorded) |
| AttentionMFSR | 2025-03-10 | (not recorded) |
| RefineLocalizedMFSR (3 runs) | 2025-03-09/10 | (not recorded) |

LWIR-trained weights (models/weights_lwir/)

11 checkpoint directories from October 6-13, 2025, all EnhancedProgressiveMFSRNet_NoShiftModel_batch4_views*. These represent the failed LWIR training attempts.

Why It Was Abandoned

The Transition to PIUnet

The HighRes-net repo was the primary working codebase from February through early April 2025. The move to PIUnet happened around mid-2025 for several reasons:

  1. Architectural complexity spiral. Starting from HRNet's simple design, the codebase accumulated increasingly complex custom architectures (EnhancedProgressiveMFSRNet with RRDB, CBAM, cross-attention, progressive fusion). None consistently outperformed the original on ProbaV, and the code grew difficult to maintain.

  2. Training instability. The EnhancedProgressiveMFSRNet's residual learning approach produced unbounded outputs that diverged on LWIR data. The LWIR_TRAINING_STRATEGY.md document identified the root cause (unbounded residual + denormalization) but listed 5 strategic options without resolving them.

  3. PIUnet offered a cleaner starting point. PIUnet had explicit alignment (TERN), permutation-invariant fusion, uncertainty estimation, and was published with reproducible code (Valsesia & Magli). It was architecturally more principled for the multi-view LWIR problem.

  4. No pretrained weights for HighRes-net. The original competition weights were never released. Training from scratch was required regardless, so switching to a better-documented architecture had low switching cost.

What Was Preserved

The HighRes-net codebase contributed several reusable assets to the broader project:

  • The LWIR dataset preparation pipeline (prepare_lwir_dataset.py)
  • The LWIR DataLoader (DataLoader_LWIR.py)
  • The baseline evaluation methodology (LWIR_BASELINE_RESULTS.md)
  • The training strategy analysis (LWIR_TRAINING_STRATEGY.md)
  • The validated tile dataset (210 tiles from Mosaic A + B)

Struggle Log

Struggle: Architecture Complexity Spiral

  • Hypothesis: Adding RRDB blocks, CBAM attention, cross-frame attention, and progressive fusion to HRNet would improve super-resolution quality.
  • Failure Mode: 8+ distinct architectures were created over 2 months (Feb-Apr 2025), none clearly superior. The EnhancedProgressiveMFSRNet had 128 channels, 5 encoder layers, 5 fusion layers, 4 decoder refinement blocks, and 3 sets of RRDB blocks -- far more complex than the competition-winning original.
  • Root Cause: No systematic ablation study. Changes were made based on intuition and conversations with LLMs ("new networking I am working on based on conversations with grok"). Each architecture was trained briefly and abandoned before convergence.
  • Anti-Pattern: Do not add architectural complexity without first establishing a solid baseline with the unmodified architecture on your target domain. Always ablate one change at a time.

Struggle: Residual Learning Explosion on LWIR

  • Hypothesis: Predicting a residual (correction) to an upsampled reference would be easier for the network than predicting the full SR image.
  • Failure Mode: SR outputs exploded to range [0.46, 4.51] while targets were [0.46, 0.50]. The model predicted unbounded residuals that, after denormalization, produced values far outside the valid range.
  • Root Cause: Min-max normalization + unbounded residual + denormalization created an amplification loop. The original HRNet avoided this by having no normalization/denormalization -- it predicted raw pixel values and clipped at evaluation time.
  • Anti-Pattern: Do not add normalization/denormalization around residual learning unless you also bound the residual (e.g., with tanh). The original HRNet's "no activation on decoder output" was intentional.
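A toy illustration of the failure and the tanh fix. The [0.46, 0.50] range is taken from the diagnostics above; everything else is a made-up stand-in for the model:

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = 0.46, 0.50                         # narrow LWIR target range
ref = rng.uniform(lo, hi, (8, 8))
ref_norm = (ref - lo) / (hi - lo)           # min-max normalized reference

residual = rng.normal(0.0, 50.0, (8, 8))    # runaway, unbounded decoder output

# Unbounded path: denormalization maps the runaway residual straight
# back into pixel space, far outside the valid [lo, hi] range.
sr_unbounded = (ref_norm + residual) * (hi - lo) + lo

# Bounded path: tanh caps the residual to [-1, 1] before denormalizing,
# so the output strays at most one dynamic range beyond [lo, hi].
sr_bounded = (ref_norm + np.tanh(residual)) * (hi - lo) + lo
```

Bounding the residual trades some expressiveness for a hard guarantee on the output range, which is exactly what the diverging LWIR runs lacked.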

Struggle: ProbaV-to-LWIR Transfer Failure

  • Hypothesis: A model trained on PROBA-V optical satellite imagery could be fine-tuned on LWIR thermal imagery.
  • Failure Mode: Zero improvement over bicubic on LWIR evaluation. The fine-tuning script itself crashed before training could begin.
  • Root Cause: Optical (RGB/NIR) and thermal (LWIR 8-14um) are fundamentally different domains -- different physics (reflectance vs. emission), different intensity distributions (0-65k vs. 29k-34k narrow range), different spatial statistics.
  • Anti-Pattern: Do not assume transfer learning will work across spectral domains. LWIR thermal imagery requires training from scratch or transfer from other thermal datasets.

Timeline

| Date | Event |
| --- | --- |
| 2019-07-16 | Original HighRes-net repo created (ElementAI) |
| 2025-02-14 | First Ceres commits: NCCShiftNet, NetworkDebugBaseClass |
| 2025-02-20 | Windowed NCC, attention-enhanced shift estimation |
| 2025-02-24 | HRNet upgrade with local cross-attention fusion |
| 2025-03-01 | New model with Swin Transformer-inspired residual prediction |
| 2025-03-04 | Added residual network variant |
| 2025-03-06 | HRNetResidualWithAttention trained, evaluated at -51.8 cPSNR |
| 2025-03-09/10 | RefineLocalizedMFSR and AttentionMFSR experiments |
| 2025-03-22 | EnhancedProgressiveMFSRNet best ProbaV score: -53.35 cPSNR |
| 2025-04-01 | Last significant commits (evaluation + diagram tools) |
| 2025-10-06 | LWIR baseline evaluation: 0.00 dB improvement over bicubic |
| 2025-10-13 | LWIR from-scratch training attempts (diverged) |

Key Files

| File | Purpose |
| --- | --- |
| src/DeepNetworks/HRNet.py | Original HRNet architecture (Encoder, RecursiveNet, Decoder) |
| src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py | Custom enhanced architecture with RRDB, CBAM, progressive fusion |
| prepare_lwir_dataset.py | Creates ProbaV-compatible symlink structure for LWIR tiles |
| evaluate_lwir.py | Full LWIR evaluation with example image generation |
| src/DataLoaders/DataLoader_LWIR.py | LWIR-specific dataset handler |
| config/config.json | Main training config (ProbaV) |
| config/config_lwir_scratch.json | LWIR from-scratch training config |
| LWIR_BASELINE_RESULTS.md | October 2025 evaluation results document |
| LWIR_TRAINING_STRATEGY.md | Strategic analysis of training options |
| src/train.py | Main training script |
| train_lwir_finetune.py | LWIR fine-tuning script (broken) |