HighRes-net¶
HighRes-net was the first MFSR architecture explored for the Ceres LWIR super-resolution project. It is a fork of the ESA Kelvin Competition-winning model by team Rarefin (ElementAI + Mila), originally designed for PROBA-V satellite multi-frame super-resolution. The project was active from February 2025 through April 2025, with a late-stage LWIR evaluation effort in October 2025. It was ultimately abandoned in favor of PIUnet.
Repository: /home/geoff/projects/ceres/superrez/HighRes-net/
Original authors: Zhichao Lin, Michel Deudon, Alfredo Kalaitzis, Julien Cornebise (ElementAI); Israel Goytom, Kris Sankaran, Md Rifat Arefin, Samira E. Kahou, Vincent Michalski (Mila)
Paper: "HighRes-net: Recursive Fusion for Multi-Frame Super Resolution" (2019)
Status: Abandoned. Last commit April 1, 2025. Brief LWIR evaluation in October 2025 used existing weights but no new training was done.
Original Architecture¶
The upstream HighRes-net (src/DeepNetworks/HRNet.py) has three components:
- Encoder -- Per-frame feature extraction. Each LR frame is concatenated with a reference frame (median of all views) to form a 2-channel input, then passed through a Conv2d init layer + N residual blocks (Conv-PReLU-Conv-PReLU with skip connection) + final Conv2d. Output: 64-channel feature maps at LR resolution.
- RecursiveNet (fusion) -- Pairwise recursive fusion. Views are paired (first half with reversed second half), concatenated across channels (128-ch), fused through a residual block + Conv2d back to 64 channels. Repeat log2(N) times until a single fused representation remains. Supports alpha_residual skip connections for padded views.
- Decoder -- ConvTranspose2d with stride=3 for 3x upsampling + PReLU + final Conv2d producing 1-channel output. No output activation -- values are unbounded, clipped only at evaluation time.
Key design choice: the reference frame is the median of up to 9 views, providing implicit robustness to outliers without explicit alignment.
Provenance: HighRes-net/src/DeepNetworks/HRNet.py, lines 172-211.
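The recursive pairwise fusion can be sketched as follows. This is a simplified, hypothetical rendering of the idea, not the repo's actual code: the real residual blocks and the alpha_residual handling for padded views are omitted.

```python
import torch
import torch.nn as nn

class RecursiveFusion(nn.Module):
    """Sketch of HighRes-net-style pairwise recursive fusion.

    At each step the N views are split into halves, the second half is
    reversed, pairs are concatenated channel-wise (C -> 2C), and fused
    back to C channels. After log2(N) steps a single map remains.
    """
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):  # x: (B, N, C, H, W), N a power of two
        while x.shape[1] > 1:
            n = x.shape[1] // 2
            first, second = x[:, :n], x[:, n:].flip(1)   # pair first half with reversed second
            pairs = torch.cat([first, second], dim=2)    # (B, n, 2C, H, W)
            b, n_, c2, h, w = pairs.shape
            fused = self.fuse(pairs.view(b * n_, c2, h, w))
            x = fused.view(b, n_, -1, h, w)
        return x[:, 0]                                   # single fused representation
```

With 4 views the loop runs log2(4) = 2 times: 4 maps become 2, then 1.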
How It Differs from PIUnet¶
| Aspect | HighRes-net | PIUnet |
|---|---|---|
| Alignment | None (implicit via reference frame concat) | TERN module (learned 5x5 registration kernels per frame) |
| Fusion | Recursive pairwise (log2 depth) | Permutation-invariant (mean pooling across temporal dim) |
| Feature extraction | Simple residual blocks, 64 channels | TEFA blocks with 3D convolutions + self-attention, 42 channels |
| Upsampling | ConvTranspose2d (stride 3) | PixelShuffle (3x) |
| Uncertainty | None | Predicts sigma_sr uncertainty map |
| Output | Direct prediction (unbounded) | Global residual learning (output = residual + bicubic(mean_LR)) |
| Complexity | ~200 lines of model code | ~1000+ lines across TEFA, TERN, reconstruction head |
| ProbaV score | Won Kelvin competition (team Rarefin) | Top contender (Valsesia & Magli, IEEE TGRS 2022) |
HighRes-net is architecturally much simpler. Its recursive fusion is elegant but treats all frames identically after pairing. PIUnet's TEFA blocks provide richer per-frame processing through 3D convolutions and temporal attention, and TERN adds explicit (though spatially-invariant) alignment.
LWIR Adaptation Work¶
Custom Architecture: EnhancedProgressiveMFSRNet¶
Rather than using the original HRNet directly on LWIR, significant effort went into building a custom enhanced architecture (src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py). This was a substantially different model:
- Encoder: EncoderBlock with instance normalization, SiLU activations, CBAM attention, 5 residual blocks, 128 channels
- Fusion: ProgressiveFusionModule with cross-frame attention (query/key/value transforms), 5 fusion layers
- Decoder: DecoderBlock with PixelShuffle upsampling, 4 refinement blocks with attention, CBAM output attention
- RRDB blocks: Optional Residual-in-Residual Dense Blocks (from ESRGAN) at pre-fusion, post-fusion, and pre-decoder stages
- Residual learning: Predicted a residual to add to an upsampled normalized reference
This was dramatically more complex than the original HRNet -- essentially a new architecture that shared only the codebase infrastructure.
Provenance: HighRes-net/src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py, lines 56-458.
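The residual-learning scheme can be sketched roughly as below. Names and normalization details are illustrative assumptions, not the file's actual code; note that the residual term carries no bounding activation, which matters in the training results later.

```python
import torch
import torch.nn.functional as F

def residual_sr(decoder_residual, ref_lr, lo, hi, scale=3):
    """Sketch: min-max normalize the reference, upsample it, add the
    decoder's (unbounded) residual, then denormalize back to raw units."""
    ref_norm = (ref_lr - lo) / (hi - lo)
    up = F.interpolate(ref_norm, scale_factor=scale, mode="bicubic",
                       align_corners=False)
    sr_norm = up + decoder_residual   # residual is not bounded by any activation
    return sr_norm * (hi - lo) + lo   # denormalize to the original range
```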
Other Architectures Explored¶
The repo contains evidence of multiple architectural experiments (Feb-Mar 2025):
- HRNetResidual/ -- Residual learning variant of the original
- NCCShiftNet/ -- Normalized cross-correlation based shift estimation
- TRNet/ -- Transformer-based architecture inspired by TR-MISR
- stablewindowedfusion/ and stablewindowedfusion2/ -- Windowed fusion approaches
- AutoBlock/ -- Auto-configured blocks
- MultiScaleHighResNet/ -- Multi-scale variant
Commit messages reveal the experimentation mood: "trying to take the quality of the shifts into account", "i am tired of these NCC blocks being modified", "well i trained this for several hours and did not get great results with it".
Provenance: HighRes-net/src/DeepNetworks/ directory listing; git log.
LWIR Dataset Preparation¶
A ProbaV-compatible dataset structure was created via prepare_lwir_dataset.py:
- Source: tiles from lwir_tile_validator/probav_exports/ (Mosaic A: 126 tiles, Mosaic B: 84 tiles)
- Structure: 8 LR views (128x128, uint16) + 1 HR target (384x384) per tile
- 3x scale factor
- Mosaic A for training, Mosaic B for testing
- Symlinks to original tile directories for zero-copy dataset creation
A custom DataLoader_LWIR.py was written to handle the LWIR tile structure, distinct from the ProbaV DataLoader.
Provenance: HighRes-net/prepare_lwir_dataset.py; HighRes-net/src/DataLoaders/DataLoader_LWIR.py.
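The zero-copy symlink layout might look like the following sketch. Function and directory names here are hypothetical; the actual script's interface may differ.

```python
import os
from pathlib import Path

def link_tiles(src_root, dst_root, split, tiles):
    """Create a ProbaV-style tree of symlinks so no pixel data is copied.
    Each source tile directory is expected to hold 8 LR views
    (128x128 uint16) plus one 384x384 HR target."""
    for i, tile in enumerate(sorted(tiles)):
        dst = Path(dst_root) / split / f"imgset{i:04d}"
        dst.parent.mkdir(parents=True, exist_ok=True)
        if not dst.exists():
            # symlink back to the original tile directory (zero-copy)
            os.symlink(Path(src_root) / tile, dst, target_is_directory=True)
```

Because only symlinks are created, the dataset can be rebuilt or re-split instantly without duplicating the uint16 tile data.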
Training Results¶
ProbaV Training (Baseline Validation)¶
Models were trained on PROBA-V to validate the training pipeline before moving to LWIR. The best score, -53.35 cPSNR on PROBA-V validation, was achieved with the EnhancedProgressiveMFSRNet, not the original HRNet architecture.
Provenance: Model checkpoint models/weights/EnhancedProgressiveMFSRNet_NoShiftModel_batch8_views8_fp32_epoch0_cPSNR_time_2025-03-22-11-05-13/best_model_score_-53.3525_EnhancedProgressiveMFSRNet.pth.
LWIR Baseline Evaluation (October 6, 2025)¶
The ProbaV-trained EnhancedProgressiveMFSRNet was evaluated on 210 LWIR tiles:
| Metric | ML Model | Baseline (Bicubic) | Improvement |
|---|---|---|---|
| Mean cPSNR | -65.92 dB | -65.92 dB | 0.00 dB |
The model produced output identical to bicubic upsampling -- it predicted near-zero residuals for out-of-distribution thermal data. This was deemed "expected and desirable" behavior showing the model was stable and well-regularized, but confirmed that optical-domain pretraining does not transfer to LWIR.
Provenance: HighRes-net/LWIR_BASELINE_RESULTS.md.
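For reference, cPSNR is the PROBA-V challenge's bias-corrected PSNR. A minimal sketch follows; it is simplified, omitting the clearance mask and the sub-pixel shift search that the full challenge metric includes.

```python
import numpy as np

def cpsnr(sr, hr, max_val=1.0):
    """Bias-corrected PSNR: remove the mean brightness offset between
    SR and HR before computing the MSE, so a global intensity shift
    is not penalized."""
    b = np.mean(hr - sr)                    # global brightness bias
    mse = np.mean((hr - sr - b) ** 2)       # MSE after bias removal
    return 10.0 * np.log10(max_val ** 2 / mse)
```

The repo's scores are reported as negative values because the metric is used as a loss (-cPSNR, lower is better).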
LWIR Training Attempts (October 2025)¶
Multiple training attempts on LWIR data were made:
- Fine-tuning from ProbaV weights crashed with AttributeError: module 'train_helpers' has no attribute 'train'
- From-scratch training on 673 tiles with n_views=1 (simplified for debugging) -- training proceeded but diagnostics showed SR output range exploding to [0.46, 4.51] while HR targets were in [0.46, 0.50]
- The config_lwir_scratch.json set val_proportion=0.0 (pure overfitting test) and n_views=1 with the comment "SIMPLIFIED TO 1 VIEW"
The SR output explosion was the same residual learning instability identified in LWIR_TRAINING_STRATEGY.md: the unbounded residual from the decoder was added to a normalized reference, and denormalization amplified the already-unbounded values.
Provenance: HighRes-net/lwir_training.log, lwir_training_REALLY_fixed.log, config/config_lwir_scratch.json.
Saved Weights¶
ProbaV-trained weights (models/weights/)¶
| Model | Date | Score |
|---|---|---|
| EnhancedProgressiveMFSRNet | 2025-03-22 | -53.35 cPSNR |
| HRNetResidualWithAttention | 2025-03-06 | (not recorded) |
| HRNetBaseline + ShiftNet | 2025-02-27 | (not recorded) |
| NCCShiftNet | 2025-02-15 | (not recorded) |
| AttentionMFSR | 2025-03-10 | (not recorded) |
| RefineLocalizedMFSR (3 runs) | 2025-03-09-10 | (not recorded) |
LWIR-trained weights (models/weights_lwir/)¶
11 checkpoint directories from October 6-13, 2025, all EnhancedProgressiveMFSRNet_NoShiftModel_batch4_views*. These represent the failed LWIR training attempts.
Why It Was Abandoned¶
The Transition to PIUnet¶
The HighRes-net repo was the primary working codebase from February through early April 2025. The move to PIUnet happened around mid-2025 for several reasons:
- Architectural complexity spiral. Starting from HRNet's simple design, the codebase accumulated increasingly complex custom architectures (EnhancedProgressiveMFSRNet with RRDB, CBAM, cross-attention, progressive fusion). None consistently outperformed the original on ProbaV, and the code grew difficult to maintain.
- Training instability. The EnhancedProgressiveMFSRNet's residual learning approach produced unbounded outputs that diverged on LWIR data. The LWIR_TRAINING_STRATEGY.md document identified the root cause (unbounded residual + denormalization) but listed 5 strategic options without resolving them.
- PIUnet offered a cleaner starting point. PIUnet had explicit alignment (TERN), permutation-invariant fusion, uncertainty estimation, and was published with reproducible code (Valsesia & Magli). It was architecturally more principled for the multi-view LWIR problem.
- No pretrained weights for HighRes-net. The original competition weights were never released. Training from scratch was required regardless, so switching to a better-documented architecture had low switching cost.
What Was Preserved¶
The HighRes-net codebase contributed several reusable assets to the broader project:
- The LWIR dataset preparation pipeline (prepare_lwir_dataset.py)
- The LWIR DataLoader (DataLoader_LWIR.py)
- The baseline evaluation methodology (LWIR_BASELINE_RESULTS.md)
- The training strategy analysis (LWIR_TRAINING_STRATEGY.md)
- The validated tile dataset (210 tiles from Mosaic A + B)
Struggle Log¶
Struggle: Architecture Complexity Spiral¶
- Hypothesis: Adding RRDB blocks, CBAM attention, cross-frame attention, and progressive fusion to HRNet would improve super-resolution quality.
- Failure Mode: 8+ distinct architectures were created over 2 months (Feb-Apr 2025), none clearly superior. The EnhancedProgressiveMFSRNet had 128 channels, 5 encoder layers, 5 fusion layers, 4 decoder refinement blocks, and 3 sets of RRDB blocks -- far more complex than the competition-winning original.
- Root Cause: No systematic ablation study. Changes were made based on intuition and conversations with LLMs ("new networking I am working on based on conversations with grok"). Each architecture was trained briefly and abandoned before convergence.
- Anti-Pattern: Do not add architectural complexity without first establishing a solid baseline with the unmodified architecture on your target domain. Always ablate one change at a time.
Struggle: Residual Learning Explosion on LWIR¶
- Hypothesis: Predicting a residual (correction) to an upsampled reference would be easier for the network than predicting the full SR image.
- Failure Mode: SR outputs exploded to range [0.46, 4.51] while targets were [0.46, 0.50]. The model predicted unbounded residuals that, after denormalization, produced values far outside the valid range.
- Root Cause: Min-max normalization + unbounded residual + denormalization created an amplification loop. The original HRNet avoided this by having no normalization/denormalization -- it predicted raw pixel values and clipped at evaluation time.
- Anti-Pattern: Do not add normalization/denormalization around residual learning unless you also bound the residual (e.g., with tanh). The original HRNet's "no activation on decoder output" was intentional.
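A minimal sketch of the bounded-residual fix suggested above (the tanh option; the module name and scale are hypothetical, not from the repo):

```python
import torch
import torch.nn as nn

class BoundedResidualHead(nn.Module):
    """Bound the predicted residual with tanh and a small scale so the
    output can never stray far from the upsampled reference, even after
    denormalization."""
    def __init__(self, channels, max_residual=0.05):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, 3, padding=1)
        self.max_residual = max_residual

    def forward(self, features, upsampled_ref):
        # tanh maps to (-1, 1); scaling caps the correction magnitude
        residual = torch.tanh(self.conv(features)) * self.max_residual
        return upsampled_ref + residual
```

With max_residual=0.05, even arbitrarily large feature activations cannot push the output outside ±0.05 of the reference, preventing the [0.46, 4.51] explosion seen above.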
Struggle: ProbaV-to-LWIR Transfer Failure¶
- Hypothesis: A model trained on PROBA-V optical satellite imagery could be fine-tuned on LWIR thermal imagery.
- Failure Mode: Zero improvement over bicubic on LWIR evaluation. The fine-tuning script itself crashed before training could begin.
- Root Cause: Optical (RGB/NIR) and thermal (LWIR 8-14um) are fundamentally different domains -- different physics (reflectance vs. emission), different intensity distributions (0-65k vs. 29k-34k narrow range), different spatial statistics.
- Anti-Pattern: Do not assume transfer learning will work across spectral domains. LWIR thermal imagery requires training from scratch or transfer from other thermal datasets.
Timeline¶
| Date | Event |
|---|---|
| 2019-07-16 | Original HighRes-net repo created (ElementAI) |
| 2025-02-14 | First Ceres commits: NCCShiftNet, NetworkDebugBaseClass |
| 2025-02-20 | Windowed NCC, attention-enhanced shift estimation |
| 2025-02-24 | HRNet upgrade with local cross-attention fusion |
| 2025-03-01 | New model with Swin Transformer-inspired residual prediction |
| 2025-03-04 | Added residual network variant |
| 2025-03-06 | HRNetResidualWithAttention trained, evaluated at -51.8 cPSNR |
| 2025-03-09-10 | RefineLocalizedMFSR and AttentionMFSR experiments |
| 2025-03-22 | EnhancedProgressiveMFSRNet best ProbaV score: -53.35 cPSNR |
| 2025-04-01 | Last significant commits (evaluation + diagram tools) |
| 2025-10-06 | LWIR baseline evaluation: 0.00 dB improvement over bicubic |
| 2025-10-13 | LWIR from-scratch training attempts (diverged) |
Key Files¶
| File | Purpose |
|---|---|
| src/DeepNetworks/HRNet.py | Original HRNet architecture (Encoder, RecursiveNet, Decoder) |
| src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py | Custom enhanced architecture with RRDB, CBAM, progressive fusion |
| prepare_lwir_dataset.py | Creates ProbaV-compatible symlink structure for LWIR tiles |
| evaluate_lwir.py | Full LWIR evaluation with example image generation |
| src/DataLoaders/DataLoader_LWIR.py | LWIR-specific dataset handler |
| config/config.json | Main training config (ProbaV) |
| config/config_lwir_scratch.json | LWIR from-scratch training config |
| LWIR_BASELINE_RESULTS.md | October 2025 evaluation results document |
| LWIR_TRAINING_STRATEGY.md | Strategic analysis of training options |
| src/train.py | Main training script |
| train_lwir_finetune.py | LWIR fine-tuning script (broken) |