HighRes-net

HighRes-net was the first MFSR architecture explored for the Ceres LWIR super-resolution project. It is a fork of the ESA Kelvin Competition-winning model by team Rarefin (ElementAI + Mila), originally designed for PROBA-V satellite multi-frame super-resolution. The project was active from February 2025 through April 2025, with a late-stage LWIR evaluation effort in October 2025. It was ultimately abandoned in favor of PIUnet.

Repository: /home/geoff/projects/ceres/superrez/HighRes-net/
Original authors: Zhichao Lin, Michel Deudon, Alfredo Kalaitzis, Julien Cornebise (ElementAI); Israel Goytom, Kris Sankaran, Md Rifat Arefin, Samira E. Kahou, Vincent Michalski (Mila)
Paper: "HighRes-net: Recursive Fusion for Multi-Frame Super Resolution" (2019)
Status: Abandoned. Last commit April 1, 2025. A brief LWIR evaluation in October 2025 used existing weights, but no new training was done.

Original Architecture

The upstream HighRes-net (src/DeepNetworks/HRNet.py) has three components:

  1. Encoder -- Per-frame feature extraction. Each LR frame is concatenated with a reference frame (median of all views) to form a 2-channel input, then passed through a Conv2d init layer + N residual blocks (Conv-PReLU-Conv-PReLU with skip connection) + final Conv2d. Output: 64-channel feature maps at LR resolution.

  2. RecursiveNet (fusion) -- Pairwise recursive fusion. Views are paired (first half with reversed second half), concatenated across channels (128-ch), fused through a residual block + Conv2d back to 64 channels. Repeat log2(N) times until a single fused representation remains. Supports alpha_residual skip connections for padded views.

  3. Decoder -- ConvTranspose2d with stride=3 for 3x upsampling + PReLU + final Conv2d producing 1-channel output. No output activation -- values are unbounded, clipped only at evaluation time.
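The pairwise scheme in step 2 can be sketched in a few lines of plain Python. The `fuse` callback here is a toy stand-in for the learned residual block + Conv2d, and the power-of-two view count is an assumption (the real model pads short batches and uses alpha_residual skips for padded views):

```python
import numpy as np

def recursive_fuse(views, fuse):
    """Halve the number of views each step by pairing the first half
    with the reversed second half, until one fused map remains.
    Runs log2(N) fusion steps for N views (N a power of two here)."""
    while len(views) > 1:
        half = len(views) // 2
        pairs = zip(views[:half], views[half:][::-1])
        views = [fuse(a, b) for a, b in pairs]
    return views[0]

# Toy stand-in for the learned fusion: average each pair.
views = [np.full((4, 4), float(i)) for i in range(8)]
fused = recursive_fuse(views, lambda a, b: (a + b) / 2.0)
```

With 8 views this performs 3 fusion steps (8 → 4 → 2 → 1), which is the log2(N) depth noted above.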

Key design choice: the reference frame is the median of up to 9 views, providing implicit robustness to outliers without explicit alignment.
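As a sketch of that input construction (array shapes and values are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(42)
views = rng.random((9, 128, 128), dtype=np.float32)  # 9 LR views of one tile

# Reference frame: per-pixel median across views. A transient outlier
# (cloud, sensor noise) in any single view barely moves the median,
# which is what gives the scheme robustness without explicit alignment.
reference = np.median(views, axis=0)

# Each view is stacked with the reference along the channel dimension,
# producing the 2-channel per-frame input the Encoder expects.
inputs = np.stack([np.stack([v, reference]) for v in views])
```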

Provenance: HighRes-net/src/DeepNetworks/HRNet.py, lines 172-211.

How It Differs from PIUnet

| Aspect | HighRes-net | PIUnet |
| --- | --- | --- |
| Alignment | None (implicit via reference frame concat) | TERN module (learned 5x5 registration kernels per frame) |
| Fusion | Recursive pairwise (log2 depth) | Permutation-invariant (mean pooling across temporal dim) |
| Feature extraction | Simple residual blocks, 64 channels | TEFA blocks with 3D convolutions + self-attention, 42 channels |
| Upsampling | ConvTranspose2d (stride 3) | PixelShuffle (3x) |
| Uncertainty | None | Predicts sigma_sr uncertainty map |
| Output | Direct prediction (unbounded) | Global residual learning (output = residual + bicubic(mean_LR)) |
| Complexity | ~200 lines of model code | ~1000+ lines across TEFA, TERN, reconstruction head |
| ProbaV score | Won Kelvin competition (team Rarefin) | Top contender (Valsesia & Magli, IEEE TGRS 2022) |

HighRes-net is architecturally much simpler. Its recursive fusion is elegant but treats all frames identically after pairing. PIUnet's TEFA blocks provide richer per-frame processing through 3D convolutions and temporal attention, and TERN adds explicit (though spatially-invariant) alignment.

LWIR Adaptation Work

Custom Architecture: EnhancedProgressiveMFSRNet

Rather than using the original HRNet directly on LWIR, significant effort went into building a custom enhanced architecture (src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py). This was a substantially different model:

  • Encoder: EncoderBlock with instance normalization, SiLU activations, CBAM attention, 5 residual blocks, 128 channels
  • Fusion: ProgressiveFusionModule with cross-frame attention (query/key/value transforms), 5 fusion layers
  • Decoder: DecoderBlock with PixelShuffle upsampling, 4 refinement blocks with attention, CBAM output attention
  • RRDB blocks: Optional Residual-in-Residual Dense Blocks (from ESRGAN) at pre-fusion, post-fusion, and pre-decoder stages
  • Residual learning: Predicted a residual to add to an upsampled normalized reference

This was dramatically more complex than the original HRNet -- essentially a new architecture that shared only the codebase infrastructure.

Provenance: HighRes-net/src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py, lines 56-458.

Other Architectures Explored

The repo contains evidence of multiple architectural experiments (Feb-Mar 2025):

  • HRNetResidual/ -- Residual learning variant of the original
  • NCCShiftNet/ -- Normalized cross-correlation based shift estimation
  • TRNet/ -- Transformer-based architecture inspired by TR-MISR
  • stablewindowedfusion/ and stablewindowedfusion2/ -- Windowed fusion approaches
  • AutoBlock/ -- Auto-configured blocks
  • MultiScaleHighResNet/ -- Multi-scale variant

Commit messages reveal the experimentation mood: "trying to take the quality of the shifts into account", "i am tired of these NCC blocks being modified", "well i trained this for several hours and did not get great results with it".

Provenance: HighRes-net/src/DeepNetworks/ directory listing; git log.

LWIR Dataset Preparation

A ProbaV-compatible dataset structure was created via prepare_lwir_dataset.py:

  • Source: tiles from lwir_tile_validator/probav_exports/ (Mosaic A: 126 tiles, Mosaic B: 84 tiles)
  • Structure: 8 LR views (128x128, uint16) + 1 HR target (384x384) per tile
  • 3x scale factor
  • Mosaic A for training, Mosaic B for testing
  • Symlinks to original tile directories for zero-copy dataset creation
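The zero-copy symlink idea can be sketched as follows. The directory and file names here are illustrative, not the exact layout produced by prepare_lwir_dataset.py:

```python
from pathlib import Path

def link_tile(src_tile: Path, dst_root: Path, split: str, idx: int) -> Path:
    """Build a ProbaV-style imgset directory whose files are symlinks
    back to the original tile, so no pixel data is copied."""
    imgset = dst_root / split / f"imgset{idx:04d}"
    imgset.mkdir(parents=True, exist_ok=True)
    for i in range(8):  # 8 LR views per tile
        lr = imgset / f"LR{i:03d}.png"
        if not lr.exists():
            lr.symlink_to(src_tile / f"lr_{i}.png")
    hr = imgset / "HR.png"
    if not hr.exists():
        hr.symlink_to(src_tile / "hr.png")
    return imgset
```

A DataLoader pointed at dst_root/train then reads through the symlinks as if the tiles had been copied into place.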

A custom DataLoader_LWIR.py was written to handle the LWIR tile structure, distinct from the ProbaV DataLoader.

Provenance: HighRes-net/prepare_lwir_dataset.py; HighRes-net/src/DataLoaders/DataLoader_LWIR.py.

Training Results

ProbaV Training (Baseline Validation)

To validate the training pipeline, a model was first trained on PROBA-V; the best checkpoint achieved -53.35 cPSNR on PROBA-V validation. Note that this run used the EnhancedProgressiveMFSRNet, not the original HRNet architecture.

Provenance: Model checkpoint models/weights/EnhancedProgressiveMFSRNet_NoShiftModel_batch8_views8_fp32_epoch0_cPSNR_time_2025-03-22-11-05-13/best_model_score_-53.3525_EnhancedProgressiveMFSRNet.pth.

LWIR Baseline Evaluation (October 6, 2025)

The ProbaV-trained EnhancedProgressiveMFSRNet was evaluated on 210 LWIR tiles:

| Metric | ML Model | Baseline (Bicubic) | Improvement |
| --- | --- | --- | --- |
| Mean cPSNR | -65.92 dB | -65.92 dB | 0.00 dB |

The model produced output identical to bicubic upsampling -- it predicted near-zero residuals for out-of-distribution thermal data. This was deemed "expected and desirable" behavior showing the model was stable and well-regularized, but confirmed that optical-domain pretraining does not transfer to LWIR.
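For reference, the cPSNR numbers in these results are a bias-corrected PSNR; the training code minimizes its negation, hence scores like -53.35 in the checkpoint names. A simplified sketch, assuming images normalized to [0, 1] and omitting the spatial shift search and cloud masking of the full ProbaV metric:

```python
import numpy as np

def cpsnr(sr: np.ndarray, hr: np.ndarray) -> float:
    """Bias-corrected PSNR: remove the mean brightness offset between
    SR and HR before computing MSE (simplified; no shift search)."""
    bias = float((hr - sr).mean())
    mse = float(((hr - (sr + bias)) ** 2).mean())
    return -10.0 * np.log10(mse)

hr = np.zeros((2, 2))
sr = np.array([[0.1, -0.1], [0.1, -0.1]])
score = cpsnr(sr, hr)  # MSE 0.01 -> 20 dB
```

The bias correction is why a prediction that is merely brighter or darker than the target is not penalized; only structural error counts.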

Provenance: HighRes-net/LWIR_BASELINE_RESULTS.md.

LWIR Training Attempts (October 2025)

Multiple training attempts on LWIR data were made:

  • Fine-tuning from ProbaV weights crashed with AttributeError: module 'train_helpers' has no attribute 'train'
  • From-scratch training on 673 tiles with n_views=1 (simplified for debugging) -- training proceeded, but diagnostics showed the SR output range exploding to [0.46, 4.51] while HR targets were in [0.46, 0.50]
  • The config_lwir_scratch.json set val_proportion=0.0 (a pure overfitting test) and n_views=1, with the comment "SIMPLIFIED TO 1 VIEW"

The SR output explosion was the same residual learning instability identified in LWIR_TRAINING_STRATEGY.md: the unbounded residual from the decoder was added to a normalized reference, and denormalization amplified the already-unbounded values.

Provenance: HighRes-net/lwir_training.log, lwir_training_REALLY_fixed.log, config/config_lwir_scratch.json.

Saved Weights

ProbaV-trained weights (models/weights/)

| Model | Date | Score |
| --- | --- | --- |
| EnhancedProgressiveMFSRNet | 2025-03-22 | -53.35 cPSNR |
| HRNetResidualWithAttention | 2025-03-06 | (not recorded) |
| HRNetBaseline + ShiftNet | 2025-02-27 | (not recorded) |
| NCCShiftNet | 2025-02-15 | (not recorded) |
| AttentionMFSR | 2025-03-10 | (not recorded) |
| RefineLocalizedMFSR (3 runs) | 2025-03-09/10 | (not recorded) |

LWIR-trained weights (models/weights_lwir/)

11 checkpoint directories from October 6-13, 2025, all EnhancedProgressiveMFSRNet_NoShiftModel_batch4_views*. These represent the failed LWIR training attempts.

Why It Was Abandoned

The Transition to PIUnet

The HighRes-net repo was the primary working codebase from February through early April 2025. The move to PIUnet happened around mid-2025 for several reasons:

  1. Architectural complexity spiral. Starting from HRNet's simple design, the codebase accumulated increasingly complex custom architectures (EnhancedProgressiveMFSRNet with RRDB, CBAM, cross-attention, progressive fusion). None consistently outperformed the original on ProbaV, and the code grew difficult to maintain.

  2. Training instability. The EnhancedProgressiveMFSRNet's residual learning approach produced unbounded outputs that diverged on LWIR data. The LWIR_TRAINING_STRATEGY.md document identified the root cause (unbounded residual + denormalization) but listed 5 strategic options without resolving them.

  3. PIUnet offered a cleaner starting point. PIUnet had explicit alignment (TERN), permutation-invariant fusion, uncertainty estimation, and was published with reproducible code (Valsesia & Magli). It was architecturally more principled for the multi-view LWIR problem.

  4. No pretrained weights for HighRes-net. The original competition weights were never released. Training from scratch was required regardless, so switching to a better-documented architecture had low switching cost.

What Was Preserved

The HighRes-net codebase contributed several reusable assets to the broader project:

  • The LWIR dataset preparation pipeline (prepare_lwir_dataset.py)
  • The LWIR DataLoader (DataLoader_LWIR.py)
  • The baseline evaluation methodology (LWIR_BASELINE_RESULTS.md)
  • The training strategy analysis (LWIR_TRAINING_STRATEGY.md)
  • The validated tile dataset (210 tiles from Mosaic A + B)

Struggle Log

Struggle: Architecture Complexity Spiral

  • Hypothesis: Adding RRDB blocks, CBAM attention, cross-frame attention, and progressive fusion to HRNet would improve super-resolution quality.
  • Failure Mode: 8+ distinct architectures were created over 2 months (Feb-Apr 2025), none clearly superior. The EnhancedProgressiveMFSRNet had 128 channels, 5 encoder layers, 5 fusion layers, 4 decoder refinement blocks, and 3 sets of RRDB blocks -- far more complex than the competition-winning original.
  • Root Cause: No systematic ablation study. Changes were made based on intuition and conversations with LLMs ("new networking I am working on based on conversations with grok"). Each architecture was trained briefly and abandoned before convergence.
  • Anti-Pattern: Do not add architectural complexity without first establishing a solid baseline with the unmodified architecture on your target domain. Always ablate one change at a time.

Struggle: Residual Learning Explosion on LWIR

  • Hypothesis: Predicting a residual (correction) to an upsampled reference would be easier for the network than predicting the full SR image.
  • Failure Mode: SR outputs exploded to range [0.46, 4.51] while targets were [0.46, 0.50]. The model predicted unbounded residuals that, after denormalization, produced values far outside the valid range.
  • Root Cause: Min-max normalization + unbounded residual + denormalization created an amplification loop. The original HRNet avoided this by having no normalization/denormalization -- it predicted raw pixel values and clipped at evaluation time.
  • Anti-Pattern: Do not add normalization/denormalization around residual learning unless you also bound the residual (e.g., with tanh). The original HRNet's "no activation on decoder output" was intentional.
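A toy illustration of the failure and the tanh fix. The [0.46, 0.50] range is taken from the diagnostics above; everything else is a made-up stand-in for the model:

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = 0.46, 0.50                         # narrow LWIR target range
ref = rng.uniform(lo, hi, (8, 8))
ref_norm = (ref - lo) / (hi - lo)           # min-max normalized reference

residual = rng.normal(0.0, 50.0, (8, 8))    # runaway, unbounded decoder output

# Unbounded path: denormalization maps the runaway residual straight
# back into pixel space, far outside the valid [lo, hi] range.
sr_unbounded = (ref_norm + residual) * (hi - lo) + lo

# Bounded path: tanh caps the residual to [-1, 1] before denormalizing,
# so the output strays at most one dynamic range beyond [lo, hi].
sr_bounded = (ref_norm + np.tanh(residual)) * (hi - lo) + lo
```

Bounding the residual trades some expressiveness for a hard guarantee on the output range, which is exactly what the diverging LWIR runs lacked.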

Struggle: ProbaV-to-LWIR Transfer Failure

  • Hypothesis: A model trained on PROBA-V optical satellite imagery could be fine-tuned on LWIR thermal imagery.
  • Failure Mode: Zero improvement over bicubic on LWIR evaluation. The fine-tuning script itself crashed before training could begin.
  • Root Cause: Optical (RGB/NIR) and thermal (LWIR 8-14um) are fundamentally different domains -- different physics (reflectance vs. emission), different intensity distributions (0-65k vs. 29k-34k narrow range), different spatial statistics.
  • Anti-Pattern: Do not assume transfer learning will work across spectral domains. LWIR thermal imagery requires training from scratch or transfer from other thermal datasets.

Timeline

| Date | Event |
| --- | --- |
| 2019-07-16 | Original HighRes-net repo created (ElementAI) |
| 2025-02-14 | First Ceres commits: NCCShiftNet, NetworkDebugBaseClass |
| 2025-02-20 | Windowed NCC, attention-enhanced shift estimation |
| 2025-02-24 | HRNet upgrade with local cross-attention fusion |
| 2025-03-01 | New model with Swin Transformer-inspired residual prediction |
| 2025-03-04 | Added residual network variant |
| 2025-03-06 | HRNetResidualWithAttention trained, evaluated at -51.8 cPSNR |
| 2025-03-09/10 | RefineLocalizedMFSR and AttentionMFSR experiments |
| 2025-03-22 | EnhancedProgressiveMFSRNet best ProbaV score: -53.35 cPSNR |
| 2025-04-01 | Last significant commits (evaluation + diagram tools) |
| 2025-10-06 | LWIR baseline evaluation: 0.00 dB improvement over bicubic |
| 2025-10-13 | LWIR from-scratch training attempts (diverged) |

Key Files

| File | Purpose |
| --- | --- |
| src/DeepNetworks/HRNet.py | Original HRNet architecture (Encoder, RecursiveNet, Decoder) |
| src/DeepNetworks/ProgressiveMFSR/EnhancedProgressiveMFSRNet.py | Custom enhanced architecture with RRDB, CBAM, progressive fusion |
| prepare_lwir_dataset.py | Creates ProbaV-compatible symlink structure for LWIR tiles |
| evaluate_lwir.py | Full LWIR evaluation with example image generation |
| src/DataLoaders/DataLoader_LWIR.py | LWIR-specific dataset handler |
| config/config.json | Main training config (ProbaV) |
| config/config_lwir_scratch.json | LWIR from-scratch training config |
| LWIR_BASELINE_RESULTS.md | October 2025 evaluation results document |
| LWIR_TRAINING_STRATEGY.md | Strategic analysis of training options |
| src/train.py | Main training script |
| train_lwir_finetune.py | LWIR fine-tuning script (broken) |