Hephaestus

Hephaestus is an interactive thermal image registration system built with PyQt6 and OpenGL. It provides real-time red-cyan anaglyph display with interactive corner manipulation for homography-based image alignment, plus AI-assisted backend refinement using multiple registration algorithms.

Repository: /home/geoff/projects/ceres/superrez/hephaestus/
Active: Jul 2025 -- Oct 2025 (100 commits)
Conda env: superglue-env

Why It Was Built

Existing registration tools could not handle the specific challenges of LWIR image alignment:

Phase 1 (LWIR-to-LWIR): Aligning individual HA thermal frames to LA thermal mosaics for MFSR dataset creation. Standard tools lacked the interactive feedback loop needed to verify alignment quality on low-contrast thermal imagery. The real-time anaglyph overlay lets a human operator instantly see misalignment.
Phase 2 (LWIR-to-VNIR cross-modality): Aligning thermal mosaics to visible-spectrum mosaics. This is a harder problem because thermal and visible images have fundamentally different appearance -- edges, contrast, and textures differ across modalities. Off-the-shelf feature matchers like SuperGlue (trained on same-modality natural images) fail on this task.

Architecture

Frontend (PyQt6 + OpenGL)

Component	File	Role
Main Window	`frontend/main_window.py`	Top-level coordinator, manages all major components and UI
Perspective Widget	`frontend/perspective_widget.py`	OpenGL rendering -- real-time anaglyph, corner dragging, rotation, scaling
Shader Renderer	`frontend/shader_renderer.py`	GPU shader compilation and rendering pipeline
Time Series Widget	`frontend/time_series_widget.py`	Score history graph for tracking alignment progress
VRAM Monitor	`frontend/vram_monitor_widget.py`	Live GPU memory usage display
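The anaglyph composition performed by the Perspective Widget can be illustrated on the CPU. The following is a minimal NumPy sketch, not the project's shader code: it assumes both images are already warped to the same size and normalized to [0, 1], and the choice of which image feeds the red channel is an assumption. The function names are hypothetical.

```python
import numpy as np


def anaglyph(reference_gray, template_gray):
    """Red-cyan overlay: reference drives red, template drives green and blue.

    Where the two images agree, R == G == B and the pixel looks gray.
    Misalignment shows up as red or cyan fringes.
    """
    ref = np.asarray(reference_gray, dtype=np.float32)
    tpl = np.asarray(template_gray, dtype=np.float32)
    if ref.shape != tpl.shape:
        raise ValueError("images must share a shape before compositing")
    return np.stack([ref, tpl, tpl], axis=-1)  # channel order: R, G, B


def difference(reference_gray, template_gray):
    """Absolute-difference view, analogous to a 'difference' render mode."""
    ref = np.asarray(reference_gray, dtype=np.float32)
    tpl = np.asarray(template_gray, dtype=np.float32)
    return np.abs(ref - tpl)
```

In the real application this work happens per pixel in a fragment shader, which is what makes interactive frame rates feasible; the NumPy version only shows the channel arithmetic.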

State Management

Component	File	Role
Registration Session	`models/registration_session.py`	Central alignment state singleton -- homography, coordinate transforms, crop building
Registration Tracker	`models/registration_tracker.py`	Undo/redo history, quality metric tracking over time
Image Manager	`models/image_manager.py`	Centralized image loading and caching singleton
Notification Center	`models/notification_center.py`	Central message bus (iOS-style NotificationCenter pattern) for loose coupling
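The undo/redo history kept by the Registration Tracker follows a familiar two-stack pattern. The sketch below is an assumption about the general shape, not the contents of `models/registration_tracker.py`; the class name `AlignmentHistory` and its methods are hypothetical, and the states could be any value such as a homography or a scored snapshot.

```python
class AlignmentHistory:
    """Two-stack undo/redo over alignment states (hypothetical helper)."""

    def __init__(self):
        self._undo = []  # past states; the last entry is the current state
        self._redo = []  # states that were undone and can be restored

    def record(self, state):
        """Store a new current state; a new edit invalidates the redo stack."""
        self._undo.append(state)
        self._redo.clear()

    def undo(self):
        """Step back one state and return the state that is now current."""
        if len(self._undo) < 2:
            raise IndexError("nothing to undo")
        self._redo.append(self._undo.pop())
        return self._undo[-1]

    def redo(self):
        """Restore the most recently undone state and return it."""
        if not self._redo:
            raise IndexError("nothing to redo")
        state = self._redo.pop()
        self._undo.append(state)
        return state


history = AlignmentHistory()
for label in ("initial", "corner_drag", "backend_refine"):
    history.record(label)
```

Raising on an empty stack, rather than silently returning, matches the fail-fast principle the project states for itself.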

Backend Registration Methods

The codebase supports multiple registration algorithms because different modality combinations require different approaches:

Method	Files	Use Case	Status
SuperGlue	`backend/superglue_bridge.py`, `superglue_gpu.py`	LWIR-to-LWIR (same modality)	Works well
MatchAnything	`backend/matchanything_bridge.py`, `matchanything_gpu.py`	LWIR-to-VNIR (cross-modality)	Integrated, ready for testing
RAFT	`backend/raft_bridge.py`, `raft_gpu.py`	Dense optical flow, small displacements	Available
ECC	`backend/ecc_bridge.py`	Intensity-based fallback, subpixel refinement	Available

All deep learning matchers support tiling for mosaic-scale images (3598x2972 and larger), which exceed VRAM even on an RTX 5080 (16 GB). Tiles are processed in batched GPU forward passes, and the resulting keypoints are shifted by each tile's offset to restore full-image coordinates. (Source: hephaestus/CLAUDE.md, lines 51--65)
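The tiling and coordinate offset correction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the tile size of 1024, the overlap of 128, and all function names are hypothetical, and the batched GPU forward pass itself is omitted.

```python
import numpy as np


def tile_origins(length, tile, overlap):
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the image edge
    return origins


def split_into_tiles(image, tile=1024, overlap=128):
    """Return a list of ((x0, y0), tile_view) pairs for a 2-D or 3-D image."""
    height, width = image.shape[:2]
    tiles = []
    for y0 in tile_origins(height, tile, overlap):
        for x0 in tile_origins(width, tile, overlap):
            tiles.append(((x0, y0), image[y0:y0 + tile, x0:x0 + tile]))
    return tiles


def to_full_image_coords(keypoints_xy, origin_xy):
    """Shift tile-local (x, y) keypoints by the tile's top-left corner."""
    points = np.asarray(keypoints_xy, dtype=np.float64)
    return points + np.asarray(origin_xy, dtype=np.float64)


mosaic = np.zeros((2972, 3598), dtype=np.float32)  # height x width
tiles = split_into_tiles(mosaic)
```

A matcher would run on each tile (or a stacked batch of tiles), and every match it returns would pass through `to_full_image_coords` with that tile's origin before homography fitting.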

Homography Estimation

Modern RANSAC variants replace classical RANSAC for robust homography fitting:

MAGSAC++ (default) -- uses match confidence scores as sampling priors
GC-RANSAC -- graph-cut spatial coherence enforcement
PROSAC -- progressive sampling by quality
ACCURATE / PARALLEL -- OpenCV USAC variants

(Source: hephaestus/HANDOFF.md, lines 79--100)
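For the estimators that OpenCV exposes through its USAC framework, a call might look like the sketch below. It assumes OpenCV 4.5 or newer; the helper names and the 3.0-pixel reprojection threshold are hypothetical. GC-RANSAC is left out because the source does not say which library provides it. In this sketch the confidence scores only determine the ordering of the correspondences, which OpenCV's PROSAC sampler expects to be best-first; how Hephaestus itself feeds confidences to the estimator is not shown here.

```python
import numpy as np

# Names from the list above mapped to OpenCV flag attribute names.
USAC_FLAG_NAMES = {
    "MAGSAC++": "USAC_MAGSAC",
    "PROSAC": "USAC_PROSAC",
    "ACCURATE": "USAC_ACCURATE",
    "PARALLEL": "USAC_PARALLEL",
}


def sort_by_confidence(src, dst, confidence):
    """Order correspondences from most to least confident."""
    order = np.argsort(-np.asarray(confidence, dtype=np.float64))
    return np.asarray(src)[order], np.asarray(dst)[order]


def estimate_homography(src, dst, confidence, method="MAGSAC++", reproj_threshold=3.0):
    """Fit a homography mapping src (N, 2) to dst (N, 2); returns (H, inlier_mask)."""
    import cv2  # imported here so the sorting helper stays usable without OpenCV

    src, dst = sort_by_confidence(src, dst, confidence)
    flag = getattr(cv2, USAC_FLAG_NAMES[method])
    return cv2.findHomography(
        src.astype(np.float32), dst.astype(np.float32), flag, reproj_threshold
    )
```

`cv2.findHomography` returns `None` for the matrix when estimation fails, so a caller would check the result before updating the alignment.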

Coordinate Systems

The application manages four coordinate spaces through RegistrationMatrix:

IMAGE Space -- full reference image coordinates (e.g., 2048x2048 mosaic)
TEMPLATE Space -- template image coordinates (e.g., 512x640 thermal frame)
WORLD Space -- display/widget coordinates for GUI rendering
CROP Space -- cropped region coordinates for processing pipelines

RegistrationSession manages all transformations between these spaces.
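Moving points between these spaces amounts to composing 3x3 matrices and applying them in homogeneous coordinates. The sketch below illustrates the idea with NumPy; it is not the `RegistrationSession` API. The placement of the template at (700, 600), the crop origin at (512, 512), and the reading of "512x640" as 640 wide by 512 tall are all assumptions for the example.

```python
import numpy as np


def apply_homography(H, points_xy):
    """Map (N, 2) points through a 3x3 homography, including the w-divide."""
    points = np.asarray(points_xy, dtype=np.float64)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    mapped = homogeneous @ np.asarray(H, dtype=np.float64).T
    return mapped[:, :2] / mapped[:, 2:3]


def translation(tx, ty):
    """3x3 matrix that shifts points by (tx, ty)."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])


# TEMPLATE -> IMAGE: hypothetical placement of the thermal frame in the mosaic.
template_to_image = translation(700.0, 600.0)
# IMAGE -> CROP: subtract the crop's top-left corner (hypothetical origin).
image_to_crop = translation(-512.0, -512.0)
# Matrices compose right to left: first template -> image, then image -> crop.
template_to_crop = image_to_crop @ template_to_image

corners = np.array([[0.0, 0.0], [640.0, 0.0], [640.0, 512.0], [0.0, 512.0]])
corners_in_crop = apply_homography(template_to_crop, corners)
```

The inverse direction uses `np.linalg.inv(template_to_crop)`, and a WORLD transform for the widget would be one more matrix in the same chain.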

Key Features

Real-time OpenGL-accelerated anaglyph rendering (red-cyan overlay) at 60fps
Interactive corner dragging for perspective transforms
Alt+drag rotation, Ctrl+drag isotropic scaling
Multiple render modes: anaglyph, reference, template, difference
AI-assisted registration refinement (trigger from GUI, runs backend method, updates alignment)
MFSR dataset browsing with mosaic A/B switching (M key)
Auto-save of homography matrices to JSON, stored under the "H" key for compatibility with external tools
Score history tracking and visualization with undo/redo
File-based debug logger (debug_helpers/file_debug_logger.py) for per-file console noise control
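The JSON auto-save feature can be pictured with a small round-trip sketch. It assumes the matrix is stored as a nested 3x3 list under a top-level "H" key; the real files may carry additional fields, and the function names here are hypothetical.

```python
import json


def homography_to_json(H):
    """Serialize a 3x3 homography as {"H": [[...], [...], [...]]}."""
    rows = [[float(value) for value in row] for row in H]
    if len(rows) != 3 or any(len(row) != 3 for row in rows):
        raise ValueError("homography must be 3x3")
    return json.dumps({"H": rows}, indent=2)


def homography_from_json(text):
    """Read the matrix back; a missing "H" key raises KeyError (fail fast)."""
    return json.loads(text)["H"]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
payload = homography_to_json(identity)
restored = homography_from_json(payload)
```

Writing `payload` to disk with `pathlib.Path.write_text` would complete the auto-save step.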

Data Flow

Image Loading

SimpleIndexBrowser navigates to matrix JSON -> images_load_requested notification -> ImageManager loads images -> images_loaded notification -> RegistrationSession binds and updates sizes -> PerspectiveWidget renders.

Alignment Update

User drags corner (or backend finds better alignment) -> homography_changed notification -> RegistrationSession updates -> RegistrationTracker scores -> TimeSeriesWidget updates graph.
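The alignment update flow above can be sketched with a small in-process message bus. This is an illustration of the pattern, not the code in `models/notification_center.py`: the `HomographyChanged` dataclass, the `add_observer` and `post` method names, and the string log standing in for the session, tracker, and graph are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HomographyChanged:
    """Hypothetical typed payload for the homography_changed notification."""
    matrix: tuple[float, ...]  # 3x3 homography, row-major
    source: str                # e.g. "corner_drag" or "backend"


class NotificationCenter:
    """Minimal message bus: observers register per notification type."""

    def __init__(self) -> None:
        self._observers: dict[type, list[Callable]] = defaultdict(list)

    def add_observer(self, notification_type: type, callback: Callable) -> None:
        self._observers[notification_type].append(callback)

    def post(self, notification: object) -> None:
        # Observers run synchronously, in registration order.
        for callback in list(self._observers[type(notification)]):
            callback(notification)


center = NotificationCenter()
events: list[str] = []

# Stand-ins for RegistrationSession, RegistrationTracker, and TimeSeriesWidget.
center.add_observer(HomographyChanged, lambda n: events.append(f"session updated from {n.source}"))
center.add_observer(HomographyChanged, lambda n: events.append("tracker scored alignment"))
center.add_observer(HomographyChanged, lambda n: events.append("graph redrawn"))

identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
center.post(HomographyChanged(matrix=identity, source="corner_drag"))
```

Because the poster only knows the notification type, a backend refinement can reuse the same path by posting the same payload with a different `source`.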

Backend Optimization

User triggers button -> ai_refinement_requested notification -> RegistrationManager gets crop from RegistrationSession -> backend method processes using RegistrationPairScorerGPU -> homography_refined notification -> standard alignment update flow.

Current State

Phase 1 -- Complete

Core interactive registration for LWIR-to-LWIR alignment. Clean architecture with notification-based decoupling, type-safe data classes (RegistrationMatrix, RegistrationImage, ImagePair), and singleton state management.

Phase 2 -- In Progress (paused Oct 2025)

Cross-modality LWIR-to-VNIR registration:

MatchAnything/ELoFTR bridge fully integrated with batched tiling (backend/matchanything_gpu.py)
Modern RANSAC methods (MAGSAC++, GC-RANSAC) integrated with confidence score support
Next step: Test MatchAnything on registration_samples data, compare RANSAC methods, tune parameters
Test data at /home/geoff/projects/ceres/registration_samples/ with Metashape ground-truth homographies

(Source: hephaestus/HANDOFF.md, lines 390--412)

Struggles

SuperGlue Fails Cross-Modality

Hypothesis: Because SIFT (hand-crafted features) works cross-modality, SuperGlue (learned keypoints) might also generalize across thermal-visible pairs.
Failure Mode: SuperGlue produced no usable matches on LWIR-VNIR pairs.
Root Cause: SuperPoint descriptors are trained on same-modality natural images; learned descriptors do not generalize across thermal/visible domains.
Anti-Pattern: Do not assume learned feature matchers will generalize to cross-modality tasks without cross-modality training data.
Resolution: Adopted MatchAnything (ELoFTR-based), pre-trained specifically on cross-modality pairs (thermal-visible, SAR-optical).

(Source: hephaestus/CLAUDE.md, lines 30--39)

VRAM Leaks During Interactive Use

Multiple commits in the git history address CUDA/VRAM memory leaks during interactive sessions. The VRAM monitor widget (frontend/vram_monitor_widget.py) was added to track GPU memory in real time.

Relationship to Other Pipeline Tools

Produces homography matrices consumed by LWIR-Align for batch registration
Browsing mode designed for PIUnet MFSR dataset (flight 21051/21052 pairs)
MatchAnything integration uses code from MatchAnything-Standalone
Cross-modality alignment supports planned Metashape integration for ground truth
Aligned outputs feed into LWIR Tile Validator for QC

Design Principles (from CLAUDE.md)

Fail fast: No try/except blocks unless absolutely necessary
Separation of concerns: Each class has one responsibility
Notification pattern: Components communicate via typed notifications, not direct references
Type safety: Dataclasses replace dictionary passing throughout
Health monitoring: Code health check scripts track 200+ methods across 18+ files