Hephaestus

Hephaestus is an interactive thermal image registration system built with PyQt6 and OpenGL. It provides real-time red-cyan anaglyph display with interactive corner manipulation for homography-based image alignment, plus AI-assisted backend refinement using multiple registration algorithms.

Repository: /home/geoff/projects/ceres/superrez/hephaestus/
Active: Jul 2025 -- Oct 2025 (100 commits)
Conda env: superglue-env

Why It Was Built

Existing registration tools could not handle the specific challenges of LWIR image alignment:

  1. Phase 1 (LWIR-to-LWIR): Aligning individual HA thermal frames to LA thermal mosaics for MFSR dataset creation. Standard tools lacked the interactive feedback loop needed to verify alignment quality on low-contrast thermal imagery. The real-time anaglyph overlay lets a human operator instantly see misalignment.

  2. Phase 2 (LWIR-to-VNIR cross-modality): Aligning thermal mosaics to visible-spectrum mosaics. This is a harder problem because thermal and visible images have fundamentally different appearance -- edges, contrast, and textures differ across modalities. Off-the-shelf feature matchers like SuperGlue (trained on same-modality natural images) fail on this task.

Architecture

Frontend (PyQt6 + OpenGL)

| Component | File | Role |
|---|---|---|
| Main Window | frontend/main_window.py | Top-level coordinator, manages all major components and UI |
| Perspective Widget | frontend/perspective_widget.py | OpenGL rendering -- real-time anaglyph, corner dragging, rotation, scaling |
| Shader Renderer | frontend/shader_renderer.py | GPU shader compilation and rendering pipeline |
| Time Series Widget | frontend/time_series_widget.py | Score history graph for tracking alignment progress |
| VRAM Monitor | frontend/vram_monitor_widget.py | Live GPU memory usage display |
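
The anaglyph mode handled by the Perspective Widget can be illustrated on the CPU. The sketch below is not the project's shader code; it only shows the per-pixel rule behind a red-cyan overlay. Which image feeds the red channel and which feeds cyan is an assumption, and `red_cyan_anaglyph` is a hypothetical name.

```python
from typing import List, Tuple

Gray = List[List[int]]                    # 2D grayscale image, values 0..255
RGB = List[List[Tuple[int, int, int]]]    # 2D image of (r, g, b) pixels


def red_cyan_anaglyph(reference: Gray, template: Gray) -> RGB:
    """Combine two same-sized grayscale images into a red-cyan overlay.

    Assumption: the reference drives the red channel and the template
    drives green and blue (cyan). The real renderer may swap them.
    """
    same_rows = len(reference) == len(template)
    same_cols = all(len(r) == len(t) for r, t in zip(reference, template))
    if not (same_rows and same_cols):
        raise ValueError("images must have the same shape")
    return [
        [(ref_px, tpl_px, tpl_px) for ref_px, tpl_px in zip(ref_row, tpl_row)]
        for ref_row, tpl_row in zip(reference, template)
    ]
```

Where the two images agree, the output pixel has equal red, green, and blue and so appears neutral gray. Where they disagree, the pixel is tinted red or cyan, which is what makes misalignment visible at a glance.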

State Management

| Component | File | Role |
|---|---|---|
| Registration Session | models/registration_session.py | Central alignment state singleton -- homography, coordinate transforms, crop building |
| Registration Tracker | models/registration_tracker.py | Undo/redo history, quality metric tracking over time |
| Image Manager | models/image_manager.py | Centralized image loading and caching singleton |
| Notification Center | models/notification_center.py | Central message bus (iOS-style NotificationCenter pattern) for loose coupling |
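
To show what the message-bus pattern looks like in practice, here is a minimal publish/subscribe sketch. It is not the contents of models/notification_center.py: the class layout and the `add_observer` and `post` method names are assumptions borrowed from the iOS pattern the table mentions. Only the notification name `homography_changed` comes from this document.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Notification:
    """A typed message; `payload` carries whatever the poster attaches."""
    name: str
    payload: Any = None


Observer = Callable[[Notification], None]


class NotificationCenter:
    """Hypothetical minimal bus: posters and observers never reference each other."""

    def __init__(self) -> None:
        self._observers: Dict[str, List[Observer]] = {}

    def add_observer(self, name: str, callback: Observer) -> None:
        self._observers.setdefault(name, []).append(callback)

    def post(self, name: str, payload: Any = None) -> None:
        note = Notification(name, payload)
        # Iterate over a copy so observers may register others while handling.
        for callback in list(self._observers.get(name, [])):
            callback(note)


if __name__ == "__main__":
    center = NotificationCenter()
    center.add_observer("homography_changed", lambda n: print("got", n.payload))
    center.post("homography_changed", {"source": "corner_drag"})
```

The poster only needs to know the notification name, so a widget can be added or removed without touching the code that posts the event.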

Backend Registration Methods

The codebase supports multiple registration algorithms because different modality combinations require different approaches:

| Method | Files | Use Case | Status |
|---|---|---|---|
| SuperGlue | backend/superglue_bridge.py, superglue_gpu.py | LWIR-to-LWIR (same modality) | Works well |
| MatchAnything | backend/matchanything_bridge.py, matchanything_gpu.py | LWIR-to-VNIR (cross-modality) | Integrated, ready for testing |
| RAFT | backend/raft_bridge.py, raft_gpu.py | Dense optical flow, small displacements | Available |
| ECC | backend/ecc_bridge.py | Intensity-based fallback, subpixel refinement | Available |

All deep learning matchers support tiling for mosaic-scale images (3598x2972 and larger) that exceed VRAM even on an RTX 5080 (16 GB). Tiles are processed in batched GPU forward passes, and keypoints found in each tile are shifted by that tile's offset to return them to mosaic coordinates. (Source: hephaestus/CLAUDE.md, lines 51--65)
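
The tiling and offset correction can be sketched without a GPU. The code below is an illustration under stated assumptions, not the backend's implementation: the tile size (1024), the overlap (128), and every function name are hypothetical, and batching is left out.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, w, h) for every tile, in row-major order."""
    w, h = min(tile, width), min(tile, height)
    return [
        (x0, y0, w, h)
        for y0 in tile_origins(height, tile, overlap)
        for x0 in tile_origins(width, tile, overlap)
    ]


def to_mosaic_coords(points: List[Tuple[float, float]], x0: int, y0: int) -> List[Tuple[float, float]]:
    """Offset correction: shift tile-local keypoints by the tile origin."""
    return [(x + x0, y + y0) for x, y in points]


if __name__ == "__main__":
    boxes = tile_boxes(3598, 2972, tile=1024, overlap=128)
    print(len(boxes), "tiles; first:", boxes[0], "last:", boxes[-1])
```

Overlap matters because a feature that straddles a tile boundary would otherwise be cut in half in both tiles. Matches from overlapping regions can appear twice, so a real pipeline would need some form of deduplication before homography fitting.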

Homography Estimation

Modern RANSAC variants replace classical RANSAC for robust homography fitting:

  • MAGSAC++ (default) -- uses match confidence scores as sampling priors
  • GC-RANSAC -- graph-cut spatial coherence enforcement
  • PROSAC -- progressive sampling by quality
  • ACCURATE / PARALLEL -- OpenCV USAC variants

(Source: hephaestus/HANDOFF.md, lines 79--100)

Coordinate Systems

The application manages four coordinate spaces through RegistrationMatrix:

  1. IMAGE Space -- full reference image coordinates (e.g., 2048x2048 mosaic)
  2. TEMPLATE Space -- template image coordinates (e.g., 512x640 thermal frame)
  3. WORLD Space -- display/widget coordinates for GUI rendering
  4. CROP Space -- cropped region coordinates for processing pipelines

RegistrationSession manages all transformations between these spaces.
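
As a rough illustration of moving a point between these spaces, the sketch below maps a TEMPLATE point into IMAGE space with a homography and then into CROP space with a translation. It does not reflect the real RegistrationMatrix or RegistrationSession API: `apply_homography`, `CropRegion`, and the example matrix are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]  # 3x3 homography, row-major


def apply_homography(H: Matrix3, p: Point) -> Point:
    """Map a point through H using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)


@dataclass(frozen=True)
class CropRegion:
    """Hypothetical crop: its top-left corner expressed in IMAGE space."""
    x0: float
    y0: float

    def image_to_crop(self, p: Point) -> Point:
        return (p[0] - self.x0, p[1] - self.y0)

    def crop_to_image(self, p: Point) -> Point:
        return (p[0] + self.x0, p[1] + self.y0)


if __name__ == "__main__":
    # Example homography: scale by 2, then translate by (100, 50).
    H = [[2.0, 0.0, 100.0], [0.0, 2.0, 50.0], [0.0, 0.0, 1.0]]
    in_image = apply_homography(H, (10.0, 20.0))   # TEMPLATE -> IMAGE
    crop = CropRegion(x0=100.0, y0=50.0)
    print(in_image, crop.image_to_crop(in_image))  # IMAGE -> CROP
```

Keeping each conversion in one small function makes it harder to mix up spaces, which is the usual source of bugs when a backend works on a crop but the GUI displays the full image.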

Key Features

  • Real-time OpenGL-accelerated anaglyph rendering (red-cyan overlay) at 60fps
  • Interactive corner dragging for perspective transforms
  • Alt+drag rotation, Ctrl+drag isotropic scaling
  • Multiple render modes: anaglyph, reference, template, difference
  • AI-assisted registration refinement (trigger from GUI, runs backend method, updates alignment)
  • MFSR dataset browsing with mosaic A/B switching (M key)
  • Auto-save homography matrices to JSON (compatible with external tools via "H" key)
  • Score history tracking and visualization with undo/redo
  • File-based debug logger (debug_helpers/file_debug_logger.py) for per-file console noise control
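
Corner dragging works because four point correspondences determine a homography. The sketch below shows that relationship with a plain linear solve; it is not taken from the Hephaestus code, the function names are hypothetical, and it assumes the bottom-right matrix entry can be normalized to 1.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_corners(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Solve for H (with H[2][2] = 1) such that H maps each src corner to dst."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


if __name__ == "__main__":
    corners = [(0.0, 0.0), (640.0, 0.0), (640.0, 512.0), (0.0, 512.0)]
    dragged = [(12.0, 8.0), (630.0, -4.0), (655.0, 520.0), (-9.0, 505.0)]
    for row in homography_from_corners(corners, dragged):
        print(row)
```

Each drag event changes one of the four destination points, so the matrix can be recomputed on every mouse move at negligible cost. The solve fails if three of the corners become collinear, which is why the degenerate case raises instead of returning a matrix.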

Data Flow

Image Loading

SimpleIndexBrowser navigates to matrix JSON -> images_load_requested notification -> ImageManager loads images -> images_loaded notification -> RegistrationSession binds and updates sizes -> PerspectiveWidget renders.

Alignment Update

User drags corner (or backend finds better alignment) -> homography_changed notification -> RegistrationSession updates -> RegistrationTracker scores -> TimeSeriesWidget updates graph.
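
The tracking step in this flow pairs each alignment with a score and keeps enough history to undo and redo. The sketch below is one common way to do that with two stacks. It is not the RegistrationTracker implementation: the class name, its methods, and the opaque `state` value are hypothetical, and the score is taken as an input because the scoring metric is not described here.

```python
from typing import Any, List, Tuple

Entry = Tuple[Any, float]  # (alignment state, quality score)


class AlignmentHistory:
    """Hypothetical undo/redo history with a score per entry."""

    def __init__(self, initial_state: Any, initial_score: float) -> None:
        self._undo: List[Entry] = [(initial_state, initial_score)]
        self._redo: List[Entry] = []

    def record(self, state: Any, score: float) -> None:
        """Store a new alignment; any redo branch is discarded."""
        self._undo.append((state, score))
        self._redo.clear()

    def undo(self) -> Entry:
        """Step back one entry, never past the initial state."""
        if len(self._undo) > 1:
            self._redo.append(self._undo.pop())
        return self._undo[-1]

    def redo(self) -> Entry:
        """Reapply the most recently undone entry, if there is one."""
        if self._redo:
            self._undo.append(self._redo.pop())
        return self._undo[-1]

    def scores(self) -> List[float]:
        """Score series for the current branch, oldest first."""
        return [score for _, score in self._undo]


if __name__ == "__main__":
    history = AlignmentHistory("H0", 0.40)
    history.record("H1", 0.55)
    history.record("H2", 0.52)
    print(history.undo(), history.scores())
```

The `scores()` series is the kind of data a score-history graph would plot. Returning the current entry from `undo` and `redo` lets the caller restore the alignment and refresh the display in one step.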

Backend Optimization

User triggers button -> ai_refinement_requested notification -> RegistrationManager gets crop from RegistrationSession -> backend method processes using RegistrationPairScorerGPU -> homography_refined notification -> standard alignment update flow.

Current State

Phase 1 -- Complete

Core interactive registration for LWIR-to-LWIR alignment. Clean architecture with notification-based decoupling, type-safe data classes (RegistrationMatrix, RegistrationImage, ImagePair), and singleton state management.

Phase 2 -- In Progress (paused Oct 2025)

Cross-modality LWIR-to-VNIR registration:

  • MatchAnything/ELoFTR bridge fully integrated with batched tiling (backend/matchanything_gpu.py)
  • Modern RANSAC methods (MAGSAC++, GC-RANSAC) integrated with confidence score support
  • Next step: Test MatchAnything on registration_samples data, compare RANSAC methods, tune parameters
  • Test data at /home/geoff/projects/ceres/registration_samples/ with Metashape ground-truth homographies

(Source: hephaestus/HANDOFF.md, lines 390--412)

Struggles

SuperGlue Fails Cross-Modality

  • Hypothesis: Because SIFT (hand-crafted features) works across the thermal-visible gap, SuperGlue (learned keypoints) might generalize across modalities as well.
  • Failure Mode: SuperGlue produced no usable matches on LWIR-VNIR pairs.
  • Root Cause: SuperPoint descriptors are trained on same-modality natural images; learned descriptors do not generalize across thermal/visible domains.
  • Anti-Pattern: Do not assume learned feature matchers will generalize to cross-modality tasks without cross-modality training data.
  • Resolution: Adopted MatchAnything (ELoFTR-based), pre-trained specifically on cross-modality pairs (thermal-visible, SAR-optical).

(Source: hephaestus/CLAUDE.md, lines 30--39)

VRAM Leaks During Interactive Use

Multiple commits in the git history address CUDA/VRAM memory leaks during interactive sessions. The VRAM monitor widget (frontend/vram_monitor_widget.py) was added to track GPU memory in real time.

Relationship to Other Pipeline Tools

  • Produces homography matrices consumed by LWIR-Align for batch registration
  • Browsing mode designed for PIUnet MFSR dataset (flight 21051/21052 pairs)
  • MatchAnything integration uses code from MatchAnything-Standalone
  • Cross-modality alignment supports planned Metashape integration for ground truth
  • Aligned outputs feed into LWIR Tile Validator for QC

Design Principles (from CLAUDE.md)

  • Fail fast: No try/except blocks unless absolutely necessary
  • Separation of concerns: Each class has one responsibility
  • Notification pattern: Components communicate via typed notifications, not direct references
  • Type safety: Dataclasses replace dictionary passing throughout
  • Health monitoring: Code health check scripts track 200+ methods across 18+ files