Hephaestus¶
Hephaestus is an interactive thermal image registration system built with PyQt6 and OpenGL. It provides real-time red-cyan anaglyph display with interactive corner manipulation for homography-based image alignment, plus AI-assisted backend refinement using multiple registration algorithms.
Repository: /home/geoff/projects/ceres/superrez/hephaestus/
Active: Jul 2025 -- Oct 2025 (100 commits)
Conda env: superglue-env
Why It Was Built¶
Existing registration tools could not handle the specific challenges of LWIR image alignment:
- Phase 1 (LWIR-to-LWIR): Aligning individual HA thermal frames to LA thermal mosaics for MFSR dataset creation. Standard tools lacked the interactive feedback loop needed to verify alignment quality on low-contrast thermal imagery. The real-time anaglyph overlay lets a human operator instantly see misalignment.
- Phase 2 (LWIR-to-VNIR cross-modality): Aligning thermal mosaics to visible-spectrum mosaics. This is a harder problem because thermal and visible images have fundamentally different appearance -- edges, contrast, and textures differ across modalities. Off-the-shelf feature matchers like SuperGlue (trained on same-modality natural images) fail on this task.
Architecture¶
Frontend (PyQt6 + OpenGL)¶
| Component | File | Role |
|---|---|---|
| Main Window | frontend/main_window.py | Top-level coordinator, manages all major components and UI |
| Perspective Widget | frontend/perspective_widget.py | OpenGL rendering -- real-time anaglyph, corner dragging, rotation, scaling |
| Shader Renderer | frontend/shader_renderer.py | GPU shader compilation and rendering pipeline |
| Time Series Widget | frontend/time_series_widget.py | Score history graph for tracking alignment progress |
| VRAM Monitor | frontend/vram_monitor_widget.py | Live GPU memory usage display |
State Management¶
| Component | File | Role |
|---|---|---|
| Registration Session | models/registration_session.py | Central alignment state singleton -- homography, coordinate transforms, crop building |
| Registration Tracker | models/registration_tracker.py | Undo/redo history, quality metric tracking over time |
| Image Manager | models/image_manager.py | Centralized image loading and caching singleton |
| Notification Center | models/notification_center.py | Central message bus (iOS-style NotificationCenter pattern) for loose coupling |
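The message-bus idea behind the Notification Center can be illustrated with a minimal publish/subscribe sketch. This is not the code in models/notification_center.py; the class body, the method names `add_observer` and `post`, and the payload shape are assumptions made for illustration. Only the notification name `homography_changed` comes from this document.

```python
from collections import defaultdict
from typing import Any, Callable

Observer = Callable[[Any], None]


class NotificationCenter:
    """Minimal publish/subscribe bus (illustrative, not the project's class)."""

    def __init__(self) -> None:
        # Maps a notification name to the callbacks registered for it.
        self._observers: dict[str, list[Observer]] = defaultdict(list)

    def add_observer(self, name: str, callback: Observer) -> None:
        self._observers[name].append(callback)

    def post(self, name: str, payload: Any = None) -> None:
        # Iterate over a copy so observers may register others while notified.
        for callback in list(self._observers[name]):
            callback(payload)


center = NotificationCenter()
received = []

# A consumer subscribes by name and never holds a reference to the producer.
center.add_observer("homography_changed", received.append)
center.post("homography_changed", {"source": "corner_drag"})
```

The producer and the consumer only share the notification name, which is what keeps components such as the widget and the session loosely coupled.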
Backend Registration Methods¶
The codebase supports multiple registration algorithms because different modality combinations require different approaches:
| Method | Files | Use Case | Status |
|---|---|---|---|
| SuperGlue | backend/superglue_bridge.py, superglue_gpu.py | LWIR-to-LWIR (same modality) | Works well |
| MatchAnything | backend/matchanything_bridge.py, matchanything_gpu.py | LWIR-to-VNIR (cross-modality) | Integrated, ready for testing |
| RAFT | backend/raft_bridge.py, raft_gpu.py | Dense optical flow, small displacements | Available |
| ECC | backend/ecc_bridge.py | Intensity-based fallback, subpixel refinement | Available |
All deep learning matchers support tiling for mosaic-scale images (3598x2972 and larger) that exceed available VRAM even on an RTX 5080 (16 GB). Tiles are processed in batched GPU forward passes, and matches found in each tile are shifted by the tile's offset to recover full-image coordinates.
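A rough sketch of the tiling and offset-correction step follows. The tile size (1024), the overlap (128), and both function names are assumptions chosen for illustration; the batched GPU forward pass itself is not shown.

```python
def tile_origins(width, height, tile, overlap):
    """Return (x, y) top-left corners of tiles covering a width x height image."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(extent):
        if extent <= tile:
            return [0]
        positions = list(range(0, extent - tile, step))
        positions.append(extent - tile)  # last tile sits flush with the edge
        return positions

    return [(x, y) for y in starts(height) for x in starts(width)]


def to_full_image(matches, origin):
    """Shift tile-local (x, y) keypoints into full-image coordinates."""
    ox, oy = origin
    return [(x + ox, y + oy) for x, y in matches]


# Example with the mosaic size quoted above (tile and overlap are assumed).
origins = tile_origins(3598, 2972, tile=1024, overlap=128)
shifted = to_full_image([(10.0, 20.0)], origin=(896, 1792))
```

Keypoints returned by a matcher are relative to the tile it saw, so adding the tile origin is what makes matches from different tiles usable in a single homography fit.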
(Source: hephaestus/CLAUDE.md, lines 51--65)
Homography Estimation¶
Modern RANSAC variants replace classical RANSAC for robust homography fitting:
- MAGSAC++ (default) -- uses match confidence scores as sampling priors
- GC-RANSAC -- graph-cut spatial coherence enforcement
- PROSAC -- progressive sampling by quality
- ACCURATE / PARALLEL -- OpenCV USAC variants
(Source: hephaestus/HANDOFF.md, lines 79--100)
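All of these estimators ultimately judge a candidate homography by how well it maps matched points onto each other. The NumPy sketch below shows that shared building block, reprojection error and an inlier mask; it is not MAGSAC++ or any of the listed variants, and the function names and the 3-pixel threshold are assumptions.

```python
import numpy as np


def reprojection_errors(H, src, dst):
    """Euclidean distance between H-projected src points and dst points."""
    src_h = np.hstack([src, np.ones((len(src), 1))])  # to homogeneous coords
    proj = src_h @ H.T
    proj = proj[:, :2] / proj[:, 2:3]  # back to Cartesian coordinates
    return np.linalg.norm(proj - dst, axis=1)


def inlier_mask(H, src, dst, threshold=3.0):
    """Mark matches whose reprojection error is below the pixel threshold."""
    return reprojection_errors(H, src, dst) < threshold


# A pure translation by (5, -2) stands in for an estimated homography.
H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]])
src = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 5.0]])
dst = np.array([[5.0, -2.0], [15.0, 8.0], [60.0, 60.0]])  # last match is wrong
mask = inlier_mask(H, src, dst)
```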
Coordinate Systems¶
The application manages four coordinate spaces through RegistrationMatrix:
- IMAGE Space -- full reference image coordinates (e.g., 2048x2048 mosaic)
- TEMPLATE Space -- template image coordinates (e.g., 512x640 thermal frame)
- WORLD Space -- display/widget coordinates for GUI rendering
- CROP Space -- cropped region coordinates for processing pipelines
RegistrationSession manages all transformations between these spaces.
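Moving a point between spaces amounts to composing 3x3 homogeneous transforms. The sketch below chains TEMPLATE -> IMAGE -> CROP using simple translations; the helper names and all numeric offsets are hypothetical, and in the application the template-to-image step would be a full homography rather than a translation.

```python
import numpy as np


def translation(tx, ty):
    """3x3 homogeneous matrix that shifts points by (tx, ty)."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])


def apply(M, point):
    """Apply a 3x3 transform to an (x, y) point, dividing out the scale."""
    x, y, w = M @ np.array([point[0], point[1], 1.0])
    return (x / w, y / w)


# Hypothetical layout: the template sits at (700, 300) in the reference image,
# and the processing crop starts at (640, 256) in that same image.
template_to_image = translation(700.0, 300.0)  # stand-in for the homography
crop_to_image = translation(640.0, 256.0)
image_to_crop = np.linalg.inv(crop_to_image)

# Compose right to left: TEMPLATE -> IMAGE -> CROP.
template_to_crop = image_to_crop @ template_to_image
corner_in_crop = apply(template_to_crop, (0.0, 0.0))
```

Keeping each space-to-space mapping as its own matrix means any pair of spaces can be connected by multiplication or inversion, without special-case code.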
Key Features¶
- Real-time OpenGL-accelerated anaglyph rendering (red-cyan overlay) at 60fps
- Interactive corner dragging for perspective transforms
- Alt+drag rotation, Ctrl+drag isotropic scaling
- Multiple render modes: anaglyph, reference, template, difference
- AI-assisted registration refinement (trigger from GUI, runs backend method, updates alignment)
- MFSR dataset browsing with mosaic A/B switching (M key)
- Auto-save homography matrices to JSON (compatible with external tools via the "H" key)
- Score history tracking and visualization with undo/redo
- File-based debug logger (debug_helpers/file_debug_logger.py) for per-file console noise control
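The red-cyan overlay listed among the features can be illustrated on the CPU with NumPy. The application renders it in a GPU shader, so this is only a sketch of the channel mapping; which image feeds the red channel is an assumption, as is the function name.

```python
import numpy as np


def red_cyan_anaglyph(reference, template):
    """Stack two aligned grayscale images (H x W, uint8) into an RGB anaglyph."""
    if reference.shape != template.shape:
        raise ValueError("images must share the same shape")
    # Red comes from the reference; green and blue (cyan) from the template.
    return np.dstack([reference, template, template])


reference = np.array([[100, 200]], dtype=np.uint8)
template = np.array([[100, 0]], dtype=np.uint8)
overlay = red_cyan_anaglyph(reference, template)
```

Where the two images agree, the three channels are equal and the pixel looks gray; where they disagree, a red or cyan fringe appears, which is what makes misalignment visible at a glance.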
Data Flow¶
Image Loading¶
SimpleIndexBrowser navigates to matrix JSON -> images_load_requested notification -> ImageManager loads images -> images_loaded notification -> RegistrationSession binds and updates sizes -> PerspectiveWidget renders.
Alignment Update¶
User drags corner (or backend finds better alignment) -> homography_changed notification -> RegistrationSession updates -> RegistrationTracker scores -> TimeSeriesWidget updates graph.
Backend Optimization¶
User triggers refinement from the GUI -> ai_refinement_requested notification -> RegistrationManager gets crop from RegistrationSession -> backend method processes using RegistrationPairScorerGPU -> homography_refined notification -> standard alignment update flow.
Current State¶
Phase 1 -- Complete¶
Core interactive registration for LWIR-to-LWIR alignment. Clean architecture with notification-based decoupling, type-safe data classes (RegistrationMatrix, RegistrationImage, ImagePair), and singleton state management.
Phase 2 -- In Progress (paused Oct 2025)¶
Cross-modality LWIR-to-VNIR registration:
- MatchAnything/ELoFTR bridge fully integrated with batched tiling (backend/matchanything_gpu.py)
- Modern RANSAC methods (MAGSAC++, GC-RANSAC) integrated with confidence score support
- Next step: Test MatchAnything on registration_samples data, compare RANSAC methods, tune parameters
- Test data at /home/geoff/projects/ceres/registration_samples/ with Metashape ground-truth homographies
(Source: hephaestus/HANDOFF.md, lines 390--412)
Struggles¶
SuperGlue Fails Cross-Modality¶
- Hypothesis: Because SIFT (hand-crafted features) works cross-modality, SuperGlue (learned keypoints) might generalize across thermal-visible pairs as well.
- Failure Mode: SuperGlue produced no usable matches on LWIR-VNIR pairs.
- Root Cause: SuperPoint descriptors are trained on same-modality natural images; learned descriptors do not generalize across thermal/visible domains.
- Anti-Pattern: Do not assume learned feature matchers will generalize to cross-modality tasks without cross-modality training data.
- Resolution: Adopted MatchAnything (ELoFTR-based), pre-trained specifically on cross-modality pairs (thermal-visible, SAR-optical).
(Source: hephaestus/CLAUDE.md, lines 30--39)
VRAM Leaks During Interactive Use¶
Multiple commits in the git history address CUDA/VRAM memory leaks during interactive sessions. The VRAM monitor widget (frontend/vram_monitor_widget.py) was added to track GPU memory in real time.
Relationship to Other Pipeline Tools¶
- Produces homography matrices consumed by LWIR-Align for batch registration
- Browsing mode designed for PIUnet MFSR dataset (flight 21051/21052 pairs)
- MatchAnything integration uses code from MatchAnything-Standalone
- Cross-modality alignment supports planned Metashape integration for ground truth
- Aligned outputs feed into LWIR Tile Validator for QC
Design Principles (from CLAUDE.md)¶
- Fail fast: No try/except blocks unless absolutely necessary
- Separation of concerns: Each class has one responsibility
- Notification pattern: Components communicate via typed notifications, not direct references
- Type safety: Dataclasses replace dictionary passing throughout
- Health monitoring: Code health check scripts track 200+ methods across 18+ files
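The fail-fast and type-safety principles can be combined in a small dataclass sketch. `HomographyRecord` and its fields are hypothetical and are not the project's RegistrationMatrix; the point is only that a typed, validated container rejects bad input at construction time instead of passing a loosely shaped dictionary downstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomographyRecord:
    """Hypothetical typed container; the project's classes may differ."""

    rows: tuple  # three rows of three floats each
    source: str  # e.g. which tool or method produced the matrix

    def __post_init__(self):
        # Fail fast: raise immediately rather than carry a malformed matrix.
        if len(self.rows) != 3 or any(len(r) != 3 for r in self.rows):
            raise ValueError("homography must be 3x3")


identity = HomographyRecord(
    rows=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    source="manual",
)
```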