Existing Data Limitations: The Case for New Collection

The existing dataset consists of two flights over Carinalli Ranch: flight 21051 (LA, ~800m ASL) and flight 21052 (HA, ~1218m ASL). These flights were not designed for multi-frame super-resolution research. Every limitation described here motivated a specific design choice in Data Collection v2.

1. Sparse Frame Overlap

Only a handful of regions in the existing dataset had 8-9+ overlapping HA frames. This limits both training data volume and the coverage available for inference evaluation. Most of the survey area has insufficient overlap for MFSR, restricting us to isolated pockets of usable data.

The downstream effect is twofold: we cannot generate enough training tiles for robust learning, and we cannot evaluate the model across diverse spatial contexts. The few usable regions are not representative of the full scene.

Design response in v2: Higher HA pass overlap targeting 12-15+ frames per ground point. At 30 Hz capture and reasonable flight parameters, overlap is never the limiting factor — quality filtering (blur, vibration) becomes the bottleneck instead.
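As a sanity check on the claim that overlap is not the limiting factor, the along-track overlap count can be estimated from footprint and frame spacing. All parameter values below are illustrative assumptions, not measurements from the actual flights:

```python
import math

def frames_per_ground_point(altitude_m: float,
                            fov_along_track_deg: float,
                            ground_speed_mps: float,
                            capture_hz: float) -> float:
    """Estimate how many frames see a given ground point along-track:
    the along-track footprint divided by the distance flown between
    consecutive frames. Assumes a nadir-pointing camera and straight,
    level flight (simplifying assumptions)."""
    footprint_m = 2 * altitude_m * math.tan(math.radians(fov_along_track_deg) / 2)
    spacing_m = ground_speed_mps / capture_hz  # ground distance between frames
    return footprint_m / spacing_m

# Illustrative numbers: 1200 m AGL, 30 deg along-track FOV, 60 m/s, 30 Hz.
# Footprint ~643 m, frame spacing 2 m: hundreds of overlapping frames,
# far above the 12-15+ target, before any quality filtering.
```

Even with aggressive blur/vibration filtering, raw overlap at 30 Hz leaves a large margin over the 12-15+ frame target.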

2. Single Site, No Diversity

All training and evaluation data comes from two mosaics at one location: Carinalli Ranch. The model has seen exactly one set of crop types, soil conditions, and thermal characteristics. Generalization to other sites is completely untested.

A model trained on a single scene can overfit to that scene's specific thermal patterns, spatial frequencies, and noise characteristics without learning transferable super-resolution features.

Design response in v2: Collect from 10-20 distinct sites with varied crops, soil types, irrigation methods, terrain, and thermal conditions. See Data Collection v2 for site selection criteria.

3. Training/Inference Gap

Training tiles were carved from the few dense-overlap pockets. At inference time, the model encounters arbitrary regions with different overlap patterns and spatial context. This domain shift was never properly characterized, but the symptom was clear: models that looked acceptable on training tiles degraded at full-frame inference.

This gap means training metrics (tile-level PSNR) are not predictive of deployment performance. The model learns to exploit the specific statistical properties of the dense-overlap training regions rather than learning general super-resolution.

Design response in v2: Uniform HA coverage so training tiles can come from anywhere in the survey area, matching the conditions the model will encounter at inference. Training and inference contexts converge.
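Uniform-coverage tile sampling could be sketched as follows, assuming a per-pixel overlap-count map is available for the site (the function and parameter names are hypothetical, not from the project codebase):

```python
import numpy as np

def sample_tile_origins(overlap_map: np.ndarray, tile: int,
                        min_frames: int, n: int, seed: int = 0) -> np.ndarray:
    """Uniformly sample n tile origins (row, col) such that every pixel
    in the tile has at least min_frames overlapping HA frames. An
    integral image makes each candidate-window check O(1)."""
    bad = (overlap_map < min_frames).astype(np.int64)
    # Zero-bordered integral image for constant-time window sums.
    ii = np.zeros((bad.shape[0] + 1, bad.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = bad.cumsum(0).cumsum(1)
    h, w = overlap_map.shape
    rows = np.arange(h - tile + 1)[:, None]
    cols = np.arange(w - tile + 1)[None, :]
    window_bad = (ii[rows + tile, cols + tile] - ii[rows, cols + tile]
                  - ii[rows + tile, cols] + ii[rows, cols])
    valid_r, valid_c = np.nonzero(window_bad == 0)
    rng = np.random.default_rng(seed)
    idx = rng.choice(valid_r.size, size=n, replace=True)
    return np.stack([valid_r[idx], valid_c[idx]], axis=1)
```

With uniform HA coverage the valid-origin set spans essentially the whole survey area, which is the point of the design change; with the v1 data it collapses to the few dense-overlap pockets.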

4. Parallax in HA Frames

Tall objects (trees, buildings, infrastructure) create parallax that homography-based registration cannot resolve. HA frames are not orthorectified — parallax exists even at altitude, though it is reduced compared to lower altitudes.

Analysis of parallax masking showed the top 10-15% of L1 error pixels corresponded to tall objects. This contamination affects both training (the network learns from misaligned pixels) and evaluation (PSNR is penalized by registration error, not SR quality). See Evaluation Strategy for how this distorts metrics.

Potential mitigations:

- Site selection: flat, open agricultural fields with minimal vertical structure and young/low crops
- Masking: use L1 error maps from HA-vs-LA comparison to identify parallax-contaminated pixels and exclude them from the loss
- Quality-weighted loss: down-weight high-error regions rather than applying a binary mask
- Deformable convolutions: alignment modules with per-pixel offset prediction can absorb some residual parallax
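The masking mitigation can be sketched as follows. The 85% keep fraction mirrors the 10-15% contamination estimate from the parallax analysis but is an assumption, not a tuned value:

```python
import numpy as np

def parallax_mask(ha_warped: np.ndarray, la_ref: np.ndarray,
                  keep_fraction: float = 0.85) -> np.ndarray:
    """Boolean mask of pixels assumed parallax-free: keep the lowest
    keep_fraction of per-pixel L1 error between the homography-warped
    HA frame and the parallax-free LA reference."""
    err = np.abs(ha_warped.astype(np.float64) - la_ref.astype(np.float64))
    thresh = np.quantile(err, keep_fraction)
    return err <= thresh

def masked_l1_loss(pred: np.ndarray, target: np.ndarray,
                   mask: np.ndarray) -> float:
    """L1 loss over trusted pixels only, so misaligned tall-object
    pixels contribute no training gradient."""
    return float(np.abs(pred - target)[mask].mean())
```

The quality-weighted variant replaces the hard `err <= thresh` cut with a smooth weight derived from `err`; the trade-off is that contaminated pixels still leak a small gradient.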

Design response in v2: Careful site selection minimizes the problem at the source. Parallax masks computed once per site from the LA orthomosaic (which is parallax-free) can be reused across all MFSR stacks.

5. Non-Designed SR Factor

The existing data was not collected with MFSR in mind. The altitude ratio was whatever the flights happened to be, not a deliberate choice for a specific magnification factor.

However, the SR factor is enforced at tile extraction (e.g., 64x64 HA tiles mapped to 192x192 LA tiles for 3x, or 128x128 to 256x256 for 2x), not by altitude precision. A manned aircraft will never fly a mathematically perfect altitude ratio. The real issue is that the 3x factor itself is too ambitious: the output has 9x as many pixels as the input, so roughly 89% of output pixels are "novel" (eight of every nine must be synthesized rather than observed). Our previous 3x attempts could not consistently beat bicubic interpolation.
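Enforcing the factor at tile extraction can be sketched as below, assuming the LA mosaic has been registered and resampled onto exactly scale times the HA pixel grid (names and shapes are illustrative):

```python
import numpy as np

def matched_tile_pair(ha_img: np.ndarray, la_img: np.ndarray,
                      r: int, c: int, ha_tile: int, scale: int):
    """Extract an aligned (input, target) tile pair. The exact integer
    scale comes from this indexing, not from the flight altitude ratio:
    la_img is assumed pre-resampled to scale x the HA grid."""
    lr = ha_img[r:r + ha_tile, c:c + ha_tile]
    hr = la_img[r * scale:(r + ha_tile) * scale,
                c * scale:(c + ha_tile) * scale]
    assert hr.shape[0] == ha_tile * scale  # exact factor by construction
    return lr, hr
```

So a ~2.05x or ~1.9x flown altitude ratio still yields exact 2x training pairs; the resampling step absorbs the imprecision.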

Design response in v2: Target roughly a 2x altitude ratio and enforce exact 2x at tile extraction. At 2x, 75% of output pixels are novel (three of every four, since the output has 4x as many pixels), which is substantially more tractable. SR difficulty scales superlinearly with factor, so 2x is a fundamentally different and easier problem.
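The novel-pixel arithmetic for both factors follows directly from the scale: a factor-s output has s^2 pixels per input pixel, so the fraction not directly observed is 1 - 1/s^2.

```python
def novel_pixel_fraction(scale: int) -> float:
    """Fraction of SR output pixels not directly constrained by a
    single input pixel: 1 - 1/scale**2."""
    return 1.0 - 1.0 / (scale * scale)

# 3x: 8/9 (about 0.889) of output pixels must be synthesized.
# 2x: 3/4 (0.750). The drop from 89% to 75% understates the change in
# difficulty, since SR difficulty scales superlinearly with factor.
```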

Cumulative Effect

These five limitations compound. Sparse overlap means few training samples. Single-site means those few samples lack diversity. The training/inference gap means even good training metrics do not predict deployment quality. Parallax contaminates both training targets and evaluation. And the 3x factor makes the reconstruction task itself harder than necessary.

The result was a model (PIUnet, Nov 2025) that could not consistently beat bicubic upsampling. Rather than continuing to iterate on architectures with fundamentally flawed data, the project direction shifted toward purpose-built data collection. See Data Collection v2 for the full design.