Data Collection v2: Purpose-Built MFSR Flights

This article describes the proposed new data collection designed to address every limitation documented in Existing Data Limitations. Each design choice traces back to a specific problem with the flight 21051/21052 dataset.

The 2x Magnification Math

The sensor has a 32-degree horizontal field of view and 1024x768 resolution. For a nadir view over flat ground, ground sample distance (GSD) scales linearly with altitude above ground level (AGL):

GSD (cm/pixel) = 0.056 x AGL (m)
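The 0.056 constant follows from the stated field of view and pixel count. A minimal sketch of that arithmetic, assuming a nadir-pointing camera over flat ground (the function name is illustrative, not part of any existing pipeline):

```python
import math

H_FOV_DEG = 32.0   # horizontal field of view, from the text
WIDTH_PX = 1024    # horizontal resolution, from the text

def gsd_cm_per_px(agl_m: float) -> float:
    """Ground sample distance in cm/pixel for a nadir view over flat ground."""
    # Ground swath width covered by the full horizontal field of view.
    swath_m = 2.0 * agl_m * math.tan(math.radians(H_FOV_DEG / 2.0))
    return 100.0 * swath_m / WIDTH_PX

for agl in (100, 200, 400):
    print(f"{agl} m AGL -> {gsd_cm_per_px(agl):.1f} cm/px")
```

At 100 m AGL this gives about 5.6 cm/px, and doubling the altitude doubles the GSD.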

For 2x super-resolution, the high-altitude (HA) input flight is flown at nominally twice the AGL of the low-altitude (LA) reference flight, so that HA GSD is about twice LA GSD. Example pairs:

LA AGL	LA GSD	HA AGL	HA GSD	Factor
100 m	5.6 cm/px	200 m	11.2 cm/px	2x
150 m	8.4 cm/px	300 m	16.8 cm/px	2x
200 m	11.2 cm/px	400 m	22.4 cm/px	2x

The exact 2x factor is enforced at tile extraction (128x128 HA patches mapped to 256x256 LA patches), not by altitude precision. A manned aircraft will never hold a perfect altitude ratio, and that is fine.
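One way to picture how tile extraction can absorb an imperfect altitude ratio is sketched below. It is an assumption about the mechanism, not the pipeline's actual code: the ground footprint of a 128x128 HA patch is measured in LA pixels, and that LA window is then resampled to exactly 256x256. The function name and the off-nominal altitudes are hypothetical.

```python
GSD_CM_PER_M_AGL = 0.056  # from the GSD relation in this section

def la_window_for_ha_patch(ha_agl_m, la_agl_m, ha_patch_px=128, la_patch_px=256):
    """Return (LA window size in LA pixels, resample scale to reach la_patch_px)."""
    ha_gsd = GSD_CM_PER_M_AGL * ha_agl_m
    la_gsd = GSD_CM_PER_M_AGL * la_agl_m
    footprint_cm = ha_patch_px * ha_gsd   # ground extent covered by the HA patch
    la_window_px = footprint_cm / la_gsd  # same ground extent measured in LA pixels
    scale = la_patch_px / la_window_px    # resampling factor applied to the LA window
    return la_window_px, scale

# Nominal pair: the LA window is already 256 px, so no resampling is needed.
print(la_window_for_ha_patch(400, 200))
# Hypothetical off-nominal pair: the LA window is larger and gets scaled down.
print(la_window_for_ha_patch(410, 195))
```

With the hypothetical 410 m / 195 m pair, the LA window is roughly 269 pixels wide and is scaled by about 0.95 to land on the exact 2x grid.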

Addresses: non-designed SR factor from Existing Data Limitations.

Why 2x Instead of 3x

At 3x magnification, about 89% of output pixels are "novel": the output has 9x the input pixel count, so 8 of every 9 output pixels have no directly corresponding input sample. At 2x, 75% are novel (4x the input pixel count). SR difficulty scales superlinearly with the factor. Our previous 3x attempts with PIUnet could not beat bicubic interpolation, so reducing the task difficulty is the pragmatic move. There is also substantially more spatial-frequency overlap between LR and HR at 2x, giving the network more information to exploit.
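The percentages come from simple pixel counting: at an integer factor s, the output has s^2 pixels per input pixel, so the fraction without a one-to-one input sample is 1 - 1/s^2. A short sketch (the function name is illustrative):

```python
def novel_fraction(scale: int) -> float:
    """Fraction of output pixels with no one-to-one input sample at an integer SR factor."""
    return 1.0 - 1.0 / (scale * scale)

for s in (2, 3):
    print(f"{s}x: output has {s * s}x the input pixels, {novel_fraction(s):.0%} novel")
```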

Flight Parameters

Capture rate: 30 Hz continuous, 16-bit radiometric, AGC disabled
HA overlap: At 400 m AGL and 50 m/s groundspeed, each ground point appears in ~103 consecutive frames. Even worst-case geometry gives 25+ frames, so overlap is never the limiting factor.
Quality filtering: Subsample best 8-9 clean frames from the pool after rejecting blur and vibration artifacts
Metashape alignment: Subsample to 3-5 Hz for SfM processing (see Metashape integration below)
Speed constraint: Thermal blur must be less than 0.5 pixels per frame
Positioning: RTK/PPK GNSS for precision camera positions
Flight geometry: Long straight passes with 200-400m turn buffers outside the target area

Addresses: sparse frame overlap from Existing Data Limitations. Dense 30 Hz capture guarantees abundant overlapping frames everywhere, not just in isolated pockets.
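The overlap figure can be reproduced with a few lines of arithmetic. The sketch below assumes the 768-pixel sensor axis points along track, which is consistent with the ~103-frame figure but is not stated in the parameters. The blur function treats "thermal blur" as ground motion during an effective integration time; that interpretation and the 1 ms value are hypothetical placeholders, since the sensor's actual response time is not given here.

```python
GSD_CM_PER_M_AGL = 0.056  # from the GSD relation in this article
ALONG_TRACK_PX = 768      # assumption: the 768-pixel axis points along track
FRAME_RATE_HZ = 30.0      # capture rate, from the flight parameters

def frames_per_ground_point(agl_m, groundspeed_mps):
    """Number of consecutive frames in which a ground point stays in view."""
    footprint_m = ALONG_TRACK_PX * GSD_CM_PER_M_AGL * agl_m / 100.0
    return FRAME_RATE_HZ * footprint_m / groundspeed_mps

def blur_px(agl_m, groundspeed_mps, integration_s):
    """Ground motion during one integration period, expressed in pixels."""
    gsd_m = GSD_CM_PER_M_AGL * agl_m / 100.0
    return groundspeed_mps * integration_s / gsd_m

print(round(frames_per_ground_point(400, 50)))  # HA pass at 400 m AGL
print(round(frames_per_ground_point(200, 50)))  # HA pass at 200 m AGL
print(blur_px(400, 50, integration_s=0.001))    # hypothetical 1 ms integration time
```

Under these assumptions the 400 m case gives about 103 frames per ground point, and the blur check can be compared directly against the 0.5-pixel limit once a real integration time is known.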

Site Selection Criteria

Criteria are listed in priority order. There is inherent tension between them — the design philosophy is to optimize for the top priorities and accept tradeoffs on the lower ones.

1. Flat Terrain (Highest Priority)

Minimizes parallax in HA frames. Parallax is not eliminated at altitude, but flat terrain reduces it substantially. Combined with parallax masking (see below), this makes the problem manageable.

Addresses: parallax from Existing Data Limitations.

2. Feature-Rich Crops for Registration

The alignment pipeline needs thermal features to match between frames. Good candidates: row crops with exposed soil (strong thermal boundaries), mixed canopy with irrigation variation, field edges, roads. Bad candidates: uniform dense canopy (alfalfa, mature soybean) where every pixel looks the same thermally.

Key tension: the flattest, easiest-to-fly fields (large uniform center pivots) may have the worst thermal features. The sweet spot is rectangular row crop fields with varied irrigation — geometric regularity plus thermal heterogeneity.

3. Low Turbulence Conditions

Maximizes the percentage of clean frames from the 30 Hz stream. Higher altitude is generally smoother (above the boundary layer). Post-sunset timing is ideal: the boundary layer collapses, turbulence drops dramatically, and thermal contrast is at its peak. Morning before thermal convection starts is another option.

4. Long Rectangular Field Geometry

Smooth, stable flight legs for fixed-wing aircraft. Long straight passes with 200-400m turn buffers outside the target area minimize attitude changes that corrupt frames.

5. Varied Thermal Contrast Across Sites

Training diversity: different crop types, soil types, irrigation states, times of day. This directly addresses the single-site limitation.

Addresses: single site / no diversity from Existing Data Limitations.

6. Multiple Distinct Sites

Target 10-20 sites initially, expanding to 50-100. Different geography, crop species, phenological stages. Prevents overfitting to one scene.

Addresses: single site / no diversity and training/inference gap from Existing Data Limitations.

Metashape Integration

Agisoft Metashape replaces several manual pipeline steps (dewarping, registration, HA-to-LA alignment). The workflow:

Subsample HA stream to 3-5 Hz
Run SfM alignment in Metashape to compute camera poses (6-DOF extrinsics + calibrated intrinsics)
Use SE(3) B-spline interpolation to recover full 30 Hz poses from the sparse alignment
Build LA orthomosaic (ground truth) in the same georeferenced coordinate system
Both HA frames and LA mosaic now share a common spatial reference
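The pose-recovery step in the workflow above calls for an SE(3) B-spline. A full spline is beyond a short example, so the sketch below swaps in a simpler stand-in: quaternion SLERP for rotation and linear interpolation for position between two neighboring SfM keyframes. It is piecewise-linear per segment rather than smooth across keyframes, and the pose layout (position tuple plus w, x, y, z quaternion) and all values are hypothetical.

```python
import math

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                 # flip one quaternion to take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:              # nearly identical: normalized linear interpolation
        q = tuple(a + u * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def interpolate_pose(t, t0, pose0, t1, pose1):
    """Pose = (position xyz, quaternion wxyz). Returns the pose at time t in [t0, t1]."""
    u = (t - t0) / (t1 - t0)
    p0, q0 = pose0
    p1, q1 = pose1
    position = tuple(a + u * (b - a) for a, b in zip(p0, p1))
    return position, slerp(q0, q1, u)

# Two hypothetical 5 Hz keyframes, 0.2 s apart: 10 m of travel and 2 degrees of yaw.
half = math.radians(2.0) / 2.0
pose_a = ((0.0, 0.0, 400.0), (1.0, 0.0, 0.0, 0.0))
pose_b = ((10.0, 0.0, 400.0), (math.cos(half), 0.0, 0.0, math.sin(half)))
# Fill in the 30 Hz frame times between them (k = 0..6).
dense = [interpolate_pose(k / 30.0, 0.0, pose_a, 0.2, pose_b) for k in range(7)]
```

A B-spline would additionally keep velocity and angular rate continuous across keyframes, which matters when the aircraft attitude changes between SfM samples.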

Our frames are 1024x768 (0.79 MP), roughly 60x smaller than a 45 MP frame from a high-resolution drone camera. This makes Metashape processing feasible even with high frame counts.

Remaining requirements after Metashape: tiling optimization and per-window optical flow refinement for sub-pixel accuracy.

Addresses: pipeline complexity (fewer manual tools) and noisy ground truth comparison (a shared georeferenced coordinate system improves HA-to-LA alignment for evaluation). See Evaluation Strategy for how cleaner alignment improves metrics.

Parallax-Aware Data Collection

Rather than just avoiding parallax, the v2 design deliberately plans for it:

Select sites where parallax masking will work well (flat with isolated tall objects rather than pervasive vertical structure)
Use the LA orthomosaic as a parallax-free reference for identifying tall objects
Compute a parallax mask once per site from HA-vs-LA comparison
Apply mask-aware loss or quality-weighted loss during training
Consider what features to deliberately include or exclude in training data
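As an illustration of the mask-aware loss item above, the sketch below computes a mean absolute error over only those pixels the parallax mask marks as valid. It is written in plain Python for clarity; in a real training loop the same idea would be a tensor operation in whatever framework is used. The function name and the convention that 1 means parallax-free are assumptions.

```python
def masked_l1_loss(pred, target, valid_mask):
    """Mean absolute error over pixels where valid_mask is truthy (parallax-free)."""
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, valid_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    # Return 0.0 for a fully masked tile so it contributes nothing to the loss.
    return total / count if count else 0.0

pred = [[1.0, 2.0], [3.0, 10.0]]
target = [[1.5, 2.0], [3.0, 4.0]]
mask = [[1, 1], [1, 0]]   # bottom-right pixel is flagged as parallax-affected
print(masked_l1_loss(pred, target, mask))
```

The large error at the masked pixel is ignored, so a tall object that shifts between HA and LA views does not penalize the network. A quality-weighted variant would replace the binary mask with per-pixel weights.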

Uniform Coverage Closes the Training/Inference Gap

With dense HA overlap across the full survey area, training tiles can be extracted from anywhere — not just the few dense-overlap pockets. This means training and inference encounter the same statistical distribution of spatial contexts, overlap patterns, and scene content.

Addresses: training/inference gap from Existing Data Limitations.

Open Questions

What thermal ground control points should we use?
Optimal frame count per ground point: is 9 the sweet spot, or should we go higher? (MAST shows diminishing returns above 32 frames, but PIUnet was designed for 9.)
Can Metashape reliably align LWIR frames subsampled from the 30 Hz stream to 3-5 Hz?
Minimum number of sites for meaningful generalization?
Should later collection rounds include deliberately challenging terrain (orchards, mixed) for robustness testing?