Data Scientist - Geospatial Foundation Models

SatSure

Data Science

Bengaluru, Karnataka, India

Posted on Jun 19, 2026
About SatSure
SatSure is a deep tech, decision intelligence company working at the nexus of agriculture, infrastructure, and climate action — creating impact for the other millions, with a focus on the developing world. As part of this mission, we're building geospatial foundation models that learn directly from Earth observation data — optical, SAR, and elevation — at scale. This role sits at the heart of that effort: architecting and training large-scale models that can generalize across geographies, sensors, and time. You'll be shaping the core intelligence layer that powers insights for millions, not just fine-tuning someone else's model.
Role
In foundation model development, data is the moat. You will drive the transformation of petabytes of raw geospatial data into a high-quality, high-entropy training and evaluation corpus.
This role sits at the intersection of remote sensing, data engineering, and ML, ensuring that models learn from diverse, representative, and well-curated data at scale.
Key Responsibilities
Data Curation & Pre-training Datasets
  • Design and implement data curation pipelines for large-scale pre-training datasets
  • Develop sampling strategies to ensure:
    • Geographic and biome diversity
    • Coverage across seasons, sensors, and resolutions
  • Mitigate dataset biases (e.g., over-representation of cloud-free or high-income regions)
  • Balance trade-offs between data quality, diversity, and scale
Evaluation Frameworks (Earth-Bench)
  • Design and own a comprehensive evaluation framework (“Earth-Bench”) to assess:
    • Representation quality (post-SSL embeddings)
    • Transfer performance on downstream tasks:
      • Segmentation
      • Yield prediction
      • Disaster mapping
  • Define metrics and benchmarks that reflect real-world generalization across geographies and time
  • Continuously evolve evaluation as new datasets, sensors, and tasks emerge
Data Systems & Pipeline Thinking
  • Build and maintain scalable data pipelines for ingestion, processing, versioning, and access
  • Work with ML and platform teams to:
    • Enable efficient data loading and training at scale
    • Optimize storage formats and access patterns (e.g., chunking, caching)
  • Ensure datasets are:
    • Reproducible
    • Well-documented
    • Easily usable across teams
Data-Centric ML Thinking
  • Analyze how data quality, diversity, and freshness impact model performance
  • Partner with researchers to:
    • Identify failure modes driven by data gaps
    • Improve datasets to unlock model gains (not just model changes)
  • Treat data as a first-class lever for improving model quality
Preferred Background
Domain Expertise
  • 3–5 years of experience in Applied Data Science at scale
  • Strong understanding of remote sensing fundamentals, including:
    • Atmospheric correction
    • SAR backscatter
    • Orthorectification
  • Familiarity with multi-sensor data (optical, SAR, DEM, etc.)
Data Engineering at Scale
  • Experience working with large-scale (TB–PB) datasets across the ML lifecycle
  • Hands-on experience with:
    • Distributed data processing
    • Efficient storage and retrieval strategies
  • Understanding of how data pipelines interact with model training workflows
Tooling (Geo Stack)
  • Experience with geospatial data tooling, such as:
    • Xarray, Dask, Rasterio, Zarr
    • Google Earth Engine (nice to have)
Mindset
  • Strong data intuition—ability to reason about bias, coverage, and representativeness
  • Systems thinking: understands how data decisions impact model behavior at scale
  • Comfortable working in ambiguous, evolving problem spaces
Benefits
  • Medical Health Cover for you and your family, including unlimited online doctor consultations
  • Access to mental health experts for you and your family
  • Dedicated allowances for learning and skill development
  • Comprehensive leave policy with casual leaves, paid leaves, marriage leaves, and bereavement leaves
Interview Process
  • Intro call
  • Assessment
  • Presentation
  • Interview rounds (typically 3 to 4)
  • Culture Round / HR round