Senior Data Scientist - Geospatial Foundation Models
Data Science
Bengaluru, Karnataka, India
Posted on Jun 19, 2026
About SatSure
SatSure is a deep tech, decision intelligence company working at the nexus of agriculture, infrastructure, and climate action — creating impact for the other millions, with a focus on the developing world. As part of this mission, we're building geospatial foundation models that learn directly from Earth observation data — optical, SAR, and elevation — at scale. This role sits at the heart of that effort: architecting and training large-scale models that can generalize across geographies, sensors, and time. You'll be shaping the core intelligence layer that powers insights for millions, not just fine-tuning someone else's model.
Role
In foundation model development, data is the moat. You will drive the transformation of petabytes of raw geospatial data into a high-quality, high-entropy training and evaluation corpus.
This role sits at the intersection of remote sensing, data engineering, and ML, ensuring that models learn from diverse, representative, and well-curated data at scale.
Key Responsibilities
Data Curation & Pre-training Datasets
- Design and implement data curation pipelines for large-scale pre-training datasets
- Develop sampling strategies to ensure:
  - Geographic and biome diversity
  - Coverage across seasons, sensors, and resolutions
- Mitigate dataset biases (e.g., over-representation of cloud-free scenes or high-income regions)
- Balance trade-offs between data quality, diversity, and scale
Evaluation Frameworks (Earth-Bench)
- Design and own a comprehensive evaluation framework (“Earth-Bench”) to assess:
  - Representation quality (post-SSL embeddings)
  - Transfer performance on downstream tasks:
    - Segmentation
    - Yield prediction
    - Disaster mapping
- Define metrics and benchmarks that reflect real-world generalization across geographies and time
- Continuously evolve evaluation as new datasets, sensors, and tasks emerge
Data Systems & Pipeline Thinking
- Build and maintain scalable data pipelines for ingestion, processing, versioning, and access
- Work with ML and platform teams to:
  - Enable efficient data loading and training at scale
  - Optimize storage formats and access patterns (e.g., chunking, caching)
- Ensure datasets are:
  - Reproducible
  - Well-documented
  - Easily usable across teams
Data-Centric ML Thinking
- Analyze how data quality, diversity, and freshness impact model performance
- Partner with researchers to:
  - Identify failure modes driven by data gaps
  - Improve datasets to unlock model gains (not just model changes)
- Treat data as a first-class lever for improving model quality
Preferred Background
Domain Expertise
- 5–8 years of experience in Applied Data Science at scale
- Strong understanding of remote sensing fundamentals, including:
  - Atmospheric correction
  - SAR backscatter
  - Orthorectification
- Familiarity with multi-sensor data (optical, SAR, DEM, etc.)
Data Engineering at Scale
- Experience working with large-scale (TB–PB) datasets across the ML lifecycle
- Hands-on experience with:
  - Distributed data processing
  - Efficient storage and retrieval strategies
- Understanding of how data pipelines interact with model training workflows
Tooling (Geo Stack)
- Experience with geospatial data tooling, such as:
  - Xarray, Dask, Rasterio, Zarr
  - Google Earth Engine (nice to have)
Mindset
- Strong data intuition: the ability to reason about bias, coverage, and representativeness
- Systems thinking: understands how data decisions impact model behavior at scale
- Comfortable working in ambiguous, evolving problem spaces
Benefits
- Medical health cover for you and your family, including unlimited online doctor consultations
- Access to mental health experts for you and your family
- Dedicated allowances for learning and skill development
- Comprehensive leave policy covering casual, paid, marriage, and bereavement leave
Interview Process
- Intro call
- Assessment
- Presentation
- Interview rounds (ideally no more than 3-4)
- Culture round / HR round