Data Engineer

Citylitics

Software Engineering, Data Science
Toronto, ON, Canada
Posted on Oct 4, 2024

About Citylitics

Citylitics delivers predictive intelligence on local utility & public infrastructure markets

What is Infrastructure? It is the roadways you rely on to get safely to Grandma's house, the potable water from your kitchen tap that you wash your family's food with, and the energy that heats our homes and powers our digital lifestyles.

Every year, trillions of dollars are spent on all areas of infrastructure to maintain our quality of life and move our economy forward. However, our infrastructure is no longer equipped to meet the needs of the future. We hear about infrastructure failures, whether bridge collapses, power blackouts, or water main breaks, every day in the news. Climate change and extreme weather events are disrupting the basic infrastructure we have taken for granted for years.

Citylitics is solving the hardest data problems in infrastructure while building the sales intelligence platform that enables a faster, more transparent, and more efficient infrastructure marketplace. We turn millions of unstructured documents into high-value intelligence feeds and datasets, delivered through an intuitive user experience. Our goal is to enable solution providers to connect with cities that have relevant infrastructure needs in a faster, more digital way than traditional market channels allow. As more companies adopt our platform, cities & utilities will be able to access solutions that deliver on the promise of moving towards a more resilient, sustainable, and equitable infrastructure future.

Who Are We Looking For?

We're looking for a skilled and enthusiastic Data Engineer to join our growing team! We're a close-knit group building and maintaining mission-critical data pipelines, and we need someone who can hit the ground running.

This role requires a solid understanding of and experience with Apache Airflow. You'll be responsible for designing, building, and maintaining both streaming and batch data pipelines, leveraging various technologies we already use, including Google Cloud Platform (GCP) services like Dataflow, BigQuery, and Vertex AI. Experience with these specific GCP services is a must.
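
To give a concrete flavour of the pipelines involved, here is a minimal, illustrative sketch of a daily batch DAG (assuming Airflow 2.x with the Google provider installed; the project, bucket, dataset, and table names are placeholders, not our production code):

    # Illustrative daily batch DAG: load raw files from Cloud Storage into
    # BigQuery, then write a curated daily partition. All names are placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="example_documents_batch",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        # Land the day's newline-delimited JSON files into a raw staging table.
        load_raw = GCSToBigQueryOperator(
            task_id="load_raw_documents",
            bucket="example-raw-bucket",  # placeholder bucket
            source_objects=["documents/{{ ds }}/*.json"],
            destination_project_dataset_table="example_dataset.raw_documents",
            source_format="NEWLINE_DELIMITED_JSON",
            write_disposition="WRITE_APPEND",
        )

        # Rewrite the matching daily partition of the curated table.
        transform = BigQueryInsertJobOperator(
            task_id="transform_documents",
            configuration={
                "query": {
                    "query": (
                        "SELECT id, city, published_date, text "
                        "FROM `example_dataset.raw_documents` "
                        "WHERE DATE(published_date) = '{{ ds }}'"
                    ),
                    "destinationTable": {
                        "projectId": "example-project",
                        "datasetId": "example_dataset",
                        "tableId": "curated_documents${{ ds_nodash }}",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                },
            },
        )

        load_raw >> transform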

We're not looking for someone who merely knows these tools; we want someone who has built with them, ideally complex and robust data pipelines in a production environment. We're interested in seeing examples of your work, so please be prepared to discuss past projects and the challenges you've overcome.

Beyond the technical skills, we value collaboration, a proactive approach to problem-solving, and a willingness to learn and adapt to evolving technologies. If you're passionate about data engineering, thrive in a collaborative environment, and are excited by the prospect of working on impactful projects, we encourage you to apply.

What Will You Accomplish?

  • Design, build, and maintain: Develop and deploy highly scalable and reliable data pipelines using Apache Airflow, Dataflow, and other GCP services. This includes everything from initial design and development through testing, deployment, and ongoing maintenance.
  • Data Modeling & Optimization: Collaborate with data analysts and stakeholders to define data requirements, and design efficient and effective data models within BigQuery. You’ll be optimizing queries and pipeline performance for maximum efficiency (see the sketch after this list).
  • Monitoring & Troubleshooting: Implement robust monitoring and alerting for data pipelines. Proactively identify and resolve issues, ensuring data quality and pipeline uptime.
  • Collaboration & Communication: Work closely with other engineers, data scientists, and product teams to understand business requirements and translate them into technical solutions. Clearly communicate technical details and project progress.
  • Continuous Improvement: Contribute to the ongoing improvement of our data infrastructure and processes, including suggesting and implementing new technologies and best practices. We're always looking for ways to optimize our workflows and improve efficiency.
  • Other duties as assigned.
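
As an illustration of the data modeling side, partitioning and clustering BigQuery tables goes a long way toward keeping query costs down. Here is a minimal sketch using the google-cloud-bigquery client (the project, dataset, and schema are placeholders, not our production model):

    # Illustrative only: create a date-partitioned, clustered BigQuery table so
    # that per-day queries scan a single partition. Names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    schema = [
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("city", "STRING"),
        bigquery.SchemaField("published_date", "TIMESTAMP"),
        bigquery.SchemaField("text", "STRING"),
    ]

    table = bigquery.Table("example-project.example_dataset.curated_documents", schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="published_date",
    )
    table.clustering_fields = ["city"]

    client.create_table(table, exists_ok=True)

Filtering on published_date then prunes queries to a single partition, and clustering on city keeps related rows together within it.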

Technologies We Use:

  • Backend: Python, Django, Cloud SQL, and Airflow/Cloud Composer as the main language, web framework, database, and orchestration tool, respectively
  • Cloud Infrastructure: Google Cloud Platform
  • Other Tools: Dash & Plotly as the main framework for our dashboards, hosted on Cloud Run (see the sketch below)
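
For reference, a minimal Dash app shaped for Cloud Run might look like the following sketch (assuming a recent Dash 2.x release; the figure data and names are illustrative, not our dashboard code):

    # Illustrative Dash app arranged for Cloud Run; the data is a placeholder.
    import os

    import plotly.express as px
    from dash import Dash, dcc, html

    app = Dash(__name__)
    server = app.server  # WSGI entry point, e.g. for gunicorn on Cloud Run

    # Placeholder figure; a real dashboard would query BigQuery or Cloud SQL.
    fig = px.bar(
        x=["Water", "Roads", "Energy"],
        y=[12, 7, 9],
        labels={"x": "Sector", "y": "Projects"},
    )

    app.layout = html.Div([
        html.H2("Example infrastructure dashboard"),
        dcc.Graph(figure=fig),
    ])

    if __name__ == "__main__":
        # Cloud Run injects the PORT environment variable into the container.
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)), debug=False)

On Cloud Run the container simply needs to listen on the PORT environment variable; a production setup would typically serve app.server behind gunicorn rather than the built-in development server.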