Source Job

Europe

  • Design, build, and maintain scalable data lake solutions and processing pipelines handling large volumes of data.
  • Develop distributed data processing applications using Apache Spark on Databricks and build real-time streaming pipelines with Apache Kafka.
  • Apply software engineering best practices to data pipelines including CI/CD, automated testing, and peer code review.

Apache Spark Databricks Python SQL

20 jobs similar to Data Engineer

Jobs ranked by similarity.

  • Design, build, and maintain scalable data pipelines using AWS Glue (PySpark) or equivalent orchestration and transformation tools.
  • Engineer and optimise the ClickHouse warehouse for sub-second query performance across all back-offices.
  • Implement data contracts between back-office and the platform.

Block Labs is a premier technology studio operating at the bleeding edge of Web3, Artificial Intelligence, and iGaming. We are a collective of senior engineers, product strategists, and builders who refuse to compromise on architecture.

Global

  • Design and implement modern data platforms and scalable data pipelines to enable better data-driven decisions.
  • Develop and maintain ETL/ELT pipelines using SQL, Spark/PySpark, and Microsoft Fabric or Databricks.
  • Work closely with data architects, BI developers, and customer stakeholders in an Agile environment.

Tieto, through MentorMate, creates durable technical solutions that deliver digital transformation at scale by blending strategic insights and thoughtful design with brilliant engineering. The company provides its people with the opportunity to work on impactful, global projects for recognizable brands.

US Unlimited PTO

  • Lead and manage a global data engineering team building large-scale data pipelines and production datasets for the Public Investor business.
  • Collaborate with product, research, and operations teams to translate roadmap priorities into scalable technical plans and customer-facing data feeds.
  • Drive operational excellence through data quality frameworks, observability, and AI-assisted development practices.

YipitData is the leading market research and analytics firm for the disruptive economy, providing actionable insights from alternative data. With over $475M raised and offices globally, it has a people-centric culture recognized as a Best Workplace for three consecutive years.

United States

  • Build and improve scalable, fault-tolerant, self-serve data infrastructure technologies to support ML and analytics workflows.
  • Own the Data Movement Platform for batch and stream data processing, and invest in building new infrastructure for Spark, Flink, and Airflow.
  • Collaborate with teammates on on-call responsibilities and monitoring/alerting to improve reliability, scalability, latency, and efficiency.

Reddit is a community of communities built on shared interests, passion, and trust, hosting the most open and authentic conversations on the internet. With over 100,000 active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet's largest sources of information.

Canada

  • Design, build, and operate high-scale data ingestion and replication systems from production data stores into the data lakehouse.
  • Build and maintain reliable, scalable data platform infrastructure capable of handling petabytes of data across analytics, AI, and operational use cases.
  • Develop internal libraries, APIs, frameworks, and tooling in languages such as Go and Python to help teams move and access data safely.

Samsara is the pioneer of the Connected Operations Cloud, enabling organizations that depend on physical operations to harness IoT data for actionable insights. As a publicly traded company, Samsara fosters a growth-oriented culture and serves industries that represent over 40% of global GDP.

Global

  • Design and deliver scalable, low-latency streaming data solutions for real-time customer analytics.
  • Analyze business needs, optimize data models, and write clean code using Scala, Python, and SQL.
  • Mentor team members and optimize performance of data platforms like AWS Kinesis, Kafka, and Redshift.

Aircall is an AI-powered customer communications platform used by 22,000+ companies worldwide, unifying voice, SMS, WhatsApp, and AI. The company is a unicorn backed by world-class investors, with 45+ nationalities and a strong, collaborative culture.

$123,696–$254,667/yr
US

  • Design and implement robust data infrastructure in AWS, using Spark with Scala.
  • Evolve our core data pipelines to efficiently scale for our massive growth.
  • Store data in optimal engines and formats, matching your designs to our performance needs and cost factors.

tvScientific is the first and only CTV advertising platform purpose-built for performance marketers. Our solution combines media buying, optimization, measurement, and attribution in one efficient platform. Our platform is built by industry leaders with a long history in programmatic advertising, digital media, and ad verification.

Spain

  • Design, develop, and maintain backend data processing solutions using Apache Spark.
  • Write and optimize SQL queries for data extraction, transformation, and analysis.
  • Develop scalable data pipelines and ETL processes, collaborating with cross-functional teams.

Talan is an international advisory group specializing in innovation and transformation through technology. The company has 5,000 employees and an annual turnover of 600M€, and has been recognized as a Great Place to Work in Spain and Poland.

US 4w PTO

  • Leverage test-driven development to deliver backend systems and user interfaces for healthcare data integration.
  • Design, implement, and maintain data models, ETL processes, and APIs for performance and scalability.
  • Contribute to automated testing suites and optimize data operations for integrity and security.

Bellese is a mission-driven digital services company pioneering innovative technology solutions in civic healthcare. With a collaborative, remote-first culture, the team is focused on improving public health outcomes through service design and skilled engineering.

Mexico

  • Contribute to the design and implementation of scalable data solutions.
  • Build and optimize batch and streaming ingestion pipelines.
  • Ensure data quality, reliability, and performance across pipelines and datasets.

Blend is an AI services provider that co-creates impact for clients through data science, AI, technology, and people. They aim to fuel bold visions by aligning human expertise with artificial intelligence, fostering innovation, and unlocking value for their clients.

US

  • Owns organization-wide data architecture, defining standards, patterns, and designs that our teams will implement.
  • Reviews data-related designs and implementations across teams for architectural consistency, performance, and scalability.
  • Designs and develops data pipelines, integrations, and platform features with performance and scalability in mind.

Tenna provides a platform that revolutionizes construction equipment fleet operations. They provide innovative solutions to customers looking for competitive ways to better manage and track their assets, such as heavy and light equipment, large fleets, tools, and materials. They value quality-obsessed, gritty, continuous learners, and collaborative problem solvers.

US

  • Design and build scalable cloud data pipelines for high-volume manufacturing and IoT data using Spark, Kafka, Airflow, and Delta Lake.
  • Implement medallion/lakehouse architectures on Databricks, Snowflake, AWS, or Azure with strong SQL and Python proficiency.
  • Apply manufacturing domain expertise in MES, SCADA, ERP, and industrial protocols to bridge OT/IT systems for real-time data extraction.

We are a Digital Product Engineering company that builds products, services, and experiences that inspire, excite, and delight. We have 17,000+ experts across 39 countries and our culture is dynamic and non-hierarchical.

$90,000–$120,000/yr
US 4w PTO

  • Design, build, and maintain scalable data pipelines using Python, Spark, and Airflow.
  • Collaborate cross-functionally with AI/ML and Product teams to implement new features.
  • Proactively identify and resolve bottlenecks in our complex ETL processes.

Sayari provides judgment infrastructure for trustworthy AI in economic security and commercial risk. They resolve primary-source records forming the ground truth of global commerce, and are headquartered in Washington, D.C., with offices in London, Singapore, Tokyo, and Tel Aviv.

Canada Unlimited PTO 12w maternity 12w paternity

  • Design and implement scalable, high-performance data pipelines to ingest and transform data from a variety of sources.
  • Build and maintain APIs that enable flexible, secure, and tenant-aware data integrations with external systems.
  • Implement observability, monitoring, and alerting to track data freshness, failures, and performance issues.

Northbeam is building the world's most advanced marketing intelligence platform for top eCommerce brands, providing powerful attribution modeling and customizable dashboards. The company is experiencing rapid growth with a strong product-market fit and a remote-friendly culture.

Global

  • Build streaming and batch pipelines that ingest, normalise, and distribute market, trading, and portfolio data.
  • Build the self-serve tooling so other teams publish, consume, and build on data products without waiting.
  • Own data contracts and schema evolution; keep schema changes from turning into multi-team coordination events.

Keyrock is a change-maker in the digital asset space, renowned for its partnerships and innovation. They have over 250 team members around the world with diverse backgrounds and hubs in London, Brussels, and Singapore, hosting regular online and offline hangouts.

Brazil

  • Design and build scalable data pipelines and architectures using Databricks, Azure Data Factory, and ADLS to support analytics and AI use cases.
  • Integrate structured and unstructured data from multiple enterprise sources into robust cloud data platforms for financial domains like credit analysis and document intelligence.
  • Apply DevOps practices and collaborate with stakeholders to modernize legacy reporting systems and enable real-time data-powered decision-making.

This role is listed on behalf of a partner company that focuses on data-driven transformation initiatives, designing scalable data pipelines for advanced analytics and AI use cases. They offer a collaborative technical environment and invest in continuous learning and cutting-edge technologies.

India

  • Design scalable data pipelines and backend systems from the ground up.
  • Leverage AWS and GCP for real-time and batch processing.
  • Manage databases and Data Warehouses, optimizing ETL workflows.

Delivery Solutions, a UPS company, is looking for a Senior Data Engineer to join their team. They are a growing company.

US East Coast 4w PTO

  • Own day-to-day administration, configuration, and health of Oura's global Databricks environment.
  • Contribute to data pipeline development and Spark workload optimization across cross-functional growth areas.
  • Manage workspace governance including access controls, cluster policies, cost monitoring, and security configurations.

Oura empowers people to own their inner potential through award-winning products that help gain deeper knowledge of readiness, activity, and sleep quality. They are a quickly growing company focused on helping people live healthier and happier lives, ensuring team members have what they need to do their best work.

$190,000–$280,500/yr
US Canada

  • Architect and evolve scalable data ingestion and egress frameworks and pipelines that are well tested and offer strong data quality monitoring.
  • Architect and evolve our CI/CD processes, enhancing the testing environment and observability.
  • Enhance our Claude Code / LLM development support capabilities by creating tools, skills, and agents that give our LLMs more context and help us continually improve their ability to debug, create code, and maintain systems.

Life360’s mission is to keep people close to the ones they love. They have a mobile app, tracking devices, and a pet GPS tracker. Life360 has more than 500 (and growing!) remote-first employees and delivers peace of mind and enhances everyday family life.

US EMEA

  • Design, build, and maintain distributed data pipelines that power Spotify Wrapped data stories and personalized experiences for more than 300M users globally.
  • Partner with Data Scientists to evaluate and operationalize new Wrapped story concepts, balancing personalization, scalability, and eligibility requirements.
  • Build scalable systems that process large-scale listening data and generate insights that celebrate users’ unique listening journeys.

The Personalization team makes deciding what to play next easier and more enjoyable for every listener. They are behind some of Spotify’s most-loved features. Join them and you’ll keep millions of users listening by making great recommendations to each and every one of them.