Source Job

  • Translate business requirements into highly available data solutions using PySpark, SQL, and Python.
  • Collaborate with cross-functional teams including machine learning engineers and software developers.
  • Implement data pipelines and manage ETL processes, applying data warehousing fundamentals.

PySpark SQL Python ETL Data Warehousing

20 jobs similar to Lead Data Engineer

Jobs ranked by similarity.

Latin America

  • Design, develop, and maintain ETL data engineering processes using Python (PySpark) and Azure Synapse Analytics.
  • Apply expertise in data warehousing to create effective data storage structures in a Massively Parallel Processing SQL Pool.
  • Collaborate with cross-functional teams to understand data requirements and provide support for data-related initiatives.

Bluelight is a leading software consultancy dedicated to designing and developing innovative technology that enhances users' lives. With a presence across the United States and Central/South America, Bluelight is in an exciting phase of expansion, continually seeking exceptional talent to join its dynamic and diverse community.

Mexico

  • Contribute to the design and implementation of scalable data solutions.
  • Build and optimize batch and streaming ingestion pipelines.
  • Ensure data quality, reliability, and performance across pipelines and datasets.

Blend is an AI services provider that co-creates impact for clients through data science, AI, technology, and people. They aim to fuel bold visions by aligning human expertise with artificial intelligence, fostering innovation, and unlocking value for their clients.

Global

  • Design and implement modern data platforms and scalable data pipelines to enable better data-driven decisions.
  • Develop and maintain ETL/ELT pipelines using SQL, Spark/PySpark, and Microsoft Fabric or Databricks.
  • Work closely with data architects, BI developers, and customer stakeholders in an Agile environment.

Tieto, through MentorMate, creates durable technical solutions that deliver digital transformation at scale by blending strategic insights and thoughtful design with brilliant engineering. The company provides its people with the opportunity to work on impactful, global projects for recognizable brands.

Latin America

  • Design and build critical data infrastructure to onboard a new marketing mix modeling and incrementality testing vendor.
  • Translate external vendor schema into a production-ready data mart, bridging internal DTC data with external ingestion.
  • Manage full merge request workflow using GitLab, adhering to strict peer review processes before deployment.

Truelogic is a leading provider of nearshore staff augmentation services headquartered in New York, delivering top-tier technology solutions to companies of all sizes. The team of 600+ highly skilled tech professionals based in Latin America drives digital disruption by partnering with U.S. companies on impactful projects.

Latin America

  • Design and evolve scalable data platforms, ensuring reliability and governance.
  • Define data architecture standards, models, and integration patterns for business needs.
  • Collaborate with stakeholders to translate requirements into cloud-based data solutions.

Nortal shapes digital transformation with complex solutions for global enterprises and the public sector. With over 25 years of experience and 160+ new hires yearly, the company fosters a culture of autonomy, open communication, and diversity.

Global 6w PTO

  • Develop Python services that integrate with marketing partners and collect data from various sources.
  • Create and support Airflow workflows.
  • Support the migration of marketing data pipelines and DWH components from MS SQL to Google Cloud Platform (including BigQuery), contributing to architecture decisions and best practices.

Social Discovery Group (SDG) is one of the world's largest groups of social discovery companies, uniting millions of users on dozens of products. Our international team of 1000+ professionals and digital nomads works all over the world, and we are proud to be a two-time “Great Place to Work” winner.

US Unlimited PTO

  • Lead and manage a global data engineering team building large-scale data pipelines and production datasets for the Public Investor business.
  • Collaborate with product, research, and operations teams to translate roadmap priorities into scalable technical plans and customer-facing data feeds.
  • Drive operational excellence through data quality frameworks, observability, and AI-assisted development practices.

YipitData is the leading market research and analytics firm for the disruptive economy, providing actionable insights from alternative data. With over $475M raised and offices globally, it has a people-centric culture recognized as a Best Workplace for three consecutive years.

Europe

  • Design, build, and maintain scalable data lake solutions and processing pipelines handling large volumes of data.
  • Develop distributed data processing applications using Apache Spark on Databricks and build real-time streaming pipelines with Apache Kafka.
  • Apply software engineering best practices to data pipelines including CI/CD, automated testing, and peer code review.

InPost is an e-commerce parcel delivery company that operates a network of Automated Parcel Machines (APMs) and pick-up points across nine European countries. Founded in 1999, the company employs thousands and fosters a diverse, international, and cross-functional culture with opportunities for growth and training.

Latin America

  • Develop and maintain data models for core package application and reporting databases.
  • Monitor execution and performance of daily pipelines and escalate issues.
  • Collaborate with analytics and business teams to improve data models and pipelines.

Bluelight Consulting is a leading software consultancy dedicated to designing and developing innovative technology that enhances users' lives. With a presence across the United States and Central/South America, Bluelight is in an exciting phase of expansion, continually seeking exceptional talent to join its dynamic and diverse community.

$160,000–$190,000/yr
US Canada Unlimited PTO

  • Own and maintain data pipeline architectures, ensuring reliability and monitoring.
  • Manage and evolve data modeling environments for analysts and engineers.
  • Implement observability for data systems, detecting issues early and continuously monitoring data quality.

Voltus unlocks the full value of distributed energy resources for customers and the grid. They are a fast-growing climate-tech company with a bright, gritty, and good team that values innovation, impact, and integrity.

US

  • Lead workspace architecture, Unity Catalog governance, and cluster policy design for client tenant organizations.
  • Perform tenant discovery, requirements gathering, source profiling, and security classification for new data intake requests.
  • Develop end-to-end technical designs for tenant onboarding, including Data Sharing Agreements and SLA documentation.

M9 Solutions provides IT services and solutions to the Federal Government, mobilizing skilled people and technologies for improved performance and sustainable change. With 15+ years of proven delivery and growth, the company has been recognized as an Inc. 5000 Fastest-Growing Private Company multiple times and values diverse perspectives.

Global

  • Query and process large datasets using Trino (SQL).
  • Work with data in an AWS environment using PySpark.
  • Build audience segments based on website activity, call data, behavioral patterns, and segment rules.

Kyivstar is one of the largest and most beloved telecom companies in Ukraine. They offer opportunities to work with large-scale real-world data in a friendly and collaborative team environment, with possibilities for professional development and career growth.

$145,000–$200,000/yr
US Unlimited PTO

  • Design and build ETL processes in collaboration with software and model development teams.
  • Create and maintain scalable data infrastructure.
  • Own full pipeline and infrastructure lifecycle including performance monitoring and optimization.

OpenTeams builds AI that empowers, with models that are energy-efficient, cost-effective, and fully yours. A proponent of open source, the company reinvests 3% of profits back into the open-source community and values freedom, teamwork, accountability, and uncompromising quality.

  • Design, build, and maintain scalable data pipelines using AWS Glue (PySpark) or equivalent orchestration and transformation tools.
  • Engineer and optimise the ClickHouse warehouse for sub-second query performance across all back-offices.
  • Implement data contracts between the back-offices and the platform.

Block Labs is a premier technology studio operating at the bleeding edge of Web3, Artificial Intelligence, and iGaming. We are a collective of senior engineers, product strategists, and builders who refuse to compromise on architecture.

US

  • Build scalable Python-based data pipelines and backend services for analytics workflows.
  • Design software systems using object-oriented programming and sound engineering practices.
  • Create and support platforms for analytics development, model training, and model deployment.

Experian is a global data and technology company that powers opportunities for people and businesses worldwide across markets like financial services, healthcare, and automotive. With a team of 25,200 people in 32 countries, Experian invests in advanced technologies and its people to unlock the power of data.

India

  • Design scalable data pipelines and backend systems from the ground up.
  • Leverage AWS and GCP for real-time and batch processing.
  • Manage databases and data warehouses, optimizing ETL workflows.

Delivery Solutions, a UPS company, is a growing company looking for a Senior Data Engineer to join its team.

Latin America

  • Build and optimize scalable data pipelines using Python and dbt.
  • Design and maintain Snowflake warehouse structures, database tables, and performant data models.
  • Develop reliable ETL/ELT workflows for extracting, transforming, loading, and validating data from multiple sources.

We are seeking a Senior Data Engineer to support core marketplace analytics data products and platform work. Enterprise experience is strongly preferred.

$190,000–$280,500/yr
US Canada

  • Architect and evolve scalable data ingestion and egress frameworks and pipelines that are well tested and offer strong data quality monitoring.
  • Architect and evolve our CI/CD processes, enhancing the testing environment and observability.
  • Enhance our Claude Code/LLM development support capabilities by creating tools, skills, and agents that give our LLMs more context and help us continually improve their ability to debug, write code, and maintain systems.

Life360’s mission is to keep people close to the ones they love. Its products include a mobile app, tracking devices, and a pet GPS tracker. With more than 500 (and growing!) remote-first employees, Life360 delivers peace of mind and enhances everyday family life.

Global

  • Design, build, and maintain scalable data pipelines in Microsoft Fabric using pipelines, Dataflows Gen2, and notebooks.
  • Integrate and consolidate data from multiple enterprise sources (ERP, CRM, APIs) into a centralized Lakehouse platform.
  • Develop and manage Bronze, Silver, and Gold layers, ensuring data is structured, clean, and business-ready.

Anord Mardix, a Flex company, is a global leader in critical power solutions supporting industries from financial institutions to data centers. The Flex family has ~160,000 members in 30 countries with a values-driven, high-performance culture focused on doing the right thing, collaboration, and resilience.

Canada

  • Design and implement data-driven solutions on GCP including BigQuery, Cloud Storage, Dataflow, Pub/Sub, and Looker/BI.
  • Build and optimize ETL pipelines using SQL and Python to extract, clean, and transform structured and unstructured data from ERP, procurement, logistics, and facility management systems.
  • Ensure data governance, lineage, and compliance across supply chain datasets while continuously optimizing query performance and pipeline reliability.

Innodata is a global data engineering company that enables the responsible advancement of artificial intelligence by providing data, evaluation frameworks, and human expertise. With a legacy of over 36 years, Innodata delivers high-quality data and outstanding outcomes for generative AI builders and adopters.