Source Job

Global

  • Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline.
  • Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform.
  • Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at bigger scale and lower cost to power our next-generation models.

Python Linux Docker Terraform GCP

20 jobs similar to Software Engineer, Data Infrastructure & Acquisition

Jobs ranked by similarity.

Latin America

  • Create and maintain optimal data pipeline architecture.
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for high-intensive applications and greater scalability, etc.

We are a leading software provider of Item Chain Management solutions to consumer brand, retail and industrial enterprises around the globe. We also provide development services and support to third-party customers across the globe.

ANZ

  • Shaping the Python language ecosystem with a strong product and platform mindset.
  • Architecting, building and delivering high-impact solutions that uplift the Python developer experience.
  • Developing internal observability tooling and metrics that give the team actionable insights.

Canva is a design platform that enables users to create a variety of visual content. They have campuses in Sydney and Melbourne and co-working spaces in Brisbane, Perth and Adelaide, and they support work-life balance by giving their teams a choice in where and how they work.

Global

  • Architect our AWS-based data warehouse and ingestion pipelines.
  • Transform high-volume simulation outputs into clean, trusted datasets.
  • Establish schema standards and data contracts with engineering.

Onebrief provides collaboration and AI-powered workflow software designed for military staffs, making them faster, smarter, and more efficient. The company, founded in 2019, values ownership and excellence, with a team spanning veterans and technologists; it has raised $320M+ from investors and is valued at $2.15B.

EMEA

  • Design and implement tooling that enables researchers to quickly deploy and evaluate new models in production
  • Design, build, and maintain high-performance, cost-efficient inference pipelines, making architectural decisions about scaling, reliability, and cost trade-offs
  • Proactively identify and resolve infrastructure bottlenecks, proposing and scoping improvements to iteration speed and production reliability

AssemblyAI builds best-in-class Speech AI models that power the next generation of voice applications. They are a remote team building one of the next great AI companies where teammates define and build their company culture.

US Unlimited PTO 12w maternity 12w paternity

  • Design, implement, and maintain cloud-based infrastructure using AWS, Azure, or GCP.
  • Build, optimize, and manage continuous integration and continuous deployment (CI/CD) pipelines.
  • Integrate AI-powered tooling into engineering workflows to accelerate delivery and improve code quality.

Givebutter is a nonprofit fundraising and CRM platform. They empower millions to raise more, pay less, and give better by offering tools like fundraisers, donation forms, donor management, emails, and text blasts all in one place.

$230,000–$265,000/yr
US Unlimited PTO

  • Design and build robust, highly scalable data pipelines and lakehouse infrastructure with PySpark, Databricks, and Airflow on AWS.
  • Improve the data platform development experience for Engineering, Data Science, and Product by creating intuitive abstractions, self‑service tooling, and clear documentation.
  • Own and maintain core data pipelines and models that power internal dashboards, ML models, and customer-facing products.

Parafin aims to grow small businesses by providing them with the financial tools they need through the platforms they already sell on. They are a Series C company backed by prominent venture capitalists, with a tight-knit team of innovators from companies like Stripe, Square, and Coinbase.

US Canada Unlimited PTO

  • Collaborate closely with business stakeholders and other engineers to deliver impactful solutions.
  • Integrate services and product features with databases and messaging queues.
  • Contribute to the development of our MLOps tools for ML models.

Trellis is rewriting the insurance experience from the inside out. They are a profitable, fast-growing Series A startup backed by General Catalyst, QED, NYCA, and Amex Ventures that brings clarity and ease to insurance shopping.

$120,000–$150,000/yr
US

  • Architect and maintain central storage and cloud environment.
  • Design and automate scalable ELT/ETL pipelines for data.
  • Support scientists and operational teams by designing data models.

Funga is a public benefit corporation using forest fungal networks to address climate change. They combine DNA sequencing and machine learning with forest microbiome research to improve wood creation, carbon sequestration, and forest resilience. They are a team of scientists and builders aiming to remove three gigatons of carbon dioxide from the atmosphere by 2050.

US Canada

  • Own and evolve our data infrastructure, including pipelines into our data warehouse
  • Manage and improve cloud infrastructure and DevOps workflows
  • Ensure platform reliability so product and design teams aren’t pulled into backend or operational firefighting

Meridio is a remote-first company on a mission to make health benefits for small businesses simple, affordable, and accessible. As they scale smart, they’re focused on building systems that reduce complexity instead of adding it.

Canada 3w PTO

  • Own model serving: Design, build, and maintain low-latency, highly available serving stacks for in-house ML models, and integrate with LLM serving partners.
  • Automate training pipelines: Orchestrate data prep, training, evaluation, and registry workflows on Kubernetes with solid MLOps practices.
  • Optimize at scale: Profile and tune throughput, memory, and cost; introduce caching, sharding, batching, and GPU/CPU autoscaling where it pays off.

Cresta aims to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines AI and human intelligence to help contact centers discover customer insights and automate conversations.

US

  • Design, build, and maintain AWS-based data pipelines that process vehicle telemetry and validation data
  • Develop Python services and workflows supporting safety and performance metric computation
  • Interpret truck-generated data and translate it into measurable system signals

Torc is developing Level 4 autonomous semi-truck software to transform how freight moves across the world. As a part of the Daimler family with over a decade of experience, they are focused on developing software for automated trucks. Torc's culture is collaborative, energetic, and team-focused.

India

  • Run deep, structured discovery with customer stakeholders, translating pain points into clear outcomes.
  • Own the technical solution end-to-end, making pragmatic tradeoffs that balance speed and security.
  • Build technical assets that accelerate success: repeatable rollout playbooks and automation scripts.

Relyance AI helps businesses navigate the complexities of privacy, data security, and AI governance. They are building in a fast-evolving space and partnering closely with Product and Engineering to shape product direction based on what they see in the field.

Europe

  • Research & Train: Design, train, and evaluate our proprietary deep learning models.
  • High-Performance ML Systems: Optimize our models for maximum inference speed and efficiency, ensuring they can handle massive datasets and real-time workloads at scale.
  • Software Engineering: Write clean, production-ready code.

Deepslate is building speech-to-speech voice AI models that sound and act indistinguishably from a human, and believes everyone should be able to use them. Backed by top-tier investors from the tech and AI sectors, as well as a major German VC fund, they are incredibly well-funded and moving fast.

$101,000–$237,000/yr
Canada

  • Design, build, and maintain secure, compliant ML infrastructure and automation adapted for high-sensitivity environments.
  • Develop and productionize machine learning and data pipelines serving real-time models that fight fraudulent traffic, spam, and bots.
  • Extract valuable signals from massive datasets, using your expertise to turn raw data into actionable insights.

Yelp is driven by their values: they're a cooperative team that values individual authenticity and encourages creative solutions to problems. They are all about helping their users, growing as engineers, and having fun in a collaborative environment, and they are an equal opportunity employer.

$160,800–$193,000/yr
US Unlimited PTO

  • Create robust pipelines to process massive daily volumes of data.
  • Build and support scalable pipelines as part of Torc’s Data Factory.
  • Scale Torc’s data lake through a distributed storage system.

Torc has been a leader in autonomous driving since 2007 and is now part of the Daimler family, focused on developing software for automated trucks. Their culture is collaborative, energetic, and team-focused, offering flexibility and valuing work/life balance.

US

  • Design and evolve the backend architecture that powers our AI-driven acquisition systems.
  • Own cloud architecture in GCP/AWS/DigitalOcean (multi-environment, production-grade systems).
  • Collaborate with Data and Product teams to productionize intelligent workflows.

Home Solutions is building an AI-powered customer acquisition platform that combines voice agents, intelligent routing, scheduling systems, and partner integrations into a unified operating system for growth. The company targets the rapidly digitizing home services vertical and matches homeowners with the right service provider.

$140,000–$170,000/yr
US

  • Design, build, and maintain reliable software within a defined problem space.
  • Focus on strong execution, sound technical decision making, and delivering high quality software.
  • Support modernization and quality improvements through hands-on development and technical leadership.

Reveleer likely offers solutions in the healthcare technology sector. They value strong technical skills, code quality, and delivering high-quality software efficiently.

$107,000–$145,000/yr
Canada

  • Support the full operational lifecycle of both traditional machine learning systems and emerging generative AI driven applications.
  • Enable scalable training, evaluation, deployment, and monitoring for a wide range of ML and GenAI workloads.
  • Manage model upgrades, framework versions, regression testing, and maintenance tasks, and maintain performance across systems and solutions.

Achievers' employee recognition and rewards platform empowers organizations to build cultures where people feel seen and valued, every day. They're a team of passionate, thoughtful builders with more than 4.3 million users across 190 countries, who care deeply about their product, their customers, and each other.

US

  • Design and implement production-grade RAG pipelines and agentic workflows using Python.
  • Evaluate new models and prototype approaches for SBIR/government deliverables.
  • Document architectures and contribute to technical reports for contract deliverables.

Unstructured is focused on transforming unstructured data into a format usable by LLMs. Their Public Sector team works on high-impact contracts and seeks to bridge the gap between custom builds and a scalable product roadmap.

US

  • Focus on simplifying the infrastructure behind large language model (LLM) integrations, runtime orchestration, and data workflows.
  • Work at the intersection of LLM tooling, serverless infrastructure, and financial data systems.
  • Make spawning new research pipelines seamless and scalable.

The client is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems. They work with the world’s leading AI labs to advance frontier model capabilities and leverage that work to build real-world AI systems that solve mission-critical priorities for companies.