Shape and scale critical infrastructure for one of the largest online platforms in the world. Build, maintain, and optimize multi-cloud compute systems for high-performance, reliable, and secure operations. Influence the technical direction of infrastructure platforms while mentoring and guiding other engineers.
Source Job
20 jobs similar to Principal Software Engineer, Infra
Jobs ranked by similarity.
- Work collaboratively on a team to build out Reddit’s multi-cloud compute infrastructure.
- Contribute to the design, implementation, and operations for one of the largest sites in the world.
- Write software to improve the compute infrastructure and analyze problems as Reddit scales.
Reddit is a community of communities built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.
- Own challenging infrastructure problems end-to-end by understanding how engineers use the platform.
- Design scalable, maintainable services and contribute to technical proposals.
- Contribute to the roadmap, highlighting opportunities, validating approaches and helping keep our platform solutions current with cloud best practices.
Canva's intuitive suite of design products is powered by our large distributed infrastructure group, which sets large and ambitious goals.
Lead and manage the Platform Engineering team, providing technical guidance and mentorship. Design, build, and evangelize Golden Paths and Service Scaffolding to reduce friction across the development lifecycle. Oversee the design, implementation, and maintenance of Shared DB Platforms, ensuring optimal performance, integrity, and security across the organization.
Founded in 2012, EasyPost is a YC unicorn whose mission is to make shipping simple for businesses from garage startups to the Fortune 500.
- Design and evolve infrastructure systems to ensure scalability, reliability, and cost efficiency.
- Lead and mentor a distributed infrastructure team, fostering a collaborative and inclusive culture.
- Oversee all cloud environments supporting MZLA’s products and business systems.
MZLA Technologies Corporation (MZLA) is a wholly owned, for-profit subsidiary of the Mozilla Foundation and home to Thunderbird. They are a small but growing team of 50+ people distributed across seven countries building an open-source email and productivity platform.
Design, build, operate, and maintain critical backend systems for alerting, ensuring reliability, scalability, and performance. Drive projects from ideation through to production and operations, actively contributing to roadmap planning. Collaborate with cross-functional teams to deliver features that meet user needs and business objectives.
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.
Design, implement, and evolve large-scale, cloud-native infrastructure supporting MariaDB's global SaaS platform. Lead reliability and scalability initiatives, driving automation and resilience through infrastructure-as-code and GitOps practices. Proactively identify and remediate systemic reliability issues, ensuring high service availability and performance across multi-cloud environments.
MariaDB is making a big impact on the world and is the backbone of applications used every day, including by 75% of the Fortune 500 companies.
Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.
NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.
- Define and drive the roadmap for the cloud infrastructure platform.
- Architect the Infrastructure as Code (IaC) platform.
- Set technical standards and best practices for infrastructure engineering.
Engine is transforming business travel into something personalized, rewarding, and simple, and is backed by Telescope Partners, Blackstone, and Permira.
Design, develop, and maintain resilient backend services handling critical user-facing functionality. Build and maintain reusable libraries, frameworks, and tooling. Partner with product and platform teams to design APIs and distributed system patterns that are reliable, scalable, and maintainable.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
- Define the technical architecture for Docker's unified enterprise governance platform.
- Own end-to-end delivery of major platform components.
- Mentor engineers across the organization, helping them grow their technical skills and judgment.
Docker makes app development easier so developers can focus on what matters. Their remote-first team spans the globe and is united by a passion for innovation and great developer experiences. Docker is the #1 tool for building, sharing, and running apps and is trusted by startups and Fortune 100s alike.
- Influence and align cross-functional teams on platform evolution.
- Architect and evolve hypervisor integrations across thousands of hosts.
- Drive advanced performance tuning across CPU, memory, I/O, networking, and storage layers.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
- Lead infrastructure resiliency efforts, including recovery mechanisms, tenant isolation, and load spike handling.
- Improve observability and operability of systems.
- Build performance-critical, user-facing infrastructure like real-time event processing.
Jobgether is a platform that uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against a role's core requirements. They identify top-fitting candidates and share this shortlist directly with the hiring company.
- Design and implement cloud-native infrastructure that powers core product capabilities at scale.
- Build proprietary solutions (sync engines, observability pipelines, DNS management systems) that differentiate Files.com.
- Engineer infrastructure for speed, resilience, and maintainability across high-volume, distributed workloads.
Files.com powers secure file transfer and automation for over 4,000 brands. They are a profitable, founder-led SaaS company with a flat, high-trust engineering organization, where engineers are empowered to take ownership of projects.
- Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
- Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
- Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. While the job posting doesn't specify size, they appear to have a culture that values transparency, collaboration, grit, and humility.
Help build and operate core cloud-native systems including VKE, VLB, VCR, Vultr Inference, NAT Gateways, and our internal APIs. The ideal candidate has a strong understanding of Kubernetes components, container runtime internals, and modern IaC/automation practices. This role will have a direct impact on Vultr’s global cloud infrastructure footprint.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
- Lead maintenance and operations for production and development environments.
- Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
- Lead automation initiatives for infrastructure provisioning and operational tasks.
NMI gives partners choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.
As a Platform Engineer, enhance and maintain foundational tools and systems, working hands-on with Kubernetes clusters and AWS infrastructure. Build and maintain services that abstract and orchestrate our infrastructure, designing and implementing backend services like APIs and controllers. Develop software for complex projects, and manage infrastructure migrations and security tooling.
Monzo is on a mission to make money work for everyone, waving goodbye to the complicated ways of traditional banking, offering personal and business bank accounts.
Design, build, and maintain scalable, reliable services that power high-volume software solutions. Take end-to-end ownership of features across the software development lifecycle, including infrastructure, observability, and production operations. Write clean, production-grade code, focusing on maintainability, test coverage, and system resilience.
Rithum is the world’s most trusted commerce network, accelerating how brands, suppliers, and retailers work together to deliver seamless e-commerce experiences.
- Design, build, and scale systems, APIs, and tools for efficient software deployment and management.
- Contribute to creating secure, reliable, and scalable software that enhances developer workflows and automates infrastructure capabilities.
- Improve the overall efficiency and effectiveness of the development process.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
As an SRE, you will be responsible for ensuring the availability, performance, and cost-effectiveness of these services. You will work with multiple feature development teams and the BAU/Support team to define and evolve our cloud and on-prem infrastructure and delivery pipelines, improving system observability and proactively identifying and mitigating reliability risks.
TwinStream was formed in 2019, when our founders were working as engineers solving complex cross-domain problems within government organisations.