Design, build, and manage our cloud infrastructure using modern tools (Pulumi) to ensure all infrastructure changes are reproducible, secure, and easily auditable.
Orchestrate and optimize our Kubernetes clusters for complex, compute-heavy AI workloads, guaranteeing maximum efficiency and fault tolerance.
Implement a flawless monitoring setup using Datadog and OpenTelemetry to make the black box of our distributed systems transparent, hunting down latency spikes or bottlenecks before they impact users.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Developing infrastructure to support cloud-based applications.
Creating deployment architecture and continuous delivery pipelines.
Designing high-availability approaches and implementing monitoring architecture.
Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.
Building world-class AI infrastructure to support a 100+ person research team.
Designing and scaling multi-cloud systems that support high-performance model training and inference.
Improving monitoring, alerting, and system observability for AI workloads.
Canva is redefining how the world experiences design. It has campuses in Sydney and Melbourne and co-working spaces in other major cities, and it trusts employees to choose the balance that empowers them and their team to achieve their goals.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Design, deploy, and maintain cloud infrastructure to support a Dataiku SaaS offering, mainly on AWS, Azure, and GCP
Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
Automate as many technical operations as possible
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.
Standardize CI/CD pipelines (GitHub Actions) and Helm charts across 10+ microservices
Build centralized logging, metrics, and alerting (currently a gap)
Extend Terraform to cover full AWS infrastructure
Kiefer Tech delivers cutting-edge AI, robotics, and enterprise solutions across Greece and the EU, leveraging over 20 years of engineering heritage from the Green Energy sector. As the technology arm of Kiefer, they are guided by innovation, quality, and long-term client partnerships and are building sovereign AI infrastructure.
Maintain and continuously improve production uptime, supporting our ≥99.9% target for 2026.
Monitor systems proactively and respond effectively to production incidents.
Drive improvements in MTTR (Mean Time to Resolution).
Infiterra's B2B SaaS platform simplifies subscription service delivery, helping IT Distributors and Managed Service Providers (MSPs) automate and grow their subscription business. With 100+ customers in 75 countries, Infiterra is known for its collaborative and growth-oriented culture.
Contribute to building and operating the infrastructure that supports the HackerOne platform.
Improve the reliability, security, and scalability of our systems.
Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.
HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems, and it is trusted by the world’s top organizations.
Implement SLI/SLO frameworks with error budgets to drive reliability decisions
Design release strategies including blue/green deployments and version tracking
Lead incident response and develop automated runbooks to reduce MTTR
Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against each role's core requirements.
Evolve progressive delivery with Argo Rollouts and GitOps to improve automated health checks and rollback triggers.
Optimize CI/CD infrastructure at scale to streamline CI workflows and reduce build times.
Build deployment gates and guardrails that prevent production incidents by designing and implementing automated quality checks.
Wealthsimple is on a mission to help everyone achieve financial freedom by reimagining what it means to manage your money. They are the largest fintech company in Canada, with 3+ million users who trust them with more than $100 billion in assets.
Own the automated provisioning of CSP resources, including networking, Kubernetes clusters, and the specific CSP resources our application teams require.
Work with users (Grafana Cloud application teams) to understand their needs and ensure we invest in the right capabilities.
Participate in the on-call rotation for the Infrastructure wing of the Platform department.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything that they do.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Build and lead the team responsible for the reliability, security, and scalability of Gensyn’s production infrastructure and developer platform.
Own the availability, scalability, and security posture of production systems: SLOs/SLIs, incident response, postmortems, reliability improvements, and hardening.
Drive delivery across ambiguous, high-stakes initiatives: roadmap planning, prioritization, and execution against tight timelines.
Gensyn is building a protocol that networks together the core resources required for machine intelligence to flourish alongside human intelligence. They value autonomy, independence, direct feedback and an extreme learning rate, and strive to reject mediocrity and waste.
Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
Lead the design and implementation of infrastructure bootstrap orchestration, including automated cluster and environment provisioning.
Apply and promote SRE practices across the platform, including clear ownership and runbooks for platform components.
Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.
Partner with engineering leadership, EMs, and Product Managers to define and deliver AI products.
Architect scalable, high-performance systems that support a growing number of AI-powered products.
Drive technical strategy and make architectural decisions that compound, enabling the team to ship more AI experiences faster.
Webflow is building the world’s leading AI-native Digital Experience Platform as a remote-first company built on trust, transparency, and a whole lot of creativity. They empower teams to design, launch, and optimize for the web without barriers, from entrepreneurs launching their first idea to global enterprises scaling their digital presence.
Design, implement, and maintain cloud-based infrastructure using AWS, Azure, or GCP.
Build, optimize, and manage continuous integration and continuous deployment (CI/CD) pipelines.
Integrate AI-powered tooling into engineering workflows to accelerate delivery and improve code quality.
Givebutter is a nonprofit fundraising and CRM platform. They empower millions to raise more, pay less, and give better by offering tools like fundraisers, donation forms, donor management, emails, and text blasts all in one place.
Define and execute a technical vision for Onebrief’s infrastructure.
Design and evolve a deployment strategy focused on AWS and on-prem.
Build security and compliance directly into the infrastructure lifecycle.
Onebrief provides collaboration and AI-powered workflow software designed specifically for military staffs, making them faster, smarter, and more efficient. They have raised $320m+ from top-tier investors and are valued at $2.15B, with a team spanning veterans and technologists.
Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.
CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.
Ensure we continuously deliver and improve our infrastructure platform.
Guide the team through a massive reduction of manual work by adopting an AI-driven engineering approach. You will build toward zero-touch, scalable infrastructure in which AI handles dependency management and assists with incident resolution, cloud cost optimization, and developer support, so the team can handle increasing scope and scale without growing in size.
Quality surge: Prepare, define, and ensure execution of the strategy that lets our CI/CD and artifact distribution systems scale without increasing engineering toil.
Camunda is the leader in enterprise agentic automation, orchestrating complex business processes across agents, people, and systems. As a fully remote, global company, they're rewriting the rules of modern business, growing fast, and looking for top talent to join their team.