Partner with engineers to build developer tools that improve developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Build metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Implement SLI/SLO frameworks with error budgets to drive reliability decisions
Design release strategies including blue/green deployments and version tracking
Lead incident response and develop automated runbooks to reduce MTTR
Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.
Standardize CI/CD pipelines (GitHub Actions) and Helm charts across 10+ microservices
Build centralized logging, metrics, and alerting (currently a gap)
Extend Terraform to cover full AWS infrastructure
Kiefer Tech delivers cutting-edge AI, robotics, and enterprise solutions across Greece and the EU, leveraging over 20 years of engineering heritage from the Green Energy sector. As the technology arm of Kiefer, they are guided by innovation, quality, and long-term client partnerships and are building sovereign AI infrastructure.
Build and maintain CI/CD pipelines and infrastructure-as-code.
Lead observability and monitoring initiatives.
Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.
Own the automated provisioning of cloud service provider (CSP) resources, including networking, Kubernetes clusters, and the specific CSP resources required by our application teams.
Work with users (Grafana Cloud application teams) to understand their needs and ensure investment in the right capabilities.
Participate in the Platform department Infrastructure wing on-call rotation.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything that they do.
Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.
Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.
Automate the provisioning of all of Juniper Square’s infrastructure as code.
Partner with our Platform Engineering team to build developer tooling and improve the developer experience through joint initiatives and enhancements.
Partner with our Data Engineering team on improving our data posture and driving operational excellence.
Juniper Square's mission is to unlock the full potential of private markets by digitizing them to bring efficiency, transparency, and access. They are a values-driven organization with a hybrid workplace strategy, allowing employees to collaborate effectively across multiple countries and offering physical offices in several major cities.
Operate and maintain large-scale data systems, ensuring stability and performance.
Design, implement, and optimize deployment processes using virtualization.
Monitor system health, analyze failures, and identify instability sources.
Jobgether is a platform that uses AI-powered matching to connect candidates with companies. They ensure applications are reviewed quickly, objectively, and fairly, then share a shortlist of top candidates directly with the hiring company.
Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one
Peec AI is one of Europe’s fastest-growing Series A startups. They provide exciting and challenging work in the AI space.
Design, deploy, and maintain cloud infrastructure to support a Dataiku SaaS offering, mainly on AWS, Azure, and GCP
Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
Automate as many technical operations as possible
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.
Build and maintain Infrastructure as Code to power our production systems, Python tools to automate toil, and monitoring systems to detect problems early.
Independently execute on large DevOps projects such as major migrations, product rollouts, and infrastructure enhancements
Participate in the infrastructure on-call rotation and incident response process, including triaging alerts, coordinating responders, and contributing to blame-free RCAs. Leverage senior-level expertise to drive rapid resolutions.
Super.com aims to maximize the lives of both customers and employees, providing opportunities to unlock potential through learning and impact. They are a fast-paced, high-growth tech company that values career progression and supports employees through various programs.
Lead reliability-focused design and readiness reviews.
Build, operate, and continuously improve our observability stack.
Own and evolve incident management practices.
Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, backed by top-tier investors and are proud to serve some of the world's most iconic brands.
Develop automation to eliminate manual and repetitive operational tasks.
Investigate and resolve customer complaints escalated beyond L1 and L2 support.
Moniepoint is an all-in-one financial services platform for emerging markets. Since 2019, Moniepoint’s technology has powered over 3 million people, offering personal and business banking, payment, credit and business management tools to help them succeed.
Lead cross-team infrastructure security initiatives from design through delivery, owning technical outcomes and stakeholder communication
Design and implement security solutions for cloud infrastructure, container platforms, and orchestration systems
Partner with SRE, Infrastructure, and Engineering teams to integrate security into platform services and deployment pipelines
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Their mission is to enable everyone to contribute to and co-create the software that powers our world.
Design, architect, implement, review, and test frameworks, libraries, tools, and services primarily using Go.
Participate in requirement, design, planning, and retrospective meetings as an integral part of an Agile software development team.
Be an active maintainer of Mirantis projects by managing contributions and patches to open-source projects, reviewing submissions, and participating in design decisions.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. They combine open source innovation with expertise in Kubernetes orchestration, empowering platform engineering teams to deliver composable developer platforms across any environment.
Develop automation code to provision and operate infrastructure at scale.
Build resilient, scalable, secure, and observable services with cost optimization.
Proactively identify and address security concerns across systems and infrastructure.
Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.
Manage cloud infrastructure and optimize costs, particularly in AWS environments using Terraform and Python.
Design, develop, and maintain CI/CD pipelines and infrastructure for AI model training and deployment.
Ensure platform scalability and efficient resource utilization.
NEORIS, now part of EPAM Systems, is a Digital Accelerator that helps companies step into the future. With more than 20 years of experience as Digital Partners to some of the world’s leading organizations, they are over 4,000 professionals across 11 countries and foster a multicultural, startup-minded culture that promotes innovation, continuous learning, and the delivery of high-impact solutions for their clients.