We are looking for a Cloud Observability and Performance Engineer to join our Chaos Cloud Engineering team. In this role, you will design and implement observability, monitoring, and performance strategies for cloud-hosted microservices that manage and orchestrate endpoint security agents at scale. This position is critical to ensuring the reliability, visibility, and performance optimization of our backend systems that power cloud-based security operations for millions of endpoints worldwide.
Job listings
As a Site Reliability Engineer at Masabi, you will ensure the platform's reliability, performance, and security. You'll be pivotal to scaling and modernising our platform, work across legacy and modern infrastructure, drive key improvements, and collaborate closely with architecture and product teams to enable reliable delivery across the business.
Join the CoreWeave Kubernetes Service (CKS) team, the core of CoreWeave's infrastructure stack, as a Senior Software Engineer. You'll help scale and evolve one of the largest Kubernetes environments in the industry. As a Senior Engineer, you'll deliver impactful features, improve system performance and reliability, and grow into a key technical contributor.
In this fully remote position, you'll be developing complex products and working with an infrastructure processing petabytes of data. Expect challenges that will elevate your expertise, loads of ownership, the latest tech stack, and effective collaboration with a large team of engineering professionals. You'll maintain the current product infrastructure and participate in building private and public cloud-based platforms.
Planning is managed using a backlog tool, and change requests relate to the listed tools. The main task is to set up components and integrate interfaces in an automated way. The role involves executing tasks, participating in daily team calls, and maintaining, operating, and debugging enterprise tools to troubleshoot and resolve incidents, handle support tickets, and assist users. You will also integrate data sources into dashboards and log archives, develop predictive analytics models, and analyze time-series data for forecasting.
Optimize critical engineering applications to ensure reliability, scalability and security. Establish tooling and automation processes for infrastructure as code, service recovery and service monitoring. Provide guidance and standards on application capabilities and usage. Develop and maintain infrastructure and security guidelines, procedures, and documentation to streamline operations and promote knowledge sharing.