We're looking for a Senior Software Engineer with an interest in infrastructure to join our Platform & Resiliency team to configure, maintain and improve our systems running on Kubernetes in GCP, configured via Terraform. You'll enhance our monitoring and observability systems, which use Prometheus, Grafana and OTel, and build and improve our product hosting platform.
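To make that observability stack concrete, here is a minimal, hypothetical Python sketch of a service exposing a latency histogram with the prometheus_client library, which Prometheus could scrape and Grafana could chart. The metric name, label, and port are illustrative assumptions, not details from this listing.

```python
# Hedged sketch: expose a custom Prometheus metric from a Python service.
# Metric and label names below are assumptions for illustration only.
import random
import time

from prometheus_client import Histogram, start_http_server

# Hypothetical latency histogram with one label.
REQUEST_LATENCY = Histogram(
    "request_latency_seconds",
    "Latency of handled requests in seconds",
    ["endpoint"],
)

def handle_request(endpoint: str) -> None:
    """Simulate a request and record its latency in the histogram."""
    with REQUEST_LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.01, 0.1))

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        handle_request("/healthz")
```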
You will collaborate closely with AI scientists and software developers to design, build, and maintain the compute infrastructure (Kubernetes on AWS, AWS ECS, Lambda and EC2 instances) that powers everything Artera does at scale, setting the Compute Infrastructure Engineering vision and driving both independent and collaborative software development projects end to end.
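As a rough illustration of the AWS-side automation such a role might touch (not taken from the listing), here is a hedged boto3 sketch that inventories running EC2 instances and invokes a Lambda function; the function name "nightly-batch" is a hypothetical placeholder.

```python
# Hedged sketch: basic AWS compute inventory and a Lambda invocation via boto3.
import json

import boto3

ec2 = boto3.client("ec2")
lam = boto3.client("lambda")

# List running EC2 instances (capacity / cost checks).
reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
for reservation in reservations:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["InstanceType"])

# Kick off a hypothetical Lambda-backed job (name is a placeholder).
response = lam.invoke(
    FunctionName="nightly-batch",
    Payload=json.dumps({"dry_run": True}).encode(),
)
print(response["StatusCode"])
```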
Hybrid work policy: employees within 50 km of the office (Katowice) come in 4 times a month; employees 50 to 100 km from the office, 2 times a month; employees more than 100 km away work fully remotely. Developing and maintaining infrastructure using an infrastructure-as-code approach, and using CI/CD tools and scripts to create automated application deployments for microservices. Helping developers with day-to-day activities related to tool maintenance. Working in a collaborative and iterative software development process with agile teams.
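One hypothetical sketch, assuming a CI/CD step written in Python, of the "automated application deployments for microservices" part: patching a Kubernetes Deployment to a newly built image tag so the cluster performs a rolling update. The deployment, namespace, and registry names are placeholders, and many pipelines do the same with kubectl or Helm instead.

```python
# Hedged sketch: roll a microservice Deployment to a new image from a CI/CD job.
from kubernetes import client, config

def set_image(deployment: str, namespace: str, container: str, image: str) -> None:
    """Patch the Deployment spec so Kubernetes performs a rolling update."""
    config.load_kube_config()  # use config.load_incluster_config() when run in-cluster
    apps = client.AppsV1Api()
    patch = {
        "spec": {
            "template": {
                "spec": {"containers": [{"name": container, "image": image}]}
            }
        }
    }
    apps.patch_namespaced_deployment(name=deployment, namespace=namespace, body=patch)

if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    set_image("orders-api", "prod", "orders-api", "registry.example.com/orders-api:1.4.2")
```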
Roadie is seeking a Senior Site Reliability Engineer to join our growing Technical Operations Team. We are looking for a candidate who has experience implementing site reliability principles, as well as production-level Kubernetes experience. The ideal candidate is a skilled problem solver with intimate knowledge of site reliability practices, standard DevOps principles, AWS, scripting languages and Kubernetes.
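One of the site reliability principles mentioned, error budgets, reduces to simple arithmetic; the sketch below is illustrative, with an assumed 99.9% SLO and made-up request counts.

```python
# Hedged sketch: how much of an error budget remains, given an SLO target.
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (can go negative)."""
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1.0 - (failed_requests / allowed_failures)

# Example with assumed numbers: a 99.9% SLO over 2,000,000 requests allows
# 2,000 failures; 500 failures spend 25% of the budget, leaving 75%.
print(error_budget_remaining(0.999, 2_000_000, 500))  # -> 0.75
```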
You'll focus on designing, deploying, and maintaining infrastructure and automation within Azure Cloud environments. This role emphasizes best-practice leadership, technical expertise in Azure, and collaboration across various teams, providing opportunities for independent and team-driven work. Day-to-day challenges include deploying Azure Cloud infrastructure, managing Azure DevOps, serving as a technical expert, and optimizing IaC pipelines.
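As a hedged illustration of programmatic Azure work adjacent to those IaC pipelines (the listing itself does not prescribe this approach), the sketch below creates a resource group with the Azure SDK for Python. The subscription ID and names are placeholders; real changes here would more likely flow through Bicep or Terraform in an Azure DevOps pipeline.

```python
# Hedged sketch: create (or update) an Azure resource group via the Python SDK.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

credential = DefaultAzureCredential()
resources = ResourceManagementClient(credential, SUBSCRIPTION_ID)

# Idempotent: returns the existing group if it already exists.
group = resources.resource_groups.create_or_update(
    "rg-platform-dev", {"location": "westeurope"}  # placeholder name and region
)
print(group.name, group.location)
```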
Responsible for designing, implementing, and maintaining scalable and reproducible pipelines. Acts as a bridge between engineering and operations, ensuring the reliable deployment, monitoring, and governance of software in production environments. Develop and maintain robust data pipelines, adhering to best practices in modularity, version control, and data lineage.
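A minimal sketch of what "modularity and data lineage" can look like in practice, assuming small composable pipeline steps and a per-step run record; the structure and names below are illustrative, not a prescribed design.

```python
# Hedged sketch: composable pipeline steps plus a lineage record per step.
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class LineageRecord:
    step: str
    input_hash: str
    output_hash: str
    ran_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def _digest(rows: list[dict]) -> str:
    """Stable fingerprint of a small in-memory dataset."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()[:12]

def run_step(name: str, fn: Callable[[list[dict]], list[dict]], rows: list[dict]):
    """Run one step and record what it consumed and produced."""
    out = fn(rows)
    return out, LineageRecord(name, _digest(rows), _digest(out))

# Example composition of two small, independently testable steps.
raw = [{"id": 1, "value": " 10 "}, {"id": 2, "value": "x"}]
cleaned, rec1 = run_step("strip", lambda rs: [{**r, "value": r["value"].strip()} for r in rs], raw)
valid, rec2 = run_step("filter_numeric", lambda rs: [r for r in rs if r["value"].isdigit()], cleaned)
print(rec1, rec2, valid, sep="\n")
```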
Design, build, and operate distributed systems that power observability across ClickHouse Cloud. Take part in the on-call rotation and help drive root-cause resolution and long-term fixes. Build tooling and automation to eliminate repetitive operational work.
Lead the planning, execution, and management of our observability infrastructure, which processes trillions of observability events (logs, traces, metrics) daily. Create and manage monitoring, logging and alerting systems utilizing various technologies such as GrafanaLab, CaptainHook, Zabbix, fluentd, filebeat, ELK, Kafka, Prometheus, OpenTelemetry and other related tools. Design and develop parts of a highly scalable software observability platform.
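On the application side of such a platform, OpenTelemetry instrumentation might look like the hedged Python sketch below, which exports spans to the console; a real deployment would swap in an OTLP exporter pointed at a collector, and the service and span names are assumptions.

```python
# Hedged sketch: minimal OpenTelemetry tracing setup with a console exporter.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Service name is an illustrative placeholder.
provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("request.size", 128)
    with tracer.start_as_current_span("query-db"):
        pass  # real work would happen here
```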
This is a pivotal moment as we migrate more of our infrastructure and services to container-based technologies and transition from Azure to GCP. You will have the opportunity to lead critical projects, collaborate with cross-functional teams, and contribute to the evolution of our infrastructure. You'll work closely with developers to ensure efficient and secure deployments while participating in a rotating support schedule.
Lead the design, deployment, and optimization of scalable machine learning pipelines, focusing on Generative AI and large language models (LLMs). This role involves collaboration across teams to streamline workflows, ensure system reliability, and integrate the latest MLOps tools and practices. You will work on projects that push the boundaries of AI-powered insights and automation in a forward-thinking environment.
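As one illustrative MLOps building block (the listing does not name a specific tool), the sketch below logs a pipeline run's parameters and metrics to MLflow; the experiment name, parameters, and metric values are placeholders.

```python
# Hedged sketch: track an LLM pipeline run with MLflow (names/values are placeholders).
import mlflow

mlflow.set_experiment("llm-summarization-eval")  # hypothetical experiment name

with mlflow.start_run(run_name="prompt-v2"):
    mlflow.log_params({"model": "example-llm", "temperature": 0.2, "max_tokens": 512})
    # In a real pipeline these values would come from an evaluation step.
    mlflow.log_metrics({"rougeL": 0.41, "latency_p95_s": 1.8})
```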