Design, implement, and manage AI Platform architecture. Control AI-related costs, including models, GPUs, and other resources. Work closely with Product teams to provide technical expertise and propose innovative solutions. Guarantee highly available AI services through best practices and automation. Collaborate with ML teams to operationalize AI models and integrate them into systems. Troubleshoot critical issues and continuously optimize system performance.
Job listings
We are looking for a seasoned Site Reliability Engineer (SRE) to join our distributed team. This is a fully remote, work-from-home opportunity. As a key member of our DevOps team, you will be responsible for designing, implementing, and maintaining mission-critical monitoring, alerting, and incident response systems. This role ensures high availability, reliability, and performance of our infrastructure, supporting scalable services in production environments.
Seeking an experienced Azure Cloud Engineer to specialize in migrating and modernizing applications to the cloud. The ideal candidate will have deep expertise in Azure Cloud, Terraform (Enterprise), containers (Docker), Kubernetes (AKS), CI/CD with GitHub Actions, and Python scripting. Strong soft skills are essential to communicate effectively with technical and non-technical stakeholders during migration and modernization projects.
You will design architectures built on cloud services and implement deployment strategies for multi-tenant systems, ensuring scalable and reliable distribution of modules. You will use Terraform to define, provision, and manage cloud infrastructure. You will also implement and manage CI/CD pipelines that automate deployment, ensuring seamless updates and rollouts of new features and modules.