Source Job

US 20w maternity 12w paternity

  • Architect and optimize distributed training and inference systems for large-scale AI models
  • Design and deliver customer-focused solutions that maximize performance and business value
  • Lead the transition of ML pipelines from POC to scalable production systems

Python Terraform Kubernetes PyTorch CUDA

20 jobs similar to Senior AI/ML Specialist Solutions Architect

Jobs ranked by similarity.

Global

  • Contribute to the development of the Everywhere Inference platform, a Kubernetes-based solution.
  • Design and implement APIs and developer tools to simplify deployment, management, and monitoring of AI applications.
  • Optimize serverless container workflows for AI workloads, ensuring performance, scalability, and seamless autoscaling.

Gcore provides infrastructure and software solutions for AI, cloud, network, and security. They have 550+ professionals globally and collaborate with technology partners such as Intel, NVIDIA, Dell, and Equinix.

EMEA

  • Design and implement tooling that enables researchers to quickly deploy and evaluate new models in production
  • Design, build, and maintain high-performance, cost-efficient inference pipelines, making architectural decisions about scaling, reliability, and cost trade-offs
  • Proactively identify and resolve infrastructure bottlenecks, proposing and scoping improvements to iteration speed and production reliability

AssemblyAI builds best-in-class Speech AI models that power the next generation of voice applications. They are a remote team building one of the next great AI companies where teammates define and build their company culture.

$107,000–$145,000/yr
Canada

  • Support the full operational lifecycle of both traditional machine learning systems and emerging generative AI driven applications.
  • Enable scalable training, evaluation, deployment, and monitoring for a wide range of ML and GenAI workloads.
  • Manage model upgrades, framework versions, regression testing, and maintenance tasks, and maintain performance across systems and solutions.

Achievers' employee recognition and rewards platform empowers organizations to build cultures where people feel seen and valued every day. They're a team of passionate, thoughtful builders with more than 4.3 million users across 190 countries, who care deeply about their product, their customers, and each other.

US

  • Design machine learning solutions and execute projects from proof-of-concept to production.
  • Collaborate with business representatives to gather and understand requirements.
  • Oversee all project phases including problem definition, data annotation, and training documentation.

Jobgether is a platform that connects job seekers with companies. They use AI-powered matching to ensure applications are reviewed fairly.

US Canada

  • Own delivery of large, cross-functional initiatives across Cerebras’ AI training and inference platforms.
  • Partner with engineering, product, hardware, and infrastructure teams to define scope, priorities, and timelines.
  • Turn ambiguous goals into clear roadmaps, milestones, and measurable outcomes.

Cerebras Systems builds the world's largest AI chip, which is 56 times larger than GPUs. They empower machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs, and have a simple, non-corporate work culture that respects individual beliefs.

Australia New Zealand

  • Building world-class AI infrastructure to support a 100+ person research team.
  • Designing and scaling multi-cloud systems that support high-performance model training and inference.
  • Improving monitoring, alerting and system observability for AI workloads

Canva is redefining how the world experiences design. It has campuses in Sydney and Melbourne, and co-working spaces in other major cities, trusting employees to choose the balance that empowers them and their team to achieve their goals.

$117,180–$154,588/yr
Canada

  • You will work to build, maintain and improve our Torc ML frameworks.
  • You have built ML solutions that have reached production.
  • You want to build, maintain, grow, and improve our ML platform.

Torc has been a leader in autonomous driving since 2007. Now a part of the Daimler family, they are focused solely on developing software for automated trucks to transform how the world moves freight.

US

  • Define and evolve the technical vision for AI and agentic systems across products.
  • Design orchestration, data, and serving patterns that handle global scale with reliability.
  • Collaborate with AI Research to turn prototypes into extensible, governed production frameworks.

KnowBe4 is a cybersecurity company that puts security first, empowering over 70,000 organizations worldwide to strengthen their security culture. They value radical transparency, extreme ownership, and continuous professional development in a welcoming workplace that encourages all employees to be themselves.

Global

  • Co-create evaluation frameworks, proctoring solutions, and critical security mechanisms.
  • Evaluate and implement state-of-the-art AI/ML techniques.
  • Design, build, and deploy scalable AI services and pipelines.

The company develops and scales a global Certification-as-a-Service platform that automates the entire lifecycle of professional exams. The solution enables companies and organizations to quickly and inexpensively create and administer official online exams/certifications.

Global

  • Partner with teams to co-design scalable solutions.
  • Lead deployments, considering security and maintainability.
  • Work with customers to design tailored solutions.

Sama provides high-quality training data that powers AI technology for Fortune 2000 companies. They are experts in data annotation, supplying data for machine learning algorithms and generative AI models, and are committed to expanding opportunities for those who are underprivileged.

North America 4w PTO

  • Partner with stakeholders to tackle technical problems at scale, building framework-agnostic services.
  • Establish roadmap and architecture for Wealthsimple’s Machine Learning platform.
  • Build highly performant, scalable systems, contributing to our ML platform on Kubernetes, Bedrock, and SageMaker.

Wealthsimple aims to provide financial freedom by making financial services transparent and low-cost. As the largest fintech company in Canada, with over 1,500 employees, they manage over $100 billion in assets and foster a collaborative and quality-focused culture.

Canada

  • Design and build advanced machine learning models for generative tasks.
  • Optimize models for performance enhancements and scalability.
  • Preprocess and manage large datasets for model training.

Jobgether is a platform that connects job seekers with companies. They use an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements.

Europe

  • Lead and manage a team of ML engineers, scientists, and researchers, fostering mentorship, development, and retention.
  • Execute the machine learning roadmap, focusing on 3D deep learning, computer vision, and generative AI applications.
  • Partner with product, engineering, and research stakeholders to align technical strategy with business objectives.

Jobgether is a platform that connects job seekers with companies. They use an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements.

  • Design and deploy high-performance agentic systems that leverage Fastino’s optimized model architectures.
  • Collaborate with engineering teams to turn novel architectural breakthroughs into scalable solutions for enterprise customers.
  • Drive rapid, iterative prototyping of AI functionalities, refining model performance and task-accuracy based on real-world telemetry.

Fastino is building the next generation of LLMs with a team of alumni from Google Research, Apple, Stanford, and Cambridge, and has developed the GLiNER family of open-source models. Fastino has raised $25M in a seed round and is backed by leading investors including Microsoft, Khosla Ventures, and Insight Partners.

UK

  • Act as the overall technical authority for the programme, owning architectural decisions, execution patterns, and technical quality across all workstreams.
  • Define and enforce standard migration patterns for moving ML workloads from Databricks into AWS SageMaker, while managing exceptions for complex or legacy cases.
  • Lead and contribute across areas such as AWS SageMaker-based ML execution, Databricks to SageMaker migration, and Python-based ML workloads.

CreateFuture is a digital consultancy that builds digital products and services. They have over 500 people and a safe, supportive, and friendly culture.

Europe 5w PTO

  • Design, implement, and manage AI Platform architecture.
  • Control AI-related costs, including models, GPUs, and other resources.
  • Collaborate with ML teams to operationalize AI models and integrate them into systems.

Docplanner empowers patients by letting them leave and read reviews about their visits, and provides doctors with the technology to manage bookings easily and save time. They are leaders in 13 countries with 2,500+ employees globally and maintain a startup mindset.

$170,000–$190,000/yr
US

  • Lead the end-to-end machine learning lifecycle.
  • Own and evolve ML Ops architecture, including CI/CD for models.
  • Serve as a player-coach, contributing directly to design reviews.

AvaSure is revolutionizing healthcare with cutting-edge virtual care solutions that protect patients and empower clinical teams. They are proud of their collaborative culture where innovation thrives and every team member is valued.

US Canada 3w PTO 20w maternity

  • Design, build, and maintain machine learning model productionization infrastructure.
  • Streamline model training, validation, and deployment in collaboration with the data science team.
  • Implement robust monitoring and alerting for model performance, drift, and data quality.

The Athletic delivers in-depth coverage of sports, teams, and athletes. Their newsroom of 500+ full-time staff covers hundreds of professional and college teams across North American markets and football clubs.

North America Europe Asia

  • Build and productionize LLM and NLP models across retrieval, summarization, classification, and generative tasks.
  • Design and implement scalable ML services and inference pipelines in Python using modern ML frameworks.
  • Translate complex NLP and LLM product requirements into structured engineering plans with clear milestones.

Loopio provides a workplace that recognizes the advantages of working flexibly, operating as a remote-first company. They have established hub regions around the world and foster a supportive culture with opportunities for connection.

$170,000–$240,000/yr
US Unlimited PTO

  • Own SentiLink’s real-time ML model monitoring domain.
  • Own our ML experimentation, model tracking, and versioning infrastructure.
  • Drive improvements to the model development process.

SentiLink provides identity and risk solutions for secure transactions. They are backed by investors like Craft Ventures and Andreessen Horowitz, recognized by Forbes Fintech 50, and have offices across the U.S. and India.