Source Job

Australia

  • Build and evolve systems that enable agents to discover, invoke, and safely execute capabilities across Canva at scale.
  • Design tool schemas and definition patterns that maximize LLM tool selection accuracy and reliable invocation.
  • Build and operate evaluation pipelines that measure tool calling behavior in production and catch regressions.

Java Python TypeScript LangChain

20 jobs similar to Senior Machine Learning Engineer - Agent Tools Interop (AU remote)

Jobs ranked by similarity.

Australia

  • Design and optimise AI-ready tools and APIs that enable LLM platforms to reliably interact with Canva's design capabilities.
  • Build and maintain evaluation frameworks to systematically measure tool-use accuracy across platforms.
  • Experiment with LLM orchestration and agent architectures: develop Canva agents that any third-party provider can call to design quickly, efficiently and at scale.

Canva is a platform redefining how the world experiences design. They have a flagship campus in Sydney, with a second campus in Melbourne and co-working spaces in Brisbane, Perth, Adelaide, and Auckland, NZ.

Australia

  • Drive the design and evolution of AI-ready tools and APIs for LLM platforms.
  • Own and evolve evaluation frameworks that measure tool-use accuracy across platforms.
  • Shape Canva's agent architecture, making strategic technical decisions about intelligence location.

Canva is a design platform that enables users to create various visual content. They have offices in multiple locations in Australia and New Zealand, and they offer a flexible work environment.

Australia

  • Design and build scalable evaluation systems for machine learning models, including generative AI.
  • Collaborate with cross-functional teams to integrate evaluation into the ML lifecycle.
  • Improve experimentation velocity by enabling reliable and automated evaluation workflows.

Canva is a design platform that enables users to create a variety of visual content. They are a global company with offices in multiple locations, including Sydney and Melbourne, and they value a flexible and inclusive work environment.

Europe 4w PTO

  • Design multi-agent systems to automate business workflows.
  • Build and integrate AI agents using LLMs, APIs, and internal tools.
  • Continuously improve agent performance based on metrics and feedback.

Findies is a social shopping network helping women discover better products for their lifestyle. It's a small, hands-on team looking for professionals who want ownership and the opportunity to shape the business, operating in the e-commerce industry.

US Unlimited PTO

  • Rapidly prototype MVPs using LLM APIs to address business bottlenecks.
  • Develop production-grade internal applications with reliable frontends and robust backends (Python).
  • Design and implement RAG architectures and structured output pipelines grounded in company data.

Bestow is a leading vertical technology platform that serves some of the largest and most innovative life insurers. Their platform unifies the fragmented, legacy value chain, enabling carriers to launch products in weeks instead of years. They are backed by leading investors and trusted by major carriers.

Australia

  • Research, develop and deploy AI-based solutions to automate content moderation and review of Canva’s content.
  • Work with diverse stakeholders to guide the technical vision while being responsible for the breakdown and delivery of large projects.
  • Design, develop and deploy solutions through hands-on software development, working closely with leads, designers, and product managers.

Canva is a design platform that empowers users to create and share visual content. They have campuses in Sydney and Melbourne, with co-working spaces in other Australian cities, and aim for a culture of flexibility and empowerment.

US

  • Build and deploy agent-driven systems that take on real internal workflows.
  • Own a system end-to-end: define the problem, design the architecture, ship to production, and iterate based on real usage.
  • Turn ambiguous, messy processes into structured systems that execute reliably.

Founded in 2020, Nascent builds, expands, and captures opportunity in open markets and permissionless technologies. They deploy assets across a range of liquid and long-term strategies and have made venture investments in 100+ early-stage teams.

$160,000–$190,000/yr
US Unlimited PTO

  • Design, build, and deploy production AI agents and multi-agent orchestration systems.
  • Architect RAG pipelines with vector search and knowledge base management for AI-driven support.
  • Build production microservices and APIs serving as orchestration layers for AI agent systems.

Greenlight is a family fintech company helping parents raise financially smart kids. They serve over 6 million parents and kids with their banking app, aiming to ensure every child has the opportunity to become financially healthy and happy.

US

  • Partner closely with clients to understand their business objectives.
  • Monitor AI agent performance and apply prompt engineering techniques.
  • Identify opportunities for optimization and expand use cases.

AnswerRocket builds transformative AI solutions that drive measurable results for Fortune 2000 enterprises. For over a decade, they've helped industry leaders across sectors harness AI to achieve tangible business outcomes, and they value collaboration, transparency, and inclusivity.

$140,000–$160,000/yr
US 4w PTO

  • Design and build agentic AI systems and RAG pipelines for production features across the marketplace.
  • Integrate LLMs into product experiences across search, categorization, communication, and trust & safety.
  • Partner with Data Scientists and Engineers to turn research into shipped products.

OfferUp is dedicated to creating the simplest and most trusted way for people to buy, sell, and connect in their local communities. OfferUp was used by more than 1 in 6 adults in the U.S. in 2024.

  • Build and maintain context infrastructure for AI tools.
  • Design and run evaluation frameworks for AI-generated insights.
  • Build and orchestrate AI agent systems for analytics tools.

Airtable is a no-code app platform empowering people to accelerate critical business processes. More than 500,000 organizations rely on Airtable to transform how work gets done, suggesting a large company size and a culture of innovation.

Global

  • Build and improve AI-enabled product features from prototype to production.
  • Build and maintain product features and supporting backend components, including services, integrations, and internal workflows.
  • Create automated tests and proactively fix issues to keep releases stable, as engineers own quality without a dedicated QA function.

Varicent transforms the Sales Performance Management (SPM) market by redefining how organizations achieve revenue success. They are a market leader in SaaS solutions, empowering revenue leaders globally to design smarter go-to-market strategies and maximize seller performance.

$170,000–$200,000/yr
US Unlimited PTO

  • Contribute to the design and evolution of agentic systems that participate directly in care delivery.
  • Define and build architectural patterns for agent reasoning, tool use, memory, and human-in-the-loop collaboration.
  • Own complex problem spaces end-to-end — from system design and implementation through observability, evaluation, and continuous improvement in production.

Pair Team is building a new kind of healthcare system across Medicaid, Medicare, and public assistance programs. As a public benefit corporation and AI-enabled medical group, they partner with shelters, food pantries, and community organizations to deliver “whole-person” care to the 115 million Americans who rely on the safety net, employing over 500 people while expanding nationally.

Europe

  • Build AI-powered features and agentic workflows.
  • Integrate AI into real products and business processes.
  • Contribute to AI-Driven Development practices across projects.

They are looking for an AI Developer who doesn’t enjoy doing the same work twice and who will build AI-powered features and agentic workflows, integrating AI into real products and business processes. The company values flexibility and adaptability.

$135,000–$175,000/yr
Global

  • Socialize AI capabilities across the organization.
  • Build, integrate, and maintain AI-powered features and workflows.
  • Partner with product to translate user problems into AI-driven features and experiences.

Splitero is at the intersection of financial services and property technology. They are creating a win-win product that could help homeowners access their home equity without making monthly payments like traditional financing options. Splitero fosters a culture of transparency, innovation, and inclusivity, where every voice is heard and every team member is valued.

Europe 5w PTO

  • Contribute to new Agentic AI Agents and manage the safe release to customers.
  • Address UXD, infrastructure, and machine learning challenges across the stack.
  • Own projects spanning months, breaking problems into iterative steps for delivery and validation.

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. As AI continues to shape the way we live and work, the company develops products to enhance visual communication and enterprise skill development.

$164,523/yr
5w PTO

  • Ship flagship agentic capabilities: Deliver high-impact agentic workflows end-to-end.
  • Build and operate production-grade agent systems: Design reliable agentic systems that behave predictably under real-world constraints.
  • Create shared foundations for agent delivery: Develop the core primitives that enable teams to build agents consistently.

Tem is rebuilding the energy transaction, making it transparent and fair. It exists to fix a broken global energy market and is scaling internationally. The company has closed a $75 million Series B and has served thousands of business customers.

Australia

  • Developing ranking and recommendation models that identify high-performing team designs.
  • Building brandification pipelines to conform to an organisation's brand guidelines.
  • Building layout extraction and understanding systems that parse Canva's design format.

Canva is a design platform that makes it easy for anyone to create professional-looking designs. They have a flagship campus in Sydney, a second campus in Melbourne, and co-working spaces in Brisbane, Perth, and Adelaide, and provide flexibility in how and where you work.

$100,000–$150,000/yr
US

  • Contribute to the development of software infrastructure to support the creation of agentic systems.
  • Deploy and operate cloud-hosted services for use by the research community.
  • Define key directions to keep UChicago at the forefront of AI/ML and national data infrastructure.

The University of Chicago delivers solutions to the research community worldwide through Globus, a sustainable, non-profit unit. They develop cloud-based software for governmental, academic, and commercial organizations, with an emphasis on data management challenges, and have employees based in downtown Chicago and working remotely.

US

  • Design and implement production-grade RAG pipelines and agentic workflows using Python.
  • Evaluate new models and prototype approaches for SBIR/government deliverables.
  • Document architectures and contribute to technical reports for contract deliverables.

Unstructured is focused on transforming unstructured data into a format usable by LLMs. Their Public Sector team works on high-impact contracts and seeks to bridge the gap between custom builds and a scalable product roadmap.