Design and implement comprehensive evaluation frameworks that reflect real-world task success for agentic systems, with a focus on human+AI collaboration outcomes
Build benchmarking pipelines that capture nuanced success indicators including trust calibration, intervention frequency, and agent handoff quality
Collaborate with researchers, engineers, and product teams to align evaluation methodologies with business and user goals
Upwork is the world’s human and AI-powered work marketplace that connects businesses with highly skilled, AI-enabled independent talent from across the globe. From entrepreneurs to Fortune 100 enterprises, companies rely on Upwork’s trusted platform to find and hire expert talent. They have facilitated more than $25 billion in economic opportunity for talent around the world and their culture is built on trust, risk-taking, customer focus, and excellence.
Support design and monitoring of EWA underwriting strategies and forecasting tools.
Query, clean, and analyze large bank transaction datasets using SQL and Python (or R/SAS).
Partner with Product, Engineering, Data Science, and Operations to implement updates to policies and decisioning logic.
Self Financial is a venture-backed, high-growth FinTech company with a mission to increase economic inclusion and financial resilience by empowering people to build credit and build savings. Their team is passionate about challenging the status quo of the credit industry by providing people accessible tools to take control of their credit.
Collect, clean, and process data from diverse sources.
Analyze large datasets using statistical methods.
Provide data-driven recommendations and actionable insights.
WCG partners with governments and local agencies across Canada to create sustainable employment opportunities. They understand that work gives hope, strengthens relationships and drives economic growth.
Own and drive end-to-end product analytics and experimentation for key consumer-facing products and initiatives
Design, execute, and analyze A/B and multivariate experiments to evaluate product, UX, pricing, and growth initiatives
Conduct deep-dive analysis on user behavior, funnels, cohorts, and journeys to identify opportunities for growth
Binance.US is America’s home to buy, trade, and earn digital assets. As a licensed and regulated U.S. crypto platform, they provide secure, reliable access to more than 190 of the world’s most popular cryptocurrencies. They're a remote-first team of innovators building the bridge between traditional finance and Web3, helping bring financial freedom within reach for all.
Lead the team and provide expert, hands-on technical guidance.
Champion analytics best practices, including effective visualization, data storytelling, documentation, testing, and version control.
Equip is the leading virtual, evidence-based eating disorder treatment program with a mission to ensure that everyone with an eating disorder can access treatment that works. Founded in 2019, Equip has been a fully virtual company since its inception and is proud of its highly engaged, passionate, and diverse team.
Write dbt SQL pipelines to automate and manage data assets in Snowflake for our product partners.
Conduct statistical analysis of product usage and engagement data using Python/R with Hex.tech.
Use Sigma Computing to find insights in our customer journey and conduct business-enhancing analytical projects.
Vanta helps businesses earn and prove trust by making security continuous and easily verifiable. They have a kind and talented team, and thousands of companies rely on them in a way that's real-time and transparent.
Conduct independent model validation of existing models.
Determine aspects of model drift for model risk management.
Drive improvements in model monitoring activities.
Cotiviti focuses on payment integrity issues, reducing the cost of healthcare processes, and improving healthcare outcomes. They are an equal opportunity employer that values its team members and offers a competitive benefits package.
Partner with data science, ML engineering, product owners, and data engineering to define capability-usage metrics.
Build, maintain, and scale data solutions and self-service reporting for aligned DS products and stakeholders.
Define business-impactful metrics and ensure visibility into DS platform and product health, usage, adoption, and outcomes.
Liberty Mutual strives to be a place where everyone feels valued and supported. They foster an inclusive environment with workplace flexibility and professional development opportunities.
Writes high-quality SQL- and Python-based transformations that run in cloud data warehouses.
Serves as the responsible individual for major sections of the Enterprise Dimensional Model.
Collaborates with product managers and fellow engineers to establish and refine requirements.
KnowBe4 is a cybersecurity company that puts security first. Their AI-driven Human Risk Management platform empowers organizations to strengthen their security culture. The team values radical transparency, extreme ownership, and continuous professional development in a welcoming workplace.