Source Job

US Europe

  • Conduct fundamental and innovative development in low-cost yet powerful vision-language models (VLMs), unified models, automatic model compression, optimization, and deployment on cloud and edge.
  • Design or implement state-of-the-art techniques for model compression, inference speedup, deployment on hardware, and tool automation.
  • Contribute to library and tool development to support the business, or publish influential research in top-tier conferences and journals.

Python

5 jobs similar to Research Intern – Multimodal Foundation Model for Vision

Jobs ranked by similarity.

Global

  • Experiment with novel language model architectures, helping drive and execute Fastino's research roadmap
  • Optimize Fastino’s multimodal models to improve response quality, instruction adherence, and overall performance metrics
  • Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories

Fastino is building the next generation of LLMs. Their team, boasting alumni from Google Research, Apple, Stanford, and Cambridge, is on a mission to develop specialized, efficient AI and has raised $25M through their seed round.

  • Design, build, and deploy the critical small language models that are foundational to Fastino’s product.
  • As an engineer, you will own the full lifecycle of our state-of-the-art models, from prototyping and data analysis to deployment and monitoring.
  • Drive the data strategy to continuously improve model performance by analyzing distribution gaps and contributing to synthetic data pipelines.

Fastino is building the next generation of LLMs, with a team of alumni from Google Research, Apple, Stanford, and Cambridge. They have raised $25M through their seed round and are backed by leading investors including Microsoft, Khosla Ventures, and Insight Partners.

$110,011–$160,000/yr
US

  • Investigate new feature extraction and data augmentation techniques for generative image/video detection.
  • Collaborate with scientists and engineers across the organization. Perform research into deepfake image/video detection.
  • Aid in integrating insights gained from research into RD products.

Reality Defender, Inc. focuses on detecting and defending against manipulated media.

$160,800–$212,300/yr
Canada US

  • Implement the latest research advances in Neural Rendering and generative models.
  • Translate cutting-edge solutions in the autonomous driving domain into high-quality camera, LiDAR, and radar sensor simulations.
  • Design, implement, test, and deploy shippable, production-quality software, starting from early prototypes and using disciplined software development processes.

Torc is dedicated to transforming travel, freight, and business through autonomous vehicle technology. Part of the Daimler family since 2019, they focus on creating software for automated trucks and foster a collaborative, energetic, and team-focused culture.

Europe North America 7w PTO

  • Profile large-scale training workloads and identify communication and computation bottlenecks.
  • Custom kernel development to improve training performance.
  • Enhance and maintain our training and inference codebases.

Poolside aims to be the company that builds a world where AI powers economically valuable work and scientific progress. Their team is a multidisciplinary blend of research, engineering, and business experts distributed across Europe and North America, fostering a culture of collaboration and hard work.