Job Description
This role is responsible for all aspects of data collection to support our model training operations. We build high-quality datasets at petabyte scale and low cost through tight integration of infrastructure, engineering, and research work. You will be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline. You will operate and extend the cloud infrastructure for that pipeline, which currently runs on GCP and is managed with Terraform. You will also collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at larger scale and lower cost to power our next-generation models.
About Speechify
Speechify’s text-to-speech products turn PDFs, books, Google Docs, news articles, and websites into audio, helping 50 million people read faster and remember more.