Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data.
Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics.
Support preprocessing of unstructured assets for training pipelines, including format conversion, normalization, augmentation, and metadata extraction.
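A minimal sketch of the metadata-extraction slice of this preprocessing work is shown below, assuming a local folder of raw asset files and an illustrative JSONL manifest schema; the paths, formats, and fields are not from the posting. Geometry-level conversion, normalization, and augmentation would sit behind a dedicated 3D library and are omitted here.

```python
"""Hypothetical sketch: walk a folder of unstructured assets, extract basic
metadata, and write a JSONL manifest for downstream training pipelines.
Directory names, supported formats, and manifest fields are assumptions."""
import hashlib
import json
from pathlib import Path

ASSET_DIR = Path("raw_assets")            # assumed input location
MANIFEST = Path("asset_manifest.jsonl")   # assumed output manifest
KNOWN_FORMATS = {".obj", ".glb", ".fbx", ".png", ".jpg"}  # assumed format whitelist


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large assets never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_metadata(path: Path) -> dict:
    """Collect the fields a pipeline typically needs to deduplicate and filter assets."""
    return {
        "path": str(path),
        "format": path.suffix.lower(),
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "supported": path.suffix.lower() in KNOWN_FORMATS,
    }


if __name__ == "__main__":
    with MANIFEST.open("w", encoding="utf-8") as out:
        for asset in sorted(p for p in ASSET_DIR.rglob("*") if p.is_file()):
            out.write(json.dumps(extract_metadata(asset)) + "\n")
```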
Meshy is a leading 3D generative AI company transforming content creation by generating 3D models from text and images. Its global team is distributed across North America, Asia, and Oceania, and the company is backed by venture capital firms such as Sequoia and GGV, with $52 million in funding.
Building data systems for grant-funded outcomes that standardize the data and connect it with primary data pipelines.
Creating and maintaining internal data dashboards to track Community Growth programs' outcomes and impacts.
Identifying actionable insights from trends across datasets to inform reports and decision-making.
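As a rough illustration of how outcome records might be rolled up to feed such dashboards and trend analyses, the pandas sketch below aggregates activity data by program and quarter; the file name, column names, and metrics are invented for the example and are not Wikimedia's actual schema.

```python
"""Hypothetical sketch: aggregate program outcome records into a tidy table
that an internal dashboard could read. Columns and metrics are assumptions."""
import pandas as pd

# Assumed export from a primary data pipeline: one row per program activity.
outcomes = pd.read_csv("community_growth_outcomes.csv", parse_dates=["activity_date"])

summary = (
    outcomes
    # Bucket activities into calendar quarters for trend reporting.
    .assign(quarter=outcomes["activity_date"].dt.to_period("Q").astype(str))
    .groupby(["program", "quarter"], as_index=False)
    .agg(
        events=("event_id", "nunique"),
        participants=("participant_count", "sum"),
        new_editors=("new_editor_count", "sum"),
    )
)

# Assumed handoff point: the dashboard layer reads this summary file.
summary.to_csv("community_growth_summary.csv", index=False)
```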
The Wikimedia Foundation operates Wikipedia and other Wikimedia free knowledge projects, envisioning a world where every human can freely share in the sum of all knowledge. As a charitable, not-for-profit organization, it relies on donations and institutional grants to support volunteer communities and advocate for policies that enable free knowledge to thrive.
Design, build, and scale performant data pipelines and infrastructure, primarily using ClickHouse, Python, and dbt.
Build systems that handle large-scale streaming and batch data, with a strong emphasis on correctness and operational stability.
Own the end-to-end lifecycle of data pipelines, from raw ingestion to clean, well-defined datasets consumed by downstream teams.
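One small step of such a pipeline is sketched below, assuming the clickhouse-connect Python driver and an illustrative events_clean table: a raw batch is validated and only the clean rows are loaded. The table, columns, and validation rules are assumptions, and in practice the downstream modeling would typically live in dbt rather than ad hoc Python.

```python
"""Hypothetical sketch: validate a raw batch and load clean rows into ClickHouse.
Assumes the clickhouse-connect driver; table, columns, and rules are illustrative."""
from datetime import datetime

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", username="default", password="")

# Illustrative target table for the cleaned events.
client.command("""
    CREATE TABLE IF NOT EXISTS events_clean (
        event_time DateTime,
        wallet     String,
        amount     Float64
    ) ENGINE = MergeTree ORDER BY event_time
""")


def is_valid(row: dict) -> bool:
    """Reject rows with a missing wallet or non-positive amount before they reach downstream consumers."""
    return bool(row.get("wallet")) and float(row.get("amount", 0)) > 0


raw_batch = [
    {"event_time": datetime(2024, 1, 1, 12, 0), "wallet": "0xabc", "amount": 1.5},
    {"event_time": datetime(2024, 1, 1, 12, 5), "wallet": "", "amount": -2.0},  # rejected
]

clean_rows = [
    [row["event_time"], row["wallet"], float(row["amount"])]
    for row in raw_batch
    if is_valid(row)
]
client.insert("events_clean", clean_rows, column_names=["event_time", "wallet", "amount"])

print(client.query("SELECT count() FROM events_clean").result_rows)
```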
Nansen empowers investors and professionals with real-time, actionable insights derived from on-chain data. We’re building the world’s best blockchain analytics platform, and data is at the heart of everything we do.