Remote Data Jobs · Spark

Job listings

$150,000–$165,000/yr
US · Unlimited PTO · 11 weeks maternity leave

  • Partner with our customer teams to develop engineering plans for implementations at our health system partners
  • Build and support robust batch and streaming pipelines
  • Mature our monitoring systems and processes to improve visibility and failure detection across our infrastructure

Paradigm is rebuilding the clinical research ecosystem by enabling equitable access to trials for all patients. Incubated by ARCH Venture Partners and backed by leading healthcare and life sciences investors, Paradigm is implementing seamless infrastructure at healthcare provider organizations that will bring potentially life-saving therapies to patients faster.

  • Lead and mentor a team of data engineers, fostering innovation, collaboration, and continuous improvement.
  • Design, implement, and optimize scalable data pipelines and ETL processes to meet evolving business needs.
  • Ensure data quality, governance, security, and compliance with industry standards and best practices.

Jobgether is a platform that connects job seekers with companies. It uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements.

$85,000–$90,000/yr
US · 4 weeks PTO

  • Write and deploy crawling scripts to collect source data from the web
  • Write and run data transformers in Scala Spark to standardize bulk data sets
  • Write and run modules in Python to parse entity references and relationships from source data
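
As a rough illustration of the third bullet, a small parsing module might look like the sketch below; the record layout, field names, and relationship types are hypothetical and are not Sayari's actual schema.

    import json
    from dataclasses import dataclass

    @dataclass
    class EntityRef:
        entity_id: str
        name: str

    @dataclass
    class Relationship:
        source_id: str
        target_id: str
        kind: str

    def parse_record(raw: str) -> tuple[list[EntityRef], list[Relationship]]:
        """Extract entity references and relationships from one source record.

        The layout (a JSON object with "entities" and "links" keys) is an
        invented example, not an actual production schema.
        """
        record = json.loads(raw)
        entities = [
            EntityRef(entity_id=e["id"], name=e["name"].strip())
            for e in record.get("entities", [])
        ]
        relationships = [
            Relationship(source_id=link["from"], target_id=link["to"], kind=link["type"])
            for link in record.get("links", [])
        ]
        return entities, relationships

    if __name__ == "__main__":
        sample = ('{"entities": [{"id": "e1", "name": "Acme Ltd "}, '
                  '{"id": "e2", "name": "Jane Doe"}], '
                  '"links": [{"from": "e2", "to": "e1", "type": "shareholder_of"}]}')
        ents, rels = parse_record(sample)
        print(ents)
        print(rels)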

Sayari is a risk intelligence provider that equips its customers with visibility into commercial relationships, delivering corporate and trade data from over 250 jurisdictions. Headquartered in Washington, D.C., the company is trusted globally and has been recognized for its growth and workplace culture.

$179,000–$277,000/yr

  • Lead the conceptual shift from general "data quality" to Data Reliability across the entire organization.
  • Design, prototype, and champion a single, centralized "Data Reliability Source of Truth" platform to measure and display reliability KPIs.
  • Create the technical framework and reference architecture necessary to automate the creation, deployment, and monitoring of Data Reliability checks.
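
As a loose, hypothetical sketch of the last point, the snippet below defines a declarative reliability check and a runner that reports pass/fail results of the kind a centralized reliability dashboard might consume; the check names, thresholds, and record layout are invented for the example and are not Komodo's framework.

    from dataclasses import dataclass
    from typing import Callable

    # A declarative check: a name, a metric over a dataset, and a threshold.
    @dataclass
    class ReliabilityCheck:
        name: str
        metric: Callable[[list[dict]], float]
        max_allowed: float

    def null_rate(column: str) -> Callable[[list[dict]], float]:
        """Build a metric that measures the fraction of rows missing a column."""
        def _metric(rows: list[dict]) -> float:
            if not rows:
                return 1.0
            return sum(1 for r in rows if r.get(column) is None) / len(rows)
        return _metric

    def run_checks(rows: list[dict], checks: list[ReliabilityCheck]) -> dict[str, bool]:
        # Pass/fail results like these would feed a reliability "source of truth".
        return {c.name: c.metric(rows) <= c.max_allowed for c in checks}

    if __name__ == "__main__":
        rows = [{"patient_id": "p1", "claim_date": "2024-01-02"},
                {"patient_id": "p2", "claim_date": None}]
        checks = [ReliabilityCheck("claim_date_null_rate", null_rate("claim_date"), 0.05)]
        print(run_checks(rows, checks))  # {'claim_date_null_rate': False}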

Komodo Health's mission is to reduce the global burden of disease through smarter use of data and its Healthcare Map, the industry’s largest view of the U.S. healthcare system.

Design, build, and maintain a robust, self-service, scalable, and secure data platform. Create and edit data pipelines, considering business logic, levels of aggregation, and data quality. Enable teams to access and use data effectively through self-service tools and well-modeled datasets.

We are Grupo QuintoAndar, the largest real estate ecosystem in Latin America, with a diversified portfolio of brands and solutions across different countries.

Develop analytical data products using Airflow, DataProc, PySpark, and BigQuery on Google Cloud Platform, applying solid data warehousing principles. Build data pipelines to monitor data quality and analytical model performance. Maintain the data platform infrastructure using Terraform and develop, evaluate, and deliver code through CI/CD.
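
A minimal PySpark sketch of the kind of data-quality monitoring pipeline described above, with a hypothetical listings table and metrics; in a real Dataproc job the DataFrame would typically be read from BigQuery via the spark-bigquery connector, but an in-memory sample keeps the example self-contained.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("data-quality-monitor").getOrCreate()

    # In production this frame would come from BigQuery, e.g.
    # spark.read.format("bigquery").option("table", "...").load()
    listings = spark.createDataFrame(
        [("l1", "SP", 2500.0), ("l2", "RJ", None), ("l3", None, 1800.0)],
        ["listing_id", "state", "rent"],
    )

    # Simple metrics a monitoring pipeline might publish: per-column null rates.
    total = listings.count()
    metrics = listings.select(
        *[
            (F.sum(F.col(c).isNull().cast("int")) / F.lit(total)).alias(f"{c}_null_rate")
            for c in listings.columns
        ]
    )
    metrics.show()
    spark.stop()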

CESAR is an innovation and education center that has been training people and helping organizations leverage their digital strategies for almost 30 years.