Senior Data Engineer

Convoy Health

Data Science

United States

Posted on May 22, 2026

About the Role

Convoy Health is an early-stage, venture-backed healthcare AI company with a small founding team. You'll own meaningful surface area from day one and ship product into real customer environments within weeks, not quarters. Our platform combines data engineering, analytics, and agentic AI to turn complex healthcare data into actionable intelligence.

As a Senior Data Engineer, you'll own the data infrastructure end-to-end. Every insight the AI agent surfaces, every dashboard a user opens, every forecast model that runs — it all starts with the data platform you build and operate. This means ingestion pipelines across diverse source systems, a multi-tenant warehouse, transformation layers that normalize heterogeneous healthcare data, and the analytics foundation that serves both human users and autonomous AI agents.

You're expected to use AI-assisted development tools daily and to design data systems with agentic consumption patterns in mind.

What You'll Do

Data Platform & Architecture

Design and operate a dual-database architecture: a columnar analytics warehouse with schema-per-tenant isolation and a relational OLTP database for profiles, config, audit logs, and AI conversations
Implement and enforce tenant data isolation: schema-level grants, session-level context injection, defense-in-depth filtering, and row-level security policies
Build the data foundation that serves both human-facing dashboards and agentic tool-use queries — optimizing for the distinct access patterns of each

Multi-Source Ingestion

Build and operate ingestion pipelines for diverse healthcare data sources:
  Claims: ANSI X12 837I/837P/835/834 and CSV formats via workflow orchestration, serverless parsers, and bulk loading
  EMR/EHR: HL7 FHIR extracts, flat-file exports from major EHR systems (Epic, Cerner, Athena, eClinicalWorks, etc.)
  RCM: Revenue cycle data from billing platforms (charge capture, A/R aging, denial management, collections)
  Financial: General ledger exports, budget files, contract rate tables, fee schedules, capitation rate cards
  Operational: HRIS staffing data, scheduling system exports, patient volume feeds
  Payer: Remittance files, eligibility responses, authorization data, quality measure submissions
Own the CMS public dataset ingestion pipeline: NCCI edits, NPPES, Provider Utilization, MS-DRG, HCRIS, Hospital Compare, Physician Fee Schedule, and more — with workflow orchestration, schema validation, external tables, and operational copies

Transformation & Modeling

Manage the dbt transformation pipeline: input layer ingestion, core data model normalization (claims, members, providers, encounters), and data mart outputs (PMPM, CMS-HCC, readmissions, quality measures, CCSR, chronic conditions, ED classification)
Build domain-specific analytical models beyond claims: visit economics (slot utilization, no-show rates, revenue per visit), VBC performance (quality measures, shared savings, risk adjustment), financial forecasting (revenue projections, budget variance, TCOP), and provider benchmarking (peer cohort construction, percentile distributions)
Build custom domain tables: contracts, fee schedules, capitation rates, delegation agreements, DOFR surveillance, capitation reconciliation, facility claims audit, TCOP, budget/forecast, JV attribution
Operate cross-tenant benchmarking pipelines: zero-copy data sharing, monthly aggregate computation, de-identified metric distributions, CMS national benchmark integration

AI & ML Data Foundation

Provide the data foundation for ML models: feature store tables, statistical process control metrics, population surveillance outputs, claim scoring results, vector embedding storage, and feedback loop tables
Design and optimize data access patterns for agentic AI workflows — the AI agent queries your data layer autonomously via tool use, so schema design, indexing, and query performance directly impact agent quality
Support the analytics query layer: data model definitions mapping to transformation outputs and custom domain tables, tenant security context injection, pre-aggregation optimization, cache invalidation on data refresh

What We're Looking For

Required:

5+ years of data engineering experience with production data platforms at scale
Experience with healthcare data: claims formats (837/835/834), clinical data (HL7/FHIR), or revenue cycle data, with deep expertise in at least one and working familiarity with the others
Deep experience with a cloud columnar warehouse (Redshift, BigQuery, Snowflake, or Databricks) — schema design, bulk loading, external tables, data sharing, query optimization
Strong PostgreSQL skills — RLS policies, indexing strategies, transactional DDL, connection pooling
Production experience with dbt — model design, incremental materializations, macros, testing, CI/CD integration
Experience with cloud data services: object storage, workflow orchestration, serverless compute, event-driven scheduling, data catalogs, encryption
Proficiency in SQL and Python for ETL scripting and data validation
Experience with schema-per-tenant or row-level-security multi-tenant data architectures
Comfort with AI-assisted development — you use AI coding tools daily and understand how to build data systems that AI agents consume
Familiarity with HIPAA technical safeguards: encryption at rest/in transit, audit logging, PHI handling
Experience building data infrastructure in an early-stage environment, with comfort making architectural decisions without a platform team behind you

Preferred:

Experience building data infrastructure consumed by AI/ML systems or autonomous agents (feature stores, tool-use query patterns, RAG data layers)
Experience with multiple healthcare data source types: claims + EMR + RCM + financial
Familiarity with healthcare data normalization frameworks (Tuva Project or similar)
Experience with ANSI X12 EDI parsing (837I, 837P, 835, 834 transaction sets)
Exposure to CMS public datasets (NPPES, NCCI, HCRIS, MS-DRG tables, Physician Fee Schedule)
Experience with vector databases or extensions (pgvector, Pinecone, or similar) for similarity search
Infrastructure-as-code experience (CDK, Terraform, Pulumi, or CloudFormation)
Experience operating data pipelines under BAA/HIPAA compliance requirements
Founding or early data-engineer experience at a VC-backed healthcare or AI startup (Tempus AI, Lightbeam, Lyra, Virta, Sprinter, Holmusk, Flatiron, Veeva, etc.)
Hands-on production experience, not just theoretical knowledge, building data layers that AI agents query via tool use

Tech Stack

Warehouse: Cloud columnar warehouse (Redshift, BigQuery, Snowflake, or similar) with external table and data sharing support
Operational DB: PostgreSQL (with RLS, vector extension support)
Transformation: dbt, SQL, Python
Orchestration: Workflow orchestration (Step Functions, Airflow, Prefect, or similar), event-driven scheduling, serverless compute
Storage: Object storage (Parquet, JSON Lines), encryption at rest
AI/Agent Layer: LLM platform — the AI agent consumes your data layer via tool use
Infrastructure: Infrastructure-as-code, monitoring, IAM
Formats: ANSI X12 (837I/837P/835/834), HL7 FHIR, CSV, Parquet, JSON Lines
Development: AI-assisted coding tools

Compensation

Competitive salary commensurate with experience, equity, and benefits.
