Website Clairity

Clairity is seeking a Senior Data Engineer with strong Machine Learning Operations (MLOps) skills to help advance our mission of revolutionizing healthcare. With an initial focus on breast cancer screening, we will build mammography-based machine learning (ML) solutions that accurately predict the risk of cancer, personalize the plan of care, cultivate trust, and save lives. We want to expand our team with someone who shares our appreciation for the rapidly evolving power of big data and machine learning, and who enjoys bringing state-of-the-art applications to production to solve novel real-world healthcare problems.

The successful candidate will have a deep understanding of data architecture, machine learning deployment pipelines, and cloud computing technologies. He or she will be responsible for designing and implementing robust, scalable systems for data and model maintenance within an AWS cloud ecosystem, supporting the training and deployment of machine learning models in production. The ideal candidate has excellent problem-solving and analytical skills and is a team player and a highly motivated, detail-oriented self-starter with demonstrated ownership, accountability, and commitment to high-quality deliverables.

Founded in 2020 by Santé Ventures and Dr. Connie Lehman, the Head of Breast Imaging at Massachusetts General Hospital, Clairity is based in Austin, TX and has raised ~$29 million in funding to date. The position can be based anywhere in the United States and may be performed remotely from home.

Primary Responsibilities

  • Design, build, and manage data pipelines to support machine learning workflows.
  • Develop and maintain a robust data architecture to support our growing data needs.
  • Ensure data governance and compliance with relevant standards and regulations.
  • Work with our AWS managed services to ensure optimal performance, cost-effectiveness, and security.
  • Manage the deployment, monitoring, maintenance, and continuous improvement of our machine learning models in production.
  • Design, implement and document standardized ML processes for ML Lifecycle Management, Model Versioning and Iteration, Model Monitoring and Management, Model Governance, Model Security, and Model Discovery.

Requirements (Essential)

  • 5+ years in a data engineering role with a focus on machine learning or data science.
  • 5+ years of experience with AWS managed services.
  • 3+ years of experience with MLOps tools and methodologies, including CI/CD, Continuous Training, Serving, and Monitoring pipelines for ML solutions in different environments such as dev, test, staging, pre-production, and production.
  • 3+ years of experience developing ML batch and online prediction workflows.
  • Experience implementing ML model monitoring in production.
  • Experience with version control for ML models (code, data, config, model) and with model registries.
  • Practical experience of the following technologies:
    – Data warehouse: Spark, Snowflake, Databricks.
    – Relational databases: SQL.
    – ETL: AWS Athena, AWS Glue.
    – AWS Managed services: S3, Redshift, RDS, EMR, Lambda, SageMaker.
    – ML: TensorFlow, SageMaker Pipelines, SageMaker Studio, TensorFlow Serving.
    – Pipeline orchestration: Airflow, MLflow, Dask.
    – Model deployment: GitHub, Docker, Kubernetes.
    – Model monitoring: AWS CloudWatch, Grafana, Prometheus, Elastic.
    – Scripting languages: Python, SQL.

Requirements (Preferred)

  • Experience developing medical imaging data pipelines.
  • Knowledge of medical imaging and healthcare data standards: DICOM, HL7.
  • Familiarity with ML-specific monitoring tools such as Fiddler, Arize, or WhyLabs is a plus.
  • Experience with progressive delivery techniques such as A/B testing, canary deployments, and multi-armed bandits.


  • A Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field. A Master’s degree is a plus.
  • AWS Certified Big Data or AWS Certified Machine Learning would be a strong asset.

For More Information, Please Send Resume To:

To apply for this job email your details to