Project Overview

Can you learn to diagnose the severity of aortic stenosis (AS), a common valve disease, from ultrasound images of the heart (echocardiograms)?

We have released the Tufts Medical Echocardiogram Dataset (TMED), a new clinically motivated benchmark dataset for computer vision and machine learning from limited labeled data. This dataset is designed to be an authentic assessment of semi-supervised learning (SSL) methods that train classifiers from a small, hard-to-acquire labeled dataset and a much larger (but easier to acquire) unlabeled set.


Announcements

Aug. 2021: Paper accepted at MLHC 2021

Our publication is now available as an Early Access PDF and a 3-minute spotlight video.

July 2021: Initial Release of Dataset

We accept applications from any researcher to use this data for academic purposes. Please Apply for Access [LICENSE]

Clinical Motivation

Our motivating task is to improve timely diagnosis and treatment of aortic stenosis (AS), a common degenerative cardiac valve condition. If left untreated, severe AS has lower 5-year survival rates than several metastatic cancers (Howlader et al., 2020; Clark et al., 2012). With timely diagnosis, AS becomes a treatable condition via surgical or transcatheter aortic valve replacement with very low mortality (Lancellotti et al., 2018).

AS is a particularly important condition where automation holds substantial promise. There is evidence that many patients with severe AS are not treated (Tang et al., 2018; Brennan et al., 2019), and there are disparities in access to care that must be addressed (Alkhouli et al., 2019). Automated screening for AS can increase referral and treatment rates for patients with this life-threatening condition.

We hope this dataset release catalyzes research in two directions:

  • Deployable automatic preliminary screening and early detection of cardiac disease, especially expanding access to patients who live in areas without expert cardiologists (but where ultrasound imaging would still be possible)

  • Improved ML methodology for learning from limited labeled data. For our use case and many others, acquiring appropriate labels from expert clinicians is expensive and time-consuming. Our dataset deliberately supports semi-supervised methods that can learn simultaneously from a small labeled dataset and a large unlabeled dataset that is much easier to collect.

Dataset Summary

The TMED dataset contains transthoracic echocardiogram (TTE) imagery acquired in the course of routine care consistent with American Society of Echocardiography (ASE) guidelines, all obtained from 2015-2020 at Tufts Medical Center.

When gathering echocardiogram imagery for a patient, a sonographer manipulates a handheld transducer over the patient’s chest, manually choosing different acquisition angles in order to fully assess the heart’s complex anatomy. This imaging process results in multiple cineloop video clips of the heart depicting various anatomical views (see example view types below). We extract one still image from each available video clip, so each patient study is represented in our dataset as multiple images (typically 50-100).

In routine care, neither view labels nor diagnostic labels are captured and stored at the time of imaging. View labels are never annotated as part of routine practice. Diagnostic labels for aortic stenosis (AS), along with many other observations about heart health, are applied hours or days after a study by an expert clinician, who aggregates information from the many videos and images captured during the echocardiogram study. These diagnostic severity ratings are entered into a human-readable report document stored within that patient’s electronic medical record. For logistical reasons, it is difficult to extract that information into a machine-readable format.

For this study, we invested significant annotation effort to gather appropriate view labels for a subset of the data, as well as significant manual effort to extract diagnostic labels from existing medical records.

Below, we summarize the 3 kinds of annotated data we have collected and released:

Fully-labeled dataset: Complete studies from 260 patients

In this set, we include:

  • All images collected during echocardiography (~100 images/patient-study)

  • Each patient has an associated AS severity level (3 possible levels: none/mild-to-moderate/severe). This categorical label has been assigned by a board-certified cardiologist. These labels were pulled from the patient’s medical record in a manually intensive process.

  • Each image is annotated with 1 of 3 possible view types, provided by experts specifically for this study using a custom labeling tool. These view types are a subset of the many possible view types, chosen because they are relevant for diagnosing valve diseases like AS.

Partially-labeled dataset: Studies with diagnosis labels from 172 patients

In this set, we include:

  • All images collected during echocardiography (~100 images/patient-study)

  • Each patient has an associated AS severity level, as described above.

  • However, no view labels are available for any individual image.

(Views were much harder for us to acquire than diagnostic labels, hence the existence of this partially-labeled set.)

Unlabeled dataset: Studies without any annotations from 2341 patients

In this set, we make available only the images collected via echocardiography from 2341 patients. Neither view nor diagnostic labels are available for these studies, representing what is directly available in the electronic medical record immediately after imaging a patient.

We stress that this unlabeled set is uncurated, as explained in our manuscript, which makes it a much more authentic test of the use of unlabeled examples to improve classification than alternative datasets like SVHN or CIFAR-10, where often an “unlabeled” set is created by “forgetting” known labels.
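To make the three subsets concrete, here is a minimal Python sketch of how a study and its optional labels might be represented in memory. It is an illustration only: the names `Study`, `Severity`, and `subset_of` are hypothetical and do not describe the released file format or any official loader.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Severity(Enum):
    """The 3 study-level AS severity levels described above."""
    NONE = "none"
    MILD_TO_MODERATE = "mild-to-moderate"
    SEVERE = "severe"


@dataclass
class Study:
    """One patient study: a set of still images plus whatever labels exist."""
    patient_id: str                       # hypothetical identifier
    image_ids: List[str]                  # one still image per video clip
    severity: Optional[Severity] = None   # study-level diagnosis, if extracted
    # image_id -> "PLAX", "PSAX", or "Other"; empty when no views were annotated
    view_labels: Dict[str, str] = field(default_factory=dict)


def subset_of(study: Study) -> str:
    """Name the subset a study would belong to, based on which labels it has.

    Assumption: a study with a diagnosis but incomplete view labels is
    treated as partially labeled.
    """
    if study.severity is None:
        return "unlabeled"
    every_image_has_view = bool(study.image_ids) and all(
        image_id in study.view_labels for image_id in study.image_ids
    )
    return "fully-labeled" if every_image_has_view else "partially-labeled"
```

For example, a study with a severity level and a view label on every image maps to "fully-labeled", while a study with images only maps to "unlabeled".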

Classification Tasks

Our dataset supports two clinically-meaningful tasks: view classification and severity diagnosis classification.

Task 1: Classify the view of an image

In echocardiography, many canonical view types are possible, each displaying distinct aspects of the heart’s complex anatomy.

As part of routine clinical care, the sonographer intentionally captures a specific view with each image, but the view type is not annotated on the image or recorded in the electronic record. Thus, from raw data alone (remember, each study contains roughly 100 images), it is difficult to focus on a specific anatomical view of interest.

For our goal of supporting diagnosis of aortic stenosis, two kinds of views are particularly relevant: parasternal long axis (PLAX) and parasternal short axis (PSAX). Both PLAX and PSAX views are used in the routine clinical assessment of aortic valve disease, because the aortic valve’s structure and function are visible.

We have thus obtained 3 possible labels for each image in our dataset: PLAX, PSAX, or Other (a super-category that contains many possible other views not PLAX or PSAX).
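The 3-way label space can be sketched in a few lines of Python. This is a hedged illustration, not the labeling tool used for this study: the helper names are hypothetical, and raw view names such as "A4C" are assumed inputs used only to show how any view that is not PLAX or PSAX collapses into the Other super-category.

```python
# Fixed class order, so each label maps to a stable integer index for a classifier.
VIEW_LABELS = ("PLAX", "PSAX", "Other")


def to_view_label(raw_view: str) -> str:
    """Collapse an arbitrary view name into one of the 3 labels."""
    name = raw_view.strip().upper()
    return name if name in ("PLAX", "PSAX") else "Other"


def to_class_index(raw_view: str) -> int:
    """Integer class index for the collapsed label."""
    return VIEW_LABELS.index(to_view_label(raw_view))
```

Under these assumptions, `to_view_label("A4C")` yields "Other", and `to_class_index("PSAX")` yields 1.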

We show several high-resolution examples of each of the 3 possible view types below.

[Figures: two example images for each of the 3 view types: PLAX, PSAX, and Other]

Task 2: Classify the diagnostic severity level of a patient

Our ultimate goal is automated preliminary screening of aortic stenosis (AS), which would improve early detection of this life-threatening disease.

Toward this goal, our diagnosis task requires aggregating predictions across many images of the same patient’s heart (using ~100 images) to make a coherent prediction for that individual.

This task mimics how cardiologists make real AS diagnoses in practice: they have access to ~100 images captured by the sonographer, each of varying signal quality as well as representing different view types. The cardiologist needs to identify which images are relevant (show relevant anatomical views with appropriate quality) and then look for key signs of disease in these relevant images to determine the appropriate severity diagnosis (none, mild/moderate, or severe).

For each patient annotated in our dataset, the expert-assigned diagnostic label indicates one of 3 possible severity levels:

  • no AS
  • mild/moderate AS
  • severe AS
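One simple way to turn per-image predictions into a single patient-level prediction is a relevance-weighted average, sketched below in Python. This is one possible aggregation strategy under stated assumptions, not necessarily the method used in our manuscript: it assumes an image-level classifier has already produced 3 severity probabilities per image, and that each image has a relevance weight (for example, a predicted probability that the image shows a PLAX or PSAX view). The function names are hypothetical.

```python
from typing import List, Sequence

SEVERITY_LEVELS = ("no AS", "mild/moderate AS", "severe AS")


def aggregate_study(
    diagnosis_probs: Sequence[Sequence[float]],
    view_relevance: Sequence[float],
) -> List[float]:
    """Relevance-weighted average of per-image severity probabilities.

    diagnosis_probs[i]: 3 probabilities for image i, one per severity level.
    view_relevance[i]:  non-negative weight for image i (assumed input).
    """
    if not diagnosis_probs:
        raise ValueError("a study needs at least one image")
    if len(diagnosis_probs) != len(view_relevance):
        raise ValueError("need exactly one relevance weight per image")
    weights = list(view_relevance)
    total = sum(weights)
    if total <= 0:
        # No image looks relevant: fall back to a plain unweighted average.
        weights = [1.0] * len(diagnosis_probs)
        total = float(len(weights))
    return [
        sum(w * probs[k] for w, probs in zip(weights, diagnosis_probs)) / total
        for k in range(len(SEVERITY_LEVELS))
    ]


def predict_severity(
    diagnosis_probs: Sequence[Sequence[float]],
    view_relevance: Sequence[float],
) -> str:
    """Return the severity level with the highest aggregated probability."""
    probs = aggregate_study(diagnosis_probs, view_relevance)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SEVERITY_LEVELS[best]
```

The weighting loosely mirrors the clinical workflow described above: images judged irrelevant contribute little to the final decision, while images showing the aortic valve dominate it.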

In the future, we intend to look at finer-grained severity levels.


Bibliography

Alkhouli, M., Holmes, D. R., Carroll, J. D., Li, Z., Inohara, T., Kosinski, A. S., Szerlip, M., Thourani, V. H., Mack, M. J., et al. (2019). Racial Disparities in the Utilization and Outcomes of TAVR: TVT Registry Report. JACC: Cardiovascular Interventions, 12(10), 936–948.
Brennan, M. J., Bryant, A., Boero, I., Wang, T., Vemulapalli, S., Thourani, V., & Peterson, E. (2019). Provider-level variability in the treatment of patients with severe symptomatic aortic valve stenosis. Journal of the American College of Cardiology, 73(9 (Supplement 1)), 1949.
Clark, M. A., Arnold, S. V., Duhay, F. G., Thompson, A. K., Keyes, M. J., Svensson, L. G., Bonow, R. O., Stockwell, B. T., Cohen, D. J., et al. (2012). Five-year clinical and economic outcomes among patients with medically managed severe aortic stenosis: Results from a Medicare claims analysis. Circulation: Cardiovascular Quality and Outcomes, 5(5), 697–704.
Howlader, N., Noone, A. M., Krapcho, M., Miller, D., Brest, A., Yu, M., Ruhl, J., Tatalovich, Z., Mariotto, A., et al. (2020). SEER Cancer Statistics Review, 1975-2017. National Cancer Institute.
Lancellotti, P., Magne, J., Dulgheru, R., Clavel, M.-A., Donal, E., Vannan, M. A., Chambers, J., Rosenhek, R., Habib, G., et al. (2018). Outcomes of Patients With Asymptomatic Aortic Stenosis Followed Up in Heart Valve Clinics. JAMA Cardiology, 3(11), 1060–1068.
Tang, L., Gössl, M., Ahmed, A., Garberich, R., Bradley, S. M., Niikura, H., Witt, D., Pedersen, W. R., Bae, R., et al. (2018). Contemporary Reasons and Clinical Outcomes for Patients With Severe, Symptomatic Aortic Stenosis Not Undergoing Aortic Valve Replacement. Circulation: Cardiovascular Interventions, 11(12).