TMED-1 small dataset (aka TMED-18-18) | Tufts Medical Echocardiogram Dataset (TMED)

To assess the promise of learning from very few labels, we make available a smaller subset of our full-size TMED-1 dataset.

This small dataset release is a strict subset of the larger version (dated 2021-07-30).

The small dataset contains:

Labeled set for 54 total patients
- All patients have an aortic stenosis diagnostic label
- All images have a view label (one of PLAX/PSAX/other; for more, see our view label primer)
- We partition these patients into several different "splits" of 18 training / 18 validation / 18 test studies
Unlabeled set of 378 total patients

This dataset is introduced in our MLHC 2021 paper as the TMED-18-18 dataset, because models are trained on data from 18 labeled patients and each held-out set (validation and test) contains data from 18 patients.

More details about this smaller version of the TMED dataset are summarized here:

Dataset	Num. Patients	Num. Images Split 1	Num. Images Split 2	Num. Images Split 3
small labeled train	18	1935	1805	1849
small labeled valid	18	1833	1940	2018
small labeled test	18	1958	1834	1957
small unlabeled	378	41281	41428	41183
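
As a quick sanity check on the table above, the following sketch sums the labeled image counts per split and compares them with the unlabeled counts. The numbers are copied directly from the table; the variable and function names are illustrative only and are not part of the dataset release.

```python
# Image counts copied from the table above, ordered as (Split 1, Split 2, Split 3).
counts = {
    "train": [1935, 1805, 1849],
    "valid": [1833, 1940, 2018],
    "test": [1958, 1834, 1957],
    "unlabeled": [41281, 41428, 41183],
}


def labeled_totals(counts):
    """Sum the train, valid, and test image counts for each split."""
    return [sum(triple) for triple in zip(counts["train"], counts["valid"], counts["test"])]


def overall_totals(counts):
    """Add the unlabeled image count to the labeled total for each split."""
    return [lab + unl for lab, unl in zip(labeled_totals(counts), counts["unlabeled"])]


def labeled_fractions(counts):
    """Fraction of all images in each split that carry labels."""
    return [lab / tot for lab, tot in zip(labeled_totals(counts), overall_totals(counts))]


print(labeled_totals(counts))   # [5726, 5579, 5824]
print(overall_totals(counts))   # [47007, 47007, 47007]
```

In every split, the labeled and unlabeled counts add up to the same total of 47007 images, and roughly 12% of those images are labeled.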

Dataset Format

The dataset is delivered as a shared folder on box.com to users who successfully Apply For Access.

The top-level directory contains:

labels, stored in comma-separated-value (CSV) plain-text files
images, stored within folders as 64x64 pixel grayscale PNG files
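
After downloading, it can be useful to confirm that an image file matches the stated format. The sketch below uses only the Python standard library to read the PNG header and check for 64x64 pixels. It assumes the grayscale images are encoded with PNG color type 0 (grayscale without alpha); the release may use a different encoding, in which case that check should be relaxed. The function names are illustrative and not part of any TMED tooling.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> dict:
    """Parse width, height, bit depth, and color type from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-15 hold the first chunk's length and type; IHDR must come first.
    _length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("missing IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }


def is_expected_image(data: bytes) -> bool:
    """True if the PNG is 64x64 and (by assumption) stored as color type 0."""
    header = read_png_header(data)
    return (header["width"], header["height"]) == (64, 64) and header["color_type"] == 0
```

To check a file on disk, read it in binary mode and pass the bytes in, for example `is_expected_image(open(path, "rb").read())`, where `path` points at any PNG inside the downloaded folder.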

The relevant label CSV files are:

- TMED-18-18_fold0.csv
- TMED-18-18_fold1.csv
- TMED-18-18_fold2.csv

Each CSV file has a row for each image file in the dataset, providing the relevant labels.
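
A minimal sketch for inspecting one of these files with the Python standard library is shown below. This page does not list the column names, so the function takes the column name as an argument and reports the available columns if it is missing; the helper name `count_label_values` and the column name in the usage note are hypothetical.

```python
import csv
from collections import Counter


def count_label_values(csv_path, column):
    """Count how often each value occurs in one column of a label CSV."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise KeyError(f"column {column!r} not found; available: {fieldnames}")
        # One row per image file, so each count is a number of images.
        return Counter(row[column] for row in reader)
```

For example, `count_label_values("TMED-18-18_fold0.csv", "view_label")` would tally images per view label, assuming a column with that name exists; check the actual header row first.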

For more details on how to interpret and use these files, see the Full Version Dataset Format description.