TMED-1 small dataset (aka TMED-18-18)


To assess the promise of learning from very few labels, we make available a smaller subset of our full-size TMED-1 dataset.

This small dataset release is a strict subset of the larger version (dated 2021-07-30).

The small dataset contains:

  • Labeled set for 54 total patients
    • All patients have an aortic stenosis diagnostic label
    • All images have a view label (one of PLAX, PSAX, or other; for more, see our view label primer)
    • We partition these patients into three different "splits", each with 18 training / 18 validation / 18 test studies
  • Unlabeled set of 378 total patients

This dataset is introduced in our MLHC 2021 paper as the TMED-18-18 dataset, because models are trained on data from 18 labeled patients and each held-out set (validation or test) contains data from 18 patients.

More details about this smaller version of the TMED dataset are summarized here:

Dataset               Num. Patients   Num. Images (Split 1)   Num. Images (Split 2)   Num. Images (Split 3)
small labeled train   18              1935                    1805                    1849
small labeled valid   18              1833                    1940                    2018
small labeled test    18              1958                    1834                    1957
small unlabeled       378             41281                   41428                   41183
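
As a quick sanity check on the counts above, the following minimal Python sketch sums the image counts per split. The numbers are copied from the table; the dictionary and function names are illustrative only and are not part of the dataset release.

```python
# Image counts copied from the table above, as (split 1, split 2, split 3).
IMAGE_COUNTS = {
    "small labeled train": (1935, 1805, 1849),
    "small labeled valid": (1833, 1940, 2018),
    "small labeled test": (1958, 1834, 1957),
    "small unlabeled": (41281, 41428, 41183),
}


def totals_per_split(counts):
    """Sum the image counts down each split column."""
    return tuple(sum(column) for column in zip(*counts.values()))


def labeled_totals_per_split(counts):
    """Sum only the labeled rows (train + valid + test) for each split."""
    labeled = [v for name, v in counts.items() if "unlabeled" not in name]
    return tuple(sum(column) for column in zip(*labeled))


print(labeled_totals_per_split(IMAGE_COUNTS))  # (5726, 5579, 5824)
print(totals_per_split(IMAGE_COUNTS))          # (47007, 47007, 47007)
```

The labeled and unlabeled counts vary from split to split, but every split sums to the same 47007 images.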

Dataset Format

The dataset is delivered as a shared folder on box.com to users who successfully Apply For Access.

The top-level directory contains:

  • labels, stored in comma-separated-value (CSV) plain-text files
  • images, stored within folders as 64x64 pixel grayscale PNG files
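
To confirm that a downloaded image matches the format described above, one option is to read the PNG header directly with the Python standard library. This is a minimal sketch: the function names are hypothetical, and it assumes the files are saved with PNG color type 0 (plain grayscale). Files saved as grayscale with alpha, palette, or RGB would need a looser check.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(path):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    with open(path, "rb") as f:
        # 8-byte signature + 4-byte chunk length + 4-byte chunk type
        # + the first 10 bytes of IHDR data.
        head = f.read(26)
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # Big-endian: width (4 bytes), height (4), bit depth (1), color type (1).
    return struct.unpack(">IIBB", head[16:26])


def matches_described_format(path):
    """True if the file is a 64x64 PNG with color type 0 (assumed grayscale)."""
    width, height, _bit_depth, color_type = read_png_header(path)
    return (width, height) == (64, 64) and color_type == 0
```

Calling `matches_described_format` on each PNG under the images folder is a cheap way to catch truncated or mis-sized downloads before training.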

The relevant label CSV files are:

  • TMED-18-18_fold0.csv
  • TMED-18-18_fold1.csv
  • TMED-18-18_fold2.csv

Each CSV file has a row for each image file in the dataset, providing the relevant labels.
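
Because each row corresponds to one image, the label files can be inspected with Python's standard `csv` module. The sketch below is a hedged example, not the official loading code: this page does not list the column names, so the helper takes the column name as a parameter, and both function names are hypothetical.

```python
import csv
from collections import Counter


def list_header(csv_path):
    """Return the column names from the first row of a label CSV file."""
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))


def count_rows_by_column(csv_path, column):
    """Count how many rows (one per image) share each value of `column`."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not found in header {reader.fieldnames}")
        return Counter(row[column] for row in reader)
```

A reasonable first step is to call `list_header("TMED-18-18_fold0.csv")` to discover the actual column names, then pass one of them to `count_rows_by_column` to tally images per label or per partition.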

For more details on how to interpret and use these files, see the Full Version Dataset Format description.