To assess the promise of learning from very few labels, we provide a smaller subset of our full-size TMED-1 dataset.
This small dataset release is a strict subset of the larger version (dated 2021-07-30).
The small dataset contains:
- Labeled set for 54 total patients
  - All patients have an aortic stenosis diagnostic label
  - All images have a view label (one of PLAX/PSAX/other; for more, see our view label primer)
  - We partition these patients into several different "splits" of 18 training / 18 validation / 18 test studies
- Unlabeled set of 378 total patients
This dataset is introduced in our MLHC 2021 paper as the TMED-18-18 dataset, because models are trained on data from 18 labeled patients and each held-out set contains data from 18 patients.
More details about this smaller version of the TMED dataset are summarized here:
Dataset | Num. Patients | Num. Images Split 1 | Num. Images Split 2 | Num. Images Split 3 |
---|---|---|---|---|
small labeled train | 18 | 1935 | 1805 | 1849 |
small labeled valid | 18 | 1833 | 1940 | 2018 |
small labeled test | 18 | 1958 | 1834 | 1957 |
small unlabeled | 378 | 41281 | 41428 | 41183 |
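As a quick sanity check on the table above, the following sketch adds up the image counts per split and reports what fraction of images carry labels. The numbers are copied from the table; the dictionary layout and the `summarize` helper are illustrative names, not part of the dataset release.

```python
# Image counts copied from the table above (split -> subset -> number of images).
IMAGE_COUNTS = {
    "Split 1": {"train": 1935, "valid": 1833, "test": 1958, "unlabeled": 41281},
    "Split 2": {"train": 1805, "valid": 1940, "test": 1834, "unlabeled": 41428},
    "Split 3": {"train": 1849, "valid": 2018, "test": 1957, "unlabeled": 41183},
}

# Patient counts are the same for every split.
PATIENT_COUNTS = {"train": 18, "valid": 18, "test": 18, "unlabeled": 378}


def summarize(counts):
    """Return labeled total, overall total, and labeled fraction for one split."""
    labeled = sum(n for name, n in counts.items() if name != "unlabeled")
    total = labeled + counts["unlabeled"]
    return {"labeled": labeled, "total": total, "labeled_fraction": labeled / total}


for split_name, counts in IMAGE_COUNTS.items():
    s = summarize(counts)
    print(f"{split_name}: {s['labeled']} labeled of {s['total']} images "
          f"({s['labeled_fraction']:.1%})")
```

With the table's numbers, the labeled totals are 5726, 5579, and 5824 images, and every split sums to 47007 images, so roughly 12% of images in each split are labeled.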
Dataset Format
The dataset is delivered as a shared folder on box.com to users who successfully Apply For Access.
The top-level directory contains:
- labels, stored in comma-separated-value (CSV) plain-text files
- images, stored within folders as 64x64 pixel grayscale PNG files
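After downloading, it can be useful to confirm that an image file matches the stated format. The sketch below reads only the PNG header using the Python standard library and checks for a 64x64 grayscale image. The function names are hypothetical, and treating "grayscale" as PNG color type 0 is an assumption about how the files were encoded; adjust the check if your copies differ.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(path):
    """Read width, height, bit depth, and color type from a PNG's IHDR chunk."""
    with open(path, "rb") as f:
        head = f.read(26)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type (must be IHDR).
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a valid PNG file")
    # Bytes 16-25: width and height (big-endian uint32), bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", head[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


def looks_like_small_grayscale(path, size=64):
    """Check for a size x size image with PNG color type 0 (grayscale, no alpha)."""
    header = read_png_header(path)
    return (header["width"] == size
            and header["height"] == size
            and header["color_type"] == 0)
```

Because only the first 26 bytes are read, this check is cheap enough to run over every file in the image folders before training.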
The relevant label CSV files are:
- TMED-18-18_fold0.csv
- TMED-18-18_fold1.csv
- TMED-18-18_fold2.csv
Each CSV file has a row for each image file in the dataset, providing the relevant labels.
For more details on how to interpret and use these files, see the Full Version Dataset Format description.
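A minimal sketch of a first look at one of the label files, using only the Python standard library. The column names are not listed on this page, so the code reads them from the file's header row rather than assuming them; `load_label_rows` and `count_values` are illustrative helper names.

```python
import csv
from collections import Counter


def load_label_rows(csv_path):
    """Read a label CSV into a list of dicts, one per image row."""
    # utf-8-sig tolerates an optional byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def count_values(rows, column):
    """Count how many rows take each value in the given column."""
    if rows and column not in rows[0]:
        raise KeyError(f"column {column!r} not found; available: {list(rows[0])}")
    return Counter(row[column] for row in rows)


# Example usage (the path is wherever you saved the shared folder):
# rows = load_label_rows("TMED-18-18_fold0.csv")
# print(len(rows), "image rows; columns:", list(rows[0]))
# print(count_values(rows, list(rows[0])[-1]))  # tally of the last column
```

Printing the header first shows which columns hold the split assignment and the labels, and the per-value tallies can then be compared against the image counts in the table above.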