Our complete TMED-1 dataset (dated 2021-07-30) contains:
- Fully labeled set of studies from 260 total unique patients
  - All patients have an aortic stenosis (AS) diagnostic label (one of none, mild/moderate, or severe; for more, see our severity diagnosis label primer)
  - All images have a view label (one of PLAX, PSAX, or other; for more, see our view label primer)
  - We partition these patients into several different “splits” of 156 training / 52 validation / 52 test studies
- Partially labeled set of studies from 172 total unique patients
  - These studies have AS diagnosis labels, but no view labels
- Unlabeled set of studies from 2341 total unique patients
  - No labels are available for any studies in this dataset
This original dataset is referred to in our MLHC ’21 paper as the TMED-156-52 dataset, because models are trained on data from 156 labeled patients and each heldout set contains data from 52 patients.
We also study a smaller version of this dataset: TMED-1 small (aka TMED-18-18).
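The 156 / 52 / 52 partition of the fully labeled set can be illustrated with a minimal sketch. This is not the procedure used to produce the released splits: the patient identifiers, the `make_split` helper, and the use of plain seeded shuffling (rather than, say, stratifying by AS label) are all assumptions made for illustration.

```python
import random


def make_split(patient_ids, n_train=156, n_valid=52, n_test=52, seed=0):
    """Randomly partition patient IDs into train / valid / test lists.

    Hypothetical helper: the released TMED-1 splits may have been built differently.
    """
    ids = sorted(patient_ids)  # sort first so the result depends only on the seed
    if len(ids) != n_train + n_valid + n_test:
        raise ValueError("number of patients does not match the requested split sizes")
    rng = random.Random(seed)
    rng.shuffle(ids)
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }


# Made-up identifiers standing in for the 260 fully labeled patients.
patient_ids = [f"patient_{i:03d}" for i in range(260)]

# "Several different splits": one per seed.
splits = [make_split(patient_ids, seed=s) for s in range(3)]
```

Because each patient lands in exactly one of the three lists, no patient's images can leak from training into a heldout set within a given split.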
Summary Table
Summary statistics of TMED-1 data, aka TMED-156-52
| Dataset | Num. Patients | Num. Images |
|---|---|---|
| fully labeled set | 260 (156 train / 52 valid. / 52 test) | 27788 |
| partially labeled set | 172 | 19219 |
| unlabeled set | 2341 | 271474 |
Differences from Reported Experiments in MLHC ’21 Paper
Note: The TMED-1 dataset described here differs slightly from the datasets used in our MLHC ’21 paper.
To improve quality, we removed several studies that were originally included in the partially labeled and unlabeled sets in our MLHC ’21 manuscript. We removed these studies because, despite our best early efforts, they were later found to be from the same patient (though not necessarily the same imaging study) as some data in our labeled set. In TMED-1, all studies are guaranteed to be from distinct patients, which should simplify analysis and minimize confusion.
Brief summary of the changes:
- No changes were made to the paper’s labeled set
- 130 studies were removed from the paper’s unlabeled set
- 2 studies were removed from the paper’s partially-labeled set (treated as unlabeled in all SSL experiments)
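The guarantee that every study comes from a distinct patient can be checked with a short sketch like the one below. The patient identifiers and the `find_shared_patients` helper are hypothetical; how patient IDs are actually stored in the release is not assumed here.

```python
from itertools import combinations


def find_shared_patients(sets_by_name):
    """Return {(set_a, set_b): shared_ids} for every pair of sets that overlap."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sets_by_name.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy example with made-up IDs: "p002" appears in two sets, so it is flagged.
example = {
    "labeled": ["p001", "p002"],
    "partially_labeled": ["p003"],
    "unlabeled": ["p002", "p004"],
}
overlaps = find_shared_patients(example)
```

An empty result means no patient appears in more than one set, which is the property TMED-1 is meant to satisfy.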
Image preprocessing
Every image in this dataset is a transthoracic echocardiogram (TTE) image stored in PNG format at 64x64 pixel resolution.
For each included study, we kept all available images after filtering by aspect ratio to discard non-2D images (see App. C of our MLHC paper for details). From each cineloop file, we chose exactly one image to analyze. Clinical collaborators suggested that any single frame could be used, so we took the first frame of each cineloop. The resulting data contained both color and grayscale images at various resolutions. We converted each image to grayscale, padded it along its shorter axis to achieve a square aspect ratio, and resized it to 64x64 pixels.
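The three steps above (grayscale conversion, padding to square, resizing) can be sketched in pure Python on nested lists. This is only an illustration, not the pipeline used to build TMED-1. The luma weights, zero-valued centered padding, and nearest-neighbor resampling are all assumptions, since the text does not specify them, and a real pipeline would more likely use an image library.

```python
def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples to rows of 0-255 integers.

    Assumption: ITU-R BT.601 luma weights; the source does not say which were used.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def pad_to_square(gray, fill=0):
    """Pad the shorter axis so height == width.

    Assumption: padding is centered and filled with zeros (black).
    """
    height, width = len(gray), len(gray[0])
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    out = [[fill] * side for _ in range(side)]
    for y, row in enumerate(gray):
        out[top + y][left:left + width] = row
    return out


def resize_nearest(square, size=64):
    """Resize a square image to size x size.

    Assumption: nearest-neighbor sampling, chosen only to keep the sketch short.
    """
    n = len(square)
    return [
        [square[(y * n) // size][(x * n) // size] for x in range(size)]
        for y in range(size)
    ]


def preprocess(pixels, size=64):
    """Grayscale, pad to square, then resize, in that order."""
    return resize_nearest(pad_to_square(to_grayscale(pixels)), size)


# Usage: a made-up all-white frame, 100 rows by 200 columns.
wide_frame = [[(255, 255, 255)] * 200 for _ in range(100)]
small = preprocess(wide_frame)
```

Padding before resizing preserves the original aspect ratio of the anatomy, at the cost of blank bands along the shorter axis.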