Our complete TMED-2 dataset release (dated 2022-07-12) contains three components:
- view_and_diagnosis_labeled_set: 599 studies from 577 unique patients (some patients have multiple studies on distinct days).
  - All patients have an aortic stenosis (AS) diagnostic label (none, early AS, or significant AS; for more, see our severity diagnosis label primer)
  - Some images from each study have view label annotations (one of PLAX/PSAX/A2C/A4C/other; for more, see our view label primer)
  - We partition these by patient into different "splits" of 360 training / 119 validation / 120 test studies.
- view_labeled_set: 705 studies from 703 unique patients
  - These studies have view labels, but no AS diagnosis labels
- unlabeled_set: 5486 studies from 5287 patients
  - No labels are available for any studies in this set
This TMED-2 dataset is referred to in some of our manuscripts as the DEV479 dataset, because models are trained on a development set of 479 studies (360 for training and 119 for validation). The held-out test set contains 120 studies.
Summary Table
Summary statistics of our released TMED-2 dataset
Dataset | Num. Patients | Num. Studies | Num. Labeled Images | Num. Unlabeled Images |
---|---|---|---|---|
fully labeled set | 577 | 599 | 17270 | 26596 |
partially labeled set | 703 | 705 | 7694 | 37576 |
unlabeled set | 5287 | 5486 | 0 | 353500 |
Image preprocessing
Every image in this dataset is a 2D TTE image stored at 112x112 pixel resolution in PNG format.
In TMED-2, we used metadata available in the raw DICOM files to ensure that only the 2D TTE images from each study are included (filtering out Doppler images, M-mode images, and colorflow images). Note that this is more aggressive preprocessing than in TMED-1 (where we did some filtering by aspect ratio, but this may not have discarded all Doppler images, M-mode images, or colorflow images).
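As a quick sanity check of the 112x112 PNG storage format described above, the following minimal sketch reads the width, height, bit depth, and color type from the IHDR chunk of a PNG file using only the Python standard library. The function name `png_header` is hypothetical and not part of any TMED-2 tooling; the byte layout follows the PNG specification.

```python
import struct

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header(data):
    """Return (width, height, bit_depth, color_type) from the first 26 bytes of a PNG.

    Hypothetical helper for illustration only. Layout: 8-byte signature,
    4-byte chunk length, 4-byte chunk type ("IHDR"), then the IHDR fields.
    """
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("data does not start with a valid PNG header")
    # Width and height are big-endian unsigned 32-bit integers,
    # followed by one byte each for bit depth and color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, color_type
```

For a downloaded image, a call such as `png_header(open(path, "rb").read(26))` would be expected to report a width and height of 112. In the PNG specification a color type of 0 means grayscale, but the exact encoding of the released files is not stated here, so treat that field as informational.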
Dataset Format
The dataset is delivered as a shared folder on box.com to users who successfully Apply For Access.
The top-level directory contains:
- labels, stored in comma-separated-value (CSV) plain-text files
- images, stored within folders as 112x112 pixel grayscale PNG files
Labels and other metadata
Labels and assignments of the labeled set to different train/validation/test splits are stored in the following CSV files in the top-level directory.
- labels_per_image.csv
- TMED2_fold0_labeledpart.csv
- TMED2_fold1_labeledpart.csv
- TMED2_fold2_labeledpart.csv
- TMED2_fold0_unlabeledpart.csv
- TMED2_fold1_unlabeledpart.csv
- TMED2_fold2_unlabeledpart.csv
- TMED2_train_unlabeled.csv
Each CSV file has a row for each image file in the dataset, providing the relevant labels.
The specs for each type of CSV file are below:
Spec for labels_per_image.csv
CSV file with one row per image. Columns include:
- query_key: Filename of specific image, as a string. Example: "2977s1_0.png". See below for explanation of the [PatientID]s[StudyID]_[ImageID].png naming convention.
- view_label: View label, as a string. Options: {"PLAX", "PSAX", "A4C", "A2C", "A4CorA2CorOther"}
- diagnosis_label: Diagnostic severity label, as a string. Options: {"no_as", "mild_as", "mildtomoderate_AS", "moderate_AS", "severe_AS", "Not_Provided"}
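To illustrate how the columns above might be consumed, here is a minimal sketch that tallies the view and diagnosis labels in labels_per_image.csv with the standard-library `csv` module. The function name `count_labels` is hypothetical, and the sketch assumes the file has a header row with exactly the column names listed above.

```python
import csv
from collections import Counter


def count_labels(lines):
    """Tally view_label and diagnosis_label values.

    `lines` is any iterable of CSV text lines, such as an open file handle.
    Hypothetical helper; assumes a header row naming the columns.
    """
    view_counts = Counter()
    diagnosis_counts = Counter()
    for row in csv.DictReader(lines):
        view_counts[row["view_label"]] += 1
        diagnosis_counts[row["diagnosis_label"]] += 1
    return view_counts, diagnosis_counts
```

With the dataset downloaded, this could be called as `count_labels(open("labels_per_image.csv", newline=""))`, where the path is an assumption about where the shared folder was saved.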
Spec for TMED2_foldX_labeledpart.csv
CSV file with one row per image. Integer X denotes the specific train/valid/test split, and could take values in {0, 1, 2} (these splits correspond exactly to the 3 splits used in our paper's experiments). Columns include the following:
- query_key: Filename of specific image, as a string. Example: "2977s1_0.png". See below for explanation of the [PatientID]s[StudyID]_[ImageID].png naming convention.
- view_classifier_split: String that indicates which standard data split for the image-level view classifier this image belongs to within fold X. Options: {"train", "val", "test"}.
- diagnosis_classifier_split: String that indicates which standard data split for the image-level diagnosis classifier this image belongs to within fold X. Options: {"train", "val", "test", "not_used"}.
- view_label: View label, as defined above.
- diagnosis_label: Diagnostic severity label, as defined above.
- SourceFolder: Folder in which the image is located
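The split columns above can be used to pull out the images for one partition of a fold. The sketch below is one possible way to do that with the standard-library `csv` module; `select_split` and its `task` parameter are hypothetical names introduced for illustration, and the code assumes a header row with the column names listed above.

```python
import csv

VALID_TASKS = ("view", "diagnosis")


def select_split(lines, split, task="diagnosis"):
    """Return (query_key, label) pairs for one split of a fold CSV.

    `lines` is any iterable of CSV text lines from a
    TMED2_foldX_labeledpart.csv file. `task` picks which pair of
    columns to use: "<task>_classifier_split" and "<task>_label".
    Hypothetical helper for illustration.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}")
    split_column = f"{task}_classifier_split"
    label_column = f"{task}_label"
    return [
        (row["query_key"], row[label_column])
        for row in csv.DictReader(lines)
        if row[split_column] == split
    ]
```

For example, `select_split(open("TMED2_fold0_labeledpart.csv", newline=""), "train")` would list the diagnosis-classifier training images of fold 0, with the path again being an assumption. Rows marked "not_used" for the diagnosis classifier are skipped automatically unless that value is requested as the split.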
Spec for TMED2_foldX_unlabeledpart.csv
CSV file with one row per image. Integer X denotes the specific train/valid/test split, and could take values in {0, 1, 2} (these splits correspond exactly to the 3 splits used in our paper's experiments). Columns include the following:
- query_key: Filename of specific image, as a string. Example: "2977s1_0.png". See below for explanation of the [PatientID]s[StudyID]_[ImageID].png naming convention.
- SourceFolder: Folder in which the image is located
Images
Images are stored within a hierarchy of folders representing the sets that comprise TMED-2:
view_and_diagnosis_labeled_set/labeled/
view_and_diagnosis_labeled_set/unlabeled/
view_labeled_set/labeled/
view_labeled_set/unlabeled/
unlabeled_set/unlabeled/
Each set's labeled/ subfolder contains only images with view labels.
Each set's unlabeled/ subfolder contains only images without any view labels.
The individual image files are stored as 112x112 pixel grayscale PNG files within the appropriate folder.
For example, the fully-labeled set looks like:
- view_and_diagnosis_labeled_set/labeled/2977s1_0.png
- view_and_diagnosis_labeled_set/labeled/2977s1_1.png
- view_and_diagnosis_labeled_set/labeled/2977s1_2.png
...
- view_and_diagnosis_labeled_set/labeled/2977s1_19.png
- view_and_diagnosis_labeled_set/labeled/1907s2_0.png
- view_and_diagnosis_labeled_set/labeled/1907s2_1.png
- view_and_diagnosis_labeled_set/labeled/1907s2_2.png
...
- view_and_diagnosis_labeled_set/labeled/1907s2_24.png
...
The naming convention of these files is [PatientID]s[StudyID]_[ImageID].png.
- PatientIDs are unique random identifiers (consistent across the whole dataset)
- Each StudyID (counting up from 1) indicates one session of echocardiogram imagery captured on one day.
- Each ImageID (counting up from 0) distinguishes each image within a subset. ImageID is not unique even across the labeled/ and unlabeled/ image sets, so please use the full path if you need a unique identifier.
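The naming convention above can be decoded programmatically, for instance to group images by patient or study. The following is a minimal sketch; `ImageName` and `parse_image_name` are hypothetical names, and the pattern assumes that PatientID, StudyID, and ImageID are all written as decimal digits, as in the examples shown (the convention itself does not guarantee this for PatientID).

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class ImageName(NamedTuple):
    patient_id: str  # kept as a string, since it is an identifier
    study_id: int    # counts up from 1
    image_id: int    # counts up from 0


# Assumption: all three fields are decimal digits, as in "2977s1_0.png".
_NAME_PATTERN = re.compile(r"(\d+)s(\d+)_(\d+)\.png")


def parse_image_name(path):
    """Split a file name or path like '2977s1_0.png' into its three IDs."""
    name = PurePosixPath(path).name  # drop any leading folders
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    patient_id, study_id, image_id = match.groups()
    return ImageName(patient_id, int(study_id), int(image_id))
```

Because ImageID alone is not unique across the labeled/ and unlabeled/ folders, the parsed fields are best used for grouping, while the full path remains the unique key for an image.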
Example code
See the TMED-2 Data Loading and Visualization Demo that loads the data, visualizes it and displays the corresponding labels.