TY - JOUR
T1 - REAL-Colon
T2 - A dataset for developing real-world AI applications in colonoscopy
AU - Biffi, Carlo
AU - Antonelli, Giulio
AU - Bernhofer, Sebastian
AU - Hassan, Cesare
AU - Hirata, Daizen
AU - Iwatate, Mineo
AU - Maieron, Andreas
AU - Salvagnini, Pietro
AU - Cherubini, Andrea
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/5/25
Y1 - 2024/5/25
N2 - Detection and diagnosis of colon polyps are key to preventing colorectal cancer. Recent evidence suggests that AI-based computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems can enhance endoscopists' performance and boost colonoscopy effectiveness. However, most available public datasets primarily consist of still images or video clips, often at a down-sampled resolution, and do not accurately represent real-world colonoscopy procedures. We introduce the REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset: a compilation of 2.7 M native video frames from sixty full-resolution, real-world colonoscopy recordings across multiple centers. The dataset contains 350k bounding-box annotations, each created under the supervision of expert gastroenterologists. Comprehensive patient clinical data, colonoscopy acquisition information, and polyp histopathological information are also included in each video. With its unprecedented size, quality, and heterogeneity, the REAL-Colon dataset is a unique resource for researchers and developers aiming to advance AI research in colonoscopy. Its openness and transparency facilitate rigorous and reproducible research, fostering the development and benchmarking of more accurate and reliable colonoscopy-related algorithms and models.
KW - Colonoscopy/methods
KW - Humans
KW - Colonic Polyps/diagnosis
KW - Diagnosis, Computer-Assisted
KW - Artificial Intelligence
KW - Video Recording
KW - Colorectal Neoplasms/diagnosis
UR - http://www.scopus.com/inward/record.url?scp=85194361176&partnerID=8YFLogxK
U2 - 10.1038/s41597-024-03359-0
DO - 10.1038/s41597-024-03359-0
M3 - Journal article
C2 - 38796533
SN - 2052-4463
VL - 11
SP - 539
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 539
ER -