Application of Deep Learning Models to Improve Ulcerative Colitis Endoscopic Disease Activity Scoring Under Multiple Scoring Systems.
Byrne MF., Panaccione R., East JE., Iacucci M., Parsa N., Kalapala R., Reddy DN., Rughwani HR., Singh AP., Berry SK., Monsurate R., Soudan F., Laage G., Cremonese ED., St-Denis L., Lemaître P., Nikfal S., Asselin J., Henkel ML., Travis SP.
BACKGROUND & AIMS: Lack of clinical validation and inter-observer variability are two limitations of endoscopic assessment and scoring of disease severity in patients with ulcerative colitis (UC). We developed a deep learning (DL) model to improve, accelerate, and automate UC detection and to predict the Mayo Endoscopic Subscore (MES) and the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). METHODS: A total of 134 prospective videos (1,550,030 frames) were collected, and those of poor quality were excluded. Experts labeled the frames with MES and UCEIS scores. The scored frames were used to create a preprocessing pipeline and to train multiple convolutional neural networks (CNNs) with proprietary algorithms to filter, detect, and assess all frames. These frames served as the input to the DL model, whose output was continuous scores for MES and UCEIS (and its components). A graphical user interface was developed to support both labeling video sections and displaying the AI-predicted disease severity assessment from endoscopic recordings. RESULTS: Mean absolute error (MAE) and mean bias, used to evaluate the distance of the model's continuous predictions from ground truth and its possible tendency to over- or under-predict, were excellent for both MES and UCEIS. The quadratic weighted kappa, used to measure agreement between the experts' labels and the model's predictions, showed strong agreement (0.87 and 0.88 at frame level, 0.88 and 0.90 at section level, and 0.90 and 0.78 at video level, for MES and UCEIS, respectively). CONCLUSIONS: We present the first fully automated tool that improves the accuracy of the MES and UCEIS, reduces the time between video collection and review, and improves subsequent quality assurance and scoring.
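
For readers unfamiliar with the three evaluation metrics named in RESULTS, the following is a minimal sketch of how MAE, mean bias, and the quadratic weighted kappa can be computed. It is not the authors' code. The function names, the example scores, and the choice to round and clamp continuous predictions to integer grades before computing kappa are all assumptions made for illustration; the abstract does not state how continuous outputs were discretized. The example uses a four-grade scale (0 to 3), as for the MES.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute distance between predictions and ground truth."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_bias(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average signed error: positive means over-prediction, negative under-prediction."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def to_grade(score: float, n_classes: int) -> int:
    """Assumed discretization: round to the nearest integer grade and clamp to the scale."""
    return min(max(int(round(score)), 0), n_classes - 1)


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], n_classes: int) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (k - 1)^2.

    Undefined (division by zero) if both raters use a single identical grade throughout.
    """
    n = len(a)
    # Observed confusion matrix between the two raters.
    observed: List[List[int]] = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    # Marginal grade counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement (independent marginals).
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical expert grades and continuous model outputs (illustrative values only).
expert = [0, 1, 2, 3]
model_continuous = [0.2, 1.1, 1.8, 2.3]

mae = mean_absolute_error(expert, model_continuous)
bias = mean_bias(expert, model_continuous)
model_grades = [to_grade(s, 4) for s in model_continuous]
kappa = quadratic_weighted_kappa(expert, model_grades, 4)
print(f"MAE={mae:.3f}  bias={bias:.3f}  QWK={kappa:.3f}")
```

The quadratic weights penalize a disagreement of two grades four times as heavily as a disagreement of one grade, which suits ordinal severity scales where near misses matter less than large misses.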