Learning Semantic Image-Text Embeddings in the Radiology Context

Sonit Singh1, Kevin Ho-Shon2, Sarvnaz Karimi3, Len Hamey4

1 Department of Computing, Balaclava Road, NSW, 2109, sonit.singh@hdr.mq.edu.au

2 Macquarie University Hospital, 3 Technology Place, Macquarie University, NSW, 2109, kevin.ho-shon@mq.edu.au

3 Data61, CSIRO, Corner Vimiera & Pembroke Roads, Marsfield NSW, 2122, sarvnaz.karimi@data61.csiro.au

4 Department of Computing, Balaclava Road, NSW, 2109, len.hamey@mq.edu.au


Radiologists routinely interpret medical images and describe their findings in radiology reports. Inspired by the fact that radiologists often follow templates when writing reports and modify them for each individual case, we propose a cross-modal retrieval model that aligns visual data (medical images) and textual data (radiology reports) in a shared representation space, allowing retrieval of relevant items whose modality differs from that of the query. The model architecture consists of a deep neural network in which image features are extracted using an off-the-shelf Convolutional Neural Network pre-trained on ImageNet and text features are extracted using multi-scale sentence vectorization methods such as Bag-of-Words (BoW) and Word2Vec. To evaluate the effectiveness of the proposed model, we use two datasets in the radiology domain, namely the Indiana University Chest X-ray collection (IU-CXR) and Radiology Objects in COntext (ROCO). Both datasets consist of medical images and their corresponding captions, allowing retrieval of a medical image given a text query and vice versa. For performance evaluation, we report the rank-based metric Recall@k (where k = 1, 5, 10, 50), which computes the percentage of test images/reports for which at least one correct result is found among the top-k retrieved reports/images. The proposed model can not only make radiologists more efficient by retrieving similar cases from Picture Archiving and Communication Systems (PACS), but also allow multi-modal query composition to retrieve medical images or radiology reports according to their specific needs.
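
To make the Recall@k definition concrete, the following is a minimal sketch in Python, not the implementation used in this work. It assumes that similarity scores between each query and every candidate in the shared space are already available. The function name recall_at_k, the toy scores, and the one-relevant-item-per-query setup are illustrative assumptions.

```python
def recall_at_k(similarities, relevant, k):
    """Percentage of queries with at least one relevant item in the top-k.

    similarities: one list of scores per query, one score per candidate
                  (assumed to come from the shared embedding space).
    relevant:     one set of correct candidate indices per query.
    """
    if not similarities:
        return 0.0
    hits = 0
    for scores, gold in zip(similarities, relevant):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # A query counts as a hit if any correct candidate is in the top-k.
        if gold.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarities)


# Toy example (made-up scores): 3 report queries against 4 candidate images.
similarities = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct image 0 is ranked first
    [0.2, 0.4, 0.8, 0.5],  # query 1: correct image 1 is ranked third
    [0.1, 0.3, 0.6, 0.7],  # query 2: correct image 2 is ranked second
]
relevant = [{0}, {1}, {2}]

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(similarities, relevant, k):.1f}%")
```

The same function covers both retrieval directions: for image-to-report retrieval the rows are image queries and the columns are candidate reports, and for report-to-image retrieval the roles are swapped.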