NLM at ImageCLEF 2018 Visual Question Answering in the Medical Domain.

Ben Abacha A, Gayen S, Lau JJ, Rajaraman S, Demner-Fushman D
CLEF2018 Working Notes. CEUR Workshop Proceedings, Avignon, France, (September 10-14 2018).


This paper describes the participation of the U.S. National Library of Medicine (NLM) in the Visual Question Answering task (VQAMed) of ImageCLEF 2018. We studied deep learning networks with state-of-the-art performance in open-domain VQA. We selected Stacked Attention Network (SAN) and Multimodal Compact Bilinear pooling (MCB) for our official runs. SAN performed better on VQA-Med test data, achieving the second best WBSS score of 0.174 and the third best BLEU score of 0.121. We discuss the current limitations and future improvements to VQA in the medical domain. We analyze the use of automatically generated questions and images selected from the literature based on ImageCLEF data. We describe four areas of improvements dedicated to medical VQA: (i) designing goal-oriented VQA systems and datasets (e.g. clinical decision support, education), (ii) generating and categorizing medical/clinical questions, (iii) selecting (clinically) relevant images, and (iv) capturing the context and the medical knowledge.