Exploring Access to Scientific Literature Using Content-Based Image Retrieval

Deserno TM, Antani SK, Long LR
Proc SPIE Medical Imaging 2007. Vol. 6516: 65160L-1-8.


The number of articles published in the scientific medical literature is continuously increasing, and Web access to the journals is becoming common. Databases such as SPIE Digital Library, IEEE Xplore, indices such as PubMed, and search engines such as Google provide the user with sophisticated full-text search capabilities. However, information in images and graphs within these articles is entirely disregarded. In this paper, we quantify the potential impact of using content-based image retrieval (CBIR) to access this non-text data. Based on the Journal Citations Report (JCR), the journal Radiology was selected for this study. In 2005, 734 articles were published electronically in this journal. This included 2,587 figures, which yields a rate of 3.52 figures per article. Furthermore, 56.4% of these figures are composed of several individual panels, i.e. the figure combines different images and/or graphs. According to the Image Cross-Language Evaluation Forum (ImageCLEF), the error rate of automatic identification of medical images is about 15%. Therefore, it is expected that, by applying ImageCLEF-like techniques, already 95.5% of articles could be retrieved by means of CBIR. The challenge for CBIR in scientific literature, however, is the use of local texture properties to analyze individual image panels in composite illustrations. Using local features for content-based image representation, 8.81 images per article are available, and the predicted correctness rate may increase to 98.3%. From this study, we conclude that CBIR may have a high impact in medical literature research and suggest that additional research in this area is warranted.