Multipanel Figure Segmentation for Better Retrieval: A Deep Learning Approach
Brown Bag Lecture by Dr. Jie Zou | 4/17/2018, 11:00 AM – 12:00 PM | 7th Floor Conference Room, Bldg 38A
Visual question answering systems require precise indexing of figure image content. For multi-panel figures, it is critical to associate each panel with its metadata, and figure segmentation, which comprises figure splitting and figure label recognition, is a necessary step toward that. The task is challenging because panels and labels vary widely in location, size, contrast with the background, and other properties. We implemented deep learning algorithms to solve the problem. Visual features are extracted by convolutional neural network layers and fed into region proposal networks, which estimate the bounding boxes of label candidates. Many characters other than labels are often superimposed on figures, and they cause confusing false alarms, so it is important to model the figure labels as a sequence. We trained a long short-term memory (LSTM) network to capture the label sequence constraints and found that it effectively removes false alarms. We collected and annotated 10,642 figures. The algorithm was trained on 9,642 figures, and evaluation on the remaining 1,000 figures shows that it achieves better performance than our previous method, which was based on HOG descriptors, an SVM classifier, and beam search.
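To illustrate why treating figure labels as a sequence helps reject false alarms, here is a minimal Python sketch. It is not the LSTM described in the talk: it swaps in a simple hand-written rule (labels must run A, B, C, ... with no gaps) purely to show the idea. The `Candidate` type, its fields, and the function `select_label_sequence` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical label candidate, e.g. one box from a region proposal stage."""
    char: str     # recognized character, e.g. "A"
    score: float  # detector confidence (assumed to lie in [0, 1])
    box: tuple    # (x, y, width, height) in pixels


def select_label_sequence(candidates):
    """Keep the run A, B, C, ... using the best-scoring candidate per letter.

    This rule-based filter is an illustrative stand-in for a learned
    sequence model; it only encodes "labels are consecutive from A".
    """
    best = {}
    for cand in candidates:
        key = cand.char.upper()
        if key not in best or cand.score > best[key].score:
            best[key] = cand

    sequence = []
    expected = ord("A")
    # Walk A, B, C, ... and stop at the first missing letter.
    while chr(expected) in best:
        sequence.append(best[chr(expected)])
        expected += 1
    return sequence


if __name__ == "__main__":
    detections = [
        Candidate("A", 0.90, (5, 5, 12, 12)),
        Candidate("T", 0.80, (40, 60, 10, 10)),   # stray character in the figure
        Candidate("B", 0.40, (90, 70, 11, 11)),   # weaker duplicate
        Candidate("B", 0.70, (105, 5, 12, 12)),
        Candidate("C", 0.60, (205, 5, 12, 12)),
        Candidate("X", 0.95, (150, 90, 10, 10)),  # axis text, not a panel label
    ]
    print([c.char for c in select_label_sequence(detections)])
```

In this toy input, the high-confidence "X" and "T" detections are dropped because they do not fit the expected label order, which is the kind of constraint a sequence model can learn from annotated data rather than have hard-coded.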
Dr. Jie Zou joined the Communications Engineering Branch (CEB), Lister Hill National Center for Biomedical Communications in 2005, after receiving his Ph.D. in Computer and Systems Engineering from Rensselaer Polytechnic Institute. His research interests are image processing, computer vision and machine learning.