Lecture: Combining SVM Classifiers to Identify Investigator Name Zones in Biomedical Articles by Dr. Jongwoo Kim on 5/15/2012

Brown Bag Lecture by Dr. Jongwoo Kim | 5/15/2012 11AM-12PM | 7th Floor Conference Room, Bldg 38A

Abstract: This presentation describes an automated system that labels the zones containing Investigator Names (IN) in biomedical articles. Investigator names are a key item in a MEDLINE® citation.
In a typical biomedical article, the author zone is located close to the article title and contains the names of the authors. However, because biomedical research is increasingly collaborative, many investigators from various groups or organizations may take part in the research. The names of these groups or organizations may also appear in the author or title zones, while the investigators affiliated with them are listed elsewhere in the article, most likely toward the end, close to the references. In articles that list investigators, the number of investigator names averages about 40 but may reach several hundred.
Entering this data manually is time-consuming and error-prone. We therefore proposed a hierarchical classification model that uses two Support Vector Machine (SVM) classifiers to extract IN zones automatically, since these zones must be identified correctly before the investigator names can be extracted from them. The first classifier identifies the IN zone with the highest confidence, and the second classifier identifies the remaining IN zones. Eight word lists were collected to train and test the classifiers, each containing between 100 and 1,200 words.
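The two-stage decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system presented in the lecture: the function `label_in_zones`, the word list `NAME_WORDS`, and the scorer `toy_score` are hypothetical, and the toy scorer stands in for the decision value of a trained SVM. It also assumes that stage 2 runs only when stage 1 finds a confident zone, which the abstract does not state.

```python
from typing import Callable, List, Sequence

# A scorer maps a text zone to a signed confidence, like an SVM decision value.
ScoreFn = Callable[[str], float]

# Hypothetical word list; the abstract mentions eight such lists but not their contents.
NAME_WORDS = {"smith", "kim", "lee", "garcia"}


def toy_score(zone: str) -> float:
    """Stand-in for a trained classifier: fraction of name words, shifted by 0.25."""
    words = zone.lower().replace(",", " ").split()
    if not words:
        return -1.0
    return sum(w in NAME_WORDS for w in words) / len(words) - 0.25


def label_in_zones(
    zones: Sequence[str],
    first_stage: ScoreFn,
    second_stage: ScoreFn,
    threshold: float = 0.0,
) -> List[bool]:
    """Return one flag per zone; True means the zone is labeled as an IN zone."""
    labels = [False] * len(zones)
    if not zones:
        return labels

    # Stage 1: take the single zone the first classifier is most confident about.
    scores = [first_stage(z) for z in zones]
    best = max(range(len(zones)), key=scores.__getitem__)
    if scores[best] <= threshold:
        return labels  # assumption: no confident zone means no IN zones at all
    labels[best] = True

    # Stage 2: the second classifier judges every remaining zone.
    for i, zone in enumerate(zones):
        if i != best and second_stage(zone) > threshold:
            labels[i] = True
    return labels


zones = [
    "Combining classifiers to label zones",
    "The study group investigators: Smith Kim Lee Garcia",
    "Kim Garcia Lee",
    "References",
]
print(label_in_zones(zones, toy_score, toy_score))
```

In a real system the two scorers would be separately trained classifiers over features derived from the word lists; the same toy scorer is passed twice here only to keep the sketch self-contained.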
Experiments on a test set of 105 journal articles show a Precision of 0.88, Recall of 0.97, F-Measure of 0.92, and Accuracy of 0.99.
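As a quick consistency check on the reported figures, the sketch below recomputes the F-Measure from the stated Precision and Recall. It assumes the F-Measure is the balanced harmonic mean (F1), which the abstract does not say explicitly; the helper `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-beta score; beta=1 gives the balanced harmonic mean (F1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and Recall as reported in the abstract.
print(round(f_measure(0.88, 0.97), 2))  # 0.92
```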

Bio: Dr. Jongwoo Kim received his Ph.D. from the Department of Computer Engineering and Computer Science at the University of Missouri, Columbia, Missouri. Since 1998, he has worked on the MARS II and PDRS projects at NLM, where he has developed several labeling modules that extract bibliographic information from hard-copy and online biomedical journal articles. His research interests include document processing, pattern recognition, image processing, fuzzy theory, neural networks, and computer vision.
