This paper describes an automated system for labeling the zones in biomedical articles that contain Investigator Names (IN), a key item in a MEDLINE® citation. Correct identification of these zones is a prerequisite for extracting the names from them. A hierarchical classification model built from two Support Vector Machine (SVM) classifiers is proposed: the first classifier identifies the IN zone with the highest confidence, and the second identifies the remaining IN zones. Eight sets of word lists are collected to train and test the classifiers, each set containing between 100 and 1,200 words. Experiments on a test set of 105 journal articles show a Precision of 0.88, Recall of 0.97, F-Measure of 0.92, and Accuracy of 0.99.
Keywords: Investigator Names, MEDLINE, Support Vector Machine, labeling, text classification, bibliographic information.
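
The two-stage decision summarized in the abstract can be illustrated with a minimal sketch. It assumes each zone of an article has already been scored by both classifiers (for example, with SVM decision values). The function name `label_in_zones`, the score lists, the zero thresholds, and the rule that a non-positive best score means "no IN zone" are all hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def label_in_zones(
    first_scores: Sequence[float],
    second_scores: Sequence[float],
    second_threshold: float = 0.0,
) -> List[bool]:
    """Return one IN / non-IN flag per zone using a two-stage decision.

    first_scores[i]:  hypothetical decision value of the first classifier for zone i
    second_scores[i]: hypothetical decision value of the second classifier for zone i
    """
    if len(first_scores) != len(second_scores):
        raise ValueError("both classifiers must score the same zones")
    if not first_scores:
        return []

    labels = [False] * len(first_scores)

    # Stage 1: pick the single zone the first classifier is most confident about.
    anchor = max(range(len(first_scores)), key=lambda i: first_scores[i])

    # Assumption: a non-positive best score means the article has no IN zone.
    if first_scores[anchor] <= 0.0:
        return labels
    labels[anchor] = True

    # Stage 2: the second classifier decides on every remaining zone.
    for i, score in enumerate(second_scores):
        if i != anchor and score > second_threshold:
            labels[i] = True
    return labels


if __name__ == "__main__":
    # Four zones; made-up scores for illustration only.
    print(label_in_zones([-1.2, 2.3, 0.4, -0.5], [-0.7, 0.1, 0.9, -0.3]))
```

In this sketch the second classifier's score for the anchor zone is ignored, since that zone has already been labeled in the first stage.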