A Semi-supervised Learning Method to Classify Grant Support Zone in Web-based Medical Articles

Zhang X, Zou J, Le DX, Thoma GR
Proc SPIE Electronic Imaging Science and Technology, Document Recognition and Retrieval. January 2009;7247:72470W(1-8)


Traditional classifiers are trained on labeled data only. Labeled samples are often expensive to obtain, while unlabeled data are abundant. Semi-supervised learning, which uses both labeled and unlabeled data for training, can therefore be of great value. We introduce a semi-supervised learning method, decision-directed approximation combined with Support Vector Machines (SVMs), to detect zones containing grant support information (a type of bibliographic data) in online medical journal articles. We analyzed the performance of our model with different numbers of unlabeled samples and demonstrated that our proposed rules are effective in boosting classification accuracy. The experimental results show that decision-directed approximation with an SVM improves classification accuracy when a small amount of labeled data is used together with unlabeled data to train the SVM.
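
The general idea of a decision-directed scheme can be illustrated with a minimal sketch: train a classifier on the labeled samples, let it label the unlabeled samples, treat those predicted labels as if they were true, retrain on the combined set, and repeat until the predicted labels stop changing. The sketch below is not the authors' implementation. It assumes a simple linear SVM trained by Pegasos-style subgradient descent as a stand-in for the SVM used in the paper, toy two-dimensional points in place of real zone features, and it pseudo-labels every unlabeled sample because the selection rules proposed in the paper are not given in this abstract. All function names are hypothetical.

```python
import random

def _augment(x):
    # Append a constant 1 so the bias is learned as an extra weight.
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss.
    Labels must be +1 or -1. A stand-in for the SVM in the paper (assumption)."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink weights (regularization term); eta * lam equals 1 / t.
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1.0:
                # Sample violates the margin: take a hinge-loss step.
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1

def decision_directed_svm(X_lab, y_lab, X_unl, max_rounds=5):
    """Self-training loop: predicted labels are treated as true labels.
    Uses all unlabeled samples each round (assumption; the paper's
    selection rules are not reproduced here)."""
    w = train_linear_svm(X_lab, y_lab)
    pseudo = None
    for _ in range(max_rounds):
        new_pseudo = [predict(w, x) for x in X_unl]
        if new_pseudo == pseudo:
            break  # decisions are stable, stop iterating
        pseudo = new_pseudo
        w = train_linear_svm(X_lab + X_unl, y_lab + pseudo)
    return w, pseudo

# Toy data (hypothetical): two labeled points, 18 unlabeled points
# on small grids around (2, 2) and (-2, -2).
X_lab = [[2.0, 2.0], [-2.0, -2.0]]
y_lab = [1, -1]
offsets = [-0.5, 0.0, 0.5]
X_unl = [[s * 2.0 + dx, s * 2.0 + dy]
         for s in (1, -1) for dx in offsets for dy in offsets]

w, pseudo = decision_directed_svm(X_lab, y_lab, X_unl)
```

On this toy data the two classes are widely separated, so the pseudo-labels match the intended classes. On real zone features, wrong early decisions can reinforce themselves, which is presumably why rules governing the use of unlabeled samples matter.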