Lecture: Identifying “Comment-on” Citation Data in Online Biomedical Articles Using SVM-based Text Summarization Technique by Dr. In Cheol Kim on 6/5/2012


Brown Bag Lecture by Dr. In Cheol Kim | 6/5/2012 11AM-12PM | 7th Floor Conference Room, Bldg 38A

Abstract: Comment-on (CON), a MEDLINE® citation field, indicates previously published articles commented on by authors expressing possibly complimentary or contradictory opinions. This paper presents an automated method using a support vector machine (SVM)-based text summarization technique that identifies CON data by distinguishing CON sentences from “citation sentences” and analyzes their corresponding bibliographic data in the references. We compare the performance of two types of SVM, one with a linear kernel and the other with a radial basis function (RBF) kernel. Input feature vectors for these SVMs are created by combining five feature types: 1) word statistics, 2) frequency of occurrence of author names, 3) sentence positions, 4) similarity between titles, and 5) difference of publication years. Experiments conducted on a set of online biomedical articles show that the SVM with an RBF kernel is more reliable in terms of precision, recall, and F-measure than the SVM with a linear kernel for identifying CON data.
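For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, and F-measure could be computed for a binary sentence classifier. It is an illustration, not the speaker's code: the function name `precision_recall_f1` and the label lists are hypothetical, the data is made up, and the balanced F-measure (F1) is assumed because the abstract does not say which weighting was used.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class, labeled 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # CON predicted as CON
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-CON predicted as CON
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # CON that was missed

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical, made-up labels: 1 = CON sentence, 0 = ordinary citation sentence.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Comparing two classifiers, such as the linear and RBF SVMs in the abstract, would amount to calling the same function once per classifier on the same gold labels.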

Bio: Dr. In Cheol Kim is a Senior System Analyst at Medical Science & Computing, Inc. and has been working at the Lister Hill National Center for Biomedical Communications (LHNCBC), U.S. National Library of Medicine, Bethesda, Maryland, since 2004. He received his Ph.D. in Information Processing Engineering from Kyungpook National University, South Korea, in 2001. His previous experience includes two years as a postdoctoral researcher at Concordia University, Montreal, Canada, and more than five years as a senior system engineer in an industrial research lab. His research interests include Web-based document analysis and processing, pattern recognition and classification, text data mining, neural networks, and statistical learning methods.
