Machine Learning with Selective Word Statistics for Automated Classification of Citation Subjectivity in Online Biomedical Articles.

Kim I, Thoma GR
Proc. Int’l Conf. Artificial Intelligence (ICAI’17), pp. 201-207, Las Vegas, July 2017.

There is growing interest in automatically classifying the sentiment that authors express in citation sentences in scientific literature, in order to provide effective tools for researchers seeking relevant previous work or approaches for a given research purpose. We propose an automated method for determining whether a given citation sentence contains the author’s subjective opinion (positive or negative) or objective factual information, as a first step toward analyzing and identifying the citing author’s sentiment toward the cited external sources. Our method uses a support vector machine (SVM)-based text categorization technique to identify subjective citations, specifically those directed at Comment-on (CON) articles. CON, a MEDLINE® citation field, indicates previously published articles that the authors of a given article comment on, possibly expressing complimentary or contradictory opinions. To build an input feature vector for the SVM classifier, we introduce a bag of unigrams based on selective word statistics, derived from a text region of interest within a sentence that contains a description of the author’s reason for citation and lexical linguistic cues. Experiments conducted on a set of CON sentences collected from 414 different online biomedical journal titles show that the SVM classifier, given the proposed bag of unigrams selectively extracted from the text region of interest, yields results comparable to those obtained with a bag of unigrams drawn from the entire sentence. Moreover, we achieve a significant performance boost for the SVM with an input feature vector that combines the two types of statistical bag of unigrams with a sentiment word lexicon.
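
The combined input feature vector described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the text region of interest is located, how the selective word statistics are computed, or which sentiment lexicon is used. Here the regions are supplied by hand, a simple document-frequency threshold stands in for the word selection, and `SENTIMENT_LEXICON` is a hypothetical toy list. The resulting vectors would then be passed to an SVM classifier, which is not shown.

```python
from collections import Counter

# Hypothetical toy lexicon; the actual sentiment word lexicon is not given in the abstract.
SENTIMENT_LEXICON = {"elegant", "flawed", "important", "questionable"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:()[]\"'").lower() for w in text.split())
    return [t for t in tokens if t]


def build_vocabulary(texts, min_doc_freq=1):
    """Keep unigrams found in at least min_doc_freq texts (an assumed selection rule)."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    return sorted(w for w, df in doc_freq.items() if df >= min_doc_freq)


def bag_of_unigrams(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def lexicon_features(text, lexicon=SENTIMENT_LEXICON):
    """Single feature: number of tokens that appear in the sentiment lexicon."""
    return [sum(1 for t in tokenize(text) if t in lexicon)]


def combined_features(sentence, region, sentence_vocab, region_vocab):
    """Concatenate region unigrams, whole-sentence unigrams, and the lexicon count."""
    return (bag_of_unigrams(region, region_vocab)
            + bag_of_unigrams(sentence, sentence_vocab)
            + lexicon_features(sentence))


# Invented example sentences; the regions of interest are marked by hand.
sentences = [
    "Smith et al. present an elegant analysis of the trial.",
    "The trial enrolled 200 patients (Smith et al.).",
]
regions = ["present an elegant analysis", "enrolled 200 patients"]

sentence_vocab = build_vocabulary(sentences)
region_vocab = build_vocabulary(regions)
vectors = [combined_features(s, r, sentence_vocab, region_vocab)
           for s, r in zip(sentences, regions)]
```

Each vector has one entry per region unigram, one per whole-sentence unigram, and a final lexicon count, so the two bags of unigrams and the lexicon signal stay distinguishable to the classifier.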