Automated Labeling for Biomedical Journals Published in Foreign Languages

Kim J, Le DX, Thoma GR
Proc. 8th World Multiconference on Systemics, Cybernetics and Informatics. 2004 Jul:444-9.


An automated labeling (AL) module has been developed to produce bibliographic record fields, such as English title, vernacular title, author, affiliation, and English abstract, from biomedical articles published in foreign-language journals. The labeling process takes as input the optical character recognition (OCR) output from scanned biomedical journals. Because words that occur frequently in a zone are important features, word lists serve as key features in the AL module. The module's rule-based labeling algorithms draw on geometric and contextual features of zones and on the geometric relations between zones. The algorithms use 131 rules derived for foreign-language journals. Experiments conducted with several medical journal articles show about 95% accuracy.
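
To illustrate how word lists, geometric features, and relations between zones can combine in a rule-based labeler, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the 131 rules or the word lists, so the `Zone` fields, the word lists, the 0.4 page-position threshold, and the five rules below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical word lists; the paper's actual lists are not given in the abstract.
AFFILIATION_WORDS = {"university", "department", "hospital", "institute"}
ABSTRACT_WORDS = {"abstract", "summary", "background", "methods", "results"}


@dataclass
class Zone:
    """One OCR text zone (assumed representation, not the paper's)."""
    text: str
    top: float        # top edge as a fraction of page height (0 = page top)
    font_size: float  # point size reported by OCR


def word_hits(text, words):
    """Count tokens of `text` that appear in the word list `words`."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return sum(1 for t in tokens if t in words)


def label_zones(zones):
    """Return one label per zone, in top-to-bottom reading order."""
    if not zones:
        return []
    ordered = sorted(zones, key=lambda z: z.top)
    max_font = max(z.font_size for z in ordered)
    labels = []
    for zone in ordered:
        aff = word_hits(zone.text, AFFILIATION_WORDS)
        abst = word_hits(zone.text, ABSTRACT_WORDS)
        prev = labels[-1] if labels else None
        # Geometric rule: first zone with the largest font in the upper page.
        if "title" not in labels and zone.font_size == max_font and zone.top < 0.4:
            label = "title"
        # Word-list rules: frequent words decide affiliation vs. abstract.
        elif aff > abst:
            label = "affiliation"
        elif abst > 0:
            label = "abstract"
        # Relational rule: the zone directly below the title is the author zone.
        elif prev == "title":
            label = "author"
        else:
            label = "other"
        labels.append(label)
    return labels


page = [
    Zone("Automated Labeling for Biomedical Journals", 0.05, 18),
    Zone("Kim J, Le DX, Thoma GR", 0.12, 11),
    Zone("Department of Medicine, Example University Hospital", 0.16, 9),
    Zone("Abstract. Background: An example study. Results: none.", 0.25, 10),
]
print(label_zones(page))
```

A real system of this kind would need many more rules per label, plus handling for vernacular titles and OCR errors, which this sketch omits.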