Advanced clustering technique for medical data using semantic information

Kwangcheol Shin, Sang Yong Han, Alexander Gelbukh

Research output: Contribution to journal, Conference article, peer-review

1 Scopus citation

Abstract

MEDLINE is a representative collection of medical documents, each supplied with its original full-text natural-language abstract as well as with representative keywords (called MeSH terms) manually selected by expert annotators from a predefined ontology and structured according to their relation to the document. We show how these structured, manually assigned semantic descriptions can be combined with the original full-text abstracts to improve the quality of clustering the documents into a small number of clusters. As baselines, we use clustering on abstracts only and on MeSH terms only. Compared with these baselines, our experiments show 36% to 47% higher cluster coherence, as well as more refined keywords for the produced clusters.
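The idea of combining the two sources can be illustrated with a minimal sketch. The abstract does not give the paper's actual weighting scheme or clustering algorithm, so everything below is an assumption made for illustration: the `combined_vector` and `cosine` helpers, the `mesh_weight` parameter and its default, the whitespace tokenization, and the three toy documents are all hypothetical.

```python
import math
from collections import Counter


def combined_vector(abstract, mesh_terms, mesh_weight=2.0):
    """Merge abstract words and MeSH terms into one unit-length sparse vector.

    mesh_weight is a hypothetical knob, not a value taken from the paper.
    """
    vec = Counter()
    # Abstract words: plain term frequency, namespaced to avoid key clashes.
    for word in abstract.lower().split():
        vec["abs:" + word] += 1.0
    # MeSH terms: each manually assigned term contributes mesh_weight.
    for term in mesh_terms:
        vec["mesh:" + term.lower()] += mesh_weight
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(k, 0.0) for k, w in u.items())


# Toy documents (invented for illustration): (abstract text, MeSH terms).
docs = [
    ("insulin therapy in adults", ["Diabetes Mellitus", "Insulin"]),
    ("glucose control outcomes", ["Diabetes Mellitus", "Blood Glucose"]),
    ("stent placement outcomes", ["Coronary Disease", "Stents"]),
]

vectors = [combined_vector(a, m) for a, m in docs]
abstract_only = [combined_vector(a, m, mesh_weight=0.0) for a, m in docs]
```

In this toy data, the abstract-only vectors make the second document look closest to the third (they share the word "outcomes"), while the combined vectors pull it toward the first, with which it shares a MeSH term. Any similarity-based clustering method could then be run on the combined vectors.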

Original language: English
Pages (from-to): 322-331
Number of pages: 10
Journal: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 2972
State: Published - 2004
Event: Third Mexican International Conference on Artificial Intelligence - Mexico City, Mexico
Duration: 26 Apr 2004 - 30 Apr 2004
