TY - GEN
T1 - Case-sensitivity of classifiers for WSD
T2 - 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
AU - Saarikoski, Harri M.T.
AU - Legrand, Steve
AU - Gelbukh, Alexander
PY - 2007
Y1 - 2007
N2 - We present a novel method for improving disambiguation accuracy by building an optimal ensemble (OE) of systems, in which we predict the best available system for each target word using a priori case factors (e.g. amount of training per sense). We report promising results from a series of best-system prediction tests (best prediction accuracy is 0.92) and show that complex systems disambiguate tough words better, while simple systems disambiguate easy words better. The method provides the following benefits: (1) higher disambiguation accuracy for virtually any base system (the current best OE yields close to a 2% accuracy gain over the Senseval-3 state of the art) and (2) an economical way of building more effective ensembles of all types (e.g. optimal, weighted voting and cross-validation based). The method is also highly scalable in that it uses factors readily available for any ambiguous word in any language to estimate word difficulty and defines classifier complexity using known properties only.
AB - We present a novel method for improving disambiguation accuracy by building an optimal ensemble (OE) of systems, in which we predict the best available system for each target word using a priori case factors (e.g. amount of training per sense). We report promising results from a series of best-system prediction tests (best prediction accuracy is 0.92) and show that complex systems disambiguate tough words better, while simple systems disambiguate easy words better. The method provides the following benefits: (1) higher disambiguation accuracy for virtually any base system (the current best OE yields close to a 2% accuracy gain over the Senseval-3 state of the art) and (2) an economical way of building more effective ensembles of all types (e.g. optimal, weighted voting and cross-validation based). The method is also highly scalable in that it uses factors readily available for any ambiguous word in any language to estimate word difficulty and defines classifier complexity using known properties only.
UR - http://www.scopus.com/inward/record.url?scp=37149032789&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-70939-8_23
DO - 10.1007/978-3-540-70939-8_23
M3 - Conference contribution
SN - 354070938X
SN - 9783540709381
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 253
EP - 266
BT - Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
PB - Springer Verlag
Y2 - 18 February 2007 through 24 February 2007
ER -