Document clustering based on maximal frequent sequences

Edith Hernández-Reyes, Rene A. García-Hernández, J. A. Carrasco-Ochoa, J. Fco Martínez-Trinidad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research

16 Citations (Scopus)

Abstract

Document clustering has the goal of discovering groups of similar documents. The success of document clustering algorithms depends on the model used to represent the documents. Documents are commonly represented with the vector space model based on words or n-grams. However, these representations have disadvantages such as high dimensionality and the loss of word order. In this work, we propose a new document representation in which the maximal frequent sequences of words are used as features of the vector space model. The efficiency of the proposed model is evaluated by clustering different document collections and comparing it against the vector space model based on words and n-grams through internal and external measures. © Springer-Verlag Berlin Heidelberg 2006.
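The idea of using maximal frequent sequences (MFS) of words as vector space features can be illustrated with a small Python sketch. It is not the authors' mining algorithm. It rests on simplifying assumptions: sequences are contiguous runs of words, a sequence is "frequent" when it occurs in at least a hypothetical `min_docs` documents, tokenization is plain whitespace splitting, and feature weights are raw occurrence counts. All function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Set of contiguous word sequences of length n in one document."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def maximal_frequent_sequences(docs, min_docs):
    """Return the maximal frequent contiguous word sequences of a collection.

    Assumption (simplification): a sequence is frequent if it occurs
    contiguously in at least `min_docs` documents, and maximal if no
    longer frequent sequence contains it.
    """
    levels = []  # levels[k] holds the frequent sequences of length k + 1
    n = 1
    while True:
        df = Counter()
        for tokens in docs:
            df.update(ngrams(tokens, n))  # document frequency, not raw counts
        frequent = {seq for seq, c in df.items() if c >= min_docs}
        if not frequent:
            break  # no longer sequence can be frequent either
        levels.append(frequent)
        n += 1

    result = set()
    for k, level in enumerate(levels):
        longer = levels[k + 1] if k + 1 < len(levels) else set()
        # If any longer frequent sequence contains `seq`, so does a frequent
        # sequence exactly one word longer, so checking the next level suffices.
        covered = {sub for seq in longer for sub in (seq[:-1], seq[1:])}
        result.update(level - covered)
    return result


def count_occurrences(tokens, seq):
    """Number of contiguous occurrences of `seq` in `tokens`."""
    n = len(seq)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == seq)


def mfs_vectors(texts, min_docs=2):
    """Represent each text as a vector of MFS occurrence counts."""
    docs = [text.lower().split() for text in texts]  # naive whitespace tokenizer
    features = sorted(maximal_frequent_sequences(docs, min_docs))
    vectors = [[count_occurrences(tokens, f) for f in features] for tokens in docs]
    return features, vectors


texts = [
    "frequent sequences improve document clustering",
    "maximal frequent sequences improve document clustering results",
    "vector space model uses words",
    "vector space model uses n-grams",
]
features, vectors = mfs_vectors(texts, min_docs=2)
for f in features:
    print(" ".join(f))
print(vectors)
```

On this toy collection the sketch keeps two features, "frequent sequences improve document clustering" and "vector space model uses", and prints `[[1, 0], [1, 0], [0, 1], [0, 1]]`, whereas a word-based vector space model would need 13 dimensions for the same texts. The resulting vectors can be passed to any vector-space clustering algorithm; which one the paper uses is not stated in this record.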
Original language: American English
Title of host publication: Document clustering based on maximal frequent sequences
Pages: 257-267
Number of pages: 11
ISBN (Electronic): 3540373349, 9783540373346
State: Published - 1 Jan 2006
Externally published: Yes
Event: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Duration: 1 Jan 2014 → …

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4139 LNAI
ISSN (Print): 0302-9743

Conference

Conference: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Period: 1/01/14 → …

Fingerprint

Document Clustering
Vector Space Model
N-gram
Vector spaces
Model-based
Clustering Algorithm
Dimensionality
Clustering
Internal
Clustering algorithms
Model

Cite this

Hernández-Reyes, E., García-Hernández, R. A., Carrasco-Ochoa, J. A., & Martínez-Trinidad, J. F. (2006). Document clustering based on maximal frequent sequences. In Document clustering based on maximal frequent sequences (pp. 257-267). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4139 LNAI).
Hernández-Reyes, Edith ; García-Hernández, Rene A. ; Carrasco-Ochoa, J. A. ; Martínez-Trinidad, J. Fco. / Document clustering based on maximal frequent sequences. Document clustering based on maximal frequent sequences. 2006. pp. 257-267 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{02c0bba500d449269ca4f0dba49f4115,
title = "Document clustering based on maximal frequent sequences",
abstract = "Document clustering has the goal of discovering groups with similar documents. The success of the document clustering algorithms depends on the model used for representing these documents. Documents are commonly represented with the vector space model based on words or n-grams. However, these representations have some disadvantages such as high dimensionality and loss of the word sequential order. In this work, we propose a new document representation in which the maximal frequent sequences of words are used as features of the vector space model. The proposed model efficiency is evaluated by clustering different document collections and compared against the vector space model based on words and n-grams, through internal and external measures. {\textcopyright} Springer-Verlag Berlin Heidelberg 2006.",
author = "Hern{\'a}ndez-Reyes, Edith and Garc{\'\i}a-Hern{\'a}ndez, {Rene A.} and Carrasco-Ochoa, {J. A.} and Mart{\'\i}nez-Trinidad, {J. Fco}",
year = "2006",
month = "1",
day = "1",
language = "American English",
isbn = "3540373349",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "4139",
pages = "257--267",
booktitle = "Document clustering based on maximal frequent sequences",
}

Hernández-Reyes, E, García-Hernández, RA, Carrasco-Ochoa, JA & Martínez-Trinidad, JF 2006, Document clustering based on maximal frequent sequences. in Document clustering based on maximal frequent sequences. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4139 LNAI, pp. 257-267, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1/01/14.

Document clustering based on maximal frequent sequences. / Hernández-Reyes, Edith; García-Hernández, Rene A.; Carrasco-Ochoa, J. A.; Martínez-Trinidad, J. Fco.

Document clustering based on maximal frequent sequences. 2006. p. 257-267 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4139 LNAI).


TY  - GEN
T1  - Document clustering based on maximal frequent sequences
AU  - Hernández-Reyes, Edith
AU  - García-Hernández, Rene A.
AU  - Carrasco-Ochoa, J. A.
AU  - Martínez-Trinidad, J. Fco
PY  - 2006/1/1
Y1  - 2006/1/1
N2  - Document clustering has the goal of discovering groups with similar documents. The success of the document clustering algorithms depends on the model used for representing these documents. Documents are commonly represented with the vector space model based on words or n-grams. However, these representations have some disadvantages such as high dimensionality and loss of the word sequential order. In this work, we propose a new document representation in which the maximal frequent sequences of words are used as features of the vector space model. The proposed model efficiency is evaluated by clustering different document collections and compared against the vector space model based on words and n-grams, through internal and external measures. © Springer-Verlag Berlin Heidelberg 2006.
AB  - Document clustering has the goal of discovering groups with similar documents. The success of the document clustering algorithms depends on the model used for representing these documents. Documents are commonly represented with the vector space model based on words or n-grams. However, these representations have some disadvantages such as high dimensionality and loss of the word sequential order. In this work, we propose a new document representation in which the maximal frequent sequences of words are used as features of the vector space model. The proposed model efficiency is evaluated by clustering different document collections and compared against the vector space model based on words and n-grams, through internal and external measures. © Springer-Verlag Berlin Heidelberg 2006.
UR  - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=33749670998&origin=inward
UR  - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=33749670998&origin=inward
M3  - Conference contribution
SN  - 3540373349
SN  - 9783540373346
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 257
EP  - 267
BT  - Document clustering based on maximal frequent sequences
ER  - 

Hernández-Reyes E, García-Hernández RA, Carrasco-Ochoa JA, Martínez-Trinidad JF. Document clustering based on maximal frequent sequences. In Document clustering based on maximal frequent sequences. 2006. p. 257-267. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).