Clustering Data Text Based on Semantic
Keywords:
Text mining, Text clustering, Hierarchical clustering, Ontology, Semantic relationship.
Abstract
Clustering is one of the most important data mining techniques; it groups a large number of unordered text documents into meaningful and coherent clusters. Most text clustering algorithms do not consider the semantic relationships between words and cannot recognize or use semantic concepts. In this paper, a new algorithm is presented that clusters texts based on the meanings of their words. First, a new method is presented for finding the semantic relationship between words based on the WordNet ontology; then, text data is clustered using the proposed method together with a hierarchical clustering algorithm. Documents are preprocessed, converted to the vector space model, and then clustered semantically using the proposed algorithm. The experimental results show that the proposed algorithm yields higher clustering quality and accuracy than existing hierarchical clustering algorithms.
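The pipeline outlined in the abstract (preprocess, measure word-level semantic similarity, cluster hierarchically) can be illustrated with a minimal sketch. It is not the paper's method: the proposed similarity measure is not specified here, so the sketch substitutes Wu-Palmer similarity, and a small hand-built taxonomy (`PARENT`, hypothetical) stands in for WordNet. The helper names `preprocess`, `wu_palmer`, `doc_similarity`, and `cluster` are likewise illustrative assumptions, and term weighting in the vector space model is omitted for brevity.

```python
from itertools import combinations

# Hypothetical toy taxonomy standing in for WordNet: child -> parent.
PARENT = {
    "entity": None,
    "animal": "entity", "vehicle": "entity",
    "dog": "animal", "cat": "animal",
    "car": "vehicle", "truck": "vehicle",
}

def preprocess(text):
    """Lowercase, strip punctuation, keep only words known to the taxonomy."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if w in PARENT]

def path_to_root(word):
    """Return [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a) + depth(b)), root depth 1."""
    path_a, ancestors_b = path_to_root(a), set(path_to_root(b))
    lcs = next(w for w in path_a if w in ancestors_b)  # deepest shared ancestor
    depth = lambda w: len(path_to_root(w))
    return 2 * depth(lcs) / (depth(a) + depth(b))

def doc_similarity(d1, d2):
    """Symmetric average of each word's best match in the other document.
    Assumes both documents contain at least one known word."""
    def directed(x, y):
        return sum(max(wu_palmer(w, v) for v in y) for w in x) / len(x)
    return (directed(d1, d2) + directed(d2, d1)) / 2

def cluster(docs, k):
    """Average-linkage agglomerative clustering until k clusters remain."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        total = sum(doc_similarity(docs[i], docs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

texts = ["The dog chased the cat.", "A cat and a dog",
         "The truck hit a car.", "Car and truck"]
docs = [preprocess(t) for t in texts]
print(cluster(docs, 2))  # [[0, 1], [2, 3]]
```

In this toy run the two animal documents and the two vehicle documents end up in separate clusters even though documents 0 and 1 share no function words, which is the kind of grouping by meaning the abstract describes. A realistic version would replace the toy taxonomy with WordNet lookups and add term weighting.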