Abstract
Clustering remains an active research area because of its broad applicability. It is an unsupervised learning technique used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering helps find patterns and structures directly in very large data sets with little or no background knowledge. Researchers have applied various clustering algorithms, such as K-means, C-means and Fuzzy C-means, and have obtained meaningful results. In short text clustering, the technique is applied to short text data such as tweets, Facebook messages and news feeds. A short text, as its name suggests, contains only a few words; for instance, a tweet was originally limited to 140 characters, and search engine queries are mostly short texts. Grouping this large, unorganized data on the basis of some similarity can be very helpful in extracting meaningful information. The major problem in clustering short text, however, is its sparse feature vector: the feature vector is the key element in any clustering technique, and a text of only a few words yields very few non-zero features. The main solution proposed by researchers is therefore to expand short texts into longer ones using various concepts. In this paper, we discuss the challenges in short text clustering, the concepts used to overcome them, and other possible solutions that may further improve clustering results.
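The sparsity problem and the expansion idea can be illustrated with a minimal sketch. It is not the method of any particular paper: the `CONCEPTS` table is a hypothetical stand-in for an external knowledge source, and the helper names (`bag_of_words`, `cosine`, `expand`) are chosen here for illustration only. Two short texts about related topics share no words, so their bag-of-words vectors have zero cosine similarity; after each text is expanded with associated concepts, the vectors overlap.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a text to a sparse term-frequency vector (term -> count)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical concept table (an assumption for this sketch); a real system
# would draw these associations from an external knowledge source.
CONCEPTS = {
    "iphone": ["smartphone", "apple"],
    "galaxy": ["smartphone", "samsung"],
}

def expand(text):
    """Append the concepts associated with each known term to the text."""
    tokens = text.lower().split()
    extra = [c for t in tokens for c in CONCEPTS.get(t, [])]
    return " ".join(tokens + extra)

a = "new iphone released"
b = "galaxy launch today"

# No shared terms: the raw vectors are orthogonal.
raw_sim = cosine(bag_of_words(a), bag_of_words(b))

# After expansion both texts contain "smartphone", so similarity is positive.
expanded_sim = cosine(bag_of_words(expand(a)), bag_of_words(expand(b)))

print(raw_sim, expanded_sim)
```

With only three words per text, a single differing word choice is enough to make two related messages look unrelated; the expansion step adds shared features that a clustering algorithm can then use.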