Keywords: Short Text, Feature Vector, Clustering, Sparse

Abstract

Clustering has long been an active research area because of its broad applicability. It is an unsupervised learning technique used in many fields, such as machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering also helps in finding patterns and structures directly in very large data sets with little or no background knowledge. Researchers have applied various clustering algorithms, such as K-means, C-means and Fuzzy C-means, and have obtained meaningful results. In short text clustering, clustering is performed on short text data such as tweets, Facebook messages and news feeds. A short text, as its name suggests, contains only a few words; for instance, a tweet is limited to 140 characters, and search engine queries are mostly short texts. Grouping this large, unorganized data on the basis of some similarity can be very helpful in extracting meaningful information. However, the major problem in clustering short text is its sparse feature vector, and the feature vector is the key element in any clustering technique: with so few words per text, two related texts often share no terms at all. The main solution proposed by researchers is therefore to expand short text into longer text using various concepts. In this paper, we discuss the challenges in short text clustering, the concepts used to overcome them, and other possible solutions that may further improve clustering results.
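The sparsity problem and the expansion idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the three example texts, the `TOY_CONCEPTS` mapping and all function names are hypothetical, and the hand-written mapping only stands in for whatever external knowledge source a real system would consult. The sketch builds bag-of-words feature vectors, shows that two related short texts have zero cosine similarity, and then repeats the comparison after appending concept labels to each text.

```python
import math
from collections import Counter


def build_vocabulary(texts):
    """Collect the sorted set of distinct lowercase tokens across all texts."""
    return sorted({token for text in texts for token in text.lower().split()})


def to_feature_vector(text, vocabulary):
    """Map a text to raw term counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def sparsity(vector):
    """Fraction of entries in the vector that are zero."""
    return sum(1 for value in vector if value == 0) / len(vector)


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical short texts; the first two are related but share no words.
short_texts = [
    "cheap flights to paris",
    "low cost airline tickets france",
    "python list sort example",
]

vocabulary = build_vocabulary(short_texts)
vectors = [to_feature_vector(text, vocabulary) for text in short_texts]
similarity_before = cosine(vectors[0], vectors[1])

# Hypothetical word-to-concept mapping (an assumption for illustration only).
TOY_CONCEPTS = {
    "flights": ["air_travel"],
    "airline": ["air_travel"],
    "tickets": ["air_travel"],
    "paris": ["france"],
}


def expand(text):
    """Append the concept labels of each known word to the text."""
    tokens = text.lower().split()
    extra = [concept for token in tokens for concept in TOY_CONCEPTS.get(token, [])]
    return " ".join(tokens + extra)


expanded_texts = [expand(text) for text in short_texts]
expanded_vocabulary = build_vocabulary(expanded_texts)
expanded_vectors = [to_feature_vector(t, expanded_vocabulary) for t in expanded_texts]
similarity_after = cosine(expanded_vectors[0], expanded_vectors[1])

print("vocabulary size:", len(vocabulary))
print("sparsity of first vector:", round(sparsity(vectors[0]), 3))
print("similarity before expansion:", similarity_before)
print("similarity after expansion:", round(similarity_after, 3))
```

In this toy setting the two travel-related texts have no terms in common, so any distance-based clustering algorithm would treat them as unrelated; after expansion they share the added concept labels and their similarity becomes positive. How well this carries over to real data depends on the quality of the knowledge source used for expansion.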

How to Cite
Siddiqui, D. T., & Aalam, P. (2015). Short Text Clustering; Challenges & Solutions: A Literature Review. International Journal Of Mathematics And Computer Research, 3(06), 1025-1031. Retrieved from http://ijmcr.in/index.php/ijmcr/article/view/116