Abstract
Fake news on digital platforms is a major problem in the digital age, and there is strong interest in methods for detecting it. This research investigates grouping fake news articles with K-Means and Agglomerative Clustering, using semantic representations derived from Word2Vec embeddings. The researchers combine natural language processing methods with machine learning to improve the accuracy and efficiency of fake news detection. The study extracts meaningful features from textual data, converts them into vector representations with Word2Vec, and then applies clustering algorithms to group similar articles. The methodology aims to advance the state of the art in fake news detection and to support more reliable and robust tools for fighting misinformation. In the comparative analysis of clustering metrics, K-Means clustering achieves a Purity Score of 88.09% and an Adjusted Rand Score of 58.03%, whereas Agglomerative Clustering with the Ward method yields a Purity Score of 85.13% and an Adjusted Rand Score of 49.36%. The Purity Score of 88.09% for K-Means indicates a strong ability to form clusters in which most data points share the same true class. Agglomerative Clustering with Ward, though slightly lower at 85.13%, also separates the classes effectively. On the Adjusted Rand Score, which measures agreement between true and predicted labels while correcting for chance, K-Means outperforms Agglomerative Clustering with Ward by a wider margin: 58.03% versus 49.36%, a gap of 8.67 percentage points.
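The pipeline summarized above (Word2Vec document vectors, clustering, then scoring against the true labels) can be sketched without external libraries. This is a minimal illustration under stated assumptions, not the authors' implementation: tiny hand-made 2-D vectors stand in for trained Word2Vec embeddings, each article is represented by the average of its word vectors (the abstract does not say how word vectors are aggregated), and K-Means uses a farthest-first seeding chosen only to keep the toy example deterministic. Ward-linkage agglomerative clustering is omitted for brevity. The helper names `doc_vector`, `kmeans`, `purity` and `adjusted_rand` are hypothetical.

```python
import math
import random
from collections import Counter


def doc_vector(tokens, embeddings):
    """Mean-pool the vectors of in-vocabulary tokens (an assumed aggregation)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:  # no known words: fall back to a zero vector
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means with farthest-first seeding (an illustrative choice)."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Add the point farthest from its nearest already-chosen centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(true, pred):
    """Fraction of points that belong to their cluster's majority true class."""
    hits = 0
    for cluster in set(pred):
        members = [t for t, p in zip(true, pred) if p == cluster]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(true)


def pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand(true, pred):
    """Adjusted Rand Index: pair-counting agreement corrected for chance."""
    index = sum(pairs(v) for v in Counter(zip(true, pred)).values())
    sum_a = sum(pairs(v) for v in Counter(true).values())
    sum_b = sum(pairs(v) for v in Counter(pred).values())
    expected = sum_a * sum_b / pairs(len(true))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate labelings: treat as perfect agreement
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy 2-D vectors standing in for trained Word2Vec embeddings (illustrative assumption).
embeddings = {
    "officials": [1.0, 0.1], "report": [0.9, 0.0], "data": [0.8, 0.2],
    "shocking": [0.0, 1.0], "secret": [0.1, 0.9], "miracle": [0.2, 0.8],
}
docs = [
    ["the", "officials", "report", "data"],  # "the" is out of vocabulary and skipped
    ["report", "data"],
    ["officials", "data"],
    ["shocking", "secret", "miracle"],
    ["secret", "miracle"],
    ["shocking", "secret"],
]
true_labels = ["real", "real", "real", "fake", "fake", "fake"]

vectors = [doc_vector(d, embeddings) for d in docs]
pred = kmeans(vectors, k=2)
# Both scores are 1.0 here because the toy classes are cleanly separated.
print(pred, purity(true_labels, pred), adjusted_rand(true_labels, pred))
```

Purity rewards clusters dominated by a single true class, but it can be inflated simply by using many small clusters, which is one reason to report the Adjusted Rand Score alongside it; ARI is 1.0 for identical partitions, close to 0 for chance-level agreement, and can be negative. With gensim and scikit-learn available, the same steps map onto `gensim.models.Word2Vec`, `sklearn.cluster.KMeans`, `sklearn.cluster.AgglomerativeClustering(linkage="ward")` and `sklearn.metrics.adjusted_rand_score`; scikit-learn has no dedicated purity function, but purity can be derived from `sklearn.metrics.cluster.contingency_matrix`.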