Performance Benchmarking of Traditional Machine Learning and Transformer Models for Multi-Class Text Classification
Keywords:
NLP Multi-Class Classification, Transformer, NLP Transfer Learning, Text Classification

Abstract
Text classification is a fundamental task in natural language processing (NLP), widely applied in areas such as spam detection, sentiment analysis, and text categorization. This study presents a comparative analysis of three distinct machine learning paradigms on the multi-class AG News classification dataset: traditional machine learning algorithms (Random Forest, XGBoost, Support Vector Machine, and Naive Bayes), a custom-built transformer architecture trained from scratch, and transfer learning with pre-trained transformer models (BERT, DistilBERT, RoBERTa, ELECTRA). The traditional models provided competitive baselines, reaching up to 90.47% accuracy, while the custom transformer surpassed them with 91% accuracy when trained from scratch. The highest performance was obtained through transfer learning with pre-trained models: RoBERTa achieved 94.54% accuracy, DistilBERT 94.32%, BERT 94.07%, and ELECTRA 93.66%. These findings highlight the significance of contextual embeddings and large-scale pretraining in advancing text classification performance.
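As a rough illustration of the two ends of this comparison, the sketch below (not the authors' code) pairs a TF-IDF + linear SVM baseline with fine-tuning of a pre-trained DistilBERT model on the AG News dataset, using scikit-learn and the Hugging Face datasets/transformers libraries; the model checkpoint, hyperparameters, and output path are illustrative assumptions rather than the settings reported in the paper.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# 1) a traditional TF-IDF + linear SVM baseline, and
# 2) transfer learning by fine-tuning a pre-trained DistilBERT model,
# both evaluated on the 4-class AG News topic classification dataset.
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ag_news = load_dataset("ag_news")  # AG News from the Hugging Face Hub

# --- Traditional baseline: TF-IDF features + linear SVM --------------------
svm_baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("clf", LinearSVC()),
])
svm_baseline.fit(ag_news["train"]["text"], ag_news["train"]["label"])
svm_acc = accuracy_score(ag_news["test"]["label"],
                         svm_baseline.predict(ag_news["test"]["text"]))
print(f"TF-IDF + SVM test accuracy: {svm_acc:.4f}")

# --- Transfer learning: fine-tune a pre-trained transformer ----------------
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate/pad each article to a fixed length so examples batch cleanly.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = ag_news.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=4)

training_args = TrainingArguments(
    output_dir="agnews-distilbert",   # assumed output path
    num_train_epochs=2,               # illustrative hyperparameters
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["test"])
trainer.train()
print(trainer.evaluate())  # evaluation loss on the test split
```

The baseline learns task-specific weights over sparse n-gram features, while the fine-tuned model reuses contextual representations acquired during large-scale pretraining; that difference is what the reported gap (roughly 90% versus 94% accuracy) quantifies.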
References
AG News Classification Dataset. Kaggle. https://www.kaggle.com/datasets/amananandrai/ag-news-classification-dataset
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117. https://arxiv.org/abs/1404.7828
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), 30.
Joachims, T. (1998). Text categorization with Support Vector Machines: Learning with many relevant features. In Nédellec, C., & Rouveirol, C. (Eds.), Machine Learning: ECML-98, Lecture Notes in Computer Science, vol. 1398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026683
McCallum, A., & Nigam, K. (1998). A comparison of event models for Naive Bayes text classification. In AAAI Workshop.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. ISBN-13: 978-0521865715.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://aclanthology.org/D14-1162
Kim, Y. (2014). Convolutional neural networks for sentence classification. In Proceedings of EMNLP.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, 4171–4186.
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Clark, K., Luong, M.-T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: Pre-training text encoders as discriminators rather than generators. https://arxiv.org/abs/2003.10555
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. https://arxiv.org/abs/1907.11692
Liu, N. F., Gardner, M., Belinkov, Y., Peters, M. E., & Smith, N. A. (2019). Linguistic knowledge and transferability of contextual representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. https://arxiv.org/abs/1903.08855
License
Copyright (c) 2025 Omar El Khatib, Nabeel Alkhatib

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.