Efficient Transfer Learning for NLP: An Experimental Analysis of Dimensionality Reduction Techniques
Authors
Abhinav Education Society Institute of Management Research, Savitribai Phule Pune University, Pune (India)
Article Information
DOI: 10.51584/IJRIAS.2025.1010000092
Subject Category: Machine Learning
Volume/Issue: 10/10 | Page No: 1093-1100
Publication Timeline
Submitted: 2025-10-28
Accepted: 2025-11-03
Published: 2025-11-10
Abstract
Dimensionality reduction (DR) is crucial for improving the efficiency of Natural Language Processing (NLP) systems, especially when combined with contemporary transfer learning models such as BERT and its variants. While pretrained language models yield state-of-the-art results, they are computationally intensive and therefore less practical in resource-scarce environments. In this paper, we present an experimental comparison of DR methods, namely Latent Semantic Analysis (LSA), Chi-Square feature selection, and Principal Component Analysis (PCA), within transfer learning pipelines for sentiment classification. Using the IMDb dataset, we benchmark fine-tuned DistilBERT against TF-IDF baselines (Logistic Regression and SVM) and DR-enhanced pipelines. We find that TF-IDF + SVM and TF-IDF + Chi² + SVM are not only more efficient but also comparable to, or even better than, fine-tuned DistilBERT in predictive performance. Applying PCA to DistilBERT embeddings yields compact models but diminishes predictive power. Our results emphasize the need to balance semantic richness with computational efficiency in real-world NLP applications.
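The TF-IDF + Chi² + SVM pipeline described above can be sketched with scikit-learn. This is a minimal illustration, not the paper's actual implementation: the paper uses the IMDb dataset, whereas a tiny toy corpus stands in here so the example runs standalone, and the hyperparameters (bigram range, `k=10` selected features) are illustrative assumptions rather than the authors' settings.

```python
# Sketch of a TF-IDF + Chi-square feature selection + SVM pipeline,
# assuming scikit-learn. Toy corpus and hyperparameters are
# illustrative stand-ins for the paper's IMDb setup.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

docs = [
    "a wonderful, heartfelt film with brilliant acting",
    "terrible plot and wooden performances, a waste of time",
    "one of the best movies I have seen this year",
    "boring, predictable, and far too long",
    "an absolute joy from start to finish",
    "awful dialogue and a dull, lifeless story",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),            # sparse TF-IDF term weights
    ("chi2", SelectKBest(chi2, k=10)),       # keep 10 most class-discriminative terms
    ("svm", LinearSVC()),                    # linear SVM on the reduced features
])
pipe.fit(docs, labels)
print(pipe.predict(["a brilliant and heartfelt movie"]))
```

Chi-square selection is a natural fit after TF-IDF because it requires non-negative features and scores each term's dependence on the class label, so the SVM trains on a much smaller, discriminative subspace, which is the efficiency gain the paper measures.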
Keywords
Transfer Learning, NLP, Dimensionality Reduction, Sentiment Analysis, BERT, PCA, Chi-Square, LSA