Efficient Transfer Learning for NLP: An Experimental Analysis of Dimensionality Reduction Techniques

Authors

Mrs. Vaishali Suryawanshi

Abhinav Education Society Institute of Management Research, Savitribai Phule Pune University, Pune (India)

Dr. Abhijeet Kaiwade

Abhinav Education Society Institute of Management Research, Savitribai Phule Pune University, Pune (India)

Article Information

DOI: 10.51584/IJRIAS.2025.1010000092

Subject Category: Machine Learning

Volume/Issue: 10/10 | Page No: 1093-1100

Publication Timeline

Submitted: 2025-10-28

Accepted: 2025-11-03

Published: 2025-11-10

Abstract

Dimensionality reduction (DR) is crucial for improving the efficiency of Natural Language Processing (NLP) systems, especially when used alongside contemporary transfer learning models such as BERT and its variants. While pretrained language models yield state-of-the-art results, they are computationally intensive and therefore less practical in resource-constrained environments. In this paper, we present an experimental comparison of DR methods, namely Latent Semantic Analysis (LSA), Chi-Square feature selection, and Principal Component Analysis (PCA), applied within transfer learning pipelines for sentiment classification. Using the IMDb dataset, we benchmark fine-tuned DistilBERT against TF-IDF baselines (Logistic Regression and SVM) and DR-enhanced pipelines. We find that TF-IDF + SVM and TF-IDF + Chi² + SVM are substantially more efficient while performing comparably to, or even better than, fine-tuned DistilBERT. PCA applied to DistilBERT embeddings yields compact models but diminishes predictive power. Our results emphasize the need to balance semantic richness against computational efficiency in real-world NLP applications.
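The TF-IDF + Chi² + SVM pipeline highlighted above can be sketched with scikit-learn. This is a minimal illustration on a toy corpus, not the paper's actual experimental setup: the feature count (`k=10`), n-gram range, and classifier settings are assumptions for demonstration, and the IMDb dataset would replace the toy documents in practice.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# Toy stand-in for the IMDb reviews used in the paper (1 = positive, 0 = negative).
texts = [
    "a wonderful, moving film with excellent acting",
    "terrible acting and a dull, boring plot",
    "an excellent and deeply moving story",
    "dull, boring, and a complete waste of time",
]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # sparse TF-IDF features
    ("chi2", SelectKBest(chi2, k=10)),                # Chi² feature selection (illustrative k)
    ("svm", LinearSVC()),                             # linear SVM classifier
])
pipeline.fit(texts, labels)
print(pipeline.predict(["an excellent and moving film"]))
```

Because Chi² scores require non-negative inputs, it pairs naturally with TF-IDF weights; the `SelectKBest` step discards low-scoring terms before the SVM, which is the source of the efficiency gains reported in the abstract.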

Keywords

Transfer Learning, NLP, Dimensionality Reduction, Sentiment Analysis, BERT, PCA, Chi-Square, LSA

References

1. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL-HLT, 2019. doi: 10.18653/v1/N19-1423

2. V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” arXiv preprint arXiv:1910.01108, 2019. doi: 10.48550/arXiv.1910.01108

3. A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Proc. NeurIPS, 2017. doi: 10.48550/arXiv.1706.03762

4. Y. Liu, M. Ott, N. Goyal, et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019. doi: 10.48550/arXiv.1907.11692

5. I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

6. H. Almuallim and T. G. Dietterich, “Learning with many irrelevant features,” in Proc. AAAI, 1991.

7. G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

8. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” JASIS, vol. 41, no. 6, pp. 391–407, 1990.

9. K. Pearson, “On lines and planes of closest fit to systems of points in space,” Philosophical Magazine, vol. 2, no. 11, pp. 559–572, 1901.

10. R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artificial Intelligence, vol. 97, no. 1–2, pp. 273–324, 1997.

11. A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, “Bag of tricks for efficient text classification,” in Proc. EACL, 2017.

12. R. Johnson and T. Zhang, “Effective use of word order for text categorization with CNNs,” in Proc. NAACL-HLT, 2015.

13. M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, “Unsupervised learning of invariant feature hierarchies,” in Proc. CVPR, 2007.

14. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.

15. A. Jaiswal, A. R. Babu, M. Z. Zadeh, D. Banerjee, and F. Makedon, “A survey on contrastive self-supervised learning,” Applied Intelligence, vol. 51, pp. 2065–2089, 2021.

16. X. Mao, Z. Li, Q. Li, and S. Zhang, “BERT-DXLMA: Enhanced representation learning and generalization model for English text classification,” Neurocomputing, 2024. doi: 10.1016/j.neucom.2024.129325
