Feature Extraction and Opinion Mining of Gujarati Language Text


International Journal of Research and Scientific Innovation (IJRSI) | Volume IX, Issue IV, April 2022 | ISSN 2321–2705


Himadri Patel, Bankim Patel, Kalpesh Lad
Uka Tarsadia University


Abstract – The field of opinion mining has gained much popularity in the last few years. Many new techniques and methods are being developed for languages such as English and Hindi. However, there has been no significant progress in opinion mining for languages like Gujarati. The presented work uses a deep learning approach for opinion mining of Gujarati text. The paper also discusses feature extraction, which is one of the most important steps in any machine learning or deep learning method.

Keywords – Feature Extraction, Opinion Mining, Gujarati, Deep Learning, CNN.

I. INTRODUCTION

Opinion Mining is considered a classification problem: it determines whether an opinion is positive or negative, and it is performed using classification methods such as SVM, KNN and CNN [1]. Prior work shows that, among the traditional ML methods SVM and Naïve Bayes, SVM performs better [2, 3]. Deep learning methods have proven successful for Opinion Mining in Persian and Arabic [4, 5]. However, deep learning methods have not yet been explored for Gujarati or other Indian languages.
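To make the classification framing concrete, here is a minimal sketch of a binary polarity classifier using multinomial Naïve Bayes with add-one smoothing, one of the traditional methods mentioned above. It is only an illustration, not the deep learning (CNN) approach of this paper. The toy Gujarati sentences, whitespace tokenization, the labels "pos"/"neg" and the function names `train_nb`/`predict_nb` are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train multinomial Naive Bayes on token lists (hypothetical helper)."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    prior, counts, totals = {}, {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))  # log P(class)
        counts[c] = Counter(tok for d in class_docs for tok in d)
        totals[c] = sum(counts[c].values())
    return classes, vocab, prior, counts, totals

def predict_nb(model, doc):
    """Return the class with the highest log-posterior for a token list."""
    classes, vocab, prior, counts, totals = model
    v = len(vocab)
    best, best_score = None, -math.inf
    for c in classes:
        score = prior[c]
        for tok in doc:
            if tok in vocab:  # tokens never seen in training are ignored
                # add-one (Laplace) smoothed log P(token | class)
                score += math.log((counts[c][tok] + 1) / (totals[c] + v))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy training data (hypothetical): whitespace-tokenized Gujarati sentences.
train_docs = [s.split() for s in ["આ ફિલ્મ સારી છે", "વાર્તા સારી છે",
                                  "આ ફિલ્મ ખરાબ છે", "અભિનય ખરાબ છે"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, "વાર્તા ખરાબ છે".split()))
```

Swapping in SVM or KNN changes only the classifier. The step of turning each text into features stays the same, which is why feature extraction matters regardless of the method used.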
Opinion Mining performed with classification methods relies on features that must be extracted from the dataset. Good features are described as those that are more expressive, domain dependent and rarely occurring, and that are selected on the basis of document frequency [6]. One of the earliest works on Opinion Mining using a machine learning technique uses TF-IDF for vectorization and a "bag-of-words" as its features [7]. The authors report that word order has very little significance when SVM is used as the ML technique. Other work on Opinion Mining introduces an enhanced method called Delta TF-IDF and claims that it is best suited to Opinion Mining because the weights it assigns are biased toward one corpus, either positive or negative. Word2Vec and Doc2Vec are also used for vectorization in [8], to vectorize articles on trending topics that are updated every hour. The authors claim that Doc2Vec outperforms TF-IDF because it supports a dynamically changing vocabulary. However, Doc2Vec and Word2Vec require a language model as a prerequisite.
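The two weighting schemes in this paragraph can be sketched in a few lines. Plain TF-IDF multiplies a term's count by log(N / df). One common formulation of Delta TF-IDF instead multiplies the count by the difference between the term's IDF over the negative training set and its IDF over the positive training set, so the weight leans toward the class the term is concentrated in. This is a minimal sketch under assumptions: the add-one smoothing of document frequencies, the sign convention (positive-leaning terms receive positive weights), the toy Gujarati data and the function names are illustrative choices, not taken from [7] or from this paper.

```python
import math
from collections import Counter

def tfidf(docs):
    """Bag-of-words TF-IDF: raw term count times natural-log IDF, log(N / df)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [{tok: cnt * math.log(n / df[tok]) for tok, cnt in Counter(doc).items()}
            for doc in docs]

def delta_tfidf(doc, pos_docs, neg_docs):
    """Count of each term times (IDF over negatives - IDF over positives).

    Add-one smoothing on document frequencies (an assumption) avoids log(0)
    and division by zero for terms that appear in only one class.
    """
    p_df = Counter(tok for d in pos_docs for tok in set(d))
    n_df = Counter(tok for d in neg_docs for tok in set(d))
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    weights = {}
    for tok, cnt in Counter(doc).items():
        idf_pos = math.log2((n_pos + 1) / (p_df[tok] + 1))
        idf_neg = math.log2((n_neg + 1) / (n_df[tok] + 1))
        weights[tok] = cnt * (idf_neg - idf_pos)
    return weights

# Hypothetical toy corpora of whitespace-tokenized Gujarati sentences.
pos = [s.split() for s in ["આ ફિલ્મ સારી છે", "વાર્તા સારી છે"]]
neg = [s.split() for s in ["આ ફિલ્મ ખરાબ છે", "અભિનય ખરાબ છે"]]

print(tfidf(pos + neg)[0])                          # "છે" gets 0: it is in every document
print(delta_tfidf(["સારી", "ખરાબ", "છે"], pos, neg))  # "સારી" > 0, "ખરાબ" < 0, "છે" = 0
```

The comparison shows the bias the paragraph describes. Plain TF-IDF only down-weights terms that are common across the whole corpus. Delta TF-IDF also cancels terms that are equally common in both classes, and it keeps a signed weight for terms that separate positive from negative text.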
In Opinion Mining, vectorization is performed on terms or tokens, which are then converted into features such as lemmas, N-grams, POS tags and syntactic dependency trees [9]. One such feature selection technique uses a combination of each lemma with its POS tag: adjective, adverb, noun and verb