RSIS International

A Study on Spam Detection in Twitter Based on Machine Learning


International Journal of Research and Scientific Innovation (IJRSI) | Volume VI, Issue V, May 2019 | ISSN 2321–2705

Nazia Nusrath Ul Ain¹, Meena Kumari K S²

¹Dept. of Information Science & Engineering, ²Dept. of Computer Science & Engineering
Brindavan College of Engineering

Abstract- Spam has continued to grow at a disturbing rate despite ongoing efforts to reduce it. The problem is considerably more pervasive on microblogging websites, given their popularity and ease of access. One of the most prominent microblogging websites is Twitter. On average, around 6,000 tweets are posted on Twitter every second, which corresponds to over 500 million tweets per day. Spammers exploit the platform's popularity to trap users in malicious activities by posting spam tweets. Existing tools for stopping spammers can only block malicious links; they cannot protect the user in real time, as early as possible. Researchers have applied different approaches to detect spam. In this paper, we study these approaches, some of which are based only on user-based features, tweet-based features, or tweet-text features. Using tweet-text features helps identify spam tweets even when the spammer creates a new account, which is not possible with user-based and tweet-based features alone. The existing system that used tweet-text features evaluated four machine learning algorithms: Support Vector Machine, Neural Network, Random Forest and Gradient Boosting [1]. In our proposed system, the best performance under cross-validation was obtained with the Naïve Bayes model, which achieves accuracy surpassing that of the existing solution.

Keywords- Naïve Bayes, Random Forest, Spam, Ham
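
To illustrate the kind of pipeline the abstract refers to (a Naïve Bayes classifier on tweet text, scored with cross-validation), the following is a minimal sketch and not the authors' implementation. The toy tweets, the function names, the multinomial variant with add-one (Laplace) smoothing, the whitespace tokenizer and the interleaved fold assignment are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the paper's actual dataset is not reproduced here.
TOY_TWEETS = [
    ("win a free iphone click this link now", "spam"),
    ("meeting friends for coffee this afternoon", "ham"),
    ("free followers click here to win", "spam"),
    ("great talk at the conference today", "ham"),
    ("click now to claim your free prize", "spam"),
    ("watching the football game with friends", "ham"),
    ("claim free gift cards click the link", "spam"),
    ("reading a great book this weekend", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_nb(samples):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted so ties resolve deterministically
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(samples, k):
    """Mean accuracy over k folds; fold i holds out samples i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        model = train_nb(train)
        correct = sum(predict_nb(model, text) == label for text, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


if __name__ == "__main__":
    print("mean 4-fold accuracy:", cross_validate(TOY_TWEETS, 4))
    print(predict_nb(train_nb(TOY_TWEETS), "free prize click"))
```

Because the classifier looks only at the words of a tweet, it needs no account history, which is the property the abstract attributes to tweet-text features. The accuracy printed for eight toy tweets says nothing about performance on real data.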

I. INTRODUCTION

The Internet and social media have become increasingly popular in recent years. Internet users often spend a lot of time on social media to follow events of interest, post messages, share ideas and make friends around the world. These platforms have become an integral part of people's daily lives. One such platform is Twitter, which has been rated as the most popular social network [2].
But with great possibilities come great challenges. The exponential growth of Twitter also invites unwanted activity on the platform. On average, around 6,000 tweets are posted on Twitter every second, which corresponds to over 500 million tweets per day. Spammers exploit the platform's popularity to trap users in malicious activities by posting spam tweets.