Challenges and Opportunities of Using Twitter Data in Educational Discourse Research: A Methodological Reflection from the Malaysian ODL Context

Mohd Amirul Atan
Nur Aqilah Norwahi
3073-3083
Jul 8, 2025
Education

Mohd Amirul Atan^*, Nur Aqilah Norwahi

Academy of Language Studies, Universiti Teknologi MARA Cawangan Melaka (Kampus Jasin), Melaka, Malaysia

*Corresponding Author

DOI: https://dx.doi.org/10.47772/IJRISS.2025.906000226

Received: 30 May 2025; Accepted: 03 June 2025; Published: 08 July 2025

ABSTRACT

This paper offers a methodological reflection on the use of Twitter (now X) as a data source for analysing public discourse surrounding Open and Distance Learning (ODL) in Malaysian higher education during the COVID-19 pandemic and endemic phases. Drawing on an ongoing doctoral research project, it works with a corpus of over 30,000 tweets collected between March 2020 and December 2022, of which approximately 3,000 were purposively sampled for qualitative analysis, and integrates Discursive News Value Analysis (DNVA) and Corpus-Assisted Discourse Studies (CADS) to examine linguistic and thematic patterns. The paper discusses the advantages of using Twitter for capturing spontaneous, affective expressions of student experiences, while critically examining key challenges including linguistic hybridity, data cleaning, contextual ambiguity, and ethical considerations. The study demonstrates how Twitter-based data can provide a rich discursive lens into students’ reception of education policies and institutional responses during crisis conditions. It also highlights the methodological potential of corpus-assisted and discourse-based frameworks such as DNVA and CADS for navigating and interpreting large-scale social media data. The paper concludes with recommendations for future research, emphasising ethical sensitivity, tool selection, and the value of methodological triangulation. These insights affirm Twitter’s ongoing relevance as a powerful medium for post-pandemic educational research and for understanding the lived realities of digitally mediated learning.

Keywords: Twitter, Open and Distance Learning (ODL), COVID-19, Malaysian higher education, discourse analysis, digital ethnography, educational policy, methodological reflection

INTRODUCTION

The unprecedented shift to Open and Distance Learning (henceforth, ODL) during the COVID-19 pandemic significantly transformed the landscape of higher education in Malaysia and beyond. Faced with prolonged closures of physical campuses, universities were compelled to transition swiftly to digital platforms to sustain academic continuity (Mohd Yusof et al., 2021). This rapid transformation, while necessary, exposed various systemic inequalities, ranging from digital infrastructure limitations to pedagogical readiness among both students and educators (Ariffin et al., 2020; Rajab et al., 2020). As the pandemic evolved into an endemic phase, the continuity of ODL further highlighted questions around the sustainability, inclusivity, and overall effectiveness of digital learning modes, particularly in contexts with pronounced socio-economic and geographical disparities.

In this context, public discourse has emerged as an important dimension in understanding how policy shifts, institutional responses, and individual experiences around ODL are negotiated and represented. Beyond conventional evaluation tools such as surveys or institutional feedback mechanisms, spontaneous and user-generated discourse in digital spaces offers unfiltered insights into how learners interpret, cope with, and critique educational changes (Zappavigna, 2012; Selwyn, 2020). As such, analysing public discourse—particularly in online platforms—provides a valuable means of gauging the emotional, cultural, and communicative dimensions of students’ engagement with ODL. This is especially relevant in moments of crisis, where top-down policies often intersect with complex lived realities.

Twitter, since rebranded as X, has become a prominent platform for capturing real-time public sentiments, particularly during crisis periods. Its brevity, immediacy, and popularity among Malaysian youths make it an especially rich source for educational discourse analysis (Zayani, 2015; Ariffin & Yaacob, 2022). Tweets posted during the pandemic encapsulate authentic and often emotionally charged reflections on students’ lived experiences with ODL, including frustrations with connectivity, perceived lack of support, and the pressures of self-directed learning. As a form of digital ethnography, Twitter data allows researchers to observe the vernacular voices of students on their own terms, often using colloquial or hybridised linguistic forms that traditional data collection methods might overlook (Barton & Lee, 2013; Tagg & Seargeant, 2014).

This paper reflects on the methodological complexities and affordances of using Twitter data to examine the representation of ODL in Malaysian universities during the COVID-19 pandemic and endemic phases. Drawing from an ongoing doctoral study, the paper outlines the rationale for selecting Twitter as a primary data source and discusses the procedural, ethical, and analytical challenges encountered in the collection, cleaning, and interpretation of the data. Particular attention is given to issues such as language hybridity, representational bias, contextual ambiguity, and ethical considerations in using publicly available yet personally expressive data. At the same time, this paper highlights the opportunities presented by social media discourse for educational researchers, including the ability to capture affective dimensions of learning and policy reception that are often absent from institutional narratives.

By offering a reflexive account of the methodological processes involved, this study aims to contribute to broader conversations on digital research practices in education, particularly those that engage with public discourse and informal learning environments. It is hoped that the insights presented here will serve as a guide for researchers seeking to harness the potential of social media platforms in educational discourse research, especially within crisis-affected and linguistically diverse contexts.

CONTEXTUAL BACKGROUND

The Malaysian higher education system, governed primarily by the Ministry of Higher Education (MOHE), faced unprecedented challenges following the outbreak of the COVID-19 pandemic in early 2020. In response to the nationwide enforcement of the Movement Control Order (henceforth, MCO), which necessitated the suspension of all face-to-face academic activities, MOHE issued a series of circulars and directives urging higher education institutions (HEIs) to transition to online modes of teaching and learning (MOHE, 2020). As a result, ODL emerged as the primary mechanism for instructional continuity across both public and private universities.

The decision to implement ODL on a large scale was both strategic and urgent. Unlike conventional online learning that may exist as a supplementary tool, ODL in this context became a compulsory substitute for all academic delivery, assessments, and student engagement activities (Samsudin et al., 2021). While institutions such as Universiti Teknologi MARA (UiTM), Universiti Kebangsaan Malaysia (UKM), and Universiti Malaya (UM) had pre-existing digital infrastructures, the sudden scale-up of ODL revealed stark disparities in digital access, pedagogical readiness, and institutional capacities (Yusof et al., 2020). Students from low-income families, rural areas, or marginalised communities were disproportionately affected, with limited access to reliable internet, digital devices, or conducive learning environments (Adnan & Anwar, 2020; Hashim et al., 2021).

As the nation gradually transitioned into the endemic phase of COVID-19 from mid-2022 onwards, ODL continued to occupy a significant role in higher education. MOHE promoted hybrid and flexible learning policies through initiatives such as the Higher Education Digitalisation Agenda (HEDA) and Pelan Strategik e-Pembelajaran Kebangsaan (PSepK). These strategies aimed to embed digital learning as a sustainable component of Malaysia’s education future, rather than as a temporary emergency response (MOHE, 2022). Consequently, discussions surrounding ODL shifted from issues of access and infrastructure to broader questions of learning quality, mental health, digital equity, and policy responsiveness.

Within this evolving context, social media has emerged as a crucial site for observing and analysing how students make sense of, respond to, and critique ODL policies and experiences. Platforms such as Twitter (now known as X) provide a dynamic archive of public sentiment, often capturing expressions of frustration, humour, protest, or solidarity that are typically absent from institutional evaluations or formal feedback mechanisms (Selwyn & Jandrić, 2020; Seargeant & Tagg, 2014). The microblogging nature of Twitter, combined with its widespread use among Malaysian youths, makes it especially valuable for discourse-oriented research, where authenticity, immediacy, and affective expression are central concerns.

Moreover, social media discourse can reveal patterns of representation that reflect broader socio-political dynamics, including perceptions of government accountability, digital inequality, and educational justice (Zappavigna, 2012; Lee, 2021). These vernacular discourses serve as both public texts and social actions, where students do not merely report experiences but construct meaning, voice resistance, and negotiate identity in the midst of crisis. As such, leveraging Twitter data for educational research not only broadens the methodological toolkit but also aligns with a critical, student-centred approach to understanding the impact of national education policies in times of disruption.

METHODOLOGICAL DESIGN

Twitter as a Data Source for Educational Discourse

Twitter presents unique affordances as a data source for discourse-oriented educational research. As a platform characterised by immediacy, brevity, and high user engagement, it enables researchers to access large volumes of public discourse in near real-time (Zappavigna, 2012; Sloan et al., 2018). In the context of the COVID-19 pandemic, Twitter served not only as a channel for information dissemination but also as a discursive space where users, including students and educators, articulated personal experiences, institutional critiques, and emotional responses to rapid educational transitions.

The use of Twitter is especially pertinent for studies seeking to understand lived experiences and vernacular perspectives. Tweets often reflect raw, unfiltered sentiments expressed in the moment, allowing for the observation of how individuals negotiate meaning, identity, and resistance within socio-educational crises (Papacharissi, 2015; Barton & Lee, 2013). Furthermore, its public-by-default interface and the use of hashtags enable researchers to trace thematic clusters, social alignments, and temporal discourse shifts without intruding into private spaces.

Tweet Collection Approach

For this paper, tweets were collected using a combination of keyword- and hashtag-based search strategies, designed to capture the discourse around Open and Distance Learning (ODL) in Malaysia during the pandemic and endemic phases. Primary keywords included “ODL”, “kelas online”, “pengajian jarak jauh”, “MOHE”, “kelas UiTM”, and other variations in both English and Malay. These were paired with event-based hashtags such as #pkp, #kelasonline, and #universiti to contextualise the tweets within specific policy announcements or institutional developments.

Due to the limited availability of Twitter’s full-archive API for academic research at the time of data collection, a hybrid method was employed: tweet scraping was conducted using open-source tools such as GetOldTweets3 and Twarc, which allowed for retrospective retrieval based on temporal and lexical filters (Driscoll & Walker, 2014). Data was then exported into a spreadsheet format for cleaning and organisation.

Only tweets posted between March 2020 and December 2022 were included, corresponding with Malaysia’s major policy phases: the initial MCO, subsequent reopening, and the transition to endemic management. To ensure linguistic and contextual relevance, tweets were filtered by geolocation (Malaysia), language (Malay and English), and user type (excluding institutional accounts where possible).
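
The inclusion criteria described above can be illustrated with a short sketch. This is not the procedure used in the study: the record fields (`text`, `created`, `lang`) and the ISO 639-1 language codes are assumptions made for illustration, only the search terms quoted in the text are listed, and the geolocation and user-type filters are left out.

```python
import re
from datetime import date

# Search terms quoted in the paper; the full list used in the study is not given.
TERMS = ["ODL", "kelas online", "pengajian jarak jauh", "MOHE", "kelas UiTM",
         "#pkp", "#kelasonline", "#universiti"]

# Match whole terms only, so "ODL" does not fire inside a word such as "godly".
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(t) for t in TERMS) + r")(?!\w)",
    re.IGNORECASE,
)

# Study window stated in the paper: March 2020 to December 2022.
START, END = date(2020, 3, 1), date(2022, 12, 31)

# Assumed language labels (ISO 639-1); a platform's own labels may differ.
LANGS = {"ms", "en"}


def in_scope(tweet):
    """Return True if a tweet record meets the date, language, and keyword criteria."""
    if not (START <= tweet["created"] <= END):
        return False
    if tweet["lang"] not in LANGS:
        return False
    return PATTERN.search(tweet["text"]) is not None
```

A filter of this kind only narrows the pool. As the later sections note, keyword matches still include unrelated uses of terms such as “ODL”, so manual screening remains necessary.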

Sampling Strategy

Given the volume of tweets retrieved (over 30,000 initially), a purposive sampling approach was employed to generate a manageable and analytically rich corpus. The aim was not to achieve statistical representativeness but rather discursive saturation, where thematic patterns could be observed and critically interrogated (Baker & McEnery, 2015).

From the larger pool, a refined sample of approximately 3,000 tweets was selected for close reading and qualitative analysis, with selection guided by thematic relevance, user type (students, educators), and diversity of sentiment. Tweets were categorised under key themes such as emotional distress, digital inequality, academic pressure, and policy reactions. Particular attention was given to tweets that exhibited expressive language, linguistic hybridity (e.g., Malay-English code-switching), or pointed critiques of institutional and governmental responses.

This sample size proved sufficient for conducting both Corpus-Assisted Discourse Studies (CADS) and Discursive News Value Analysis (DNVA) in subsequent analytical stages, enabling a nuanced understanding of how ODL was discursively constructed during a time of disruption.

Ethical Considerations

The use of publicly available social media data raises important ethical considerations, particularly in balancing the accessibility of data with the protection of individual users’ identities and intent. While Twitter content is technically public and accessible without the need for informed consent, scholars have cautioned against treating such data as ethically unproblematic (Townsend & Wallace, 2016; Zimmer, 2010).

In this study, ethical diligence was maintained by anonymising user handles, paraphrasing potentially sensitive content in publications, and avoiding the use of verbatim quotes that could be easily traced through search engines. Tweets from verified or institutional accounts were excluded unless the focus was explicitly on institutional communication. Additionally, tweets involving minors or disclosing personal crises (e.g., self-harm, financial distress) were flagged and excluded from qualitative presentation.
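
The paper does not specify how handles were anonymised. One common option, sketched below under that caveat, is to replace each handle with a salted hash so that the same user keeps the same label across the corpus without the handle being stored. The function names and the `user_` label format are hypothetical, and hashing of this kind is pseudonymisation rather than full anonymisation.

```python
import hashlib
import re

# Twitter-style mentions: "@" followed by letters, digits, or underscores.
MENTION = re.compile(r"@\w+")


def pseudonym(handle, salt):
    """Map a handle to a stable label; the salt should be kept private."""
    cleaned = handle.lower().lstrip("@")
    digest = hashlib.sha256((salt + cleaned).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]


def scrub(text, salt):
    """Replace every @mention in a tweet with its pseudonym."""
    return MENTION.sub(lambda m: "@" + pseudonym(m.group(0), salt), text)
```

Replacing handles does not make a tweet untraceable, because the wording itself can be searched. This is why the paraphrasing step described above still matters.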

This approach aligns with established guidelines in internet research ethics, including those outlined by the Association of Internet Researchers (AoIR), which emphasise context sensitivity, user expectations, and harm minimisation (AoIR, 2019). By treating tweets as socio-discursive artefacts rather than neutral data points, this research adopts a critical-ethical stance that acknowledges both the value and vulnerability embedded in digital expressions.

CHALLENGES ENCOUNTERED

While the use of Twitter data presents valuable opportunities for educational discourse research, it also introduces a range of methodological and analytical challenges. These challenges are particularly salient in studies focusing on informal, multilingual, and emotionally charged content, as in the case of Malaysian students’ responses to ODL during the COVID-19 crisis. This section outlines five major challenges encountered during the data collection and analysis process: linguistic complexity, data cleaning and preprocessing, representativeness and bias, volume and noise, and contextual ambiguity.

Linguistic Complexity: Informality, Mixed Languages, and Abbreviations

One of the most prominent challenges involved navigating the linguistic diversity and informality of the tweets. Malaysian Twitter discourse is characteristically multilingual, with users frequently switching between Malay and English, sometimes within a single sentence or phrase (Lee & Barton, 2020). This code-switching complicates both keyword searches and semantic analysis, as meanings may shift fluidly across linguistic boundaries. Users also employ slang, abbreviations, phonetic spellings, and emotive expressions that resist standardised textual interpretation. Tweets such as “ODL stress gila doh” or “kelas td mcm xdengar apapun 😩” combine all of these features and cannot be decoded from dictionary forms alone.

These features, while rich in sociolinguistic value, pose analytical challenges for corpus cleaning, tokenisation, and categorisation. For instance, the abbreviation “td” (short for tadi, meaning “just now”) or “xdengar” (a phonetic contraction of “tidak dengar”, meaning “did not hear”) requires cultural and contextual fluency to interpret accurately. The hybrid language forms also complicate the use of natural language processing tools that are often optimised for monolingual corpora.

Data Cleaning and Pre-processing: Duplicates, Spam, and Bots

The process of data cleaning was another critical and time-consuming phase. The initial dataset contained a high proportion of non-relevant content, including retweets, promotional tweets, automated bot posts, and institutional announcements. Filtering these out required a combination of automated and manual screening. Keyword-based collection often captured irrelevant uses of “ODL” (e.g., unrelated to education) or spam content that employed trending hashtags to boost visibility.

Additionally, the prevalence of duplicated tweets, particularly during viral episodes of critique or humour, skewed frequency analyses. For example, a single viral complaint about unstable internet connections might be retweeted thousands of times, amplifying its apparent weight in the dataset. While retweets represent an important marker of social endorsement, they also introduce redundancy and require careful handling depending on the analytical focus (Boyd et al., 2010).

Bot-like behaviour—defined by repetitive posting patterns, unusual user activity, or the use of clickbait phrases—was also observed. Although bot detection was not the primary focus of this study, heuristics such as post frequency, user metadata, and language patterns were applied to minimise non-human interference in the final dataset.
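
The handling of retweets and high-frequency accounts can be sketched as follows. This is an illustrative heuristic, not the study's actual screening procedure: the `text` and `user` fields, the `RT @user:` prefix convention, and the `max_posts` threshold are all assumptions.

```python
import re
from collections import Counter

# Classic retweet prefix, e.g. "RT @someone: original text".
RT_PREFIX = re.compile(r"^rt\s+@\w+:\s*", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def normalise(text):
    """Strip the retweet prefix and links, lowercase, and collapse whitespace."""
    text = RT_PREFIX.sub("", text.strip())
    text = URL.sub("", text)
    return " ".join(text.lower().split())


def deduplicate(tweets):
    """Keep the first copy of each normalised text and count how often it occurred."""
    counts = Counter()
    unique = []
    for tweet in tweets:
        key = normalise(tweet["text"])
        if counts[key] == 0:
            unique.append(tweet)
        counts[key] += 1
    return unique, counts


def flag_heavy_posters(tweets, max_posts=50):
    """Return accounts whose post count exceeds a (hypothetical) threshold."""
    per_user = Counter(t["user"] for t in tweets)
    return {user for user, n in per_user.items() if n > max_posts}
```

Keeping the counts alongside the de-duplicated list preserves the endorsement signal that retweets carry while removing their weight from frequency analyses. Flagged accounts are candidates for manual inspection, not automatic exclusion.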

Representativeness and Bias: Who Tweets and Who Does Not?

A key epistemological concern was the non-representative nature of Twitter users. While Twitter is popular among Malaysian youth, particularly university students, it does not reflect the full spectrum of learner experiences. Students without internet access, digital literacy, or social media fluency are structurally excluded from the dataset. This raises critical questions of data bias and epistemic exclusion—whose voices are amplified in public discourse, and whose are silenced? (Tufekci, 2014).

Furthermore, the dataset is inherently skewed towards expressive individuals, including those who are more likely to externalise emotional responses, critique policy, or participate in online activism. Introverted or disengaged students, or those facing severe emotional distress, may choose not to voice their concerns publicly, resulting in a corpus that reflects only certain affective or ideological positions.

It is thus essential to interpret Twitter discourse as indicative, not exhaustive, of student sentiment. Rather than claiming representativeness, this study adopts a discourse-centred approach that treats tweets as culturally situated expressions rather than generalisable data points.

Volume and Noise: Filtering Relevance Amid Massive Data

The sheer volume of retrieved data—tens of thousands of tweets across multiple timeframes—introduced significant analytical noise. This necessitated rigorous filtering to isolate tweets that were genuinely reflective of the ODL experience in higher education contexts. Keywords such as “kelas online” or “ODL” occasionally appeared in unrelated discussions, including secondary school contexts, promotional material, or non-educational complaints.

Furthermore, not all tweets that used relevant keywords contained substantive discourse. Many posts consisted of single-word exclamations, sarcastic emojis, or meme references that lacked clear thematic content. Manual review was essential to ensure that the final sample contained tweets that offered discursive depth—in other words, content that was rich enough to allow for thematic or linguistic analysis.

To manage this, a multi-stage filtering strategy was implemented, combining automated keyword filters with human review and thematic coding. This ensured that the final dataset retained both breadth and relevance while avoiding superficial or low-informational content.
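
One automated stage of such a pipeline might look like the sketch below, which sets aside posts with too few words to support thematic or linguistic analysis before human review. The `min_words` threshold and the stage labels are hypothetical and would need tuning against a manually checked sample.

```python
import re

# Runs of letters only: digits, underscores, emojis, and punctuation are excluded.
WORD = re.compile(r"[^\W\d_]+")
# Links, mentions, and hashtags are removed before counting words.
NON_CONTENT = re.compile(r"https?://\S+|[@#]\w+")


def word_count(text):
    """Count word tokens, ignoring links, mentions, hashtags, and emojis."""
    return len(WORD.findall(NON_CONTENT.sub(" ", text)))


def triage(text, min_words=4):
    """Route a tweet to manual review or set it aside as low-information."""
    return "manual_review" if word_count(text) >= min_words else "low_information"
```

A length check cannot judge discursive depth on its own. It only reduces the volume that human coders have to read, and short but pointed tweets set aside by this step may still merit a second look.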

Contextual Ambiguity: Sarcasm, Irony, and Non-Verbal Cues

A final and persistent challenge was the contextual ambiguity inherent in social media discourse. Many tweets employed sarcasm, irony, or indirect humour, particularly when expressing dissatisfaction with university policies or lecturer behaviour. For example, a tweet stating “best gila kelas td, dapat tgk ceiling sampai habis” (“the class just now was so great, I got to stare at the ceiling the whole time”) is clearly sarcastic, yet such nuances may be lost in surface-level textual analysis.

Moreover, the absence of non-verbal cues, such as tone, facial expression, or body language, limits the interpretability of sentiment. Emojis and punctuation marks often serve as surrogate cues, but they too are culturally coded and can shift in meaning depending on the context or audience familiarity (Tagg, 2015). This ambiguity poses difficulties for both automated sentiment analysis and human coding, as misinterpretation of tone can lead to inaccurate thematic categorisation.

To mitigate these issues, interpretive decisions were grounded in broader cultural knowledge of Malaysian student discourse, and tweets were analysed holistically rather than in isolation. When necessary, references to concurrent events (e.g., exam periods, policy announcements) were cross-checked to clarify intended meanings.

OPPORTUNITIES AND INSIGHTS

Despite the methodological complexities of working with Twitter data, this study found that social media platforms such as Twitter offer powerful and underutilised opportunities for discourse researchers examining higher education responses during crises. In particular, Twitter’s affordances provide valuable insight into the affective, temporal, and socio-political dimensions of students’ experiences with ODL in Malaysia. This section reflects on four key methodological advantages: the value of spontaneous expression, the visibility of public policy reception, the potential for advanced discourse analysis, and the capacity to trace discourse across policy shifts over time.

Spontaneous, Authentic Expressions of Student Frustration and Support

A significant advantage of using Twitter as a data source is the spontaneity and authenticity of user-generated content. Tweets about ODL often capture emotionally charged responses in real time, ranging from expressions of frustration and anxiety to moments of gratitude and humour. During exam periods or policy announcements, for example, hashtags such as #ODLstress, #kelasUiTM, and #MOHE often trended, reflecting heightened emotional responses and collective sentiment. These affective reactions are typically absent from formal institutional feedback mechanisms, where students may be constrained by power dynamics or professional expectations (Page, 2022). On Twitter, however, students often speak candidly, sometimes even defiantly, about their learning conditions, workloads, and perceptions of support from lecturers and university administration.

The immediacy of these expressions enhances the ecological validity of the data, allowing researchers to observe how students react to particular events or decisions as they unfold. For example, the announcement of campus closures or exam format changes was frequently accompanied by a surge in student tweets expressing uncertainty or distress. In this way, Twitter serves as a kind of emotional barometer, offering insights into how educational reforms are perceived at the grassroots level (Veletsianos & Shepherd, 2021).

Moreover, these authentic expressions often reflect intersectional challenges—such as digital poverty, mental health strain, and familial responsibilities—that might otherwise remain invisible in aggregated survey data. This adds a critical dimension to the research by foregrounding students not merely as passive recipients of policy but as active narrators of their lived experiences.

Social Media as a Window into Policy Reception and Lived Experiences

Twitter also functions as a public site for policy reception, offering a discursive space where students articulate their interpretations, misinterpretations, and contestations of MOHE decisions and university guidelines. The publicness of Twitter enables the tracking of collective sentiment around specific announcements, such as the postponement of physical classes or changes to assessment formats.

Importantly, students do not engage with policy documents per se but rather respond to how those policies are communicated and how they materialise in their daily academic routines. For instance, while a policy may promise flexibility in deadlines or learning formats, students may tweet about inconsistent implementation at the faculty level or express confusion due to vague instructions.

This renders Twitter data a valuable complement to formal policy analysis, as it illuminates how top-down directives are negotiated, reinterpreted, or resisted at the ground level (Selwyn, 2020). Moreover, it enables researchers to capture nuanced tensions between institutional intentions and student realities, thus contributing to more responsive and student-centred education policy discourses (Jandrić, 2021).

Rich Source for Discourse-Oriented Frameworks: DNVA and CADS

The discursive richness of tweets makes them particularly well-suited for methodologies such as Discursive News Value Analysis (DNVA) and Corpus-Assisted Discourse Studies (CADS). Tweets are often short, affective, and multimodal, featuring hashtags, emojis, images, and links, and these resources are used to construct news values such as negativity, personalisation, timeliness, and eliteness (Bednarek & Caple, 2017). These values reveal how ODL events are made meaningful to different audiences, especially in the context of crisis communication.

DNVA, in particular, allows researchers to analyse how certain university events or policy decisions are framed as ‘newsworthy’ or emotionally resonant in the public imagination. Its emphasis on news values such as proximity, eliteness, and negativity aligns well with brief, affect-rich content typical of tweets, allowing for a meaningful capture of digital sentiment during crisis moments. For example, a tweet complaining about the unfairness of last-minute timetable changes may gain traction not because of its informational content but due to its affective appeal and relatability—a key characteristic of discursive news values.

Similarly, the application of CADS enables the systematic identification of recurrent lexical and semantic patterns across large tweet corpora, while still anchoring analysis in sociolinguistic interpretation (Baker et al., 2008). Tools such as keyword lists, collocation analysis, and concordance lines support both macro-level discourse mapping and micro-level contextual reading. This mixed-methods affordance allows researchers to combine quantitative breadth with qualitative depth, thereby enhancing analytical rigour.
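
To make the concordance and collocation steps concrete, the following is a minimal sketch of a keyword-in-context (KWIC) display and a window-based collocate count. It is a simplified stand-in for what dedicated corpus tools provide: the tokeniser is a plain word split, and the collocates are raw co-occurrence counts rather than statistical association scores.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(texts, node, window=3):
    """Return (left context, node, right context) for each hit of the node word."""
    node = node.lower()
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((left, token, right))
    return lines


def collocates(texts, node, window=3):
    """Count words that occur within the window on either side of the node word."""
    counts = Counter()
    for left, _, right in kwic(texts, node, window):
        counts.update(left.split())
        counts.update(right.split())
    return counts
```

For instance, `kwic(["ODL stress gila doh"], "odl", window=2)` returns `[("", "odl", "stress gila")]`. Reading such lines in bulk is what links the macro-level frequency picture to the micro-level contextual reading described above.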

Mapping Discourse Across Time and Policy Phases

A further advantage lies in the temporal granularity of Twitter data. Because tweets are timestamped and often tied to external events (e.g., press conferences, academic calendar changes), researchers can track how discourse evolves over time and in response to policy shifts. This is especially useful in crisis settings like COVID-19, where policy developments occurred rapidly and student responses were time-sensitive and emotionally volatile.

By examining tweet frequencies, thematic salience, or sentiment clusters before, during, and after major announcements, researchers can construct chronological narratives of how student sentiment aligns or diverges from institutional messaging. This diachronic approach helps identify discursive tipping points, such as moments when public frustration escalates, or when certain phrases or hashtags become viral markers of dissent.
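
A simple version of this diachronic mapping is sketched below: tweets are counted per month and assigned to a policy phase by their date. The phase boundaries are passed in by the caller because the paper names the phases but does not give exact cut-off dates, so any dates used with this function are the analyst's own assumptions.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date


def monthly_counts(dates):
    """Count tweets per calendar month, keyed as 'YYYY-MM'."""
    return Counter(d.strftime("%Y-%m") for d in dates)


def phase_of(d, boundaries):
    """Return the label of the phase containing date d.

    boundaries is a list of (start_date, label) pairs sorted by start_date;
    dates before the first start return None.
    """
    starts = [start for start, _ in boundaries]
    i = bisect_right(starts, d) - 1
    return boundaries[i][1] if i >= 0 else None


def phase_counts(dates, boundaries):
    """Count tweets per policy phase, skipping dates outside all phases."""
    labels = (phase_of(d, boundaries) for d in dates)
    return Counter(label for label in labels if label is not None)
```

Comparing monthly counts against the dates of major announcements is one way to look for the discursive tipping points mentioned above, although raw frequency should be read alongside the thematic coding and not in place of it.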

Such temporal mapping not only contributes to more dynamic models of discourse analysis but also supports evidence-based policymaking, enabling education stakeholders to better anticipate student responses and adapt their communication strategies accordingly (Williamson et al., 2020).

RECOMMENDATIONS FOR FUTURE RESEARCH

Based on the methodological experiences and insights from this study, several recommendations can be made for researchers seeking to use Twitter data in educational discourse research. These recommendations address key areas of ethical practice, tool selection, linguistic navigation, and methodological triangulation. While the potential of Twitter as a research tool is significant, careful and reflexive design choices are necessary to ensure rigour, validity, and sensitivity to the complexities of digital discourse.

Ethical Use of Public Tweets: Anonymity, Paraphrasing, and Contextual Sensitivity

Although Twitter is a public platform, ethical concerns remain paramount when using tweet content in academic research. Public visibility does not equate to informed consent, and users may not anticipate that their posts will be scrutinised or cited in academic work (Zimmer & Proferes, 2014). Therefore, future research should adopt a context-sensitive ethical approach, guided by established principles such as those articulated by the Association of Internet Researchers (AoIR, 2019).

Researchers are encouraged to anonymise usernames, avoid verbatim quotations that are easily traceable, and paraphrase tweet content where possible—especially when the content involves personal or emotionally sensitive experiences. Additionally, tweets must always be interpreted within their broader cultural and conversational context to avoid decontextualised misrepresentation. This includes paying attention to the conversational thread, preceding tweets, or ongoing hashtag discourses.

Ethics approval should be sought where applicable, particularly in studies where tweets are analysed in combination with identifiable user metadata or demographic profiling. Researchers must also remain cautious about amplifying marginalised voices in ways that unintentionally expose them to further scrutiny or harm.

Tool Suggestions: Corpus and Discourse Analysis Software

For researchers conducting large-scale or corpus-assisted discourse analysis, software tools are essential for managing and interpreting high volumes of textual data. Two widely accessible tools—AntConc and Voyant Tools—are particularly recommended for educational discourse research.

AntConc is a lightweight, free concordance tool that allows for keyword extraction, collocation analysis, and keyword-in-context (KWIC) displays, making it ideal for identifying recurring linguistic patterns, particularly in thematic or affective discourse (Anthony, 2022). It also supports multilingual data and customisable stop word lists, which are valuable when working with hybrid Malay-English tweets.

Voyant Tools, a web-based platform, offers more visually intuitive features such as word clouds, trend graphs, and co-occurrence maps. While less powerful for granular corpus work, it is highly useful for exploratory data analysis and for communicating findings to non-specialist audiences.

Researchers with more technical experience can extend the analysis with tools such as Sketch Engine or #LancsBox, or with custom Python scripts built on NLTK or spaCy. However, researchers should prioritise tools that align with their technical proficiency and analytical goals rather than defaulting to complexity.

Navigating Multilingual and Colloquial Data

In multilingual societies like Malaysia, Twitter discourse often blends languages, dialects, and informal speech forms. Researchers must adopt linguistic flexibility and cultural awareness when analysing such content. Common challenges include code-switching, non-standard spelling, abbreviations, and the use of cultural references or slang.

To address these challenges, future researchers should consider:

Creating a custom lexicon or glossary of common abbreviations and slang terms based on preliminary scans of the data.
Conducting manual validation of keyword searches to ensure semantic relevance across languages.
Applying collocation analysis to detect meaningful combinations of words that may transcend literal translations.
Where feasible, involving bilingual researchers or native speakers in the cleaning, coding, and interpretation phases to enhance cultural validity (Barton & Lee, 2013).

Given the limitations of automated tools for handling code-mixed data, human interpretation remains essential to capture nuances in meaning, tone, and intent.
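The first and third suggestions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study. The lexicon entries are a small illustrative sample of common Malay abbreviations, and the function names are hypothetical. The sketch counts raw co-occurrence within a fixed window and does not compute an association measure such as mutual information or log-likelihood.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[#@]?\w+")

# Illustrative entries only; a real lexicon would be compiled from
# preliminary scans of the data and checked by bilingual researchers.
LEXICON = {"yg": "yang", "sbb": "sebab", "dgn": "dengan", "x": "tidak"}


def normalise(text, lexicon=LEXICON):
    """Lower-case, tokenise, and expand abbreviations found in the lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    return [lexicon.get(tok, tok) for tok in tokens]


def collocates(tweets, node, window=3, lexicon=LEXICON):
    """Count tokens occurring within `window` positions of each `node` hit."""
    counts = Counter()
    for tweet in tweets:
        tokens = normalise(tweet, lexicon)
        for i, tok in enumerate(tokens):
            if tok == node:
                span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(span)
    return counts


if __name__ == "__main__":
    # Invented example tweets, not drawn from the study corpus.
    sample = ["Stress sbb ODL", "ODL stress yg teruk", "x faham ODL"]
    print(collocates(sample, "odl", window=2).most_common(3))
```

Normalising before counting means that variant spellings are pooled under one form. This is only as reliable as the lexicon, so the manual validation described above still applies.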

Combining Social Media Analysis with Other Methods

While Twitter data offers valuable insights, its methodological limitations—especially regarding representativeness and depth—can be mitigated through triangulation with other qualitative or mixed methods. Combining Twitter discourse analysis with interviews, focus groups, or online surveys allows researchers to gain a more holistic understanding of student experiences.

For instance, interviews can contextualise tweet sentiments by probing students’ rationales, motivations, or underlying struggles that are not fully articulated in 280 characters. Focus groups may surface shared discourses or clarify how digital expressions align with offline realities. Surveys can help correlate demographic variables with discourse patterns, contributing to a more comprehensive analytical framework.

Integrating social media analysis with other data sources also enables researchers to test or refine theoretical constructs—such as student engagement, digital resilience, or institutional trust—in ways that are both grounded and empirically robust (Couldry & Hepp, 2017).

CONCLUSION

This paper has offered a reflexive account of the methodological processes, challenges, and possibilities involved in using Twitter data to examine student discourses surrounding ODL in Malaysian higher education during the COVID-19 pandemic and endemic phases. By drawing on a large corpus of tweets, the study has underscored the analytical richness and ethical complexity of social media as a site for educational research.

Several key lessons emerged from this methodological undertaking. First, the linguistic informality and hybridity of Twitter discourse—characterised by code-switching, non-standard orthography, and cultural references—require researchers to adopt flexible, context-sensitive interpretive strategies. Second, the volume and variability of data necessitate careful filtering, cleaning, and sampling procedures to ensure thematic relevance and analytical coherence. Third, while Twitter data may lack demographic representativeness, it offers a unique window into the affective and spontaneous dimensions of students’ lived experiences—dimensions often overlooked in traditional research instruments. Finally, ethical engagement with public data remains essential, and future researchers must balance accessibility with responsibility, especially when dealing with emotionally charged or personally revealing content.

The study also affirms that Twitter holds significant potential for post-pandemic educational research, particularly in contexts marked by rapid policy change, institutional uncertainty, or digital inequity. As education systems around the world continue to experiment with hybrid, flexible, and digitally mediated learning models, student discourse on social media platforms provides a valuable indicator of ground-level reception, resistance, and resilience. Twitter’s real-time nature, public accessibility, and vernacular expression make it not only a rich source of data but also a critical site of meaning-making where learners articulate their identities, struggles, and agency in the face of systemic change.

Looking ahead, integrating Twitter-based analysis with other methodological approaches—such as interviews, focus groups, or critical policy analysis—can yield more nuanced understandings of educational experiences in complex digital environments. Moreover, the proposed methodological framework can be adapted for research in other crisis-driven educational settings, including responses to natural disasters, mental health incidents, or national examination disruptions. In doing so, researchers can move beyond deficit framings of students as passive victims of online learning, and instead amplify their voices as active participants in shaping the evolving discourse on education in the digital age.

Ultimately, this paper encourages scholars in education, discourse studies, and digital sociology to continue exploring the intersection of language, policy, and lived experience through the lens of social media. In times of uncertainty and transition, these vernacular narratives offer not only empirical insights but also ethical reminders of the human dimensions at the heart of educational transformation.

ETHICAL APPROVAL

This study was conducted in accordance with the ethical standards of the institutional research committee. The authors declare that there are no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ACKNOWLEDGEMENT

The authors would like to express their sincere gratitude to University Technology MARA for the support and resources provided throughout this research. Appreciation is also extended to all participants who generously contributed their time and insights.

REFERENCES

Adnan, M., & Anwar, K. (2020). Online learning amid the COVID-19 pandemic: Students’ perspectives. Journal of Pedagogical Sociology and Psychology, 2(1), 45–51.
Anthony, L. (2022). AntConc (Version 4.0.0) [Computer software]. Waseda University.
AoIR (Association of Internet Researchers). (2019). Internet research: Ethical guidelines 3.0. https://aoir.org/reports/ethics3.pdf
Ariffin, K., & Yaacob, N. A. (2022). Students’ emotional expressions in Twitter during remote learning in Malaysia. Malaysian Journal of Learning and Instruction, 19(1), 1–20.
Ariffin, M. A. M., Ahmad, M. S., & Ahmad, F. (2020). Digital inequality and the Malaysian online education experience during COVID-19. Jurnal Komunikasi: Malaysian Journal of Communication, 36(4), 1–18.
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics. Discourse & Society, 19(3), 273–306.
Baker, P., & McEnery, T. (2015). Corpora and discourse studies: Integrating discourse and corpora. Palgrave Macmillan.
Barton, D., & Lee, C. (2013). Language online: Investigating digital texts and practices. Routledge.
Bednarek, M., & Caple, H. (2017). The discourse of news values: How news organizations create newsworthiness. Oxford University Press.
Boyd, D., Golder, S., & Lotan, G. (2010). Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter. In Proceedings of the 43rd Hawaii International Conference on System Sciences (pp. 1–10).
Couldry, N., & Hepp, A. (2017). The mediated construction of reality. Polity Press.
Driscoll, K., & Walker, S. (2014). Working within a black box: Transparency in the collection and production of big Twitter data. International Journal of Communication, 8, 1745–1764.
Hashim, H., Hussin, S., & Ishak, M. S. A. (2021). Challenges in implementing online learning during the Movement Control Order (MCO): Malaysian students’ perspective. International Journal of Academic Research in Business and Social Sciences, 11(3), 1038–1046.
Jandrić, P. (2021). Postdigital research in the time of COVID-19. Postdigital Science and Education, 3, 233–238.
Lee, C. (2021). Doing critical discourse studies with social media data. In J. Flowerdew & J. E. Richardson (Eds.), The Routledge handbook of critical discourse studies (pp. 350–364). Routledge.
Lee, C., & Barton, D. (2020). Constructing hybrid identities through code-switching on Malaysian social media. Discourse, Context & Media, 35, 100390.
MOHE (Ministry of Higher Education Malaysia). (2020). Guidelines on teaching and learning during the COVID-19 pandemic. Putrajaya: MOHE.
MOHE. (2022). Pelan Strategik e-Pembelajaran Kebangsaan (PSepK) 2021–2025. Putrajaya: MOHE.
Mohd Yusof, A., Shahrill, M., & Awang, H. (2021). The transition to online learning: Challenges in higher education institutions. Education and Information Technologies, 26, 6933–6953.
Page, R. (2022). Narratives online: Shared stories and social media. Cambridge University Press.
Papacharissi, Z. (2015). Affective publics: Sentiment, technology, and politics. Oxford University Press.
Rajab, M. H., Gazal, A. M., & Alkattan, K. (2020). Challenges to online medical education during the COVID-19 pandemic. PLOS ONE, 15(11), e0242913.
Samsudin, S., Yusof, N., & Ahmad, M. (2021). Open and distance learning during COVID-19: A case study of Malaysian public universities. International Journal of Learning, Teaching and Educational Research, 20(5), 146–160.
Seargeant, P., & Tagg, C. (2014). The language of social media: Identity and community on the internet. Palgrave Macmillan.
Selwyn, N. (2020). Digital education after COVID-19: Understanding the short-term and long-term implications. Postdigital Science and Education, 2(3), 695–699.
Selwyn, N., & Jandrić, P. (2020). Postdigital living in the age of COVID-19: Unsettling what we see as possible. Postdigital Science and Education, 2(3), 989–1005.
Sloan, L., Morgan, J., Burnap, P., & Williams, M. (2018). Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data. PLOS ONE, 13(12), e0206225.
Tagg, C. (2015). Exploring digital communication: Language in action. Routledge.
Tagg, C., & Seargeant, P. (2014). Social media and the future of English language teaching. British Council.
Townsend, L., & Wallace, C. (2016). Social media research: A guide to ethics. University of Aberdeen and the Economic and Social Research Council.
Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity and other methodological pitfalls. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (pp. 505–514).
Veletsianos, G., & Shepherd, T. (2021). Affective experiences of learning, teaching, and surviving during the pandemic. Online Learning, 25(1), 3–9.
Williamson, B., Eynon, R., & Potter, J. (2020). Pandemic politics, pedagogies and practices: Digital technologies and distance education during the coronavirus emergency. Learning, Media and Technology, 45(2), 107–114.
Yusof, N. M., Ismail, K., & Razak, R. A. (2020). Digital readiness and challenges among Malaysian students in the transition to remote learning. Journal of Education and e-Learning Research, 7(4), 388–393.
Zappavigna, M. (2012). Discourse of Twitter and social media: How we use language to create affiliation on the web. Continuum.
Zayani, M. (2015). Networked publics and digital contention: The politics of everyday life in Tunisia. Oxford University Press.
Zimmer, M. (2010). “But the data is already public”: On the ethics of research in Facebook. Ethics and Information Technology, 12(4), 313–325.
Zimmer, M., & Proferes, N. (2014). A topology of Twitter research: Disciplines, methods, and ethics. Aslib Journal of Information Management, 66(3), 250–261.