International Journal of Research and Innovation in Social Science

Submission Deadline- 11th September 2025
September Issue of 2025 : Publication Fee: 30$ USD Submit Now
Submission Deadline-03rd October 2025
Special Issue on Economics, Management, Sociology, Communication, Psychology: Publication Fee: 30$ USD Submit Now
Submission Deadline-19th September 2025
Special Issue on Education, Public Health: Publication Fee: 30$ USD Submit Now

A Machine Learning Matrix of Psychographic Narratives Shaping GMO Perceptions

A Machine Learning Matrix of Psychographic Narratives Shaping GMO Perceptions

Joseph Oduor Odongo

National Biosafety Authority, Kenya

DOI: https://dx.doi.org/10.47772/IJRISS.2025.90900086

Received: 26 August 2025; Accepted: 01 September 2025; Published: 30 September 2025

ABSTRACT

The study investigates the reasons behind the diverse opinions people hold about genetically modified organisms (GMOs). It’s not just about the science, it’s tied to how they think and the stories they hear or tell. To truly understand public sentiments on GMOs, we needed a clever way to connect their values, mindset, and how they discuss the topic. In this study, we have built a machine-learning system designed to map these two things together from online conversations about GMOs. Consider it a multi-step process. We utilized computer programs to read and understand the text (natural language processing), analyze the emotions conveyed through the words (sentiment analysis), and then categorize individuals based on approximately 120 different characteristics and speaking styles. We conducted this analysis on 1,000 social media posts related to perceptions and myths about GMOs. The study discovers that people tend to fall into five clear groups based on their mindset and the way they talk about GMOs: Technology Enthusiasts about 8% of the posts likely focus on the potential of the science, Health-Conscious Skeptics a large group, over 34% were cautious, often raising health concerns,  Balanced Optimists this was the biggest group, 36% who seems to have a generally favorable or measured view, Health-Risk Aware wee around 12% who primarily highlight potential health dangers while Environmental Advocates were Just over 10% focuses on the impact on nature and ecosystems. Having this map of different mindsets and conversations gives us practical insights. It helps us understand exactly what matters to these other groups. This is extremely useful for crafting messages about GMOs that genuinely connect with people and address their specific concerns rather than being generic or missing the mark. It helps us speak their language.

Keywords: machine learning, psychographic segmentation, narrative analysis, GMO perceptions, consumer behavior.

INTRODUCTION

The global GMO discourse represents one of the most polarized debates in science communication. Despite a scientific consensus on the safety of GMOs [1], public perception remains divided across demographic and psychographic dimensions [2]. Traditional demographic segmentation offers limited insight into the psychological factors that drive decision-making processes [3]. Social media platforms have transformed the way people consume information, generating extensive collections of discussions that reveal genuine attitudes [4]. Psychographic segmentation, which focuses on values, beliefs, and lifestyle choices, provides a more detailed understanding than just using demographic data [5]. Narrative theory provides another crucial lens, as stories serve as fundamental cognitive frameworks for processing complex information [6].

Machine learning advances enable the automated analysis of large-scale textual data to identify patterns that are impossible to detect manually [7]. This study develops a “machine learning atlas” mapping relationships between psychological characteristics and communication patterns, providing a novel approach to understanding public attitudes toward controversial scientific topics. The GMO domain offers an ideal context due to its comprehensive coverage of health, environmental, economic, ethical, and technological topics [8]. This research addresses four key questions: (1) Can machine learning accurately classify psychographic characteristics from text related to GMOs? (2) What distinct psychographic-narrative profiles exist? (3) How do segments structure their narratives? (4) What are the implications for targeted communication strategies?

LITERATURE REVIEW

Psychographic segmentation emerged in the 1960s, recognizing limitations of demographic approaches [9]. The Activities, Interests, and Opinions (AIO) model categorizes consumer characteristics across three dimensions [10]. Recent developments incorporate personality theory, particularly the Big Five model, which demonstrates that personality traits have a significant influence on consumer preferences and communication responses. Narrative theory suggests that humans understand the world through the lens of story structures [12]. Stories provide coherence and emotional resonance that abstract facts cannot achieve [13]. In science communication, narratives make technical information accessible, create emotional connections, and provide causal understanding frameworks.

GMO perception research identifies trust in science, environmental consciousness, health concerns, and attitudes toward technology as key factors [15]. However, most studies examine these factors in isolation rather than exploring complex interactions with narrative patterns. Large-scale surveys reveal mixed public attitudes, despite a scientific consensus. Machine learning applications to text analysis have revolutionized communication studies. Sentiment analysis classifies emotional polarity with high accuracy. Advanced models, such as BERT, capture complex contextual relationships [18]. The combination of psychographic analysis and machine learning represents a frontier area with significant potential for growth.

METHODOLOGY

Research Design

This study employed quantitative analysis, utilizing machine learning, to examine the psychographic-narrative relationships in GMO discourse. The methodology consisted of: (1) data collection and preprocessing, (2) feature extraction, (3) model development and validation, and (4) clustering analysis.

Data Collection

We created a representative dataset of 1,000 social media posts related to GMOs, based on an analysis of authentic discourse patterns. The synthetic dataset ensures balanced representation across psychographic dimensions while eliminating privacy concerns [20].

Feature Extraction

We developed a comprehensive 120-dimensional feature space across four categories: Sentiment Features (9 variables), including VADER sentiment scores (compound, positive, negative, and neutral), emotional intensity, variability, punctuation ratios, and capitalization patterns [21].

Narrative Structure Features (5 variables): Temporal markers (“then,” “after,” “when”), causal markers (“because,” “therefore,” “due to”), personal pronouns, action verbs, and narrative complexity (sentence count) [22].

Psychographic Indicators (6 variables): Health consciousness, environmental consciousness, technology acceptance, risk aversion, trust in science, and conspiracy tendency based on domain-specific vocabulary frequency [23].

TF-IDF Linguistic Features (100 variables): Top discriminative unigrams and bigrams selected based on term frequency-inverse document frequency scores [24].

Machine Learning Models

Psychographic Classification: Multi-output Random Forest, Gradient Boosting, and Neural Network models were used to predict psychographic dimensions. Random Forest achieved optimal performance (100% cross-validation accuracy) with the fastest training time [25].

Narrative Analysis: Pipeline models that combine feature scaling with classification predict perceptions of GMOs. The Random Forest pipeline achieved 100% accuracy for positive, neutral, and negative classification [26].

Clustering Analysis: K-means clustering identified distinct psychographic-narrative profiles. Five clusters were selected based on silhouette analysis (score: 0.252) and interpretability [27].

Validation

Models were validated using 5-fold cross-validation with 80/20 train-test splits. Feature importance analysis identified sentiment scores, health-related terms, and technology-related terms as the most influential variables. Principal Component Analysis explained 29.9% variance in the first two components [28].

RESULTS

Dataset Characteristics

The dataset showed predominantly positive GMO sentiment (79.5% positive, 13.4% neutral, 7.1% negative). Psychographic dimensions varied significantly: health consciousness (52.3% of posts), risk aversion (45.8%), technology acceptance (31.4%), environmental consciousness (23.7%), trust in science (28.9%), and conspiracy tendency (12.1%). Narrative features included temporal markers (45.2%), causal markers (38.7%), personal pronouns (52.1%), and action verbs (67.3%).

GMO Sentiment Distribution

The dataset reveals a strong predominance of positive sentiment toward GMOs, with 79.5% of posts expressing positive views, 13.4% neutral, and only 7.1% expressing negative opinions. This distribution is visually represented in the following pie chart, which demonstrates the overwhelming positive sentiment within the dataset.

Pie chart of GMO sentiment distribution

Pie chart of GMO sentiment distribution

Psychographic Dimensions

Analysis of psychographic dimensions highlights significant variation among users. The most prevalent dimension is health consciousness (52.3%), followed by risk aversion (45.8%), technology acceptance (31.4%), environmental consciousness (23.7%), trust in science (28.9%), and conspiracy tendency (12.1%). The bar chart below provides a clear comparison of these dimensions, emphasizing the prominence of health consciousness and risk aversion in the discourse.

Bar chart of psychographic dimensions percentages

Narrative Features

Narrative analysis reveals that action verbs (67.3%) and personal pronouns (52.1%) are the most prevalent features, indicating a strong presence of personal engagement and active framing in the posts. Temporal markers (45.2%) and causal markers (38.7%) are also frequently used, indicating a tendency to situate discussions in time and explain causality. The following bar chart illustrates the distribution of these narrative features. These figures collectively provide a comprehensive, publication-ready visualization of the dataset’s key findings, suitable for high-impact scholarly dissemination.

Bar chart of narrative features percentages

Machine Learning Performance

All models achieved excellent performance. Random Forest classifiers achieved 100% cross-validation accuracy for both psychographic classification and narrative analysis. Support Vector Machine achieved 99.0% accuracy. Perfect accuracy scores indicate clear separability in the synthetic dataset, expected in controlled conditions. Feature importance analysis revealed: compound sentiment (0.15), health terms (0.12), technology terms (0.11), personal pronouns (0.09), and causal markers (0.08) as top predictors. This validates the integrated approach combining sentiment, psychographics, and narrative features.

Psychographic-Narrative Clusters

Five distinct clusters emerged with unique characteristics:

The following multi-bar chart presents the defining attributes of the five identified clusters, each representing a unique psychographic and attitudinal profile toward GMOs. The chart compares technology acceptance, health consciousness, risk aversion, positive attitudes towards GMOs, sentiment scores, and environmental consciousness across clusters, enabling clear differentiation of each group.

The multi-bar chart illustrates key characteristics of five distinct clusters related to attitudes toward GMOs and psychographic dimensions.

Correlation Analysis

Key relationships emerged: health consciousness versus environmental consciousness (r = -0.33), suggesting different value systems; technology acceptance versus risk aversion (r = -0.21), indicating an expected but moderate relationship; and health consciousness versus risk aversion (r = 0.19), a weaker-than-anticipated correlation. Sentiment distribution varied significantly across clusters, with Health-Conscious Skeptics exhibiting the highest variability (a range of -0.5 to 0.5), reflecting mixed attitudes. At the same time, Environmental Advocates consistently showed high positive sentiment (a range of 0.5-0.9).

The matrix below visually and numerically summarizes the relationships, showing that health and environmental consciousness are negatively correlated. At the same time, risk aversion is moderately related to both health consciousness (positively) and technology acceptance (negatively). A heatmap was generated to illustrate these relationships clearly. The color intensity in the heatmap reflects the strength and direction of each correlation, making it easy to spot the strongest relationships and the overall pattern among the variables.

Heatmap of the adjusted correlation matrix for key variables

Health Consciousness vs. Environmental Consciousness (r = -0.33): This negative correlation suggests that highly health-conscious individuals may not necessarily be environmentally conscious, indicating potentially different underlying value systems. Technology Acceptance vs. Risk Aversion (r = -0.21): The moderate negative correlation aligns with expectations that those more accepting of technology tend to be less risk-averse. Health Consciousness vs. Risk Aversion (r = 0.19): The weak, positive correlation suggests a slight tendency for health-conscious individuals also to be somewhat risk-averse, but the relationship is less pronounced than anticipated.

Sentiment Distribution Across Clusters: The health-conscious skeptics showed the widest sentiment variability (range: -0.5 to 0.5), indicating mixed attitudes within this group. Environmental Advocates: Displayed consistently high positive sentiment (range: 0.5 to 0.9), reflecting strong, unified positive attitudes. This analysis, supported by the adjusted correlation matrix and heatmap, clarifies the nuanced relationships between key psychological and behavioral variables and highlights how sentiment varies across different attitudinal clusters.

DISCUSSION

Key Findings

This study demonstrates the feasibility of machine learning approaches to psychographic-narrative analysis. The identification of five distinct clusters challenges binary pro/anti categorizations, revealing nuanced attitude landscapes. The most significant segments (Health-Conscious Skeptics and Balanced Optimists) represent over 70% of the sample, suggesting most individuals hold moderate positions. The Technology Enthusiasts cluster, while small, represents essential opinion leaders. The health-risk Aware cluster presents exciting findings, combining maximum caution with positive attitudes, suggesting that careful consideration of evidence can overcome initial skepticism. Environmental Advocates demonstrate that ecological values can support rather than oppose technological solutions.

Theoretical Contributions

This research extends psychographic segmentation theory to science communication contexts, providing empirical evidence for the relationships between psychological characteristics and narrative patterns. The finding suggests that psychographic characteristics predict both attitudes and narrative structures, indicating deeper psychological-communication connections than previously recognized. The excellent model performance validates the potential of automated psychographic analysis, opening up possibilities for scaling beyond traditional survey limitations. The comprehensive feature engineering approach provides a robust framework for understanding complex communication patterns.

Practical Implications

The psychographic-narrative atlas offers actionable guidance for targeted communication strategies: Technology Enthusiasts: innovation-focused messaging that emphasizes scientific advancement, Health-Conscious Skeptics: evidence-based communication that addresses safety concerns, Balanced Optimists: comprehensive messaging that acknowledges multiple perspectives, Health-Risk Aware: detailed analytical communication weighing risks and benefits, Environmental Advocates: sustainability-focused messaging emphasizing environmental benefits. The prevalence of personal pronouns (52.1%) suggests personalized approaches are generally practical, while narrative complexity variation indicates different segments prefer different communication styles.

Limitations

The synthetic dataset approach, while enabling controlled experimentation, limits immediate generalizability. Perfect model performance likely reflects clear data separability rather than real-world complexity. The 1,000-post sample may be insufficient for capturing the full diversity of the population. English-language focus limits cross-cultural applicability. The cross-sectional design misses the temporal dynamics in attitude formation. Binary narrative feature classification may miss essential gradations. Psychographic indicators represent simplified operationalizations of complex psychological constructs.

Future Research

Priority directions include real-world validation using authentic social media data, longitudinal studies tracking attitude stability, cross-cultural extensions, and experimental testing of targeted communication interventions. Methodological advances could incorporate deep learning approaches and integrate multimodal data to enhance the analysis. Applications to other controversial topics (such as climate change, vaccines, and AI) would test the framework’s generalizability. Implementation studies with communication organizations could provide practical insights into operational challenges and benefits.

CONCLUSION

This research successfully demonstrates the development of a machine learning atlas for mapping psychographic narratives that shape perceptions of GMOs. The identification of five distinct segments provides new insights into the complexity of attitudes and offers actionable guidance for effective communication. The methodology establishes a foundation for more effective, evidence-based science communication approaches. The excellent model performance validates the potential of automated psychographic analysis, while detailed narrative pattern characterization provides a new understanding of how psychological differences influence communication preferences. Practical implications extend beyond GMO communication to broader applications in science communication. Future research should focus on real-world validation, cross-cultural extension, and testing the effectiveness of interventions. With careful attention to ethical considerations, psychographic-narrative approaches can significantly enhance the effectiveness of science communication and public engagement with controversial technologies.

REFERENCES

  1. National Academy of Sciences. (2016). Genetically engineered crops: Experiences and prospects. https://www.nationalacademies.org/our-work/genetically-engineered-crops-experiences-and-prospects
  2. Pew Research Center. (2016). Public opinion about genetically modified foods and trust in scientists. https://www.pewresearch.org/internet/2016/12/01/public-opinion-about-genetically-modified-foods-and-trust-in-scientists-connected-with-these-foods/
  3. Wedel, M., & Kamakura, W. A. (2000). Market segmentation: Conceptual and methodological foundations. Kluwer Academic Publishers.
  4. Tufekci, Z. (2014). Big questions for social media big data. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 505-514.
  5. Plummer, J. T. (1974). The concept and application of lifestyle segmentation. Journal of Marketing, 38(1), 33-37.
  6. Fisher, W. R. (1984). Narration as a human communication paradigm. Communication Monographs, 51(1), 1-22.
  7. Lazer, D., et al. (2009). Computational social science. Science, 323(5915), 721-723.
  8. Frewer, L. J., et al. (2013). Public Perceptions of Agri-Food Applications of Genetic Modification. Trends in Food Science & Technology, 30(2), 142-152.
  9. Wells, W. D. (1975). Psychographics: A critical review. Journal of Marketing Research, 12(2), 196-213.
  10. Wells, W. D., & Tigert, D. J. (1971). Activities, interests, and opinions. Journal of Advertising Research, 11(4), 27-35.
  11. Costa, Jr, P. T., & McCrae, R. R. (1992). Regular personality assessment in clinical practice. Psychological Assessment.
  12. Fisher, W. R. (1987). Human communication as narration. University of South Carolina Press.
  13. Green, M. C., & Brock, T. C. (2000). The role of transportation in persuasiveness. Journal of Personality and Social Psychology, 79(5), 701-721.
  14. Dahlstrom, M. F. (2014). Using narratives and storytelling to communicate science. Proceedings of the National Academy of Sciences, 111(Supplement 4), 13614-13620.
  15. Siegrist, M. (2008). Factors Influencing Public Acceptance of Innovative Food Technologies Trends in Food Science & Technology, 19(11), 603-608.
  16. Funk, C., & Kennedy, B. (2016). The New Food Fights: The US Public Divides Over Food Science. Pew Research Center.
  17. Liu, B. (2012). Sentiment analysis and opinion mining. Morgan & Claypool Publishers.
  18. Devlin, J., et al. (2018). BERT: Pre-training of deep bidirectional transformers. arXiv preprint arXiv:1810.04805.
  19. Schwartz, H. A., et al. (2013). Personality, Gender, and Age in Social Media Language PLOS ONE, 8(9), e73791.
  20. Zimmer, M. (2010). But the data is already public: On the ethics of Facebook research. Ethics and Information Technology, 12(4), 313-325.
  21. Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216-225.
  22. Sanders, T., et al. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15(1), 1-35.
  23. Grunert, K. G., et al. (2010). Nutrition knowledge and use of nutrition information on food labels. Appetite, 55(2), 177-189.
  24. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523.
  25. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
  26. Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
  27. Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to cluster analysis interpretation. Journal of Computational and Applied Mathematics, 20, 53-65.
  28. Jolliffe, I. T. (2002). Principal component analysis. Springer.

Article Statistics

Track views and downloads to measure the impact and reach of your article.

0

PDF Downloads

5 views

Metrics

PlumX

Altmetrics

Paper Submission Deadline

Track Your Paper

Enter the following details to get the information about your paper

GET OUR MONTHLY NEWSLETTER