Enhancing Faculty Performance Appraisals with Sentiment Analysis and Data Visualization for Evaluating Teaching Effectiveness
- Michael E. Bensi
- Leonylyn P. Bensi
- Apple Grace G. Oliveros
- Emilsa T. Bantug
- Racquel L. Pula
- 577-589
- Nov 18, 2024
- Education
College of Information and Communications Technology-Nueva Ecija University of Science and Technology
DOI: https://doi.org/10.51244/IJRSI.2024.1110049
Received: 15 October 2024; Accepted: 18 October 2024; Published: 18 November 2024
ABSTRACT
An essential part of educational institutions is evaluating faculty performance, which provides insightful views on the effectiveness of instruction and the standard of education. Understanding students’ sentiments toward faculty members is essential to fostering a supportive learning environment and advancing the quality of instruction. This study examines how faculty members’ performance is evaluated utilizing word clouds, sentiment analysis, and visualization tools. The researchers employ VADER (Valence Aware Dictionary and Sentiment Reasoner), a sentiment analysis tool, to ascertain the attitudes expressed in the students’ remarks. Word clouds were created to graphically depict the words that appear most frequently in these comments. Finally, we computed the mean rating for each question and presented our findings through informative graphs. Our research aims to reduce the disparity between traditional evaluation methods and contemporary digital approaches. Using sentiment analysis’s analytical powers and visualization’s communicative potential, we hope to improve the efficacy and granularity of teacher performance evaluations. Ultimately, this will foster a culture of excellence and ongoing development in higher education.
Keywords: Faculty Performance Appraisal; Sentiment Analysis; VADER (Valence Aware Dictionary and Sentiment Reasoner); Word Cloud; Jaccard Index; Visualization
INTRODUCTION
Faculty performance appraisal plays a pivotal role in educational institutions, providing critical insights into the quality of education and the effectiveness of instruction. As education evolves, the evaluation of teachers is becoming more intricate, integrating advanced technologies and methodologies alongside traditional assessment tools. In this study, we assess faculty performance through data-driven approaches, leveraging sentiment analysis and visualization techniques such as word clouds to capture and interpret student feedback.
Creating a positive learning environment and fostering teaching excellence requires a deep understanding of students’ perceptions of their instructors. To this end, we employ sentiment analysis tools, specifically VADER (Valence Aware Dictionary and Sentiment Reasoner), to analyze the nuances of students’ comments. This approach allows us to uncover hidden sentiments within feedback, providing valuable insights that go beyond standard surveys and performance reviews.
Additionally, the use of word cloud visualization condenses large volumes of textual data into clear visual representations. Word clouds highlight recurring themes in students’ feedback, offering a snapshot of key concerns and priorities. By combining sentiment analysis with visualization techniques, this study provides a comprehensive perspective on faculty performance evaluations in contemporary educational settings.
PROBLEM STATEMENT
While useful, the traditional methods of faculty performance appraisal often lack the depth and granularity needed to capture the complexity of student feedback fully. In particular, these methods may overlook the subtle sentiments and recurring themes that provide deeper insights into teaching quality. With the growing availability of data and the advancement of analytical tools, there is a need to modernize faculty performance evaluations to make them more effective, precise, and insightful. This study addresses the gap by integrating sentiment analysis and visualization techniques into the appraisal process, aiming to enhance faculty evaluations’ depth and communicative power.
Objectives of the Study
This study aims to enhance the effectiveness of faculty performance appraisals by integrating modern digital methodologies with traditional evaluation techniques. Specifically, the study seeks to evaluate the use of sentiment analysis, mainly the VADER tool, in interpreting student feedback on faculty performance. Additionally, it aims to utilize word cloud visualizations to identify recurring themes in student responses, providing deeper insights into teaching quality. By comparing the insights derived from these digital methods with conventional appraisal approaches, the study proposes improvements to faculty evaluation systems, ultimately contributing to a culture of excellence and continuous improvement in higher education.
Limitations of the Study
- The analysis is limited to a specific time frame, capturing faculty performance during that period only, without reflecting on the long-term trends in performance.
- The study focuses solely on student feedback, which may not comprehensively reflect other dimensions of teaching quality, such as peer reviews or administrative assessments.
- While VADER is a powerful tool for sentiment analysis, its effectiveness might be limited by the specific context of student language, such as sarcasm or nuanced feedback that may not be captured fully.
- The findings may be specific to the institution or the sample surveyed and might not be directly applicable to other educational settings or disciplines without adjustment.
MATERIALS AND METHODS
Survey Instrument
The DC-SUCHI/CIRPS PIA FORM 2 is a Performance Appraisal Instrument designed to evaluate the performance of professors or instructors based on various criteria.
Below is a breakdown of its components:
Form Title: DC-SUCHI/CIRPS PIA FORM 2 (For Students) – This title identifies the specific evaluation form and its intended users.
Performance Appraisal Instrument: This section serves as the main evaluation tool. It includes:
Ratee Information: The name of the professor or instructor being evaluated, along with the subject taught and the rating period.
Rating Scale: A scale from 1 to 5 is provided for each item, where:
5 – Outstanding
4 – Very Satisfactory
3 – Satisfactory
2 – Needs Improvement
1 – Very Poor
Evaluation Items: There are 20 evaluation items listed, which cover various aspects of the professor or instructor’s performance, including course organization, teaching effectiveness, communication, interaction with students, and professionalism.
Remarks Section: A space provided for additional student comments or remarks regarding the professor or instructor’s performance. This section allows students to provide specific feedback beyond the predefined rating scale.
This tool is intended to collect fair, unbiased, and objective student assessments of their educational experience and of how effectively the professor or instructor handled the course material. The university can use this input to evaluate the caliber of its teaching personnel, and the professor or instructor can use it to improve their instructional strategies.
Sampling Technique
In this study, random sampling was used, and students enrolled in courses handled by the professors or instructors under evaluation were chosen randomly from a predetermined section. Ensuring that every student has an equal chance of being chosen helps minimize sampling bias.
Participants
The study involved 865 evaluations from first- to fourth-year students for 53 faculty members teaching major and general education courses in the Bachelor of Science in Information Technology program. It was conducted during the second semester of the academic year 2023-2024 at Nueva Ecija University of Science and Technology – Talavera Off Campus.
Participation Criteria
Setting participation requirements for the student performance appraisal survey helps ensure that the input obtained is relevant, meaningful, and representative of the student population.
The following are the participation criteria:
- Enrollment in Course
- Active Status
- Non-Bias
- Confidentiality
- Voluntary Participation
By establishing clear participation criteria, researchers can help ensure the integrity and validity of the data collected through the performance appraisal survey while respecting student participants’ rights and privacy.
Data Collection
The researchers employed paper-based surveys to administer the performance evaluation tool to students.
During class, the researchers provided the students with printed copies of the survey form (DC-SUCHI/CIRPS PIA FORM 2). The students completed the questionnaires by hand and returned them directly to the researchers to preserve privacy.
Data Preprocessing
Data preparation, which includes encoding, organizing, cleaning, and transforming raw data into an appropriate format, is an essential stage in the data analysis pipeline.
The following are the data preprocessing procedures used in the study concerning the student performance assessment survey:
Data Cleaning
- Identify and remove duplicate survey replies: Ensure each student’s input is only tallied once by looking for and removing any duplicate responses.
- Handle missing values: To maintain the integrity of the dataset, identify missing values and either impute them or delete the incomplete entries.
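The two cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the field names (`form_id`, `ratings`, `remarks`) and the sample rows are hypothetical, and incomplete entries are deleted here rather than imputed.

```python
# Hypothetical encoded survey rows; field names are assumptions, not from the form.
rows = [
    {"form_id": 1, "ratings": [5, 4, 5], "remarks": "Good teacher"},
    {"form_id": 1, "ratings": [5, 4, 5], "remarks": "Good teacher"},  # encoded twice
    {"form_id": 2, "ratings": [4, None, 3], "remarks": ""},           # missing rating
    {"form_id": 3, "ratings": [3, 3, 4], "remarks": "Approachable"},
]

def clean(rows):
    """Keep the first copy of each form and drop forms with missing ratings."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["form_id"] in seen:
            continue  # duplicate reply: count each student's input only once
        seen.add(row["form_id"])
        if any(r is None for r in row["ratings"]):
            continue  # incomplete entry: deleted rather than imputed in this sketch
        cleaned.append(row)
    return cleaned

cleaned_rows = clean(rows)
print([row["form_id"] for row in cleaned_rows])
```

With pandas, `DataFrame.drop_duplicates` and `DataFrame.dropna` cover the same two steps.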
Data Transformation
- Standardize data formats: To make analysis and interpretation easier, the researcher ensures that data fields (such as dates and text responses) have the same format.
- Encode categorical variables: Use one-hot or label encoding to translate categorical variables, such as academic major, into numerical representations.
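As a rough illustration of the two encoding methods named above, the following sketch implements both in plain Python. The `course_type` column is a hypothetical example, and the helper names are illustrative only.

```python
def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Return one 0/1 indicator vector per value, one position per category."""
    categories = sorted(set(values))
    vectors = [[1 if v == cat else 0 for cat in categories] for v in values]
    return vectors, categories

course_type = ["Major", "Gen-ed", "Major"]  # hypothetical categorical column
codes, mapping = label_encode(course_type)
vectors, categories = one_hot_encode(course_type)
print(codes, vectors)
```

Label encoding implies an ordering between categories, so one-hot encoding is usually the safer choice when the categories have no natural order.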
Feature Engineering
- Generate new features: Utilize pre-existing features to generate new ones that could offer more information about faculty performance (e.g., compute total scores and calculate average ratings across several categories).
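The derived features mentioned above (total scores and average ratings) might be computed as follows. The ratings are hypothetical and shortened to four items for readability; the actual form has 20 items.

```python
from statistics import mean

# Hypothetical responses for one faculty member: one row per student,
# one column per evaluation item (4 items here instead of the form's 20).
responses = [
    [5, 4, 4, 3],
    [4, 4, 5, 3],
    [5, 5, 4, 4],
]

totals = [sum(r) for r in responses]                  # total score per response
averages = [mean(r) for r in responses]               # average rating per response
item_means = [mean(col) for col in zip(*responses)]   # mean rating per item
overall_mean = mean(item_means)                       # overall mean across items

print(totals, averages, round(overall_mean, 2))
```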
Text Data Processing
- Tokenization: Separate text answers into discrete words or tokens for further analysis.
- Stop-word removal: Eliminate common stop words (such as “the,” “and,” and “is”) from text responses to concentrate on meaningful content.
- Lemmatization or stemming: To minimize variability and increase the accuracy of text analysis, normalize words to their base forms (lemmas) or stems.
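The text-processing steps above can be sketched with the standard library alone. The stop-word list, the sample comment, and the `naive_stem` helper are assumptions made for illustration; the crude suffix stripping only stands in for a real stemmer or lemmatizer, which the source does not name.

```python
import re

# A small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "and", "is", "a", "an", "to", "of", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

comment = "The teacher is approachable and explains the lessons clearly."
tokens = remove_stop_words(tokenize(comment))
stems = [naive_stem(t) for t in tokens]
print(stems)
```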
Carrying out these preprocessing procedures helps ensure that the data used for analysis are accurate, reliable, and suitable for revealing information on faculty performance and for guiding decision-making in academic institutions.
Data Analysis
After collecting and preprocessing the survey responses, we apply statistical tools to summarize the quantitative feedback. We apply VADER sentiment analysis and word clouds to the qualitative feedback to extract respondents’ sentiments. The Jaccard Index is used to determine how similar two word clouds are. Additionally, we use data visualization tools to present the sentiment analysis findings and quantitative data in an understandable and accessible way, revealing patterns, trends, and areas where the faculty performance appraisal procedure needs improvement.
Python is the primary programming language used for data processing and visualization in this study. NumPy, pandas, matplotlib, ipywidgets, and VADER’s SentimentIntensityAnalyzer are among the libraries and tools that support these tasks.
Sentiment Analysis
The sentiment analysis tool VADER (Valence Aware Dictionary and Sentiment Reasoner) evaluates the degree of positivity or negativity expressed by the words in a text. For each text, VADER reports the proportions of positive, negative, and neutral sentiment together with an overall sentiment score.
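A minimal sketch of how a comment might be scored and labeled is given below. It assumes the `vaderSentiment` package is installed and uses a cut-off of 0.05 on the compound score, a commonly used convention; the source does not state which package or thresholds were used, so both are assumptions.

```python
def label_from_compound(compound, threshold=0.05):
    """Convert a compound score in [-1, 1] into a polarity label.

    The 0.05 threshold is an assumed convention, not taken from the study.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

def score_comment(text):
    """Score one comment with VADER (requires the vaderSentiment package)."""
    # Imported lazily so the labeling helper works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    # scores is a dict with the keys 'neg', 'neu', 'pos', and 'compound'.
    return scores["compound"], label_from_compound(scores["compound"])
```

Calling `score_comment` on a remark returns the compound score and the polarity label derived from it.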
Table 1 provides an example of a student’s comment about a specific faculty along with the sentiment score and polarity of the comment.
Table 1. Example of a student’s verbatim remark, its sentiment score, and polarity
Table 2. The Average Sentiment Label
Subsequently, we computed the average sentiment labels, yielding the results displayed in Table 2. For the faculty member in this example, 78.57% of the feedback is positive, 14.29% is neutral, and 7.14% is negative.
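The share of each label can be computed by counting labels and dividing by the number of comments. The counts below are hypothetical: they are chosen only because 11, 2, and 1 out of 14 comments reproduce the percentages reported above, and the source does not state the actual number of comments.

```python
from collections import Counter

# Hypothetical labels: 11 positive, 2 neutral, 1 negative (14 comments).
labels = ["Positive"] * 11 + ["Neutral"] * 2 + ["Negative"] * 1

counts = Counter(labels)
shares = {label: round(100 * n / len(labels), 2) for label, n in counts.items()}
print(shares)  # {'Positive': 78.57, 'Neutral': 14.29, 'Negative': 7.14}
```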
Word Clouds Visualizations
We create a word cloud to find out which terms appear most frequently in students’ comments. These visual aids highlight essential words and phrases related to teacher performance.
Figure 1. Example of Word Cloud
The size of each word in the cloud in Figure 1 reflects how frequently that term appears in the text under analysis: larger words are more common, and smaller words are less common. In this example, the most prominent words suggest that this faculty member is friendly, reasonable, and approachable.
The color also indicates the groups of related words. Closely spaced words in the word cloud may have conceptual or thematic connections.
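The link between word frequency and word size can be illustrated with a short sketch. The linear scaling rule, the size limits, and the token list are assumptions for illustration; word cloud libraries apply their own scaling and layout rules.

```python
from collections import Counter

def font_sizes(tokens, min_size=10, max_size=60):
    """Scale each word's font size linearly with its frequency."""
    counts = Counter(tokens)
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * n / top
        for word, n in counts.items()
    }

# Hypothetical preprocessed tokens from student remarks.
tokens = ["good", "teacher", "good", "approachable", "good", "teacher"]
sizes = font_sizes(tokens)
print(sizes)
```

The most frequent word receives the maximum size, and every other word is sized in proportion to its count.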
Statistical Visualization
A line graph is made to get a clearer idea of how the students evaluated the faculty based on the assessment questions listed in Table 3. The average of each of the 20 question items is shown on the line graph in Figure 2.
Table 3. Example Survey Questions and Faculty Mean Rating
Figure 2. Line Graph of Mean Ratings for Items
In addition, Table 3 displays the verbal description that corresponds to the overall mean of the 20 items. In this instance, the faculty member received an average score of 3.88, which corresponds to the verbal description “Very Satisfactory.”
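Mapping a mean rating to its verbal description can be sketched as a simple lookup. The cut-off values below are an assumption (equal-width bands on the 1 to 5 scale); the source does not list the ranges actually used, only that 3.88 falls under “Very Satisfactory.”

```python
# Assumed lower bounds for each verbal description; not taken from the source.
BANDS = [
    (4.21, "Outstanding"),
    (3.41, "Very Satisfactory"),
    (2.61, "Satisfactory"),
    (1.81, "Needs Improvement"),
    (1.00, "Very Poor"),
]

def verbal_description(mean_rating):
    """Return the description of the first band whose lower bound is met."""
    for lower_bound, description in BANDS:
        if mean_rating >= lower_bound:
            return description
    raise ValueError("mean rating must be at least 1.00")

print(verbal_description(3.88))  # Very Satisfactory
```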
RESULTS AND DISCUSSIONS
The survey yielded 865 student responses covering 53 faculty members, who were categorized as teaching either major courses or non-major (general education) courses. Thirty faculty members teach general education courses, and twenty-three teach major courses, as displayed in Figure 3.
Figure 3. Number of Faculty under survey
Sentiment Analysis Result
Table 4 shows the average sentiment labels for the remarks about faculty, categorized into two groups: Gen-ed and Major. For each group, the table gives the proportion of remarks classified as Positive, Neutral, and Negative.
Table 4. Sentiment Analysis
Faculty | Positive | Neutral | Negative
Gen-ed  | 0.75     | 0.20    | 0.05
Major   | 0.73     | 0.23    | 0.04
Sentiment analysis for the Gen-ed faculty shows a largely favorable view, with 75% of the comments categorized as Positive. Another 20% of the comments are Neutral, and only 5% are Negative, suggesting that negative remarks about general education faculty are uncommon.
For the faculty handling major courses, there is a similar trend toward positive sentiment, with 73% of the remarks falling into this category. This group has a slightly higher share of neutral remarks (23%), and negative remarks are the least common at just 4%, suggesting that students generally have favorable sentiments toward professors teaching major courses.
According to the sentiment analysis results for both the Gen-ed and Major categories, overall sentiment toward faculty members is primarily positive. Far more comments in both categories are classified as Positive than as Neutral or Negative, suggesting that students’ interactions with the faculty generally elicit feelings of satisfaction, gratitude, or positivity, as shown in Figure 4.
Figure 4. Average Sentiment Label
Word Cloud Analysis Result
The generated word clouds visually represent the most frequently occurring words in the remarks about faculty handling General Education courses and faculty handling Major courses. In Figure 5, each word’s size indicates its frequency in the remarks.
Figure 5. Word Cloud of Faculty Handling Gen-Ed and Major Courses
Figure 5 indicates that the terms “teaching,” “good,” and “teacher” are prominent in both word clouds, suggesting that these are the most salient themes in remarks about the faculty members surveyed.
To evaluate how similar or dissimilar the word distributions of the two sets of text data are, we compare the two word clouds statistically using the Jaccard Index.
We compute the Jaccard Index using the formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where:
A – represents the set of words in the first word cloud,
B – represents the set of words in the second word cloud,
∣A∩B∣ denotes the size of the intersection of sets A and B,
∣A∪B∣ denotes the size of the union of sets A and B.
The Jaccard Index quantifies the similarity between two sets as the size of their intersection divided by the size of their union. It ranges from 0 (no shared words) to 1 (identical sets), so it indicates how closely two groups of words overlap. For the two word clouds compared in Figure 5, the Jaccard Index is about 0.35, a moderate degree of resemblance.
This implies the coexistence of words specific to each cloud and words shared, signifying common themes or topics. These variations highlight the diversity of viewpoints and information found in the dataset, highlighting different facets or themes that are depicted in the word clouds.
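The Jaccard Index described above can be computed directly from two word sets. The word sets below are hypothetical; only “teaching,” “good,” and “teacher” are reported in the source as shared, and the other words are invented for illustration, so the result does not reproduce the reported value of about 0.35.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Hypothetical word sets for the two word clouds.
gen_ed_words = {"teaching", "good", "teacher", "kind", "clear"}
major_words = {"teaching", "good", "teacher", "strict", "helpful"}

similarity = jaccard_index(gen_ed_words, major_words)
print(round(similarity, 2))  # 3 shared words out of 7 distinct words
```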
Faculty Performance Rating
Figure 6 displays the distribution of faculty performance ratings, which indicates a largely positive assessment of the faculty members. Thirty-two faculty members received a “Very Satisfactory” rating, indicating consistent and commendable performance. This suggests that the institution as a whole maintains high standards for both instruction and service delivery. Furthermore, a noteworthy number of faculty members (13) received ratings of “Outstanding,” indicating their commitment to excellence and their extraordinary achievements.
However, there are opportunities for improvement as evidenced by the five faculty members who were assessed as “satisfactory” and the two who were found to need improvement. These evaluations underscore the significance of providing focused assistance and materials to close performance discrepancies and guarantee that all instructors fulfill or surpass the organization’s expectations.
Figure 6. Overall Performance Rating of Faculty
Numerical Ratings and Sentiment Score
The scatter plot in Figure 7 depicts the relationship between sentiment scores and mean ratings in the evaluation dataset. Mean ratings are a quantitative indicator of overall performance or satisfaction compiled from individual evaluations. Sentiment scores, by contrast, summarize the feelings expressed in the written remarks, which are classified as positive, neutral, or negative. With the mean rating on the x-axis and the sentiment score on the y-axis, each data point on the plot represents a distinct evaluation entry.
Data points are color-coded to differentiate between positive (blue), neutral (green), and negative (red) sentiment scores. This color distinction makes it easier to spot sentiment trends across mean rating levels and to see connections between sentiment distributions and mean ratings.
Figure 7. Scatter Plot of the Performance Rating and Sentiment Score
The correlation matrix heatmap in Figure 8 provides insight into the relationships between different variables in the dataset. In this heatmap, each cell represents the correlation coefficient between two variables, ranging from -1 to 1. A correlation coefficient close to 1 indicates a strong positive correlation, while a coefficient close to -1 indicates a strong negative correlation. A coefficient close to 0 suggests no linear correlation between the variables.
In this context, the heatmap displays the correlation coefficients between the ‘Positive’, ‘Neutral’, and ‘Negative’ sentiment scores and the ‘Mean Rating’. The heatmap allows us to visualize the strength and direction of the relationships between these variables. For instance, we observe that ‘Positive’ sentiment has a moderate positive correlation with ‘Mean Rating’, which implies that higher positive sentiment scores tend to accompany higher mean ratings. Similarly, the ‘Neutral’ and ‘Negative’ sentiment scores show their respective correlations with ‘Mean Rating’, providing insights into the interplay between student sentiment and overall ratings.
Figure 8. Correlation Matrix of the Performance Rating and Sentiment Score
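Each cell of such a heatmap holds a correlation coefficient between two variables. The sketch below computes the Pearson coefficient in plain Python, assuming that this is the coefficient shown (the source does not name it); the data values are hypothetical and do not come from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical per-faculty values: mean rating and share of positive remarks.
mean_rating = [3.2, 3.9, 4.4, 4.8]
positive_share = [0.55, 0.70, 0.72, 0.90]

r = pearson(mean_rating, positive_share)
print(round(r, 2))
```

With pandas, `DataFrame.corr` produces the full matrix of pairwise coefficients in one call.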
CONCLUSIONS
In conclusion, the thorough examination of the survey data provides insightful information about how students feel about and perceive the professors teaching major and general education (Gen-ed) courses. The sentiment analysis indicated primarily positive attitudes in both categories, with most remarks falling into this category. This suggests that students are highly satisfied with and appreciative of the faculty members’ performance and quality of instruction. Furthermore, the word cloud analysis revealed recurring themes that suggested consistent positive impressions across several courses, such as teaching excellence and positive faculty traits.
The distribution of faculty performance ratings further highlighted the positive evaluation that faculty members generally receive, with a large majority receiving ratings of “Very Satisfactory” or “Outstanding.” On the other hand, the fact that some faculty members were rated as merely satisfactory or as needing improvement highlights the significance of providing focused support and resources to address performance disparities and ensure that all teachers meet or surpass institutional expectations.
Moreover, insights into the connection between student views and overall performance ratings were obtained via the correlation study between sentiment scores and mean ratings. The scatter plot and correlation matrix heatmap demonstrated moderate correlations between mean ratings and positive sentiment scores, underscoring the influence of pleasant interactions and experiences on overall student satisfaction and faculty member appraisal.
Overall, the results highlight how critical it is to provide a supportive atmosphere for learning to increase student satisfaction, faculty members’ ongoing professional development, and academic achievement. Academic institutions can improve teaching quality, student engagement, and overall learning outcomes by implementing targeted tactics based on insights from various data sources such as sentiment analysis, word cloud analysis, performance ratings, and correlation analysis.
RECOMMENDATIONS
Based on the findings of this study, several recommendations can be made to improve faculty performance appraisal systems. First, institutions should consider adopting modern digital tools, such as sentiment analysis and word cloud visualizations, to supplement traditional evaluation methods. These tools provide deeper insights into student feedback, allowing for more nuanced and data-driven assessments of teaching performance. Additionally, the integration of regular feedback loops, wherein students’ sentiments and concerns are analyzed in real time, could foster continuous improvement and enhance teaching quality. To address performance discrepancies identified in the evaluation process, institutions should implement targeted support programs for faculty members needing improvement, including professional development and mentorship. Finally, educational institutions are encouraged to explore the use of data visualization and analytical techniques to make performance evaluations more accessible, actionable, and transparent to both faculty and administrative staff.
Ethical Considerations
This study involves the analysis of student feedback data to evaluate faculty performance. Ethical approval was obtained from the relevant institutional review board before the commencement of the study to ensure that the collection and analysis of data were conducted in accordance with ethical guidelines. All data used were anonymized to protect the identity of the students and faculty members involved, and participation in the evaluation process was voluntary.
Students were informed that their feedback might be used for research purposes, and their consent was implied through their participation in the evaluation. The research team ensured that no identifying information was collected or disclosed.
The authors declare no conflicts of interest. The research was conducted objectively, without bias or external influence from any parties involved in the faculty evaluation process.
The study adhered to strict data protection protocols to ensure the confidentiality of all participants. Data used for analysis were anonymized, and no personal information of students or faculty was accessible to the researchers.