From Inconsistent Descriptions to Verified Retrieval Through a Local Data Cleaning Framework for Higher Education Climate Action Keyword Mapping

Authors

PMPC Gunathilake

Postgraduate Institute of Science, University of Peradeniya, Peradeniya (Sri Lanka)

Tilak Hewawasam

Department of Geography, University of Peradeniya, Peradeniya (Sri Lanka)

Jagath Gunatilake

Department of Geology, University of Peradeniya, Peradeniya (Sri Lanka)

Article Information

DOI: 10.47772/IJRISS.2026.1026EDU0324

Subject Category: Geology

Volume/Issue: 10/26 | Page No: 4140-4160

Publication Timeline

Submitted: 2026-05-18

Accepted: 2026-05-23

Published: 2026-06-13

Abstract

Higher Education Institutions (HEIs) are increasingly expected to report sustainability and climate-related activities aligned with Nationally Determined Contributions (NDCs), aggregated through Locally Determined Contributions (LDCs), and recognised within national climate governance systems. This alignment is difficult when university activity descriptions use incomplete, inconsistent, or non-standard climate keywords: automated detection of climate actions fails, NDC mapping becomes incorrect, and inputs for future carbon quantification become unreliable. NDC documents compound the challenge because they contain structurally similar actions across different sectors as well as contradictory actions arising from sectoral trade-offs, issues that often stem from limited cross-sectoral coordination during preparation. This study develops a data cleaning framework for higher education climate action keyword mapping, using Sri Lanka as a pilot study. Sri Lanka’s updated NDCs serve as the national reference framework, so the methodology is tested in a context where climate reporting infrastructure and cross-sectoral coordination remain emergent. The methodology applies Natural Language Processing (NLP) techniques, Jaccard similarity, and cosine similarity to identify sectoral overlaps and action-level conflicts. The framework then cleans climate-policy and sustainability-reporting text by removing duplicates, normalising terminology, preserving metadata, and constructing a locally relevant keyword set for mitigation, adaptation, and cross-cutting climate actions, derived from sources aligned with IPCC, UNFCCC, NDC, SDG, and university sustainability reporting standards. Pilot results from Sri Lanka confirm the framework’s effectiveness. The NDC diagnostic stage identified high-similarity action pairs, including forestry–coastal ecosystem restoration (0.85), industry–waste circular economy (0.82), power–industry renewable energy (0.78), and water–agriculture efficiency (0.75). The validated keyword lexicon achieved a retrieval precision of 0.83 and a recall of 0.79, with mitigation keywords reaching 0.87 precision and adaptation terms scoring 0.76. The study concludes that the Sri Lanka pilot demonstrates the framework’s transferability to other nations, transforming noisy sustainability descriptions into verified, retrieval-ready evidence that strengthens HEI contributions to national climate reporting systems.
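The similarity diagnostics and retrieval scores named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenisation rule, the use of raw term-frequency vectors for cosine similarity, the function names, the two action texts, and the record identifiers are all assumptions made for illustration, and the numbers it prints are unrelated to the results reported above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard_similarity(a, b):
    """Jaccard similarity: shared unique tokens over all unique tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine similarity between raw term-frequency vectors (assumed weighting)."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical action texts, invented for illustration only.
forestry = "restore degraded mangrove ecosystems"
coastal = "restore degraded coastal ecosystems"

print(jaccard_similarity(forestry, coastal))  # 3 shared of 5 unique tokens
print(cosine_similarity(forestry, coastal))   # 3 shared terms, both norms 2

# Hypothetical record identifiers: what a keyword lexicon retrieved
# versus what a reviewer marked as genuine climate actions.
retrieved_ids = ["a1", "a2", "a3", "a4"]
relevant_ids = ["a1", "a2", "a3", "a5", "a6"]
print(precision_recall(retrieved_ids, relevant_ids))
```

Jaccard similarity compares only which words two action texts share, while cosine similarity also accounts for how often each word occurs, so the two scores can rank the same pair of actions differently.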

Keywords

Higher Education Climate Action; Nationally Determined Contributions; Climate Action Keyword Mapping
