A comparative analysis of five textual similarity methods for automatic short answer grading

Imam Rangga Bakti; Handaru Jati; Nurkhamid Nurkhamid; Yola Permata Bunda

doi:10.52465/joscex.v7i1.11

Authors

Imam Rangga Bakti, Department of Informatics Engineering, Universitas Pasir Pengaraian, Indonesia
Handaru Jati, Department of Electronics and Informatics Engineering Education, Universitas Negeri Yogyakarta, Indonesia
Nurkhamid Nurkhamid, Department of Informatics Engineering Education, Universitas Negeri Yogyakarta, Indonesia
Yola Permata Bunda, Department of Information System, Universitas Tjut Nyak Dhien, Indonesia

Keywords:

Text mining, ASAG, Comparison, Five methods, Textual similarity

Abstract

This study investigates the application of text mining techniques in Automatic Short Answer Grading (ASAG) by comparing five textual similarity methods: Cosine Similarity, Jaccard Similarity, Dice’s Coefficient, Overlap Coefficient, and Matching Coefficient. The dataset consists of answers from 25 students to five definition-based questions in a Human–Computer Interaction course. The data were preprocessed using case folding, tokenization, stop word removal, and stemming. The results show that Cosine Similarity achieved the highest similarity score (67.00%), followed by Overlap Coefficient (66.67%) and Dice’s Coefficient (63.16%), while Jaccard Similarity and Matching Coefficient both produced the lowest score (46.15%). These findings indicate that vector-based similarity handles variations in sentence structure and keyword usage more effectively than set-based approaches, particularly for definition-based short answers. This study provides a comparative evaluation of multiple lexical similarity methods within a unified experimental setting, offering practical insights for selecting appropriate techniques in ASAG applications.
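The five measures named in the abstract can be sketched with their common textbook definitions. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop word list and the token lists are hypothetical, stemming is omitted because the abstract does not name a stemmer, cosine is computed over raw term-frequency vectors, and the matching coefficient is returned as a raw count of shared tokens because the abstract does not say how it was normalized to a percentage.

```python
import math
import re
from collections import Counter

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to"}


def preprocess(text, stop_words=STOP_WORDS):
    """Case folding, tokenization, and stop word removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


def cosine(a, b):
    """Cosine similarity over term-frequency vectors (assumed weighting)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """|X ∩ Y| / |X ∪ Y| over token sets."""
    x, y = set(a), set(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def dice(a, b):
    """2 |X ∩ Y| / (|X| + |Y|) over token sets."""
    x, y = set(a), set(b)
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def overlap(a, b):
    """|X ∩ Y| / min(|X|, |Y|) over token sets."""
    x, y = set(a), set(b)
    smaller = min(len(x), len(y))
    return len(x & y) / smaller if smaller else 0.0


def matching(a, b):
    """Raw count of shared tokens, |X ∩ Y| (normalization left unspecified)."""
    return len(set(a) & set(b))


# Hypothetical token lists: 9 reference tokens, 10 student tokens, 6 shared.
reference = [f"t{i}" for i in range(9)]
student = [f"t{i}" for i in range(3, 13)]

for name, fn in [("cosine", cosine), ("jaccard", jaccard),
                 ("dice", dice), ("overlap", overlap)]:
    print(f"{name}: {fn(reference, student) * 100:.2f}%")
print(f"matching (shared tokens): {matching(reference, student)}")
```

With these hypothetical sets, the set-based functions evaluate to 6/13, 12/19, and 6/9, which round to 46.15%, 63.16%, and 66.67%. These happen to coincide with the set-based percentages reported above; that is an arithmetic observation only, not a reconstruction of the study's data.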