A comparative analysis of five textual similarity methods for automatic short answer grading
DOI: https://doi.org/10.52465/joscex.v7i1.11
Keywords: Text mining, ASAG, Comparison, Five methods, Textual similarity
Abstract
This study investigates the application of text mining techniques in Automatic Short Answer Grading (ASAG) by comparing five textual similarity methods: Cosine Similarity, Jaccard Similarity, Dice’s Coefficient, Overlap Coefficient, and Matching Coefficient. The dataset consists of five definition-based questions answered by 25 students in a Human–Computer Interaction course. The answers were preprocessed using case folding, tokenization, stop word removal, and stemming. The results show that Cosine Similarity achieved the highest similarity score (67.00%), followed by Overlap Coefficient (66.67%) and Dice’s Coefficient (63.16%), while Jaccard Similarity and Matching Coefficient both produced a lower score of 46.15%. These findings indicate that vector-based similarity methods handle variations in sentence structure and keyword usage more effectively than set-based approaches, particularly for definition-based short answers. This study provides a comparative evaluation of multiple lexical similarity methods within a unified experimental setting, offering practical insights for selecting appropriate techniques in ASAG applications.
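The preprocessing steps and the five measures named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the stop-word list and the `naive_stem` suffix stripper are hypothetical placeholders (the abstract does not say which stop-word list or stemmer was used, or which language the answers were written in), Cosine Similarity is computed over raw term-frequency counts, and the Matching Coefficient is taken to be the simple matching coefficient over a shared vocabulary, which coincides with Jaccard when that vocabulary is the union of the two token sets. The token lists `KEY` and `ANSWER` are invented.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (assumption); the study's actual list is not reported.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "for", "that"}


def naive_stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and token.endswith("ss"):
            continue  # keep words such as "process" intact
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Case folding, tokenization, stop word removal, then stemming."""
    folded = text.lower()                                # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)            # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [naive_stem(t) for t in tokens]               # stemming


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity of term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[term] for term, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: list[str], b: list[str]) -> float:
    """|A ∩ B| / |A ∪ B| over distinct tokens."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a: list[str], b: list[str]) -> float:
    """2|A ∩ B| / (|A| + |B|) over distinct tokens."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def overlap(a: list[str], b: list[str]) -> float:
    """|A ∩ B| / min(|A|, |B|) over distinct tokens."""
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


def matching(a: list[str], b: list[str], vocabulary=None) -> float:
    """Simple matching coefficient: share of vocabulary terms on which A and B agree
    (both present or both absent). Defaults to the union, where it equals Jaccard."""
    sa, sb = set(a), set(b)
    vocab = set(vocabulary) if vocabulary is not None else sa | sb
    if not vocab:
        return 0.0
    agree = sum(1 for t in vocab if (t in sa) == (t in sb))
    return agree / len(vocab)


# Hypothetical, already-preprocessed token lists (invented for illustration).
KEY = ["interaction", "human", "computer", "design", "evaluation",
       "implementation", "system", "use", "study"]
ANSWER = ["interaction", "human", "computer", "design", "system",
          "use", "user", "interface", "field", "technology"]

if __name__ == "__main__":
    for name, fn in [("cosine", cosine), ("jaccard", jaccard), ("dice", dice),
                     ("overlap", overlap), ("matching", matching)]:
        print(f"{name:8s} {fn(KEY, ANSWER):.2%}")
```

The invented lists hold 9 and 10 distinct tokens with 6 in common, so the script prints 46.15% for Jaccard and Matching, 63.16% for Dice, 66.67% for Overlap, and 63.25% (6/√90) for Cosine. The three set-based values happen to equal the reported ones, which only shows that the reported figures are arithmetically consistent with such set sizes; it says nothing about the study's actual answers, and these lists do not reproduce the reported 67.00% Cosine score.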
License
Copyright (c) 2026 Journal of Soft Computing Exploration

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
