Abstract:
Computer-based testing poses inconveniences and difficulties for humanities students,
whose learning process is based almost entirely on communicative methods. Such students
need a testing system that supports open-ended questions and lets them enter detailed
answers.
Although the shingle algorithm is widely used for plagiarism detection,
few researchers have attempted to use it to assess the academic achievements
of students. The aim of this study was therefore to develop an intelligent testing
system based on the shingle algorithm for assessing the academic achievements of
humanities students. Because humanities students formulate answers according to their
own understanding during testing, the developed system should be able to determine the
degree to which each answer matches the correct answer. At
the same time, answers with a high degree of correspondence to the answer stored
in the dictionary should also be entered in the database as one of the variants of
the correct answer. The shingle algorithm, stemming, and MD5 hashing
were used to achieve this goal. The performance of the algorithm was evaluated in
terms of degree of matching (S), completeness (P), F-measure, and processing time
(t). The experiment involved 120 humanities students in their second and third years
of study, aged 18–20 years, including 80 female and 40 male students. The developed
algorithm was found to be effective at the optimal time t=77% and a degree of
compliance of the final grade F=77%. In this case, the final F-measure score
fully reflects the result when the proportion of truthfulness equals 0.5, and it is
directly proportional to the degree of matching (S) and completeness (P). It was found
that a smaller shingle length yields a higher matching degree (S), because the
probability of finding the same phrase in two documents is greater, whereas a larger
shingle length lowers the matching degree. In addition, with
smaller shingle lengths more time is spent calculating checksums, and with
larger shingle lengths less time is spent. Calculations showed that the shingle
algorithm was most efficient at a shingle length of N=5, in terms of average data
processing time. The results of this study show that
the developed algorithm can be included in pedagogical practice in order to objectively assess the learning achievements of humanities students, taking into account
their communicative and cognitive abilities. In the future, the developed algorithm
could also be applied in other areas that require text analysis, in particular
plagiarism detection.
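
As an illustration of the approach summarized above, the following is a minimal Python sketch of shingle-based answer matching with stemming and MD5 checksums. It is not the authors' implementation. The function names, the crude suffix-stripping stemmer, the definition of the matching degree as the share of reference shingles found in the student's answer, and the weighted harmonic-mean form of the F-measure (weight 0.5) are all assumptions made for this example; the abstract does not state these formulas.

```python
import hashlib
import re


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "class" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def shingle_hashes(text: str, n: int = 5) -> set[str]:
    """Return MD5 checksums of all n-word shingles of the stemmed, lowercased text."""
    words = [crude_stem(w) for w in re.findall(r"\w+", text.lower())]
    if not words:
        return set()
    if len(words) < n:
        # Assumption: a text shorter than n words forms one single shingle.
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}


def matching_degree(answer: str, reference: str, n: int = 5) -> float:
    """Assumed definition of S: share of reference shingles present in the answer."""
    ref = shingle_hashes(reference, n)
    if not ref:
        return 0.0
    return len(shingle_hashes(answer, n) & ref) / len(ref)


def f_measure(s: float, p: float, alpha: float = 0.5) -> float:
    """Assumed form of F: weighted harmonic mean of S and P with weight alpha."""
    if s <= 0.0 or p <= 0.0:
        return 0.0
    return 1.0 / (alpha / s + (1.0 - alpha) / p)
```

For example, with n=2 the answer "the cat sat on a chair" shares three of the five two-word shingles of the reference "the cat sat on the mat", so matching_degree returns 0.6. A smaller n makes such overlaps more likely but produces more shingles to hash, which is consistent with the trade-off described in the abstract.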