Abstract:
Among the many problems in machine learning, one of the most critical is improving
the accuracy of categorical response prediction from extracted features. Nevertheless,
most of the time in the full cycle of multiclass modeling for sign language recognition
tasks is spent on data preparation, including collecting, filtering, analyzing, and visualizing data.
To address this problem, this paper proposes a methodology
for automatically collecting the spatiotemporal features of gestures by calculating the coordinates of
the detected pose and hand regions, normalizing them, and constructing an optimal multilayer
perceptron for multiclass classification. By extracting and analyzing spatiotemporal data, the proposed
method makes it possible to identify not only static features, but also the spatial (for gestures that
touch the face and head) and dynamic features of gestures, which leads to an increase in the accuracy
of gesture recognition. Gestures were also classified according to the form of their
demonstration so that their characteristics (the visibility of all connection
points) could be extracted optimally, which raised the recognition accuracy for certain classes to
0.96. The method was validated on the well-known Ankara University Turkish Sign
Language Dataset and the Dataset for Argentinian Sign Language and
proved effective, with a recognition accuracy of 0.98.
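
The coordinate normalization and feature collection steps summarized above can be illustrated with a minimal sketch. The abstract does not specify the normalization scheme, so the centroid centering and unit-radius scaling used here are assumptions, and the function names `normalize_frame` and `gesture_features` are hypothetical. Each frame is taken to be a list of (x, y) landmark coordinates for the pose and hand.

```python
from math import hypot


def normalize_frame(points):
    """Center (x, y) landmarks on their centroid and scale to unit max radius.

    Assumed scheme for illustration only; the paper's normalization may differ.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Fall back to 1.0 when all points coincide, to avoid division by zero.
    scale = max(hypot(x, y) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def gesture_features(frames):
    """Flatten the normalized landmarks of every frame into one feature vector."""
    features = []
    for points in frames:
        for x, y in normalize_frame(points):
            features.extend((x, y))
    return features


# Two frames with two landmarks each (toy values, not real data).
frames = [[(0, 0), (2, 0)], [(10, 10), (10, 14)]]
vector = gesture_features(frames)
```

Because all landmarks of a frame are centered and scaled together, their positions relative to one another (for example, a hand near the face) are preserved, while the signer's position and distance from the camera are factored out. A vector of this kind could then serve as the input to a multilayer perceptron classifier.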