Samer Al Moubayed, Gopal Ananthakrishnan & Laura Enflo
Center for Speech Technology, KTH, Stockholm, Sweden
Automatic Prominence Classification in Swedish
This study aims to automatically classify levels of acoustic prominence in a
dataset of 200 Swedish read-speech sentences produced by one male native speaker.
Each word in the sentences was categorized by four speech experts into one of
three groups according to the perceived level of prominence. Six acoustic
features at the syllable level and seven features at the word level were used. Two
machine learning algorithms, namely Support Vector Machines (SVM) and
Memory-Based Learning (MBL), were trained to classify the words into their
respective prominence classes. The MBL gave an average word-level accuracy of 69.08% and
the SVM gave an average accuracy of 65.17% on the test set. These values were
comparable with the average accuracy of the human annotators with respect to
the average annotations. In this study, word duration was found to be the most
important feature required for classifying prominence in Swedish read speech.
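
As an illustration of the memory-based approach mentioned above, the following is a minimal sketch that treats MBL as a nearest-neighbour classifier, which is how it is commonly implemented. The abstract does not specify the features, distance metric, or neighbourhood size used in the study, so the two toy features (word duration and a normalised pitch range), the Euclidean distance, `k=3`, and all data values below are assumptions made purely for illustration. The function names `mbl_classify` and `accuracy` are hypothetical.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mbl_classify(memory, query, k=3):
    """Label `query` by majority vote among its k nearest stored examples.

    `memory` is a list of (feature_vector, label) pairs; nothing is
    "trained" beyond storing these examples.
    """
    neighbours = sorted(memory, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(memory, test_items, k=3):
    """Fraction of test words whose predicted class matches the reference."""
    correct = sum(mbl_classify(memory, feats, k) == label
                  for feats, label in test_items)
    return correct / len(test_items)


# Made-up toy data, NOT from the study:
# (word duration in seconds, normalised pitch range) -> prominence level 0, 1, 2
memory = [
    ((0.10, 0.1), 0), ((0.12, 0.2), 0), ((0.15, 0.1), 0),
    ((0.30, 0.5), 1), ((0.32, 0.4), 1), ((0.28, 0.6), 1),
    ((0.55, 0.9), 2), ((0.60, 0.8), 2), ((0.58, 1.0), 2),
]
test_items = [((0.11, 0.15), 0), ((0.31, 0.5), 1), ((0.57, 0.9), 2)]

if __name__ == "__main__":
    print(mbl_classify(memory, (0.31, 0.5)))
    print(accuracy(memory, test_items))
```

In practice the features would need to be scaled to comparable ranges before computing distances, since an unscaled feature with a large numeric range would otherwise dominate the vote.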