Studien zur Mustererkennung, Vol. 25
Some established objective measures exist, but they are restricted to the evaluation of sustained vowels. This thesis takes the step from the automatic analysis of vowel recordings to the analysis of text recordings. Judging speech quality objectively in a real communication situation requires the analysis of entire words and sentences, because the intelligibility of a substitute voice in a dialogue is a substantial criterion for evaluation. Automatic word recognition methods were applied to a standard text that was read aloud by the test subjects. Information on the intelligibility of the individual speakers was obtained by comparing word recognition rates with reference evaluation data from human experts.
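The comparison described above can be illustrated with a minimal sketch. It assumes that the word recognition rate is the percentage of reference words that survive a minimum-edit-distance alignment with the recognizer output, and that agreement with the expert scores is measured with Pearson's correlation coefficient; the text itself does not specify either detail. The function names `word_recognition_rate` and `pearson` and the toy sentence are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def word_recognition_rate(reference, hypothesis):
    """Percentage of reference words found correctly in the hypothesis.

    Assumption: words are aligned by minimum edit distance; substituted
    and deleted words count as errors, inserted words are ignored here.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # Each cell holds (edit cost, -hits); min() prefers the lowest cost
    # and, among equal costs, the alignment with the most correct words.
    prev = [(j, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0)]
        for j in range(1, len(hyp) + 1):
            match = ref[i - 1] == hyp[j - 1]
            diagonal = (prev[j - 1][0] + (0 if match else 1),
                        prev[j - 1][1] - (1 if match else 0))
            deletion = (prev[j][0] + 1, prev[j][1])
            insertion = (cur[j - 1][0] + 1, cur[j - 1][1])
            cur.append(min(diagonal, deletion, insertion))
        prev = cur
    hits = -prev[-1][1]
    return 100.0 * hits / len(ref)


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy sentence pair for illustration only, not data from the thesis.
rate = word_recognition_rate("the north wind and the sun",
                             "the north wind in the sun")
```

In such a setup, one recognition rate per speaker would be paired with that speaker's averaged expert score, and `pearson` would be applied to the two resulting lists.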
A prosody module made it possible not only to extract acoustic information on the speaker's voice but also to measure individual speaking characteristics. The inter-rater variability among humans was compared to the automatic analysis results; the main finding was that the correlation between human and automatic ratings was as good as the agreement within the group of human raters. Automatic recognition on distant-talking recordings could be slightly improved by using mu-law features, which are modified Mel-Frequency Cepstrum Coefficients (MFCC). Training the recognizer on artificially reverberated data is another way to achieve better recognition rates, even when the reverberation in the test data does not match the acoustic properties of the training data. This is a step towards therapy sessions in which patients will no longer be required to wear a headset.
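Artificial reverberation of training data is commonly done by convolving a close-talking recording with a room impulse response. The following is a minimal sketch of that general idea, not the procedure used in the thesis: it assumes a synthetic impulse response made of exponentially decaying noise, whereas measured room responses could be used instead. The names `synthetic_impulse_response` and `reverberate` and the parameter `rt60_s` (reverberation time in seconds) are hypothetical.

```python
import math
import random


def synthetic_impulse_response(rt60_s, sample_rate, seed=0):
    """Decaying-noise impulse response (an assumption, not a measured room).

    The envelope falls by 60 dB, i.e. to 1/1000 of its initial
    amplitude, after rt60_s seconds.
    """
    rng = random.Random(seed)
    length = max(1, round(rt60_s * sample_rate))
    decay = 3.0 * math.log(10.0) / rt60_s
    response = [rng.gauss(0.0, 1.0) * math.exp(-decay * k / sample_rate)
                for k in range(length)]
    response[0] = 1.0  # direct sound path
    return response


def reverberate(signal, impulse_response):
    """Full linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        if sample == 0.0:
            continue  # zero samples contribute nothing
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out
```

The direct loop is quadratic in the signal length and is meant only to show the principle; for real recordings an FFT-based convolution would be the practical choice.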