Speech Evaluation is an AI-enabled service from Excelsoft that provides subjective and objective feedback to language learners in computer-assisted language learning.

Use cases:

  • Evaluation of Spoken Responses: To score or evaluate spoken responses in various languages.
  • Language Learning: To help users practice a language remotely with real-time feedback.
  • Educational Games: To help users improve their English pronunciation and vocabulary through interactive games and stories.
  • Corporate Training: To enhance employee training programs, especially for those who need to communicate with customers or partners in different languages.

What's under the hood?

Speech Evaluation:

The speech evaluation service uses two AI models in the background: a speech-to-text model and a pronunciation assessment model.

a)  The speech-to-text model converts the spoken audio into text and compares it with the expected transcript.

b)  The pronunciation assessment model scores the accuracy and fluency of the speech based on factors such as pronunciation, rhythm, intonation, and stress. Pronunciation assessment supports two scenarios: Reading and Speaking.

  • Reading: This scenario is designed for scripted assessment. It requires the learner to read a given text. The reference text is provided in advance.
  • Speaking: This scenario is designed for unscripted assessment. It requires the learner to speak on a given topic. The reference text is not provided in advance.
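
As a rough illustration of the comparison step in the Reading scenario, the sketch below aligns a recognized transcript against a reference text at the word level using Python's standard `difflib`. It is not the service's actual implementation: the function names `tokenize` and `compare_transcripts`, the tokenization rule, and the result layout are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compare_transcripts(reference, recognized):
    """Align recognized words against reference words (hypothetical helper).

    Returns a dict listing matched, omitted, inserted, and substituted words.
    """
    ref = tokenize(reference)
    hyp = tokenize(recognized)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)

    result = {"matched": [], "omitted": [], "inserted": [], "substituted": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result["matched"].extend(ref[i1:i2])
        elif tag == "delete":
            # Words in the reference that the speaker skipped.
            result["omitted"].extend(ref[i1:i2])
        elif tag == "insert":
            # Extra words the speaker added.
            result["inserted"].extend(hyp[j1:j2])
        else:
            # "replace": reference words recognized as something else.
            result["substituted"].append((ref[i1:i2], hyp[j1:j2]))
    return result


if __name__ == "__main__":
    report = compare_transcripts(
        "The quick brown fox jumps",
        "the quick fox jumped high",
    )
    print(report)
```

In the Speaking scenario no reference text exists in advance, so a comparison like this would not apply as written.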

Pronunciation Assessment Results:

Once you have recorded your speech or uploaded a recording, the assessment result is displayed. The result includes your spoken audio and feedback on your speech. You can listen to your spoken audio and download it if necessary.

The results show four scores: the pronunciation score, accuracy score, fluency score, and completeness score.

  • Pronunciation score: This is the overall score that indicates the pronunciation quality of the speech. It is calculated by aggregating the accuracy, fluency, and completeness scores. The higher the score, the better the pronunciation.
  • Accuracy score: This score measures how accurately the speech matches the reference text in terms of phonetic sounds. It is calculated by comparing the speech and the reference text's phonemes and penalizing errors, such as omission, insertion, or mispronunciation. The higher the score, the more accurate the speech.
  • Fluency score: This score measures how fluently the speech flows in terms of pauses and breaks. It is calculated by analyzing the duration and frequency of silent breaks between words and comparing them to a native speaker's standard. The higher the score, the more fluent the speech.
  • Completeness score: This score measures how completely the speech covers the reference text in terms of words. It is calculated by counting the number of words pronounced in the speech and dividing it by the number of words in the reference text. The higher the score, the more complete the speech.
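
The completeness calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service's scoring code: the 0 to 100 scale, the tokenization rule, and the choice to count only reference words that appear in the recognized speech are assumptions. The aggregation in `pronunciation_score` is purely hypothetical, since the exact way the three scores are combined is not specified here; a weighted average is used only as a placeholder.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def completeness_score(reference, recognized):
    """Share of reference words found in the recognized speech, on an assumed 0-100 scale."""
    ref_words = tokenize(reference)
    if not ref_words:
        return 0.0
    remaining = Counter(tokenize(recognized))
    covered = 0
    for word in ref_words:
        # Each spoken word can cover at most one occurrence in the reference.
        if remaining[word] > 0:
            remaining[word] -= 1
            covered += 1
    return 100.0 * covered / len(ref_words)


def pronunciation_score(accuracy, fluency, completeness, weights=(1.0, 1.0, 1.0)):
    """Hypothetical aggregation: a weighted average of the three sub-scores."""
    scores = (accuracy, fluency, completeness)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    completeness = completeness_score(
        "The quick brown fox jumps",
        "the quick fox jumped high",
    )
    print(completeness)
    print(pronunciation_score(accuracy=90.0, fluency=80.0, completeness=100.0))
```

With equal weights every sub-score contributes the same amount; a real scorer may weight them differently or use a non-linear combination.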
