Of the various methods, the full reference (FR) method is the most widely used. It yields the most reliable results, but it requires a distortion-free reference file for comparison.
Full reference methods evaluate speech samples with algorithms that simulate how the human ear perceives audio. The reference file and the sample under test are both processed this way, then compared to determine the audible difference between them. That difference then passes through a cognitive model, which approximates the way the human brain would process such data. Lastly, an overall measure of voice quality is generated.
Over the years, several models for measuring the quality of voice over IP have been developed: PSQM (Perceptual Speech Quality Measure), recommended by the ITU from 1996 to 2001; PAMS (Perceptual Analysis Measurement System); and PESQ (Perceptual Evaluation of Speech Quality), the currently recommended model, which is an optimized combination of PAMS and PSQM.
The diagram below represents the full reference model.
At ip-label, the main method of assessment is PESQ. In essence, this method combines the psychoacoustic and cognitive model of PSQM with a time alignment algorithm.
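To illustrate what time alignment means in practice, here is a minimal sketch that estimates the delay between a reference signal and a degraded signal by brute-force cross-correlation, then trims or pads the degraded signal so the two line up. It is not the PESQ alignment procedure, which is considerably more elaborate; the function names `estimate_delay` and `align` and the `max_lag` parameter are hypothetical and chosen only for this example.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) at which `degraded` best matches `reference`.

    A positive lag means the degraded signal arrives late; a negative lag
    means it starts early (for example, its beginning was clipped).
    Illustrative only: a plain cross-correlation search, not PESQ itself.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += ref_sample * degraded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(reference, degraded, max_lag):
    """Shift `degraded` by the estimated delay and cut both to equal length."""
    lag = estimate_delay(reference, degraded, max_lag)
    if lag >= 0:
        shifted = list(degraded[lag:])                 # drop the leading delay
    else:
        shifted = [0.0] * (-lag) + list(degraded)      # pad the missing start
    n = min(len(reference), len(shifted))
    return list(reference[:n]), shifted[:n]


if __name__ == "__main__":
    ref = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
    late = [0, 0, 0] + ref                             # same signal, 3 samples late
    print(estimate_delay(ref, late, max_lag=5))
```

Once the two signals are aligned, a sample-by-sample (or frame-by-frame) comparison becomes meaningful, which is why alignment comes before the perceptual comparison.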
The PESQ algorithm is represented in the diagram below:
The algorithm outputs a mean opinion score (MOS) on a scale of 1 (bad) to 5 (excellent).
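As a small illustration of how such a score might be turned into a verbal rating, the sketch below maps a MOS value to the five category labels of the ITU-T P.800 absolute category rating scale. The text above names only the endpoints (bad, excellent); the intermediate labels come from P.800, and rounding a fractional score to the nearest category is an assumption made for this example, not a rule stated by the ITU. The name `mos_label` is hypothetical.

```python
# Category labels of the ITU-T P.800 listening-quality scale.
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mos_label(score):
    """Return the verbal category closest to a MOS value between 1 and 5.

    Assumption for illustration: fractional scores are rounded half up
    to the nearest whole category.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError("MOS must lie between 1 and 5")
    category = int(score + 0.5)
    return ACR_LABELS[category]


if __name__ == "__main__":
    for value in (1.0, 2.7, 3.4, 4.5):
        print(value, mos_label(value))
```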
The table below sets forth the scale defined by the ITU:
The PESQ method can retrieve the following secondary speech indicators as well:
These three indicators are expressed as a percentage with respect to the reference file.
The MOS can be calculated using:
The dashboard for a MOS test is shown below: