The Effect of Segment Selection on Acoustic Analysis

doi:10.1016/j.jvoice.2010.10.009

Journal of Voice

Volume 26, Issue 1, January 2012, Pages 1-7

https://doi.org/10.1016/j.jvoice.2010.10.009

Summary

Objective/Hypothesis

Acoustic analysis is a commonly used method for quantitatively measuring vocal fold function. Voice signals are analyzed by selecting a waveform segment and using various algorithms to arrive at parameters such as jitter, shimmer, and signal-to-noise ratio (SNR). Accurate and reliable methods for selecting a representative vowel segment have not been established.
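As an illustration of the perturbation parameters named above, the sketch below computes percent jitter and percent shimmer from a sequence of cycle periods and cycle peak amplitudes. It assumes the common "local" definition (mean absolute cycle-to-cycle difference divided by the mean, times 100); the summary does not state which formula the authors used, and the function names are hypothetical.

```python
import numpy as np


def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean.

    Assumed 'local' perturbation definition; not confirmed by the source.
    """
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two cycles")
    mean_value = values.mean()
    if mean_value == 0:
        raise ValueError("mean of the cycle values must be non-zero")
    return 100.0 * np.mean(np.abs(np.diff(values))) / mean_value


def percent_jitter(periods):
    """Percent jitter from consecutive pitch periods (any consistent unit)."""
    return percent_perturbation(periods)


def percent_shimmer(peak_amplitudes):
    """Percent shimmer from consecutive cycle peak amplitudes."""
    return percent_perturbation(peak_amplitudes)
```

A perfectly periodic sequence yields 0% under this definition, so lower values indicate a steadier segment.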

Study Design

Prospective repeated-measure experiment.

Methods

We applied a moving window method by isolating consecutive, overlapping segments of the raw voice signal from onset through offset. Ten normal voice signals were analyzed using acoustic measures calculated from the moving window. The location and value of minimum perturbation and maximum SNR were compared across individuals. The moving window method was compared with data from the whole vowel excluding onset and offset, the mid-vowel, and the visually selected steadiest portion of the voice signal.
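The moving window procedure can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 0.5-second window length is taken from the Conclusion, whereas the 0.025-second step, the function names, and the `measure` callback (any function that maps a segment to a single number, such as a jitter or SNR estimator) are assumptions.

```python
import numpy as np


def moving_window_scan(signal, sample_rate, measure, window_s=0.5, step_s=0.025):
    """Evaluate `measure` on consecutive, overlapping windows of `signal`.

    Returns (start_times_in_seconds, measure_values).
    The step size is an assumed value; the source does not state it.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(round(window_s * sample_rate))
    step = int(round(step_s * sample_rate))
    if win <= 0 or step <= 0 or win > signal.size:
        raise ValueError("window and step must be positive and fit the signal")
    starts = np.arange(0, signal.size - win + 1, step)
    values = np.array([measure(signal[s:s + win]) for s in starts])
    return starts / sample_rate, values


def optimal_window(start_times, values, maximize=False):
    """Pick the window with the minimum (perturbation) or maximum (SNR) value."""
    idx = int(np.argmax(values)) if maximize else int(np.argmin(values))
    return float(start_times[idx]), float(values[idx])
```

Passing `maximize=True` corresponds to locating the maximum SNR; the default locates the minimum of a perturbation measure.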

Results

Results showed that the steadiest portion of the waveforms, as defined by minimum perturbation and maximum SNR values, was not consistent across individuals. Perturbation and nonlinear dynamic values differed significantly based on what segment of the waveform was used. Other commonly used segment selection methods resulted in significantly higher perturbation values and significantly lower SNR values than those determined by the moving window method (P < 0.001).

Conclusions

The selection of a sample for acoustic analysis can introduce significant inconsistencies into the analysis procedure. The moving window technique may provide more accurate and reliable acoustic measures by objectively identifying the steadiest segment of the voice sample.

Introduction

The characteristics of the speech waveform have been used in research to promote a better understanding of normal and pathological voicing, and to evaluate treatment efficacy.1, 2 To provide valuable data on voicing, these acoustic measures must be sensitive to small changes in the speech waveform while generating consistent values for repeated measures.

Currently, acoustic analysis is performed by selecting a particular segment from each voice signal and analyzing the selected segment using defined acoustic algorithms. Titze (1995) suggested that only periodic or nearly periodic voice signals should be analyzed using acoustic measures.³ Therefore, the selection of a stable vowel segment is essential to analysis; however, there is no standard guidance on how to choose a voice segment.

Numerous methods exist to select voice segments. Human perception is commonly used to determine a steady segment of voice.4, 5, 6 This method introduces inconsistencies based on variability between judges; moreover, samples are generally selected for minimum amplitude variation with little or no regard for frequency variation. Other researchers use the midportion of vowels in their studies7, 8, 9; however, such segmentation does not consider changes in the sample stability over time. Feijoo and Hernandez (1990) used a 40-millisecond window, moved in 20-millisecond intervals, to determine the accurate pitch period for short-term perturbation measurement in normal and glottal cancer patients.¹⁰ Karnell (1991) selected samples by editing out the beginning and end of each phonation.¹¹ Several other authors have used similar methods to exclude the variability of onset and offset, which are generally associated with a rapidly changing fundamental frequency and amplitude, leading to increased and unreliable jitter and shimmer values.12, 13, 14 In many studies, the exact means of segment selection is unclear, and even where the procedures are well documented, they lead to results that are either subjective and irreproducible or unrepresentative of the waveform.

No voice is ever completely stationary. Dynamic changes in stability will always be found on examination with acoustic methods, and periodicity (as expressed by the perturbation values of the waveform) will fluctuate with time as well. Even normal voice samples produce a range of perturbation values.7, 15 Therefore, it can be expected that inconsistencies in the selection of a sample will produce different perturbation measurements depending on the exact location of the sample taken from the waveform. To limit these concerns, the point of minimum perturbation and maximum signal-to-noise ratio (SNR) can be identified objectively in the given voice signals and used to measure aperiodicity.

In this study, a moving window was used to identify the most stable portion (optimal values for the parameters of interest) of voice signals collected from 10 normal subjects. Perturbation parameters and the nonlinear dynamic measure of correlation dimension were calculated. The moving window was used to characterize the changing acoustic parameters within a single voice sample. Results of the moving window selection technique were compared with those obtained by selecting a fixed portion or visually selecting the stable portion of the voice signal.
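For comparison, the fixed-portion selection methods can be sketched as simple slicing operations. The trim duration and the mid-vowel length below are illustrative assumptions, as are the function names; the text available here does not give the values the authors used.

```python
import numpy as np


def whole_vowel_trimmed(signal, sample_rate, trim_s=0.1):
    """Whole vowel with onset and offset removed.

    trim_s is an assumed duration cut from each end.
    """
    signal = np.asarray(signal)
    n_trim = int(round(trim_s * sample_rate))
    if n_trim < 0 or 2 * n_trim >= signal.size:
        raise ValueError("trim duration leaves no samples")
    return signal[n_trim:signal.size - n_trim]


def mid_vowel(signal, sample_rate, length_s=0.5):
    """Segment of length_s seconds centered on the midpoint of the vowel.

    length_s is an assumed segment length.
    """
    signal = np.asarray(signal)
    n = int(round(length_s * sample_rate))
    if n <= 0 or n > signal.size:
        raise ValueError("segment length must be positive and fit the signal")
    start = (signal.size - n) // 2
    return signal[start:start + n]
```

Both functions pick their segment from position alone, which is why they cannot follow changes in stability over the course of the phonation.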

Participants

Five healthy males, aged 21–23 years (mean, 21.8), and five healthy females, aged 20–22 years (mean, 20.8), participated in the study. Subject participation was approved by the Institutional Review Board of the University of Wisconsin-Madison. All participants were nonsmoking native speakers of American English. They reported normal hearing ability, no laryngeal or airway infection, and good general health. The subjects were judged to present normal voice and language skills as determined by a

Results

The mean, standard deviation, and range of the perturbation parameters and D₂, as calculated by the moving window, are shown in Table 1. As shown in Figure 2, these parameters varied dramatically during the vowel phonation. The locations of the optimal value for each parameter varied across individuals. The minimum percent jitter values were located in windows with an initial time point between 0.275 and 2.05 seconds. Six individuals had minimum percent jitters in windows starting in the first

Discussion

To determine the effects of segment selection on acoustic measurements, methods for segment selection that have been used in previous acoustic literature were compared. The whole vowel, mid-vowel, and visually selected vowel segments generated higher perturbation and D₂ measures. SNR values from visually selected segments were not significantly different from those determined using the moving window method; however, whole vowel and mid-vowel segments tended to have lower SNR values.
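The correlation dimension D₂ is commonly estimated from the correlation sum of Grassberger and Procaccia (1983): embed the signal with time delays, count the fraction of point pairs closer than a radius r, and take the slope of log C(r) against log r. The sketch below follows that general recipe under stated assumptions; the embedding dimension, delay, radii, and function names are illustrative and are not taken from the study.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * tau
    if dim < 1 or tau < 1 or n < 2:
        raise ValueError("signal too short for this embedding")
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])


def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(points.shape[0], k=1)  # each pair once, no self-pairs
    return float(np.mean(dists[upper] < r))


def estimate_d2(x, dim, tau, radii):
    """Slope of log C(r) versus log r over the supplied radii (assumed scaling range)."""
    radii = np.asarray(radii, dtype=float)
    points = delay_embed(x, dim, tau)
    sums = np.array([correlation_sum(points, r) for r in radii])
    usable = sums > 0
    if usable.sum() < 2:
        raise ValueError("not enough radii with non-empty neighborhoods")
    slope, _ = np.polyfit(np.log(radii[usable]), np.log(sums[usable]), 1)
    return float(slope)
```

The pairwise distance matrix grows with the square of the number of embedded points, so this direct form is only practical for short segments. A strictly periodic signal traces a closed curve in the embedded space and should give a slope near 1, with larger values indicating more complex dynamics.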

The impact of

Conclusion

In this study, we applied a moving window to select the steadiest 0.5-second segment from the normal voice signals. The moving window provides evidence of the effects of voice sample segments on acoustic analysis by demonstrating variability of the parameters and their minimums both within and between normal voices. Significant differences were observed for percent jitter, percent shimmer, and D₂ when comparing the moving window selected segment to traditionally used segment selection methods.

Acknowledgments

This study was supported by grant R01DC05522 from the National Institute on Deafness and Other Communication Disorders.

References (26)

J. Laver et al. An acoustic screening system for detection of laryngeal pathology. J Phon (1986)
H. Kasuya et al. An acoustic analysis of pathological voice and its application to the evaluation of laryngeal pathology. Speech Commun (1986)
J.K. MacCallum et al. Effects of low-pass filtering on acoustic analysis of voice. J Voice (2011)
R.C. Scherer et al. Preliminary evaluation of selected acoustic and glottographic measures for clinical phonatory function analyses. J Voice (1988)
L.E. Glaze et al. Acoustic analysis of vowel and loudness differences in children’s voice. J Voice (1990)
P. Yu et al. Objective voice analysis for dysphonic patients: a multiparametric protocol including acoustic and aerodynamic measurements. J Voice (2001)
C.T. Ferrand. Effects of practice with and without knowledge of results on jitter and shimmer levels in normally speaking women. J Voice (1995)
Y. Zhang et al. Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J Voice (2005)
Y. Zhang et al. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice (2008)
P. Grassberger et al. Measuring the strangeness of strange attractors. Physica D (1983)
S.E. Linville et al. Intraproduction variability in jitter measures from elderly speakers. J Voice (1990)
I.R. Titze. Workshop on Acoustic Voice Analysis: Summary Statement (1995)
J.J. Jiang et al. Objective acoustic analysis of pathological voices from patients with vocal nodules and polyps. Folia Phoniatr Logop (2009)
