The Effect of Segment Selection on Acoustic Analysis
Introduction
The characteristics of the speech waveform have been used in research to promote a better understanding of normal and pathological voicing, and to evaluate treatment efficacy.1, 2 To provide valuable data on voicing, these acoustic measures must be sensitive to small changes in the speech waveform while generating consistent values for repeated measures.
Currently, acoustic analysis is performed by selecting a particular segment from each voice signal and analyzing that segment with defined acoustic algorithms. Titze (1995) suggested that only periodic or nearly periodic voice signals should be analyzed using acoustic measures.3 The selection of a stable vowel segment is therefore essential to analysis; however, there is no standard guidance on how to choose a voice segment.
Numerous methods exist to select voice segments. Human perception is commonly used to determine a steady segment of voice.4, 5, 6 This method introduces inconsistencies due to variability between judges; moreover, samples are generally selected for minimal amplitude variation, with little or no regard for frequency variation. Other researchers use the midportion of vowels in their studies7, 8, 9; however, such segmentation does not account for changes in sample stability over time. Feijoo and Hernandez (1990) used a 40-millisecond window moved in 20-millisecond steps to determine the pitch period accurately for short-term perturbation measurement in normal speakers and patients with glottal cancer.10 Karnell (1991) selected samples by editing out the beginning and end of each phonation.11 Several other authors have used similar methods to exclude the variability of voice onset and offset, which are generally associated with rapidly changing fundamental frequency and amplitude and therefore with inflated, unreliable jitter and shimmer values.12, 13, 14 In many studies, the exact means of segment selection is unclear, and even where the procedures are well documented, they yield results that are either subjective and irreproducible or unrepresentative of the waveform.
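The jitter and shimmer values referred to here are often computed with "local" definitions: the mean absolute difference between consecutive cycle periods (or cycle peak amplitudes), divided by the mean period (or amplitude) and multiplied by 100. The sketch below assumes these definitions and assumes that cycle periods and amplitudes have already been extracted from the waveform; the excerpt does not state the exact perturbation algorithm used in this study, and the function names and data are hypothetical.

```python
from statistics import fmean


def percent_perturbation(values: list[float]) -> float:
    """Mean absolute difference between consecutive values, as a percentage of their mean."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * fmean(diffs) / fmean(values)


def percent_jitter(periods_s: list[float]) -> float:
    """Local jitter (%) from consecutive glottal cycle periods, in seconds."""
    return percent_perturbation(periods_s)


def percent_shimmer(peak_amplitudes: list[float]) -> float:
    """Local shimmer (%) from consecutive cycle peak amplitudes."""
    return percent_perturbation(peak_amplitudes)


# Hypothetical cycle data for a short stretch of a sustained vowel
periods = [0.0100, 0.0102, 0.0100, 0.0102]
amplitudes = [1.0, 0.9, 1.0, 0.9]
print(f"jitter = {percent_jitter(periods):.2f}%, shimmer = {percent_shimmer(amplitudes):.2f}%")
```

Because the measure depends only on cycle-to-cycle differences, the rapid period and amplitude changes at onset and offset raise it directly, which is why those regions are usually excluded.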
No voice is ever completely stationary. Examination with acoustic methods will always reveal dynamic changes in stability, and periodicity (as expressed by the perturbation values of the waveform) will fluctuate with time as well. Even normal voice samples produce a range of perturbation values.7, 15 Therefore, inconsistencies in sample selection can be expected to produce different perturbation measurements depending on exactly where in the waveform the sample is taken. To limit these concerns, the point of minimum perturbation and maximum signal-to-noise ratio (SNR) can be identified objectively in a given voice signal and used to measure aperiodicity.
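The excerpt does not say how SNR was computed. One possibility, shown here purely as an assumed illustration, is a cycle-averaging estimate in the spirit of the harmonics-to-noise ratio of Yumoto et al. (1982): the mean of the time-aligned cycles is treated as the signal, and each cycle's deviation from that mean as noise. The function name is hypothetical, and it expects cycles that have already been segmented and resampled to a common length.

```python
import numpy as np


def cycle_average_snr_db(cycles) -> float:
    """SNR (dB) of time-aligned, equal-length cycles (one cycle per row).

    Signal energy: number of cycles x energy of the mean cycle.
    Noise energy: summed squared deviation of every cycle from the mean cycle.
    """
    c = np.asarray(cycles, dtype=float)
    if c.ndim != 2 or c.shape[0] < 2:
        raise ValueError("expected a 2-D array with at least two cycles")
    template = c.mean(axis=0)
    signal_energy = c.shape[0] * np.sum(template ** 2)
    noise_energy = np.sum((c - template) ** 2)
    if noise_energy == 0:
        return float("inf")  # identical cycles: no measurable noise
    return float(10 * np.log10(signal_energy / noise_energy))


# Hypothetical example: two cycles whose amplitudes differ by +/-10% around a sine template
base = np.sin(2 * np.pi * np.arange(100) / 100)
print(round(cycle_average_snr_db([1.1 * base, 0.9 * base]), 2))
```

Under this definition a window with steadier cycles has a smaller residual and therefore a higher SNR, so "maximum SNR" and "minimum perturbation" point toward the same kind of segment.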
In this study, a moving window was used to identify the most stable portion (optimal values for the parameters of interest) of voice signals collected from 10 normal subjects. Perturbation parameters and the nonlinear dynamic measure of correlation dimension (D2) were calculated. The moving window was used to characterize the changing acoustic parameters within a single voice sample. Results of the moving window selection technique were compared with those obtained by selecting a fixed portion or visually selecting the stable portion of the voice signal.
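Correlation dimension is commonly estimated with the Grassberger–Procaccia approach ("Measuring the strangeness of strange attractors", Physica D, 1983): the signal is embedded in an m-dimensional space using time delays, the correlation sum C(r) counts the fraction of point pairs closer than r, and D2 is the slope of log C(r) against log r over a scaling region. The sketch below is a bare-bones version under assumed parameters; the embedding dimension, delay, and radii are hypothetical choices, refinements such as excluding temporally adjacent pairs are omitted, and the study's own settings are not given in this excerpt.

```python
import numpy as np


def delay_embed(x, m: int, tau: int) -> np.ndarray:
    """Time-delay embedding: row k is (x[k], x[k+tau], ..., x[k+(m-1)tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n < 2:
        raise ValueError("signal too short for this embedding")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])


def correlation_sum(points: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """C(r): fraction of distinct point pairs whose Euclidean distance is below r."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(points), k=1)]
    return np.array([(pair_dist < r).mean() for r in radii])


def estimate_d2(x, m: int, tau: int, radii: np.ndarray) -> float:
    """Slope of log C(r) versus log r over the supplied (assumed) scaling region."""
    c = correlation_sum(delay_embed(x, m, tau), radii)
    ok = c > 0  # log is undefined where no pairs fall within r
    if ok.sum() < 2:
        raise ValueError("need at least two radii with nonzero C(r)")
    slope, _ = np.polyfit(np.log(radii[ok]), np.log(c[ok]), 1)
    return float(slope)


# A pure sinusoid traces a closed loop in the embedding space, so D2 should be close to 1.
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 97.3)
print(round(estimate_d2(signal, m=2, tau=25, radii=np.logspace(-1.7, -0.7, 6)), 2))
```

Irregular voices fill more of the embedding space than a periodic one, which raises the slope; this is why a less stable segment tends to yield a higher D2.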
Participants
Five healthy males aged 21–23 years (mean, 21.8) and five healthy females aged 20–22 years (mean, 20.8) participated in the study. Subject participation was approved by the Institutional Review Board of the University of Wisconsin–Madison. All participants were nonsmoking native speakers of American English. They reported normal hearing ability, no laryngeal or airway infection, and good general health. The subjects were judged to present normal voice and language skills as determined by a
Results
The mean, standard deviation, and range of the perturbation parameters and D2, as calculated by the moving window, are shown in Table 1. As shown in Figure 2, these parameters varied dramatically during vowel phonation. The locations of the optimal value for each parameter varied across individuals. The minimum percent jitter values were located in windows with initial time points between 0.275 and 2.05 seconds. Six individuals had minimum percent jitters in windows starting in the first
Discussion
To determine the effects of segment selection on acoustic measurements, segment selection methods used in previous acoustic literature were compared with the moving window method. The whole-vowel, mid-vowel, and visually selected segments generated higher perturbation and D2 measures than the segments selected by the moving window. SNR values from visually selected segments were not significantly different from those determined using the moving window method; however, whole-vowel and mid-vowel segments tended to have lower SNR values.
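The difference between the fixed-location methods and the moving window comes down to where the segment boundaries fall. In the sketch below, `metric` is a hypothetical caller-supplied stability score for a candidate segment (lower is steadier, so SNR would be negated); the 0.5-second segment length matches the Conclusion, while the 25-millisecond step, the 10 kHz sampling rate, and the function name are assumptions. Visual selection is omitted because it depends on a human judge.

```python
from typing import Callable


def segment_bounds(n_samples: int, fs: float, metric: Callable[[int, int], float],
                   seg_s: float = 0.5, hop_s: float = 0.025) -> dict[str, tuple[int, int]]:
    """Sample bounds (start, stop) chosen by whole-vowel, mid-vowel, and moving-window selection.

    metric(start, stop) scores a candidate segment; lower values mean a steadier segment.
    """
    seg = int(round(seg_s * fs))
    hop = int(round(hop_s * fs))
    if seg > n_samples:
        raise ValueError("vowel is shorter than the analysis segment")
    mid_start = (n_samples - seg) // 2
    starts = range(0, n_samples - seg + 1, hop)
    best = min(starts, key=lambda s: metric(s, s + seg))  # earliest start wins ties
    return {
        "whole": (0, n_samples),
        "mid": (mid_start, mid_start + seg),
        "moving": (best, best + seg),
    }


# Hypothetical 3-second vowel at 10 kHz whose steadiest stretch begins at 0.4 s
bounds = segment_bounds(30_000, 10_000, metric=lambda a, b: abs(a - 4_000))
print(bounds)
```

The whole-vowel and mid-vowel bounds ignore the signal entirely, so they coincide with the steadiest stretch only by chance.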
The impact of
Conclusion
In this study, we applied a moving window to select the steadiest 0.5-second segment from the normal voice signals. The moving window provides evidence of the effects of voice sample segments on acoustic analysis by demonstrating variability of the parameters and their minimums both within and between normal voices. Significant differences were observed for percent jitter, percent shimmer, and D2 when comparing the moving window selected segment to traditionally used segment selection methods.
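Selecting the steadiest 0.5-second segment can be sketched as a search over window start times. This minimal illustration uses percent jitter (local definition) as the stability criterion and assumes that glottal cycle onsets and periods have already been extracted and that a cycle counts toward a window only if it lies entirely inside it; the 25-millisecond step, the synthetic data, and the function names are hypothetical. The same search would minimize shimmer or D2, or maximize SNR.

```python
from itertools import accumulate
from statistics import fmean


def percent_jitter(periods: list[float]) -> float:
    """Local jitter (%): mean absolute difference of consecutive periods over mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * fmean(diffs) / fmean(periods)


def min_jitter_window(onsets, periods, win_s=0.5, step_s=0.025):
    """Slide a win_s-second window in step_s increments and return (start time, jitter %)
    of the window with the lowest jitter; ties keep the earliest window, None if none fit."""
    vowel_end = onsets[-1] + periods[-1]
    best = None
    k = 0
    while k * step_s + win_s <= vowel_end:
        t0 = k * step_s  # computed from k to avoid accumulating rounding error
        inside = [p for on, p in zip(onsets, periods) if on >= t0 and on + p <= t0 + win_s]
        if len(inside) >= 2:
            j = percent_jitter(inside)
            if best is None or j < best[1]:
                best = (round(t0, 6), j)
        k += 1
    return best


# Hypothetical vowel: jittery onset, a perfectly steady middle stretch, jittery offset
jittery = [0.0100 if i % 2 == 0 else 0.0104 for i in range(100)]
periods = jittery + [0.0102] * 100 + jittery
onsets = [0.0] + list(accumulate(periods))[:-1]
print(min_jitter_window(onsets, periods))
```

The reported start time is the first window lying wholly within the steady stretch, mirroring how the study located each subject's minimum-jitter window at a different point in the vowel.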
Acknowledgments
This study was supported by grant R01DC05522 from the National Institute on Deafness and Other Communication Disorders.
References
- et al. An acoustic screening system for detection of laryngeal pathology. J Phon (1986).
- et al. An acoustic analysis of pathological voice and its application to the evaluation of laryngeal pathology. Speech Commun (1986).
- et al. Effects of low-pass filtering on acoustic analysis of voice. J Voice (2011).
- et al. Preliminary evaluation of selected acoustic and glottographic measures for clinical phonatory function analyses. J Voice (1988).
- et al. Acoustic analysis of vowel and loudness differences in children’s voice. J Voice (1990).
- et al. Objective voice analysis for dysphonic patients: a multiparametric protocol including acoustic and aerodynamic measurements. J Voice (2001).
- Effects of practice with and without knowledge of results on jitter and shimmer levels in normally speaking women. J Voice (1995).
- et al. Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J Voice (2005).
- et al. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice (2008).
- et al. Measuring the strangeness of strange attractors. Physica D (1983).
- Intraproduction variability in jitter measures from elderly speakers. J Voice.
- Workshop on Acoustic Voice Analysis: Summary Statement.
- Objective acoustic analysis of pathological voices from patients with vocal nodules and polyps. Folia Phoniatr Logop.