Elsevier

Journal of Voice

Volume 26, Issue 1, January 2012, Pages 1-7

The Effect of Segment Selection on Acoustic Analysis

https://doi.org/10.1016/j.jvoice.2010.10.009

Summary

Objective/Hypothesis

Acoustic analysis is a commonly used method for quantitatively measuring vocal fold function. Voice signals are analyzed by selecting a waveform segment and using various algorithms to arrive at parameters such as jitter, shimmer, and signal-to-noise ratio (SNR). Accurate and reliable methods for selecting a representative vowel segment have not been established.

Study Design

Prospective repeated-measure experiment.

Methods

We applied a moving window method by isolating consecutive, overlapping segments of the raw voice signal from onset through offset. Ten normal voice signals were analyzed using acoustic measures calculated from the moving window. The location and value of minimum perturbation and maximum SNR were compared across individuals. The moving window method was compared with data from the whole vowel excluding onset and offset, the mid-vowel, and the visually selected steadiest portion of the voice signal.

Results

Results showed that the steadiest portion of the waveforms, as defined by minimum perturbation and maximum SNR values, was not consistent across individuals. Perturbation and nonlinear dynamic values differed significantly based on what segment of the waveform was used. Other commonly used segment selection methods resulted in significantly higher perturbation values and significantly lower SNR values than those determined by the moving window method (P < 0.001).

Conclusions

The selection of a sample for acoustic analysis can introduce significant inconsistencies into the analysis procedure. The moving window technique may provide more accurate and reliable acoustic measures by objectively identifying the steadiest segment of the voice sample.

Introduction

The characteristics of the speech waveform have been used in research to promote a better understanding of normal and pathological voicing, and to evaluate treatment efficacy.1, 2 To provide valuable data on voicing, these acoustic measures must be sensitive to small changes in the speech waveform while generating consistent values for repeated measures.

Currently, acoustic analysis is performed by selecting a particular segment from each voice signal and analyzing the selected segment using defined acoustic algorithms. Titze (1995) suggested that only periodic or nearly periodic voice signals should be analyzed using acoustic measures.3 Therefore, the selection of a stable vowel segment is essential to analysis; however, there is no standard guidance on how to choose a voice segment.
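As an illustration of what such algorithms compute, the sketch below uses one common definition of local perturbation: the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. This is an assumption made for illustration; the text above does not state which jitter and shimmer formulas were used, and the cycle values are hypothetical.

```python
from statistics import mean


def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value.

    Applied to cycle durations this gives a percent jitter; applied to
    per-cycle peak amplitudes it gives a percent shimmer (assumed definitions).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical cycle-by-cycle measurements, for illustration only.
periods_ms = [5.00, 5.02, 4.98, 5.01, 4.99]       # glottal cycle durations
peak_amplitudes = [1.00, 0.98, 1.01, 0.99, 1.02]  # per-cycle peak amplitudes

jitter = percent_perturbation(periods_ms)
shimmer = percent_perturbation(peak_amplitudes)
print(f"percent jitter:  {jitter:.2f}")
print(f"percent shimmer: {shimmer:.2f}")
```

Both measures depend only on the cycles inside the analyzed segment, which is why the choice of segment can change the reported value.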

Numerous methods exist to select voice segments. Human perception is commonly used to determine a steady segment of voice.4, 5, 6 This method introduces inconsistencies based on variability between judges; moreover, samples are generally selected for minimal amplitude variation, with little or no regard for frequency variation. Other researchers use the midportion of vowels in their studies7, 8, 9; however, such segmentation does not consider changes in sample stability over time. Feijoo and Hernandez (1990) used a 40-millisecond window, moved in 20-millisecond intervals, to determine the accurate pitch period for short-term perturbation measurement in normal subjects and glottal cancer patients.10 Karnell (1991) selected samples by editing out the beginning and end of each phonation.11 Several other authors have used similar methods to exclude the variability of onset and offset, which are generally associated with a rapidly changing fundamental frequency and amplitude, leading to increased and unreliable jitter and shimmer values.12, 13, 14 In many studies, the exact means of segment selection is unclear, and even where the procedures are well documented, they lead to results that are either subjective and irreproducible or unrepresentative of the waveform.
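The overlapping-window scheme attributed to Feijoo and Hernandez above (a 40-millisecond window advanced in 20-millisecond steps) can be sketched as a list of sample index ranges. The function name, the 10 kHz sampling rate, and the signal length are hypothetical choices for illustration, not values taken from the cited work.

```python
def window_bounds(n_samples, sample_rate, window_s, step_s):
    """Return (start, end) sample indices of overlapping analysis windows.

    Only windows that fit entirely inside the signal are returned.
    """
    win = round(window_s * sample_rate)
    hop = round(step_s * sample_rate)
    if win < 1 or hop < 1:
        raise ValueError("window and step must each span at least one sample")
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, hop)]


# Hypothetical 0.1-second signal sampled at 10 kHz (1000 samples).
bounds = window_bounds(n_samples=1000, sample_rate=10_000,
                       window_s=0.040, step_s=0.020)
for start, end in bounds:
    print(start, end)
```

With a step of half the window length, each window shares half of its samples with the next one.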

No voice is ever completely stationary. Dynamic changes in stability will always be found on examination with acoustic methods, and periodicity (as expressed by the perturbation values of the waveform) will fluctuate with time as well. Even normal voice samples produce a range of perturbation values.7, 15 Therefore, it can be expected that inconsistencies in the selection of a sample will produce different perturbation measurements depending on the exact location of the sample taken from the waveform. To limit these concerns, the point of minimum perturbation and maximum signal-to-noise ratio (SNR) can be identified objectively in the given voice signals and used to measure aperiodicity.

In this study, a moving window was used to identify the most stable portion (optimal values for the parameters of interest) of voice signals collected from 10 normal subjects. Perturbation parameters and the nonlinear dynamic measure of correlation dimension were calculated. The moving window was used to characterize the changing acoustic parameters within a single voice sample. Results of the moving window selection technique were compared with those obtained by selecting a fixed portion or visually selecting the stable portion of the voice signal.
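The idea of a moving window that locates the most stable portion can be sketched as follows. This is a simplified illustration, not the authors' implementation: it slides over a list of cycle durations rather than the raw signal, it uses an assumed percent jitter definition, and the window length, step, and data are hypothetical.

```python
from statistics import mean


def percent_jitter(periods):
    """Assumed definition: mean absolute period difference over mean period, in percent."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * mean(diffs) / mean(periods)


def moving_window_minimum(periods, window_cycles, step_cycles=1,
                          measure=percent_jitter):
    """Slide a window over consecutive cycles and return (start_index, value)
    for the window with the lowest value of `measure`."""
    if window_cycles < 2 or step_cycles < 1:
        raise ValueError("window needs >= 2 cycles and step >= 1 cycle")
    best = None
    for start in range(0, len(periods) - window_cycles + 1, step_cycles):
        value = measure(periods[start:start + window_cycles])
        if best is None or value < best[1]:
            best = (start, value)
    if best is None:
        raise ValueError("signal is shorter than one window")
    return best


# Hypothetical cycle durations (ms): unsteady onset, steadier middle, unsteady offset.
periods_ms = [5.0, 5.4, 4.7, 5.3, 5.0, 5.01, 4.99, 5.0, 5.0, 5.2, 4.8]

start, value = moving_window_minimum(periods_ms, window_cycles=4)
print(f"steadiest window starts at cycle {start}, percent jitter {value:.2f}")
```

For a measure such as SNR, where the optimum is a maximum rather than a minimum, the same loop could be reused by passing a `measure` that returns the negated value.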

Participants

Five healthy males aged 21–23 years (mean, 21.8) and five healthy females aged 20–22 years (mean, 20.8) participated in the study. Subject participation was approved by the Institutional Review Board of the University of Wisconsin-Madison. All participants were nonsmoking native speakers of American English. They reported normal hearing ability, no laryngeal or airway infection, and good general health. The subjects were judged to present normal voice and language skills as determined by a

Results

The mean, standard deviation, and range of the perturbation parameters and D2, as calculated by the moving window, are shown in Table 1. As shown in Figure 2, these parameters varied dramatically during the vowel phonation. The locations of the optimal value for each parameter varied across individuals. The minimum percent jitter values were located in windows with an initial time point between 0.275 and 2.05 seconds. Six individuals had minimum percent jitter values in windows starting in the first

Discussion

To determine the effects of segment selection on acoustic measurements, methods for segment selection that have been used in previous acoustic literature were compared. The whole vowel, mid-vowel, and visually selected vowel segments generated higher perturbation and D2 measures. SNR values from visually selected segments were not significantly different from those determined using the moving window method; however, whole vowel and mid-vowel segments tended to have lower SNR values.
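Two of the fixed selection methods compared here, the whole vowel with onset and offset removed and the mid-vowel, can be sketched as simple slicing operations. The function names, the trim duration, and the mid-vowel length below are hypothetical; the text above does not specify the values used in the study.

```python
def trim_onset_offset(samples, sample_rate, trim_s):
    """Whole vowel excluding onset and offset: drop `trim_s` seconds from each end."""
    n = round(trim_s * sample_rate)
    if n < 0 or 2 * n >= len(samples):
        raise ValueError("trim leaves no samples")
    return samples[n:len(samples) - n]


def mid_vowel(samples, sample_rate, length_s):
    """Mid-vowel: a segment of `length_s` seconds centered in the signal."""
    n = round(length_s * sample_rate)
    if n < 1 or n > len(samples):
        raise ValueError("segment length does not fit the signal")
    start = (len(samples) - n) // 2
    return samples[start:start + n]


# Hypothetical 1-second signal at 1 kHz; sample values are just their indices.
signal = list(range(1000))

whole = trim_onset_offset(signal, sample_rate=1000, trim_s=0.1)
middle = mid_vowel(signal, sample_rate=1000, length_s=0.5)
print(whole[0], whole[-1], len(whole))
print(middle[0], middle[-1], len(middle))
```

Both selectors pick a location without looking at the signal itself, which is the property the moving window method is meant to address.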

The impact of

Conclusion

In this study, we applied a moving window to select the steadiest 0.5-second segment from the normal voice signals. The moving window provides evidence of the effects of voice sample segments on acoustic analysis by demonstrating variability of the parameters and their minimums both within and between normal voices. Significant differences were observed for percent jitter, percent shimmer, and D2 when comparing the moving window selected segment to traditionally used segment selection methods.

Acknowledgments

This study was supported by grant R01DC05522 from the National Institute on Deafness and Other Communication Disorders.

References (26)

  • S.E. Linville et al. Intraproduction variability in jitter measures from elderly speakers. J Voice (1990)
  • I.R. Titze. Workshop on Acoustic Voice Analysis: Summary Statement (1995)
  • J.J. Jiang et al. Objective acoustic analysis of pathological voices from patients with vocal nodules and polyps. Folia Phoniatr Logop (2009)