Articles From William F. Katz
Cheat Sheet / Updated 03-14-2022
Phonetics is the scientific study of speech sounds. Phoneticians are interested in how people produce and understand speech sounds. Using symbols from the International Phonetic Alphabet (IPA), phoneticians transcribe the sounds of any language in the world. Here are some important phonetic terms to help you, all described in plain English.
Article / Updated 01-12-2017
To make sure the information in Phonetics For Dummies is technically correct and as clear as possible, the author reviewed the title again after publication. The errata document clarifies some points and corrects errors that appeared in the first printing, despite the best efforts of the author and publisher.
Article / Updated 03-26-2016
You make consonants by completely or partially blocking airflow during speech. You can do this in different ways: you can completely block airflow, push air through a groove or slit to make a hissing sound, block air and then make a hiss, or bring the speech articulators (the organs of speech) close together to shape sound. The result is different manners of articulation (different ways of making a sound). You need to be able to label all these processes in order to work with speech in a clinical or educational setting. Here are some key terms for consonant manner of articulation:
Affricate: A stop followed by a fricative with the same place of articulation, such as /ʧ/ as in "chip" and /ʤ/ as in "germ."
Approximant: A sound made by bringing articulators together to shape airflow, without blocking the air or causing hissing. Examples include the first sounds of "read," "weed," "lead," and "you."
Flap: A rapidly made stop consonant, usually voiced, such as the "t" in "Betty" as pronounced in American English.
Fricative: A hissy consonant, made by producing friction in the airstream, such as in "fat," "vat," "thick," "this," "sip," "zip," "ship," and "leisure."
Glide: A subgroup of the approximants, also called semivowels, including /j/ as in "you" and /w/ as in "we."
Lateral: A sound made by directing airflow around the sides of the tongue, such as /l/ in "listen."
Liquid: The other two English approximants (besides the glides), /l/ and /ɹ/.
Nasal: A sound produced with airflow escaping through the nasal passage, such as in "meat," "neat," and "sing."
Stop: Also known as a plosive; a sound made with complete closure of the oral cavity.
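The manner classes above can be stored as a small lookup table. This is a minimal sketch, assuming a broad IPA transcription of General American English consonants; the table layout and the helper names `manners` and `consonants_with` are illustrative, not from the book. Because glides and liquids are subgroups of the approximants, those consonants carry more than one label.

```python
from typing import Dict, FrozenSet, List

# Manner-of-articulation labels for English consonants (broad IPA).
# Assumption: General American English; ASCII "g" stands in for IPA "ɡ".
MANNER: Dict[str, FrozenSet[str]] = {
    **{c: frozenset({"stop"}) for c in ("p", "b", "t", "d", "k", "g")},
    **{c: frozenset({"fricative"}) for c in ("f", "v", "θ", "ð", "s", "z", "ʃ", "ʒ", "h")},
    **{c: frozenset({"affricate"}) for c in ("ʧ", "ʤ")},
    **{c: frozenset({"nasal"}) for c in ("m", "n", "ŋ")},
    # Glides (semivowels) and liquids are both subgroups of the approximants.
    "j": frozenset({"approximant", "glide"}),
    "w": frozenset({"approximant", "glide"}),
    "ɹ": frozenset({"approximant", "liquid"}),
    "l": frozenset({"approximant", "liquid", "lateral"}),
    # The flap is an allophone of /t/ and /d/ (as in American "Betty").
    "ɾ": frozenset({"flap"}),
}


def manners(symbol: str) -> FrozenSet[str]:
    """Return the manner label(s) for one IPA consonant symbol."""
    if symbol not in MANNER:
        raise ValueError(f"{symbol!r} is not in this English consonant table")
    return MANNER[symbol]


def consonants_with(manner: str) -> List[str]:
    """List every consonant in the table that carries the given manner label."""
    return sorted(c for c, labels in MANNER.items() if manner in labels)


print(sorted(manners("l")))      # ['approximant', 'lateral', 'liquid']
print(consonants_with("glide"))  # ['j', 'w']
```

Storing sets of labels rather than a single label keeps the "subgroup" relationships from the glossary intact, so a query for approximants still returns the glides and liquids.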
Article / Updated 03-26-2016
Phoneticians use the International Phonetic Alphabet (IPA) to distinguish sound substitutions (one speech sound switched for another) from distortion (slurring or mistiming) errors. This information is helpful to pinpoint the level at which a patient is making speech errors. The following terms can assist you in working with individuals with speech and language problems:
Aphasia: A language disorder in adults resulting from brain injury or disease in which speaking and listening may be affected. Broca’s aphasia and Wernicke’s aphasia are two common types.
Apraxia of Speech (AOS): Also known as verbal apraxia, a condition following brain injury or disease in which adults have effortful, dysfluent speech marked by many speech errors. Generally considered a problem with planning and executing speech motor actions (such as putting the lips together and getting the vocal folds ready to make the voiced bilabial stop, /b/).
Dysarthria: A group of speech disorders caused by a disturbance in neuromotor control, resulting in distorted speech. It affects the clarity of speech and the effectiveness of spoken communication.
ExtIPA: An extended set of IPA symbols designed for disordered speech. Some symbols, especially diacritics, can also be used for the speech of healthy talkers.
Phonemic misperception: A listening problem that occurs when an individual with a communication disorder tries to say a certain speech target but instead makes an improperly timed or poorly coordinated production. As a result, you (the listener) don’t know into which perceptual sound category the production should fall.
Sound implementation error: Also known as a phonetic error; difficulty producing a selected speech sound. Most frequently associated with Broca’s aphasia and AOS.
Sound selection error: Also known as a phonemic error; difficulty selecting the correct speech sound for speech output. Associated with fluent-type (Wernicke’s) aphasia.
VoQS: A set of voice quality symbols useful for describing world languages and also speech pathology conditions, including electrolarynx speech, harsh voice, and so forth.
Article / Updated 03-26-2016
A source-filter system produces human speech. Speech begins with a breathy source: airflow from the lungs produces sound through vibration and hissiness at the larynx (also called your voice box) in your throat. You then shape this sound through a filter, the passageways of the mouth and nasal cavity (nose). As you move your tongue to different areas of your mouth, you create different tube-like vocal tract shapes, and these shapes result in different sounds. Here are some important terms related to speech anatomy:
Alveolar ridge: A bony ridge on the roof of your mouth about a half-inch behind your upper teeth.
Glottis: The opening (or space) between the vocal folds in your throat.
Larynx: Also called the voice box; the structure of cartilage in your throat that holds your vocal folds. Its front projection forms the Adam's apple.
Lips: Important for forming consonants such as in "pat," "bat," "mat," "fat," "vat," and "wet." They're also protruded for some vowels.
Palate: The roof of the mouth, divided into the hard palate (front) and the soft palate (back).
Pharynx: A tube that connects the larynx to the oral cavity (mouth), located at the far back of your throat.
Teeth: Used to make dental sounds such as /θ/ in "teeth" and /ð/ in "those."
Tongue: The most important organ of speech production. A large muscle capable of amazing shape changes, used for speech and feeding.
Uvula: A dangling piece of tissue at the very end of the soft palate (the velum) that can act as a place of articulation for consonants in many languages.
Velum: Another name for the soft palate, the back part of the roof of the mouth that isn't supported by bone.
Vocal folds: Also known as the vocal cords, two small flaps of muscle (about a half-inch long) in the larynx that vibrate to create the voiced sound source for speech.
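The filter side of this picture can be approximated with a classic textbook model: treat the vocal tract for a neutral, schwa-like vowel as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, Fn = (2n - 1) × c / (4L), where c is the speed of sound and L is the tube length. The sketch below assumes c = 350 m/s (warm, humid air) and L = 17.5 cm (a typical adult male vocal tract); both numbers and the function name are illustrative assumptions, not values from the book.

```python
from typing import List


def uniform_tube_formants(length_m: float = 0.175,
                          speed_of_sound: float = 350.0,
                          count: int = 3) -> List[float]:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    Assumed defaults: L = 0.175 m, c = 350 m/s.
    """
    if length_m <= 0 or speed_of_sound <= 0 or count < 1:
        raise ValueError("length, speed of sound, and count must be positive")
    return [(2 * n - 1) * speed_of_sound / (4 * length_m) for n in range(1, count + 1)]


# Roughly 500, 1500, and 2500 Hz for the assumed adult tube.
print([round(f) for f in uniform_tube_formants()])  # [500, 1500, 2500]
# A shorter tube (as in a child) shifts every resonance upward.
print([round(f) for f in uniform_tube_formants(length_m=0.12)])
```

Real vowels depart from these values because the tongue, lips, and jaw make the tube non-uniform; the uniform tube only gives the neutral reference point.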
Article / Updated 03-26-2016
You make vowels in a different way than consonants. Vowels don't involve air blockage; instead, they require a more continuous sound flow and sound shaping. Phoneticians describe vowel production in terms of HAR:
Height (whether the tongue is high, mid, or low in the mouth)
Advancement (how front or back the tongue is)
Rounding (whether the lips are protruded, for sounds like the "oo" of "boot")
Another way is to consider place of articulation (where in the mouth the tongue is placed) and manner of articulation (how the sound is made) features. Here are some key terms for describing vowels:
Cardinal vowels: Anchor points worked out by the phonetician Daniel Jones to help people classify vowels. Cardinal vowels are idealized reference vowels, not tied to any particular language, that phoneticians use for ear training so they can later detect small differences in sound quality between real-world examples.
Diphthong: A sound that glides from one vowel to another, such as in "cow," "boy," and "fight." Diphthongs are made with two tongue positions.
Lax/tense: An important grouping describing two phonological classes of vowels. In stressed syllables, lax vowels appear only in closed syllables (ending with a consonant), while tense vowels can also appear in open syllables.
Monophthong: A vowel with a single sound quality, such as the middle sound in "rat" or "bit." Monophthongs are made with one tongue position.
Rhotic: Also referred to as r-coloring; rhotic means an "r"-like sound is present, such as in the vowel sounds of "fear," "fare," "far," and "for."
Schwa: A mid-central, unrounded vowel [ə] that is poorly named because it rhymes with "duh."
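HAR and the lax/tense distinction translate naturally into a feature table. This minimal sketch assumes a handful of General American English monophthongs with textbook feature values; the `Vowel` record, the symbol set, and the helper names are illustrative assumptions. The last helper encodes the generalization from the lax/tense entry, for the vowels in this table: only a tense vowel can end a stressed syllable.

```python
from typing import Dict, NamedTuple


class Vowel(NamedTuple):
    height: str       # "high", "mid", or "low"
    advancement: str  # "front", "central", or "back"
    rounded: bool     # lips rounded/protruded?
    tense: bool       # True = tense, False = lax


# Textbook values for a few General American English monophthongs (assumption).
VOWELS: Dict[str, Vowel] = {
    "i": Vowel("high", "front", False, True),    # "beet"
    "ɪ": Vowel("high", "front", False, False),   # "bit"
    "ɛ": Vowel("mid", "front", False, False),    # "bet"
    "æ": Vowel("low", "front", False, False),    # "bat"
    "ə": Vowel("mid", "central", False, False),  # schwa, first vowel of "about"
    "ɑ": Vowel("low", "back", False, True),      # "father"
    "ʊ": Vowel("high", "back", True, False),     # "book"
    "u": Vowel("high", "back", True, True),      # "boot"
}


def describe(symbol: str) -> str:
    """Spell out a vowel's HAR features, e.g. 'high back rounded'."""
    v = VOWELS[symbol]
    return f"{v.height} {v.advancement} {'rounded' if v.rounded else 'unrounded'}"


def can_end_stressed_syllable(symbol: str) -> bool:
    """Lax/tense rule: a stressed open syllable needs a tense vowel."""
    return VOWELS[symbol].tense


print(describe("u"))                   # high back rounded
print(describe("æ"))                   # low front unrounded
print(can_end_stressed_syllable("i"))  # True
print(can_end_stressed_syllable("ɪ"))  # False (lax: needs a following consonant)
```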
Article / Updated 03-26-2016
Phoneticians transcribe connected speech by marking how words and syllables go together and by following how the melody of language changes. Many phoneticians indicate pauses with special markings (such as [ǀ] for a short break and [ǁ] for a longer break). You can show changes in pitch with a drawn line called a pitch plot or with more sophisticated schemes, such as the tones and break indices (ToBI) system. Here are some important terms to know when considering speech melody:
Compounding: When two words come together to form a new meaning (such as "light" and "house" becoming "lighthouse"). In such a case, the first part gets more stress than the second.
Focus: Also known as emphatic stress; the use of stress to highlight part of a phrase or sentence.
Juncture: How words and syllables are connected in language.
Intonational phrase: Also known as a tonic unit, tonic phrase, or tone group; a pattern of pitch changes that matches up in a meaningful way with a part of a sentence.
Lexical stress: When stress plays a word-specific role in language, such as in English, where each word has its own stress pattern and putting stress on the wrong syllable sounds wrong (or can even change the word, as in the noun "REcord" versus the verb "reCORD").
Sentence-level intonation: The use of spoken pitch to change the meaning of a sentence or phrase. For example, an English statement usually has falling pitch (high to low), while a yes/no question has rising pitch (low to high).
Stress: Relative emphasis given to certain syllables. In English, a stressed syllable is louder, longer, and higher in pitch.
Syllable: Unit of spoken language consisting of a single uninterrupted sound formed by a vowel, diphthong, or syllabic consonant, with optional sounds before or after it.
Tonic syllable: An important concept for many theories of prosody; the syllable that carries the most pitch change in an intonational phrase.
ToBI: Tones and break indices; a set of conventions used for working with speech prosody. Although originally designed for English, ToBI has now been adapted to work with a few other languages.
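The falling-versus-rising distinction in sentence-level intonation can be roughed out from a pitch track. This is a crude sketch, not ToBI: it fits a least-squares line to an F0 (fundamental frequency) contour and labels the overall direction. The F0 values, the 20 Hz/s threshold, and the function names are hypothetical assumptions, and a real analysis would focus on the pitch movement at the end of the intonational phrase rather than the whole-utterance trend.

```python
from typing import Optional, Sequence


def f0_slope(times: Sequence[float], f0: Sequence[Optional[float]]) -> float:
    """Least-squares slope (Hz per second) of an F0 contour.

    Frames whose F0 is None (unvoiced) are skipped.
    """
    if len(times) != len(f0):
        raise ValueError("times and f0 must have the same length")
    pts = [(t, f) for t, f in zip(times, f0) if f is not None]
    if len(pts) < 2:
        raise ValueError("need at least two voiced frames")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_f = sum(f for _, f in pts) / len(pts)
    num = sum((t - mean_t) * (f - mean_f) for t, f in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    if den == 0:
        raise ValueError("voiced frames must span more than one time point")
    return num / den


def contour_direction(times: Sequence[float], f0: Sequence[Optional[float]],
                      threshold_hz_per_s: float = 20.0) -> str:
    """Label a contour 'falling', 'rising', or 'level' (threshold is an assumption)."""
    slope = f0_slope(times, f0)
    if slope <= -threshold_hz_per_s:
        return "falling"
    if slope >= threshold_hz_per_s:
        return "rising"
    return "level"


# Hypothetical F0 tracks sampled every 100 ms (None = unvoiced frame).
times = [0.0, 0.1, 0.2, 0.3, 0.4]
statement = [220.0, 205.0, None, 180.0, 160.0]
question = [180.0, 185.0, None, 210.0, 240.0]
print(contour_direction(times, statement))  # falling
print(contour_direction(times, question))   # rising
```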
Article / Updated 03-26-2016
Phonetics has come a long way since the good ol' days of Daniel Jones and his colleagues in London at the turn of the 20th century. Technology and mass communication have revolutionized the field, allowing breakthroughs the founders would never have imagined. The following previews some of these amazing new directions.
Training computers to recognize human emotions in speech
Clearly, many situations exist where recognizing emotion in speech can be important. Think of how your voice may become increasingly tense as you wait on the phone for a computer operator to (finally) hand you over to a real person. Or, more seriously, consider people working in emergency situations, such as 911 operators. Major, potentially life-threatening problems can occur if a 911 operator can't understand what you're saying.
Working with emotion in speech is a cutting-edge research topic in many laboratories worldwide. For instance, Dr. Carlos Busso at the University of Texas at Dallas has experimented with pairing computerized voices and visual heads expressing the emotions of anger, joy, and sadness. This work has compared the speech of actors with that of ordinary individuals in more naturalistic situations. From the audio recordings, Busso uses pitch features to classify emotions. He then uses motion tracking technology to record speakers' facial movements during speech. The findings show that certain regions of the face are more critical than others for expressing certain emotions. Linguists and other scientists can now use the results from these studies to create more believable avatars (computerized human-like characters) and to better understand disorders such as Parkinson's disease (in which degeneration of the nervous system causes a loss of facial expression) and autism (in which attention to facial cues appears to be a problem).
Animating silicon vocal tracts
You can study the human vocal tract in different ways.
One way is to study the human body through anatomy and physiology. Another way is to construct models of the system and study the biomechanical properties of these creations. Silicon vocal tracts are a new type of model that can be used for speech synthesis, the artificial production of speech by machine.
The beginnings of speech synthesis actually go back to the 1700s, with a bagpipe-like talking machine consisting of leather bellows (serving as the lungs) and a reed (serving as the vocal folds). Although this system squeaked its way through speech, it wasn't possible to learn much about the speech source or filter by studying its components. Today people remain fascinated by talking machines, including robots and humanoid creations. Such robots help with animation and other artistic purposes, as well as helping researchers better understand anatomical systems.
Producing a human-like articulatory system isn't simple. The human body has very specific density, damping, elasticity, and inertial properties that aren't easy to replicate. The changing physical shapes of the vocal tract are also difficult to reproduce mechanically. For instance, the tongue is a muscular hydrostat that preserves its volume when changing shape: it elongates when protruded and humps when retracted. Dr. Atsuo Takanishi at Waseda University in Japan has spent decades perfecting a silicon head that can produce Japanese vowels and consonants, including fricatives. You can watch videos of his various contraptions, including silicon vocal folds, motorized tongues, and gear-driven lips and face.
Getting tubular and synthetic
A more cerebral method of synthesizing speech than building robots involves making electronic or mathematical models of the speech production system. After researchers understand these complex systems, they can build models of them and then manipulate those models in a computer to simulate the human system (albeit electronically).
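A minimal example of this modeling approach is a formant synthesizer built directly on the source-filter idea: a pulse train stands in for the vocal folds, and a cascade of second-order digital resonators stands in for the vocal tract. This sketch uses the resonator difference equation from Dennis Klatt's formant synthesizer, which is a different technique from the physical tube models described in this section; the pulse-train source, the formant and bandwidth values (roughly those of a neutral, schwa-like vowel), and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def resonator(x: Sequence[float], freq: float, bw: float, fs: float) -> List[float]:
    """Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2].

    Coefficients follow the Klatt (1980) form, with unity gain at 0 Hz.
    """
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    A = 1.0 - B - C
    y: List[float] = []
    y1 = y2 = 0.0  # the previous two outputs
    for sample in x:
        out = A * sample + B * y1 + C * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def pulse_train(f0: float, duration: float, fs: float) -> List[float]:
    """Crude voicing source: one unit impulse per glottal period."""
    period = max(1, round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(int(duration * fs))]


def synthesize_vowel(formants: Sequence[Tuple[float, float]], f0: float = 120.0,
                     duration: float = 0.5, fs: float = 16000.0) -> List[float]:
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = pulse_train(f0, duration, fs)
    for freq, bw in formants:  # cascade: each formant filters the previous output
        signal = resonator(signal, freq, bw, fs)
    peak = max((abs(s) for s in signal), default=0.0) or 1.0
    return [s / peak for s in signal]  # normalize to the range -1..1


# Assumed neutral-vowel formants (Hz) and bandwidths (Hz).
samples = synthesize_vowel([(500, 60), (1500, 90), (2500, 120)])
print(len(samples))  # 8000 (0.5 s at 16 kHz)
```

To listen to the result, the samples could be scaled to 16-bit integers and written out with Python's standard `wave` module. Changing the formant list changes the vowel, while changing `f0` changes the pitch, which is the independence of source and filter that this kind of model assumes.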
Gunnar Fant, who developed models of the relation between human speech anatomy and formant frequencies, spearheaded this type of work in the 1950s. This enterprise also draws on the physical models of Hermann von Helmholtz, who described how single resonators and coupled resonators shape input sound. More recent versions of tube models are making breakthroughs with difficult problems, such as replicating the voices of women and children, as well as making computers sound as if they're successfully singing.
Brad Story, a professor at the University of Arizona, is working on a prototype called tube talker. This system is based on modeled physiology of the vocal folds and the upper airway system. Its design incorporates video images of the vocal folds and MRI images of the vocal tract taken during speech. By using both articulatory and acoustic constraints, Story and his team can model and move virtual articulators to create smooth, speech-like movements. The result is a sound wave that can be listened to, analyzed, and compared to real speech.
Tube talker has been modified in some strange and interesting ways. For example, traditional models of speech suggest that the source and filter components should be considered separate. However, for some types of sung voice (and perhaps for children's voices), this may not be the case. Recent versions of tube talker have tested nonlinear interactions between source and filter as new possible combinations to better model such types of voice and song.
Another model using tube-like designs recently won a European speech synthesis singing contest, not only for producing plausible spoken speech but also for singing (you can witness the eerie spectacle of transparent 3D computerized vocal tracts, developed by Dr. Peter Birkholz, singing a duet).
Training with Baldi and other avatars
Instructional agents, such as avatars that are designed to be expert speakers of various languages, are another interesting trend in phonetics.
Such systems can help instructors by giving additional practice with lesson plans, supporting second language learning, and working with people who are hard of hearing or who have particular difficulty interacting with live speech partners (such as persons with autism).
Under the direction of Professor Dominic Massaro at the University of California at Santa Cruz, researchers have come up with a 3D talking head named Baldi, capable of doing many tasks. For instance, Baldi has helped Japanese students develop their English accent and has assisted in deaf education. In more recent versions, Baldi's head has become transparent to better show his vocal tract, so that learners of languages in which special tongue and pharynx positions are important (such as Arabic) can see what's going on. Baldi has even sprouted legs, arms, and a body, because an avatar's gestures can in some situations make language learning more effective. This type of research suggests that avatars hold a bold and promising future for phonetics.
Helping the mute talk with silent speech interfaces
Silent speech interfaces (SSIs) can be especially useful in military applications, such as for personnel in loud cockpits or vehicles that prevent them from hearing themselves speak or from being recorded by a microphone. SSIs can also help people who can't produce audible sound with their vocal folds but whose articulators (tongue, lips, and jaw) still work. For them, an artificial vocal source would solve the problem: if the position of a person's tongue could be tracked in real time and fed to a computer, this articulatory information could be coupled with a voicing source and, presto, speech. Several exciting working prototypes for SSIs are currently under development.
The following focus on articulatory-acoustic principles and flesh-point articulator tracking technologies:
Researchers in South Africa are working on a system using electropalatography (EPG).
Scientists at the University of Georgia are exploring the use of a permanent magnet tracking system.
Other researchers are working on lip and tongue tracking systems.
The ultimate goal is for individuals who can't speak because of the loss of the larynx to simply pull out their phone (or a device roughly that size), push a button, and then have a high-quality synthesized voice speak for them as they articulate.
Visualizing tongue movement for stroke patients
Many individuals with left cortical brain damage have apraxia of speech (AOS), a problem controlling the production of speech sounds. Although these patients generally understand language fairly well, if they want to pronounce a certain sound, say "s" in the word "see," the sound may come out wrong, such as "she." AOS is very frustrating to patients because they typically know they've produced a sound in error. They commonly feel like they know what to say, but they just can't get it out.
One proven principle known to help these patients is practice (practice makes perfect), particularly as such individuals tend to stop speaking due to frustration, depression, and having other family members take over and speak for them. Another important therapeutic principle is articulatory training. The University of Texas at Dallas laboratory (in conjunction with colleagues at the University of Pittsburgh) is giving individuals with AOS visual feedback on the position of their tongue during speech. This intervention is based on the premise that individuals with AOS have a breakdown in sound sequencing and sound implementation, but their eye-to-tongue feedback monitoring systems are intact.
A number of studies have found that this method can help individuals with AOS increase the accuracy of their sound production after stroke. The work to date has relied on information from a single articulatory data point (such as the tongue tip). Future work will give patients a 3D avatar that shows them the online movement of their tongue while they speak. Doing so will permit treatment of a broader range of speech sounds and will allow clinicians to treat manner of articulation as well as place.
Sorting more masculine voice from less masculine voice
A number of properties in the voice can actually indicate masculinity. Phoneticians have terms for this:
More masculine speech (MMS)
Less masculine speech (LMS)
MMS is lower in fundamental frequency (the acoustic correlate of the pitch a listener hears). The two also seem to differ in the spectral quality (how high pitched the hissiness is) of the fricatives. Also, MMS individuals have a smaller vowel space than individuals judged to be LMS (meaning LMS talkers use greater tongue excursions while talking).
Companies or governments may be able to use this information to design a male versus female voice detector, and perhaps an even more detailed detector (straight versus gay), for simple kinds of judgments. However, conveying gender through speech is more complicated than a general approximation of the biological properties of the opposite sex. That is, despite what popular culture often implies, the speech of gay men doesn't seem to be merely a feminized version of the speech of straight men (or the speech of lesbians a masculinized version of the speech of straight women).
Ron Smyth, a professor at the University of Toronto, has studied the differences between more and less gay-sounding male speech.
His work reveals that the following complex mix of acoustic properties characterizes "gay-sounding speech":
Vowels produced closer to the edges of the vowel space
Stop consonants with longer voice onset times (VOTs)
Longer /s/ and /ʃ/ fricatives with higher peak frequencies
More light "l" allophones
Smyth's work also shows that many of these judgments depend on assumptions made by the listeners, the types of speech samples provided, and the gender and sexual orientation of the listeners themselves. Sexual orientation and speech is an ongoing research topic, both to determine whether popular-cultural stereotypes are based on anything tangible and to test whether people's perception of sexual orientation (gay people's self-proclaimed gaydar) is what it claims to be. Smyth's work has shown that gaydar based on speech usually isn't reliable.
These issues relate to the field of sociolinguistics, the study of the relationship between language and society. Studies have shown, for instance, that young (heterosexual) men will lower their fundamental frequency when a young female questioner, rather than a male, walks into the room. These men are presumably making themselves more attractive through a lower voice. If these findings are accurate, a researcher might predict that under the same experimental conditions, women would increase the breathiness of their voice, a characteristic known to increase the perceived attractiveness of female speech.
Figuring out the foreign accent syndrome (FAS)
Foreign accent syndrome (FAS) is a speech motor disorder in which adults present with foreign-sounding speech as the result of mistiming and prosodic abnormalities caused by a brain disorder. It continues to fascinate the public and scientists alike. Study of individuals having this disorder can potentially give a better picture of which brain systems are involved in producing and understanding accent.
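Research on FAS prosody often quantifies rhythm with the Pairwise Variability Index (PVI), a simple measure computed over successive interval durations (often vocalic intervals, measured in milliseconds). The raw version (rPVI) averages the absolute differences between neighboring intervals. The normalized version (nPVI) divides each difference by the mean duration of its pair, which factors out speaking rate, and scales by 100: nPVI = 100 × [Σ |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)] / (m - 1) for m intervals. Higher values mean more alternation between long and short intervals. This sketch implements those two formulas; the helper names and the durations in the example are hypothetical.

```python
from typing import Sequence


def _check(durations: Sequence[float]) -> None:
    """Both indexes need at least one pair of positive durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")


def raw_pvi(durations: Sequence[float]) -> float:
    """rPVI: mean absolute difference between successive interval durations."""
    _check(durations)
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations: Sequence[float]) -> float:
    """nPVI: 100 x mean of |d_k - d_k+1| divided by the pair's mean duration."""
    _check(durations)
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in milliseconds.
even = [100, 100, 100, 100]
alternating = [60, 140, 60, 140]
print(normalized_pvi(even))                  # 0.0
print(round(normalized_pvi(alternating), 1)) # 80.0
```

Because nPVI divides by each pair's mean, the same proportional alternation gets the same score whether the talker speaks quickly or slowly, while rPVI grows with absolute durations.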
So far, most FAS cases have involved native English speakers, although cases in other European languages are increasingly being recorded. Several non-Indo-European cases (Hebrew, Japanese, and Arabic) have now been recorded as well. Researchers are interested in which varieties of languages are affected, and they question the extent to which stress- and/or syllable-based prosodic factors (commonly quantified with the Pairwise Variability Index, or PVI) play a role in whether such patients are perceived as foreign, and whether there are high-PVI and low-PVI FAS subtypes.
Another puzzle in the FAS picture is how cases that result from frank focal lesions (such as from stroke or tumor) can be related to those of less specific or unknown etiologies (such as migraine, allergy, or possibly psychogenic causes). An individual with a lesion in a well-established brain region known to correspond to speech function (like the perisylvian language zone) may be assumed to have a plausible cause for FAS. The situation for individuals with no known physiological cause is less clear. Many patients referred to the clinic at the University of Texas at Dallas for suspected FAS have been diagnosed with conversion disorder, a condition in which patients experience neurological symptoms that medical evaluation can't explain. Conversion disorder isn't malingering (faking illness), and it can affect speech, but it isn't the same thing as FAS.
To best evaluate FAS, professionals should work closely in a team that ideally includes a psychologist and a psychiatrist. Including phonetic tests to rule out intentional, inadvertent, or mimicked accent modification is also important.
Discovering the genetics of speech
Phoneticians have become increasingly interested in the fast-moving and exciting field of genetics as a way to find the basis of speech and language.
A tumult started in the 1980s with the discovery of a family in West London that had a series of family-related speech and language problems. Among the members of this family (known as KE) were nine siblings. Four of these siblings had pronounced problems with comprehension, understanding sentences such as "The boy is being chased by the tiger" to mean "The boy is chasing the tiger." They also dropped sounds at the beginning of words, such as saying "art" when intending to say "tart." From such behavior, it became clear that something running in the family was particularly affecting their speech and language.
In the mid-1990s, a group of Oxford University geneticists began to search for the damaged gene in this family. They found that the disorder was autosomal dominant (a single copy of the altered gene, passed from one generation to the next, was enough to cause it) and wasn't sex-linked. Further investigation pinned the gene to an area on chromosome 7, which was called Speech and Language Disorder 1 (SPCH1). The geneticists proceeded to pinpoint the precise location of a chromosome 7 breakage in another child with a genetic speech and language disorder. It turned out to relate to the KE cases in an amazing way: Both involved the gene for forkhead box protein P2 (FOXP2), a transcription factor that regulates other genes involved in the development of neurological, gut, and lung systems.
FOXP2 is associated with vocal learning in young songbirds, echolocation in bats, and possibly vocal learning in other species, such as whales and elephants. Mice with human FOXP2 genes spliced into their DNA emitted low, funky squeaks and grew different neural patterns in brain regions involved with learning.
Like all exciting scientific stories, the FOXP2 story isn't without controversy. Many popular reports of these discoveries make simplified claims, overlooking the multifactorial genetic basis for speech and language.
For example, the descent of the human larynx was undoubtedly important in making speech physically possible, compared with the vocal tract of chimpanzees. Yet this change doesn't seem to be tied to FOXP2, suggesting that other gene loci are involved. Indeed, other genes are already emerging. FOXP2 switches off a gene called contactin-associated protein-like 2 (CNTNAP2), which has been associated with both specific language impairment (SLI) and autism. CNTNAP2 encodes a protein that nerve cells in the developing brain deploy, particularly in circuits associated with language.
Matching dialects for fun and profit
Many people change their spoken accent through the course of a day to match the accent of the people they're talking to. You can call this being an accent sponge, although it's more technically referred to as dialect matching or register matching. Dialect matching is quite natural for people. In fact, it has become one of the hot areas in computer speech recognition because of the potential to match a call-in telephone request with an online response in the same dialect. Because people seem to appreciate group membership, the idea is to have the computer quickly recognize your dialect and match you up with a phone buddy or computerized voice that sounds like you.
Researchers are designing computer systems with phone unit recognition and phone unit adaptation modules. Telephone systems using such technologies can determine the accent of the person calling, extract the features of that accent, and modify the synthesized voice responding to the caller to best match that person's accent. If done correctly, it can lead to greater intelligibility and perhaps a better subjective feeling in the conversation. On the other hand, if it's not done well, people may feel mimicked or mocked. You can just imagine how this sort of thing can be used in computerized dating systems.
Dialect matching is natural even for orcas, bottlenose dolphins, and spear-nosed bats. Orcas and dolphins use coordinated squeaks and whistles to decide whom they will hunt and travel with. Study of spear-nosed bats has shown that females match their calls to recruit other members of their roost when they find a rich food source and to collectively defend their food from other bats. According to biologists, these animal sounds are all cases of signaling group membership.
Article / Updated 03-26-2016
Spectrograms make speech visible and are one of the most popular displays used by phoneticians, speech scientists, clinicians, and dialectologists. A spectrogram is a readout that shows frequency on the vertical axis, time on the horizontal axis, and amplitude (amount of sound energy) as either darkness or coloration. See the following figure. Here are some key terms to remember (you can refer to the following figure for where some of these terms appear on the spectrogram):
Bandwidth: Original method used to track formants on a spectrogram. Now usually replaced by the fast Fourier transform (FFT) and linear predictive coding (LPC).
Burst: Acoustic event caused by the sudden release of airflow from a stop consonant. Looks like a thin vertical spike on a spectrogram.
Frication: Turbulent airflow marking the presence of fricatives (such as /s/ and /ʃ/) or affricates (such as /ʧ/ and /ʤ/). Shows up on a spectrogram as darkness spread across a wide frequency range.
Formant frequencies: Important acoustic cues for vowel quality resulting from vocal tract resonance. They show up on the spectrogram as dark bands running roughly parallel to the bottom of the display.
Formant frequency transition: Region of rapid formant movement or change, important for identifying consonants, particularly stops and affricates.
Locus: Frequency regions that help identify place of articulation in stop consonants. For example, second formant frequency (F2) transitions starting at relatively low frequencies (and then rising) are likely bilabial.
Stop gap: A silent region on a spectrogram (which shows up as blank) that helps distinguish the presence of a stop consonant.
Voice bar: Dark band running parallel to the very bottom of the spectrogram, indicating energy associated with voicing.
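Digital spectrograms like these are usually computed with a short-time Fourier transform: slice the signal into overlapping windowed frames, take an FFT of each frame, and display energy in decibels against time and frequency. This NumPy sketch is a minimal version under stated assumptions (a Hamming window, a 512-point FFT, and the function name and default settings are illustrative choices, not from the book). The window length ties in with the bandwidth idea: a short window of a few milliseconds gives a wideband spectrogram in which formants appear as dark bands, while a longer window (around 20 to 30 ms) gives a narrowband spectrogram that resolves individual harmonics.

```python
import numpy as np


def spectrogram(signal, fs: float, window_ms: float = 5.0, hop_ms: float = 1.0,
                n_fft: int = 512):
    """Return (freqs_hz, times_s, power_db); power_db has shape (freqs, frames)."""
    x = np.asarray(signal, dtype=float)
    win_len = int(round(window_ms * fs / 1000.0))
    hop = max(1, int(round(hop_ms * fs / 1000.0)))
    if win_len > n_fft:
        raise ValueError("window longer than FFT size; increase n_fft")
    if len(x) < win_len:
        raise ValueError("signal shorter than one analysis window")
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)  # zero-pads each frame to n_fft
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)  # floor avoids log(0)
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft  # bin center frequencies in Hz
    times = (np.arange(n_frames) * hop + win_len / 2.0) / fs  # frame centers in s
    return freqs, times, power_db.T  # rows = frequency (vertical axis), columns = time


# Check with a 1 kHz test tone: the strongest row should sit near 1000 Hz.
fs = 16000
t = np.arange(fs) / fs
freqs, times, power_db = spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
print(power_db.shape)                            # (257, 996)
print(freqs[np.argmax(power_db.mean(axis=1))])  # close to 1000 Hz
```

For a display, `power_db` can be drawn as an image with low frequencies at the bottom, so that darker (or more colored) pixels mark more energy, matching the layout described above.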