Listening, Pronunciation, and Connected Speech

I stumbled upon a very creative video today called “How English Sounds to Non-English Speakers”. The video centers on a couple having a conversation in what sounds like English but is largely incomprehensible. While watching, you catch glimpses of English words, but with little context. If you have ever studied a foreign language, this experience will be very familiar to you: you know the individual words, but when they are strung together in natural, connected speech, the only things you can pick out are keywords – and these may or may not lead to comprehension.

I couldn’t have found this video at a better time. I am currently teaching about word and sentence stress in my graduate pronunciation class. In this video, the only words that are clear are the ones that receive the main stress in a sentence. The words that carry the most important meaning are often the words that are stressed the most in a sentence, and thus are pronounced clearly while other sounds get linked, changed, or even omitted.

This matters because anyone trying to understand another language needs a good grasp not only of the meaning behind stressed words (the keywords we often ask students to listen for) but also of the unstressed connected speech, which does in fact carry important meaning.

One way to conceive of this is to think of the prosody pyramid. Gilbert (2008) introduces the prosody pyramid as a framework for aiding listening and pronunciation. The first tier of the pyramid is the thought group. Pauses and rising and falling intonation usually signal the beginning and end of a thought group, or what can be called a small chunk of meaning. Think about the commas we use in writing. They are used not only to give the reader a slight break but also to signal a shift or change in thought. This could be contrasting two independent clauses (i.e. two thoughts), setting aside information (such as in a non-restrictive relative clause), or simply listing. Teaching students both to signal thought groups with appropriate intonation and to listen for them is important.

In each thought group, there will be a stream of connected speech with at least one stressed word – the focus word (second tier). This is the word that carries some important meaning. Though there are different levels of stress in a sentence, and all words are important in a sense, the focus word has special importance for a few different reasons. First, for the speaker, it is the word that needs to be pronounced the most clearly. The other words can be somewhat “muddy”. (Proper stress and clear phonemic pronunciation are indicated by the third and fourth tiers respectively.) Second, the listener not only has to pay attention to the focus word but must also make sense of all the unfocused, unstressed speech that occurs around it.

This is where the challenges lie. The student focused on pronunciation must learn unstress and therefore connected speech so that their language flows – not necessarily because native speakers do it. Yes, native speakers use connected speech, but so do non-native speakers. In addition, connected speech occurs because it assists the mouth and tongue, which must rapidly change positions during speech. All languages have connected speech because of this. Though it may fly in the face of the idea of intelligibility, it’s essential for flow.

Admittedly, the challenge lies not so much in the production as in the reception of connected speech. Listeners have the real challenge. A great article by Sheila Thorn (2009) discusses this idea at length. She states that most listening tracks are not only scripted and inauthentic, but are also used for modeling and introducing vocabulary rather than for actual listening training. According to Thorn:

The major flaw with this approach is that it has become too successful! Students tend to concentrate too much on constructing meaning from key words, often with a spectacular lack of success. This is because they are not paying attention to other non-content words, many of which contain essential information. (p. 6)

Even when the texts are authentic, they are not used very effectively, as it is automatically assumed learners will just “pick it up” after enough exposure. While relying on stress to pick out the keywords is important, it is not the only important goal of listening comprehension. That gibberish we heard in the video above? That stream of words that explodes out of foreign language speakers’ mouths? That is the other goal. And it is not an easy one to achieve:

We need to view this skill as the ultimate objective for our students to attain, whilst accepting that they will only reach this objective at the end of a long learning road. Our role as teachers is to support our students as they take their first steps along this road and help them increase their pace. (p. 7)

Thorn’s article has a number of ideas on how to reach this goal. These include supplementing with authentic text and creating gap-fill activities that do not solely focus on lexis or content words, but rather on the words that all students know but probably can’t understand in natural speech. In a class on pronunciation, with a pronunciation focus, or with a teacher who understands the importance of integrating pronunciation into conversation classes, this also means explicit practice in all aspects of connected speech: linking, assimilation, elision, etc. This practice, of course, should be both productive and receptive. Basically, these students should be doing a lot of authentic listening, analyzing and paying attention to connected speech rather than only to stress.
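For teachers who like to prepare materials with a script, the kind of gap-fill described above can be sketched in a few lines of Python. This is only a rough illustration, not Thorn's procedure: the function name `make_gap_fill` and the short `FUNCTION_WORDS` list are my own assumptions, and a real activity would start from a transcript of an authentic recording and a word list chosen for the class.

```python
import re

# Illustrative, incomplete list of function words (an assumption, not from Thorn).
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "and", "or", "but", "at", "in", "on", "for",
    "have", "has", "was", "were", "can", "do", "you", "him", "her", "them",
}


def make_gap_fill(transcript, targets=FUNCTION_WORDS):
    """Replace each target word with a numbered blank.

    Returns the worksheet text and the answer key as a list.
    """
    answers = []

    def blank(match):
        word = match.group(0)
        if word.lower() in targets:
            answers.append(word)
            return f"({len(answers)}) ____"
        return word  # content words are left untouched

    # Treat runs of letters and apostrophes as words, so "don't" stays whole.
    worksheet = re.sub(r"[A-Za-z']+", blank, transcript)
    return worksheet, answers


sheet, key = make_gap_fill("What do you want to do?")
print(sheet)
print(key)
```

Students would fill in the blanks while listening to the recording, so the gaps force attention onto the reduced, unstressed words rather than the keywords.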

The peak of the prosody pyramid is contingent on correctly produced stressed focus words. However, without a firm base of connected speech on which to be supported, the peak is fragile. Only a limited sense of meaning can be gleaned from it. Therefore, a balance must be struck between the vast and wide base and the narrow and specific peak. That is to say, a balance between unstress and stress. Though it is quite a challenge, it is a worthy one – one which will likely improve all students’ listening and speaking abilities.


Gilbert, J. B. (2008). Teaching pronunciation using the prosody pyramid. Cambridge University Press. Retrieved from

Thorn, S. (2009). Mining listening texts. Modern English Teacher, 18(2), 5-13. Retrieved from

One thought on “Listening, Pronunciation, and Connected Speech”

  1. 안소언 says:

    This movie clip was quite a shock for me. I thought that I had watched several American dramas and other English listening materials until now. But I totally failed to understand the conversation of the two people in that footage. They didn’t use exact, dictionary-form pronunciation, though English native speakers might understand what the two people were talking about. I just picked out simple or clear words. As a foreigner, I think I was not accustomed to that kind of pyramid structure, while natives have been exposed to its rhythm and melody for their whole lives. It seems that it will take a long time to become accustomed to the unique sound and prosody of a certain language.
