My academic background in linguistics and phonetics informs all of my voiceover work. It shapes how I analyse scripts, how I make performance decisions, and how I think about the listener. I believe this background is relevant not only to clients — who often want to understand why a performance works — but also to other voiceover artists, as it provides a clear descriptive framework for working with spoken discourse rather than relying on instinct alone.
Discourse Intonation: Structuring meaning in speech
Building on a long-standing interest in phonetics, I based my studies for my Master's degree at the University of Birmingham on the model known as Discourse Intonation. This approach offers a structured, replicable way of describing how speakers use intonation and prominence to signal what is new information for the listener and what is shared or assumed knowledge at any point in spoken discourse, whether in conversation or monologue. It also offers an explanation of why we choose particular intonation patterns.
Central to Discourse Intonation is the idea that intonation in English reflects continual shifts in what matters most at any given moment. When an idea is introduced for the first time, it is typically proclaimed as new information: it is given prominence, highlighted as important, and spoken with a falling tone. Once that idea has been established, it becomes shared knowledge. Subsequent references are usually non-proclaimed, lacking prominence and often realised with non-falling intonation.
As discourse unfolds, this balance is constantly changing. What was new information a moment ago becomes common ground now, and prominence shifts accordingly. Intonation is therefore not decorative or emotional by default — it is fundamentally organisational.
Applying linguistic theory to voiceover practice
In voiceover terms, this framework directly informs decisions about which words to make prominent — what voiceover artists often describe as "hitting" particular words. This is not a trivial choice. It frequently goes beyond the literal text on the page and requires consideration of what the audience is likely to know, expect, or assume at that point.
Getting this right — particularly knowing which words not to emphasise — is what makes a performance feel intelligent and alive rather than mechanical. When unnecessary prominence is removed, the listener feels that their prior knowledge is being respected. They are being included in the communication, rather than lectured at.
Getting it wrong can be surprisingly damaging. Emphasising too many "important" words without regard for the listener results in everything being portrayed as important, in which case nothing is important! This style of delivery can quickly become monotonous and patronising, resulting in a performance that feels heavy-handed, dull, or even faintly insulting, as though the listener's understanding is being ignored.
Emotional speech and vocal control
Following my Master's degree, I completed a PhD at the University of Reading, where I investigated the slippery subject of the effects of emotional arousal on speech.
Rather than relying on acted portrayals — which was almost universally the approach in the field at the time — I used real, unscripted emotional speech as my data.
One of the key findings was just how context-specific vocal cues to emotion are, and how dependent they are on the type of interaction. I argued that we can never be certain whether the emotions we perceive in speech are genuine. Humans are capable of controlling, exaggerating, disguising, or suppressing emotional expression, and different individuals may respond very differently to the same event depending on age, experience, personality, or training.
Analysing — and learning to reproduce — large quantities of emotionally coloured speech became, in effect, an intensive course in acting. The difference was that it was underpinned by a rigorous descriptive framework rather than intuition alone. That combination continues to inform how I approach emotionally nuanced voiceover work: not by assuming emotion, but by understanding how it is constructed and perceived in context.
What this means in practice
In practical terms, this approach leads to recordings that sound purposeful rather than formulaic. Decisions about emphasis, pacing, and tone are made in response to the text and its context, not by applying a pre-learned "house style". The result is speech that moves information forward naturally, respects the listener's intelligence, and avoids the artificial rhythms that often arise from rule-based delivery.
For clients, this usually means fewer revisions, clearer communication, and a performance that does exactly what it needs to do — no more, and no less.
Final thoughts
My final takeaway from my studies was that there is huge variety in what may be acceptable in voiceover performances. Despite what I have said above, there may well be a place for the "hard-sell, emphasise (almost) every keyword" style of delivery, if that's what the project demands.
A criticism I have of the many voiceover training courses and sessions I have attended is that, although all were useful, they can be too dogmatic, giving the impression that one approach, or a very restricted set of approaches, is the only one possible. In every script I encounter, I see a huge variety of possible approaches and spoken styles, some of which others may not even have considered.
Voiceover is not a fixed technique to be learned, but a flexible practice grounded in understanding how speech actually works: judgement, not doctrine!