In science fiction, a very common trope is the “universal translator,” a device that allows characters to magically understand any language as if they were hearing their native tongue. The conceit is almost always some sort of fanciful linguistic tech (in The Hitchhiker’s Guide to the Galaxy trilogy, it is instead a fish sitting in your ear, but I’m more concerned here with tech devices). These devices make production more practical for actors and audience alike, since the actors can simply speak the same language in every scene without a loss of verisimilitude.
Watching TV recently, a stray thought occurred to me that I had somehow never considered in my decades of sci-fi fandom: why and how does a character, from whose perspective we’re seeing a given scene, see lip movements that match the translated speech they’re hearing? Keeping audio and lips synchronized would be an incredibly complex process in a group where multiple different languages are being spoken at once.
Of course, you could say this is just something we have to suspend our disbelief about. Except we’re already dealing with science fiction and nigh-magical tech, in scenarios where we’re meant to take what happens on screen at face value. So for the purposes of this discussion, let’s assume that what we see is an accurate depiction of the in-show technology, lip-sync and all.
With that settled, the “why” of this visual trick is pretty simple to answer: the best user experience (UX) design is the kind that doesn’t make us think about it. A mismatch between lip movements and the sounds we hear works against a very basic function of how we process spoken language as a species, so for smooth adoption and operation of such translation devices across the entire population (or at least its vast majority), any lip movements seen would somehow need to match what the user hears. If we allowed audio-visual mismatches, such devices would likely remain relegated to niche user bases (e.g. international diplomats) willing to tolerate the dissonance, and never reach the widespread adoption we see on these shows.
It follows, then, that these devices aren’t acting merely as audio filters: judging by what we see on screen, universal translators in most science fiction are incredibly robust visual-perception filters as well. Which brings us back to the earlier point about complexity in heavily multilingual groups: imagine the massive processing power needed to render these audio and visual filters accurately, in real time, for every single person in your conversation group when each is speaking a different tongue.
Hence, I would propose an alternate theoretical design for an on-the-fly, real-time universal translator that could still achieve widespread adoption. Bearing in mind that a seamless UX requires a continuous match between sound and lip movements, and that we’re not discussing some fanciful sci-fi alien conversant who lacks lips or any comparable part of their vocal apparatus, the first piece of the puzzle is the easy part: the most effective universal translator wouldn’t actually translate the sound of what you’re hearing. The physical side of linguistic communication would remain unchanged.
Instead, let us look at the basic model of how communication works inside a person at the semantic and representational level: semantic weight belongs to concepts and ideas; words merely map onto them. The configuration of a word (i.e. its actual sounds) is, at root, a completely arbitrary abstract representation of those concepts and ideas (hence the proliferation of so many wildly disparate languages). We have a brilliant capability to automatically convert those audio signifiers into their actual semantics in real time. Language itself is an intermediary for those internal units of meaning, a means of externalizing concepts and ideas as audio so we can transmit them to each other. And that is the part of the process a universal translator should hook into.
A universal translator with an effective UX would hook into the process that turns words into meaning, a process that happens so spontaneously that we’re not even conscious of it once we’re fluent in a language. A proper universal translator shouldn’t bother rendering speech into words of a language you know, only for your brain to then translate those words into the concepts and ideas it understands internally; it would instead be an extension of your brain’s hardware, directly translating any and all languages into the actual units of meaning our internal neural processes already operate on.
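To make the distinction concrete, here’s a toy sketch in Python (with entirely made-up vocabularies and concept labels, purely for illustration) of the idea above: instead of translating word-to-word between languages, each language’s words are mapped onto a shared set of internal “units of meaning,” and the listener receives those units directly, with no target-language wording ever produced.

```python
# Toy model: each (hypothetical) language maps its words to shared concept IDs.
# The "translator" resolves an utterance straight to concepts, skipping any
# intermediate translation into another language's words.

LEXICONS = {
    "english": {"hello": "GREETING", "water": "WATER", "danger": "DANGER"},
    "spanish": {"hola": "GREETING", "agua": "WATER", "peligro": "DANGER"},
}

def understand(utterance: str, language: str) -> list[str]:
    """Map an utterance directly to internal units of meaning —
    the 'extension of your brain's hardware' from the essay —
    rather than producing words in the listener's language."""
    lexicon = LEXICONS[language]
    return [lexicon[word] for word in utterance.lower().split()]

# Two speakers, two languages, one shared stream of meaning:
print(understand("hello danger", "english"))  # ['GREETING', 'DANGER']
print(understand("hola peligro", "spanish"))  # ['GREETING', 'DANGER']
```

The point of the sketch is that both utterances resolve to the same concept sequence: the device never needs to decide how the sentence would sound in your language, only what it means.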
Of course, a universal translator designed this way doesn’t work for TV and film audiences, nor for actors on set, since it’s all theoretical, and in each of those scenarios we’re relying on audio transmitted between people the old-fashioned way. So in the meantime, let’s keep depicting it as everyone speaking the same language; the loss in verisimilitude is worth the gain in practicality.
PS: This also reminds me of a conclusion I’ve come to about R2-D2 in Star Wars, and why they don’t speak in a way humans can understand: their internal semantic framework doesn’t map onto externalized units of meaning in anything resembling our notion of language, nor onto the semantic framework our brains traditionally use. That’s why all we get from that rascal directly is beeps and boops, and why we need a protocol droid, a GUI, or the like as a pass-through for accurate communication.