Penrose Stairs

Background

The American Dialect Society’s 2023 Word of the Year, ‘enshittification’, is now being applied to audio hardware. iFixit recently gave its prestigious worst-in-show award to Sennheiser’s Momentum True Wireless 4 earbuds on account of their planned obsolescence: batteries that users can’t replace, and that OEMs refuse to replace:

https://www.phonearena.com/news/worst-in-show-awards-ifixit_id154286

As regards size and audio quality, Bluetooth headphones have been improving in recent years, and it feels like the gap with wired IEMs is closing.

 

The Penrose stairs, made famous in drawings by Maurits Escher, provide a rather close analogy for much of the in-ear monitor (IEM) market in 2024. As with many nascent technologies, IEMs saw significant improvements in their first few decades, but have now matured and reached something of a plateau where meaningful improvements appear to be increasingly difficult to achieve. This could be a problem for an industry that wants to maximize sales volume year after year.

Many decades ago (long before Apple discovered planned obsolescence) lightbulb manufacturers solved this problem by engineering bulbs with intentionally limited lifespans (https://spectrum.ieee.org/the-great-lightbulb-conspiracy). The audio industry doesn’t need existing products to break in order to push new sales. Unsound, hyperbolic marketing and misinformation can simply exploit the psychological need for an upgrade and the fear of missing out. The belief among most audiophiles that you can always trust your ears allows IEM manufacturers to leverage influencer-driven hype, placebo and expectation bias, even when objective improvement is non-existent.

Consider a hypothetical selection of headphones, A through G. You’re performing a careful listening test to determine which headphone is the ‘best’, or at least, which constitutes an improvement worthy of your hard-earned pennies. For the sake of this experiment, let’s assume that each headphone’s performance can be ranked by the pitch you hear. Compare A vs B, and rank the ‘winner’ as the headphone with the higher pitch. After all, being able to reproduce higher frequencies is indeed associated with better resolution. Now compare the winner of this first test with headphone C. Now compare the ‘best’ of these first three headphones with headphone D, and so on. By the time you reach headphone G, you should have found the ‘best’ of all these headphones. Now compare that headphone directly against headphone A. Now which headphone is the ‘best’?

This illusion can be better understood by putting an octothorpe after the C, F and G (musicians will note it’s actually an A major scale transposed into the key of C). What’s happening here is strikingly similar to what happens with many product ‘upgrades’: some aspects get better and some new features are added, but at the same time other aspects get worse and certain useful features get removed.
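The loop in this listening test can be reproduced in a few lines of Python. The sketch below is a toy model, not a description of any audio used for this article. It assumes each headphone is represented by an octave-ambiguous tone (only the note name matters, not the octave), and that the ear judges ‘higher’ by the shorter route around the twelve-semitone circle. The names PITCH_CLASS, sounds_higher and run_ladder are hypothetical.

```python
# Semitones above A for each note of the A major scale (headphones A..G,
# with the sharps added to C, F and G as described in the text).
PITCH_CLASS = {"A": 0, "B": 2, "C#": 4, "D": 5, "E": 7, "F#": 9, "G#": 11}


def sounds_higher(x, y):
    """Return whichever of x and y is judged 'higher'.

    Assumption: with octave-ambiguous tones the listener follows the
    shorter way around the 12-semitone circle, so a step of 1 to 5
    semitones upward from x to y makes y the winner.
    """
    step_up = (PITCH_CLASS[y] - PITCH_CLASS[x]) % 12
    if step_up in (0, 6):
        raise ValueError("direction is ambiguous for this pair")
    return y if step_up < 6 else x


def run_ladder(order):
    """Compare the current winner against each newcomer in turn."""
    champion = order[0]
    history = []
    for challenger in order[1:]:
        champion = sounds_higher(champion, challenger)
        history.append(champion)
    return champion, history


headphones = ["A", "B", "C#", "D", "E", "F#", "G#"]
best, history = run_ladder(headphones)
final = sounds_higher(best, "A")

print(history)  # ['B', 'C#', 'D', 'E', 'F#', 'G#']
print(best)     # G#
print(final)    # A
```

Every newcomer beats the previous winner, yet the overall ‘best’ then loses to the headphone the test started with. Under these assumptions the pairwise comparisons are each consistent, but they never add up to a ranking.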

Providing something which appears to be an improvement, year on year, is a rather easy game for headphone manufacturers. There are an infinite number of permutations to a headphone’s tuning and so an infinite number of nuances, but the following describes one of the more obvious tactics. Take a headphone that might sound a bit lifeless and dull and tune in a bit more bass and treble, producing a headphone that sounds more fun and engaging. Next year, flatten out the frequency-response curve to produce a more reference, audiophile-like sound. Rinse and repeat. Manufacturers and their online army of reviewers and sales affiliates can seal the deal by claiming that each new headphone has a night-and-day improvement in ‘technical performance’ – a claim that is, conveniently, so ill-defined that it can’t be refuted with any objective data. Placebo and expectation bias mean that some reviewers may genuinely believe they are hearing such an improvement. Even if improvements are identified under blind test conditions, the unique nature of each individual’s head-related transfer function (HRTF), and the dependence on the prior headphone used for comparison, still make the comparison subjective. Ultimately, this can lead us all round an infinite staircase that need have no overall upward trend. Indeed, this is pretty much where we see the state of the wired in-ear monitor market these days. Third-party marketing sites that rely on advertising revenue, free samples, etc., have a vested interest in continuing this ad infinitum.

We consider ‘technical performance’ to be the last bastion of unsound, pernicious marketing and this is why we view the objective assessment of total waveform error as a critical future measurement for headphones.

Any waveform or ‘technical performance’ test is unlikely to be the discriminating factor when two headphone tunings are significantly different, but it might serve as a tie-breaker between two similarly tuned IEMs.

 

 

The Penrose Step Analogy

 

 

Summary

We believe we’ve provided strong evidence that headphones do indeed have a ‘technical capability’ – that is, they display varying levels of magnitude and phase error, present at levels far more significant than those of harmonic distortion. We have named this ‘total non-tonal error’, or NTE for short. Despite the name and intent of the test, a headphone’s NTE appears to be correlated with how smooth its frequency response is – in particular, with its departure from a flat frequency response. This is true with or without a minimum-phase equalization correction to the waveform, likely because, regardless of driver quality, certain phase errors simply cannot be untangled via DSP. This suggests that all headphones would benefit from a frequency response that avoids wild peaks and troughs in its amplitude vs frequency curve. It also points to the possibility of improved accuracy in future headphone designs through more reliance on DSP for adjusting amplitude and phase, and less reliance on sound shaping via driver crossovers, dampers and resonance chambers.

Further work would be needed to see if there is indeed a correlation between NTE and the various YouTube reviews that regularly tout headphone features such as ‘technical capability’. Incorporating psychoacoustic effects is likely a necessary next step given the size of the errors originating from existing headphone playback. Assigning any ‘technical capability’ score without verifiable and repeatable measurements and some demonstration of listener preference would run the risk of this being used (intentionally or otherwise) to hype overpriced, poorly tuned products. Controlled studies would be needed to avoid placebo and other psychological influences on subjective reviews. Consequently, we would urge that our existing NTE database be considered a beta feature and viewed with caution. A poor NTE score could definitely indicate a problem, but it could also indicate a phase error that might not be all that bothersome.

Finally, as the Harman research group pointed out years ago with frequency response, non-tonal error also appears to have no correlation with the manufacturer’s suggested retail price.
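
To see how a phase error alone can produce a poor waveform-error figure, consider the following minimal Python sketch. It is not the NTE calculation, which this section does not define. It assumes a two-tone test signal, a 90-degree phase shift applied to one tone, and a plain relative RMS error as the metric. The function names and the choice of 500 Hz and 1500 Hz are illustrative assumptions.

```python
import math


def waveform(phase_shift, n=4800, rate=48000):
    """Two equal-level tones (500 Hz and 1500 Hz).

    phase_shift (radians) is applied to the 1500 Hz tone only.
    n and rate are chosen so both tones complete whole cycles.
    """
    return [
        math.sin(2 * math.pi * 500 * t / rate)
        + math.sin(2 * math.pi * 1500 * t / rate + phase_shift)
        for t in range(n)
    ]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def relative_rms_error(reference, measured):
    """RMS of the sample-by-sample difference, relative to the reference RMS."""
    difference = [m - r for r, m in zip(reference, measured)]
    return rms(difference) / rms(reference)


reference = waveform(0.0)
shifted = waveform(math.pi / 2)  # same tone levels, one tone shifted by 90 degrees

error = relative_rms_error(reference, shifted)
print(f"overall level: {rms(reference):.3f} vs {rms(shifted):.3f}")
print(f"relative waveform error: {error:.2f}")
```

In this toy case both tones keep their level and the overall RMS level is unchanged, yet the relative waveform error comes out at roughly 1.0 (100%). Whether such a shift is audible is a separate, psychoacoustic question, which is why a raw waveform-error score needs cautious interpretation.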