Digital media standards for improved user experience


Standards are a vital component of communication. In 1984, while working at CSELT, the research center of what is today Telecom Italia, I submitted RACE IVICO (Integrated Video Codec), a project aimed at developing a European microelectronic technology for digital video in partnership with representatives of the most relevant European industries. The project was approved but was terminated two years later, partly because of the jarring differences with the European audio-visual policy of the time (which planned for digital audio-video to play a role in the first decade of the 21st century), but also because digital audio-visual was assumed to play a major role in the broadband strategy inside a Telco like Telecom Italia.

A year later, seeing that it was not possible to develop a European microelectronic technology for audio-visual, I decided that an international standard could at least be developed and, in 1988, I established the Moving Picture Experts Group (MPEG), a working group in ISO/IEC JTC 1 (Information Technology).

“Standard” is a well-used and misused word, but not all standards are the same. In the case of audio and video, there should not be separate standards for audio and for video, but unified audio-visual standards serving all client industries (Broadcasting, Consumer Electronics, IT and Telecommunications): a single standard for the digital representation of audio-visual information, separate from potentially different delivery standards. I believe this is the main reason for the success of MPEG standards.

Several definitions of standard can be found:

  • Webster’s
    • A conspicuous object (as a banner) formerly carried at the top of a pole and used to mark a rallying point especially in battle or to serve as an emblem
    • Something that is established by authority, custom or general consent as a model or an example to be followed
  • Encyclopaedia Britannica
    • (Technical specification) that permits large production runs of component parts that are readily fitted to other parts without adjustment
  • My definition
    • Codified agreement between parties who recognise the advantage by all doing certain things in the same way

One common claim made against standards is that they are anti-competitive and stop innovation. This may be true in other fields but not in MPEG, as can be seen from the performance verification tests carried out on MPEG-2 Video in 1995, which showed that coding was subjectively transparent at 6 Mbit/s for composite (PAL) video and at 8 Mbit/s for component (YUV) video. The bitrate originally selected for operation was 4 Mbit/s, but today MPEG-2 is typically used at 2 Mbit/s, with the same decoders.

This was possible because MPEG standards specify the decoder (which provides the ability to reach customers) but are silent on the encoder, whose only constraint is the ability to produce conforming bitstreams.
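A toy illustration of this decoder/encoder split, as a minimal Python sketch under stated assumptions: the run-length "format" below is hypothetical and invented for illustration, not MPEG syntax. Only `decode` plays the role of the normative part; `encode_naive` and `encode_runs` are two different, equally conforming encoders, and the smarter one reaches a far lower bitrate without any change to the decoder.

```python
def decode(bitstream: bytes) -> bytes:
    """Normative decoder of the hypothetical toy format: a sequence of (count, value) byte pairs."""
    if len(bitstream) % 2:
        raise ValueError("non-conforming bitstream: odd length")
    out = bytearray()
    for i in range(0, len(bitstream), 2):
        count, value = bitstream[i], bitstream[i + 1]
        if count == 0:
            raise ValueError("non-conforming bitstream: zero run length")
        out += bytes([value]) * count
    return bytes(out)


def encode_naive(samples: bytes) -> bytes:
    """A simple conforming encoder: one (1, value) pair per sample."""
    out = bytearray()
    for v in samples:
        out += bytes([1, v])
    return bytes(out)


def encode_runs(samples: bytes) -> bytes:
    """A smarter conforming encoder: merges runs of equal samples (at most 255 per pair)."""
    out = bytearray()
    i = 0
    while i < len(samples):
        v = samples[i]
        run = 1
        while i + run < len(samples) and samples[i + run] == v and run < 255:
            run += 1
        out += bytes([run, v])
        i += run
    return bytes(out)


signal = bytes([7] * 300 + [3] * 5)
for encoder in (encode_naive, encode_runs):
    stream = encoder(signal)
    assert decode(stream) == signal  # the same, unchanged decoder accepts both streams
    print(encoder.__name__, len(stream), "bytes")
```

Running it prints 610 bytes for the naive encoder and 6 bytes for the run-merging one; both streams decode to the identical signal.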

Standards are an important component in the chain that brings innovation to consumers. An innovator is in a position to file a patent that has value per se, but the patent has greater value if it is within a standard. Since the goal of MPEG is to produce standards yielding maximum performance, licences are typically required to exercise MPEG standards. Royalties allow an innovator to continue innovating and filing other patents for use in new standards. Indeed, MPEG standards do not stop innovation.

Many industrial users are concerned about the amount they have to pay to exercise patents in a standard, but that should not necessarily be their highest concern, because often it is not so much a matter of “how much” as of “how”.

In the analogue world, patent remuneration used to be typically “per piece”. In the digital world of MPEG-2, remuneration is still per piece of “electronics” (but also per piece of “content” on a DVD). In the digital world of MPEG-4 Visual, remuneration is per piece of electronics but also per hour of streamed pay content. For years, this licence clause prevented the adoption of the standard for pay video services on the web.

Use of digital technologies was hampered for many years by the large bitrates involved in digital audio and video, as shown by the tables below, which give indicative bitrates:




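[Table: indicative bitrates of digital video. Only the frame-frequency row is recoverable: 25, 25, 25, 50 and 50 Hz across five formats.]

[Table: indicative bitrates of digital audio. Only the sampling-frequency row is recoverable: 8, 44.1, 48, 48 and 48 kHz across five formats.]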
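Since most of the figures in the tables above are not recoverable, here is a hedged sketch of how such indicative uncompressed bitrates are obtained. The example formats are assumptions chosen for illustration, not the article's columns: standard-definition video with a 720×576 active picture at 25 Hz and 4:2:2 8-bit sampling (16 bits per pixel on average), and CD-quality stereo audio at 44.1 kHz with 16-bit samples.

```python
def video_bitrate(width: int, height: int, bits_per_pixel: float, frame_rate: float) -> float:
    """Uncompressed video bitrate in bit/s (active picture only, no blanking)."""
    return width * height * bits_per_pixel * frame_rate


def audio_bitrate(sampling_rate_hz: float, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM audio bitrate in bit/s."""
    return sampling_rate_hz * bits_per_sample * channels


# Assumed example formats (not taken from the article's tables):
sd_video = video_bitrate(720, 576, 16, 25)  # 4:2:2, 8 bit -> 16 bit/pixel on average
cd_audio = audio_bitrate(44_100, 16, 2)     # CD-quality stereo

print(f"SD video: {sd_video / 1e6:.1f} Mbit/s")  # SD video: 165.9 Mbit/s
print(f"CD audio: {cd_audio / 1e6:.4f} Mbit/s")  # CD audio: 1.4112 Mbit/s
```

Against the 4 Mbit/s originally targeted by MPEG-2, the assumed SD figure implies a compression ratio of roughly 40:1.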


Fortunately, there has been constant progress in compressing digital audio and video while preserving the original quality, as shown by the table below:

| | Base | Scalable | Stereo | Depth | Selectable viewpoint |
|---|---|---|---|---|---|
| MPEG-4 Visual | -25% | -10% | -15% | - | - |
| MPEG-4 AVC | -30% | -25% | -25% | -20% | 5/10% |

In the “Base” column, percentage numbers refer to compression improvement compared to the previous generation of compression technology. The percentage numbers in the “Scalable”, “Stereo” and “Depth” columns refer to compression improvement compared to the technology on the immediate left. “Selectable viewpoint” refers to the ability to select and view an image from a viewpoint that was not transmitted.
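Assuming each “Base” percentage is a bitrate change relative to the generation before it, the reductions compound multiplicatively rather than adding up. A minimal Python sketch using only the two rows that survive:

```python
from math import prod

# "Base" column, read (as an assumption) as the bitrate change versus the previous generation
base_reduction = {"MPEG-4 Visual": -0.25, "MPEG-4 AVC": -0.30}

# Successive reductions multiply: (1 - 0.25) * (1 - 0.30)
relative_bitrate = prod(1 + r for r in base_reduction.values())
print(f"AVC needs {relative_bitrate:.1%} of the bitrate of the generation before MPEG-4 Visual")
```

Under that reading, MPEG-4 AVC needs about 52.5% of the bitrate of the generation preceding MPEG-4 Visual, i.e. a 47.5% reduction rather than the 55% obtained by simply adding the two percentages.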

In this context, it is interesting to estimate the bitrate between the eye or ear and the brain. There are about 1.2 million nerve fibres connecting the retina to the brain and about 30 thousand nerve fibres in the cochlear nerve connecting the ear to the brain. One nerve fibre can transmit a new impulse every ~6 ms, i.e. it can generate ~160 spikes/s. Assuming that 16 spikes are needed to make a bit, one eye sends ~12 Mbit/s and one ear sends ~300 kbit/s to the brain, as depicted in the figure below:
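The estimate above can be checked with a few lines of Python. The figures (fibre counts, ~160 spikes/s, 16 spikes per bit) are the article's own estimates; the function name `nerve_bitrate` is just illustrative.

```python
def nerve_bitrate(fibres: float, spikes_per_second: float = 160, spikes_per_bit: float = 16) -> float:
    """Estimated bit/s carried by a bundle of nerve fibres, using the article's assumptions."""
    return fibres * spikes_per_second / spikes_per_bit


FIBRES = {"eye (optic nerve)": 1.2e6, "ear (cochlear nerve)": 30e3}

for organ, fibres in FIBRES.items():
    print(f"{organ}: {nerve_bitrate(fibres) / 1e3:,.0f} kbit/s")
# eye (optic nerve): 12,000 kbit/s
# ear (cochlear nerve): 300 kbit/s
```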

Video can take many forms:

  • Scalable video gives the possibility to extract different streams at different bitrates from a single bitstream
  • Multiview video is a signal that is generated by an array of cameras capturing the scene so that a user can see a scene from different viewpoints (possibly by interpolating existing views to create a view that was not captured and transmitted)
  • Screen content is a type of natural video that is mixed with graphics
  • High Dynamic Range seeks to extend the maximum brightness achievable on displays beyond today’s usual 100 nits (cd/m²) to several thousand nits
  • Wide Colour Gamut is a system capable of reproducing a much larger set of colours than is possible today
  • Augmented Reality is the integration of 3D natural and synthetic video and audio (and more)
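For the scalable case, a minimal Python sketch of sub-bitstream extraction, under loudly stated assumptions: the `Packet` structure, its `layer` field and the layer sizes are hypothetical (real scalable codecs signal layer membership in their own bitstream syntax). The point is only that lower-bitrate streams are obtained by dropping enhancement-layer packets, without re-encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    layer: int      # hypothetical: 0 = base layer, higher values = enhancement layers
    size_bits: int


def extract(stream: list[Packet], max_layer: int) -> list[Packet]:
    """Keep the base layer plus enhancement layers up to max_layer; drop the rest."""
    return [p for p in stream if p.layer <= max_layer]


def bitrate(stream: list[Packet], duration_s: float) -> float:
    """Average bitrate in bit/s of a packet sequence spanning duration_s seconds."""
    return sum(p.size_bits for p in stream) / duration_s


# Hypothetical one-second stream: 25 frames, each with a base packet and two enhancement packets
stream = [
    Packet(layer, size)
    for _ in range(25)
    for layer, size in ((0, 40_000), (1, 40_000), (2, 80_000))
]

for max_layer in range(3):
    print(max_layer, bitrate(extract(stream, max_layer), 1.0) / 1e6, "Mbit/s")
```

With these assumed sizes, extracting the base layer alone, layers up to 1 and layers up to 2 yields 1, 2 and 4 Mbit/s respectively, all from the same bitstream.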

We have seen that human eyes perform sophisticated processing to convert Pbit/s of visual input information to some output Mbit/s. Compact Descriptors for Visual Search (CDVS), a standard for video search, analysis and detection applications being developed by MPEG, tries to do something conceptually similar. Applications for this standard are manifold and extend to mobile, automotive, SmartTV, surveillance, equipment maintenance, robotics, infomobility, tourism services, cultural heritage etc.
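To make the idea of compact descriptors concrete, here is a toy Python illustration. It is explicitly not the CDVS algorithm (whose descriptor extraction and compression are far more elaborate); it only shows the general principle of reducing each feature to a single bit and comparing descriptors with the Hamming distance. All names and the synthetic data are hypothetical.

```python
import random


def compact_descriptor(features: list[float]) -> int:
    """Toy compaction: keep only the sign of each feature, packed into an int (1 bit per feature)."""
    code = 0
    for i, x in enumerate(features):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed descriptors."""
    return bin(a ^ b).count("1")


random.seed(0)
DIM = 256
query = [random.gauss(0, 1) for _ in range(DIM)]
same_object = [x + random.gauss(0, 0.3) for x in query]  # noisy view of the same object
other_object = [random.gauss(0, 1) for _ in range(DIM)]  # unrelated object

q = compact_descriptor(query)
print("same object: ", hamming(q, compact_descriptor(same_object)))
print("other object:", hamming(q, compact_descriptor(other_object)))  # about DIM/2 on average
```

Here a 256-value floating-point feature vector (8,192 bits at 32 bits per value) shrinks to a 256-bit code, a 32:1 reduction, and the same-object distance should still come out far smaller than the unrelated one.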

On 10 June, from 14:00 to 17:00 at Via Sannio 2, Milan, UNI, the Italian standardisation body, will host an event titled “ISO/IEC artificial vision standards for new services and industrial applications”, organised by UNINFO, the entity federated with UNI and delegated to handle Information Technologies and their applications.

In conclusion, it is important to remember that standards are (just) enablers: the real issue is how to benefit from them. To address this question, we should ask ourselves whether Italy is able to

  • exploit the Intellectual Property of standards;
  • capitalise on the standards-enabled (hardware and software) manufacturing;
  • have a holistic view of the entire process.

I suggest having a look at how Digital Media in Italia endeavoured to “define and propose action areas that enable Italy to acquire a primary role in the exploitation of the global digital media phenomenon”.
