24 May 2022

File format inflation

“Why do there have to be so many?” a friend asked me recently. All she wanted to do was save an image (simple, right?) but she faced a bewildering array of possible file formats: .png, .jpg, .gif, .tiff, .tga, and on and on. How’s anyone who’s not a file format expert meant to choose?

When file formats breed

Sometimes you need a new file format. Newer image file formats such as .webp, .avif and .apng give higher-quality images and animations at smaller file sizes.

With any technological development, however, it’s worth asking: who benefits?

Take the transition to the .webp image file format:

  • we all benefit from better-looking web browsing on our phones without blowing through our data plans;
  • browser developers, if they’re quick to support the new technology, benefit from appearing a step ahead of their competitors;
  • web developers, if they’re quick to adopt the new technology, also benefit, because their web sites will load faster, so Google will rank them higher, so they’ll get more traffic.

We could stop there: everyone benefits! But we’d be guilty of benefitism. Yep, that’s a made-up word. Benefitism is the surprisingly common vice of touting only the benefits of whatever it is you’re peddling, and wilfully ignoring the costs.

Who bears the costs?

If we’re to avoid benefitism, then as well as asking “who benefits?”, we need to ask “who bears the costs?”

  • browser developers bear the cost of supporting the new image file format – though for organizations the size of Mozilla, Google, Microsoft and Apple, that’s no big deal;
  • web developers, too, bear the cost of adopting the new image file format – this is a much bigger deal, since there are millions of developers around the world, each of whom has to learn about the new format, convert all their images to that new format and release updated versions of their web sites (if they don’t, their web sites will load no faster than before, so Google will rank them lower, so they’ll get less traffic);
  • for the rest of us, it may seem that there are only benefits, but in reality the web developers have no choice but to pass on their costs to us, in the form of higher prices or more annoying ads.

Never neglect the second-order effect

As my grandmother always used to sing:

Never neglect

the second-order effect

It was her favourite nursery rhyme.

Sorry, I’m degenerating from made-up words to made-up family heritage here. Get a grip, Mark.

The long-term effects of ever-increasing complication in technology include increased specialization, increased reliance on software and increased barriers to entry.

If there are only a few image file formats, every engineer and designer can understand them all. If there are too many image file formats, only specialists will understand them. This is how our knowledge of technology becomes fragmented, with legions of specialists understanding mere slivers of the technology stack, and no one able to grasp the big picture.

Of course, most of us don’t need to understand all the different image file formats: we can use software to save an image in any file format we choose. That lands us back where we started, with my friend facing a bewildering array of possible file formats, and asking: “Why do there have to be so many?”

It’s actually worse than that: the ever-increasing number of image file formats forces us to buy ever more complicated software at ever-shorter intervals, just so we can handle every image we might find on the web in some new file format or other.

Most pernicious of all are the increased barriers to entry. Back in the early days of the web, one person could develop a web browser in no time at all: Pei-Yuan Wei, for example, created a version of ViolaWWW for X terminals in 1990. Try doing that these days.

Don’t get me wrong: the advances in web browsers since those days have been extraordinary and extraordinarily beneficial.

I have no problem with the .webp image file format. It complicates things a little for us web developers, but that’s fine. It makes the web faster, and that’s good.

File format inflation under the radar

The worst instances of file format inflation, unlike the introduction of the .webp image file format, often fly under the radar.

I ran into one last week. I use elevation data from the US Geological Survey (USGS) to carve three-dimensional mountain maps out of wood.

Elevation data is simple: imagine a two-dimensional array of numbers, each of which is the elevation in metres at a particular point on the map. It’s no more complicated than that.

Traditionally, such data has been stored in a file format called GeoTIFF. It’s an unnecessarily complicated format for a simple array of numbers, but it has the advantage that it’s based on the TIFF image file format, which means that it can be opened in any image processing software.
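GeoTIFF is TIFF underneath, so a GeoTIFF file starts with the same short header as any other TIFF. That header contains a two-byte byte-order mark (“II” for little-endian, “MM” for big-endian) and the number 42. It ends with the offset of the first image file directory, which is where the tags describing the image live. The BigTIFF variant, used for very large files, uses 43 instead of 42 and has a longer header. The Python sketch below reads just that header. `read_tiff_header` is a hypothetical helper, and it deliberately ignores everything after the header, including all the geographic tags GeoTIFF adds.

```python
import struct


def read_tiff_header(data: bytes) -> dict:
    """Parse the fixed header at the start of a TIFF (and hence GeoTIFF) file.

    Hypothetical helper: returns the byte order, whether the file is BigTIFF,
    and the offset of the first image file directory (IFD).
    Raises ValueError if the bytes don't look like a TIFF header.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF header")
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel" order)
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola" order)
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    (version,) = struct.unpack(endian + "H", data[2:4])
    if version == 42:
        # Classic TIFF: a 4-byte offset to the first IFD follows.
        (ifd_offset,) = struct.unpack(endian + "I", data[4:8])
        bigtiff = False
    elif version == 43:
        # BigTIFF: offset size (always 8), a reserved zero, then an 8-byte offset.
        if len(data) < 16:
            raise ValueError("too short to be a BigTIFF header")
        offset_size, reserved = struct.unpack(endian + "HH", data[4:8])
        if offset_size != 8 or reserved != 0:
            raise ValueError("malformed BigTIFF header")
        (ifd_offset,) = struct.unpack(endian + "Q", data[8:16])
        bigtiff = True
    else:
        raise ValueError(f"not a TIFF file: version {version}")

    return {
        "byte_order": "little" if endian == "<" else "big",
        "bigtiff": bigtiff,
        "first_ifd_offset": ifd_offset,
    }
```

To check a file, read its first 16 bytes in binary mode and pass them in. For a hypothetical file called elevation.tif, that would be `read_tiff_header(f.read(16))` after `open("elevation.tif", "rb")`.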

Recently, the USGS ditched the GeoTIFF format in favour of Cloud Optimized GeoTIFF. The new format has the benefit that it supports downloading just part of a file, rather than the whole file. This can be useful, given how large elevation data files can be.
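For the technically curious, here’s roughly how that partial download works. A Cloud Optimized GeoTIFF stores its grid as internal tiles and keeps the tile layout near the start of the file. A client can therefore work out which tiles cover the area it wants and fetch only those bytes with HTTP range requests. The Python sketch below covers just that arithmetic. The function names are hypothetical, and the sketch makes three assumptions. First, it deals with a single-band image at full resolution, ignoring the lower-resolution overviews a COG often also contains. Second, the tile sizes have already been read from the file’s TileWidth and TileLength tags. Third, tiles are numbered left to right, top to bottom, as in the TIFF TileOffsets and TileByteCounts arrays.

```python
import math


def tiles_for_window(col_off, row_off, width, height,
                     image_width, image_height, tile_width, tile_height):
    """Return the indices of the internal tiles covering a pixel window.

    Assumes a single-band image whose tiles are numbered left to right,
    top to bottom (the order of the TileOffsets/TileByteCounts arrays).
    """
    if width <= 0 or height <= 0:
        raise ValueError("window must be non-empty")
    if (col_off < 0 or row_off < 0
            or col_off + width > image_width or row_off + height > image_height):
        raise ValueError("window extends outside the image")

    tiles_across = math.ceil(image_width / tile_width)
    first_col = col_off // tile_width
    last_col = (col_off + width - 1) // tile_width
    first_row = row_off // tile_height
    last_row = (row_off + height - 1) // tile_height
    return [row * tiles_across + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def range_header(offset, byte_count):
    """Value for an HTTP Range header requesting one tile's bytes.

    HTTP byte ranges are inclusive at both ends, hence the "- 1".
    """
    return f"bytes={offset}-{offset + byte_count - 1}"
```

Take a 2000 × 2000 grid with 512 × 512 tiles. A window 100 pixels wide starting at column 500 straddles the first two tiles, so the client would fetch only tiles 0 and 1. It sends one range request per tile, using that tile’s offset and byte count, and still has to decompress each tile it gets back.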

So, that all sounds good, right?

Well, yes, if you consider only the benefits.

The cost to me, personally, was having to shut down my map-making business while I figured out how to handle the new file format. Not only is Cloud Optimized GeoTIFF an even uglier format than GeoTIFF (which is uglier than TIFF, itself a pretty ugly format), but it no longer even has the advantage that it can be opened in any image processing software and converted to a less ugly format. None of the software I’ve previously used to handle GeoTIFF can handle Cloud Optimized GeoTIFF.

If I were cynical, I’d say that’s the whole point. Like all file format inflation, the move to Cloud Optimized GeoTIFF increases reliance on software, in this case the expensive mapping software that can handle Cloud Optimized GeoTIFF.

Cloud Optimized Capture

When the USGS first ditched the GeoTIFF format, I spent a couple of weeks rewriting my wood-carving software to handle a simpler format, GridFloat, which mercifully encodes elevation data as the simple array of numbers it is.
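To show just how simple GridFloat is, here’s a minimal Python sketch of reading it. It assumes the common ESRI-style layout: a small text header file (.hdr) with keys such as ncols, nrows and byteorder, plus a data file (.flt) of raw 32-bit floats stored row by row, from the top (north) row down. `read_gridfloat` is a hypothetical helper for illustration, not taken from any particular program. It leaves NODATA values as they are.

```python
import struct


def read_gridfloat(hdr_text: str, flt_bytes: bytes):
    """Parse a GridFloat header and its raw float data into rows of elevations.

    Hypothetical helper. Returns (header, grid), where header maps lowercased
    keys to their string values and grid is a list of rows, north to south,
    each running west to east.
    """
    header = {}
    for line in hdr_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            header[parts[0].lower()] = parts[1]

    ncols = int(header["ncols"])
    nrows = int(header["nrows"])
    # Assumption: treat a missing byteorder key as little-endian (LSBFIRST).
    byte_order = header.get("byteorder", "LSBFIRST").upper()
    endian = ">" if byte_order == "MSBFIRST" else "<"

    expected = ncols * nrows * 4  # 4 bytes per 32-bit float
    if len(flt_bytes) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(flt_bytes)}")

    # The whole file is one flat run of floats; slice it into rows.
    values = struct.unpack(f"{endian}{ncols * nrows}f", flt_bytes)
    grid = [list(values[r * ncols:(r + 1) * ncols]) for r in range(nrows)]
    return header, grid
```

The interesting part is the `struct.unpack` call. Everything after the short header parsing is just “read n floats and cut them into rows”, which is the point: the format stores the array of numbers and little else.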

Then, a couple of weeks ago, the GridFloat file format disappeared from the USGS web site, along with every other file format except Cloud Optimized GeoTIFF.

It’s as if Google, developer of the most popular web browser, Chrome, suddenly stopped supporting .jpg files, along with every other image file format except .webp. Every image on every web site, other than the few that have already adopted .webp, would vanish overnight.

My map-making business is too small for me to afford the expensive mapping software now needed to read the USGS data. I have since found a way to convert the data to a simpler format, but my business remained shut down for some time while I worked out that workaround.

If the impenetrable Cloud Optimized GeoTIFF had been the only file format available when I first started carving three-dimensional maps, I might never have got started. This file format inflation, at least, has increased barriers to entry for small map-making businesses like mine.

Never neglect the second-order effect.