The 10 paradoxes of HTML

Marcos Sandrini
12 min read · Nov 18, 2021

After getting some feedback on my main article about HTML (and on the CSS one), and after reading other articles that go roughly the same way, I noticed that there is a lot of mutual misunderstanding when people talk about HTML and argue about why it is good, bad, necessary or not. This may be directly influenced by the many paradoxical aspects of HTML that I have noticed, some of which were brought to my attention by one of those external articles, and even by the comments contesting it.

1. The design paradox

The few developers who master HTML in all its quirks and tiny caveats seem to enjoy blaming the supposed stupidity of the vast majority of developers, who are basically using HTML all wrong.

See, the fact that developers use HTML wrong, which is indeed verifiable, reminds me of many very different things that ultimately come down to the same principle:

  • Why compliance with writing the “right” HTML is so low;
  • Why JS frameworks are so popular;
  • Why we invented high-level languages (one may argue that low-level machine code can do everything, and potentially better);
  • Any other design vs. features story, such as why Apple’s iPad was massively successful when it first launched, despite its underwhelming specs (and its lack of Flash, pointed out as a capital sin at the time).

All of those are facets of why we should design things for humans, even if not by the hands of a “designer”. Very roughly speaking, one of the most important design principles is that things have to adapt to people, not the other way around. This applies to anything, including programming languages because, after all, programmers are humans, even if they try to behave as if they weren’t.

When we say HTML can do almost everything a content system for the Web requires, this is absolutely true. However, people who develop have lots of things to do and limited time and focus. As HTML is lax with mistakes (more about that later), most developers end up writing HTML code that severely lacks compliance. So much so that even the tools that relied on semantic information from HTML are moving on. Why? I have a possible explanation in my head:

  • First, semantic HTML was established with good practices for SEO and accessibility, but it was certainly too ambitious (and this may be the main paradox here): lots of different tags (or elements) and attributes for different situations, sometimes with overlapping or fuzzy definitions.
  • Developers, overwhelmed by the increasing complexity of the Frontend stack components (JS, CSS, HTML), massively failed to comply with HTML (which is the least critical of the three, as we’re going to address later), despite benefits like search engine optimisation.
  • Search engines (Google, mostly) started to see that, should they follow HTML semantics only, they would not deliver the best content to their users, hurting their business, as very few websites did everything right. So, they switched to an approach where their crawlers behave more like humans when grasping website content, ignoring a great deal of the HTML semantics on websites.

A minority of developers who dedicate themselves to being fully proficient in HTML (or who come from another era, when HTML was more of a necessity) call these “real-world devs” bad names because they don’t care to know the hundreds of tags and aria-* attributes and the subtle differences between them. As already implied, I cannot help thinking that HTML has an acquired design flaw, as it is too complex for its own good. More on that follows.

2. The affordability paradox

HTML was designed by Tim Berners-Lee, the creator of the Web itself, to be an affordable language that would allow everyone to build their pages on the Web. This was surely achieved at the beginning, in the early 1990s, but when the commercial Web boomed, its capabilities were extended via the extension of HTML and the creation of CSS and JavaScript. By then, non-developers without a lot of free time to spare, or at least some programming knowledge, got pushed out of web development, never to return.

This sounds like a failure, especially because both HTML and CSS arguably exist, instead of more “classical” programming structures, for the sake of non-developers. The bottom line is that most amateurs who used to build websites from scratch with native technology in 1995 could not do it anymore in 2002, let alone later.

Web standards could have been tweaked to target a non-developer audience primarily. Or, instead, they could have been rethought to fully aim at developers, ditching the need for HTML and CSS entirely and giving JS better rendering capabilities, for example. It seems, though, that the “standards people” settled on keeping HTML in this unaffordable state, where no one is its target audience: it is too hard for amateurs and too static for programmers.

3. The laxity paradox

HTML is not a true programming language, in the sense that you may write it totally wrong and it will still display things on the screen. Its primary concern is the content, although some tags also have an influence on the presentation (which is the task given to CSS). One can write a page header inside a <footer> tag and nothing immediately bad will happen: no error message shows up, the page does not get slower and no JS functionality stops working. In fact, one can even write content inside a tag that doesn’t exist at all, let’s say, <foo>.
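To make this laxity concrete, here is a minimal sketch in Python. It uses the standard library's html.parser, which is not what a browser runs but is similarly tolerant, to parse markup with a heading inside <footer> and the made-up <foo> tag. The class name TagCollector and the sample markup are assumptions made purely for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the parser encounters, known or not."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called for any start tag, whether or not it exists in HTML.
        self.tags.append(tag)


# Deliberately "wrong" markup: a page title inside <footer>,
# plus a tag that does not exist in HTML at all.
markup = "<footer><h1>Site title</h1></footer><foo>Some content</foo>"

collector = TagCollector()
collector.feed(markup)
collector.close()

print(collector.tags)  # ['footer', 'h1', 'foo']
```

The parser reports all three tags and raises no error, which mirrors what happens in a browser: the content simply shows up.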

Because of that laxity, a lot of developers just don’t care about getting things right. Again, this is not because of stupidity, but rather because of the lack of rewards. If there is no strong, compelling reason for most of these devs to do something right, it is not realistic to expect them to do it, especially in a universe as inherently complex as Frontend development, where you also have CSS and JS and each of them alone would be complex enough.

Even in the early 2000s, when JS was less of a focus and wrong tags could severely impact a website’s placement in Google results, compliance was not much higher than today. Instead, many companies relied on search engine optimisation consultants, who knew that the rules were little more than HTML and knew how to apply them. A classic case of difficulties generating new kinds of jobs.

Now, with search rules that are much more organic, the major reasons for compliance are smaller search engines and especially accessibility, because Google can afford AI technology that reads the page as humans do, but screen readers are not up to that level yet. I dare to say, though, regardless of what would ideally be best, that every single service that relies on HTML semantics will eventually have to change, because the bulk of developers are not going to comply anytime soon and most websites will keep having lots of divs and spans.

4. The compatibility paradox

All HTML versions are meant to be backward compatible, and that sounds fine. However, in the case of this markup, which has been rethought and repurposed (starting as an easy way to write text with links and then expanded ad infinitum), this potentially brings a lot of issues.

For whoever is going to build actual browser engines, old tags are a burden one has to carry in order to keep the Web historically accessible. This, by the way, is not a bad thing at all, but it means that browsers of 2021 must render deprecated tags from as far back as the early 1990s and, while rendering them to maintain compatibility, must at the same time deprecate them somehow. In practical terms, this deprecation means almost nothing, as the tags must continue working because of older websites.

This hesitant and ineffective “deprecation diplomacy” also creates confusion for developers, who suddenly discover that they can write deprecated tags and that the functionality will sometimes still be there. What will an inexperienced developer do if they find out a given deprecated tag works? Well, not only may they think it is a totally valid tag, as there is no difference in the result, but they may also just keep it there regardless, pushing the confusion into the future.
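Since nothing in the rendered result gives deprecated tags away, spotting them takes a deliberate check. The following is a rough sketch of one, again using Python's standard html.parser. The set OBSOLETE_TAGS is a small, non-exhaustive sample of elements marked obsolete in the HTML standard, and the names ObsoleteTagFinder and find_obsolete_tags are hypothetical helpers, not part of any real tool.

```python
from html.parser import HTMLParser

# A small, non-exhaustive sample of obsolete elements (assumption:
# a real check would use the full list from the HTML standard).
OBSOLETE_TAGS = {"center", "font", "marquee", "big", "strike", "tt"}


class ObsoleteTagFinder(HTMLParser):
    """Collects the obsolete start tags found in a piece of markup."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in OBSOLETE_TAGS:
            self.found.append(tag)


def find_obsolete_tags(markup):
    """Return the obsolete tags in markup, in document order."""
    finder = ObsoleteTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


sample = '<center><font color="red">Welcome!</font></center><p>Hello</p>'
print(find_obsolete_tags(sample))  # ['center', 'font']
```

A browser would render the sample without complaint, which is exactly why a developer may never learn that <center> and <font> should not be there.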

5. The “separation of concerns” paradox

At some point, all presentation-related tags were deemed deprecated and “removed” from HTML (although not really removed, as stated above). CSS was created to bear all presentation, but all elements, even those without meaningful content, would still have to be created in HTML to be styled in CSS afterwards. Even with what are called “pseudo-elements” and the non-semantic tags (the super popular div and span), the whole integration of HTML and CSS is clumsy enough to make people wonder what to do when they want to do things right.

Aside from that, for reasons already discussed, most developers couldn’t care less about separation of concerns. Successful UI libraries like Bootstrap, Foundation, Bulma and even more “extreme” ones like Tailwind openly defy the concept of separation of concerns in favour of a more practical approach, which seems to have a lot of support from developers, judging by how common this sort of library is today, perhaps even to the point of being the current standard.

6. The success measurement paradox

Flash, Java Applets, ActiveX: many technologies that were supposed to be adopted on websites and perhaps even replace HTML are now dead. For the people who stand for HTML like a religion, this is undeniable proof that, despite the hate, HTML is better than all other alternatives and it will, because of that, stick with us for a long time.

I think HTML will indeed stick with us for a long time. The reason for that, though, is less about its “success” and more about the way web standards work. The Web was designed to be decentralised, which is great per se, and even with companies like Google and Microsoft dominating the browser market, the standards were never subject to imposition regarding the big picture, at least not in the last 20 years.

As part of this decentralised design, there is an elected committee that tries to balance the power among the biggest companies and institutions with tight relations to the Web. This committee (W3C) decides things on the Web and, for reasons natural to such structures, decisions tend to be very conservative.

Committees like this only approve changes by an overwhelming majority, which is rarely achieved for more significant changes. Also, they will never want to bear the weight of “killing” their own product, so they end up inadvertently holding things still as much as possible, which is easier in the case of HTML, a markup language. Besides, suggesting something to replace HTML, or even something that would run in parallel, would be a truly herculean task, involving lots of politics and the generalised fear of generating a new failing standard, which could put the very committee in jeopardy.

This is part of the reason why we are still on HTML 5, a standard whose first public draft dates from 2008, and why the Web, even heavily skewed towards app building, still requires us to have HTML “pages”, even if those serve the sole purpose of calling some JavaScript that will do pretty much everything.

Of course, HTML could have been worse, much worse. In that case, the implications would be so deep that we might even have ended up with something other than the current Web, doing the same job with a different structure. Regardless, for all that has been said (and for all that will be said ahead), it is a gigantic stretch to state that HTML is successful and that this success is the reason why it is still there, untouched.

7. The fuzziness paradox

HTML is a markup language composed of well over a hundred tags, whose usage is sometimes not entirely clear, and of attributes to those tags that are also often open to interpretation. The interaction between the tags and attributes is also subject to many rules. All of this brings a good deal of uncertainty to what writing HTML means.

I have seen many times (and still do from time to time) people misusing tags based on perfectly plausible assumptions, like thinking one can only use header and footer at the top level and once per page, or thinking that, to use an h4, for example, one has to use h1, h2 and h3 first. Those are just a tiny fraction of what it is possible to get wrong about HTML, which is a lot. Most of the time, pressed by the circumstances and encouraged by the fact that anything “works” in HTML, people don’t bother reading the docs and just move on, trusting their own assumptions.

One can always read the documentation, but there are so many tags, attributes and interactions that the documentation ends up being extremely lengthy, which itself contributes to the low compliance with the “right” HTML we already discussed.

8. The content ownership paradox

HTML can be considered, because of the point above, to be somewhat open to interpretation. Even if this is not entirely true because there is an objective way to use tags according to the content, it relies on some degree of content classification, which is not objective. This is a big red flag from a programmers’ perspective: subjectivity is noise for a programmer.

It gets worse because the content classification itself is part of the content, even if a somewhat “hidden” part. This means to me that HTML ideally belongs to writers and content creators, and perhaps that was the original intention when separation of concerns was brought to the Web.

Nonetheless, in practice, HTML is always entirely in the hands of developers, who have neither the interest in nor the proper skills for content classification, or content in general. So, as the people who should handle HTML cannot handle it because it is too hard, it ends up in the hands of people who don’t care about its core business. After almost 25 years of a development career, I have yet to see people who are paid to write and maintain content getting their hands on HTML (or designers taking care of CSS, for that matter).

9. The focus paradox

Because of the point above, we fall into yet another paradoxical case. As the content owners are distant from the actual “content code”, which is HTML, the developers’ use of it ends up being mostly very shallow.

We have to use HTML, even if just a tiny bit, to develop anything for the Web, even the most abstract non-textual application. Generally speaking though, regarding all aspects of web-related development, HTML is perhaps the one with the least focus and concern.

Not only do developers not care about tagging the content precisely, but there is also so much to learn about HTML, and it is so convoluted, that people give up digging deeply into it and go for easier routes. Those can include using mostly or only non-semantic tags, or using just a small subset of its long arsenal of tags.
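How heavily a page leans on non-semantic tags can be roughly measured. The sketch below, in Python with the standard html.parser, computes the share of start tags in a document that are div or span. The names TagCounter and non_semantic_share and the sample markup are hypothetical, and the metric itself is just one crude assumption about how to quantify the habit.

```python
from collections import Counter
from html.parser import HTMLParser

# The two generic containers that carry no semantic meaning.
NON_SEMANTIC = {"div", "span"}


class TagCounter(HTMLParser):
    """Counts how many times each start tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def non_semantic_share(markup):
    """Return the fraction of start tags that are div or span."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    total = sum(counter.counts.values())
    if total == 0:
        return 0.0
    generic = sum(counter.counts[tag] for tag in NON_SEMANTIC)
    return generic / total


sample = "<div><div><span>Title</span></div><p>Text</p></div>"
print(non_semantic_share(sample))  # 0.75
```

In the sample, three of the four tags are generic containers, so what is visibly a title carries no semantic information at all.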

Maybe HTML could have benefited immensely from a more generic approach, something akin to JSON, a very simple and very open format, aimed at being easy to comply with, also with some smart degree of expressiveness that could take into account differences in the content regarding types, importance and even language. Language?

10. The language paradox

This is a point a lot of people (especially native or fluent English speakers) will dismiss as a minor hassle, but I see it as a very important point that helps explain why the vast majority of HTML written is not compliant.

Statistics vary, but one optimistic source states that 13% of the world speaks English, whatever “speak” means in this context. The percentage may be much higher among developers worldwide, but I wonder if the majority of developers are fully aware of the many subtleties of the English language, some of which are necessary for a better understanding of HTML.

For a programmer working with a mainstream language or even JS, a lack of English knowledge is not a deal-breaker. One can learn some keywords like if, some built-in functions like slice or Math.round and that’s it, no need to really speak or read English well, or even to know its basic grammar, as long as the documentation is in their native language, which is usual.

However, HTML ideally requires more. To be able to match the site content with the proper tags for it (which is subjective, as said in an earlier point), one has to have some notion of English. One can argue that a function like slice and a tag like section fall into the same case, but I object to that: the subjectivity in HTML requires knowledge of English, even if not advanced, to match the type of content with the tags and attributes.

There are other problems too: first, the sheer number of tags and attributes and, second, the differences between them, sometimes fuzzy or just misleading, which make even a fluent English speaker scratch their head.

Conclusion

In the end, many of HTML’s issues come down to this dissonance between its complexity and its importance in the mind of most developers. This, together with its inherent error tolerance, makes for a situation of very low compliance overall. Outside of the committees and the closed guilds of a few experts, down in the real world of web development, everyone uses HTML because they must, but no one is obliged to use it right and very few people actually aim to use it right because, as we said, its design doesn’t help.

Right now, we neither have any alternatives to it nor do people care much about having one, given that the current Frontend development standards usually put HTML in a secondary place, almost as a nuisance to be handled without special care, somewhat distantly.

HTML would have benefited a lot from simplification, which would make it fit its current secondary role. In fact, as most developers just choose to ignore its complexity and use its bare minimum, and as tools like search engines follow this trend and also ignore its semantic information, we are in a situation where HTML is already used as a subset of itself, no matter how wrong this may sound to the very few who swim against the tide.

Marcos Sandrini

Designer and front-end programmer with 20+ years of experience, also a keen observer of the world and its people