Digital Web Magazine

The web professional's online magazine of choice.

HTML5, XHTML2, and the Future of the Web : Comments

By David "liorean" Andersson

April 10, 2007

Comments

David Hammond

April 10, 2007 10:58 AM

This post has so many errors in it. You keep saying HTML 1.0 when you mean XHTML 1.0.

Tiff Fehr

April 10, 2007 11:20 AM

Okay, that was totally my bad. Search-n-replace gone amok. For the record, Liorean’s original article was correctly notated as XHTML 1.0, and I’m the one who bungled the versions. It’s now corrected. I apologize for the error. It seems I’m one of many who could benefit from an in-depth read of the article, too, to get my versions and spec history straight.

Thanks for the alert, Mr. Hammond! Again, my apologies.

Erland Flaten

April 10, 2007 2:02 PM

Interesting post. I feel that this could be the Second World War of browsers, if the W3C works on several standards and different browsers support a little bit here and a little bit there. Do I have reason to fear? Maybe I just don’t understand everything here.

heart

April 10, 2007 10:28 PM

Well! I don’t think XHTML2 will be used anytime soon.
Maybe HTML5 stands a chance since it offers backward compatibility.

Jonny Sweetman

April 11, 2007 1:28 AM

I agree with Erland, sounds like a recipe for more browser inconsistencies! It also sounds to me like two different groups are duplicating their efforts. Perhaps the goals are slightly different, but it will be interesting to see how XHTML2 and HTML5 will be integrated by browsers in the future.

Simone D.

April 11, 2007 1:50 AM

A really interesting article.
I think it is clear that in the future we will use more and more browsers. Why doesn’t Microsoft feel the need to support new standards that would be useful for itself?
I can’t understand that.

Alex Jones

April 11, 2007 2:58 AM

Good read, but I think you’re way off with putting “draconian error handling” down as a bad point of XHTML. Yes, it is an inherent characteristic of XML, but that can only be a good thing, as it keeps implementation extremely simple.

Personally, I don’t have any problems XHTMLizing untrusted content from Atom/RSS feeds or end-user provided content. Unfortunately, most people’s thinking in this regard doesn’t extend beyond “echo $post;”.

d3a1i0

April 11, 2007 3:49 AM

Great read! I can’t believe that MS is actually working on a next gen standard with the W3C. I thought that they hated each other or something since IE has never seemed to be interested in conforming to any standards. I also agree with Jonny where it sounds like two groups are duplicating their efforts, but that is not really a new thing when it comes to developing new technologies.

mattur

April 11, 2007 5:53 AM

Great article.

Perhaps the goals are slightly different, but it will be interesting to see how XHTML2 and HTML5 will be integrated by browsers in the future.

XHTML2 will never be supported by browsers. The continuation of XHTML2 development is solely a face-saving exercise for the W3C old guard involved. XHTML2 is irrelevant, and has been ever since Zeldman published his infamous Skyfall article in 2003:

“[XHTML2] may have tremendous benefits, but I

Adnan Ali

April 11, 2007 6:18 AM

XHTML2 was not a bad idea. It just happens that there are a lot of stumbling blocks in its implementation. As the article states, the worst fear is that XHTML2 documents will collapse in the browser at the slightest hint of an error. That error can come from a page we gather information from or from any other third party.

HTML5, on the other hand, adds functionality like drag & drop and is more focused on the transition of the World Wide Web into an application platform, while still keeping backward compatibility.

Also, it would be prudent for the W3C to settle on one standard, preferably HTML5. Developers like to work with IDEs because they allow for flexible yet speedy work. If there are multiple standards, IDEs would not perform very well either.

Ric

April 11, 2007 7:21 AM

I have to say, some of the article is questionable at best, laughable at worst. I’m sorry, but to try and introduce a standard called WA1 for web applications is just plain silly. We already have standards laid out in XHTML and the DOM, so why add another level of obfuscation around both of them? Giving it an almost identical name to the accessibility specification WAI simply shows a lack of knowledge of the client side.

stelt

April 11, 2007 7:57 AM

XML rules, IE sucks
therefore
get XML-ready, ditch IE
is the way to go.

IE could even have XHTML be its savior:
http://www.oreillynet.com/xml/blog/2006/05/why_xhtml_can_save_internet_ex.html
If not, less and less care.

chris adams

April 11, 2007 8:59 AM

No offense, but stop pulling an Apple iPhone and talking about something that is not even being used.
I do not get why people like you write these articles about things that I cannot even start to learn.
At least Ajax is real.

j.p.

April 11, 2007 9:57 AM

HTML5 will legitimize bad markup. Web browsers will forevermore have to support the FONT and MARQUEE tags.

HTML5 will mean the death of Web standards. Web standards are about writing device independent, formatting free and accessible markup. HTML5 is the opposite of Web standards:

Tag soup – OK!
FONT and MARQUEE – OK!
Block elements inside inline elements – OK!
Incorrect sequence/nesting of H1 to H6 – OK!

Bye, bye Web standards!!!

Matthias

April 11, 2007 10:11 AM

Now…
can anyone explain in simple terms why serving xml as text content-type is so bad? I’ve seen different angles of the same discussion so many times, but something in my head doesn’t allow me to understand the core of the problem. Yes, IE doesn’t do it, that’s wrong, but that’s not what I don’t understand. Why is content-type an issue? Please don’t just post a link, a simple comparison with a real life situation or a paragraph would be much nicer.
Like: I send you a letter. I’d imagine the content-type would be something on the envelope, saying “there’s XML in here”, and the doctype something like the letterhead, saying “the stuff below is this kind of xml, here’s a dtd as well”. Where am I wrong?

j.p.

April 11, 2007 10:20 AM

Matthias, here goes….

When you send a letter, on the envelope you put the destination address. If the letter is being delivered by XML post office, the address must be absolutely correct. If the letter is sent through HTML post office, the address can be incorrect and the post office will make a best effort guess where it needs to be delivered.

So, back to the Web…. When content is served as XML, it needs to be well-formed. When content is served as HTML and there are errors in the markup, the browsers have to guess and auto-fix the markup. Sometimes they guess correctly and sometimes they don’t.

HTML5 is in part trying to make sure that all browsers “guess” or auto-fix in the same way.
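
The difference between the two "post offices" can be seen in a minimal Python sketch. It uses only the standard library; the sample markup and the function names (`parse_strict`, `parse_lenient`, `TagCollector`) are made up for illustration. Note that `html.parser` is only a forgiving tokenizer, not an implementation of the HTML5 error-recovery rules, so this shows the contrast in attitude rather than what any browser actually does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative tag soup: the <b> element is never closed.
MARKUP = "<p>Hello <b>world</p>"


def parse_strict(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False if it rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Draconian handling: one well-formedness error rejects the whole document.
        return False


class TagCollector(HTMLParser):
    """Lenient tokenizer that records start tags and keeps going on bad markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_lenient(markup: str) -> list:
    """Return the start tags seen; unbalanced markup does not raise."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

With the sample above, the strict parser refuses the document outright, while the lenient one still reports the `p` and `b` tags and leaves it to the caller to decide what the author meant.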

j.p.

April 11, 2007 10:26 AM

HTML5 is in part trying to make sure that all browsers “guess” or auto-fix in the same way.

Oh, it’s a futile effort. Web browser vendors can’t even agree on the correct behavior of the OBJECT element or the use of ALT text, or even support CSS in the same way. They will never agree on or implement something as complex as error handling the same way. HTML5 is a futile effort.

Phillip Rhodes

April 11, 2007 10:43 AM

The last thing the Web needs is HTML 5. HTML 5 just perpetuates mistakes that were made in the past, and keeps us from moving forward to XML based formats. Let’s get XHTML2 finished and leave this legacy HTML crap to the dustbins of history.

David "liorean" Andersson

April 11, 2007 3:32 PM

Erland Flaten: I think the future is pretty clear. The HTML WG has no realistic choice other than adopting most of the HTML5 spec that WhatWG have produced so far. Most likely they will indeed be the same actual document, with Ian Hickson as editor. Microsoft made it clear that it CANNOT change ANY of the current parsing in IE. IE needs an opt-in so that users can say “this document is not built for the bugs in IE4-IE7, so use the best you can give me instead”. HTML WG will make HTML5, including a new STANDARD tag-soup parsing mode based on current browsers, a new HTML DOM and an XML parsing mode.

As for many standards, one of them will win out with browsers, though all of them likely will find SOME use. That one is HTML5. Browser vendors don’t really have a choice. Bugs don’t get fixed on the web, and browsers have to be able to handle the web.

Jonny Sweetman: I don’t think the future is bleak at all. All major browser vendors have a strong interest in making HTML5 good and not splitting it up. And the groups may not be exactly the same, but Ian Hickson has made it clear he will keep the WhatWG spec a strict superset of whatever comes out of the HTML WG. As for XHTML2 vs. HTML5, the victor was decided the moment the W3C chose to recharter the HTML WG. That is the closest the W3C will ever come to saying “we were wrong, XHTML2 isn’t what people need, we’ll make amends”.

Alex Jones: No, draconian error handling is not only good. It’s not only bad either, to be fair, it’s just something that doesn’t fit into the current web. But I’ve made all those arguments before, for both sides…

j.p: The reason the object element is incompatible is the implementation Microsoft made. But remember that particular implementation has remained almost unchanged since before IE overtook Netscape in usage numbers. Microsoft is not prepared to break current web content without substantial financial threat in case it doesn’t. It is, however, prepared to improve in ways that won’t break current web content. IE7 and its slightly better handling of fallback content is a good example.

Now, I don’t think HTML5 is futile – all browser vendors are in the game now and they will eventually find a solution. It won’t be the most elegant solution. HTML5 will by necessity carry loads of baggage – cruft from HTML2, HTML3.2, HTML4.01 and from every major legacy browser since NCSA Mosaic. If things don’t work across different browsers then the browsers not behaving like IE either haven’t got many users or their developers haven’t seen a need (from users) to make sure they are interoperable with IE, probably because that feature is little used on the actual web or is used in a way that provides a good fallback behaviour.

But HTML5 will have several different conformance levels – HTML5 browsers will need to handle much content that is not legal in HTML5 documents. HTML5 standardises how browsers should handle ANY content, whether that content is valid HTML5 or not. That handling is simply a standardisation of what browsers do today, or in cases browsers don’t do the same thing, do what makes sense.

j.p.

April 11, 2007 4:34 PM

David “liorean” Andersson, why don’t you address the issue that HTML5 will mean the death of the Web standards movement. These standardistas have been a thorn in the backsides of browser vendors for a long time and this is a brilliant scheme to shut them up once and for all.

mpt

April 11, 2007 9:17 PM

j.p., probably because neither you nor anyone else described any such issue. And you still haven

David "liorean" Andersson

April 12, 2007 12:28 AM

j.p.: In what way will HTML5 mean the death of the web standards movement? I can’t understand how you arrived at that conclusion – HTML5 is not in any way a step in the wrong direction from HTML4.01. If I haven’t made that clear, HTML5 isn’t lax when it comes to either documents or browsers.

When it comes to browsers, the rule is that browsers should know exactly what to do when faced with valid markup AND know what to do in the way of error recovery for nonvalid documents. This is necessary for browsers to handle the web already, HTML5 just makes sure it’s standardised instead of like HTML4.01 pretending HTML is SGML when it in fact never was in browsers.

When it comes to documents, there will be conformance rules that are much more limited than what browsers must be able to handle. There will be, and in fact already are, conformance checkers for HTML5, so authors can make sure their documents are conformant.

So, can you explain to me how and why this will kill the standards movement, seeing as how HTML4.01 hasn’t done anything of the kind while it’s much more ambiguous and in some cases makes choices browsers just can’t implement for web compatibility reasons?

mattur

April 12, 2007 2:38 AM

j.p:
Using Web Standards does not now, nor ever has, meant using XHTML. The Web Standards Project (WaSP) group idiotically promoted this idea for several years, but quietly changed their line as of March 2006. They didn’t have the integrity to publicly announce they were wrong and had been giving out duff advice for the past few years, so there’s a small proportion of the web community who still believe WaSP’s facile Web Standards = XHTML mantra.

As David’s article points out, and some commenters appear to have missed, HTML5 will be available in HTML and XHTML serialisations.

Matthias

April 12, 2007 3:32 AM

j.p.,
thanks for that, but this is the bit I’m aware of. I wonder why some prefer the content to be served as application/xml. Who benefits from that?

Other than that, (x)HTML5, sure, bring it on. But so far I’m not excited; although I’m not as fatalistic as j.p., I think that there are more important things to be fixed right now, and somehow I don’t see HTML of any flavour in that category.

j.p.

April 12, 2007 4:25 AM

Please don’t put words in my mouth. I never said Web standards equals XHTML.

The Web standards movement started because of inconsistent browser behavior and only a sub-set of HTML 4.01/XHTML 1.0 and CSS would work consistently across all browsers. As a side-effect, this sub-set of markup also was accessible and separated presentation from data.

If HTML5 succeeds, any kind of tag soup will render consistently. There is no longer a need for writing markup in a certain way to make it render consistently in different browsers. The Web standards message will lose its impact and the Web standards movement will die.

mattur

April 12, 2007 10:11 AM

If HTML5 succeeds, any kind of tag soup will render consistently. There is no longer a need for writing markup in a certain way to make it render consistently in different browsers.

Just because errors are handled consistently absolutely does not mean the error is eliminated. The options for error-handling are:

a) Continue with the current model of vague, incomplete specs, where browser makers guess what to do and reverse-engineer each other’s behaviour. (Sub-optimal)

b) Use draconian error-handling, where any non-conforming page just displays an error. The problems with this approach have become abundantly clear: it’s a PITA with few, if any, benefits. (Impractical)

c) Standardise browser behaviour as far as possible (the HTML5 approach).

Our goal should not be to make authors pointlessly jump through hoops, it should be to make the web open to everyone. Low barriers to entry are A Good Thing.

The Web standards message will lose its impact and the Web standards movement will die.

The killer for any standards movement is advocating poor standards that don’t match user requirements. They’re ignored. A good standard reflects the needs of the market, fulfills user requirements and offers new benefits, and this will drive adoption of the standard.

If this means the current Web Standards movement dies, well… good. Although well-intentioned, too often they substituted dogma and mindlessly parroting the W3C for independent thought and the hacker ethic. They stole our revolution, now we’re stealing it back :-)

Cristian

April 13, 2007 4:10 PM

HTML5 will kill Web Standards.

With XHTML2 web developers will need to create well-formed content with no errors in the markup and that’s the way it should be.

David "liorean" Andersson

April 14, 2007 3:03 AM

Cristian: Well, look at it like this – If you were a browser vendor wanting to render the current web, would XHTML2 help you do that? If you were a search engine, would indexing XHTML2 be enough to give relevant results to your users? If you were a CMS vendor, would generating XHTML2 be enough for your users with current browsers? If you were an authoring tool vendor, would outputting XHTML2 be enough for your users with current browsers?

The answer is no on all those. HTML5 aims at making it possible to render the current web using browsers written only for HTML5, while still improving the markup available to authors. It aims at improving the quality of new documents on the web by making the error handling reliably interoperable. The idea is that content shouldn’t have to do any user agent sniffing, and user agents shouldn’t have to reverse engineer other implementations in order to support current content.

Your perspective only covers new content written for a new spec using tools all written to make sure the new content is conformant. The HTML5 spec is intended to succeed HTML4.01, HTML3.2, HTML2 and legacy markup with a single specification both as an improved document language and as a backwards compatible standard user agents can be written against.

Vasil Rangelov

April 14, 2007 6:44 AM

Well, look at it like this – If you were a browser vendor wanting to render the current web, would XHTML2 help you do that? If you were a search engine, would indexing XHTML2 be enough to give relevant results to your users? If you were a CMS vendor, would generating XHTML2 be enough for your users with current browsers? If you were an authoring tool vendor, would outputting XHTML2 be enough for your users with current browsers?

The answer is no on all those.
Actually, if I were any vendor like those, draconian error handling would help me a lot, as it makes implementation easy. I don’t have to implement cases where the code is not well-formed; I can decline it altogether instead.

What is the real problem with draconian error handling? The only good argument about it that I’ve seen is “user input”, which is assumed to be out of the control of the developer. But if you think about it, it is within their control.

Every application accepting user input should somehow implement error checks before submission and enforce valid writing. At least for security’s sake.

Furthermore, if the end page is well-formed XHTML, it could easily be validated on-the-fly before being sent, and application-specific error correction could be applied (though that would be difficult to build). The result for the user agent is still working, well-formed code.

The article stelt gave a link to earlier describes pretty much the whole good of XHTML.

btw, if XHTML is not written with developers in mind, how come there are many who want it implemented? I guess there’s a difference between a “(X)HTML developer” and a “newbie (X)HTML developer”. The latter is open to anything, especially if it makes their job quicker (with no real benefit for the end user), while the former knows end users (no matter their user agent of choice) come first.

It all comes down to implementation. When MS implements XHTML, we will see a variety of sites, some of them big corporate ones, migrating. Until then, we should get ready by using valid XHTML and content negotiation to enable a single file rendered with two MIME types. Encoding problems, you say? Well, specify them! Everywhere you can, even if it’s the default one! Other problems are avoidable by following the compatibility guidelines and doing good error checking of user input.
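
As a rough illustration of the content negotiation described above, here is a minimal Python sketch. The function name `choose_content_type` is hypothetical, and the logic is deliberately simplified: it only checks whether the client's `Accept` header lists `application/xhtml+xml` with a non-zero quality value, and does not weigh it against the quality value of `text/html` as a full implementation would. The explicit `charset` parameter follows the advice to specify the encoding everywhere; UTF-8 is an assumption.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a Content-Type for an XHTML document based on the Accept header.

    Simplified: returns the XHTML MIME type only when the client explicitly
    lists application/xhtml+xml with q > 0, otherwise falls back to text/html.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q value as "not acceptable"
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml; charset=utf-8"
    # Browsers that only send wildcards or text/html get the HTML MIME type.
    return "text/html; charset=utf-8"
```

A browser that advertises `application/xhtml+xml` gets the XML MIME type, while one that sends only image types and `*/*` gets the same file as `text/html`.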

Austin Ziegler

April 14, 2007 5:12 PM

A lot of folks just aren’t getting it. Draconian error-handling is useful for site authors and implementors only.

If I visit a website and get an XML error on the site (especially as they are implemented now, where I don’t see ANYTHING on those sites except the browser’s error report), I will stop visiting that site. If, even worse, that XML error shows up because the site author has permitted user content (such as comments) and some bozo user who doesn’t know ANYTHING about web standards or XML has broken the markup, I’ll be even less inclined to visit the site again.

User agents should ALWAYS try to display something useful, and the rules of XML parsing do not permit that. HTML5 takes an important step: it stops independent implementations of tag soup parsers from doing error handling differently, by specifying HOW errors should be handled. It won’t be complete, of course, but it will certainly be infinitely better than what we have now.

Vasil Rangelov

April 15, 2007 1:10 AM

Well, if the site author has not been clever enough to implement good error handling for user input, then they certainly deserve visitors that leave. A good developer will simply not allow such a situation to happen and will forbid input that would render their page invalid/ill-formed. Period.

Austin Ziegler

April 15, 2007 5:49 AM

As has been stated, it’s not just user input — it’s input from other sites. It’s ensuring transformations are always 100% correct. It’s normalizing different encodings.

In a medium where display matters (that is, it’s intended for people), stopping early is the worst thing you can do. In a medium where correctness matters over display (that is, it’s intended for computers), stopping early is the best thing you can do. The web is a display medium, intended for people. You should never display an error message when you can — through a little fuzzing — render something close to the meaning.

XML parsing rules are far too strict for display. (See also PDF. Acrobat never stops displaying an entire document because it can’t figure out stuff on one page/section. Other PDF readers are more forgiving and will try to guess what was intended for certain specific subsets, like lines that are pathed but not stroked.)

The concept here is no different than providing your own 404 page rather than Apache’s, and trying to fail over gracefully so that the user doesn’t have to see a browser’s error page, either.

Barry

April 15, 2007 8:59 AM

I don’t think that it was the intention of the browser vendors to stop the Web standards movement because they continually embarrass them in public, but I do think that the result of HTML5 will be the demise of the Web standards movement and its community. Pity, we were just getting started.

Vasil Rangelov

April 15, 2007 9:32 AM

If those other sites use XHTML as well, there’s nothing that could go wrong with fetching their content and importing it… all easily. That’s what standards are for. And if they are in (invalid) HTML… that’s what SAX processors (again a thing for the server, not the client, to worry about) are for, and furthermore, it’s another reason to convince the crappy third-party site developer to create well-formed and valid code.

And for normalizing encodings, you can easily do that on the server. Look at PHP’s htmlentities() function for example. I suppose other server-side scripting languages have their equivalents to this, maybe even better ones. In the worst case, there will be a need for a third-party program to run on the server to do this.

And btw, if encodings are messed up, the user would only see question marks (or squares or whatever) at that spot anyway, so I guess the least degradation a developer will have to make is to return those… pseudo characters… when the server is not able to convert them, that is, if a conversion application/API is not available.
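
For comparison with the PHP approach mentioned above, here is a minimal Python sketch of server-side normalization using only the standard library. The function name `normalize_fragment` and the default encoding are assumptions for illustration; it is a rough analogue of what `htmlentities()` is used for, not a port of it. Bytes that cannot be decoded are replaced with U+FFFD, which corresponds to the "pseudo characters" fallback described in the comment.

```python
import html


def normalize_fragment(raw: bytes, declared_encoding: str = "utf-8") -> str:
    """Turn untrusted bytes into an ASCII-only, markup-safe text fragment.

    Undecodable bytes become U+FFFD instead of raising, markup characters are
    escaped, and all non-ASCII characters become numeric character references.
    An unknown encoding name still raises LookupError.
    """
    text = raw.decode(declared_encoding, errors="replace")
    escaped = html.escape(text, quote=True)  # escapes &, <, > and quotes
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Because the result is pure ASCII with numeric references, it can be embedded in a page regardless of the encoding that page is finally served in.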

dangitman

April 16, 2007 7:28 PM

David writes:

The HTML5 spec is intended to succeed HTML4.01

Don’t you mean “supersede,” instead of “succeed”?

Mit The Destroyer

April 18, 2007 12:40 PM

Actually, I’ve recently discovered how to serve application/xhtml+xml to IE… yes you heard me right we can serve application/xhtml+xml to get all the benefits of a semantic web… I’ve detailed my findings here: Fix for IE

Sean Fraser

April 22, 2007 11:30 AM

I cannot comprehend the comments about web standards’ demise. It will only die if those who practice it cease. These defeatist comments are similar to those made when web standards was nascent.

Web standards will die only if web developers allow it to.

thacker

April 28, 2007 8:20 AM

Ladies/Gentlemen—

Very interesting article.

Most of the problems seem to revolve around two issues, Internet Explorer and existing Web content.

The W3C’s proposed HTML 5 standard, it appears to me, is attempting to address years of developed content created by applications such as FrontPage [which was the most widely used Web content authoring tool] and other content development tools that fostered fubar code.

FrontPage, along with Microsoft Word, has been abandoned by Microsoft as a Web development tool. The damn things should never have been used in the first place.

I firmly believe that the next introduction of IE will support application/xhtml+xml, will include further [if not complete] implementation of CSS 2.1 and may abandon the haslayout silliness [although that may necessitate abandoning the Trident rendering engine]. I believe that sooner or later the Trident engine will be replaced in IE.

Keep in mind that litigation from the EU and the DOJ was part of the reason for a 5 year delay between IE6 and IE7. In addition, Microsoft abandoned the most widely used Web authoring tool – FrontPage. They also introduced IE7 as an automatic critical update and in the process, with full knowledge of the impact, alienated many business customers whose intranets were poorly built around IE6 and break within IE7. In addition, Microsoft switched back to Word as the rendering engine within Outlook 2007. Microsoft Web products are embracing standards and their development products have the tools to generate standards-based markup, albeit improvements are needed and those improvements are in the pipeline and under development.

The point I am trying to make is that over the next five years, the amount of fubar content on the Web will diminish because the tools to create it are diminishing. The importance of HTML 5, I believe, is not as critical as Berners-Lee may believe, but it will continue to develop.

HTML 5 may become the de facto standard for hobby Web content and XHTML 1.0 – 1.1 will become the de facto standard(s) for professional Web content.

There is a place for both.

mattur

May 3, 2007 1:45 AM

HTML 5 may become the de facto standard for hobby Web content and XHTML 1.0 – 1.1 will become the de facto standard(s) for professional Web content.

Since HTML5 adds new, useful features, features we’ve been crying out for, for more than a decade: unlikely.

Diona Kidd

May 21, 2007 6:33 AM

This was a very interesting and timely article.

HTML 5 appears to address a realistic issue. Most of the web isn’t valid. The WHATWG group is not advocating the absence of standards. They are merely realistic about the current use of standards. A browser should just work.

Considering the ease of implementing HTML 5 vs. XHTML 2.0, it seems HTML 5 is the logical way forward. Especially if the W3C adopts some crucial parts of the WHATWG’s work, including.

Thanks for the write up. I’d be interested to know how you researched this article. It’s very well written and informative.

pauldwaite

May 29, 2007 1:45 PM

I’m not entirely sure about the backwards-compatibility of HTML5, due to the requirement to change doctypes. If you do that, Internet Explorer 6 will render your page in quirks mode, not standards mode, resulting in your CSS not behaving as expected.

I’m not suggesting changing doctypes should be optional for HTML5 documents, but I think its real-world backwards-compatibility is over-stated.

mattur

June 1, 2007 8:04 AM

Paul: the HTML5 doctype triggers standards-compliance mode in IE6, not quirks mode. Try it.

Anton Vesely

July 10, 2007 4:16 PM

YOU FOOLS! Was I wrong in expecting W3 to bring us together, to enhance our access to data which would allow us to advance and share our individual knowledge, in order to create a greater and better society where we could all benefit from sharing information?

As a former technical writer and instructional designer, I find these arguments a great waste of talent. I suggest we could all contribute a great deal more to humankind if we focused on the message, rather than the medium.

Anton

Anton Vesely

July 10, 2007 4:27 PM

It’s the best I have to offer, and I’ve been through the HTML wars when indexing Shop Manuals for the EPA at Mitsubishi Corporate in 1990. Sorry, but there’s no thing better than a pure ASCII search string for finding facts. Keep It Simple Stupid. No offense intended, but I’ll respond in kind.

