Structured data and the death of WYSIWYG

This article was originally written for Advogato. It's reproduced here to gather all Conglomerate-relevant documents in one place.

By Joakim Ziegler

As commercial WYSIWYG word processors are close to dying from bloat, free software projects scramble to catch up, going in the exact same direction. But it might be time for structured data editing to emerge from the cloud of hype.

Today, almost all document editing is done in WYSIWYG applications. WYSIWYG has been hailed as the perfect way to write, where what you see on the screen exactly matches what you get on paper, and modern WYSIWYG word processors are extremely close to that ideal. It took a bit of work, though. In fact, alongside computer games, the main driving force behind desktop hardware replacement and upgrades has been Microsoft Office, with its endless cycle of upgrades and new releases driving up the resource requirements. And when there was nothing much left for a word processor to do, Microsoft decided to suck up a few more resources by adding animated assistants to the thing.

So, where are we? The stuff on your screen is damn close to the stuff you get when you print. The fonts are antialiased, your image files appear in full 24-bit glory when inserted, and everything is well. Or is it? I've worked in office environments where MS Word is the de facto, or even official, standard for writing. Watching people struggle to get their templates to work, to make all their letters look the same, to keep the document from rewrapping so that a single line lands on the final page, to synchronize the different versions of their documents, and generally just to make sense of the inane way Word makes bulleted lists makes me think there's a better way.

This is where the structured document buffs stick their heads in. They usually come from technical writing backgrounds, and they have a remedy. Write the document in LaTeX, or an SGML application like DocBook, or something similarly structure-oriented, and let a backend take care of the formatting to paper. Trust the backend to be smart enough to handle everything, and there you are. Except... where are the tools? "Oh, it's just plain text, you can use Emacs, or even Notepad." Well, how do I know what I just wrote will parse correctly? "Just save and do a test parse, and you'll get error messages that tell you what's wrong."
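To make the "test parse" concrete, here is a minimal sketch in Python. It assumes the document is XML rather than SGML or LaTeX, and it only checks well-formedness, not validity against a DTD such as DocBook's. The function name check_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None


# Hypothetical sample documents; the second one never closes <para>.
good = "<article><title>Hello</title><para>Some text.</para></article>"
bad = "<article><title>Hello</title><para>Some text.</article>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The first call prints None; the second prints a parser message pointing at the line and column of the problem. That message is exactly the kind of feedback a non-technical user should never have to read, which is the point of the complaint above.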

For normal users, this doesn't cut it. The two camps remain: the technical writers with their structured formats and Emacs or, if their employer is rich, FrameMaker/SGML; and the office workers with their MS Word, user-friendly as long as the macro virus doesn't get you and your hardware is fast enough.

But does it have to be that way? Structured formats clearly solve a lot of the problems people have with WYSIWYG: they let you concentrate on what you write, not how it looks; they make it easy to get all your documents to follow a standard; and their semantics generally allow for smarter searching and archiving. But the tools are in the way.
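As a rough illustration of what "smarter searching" can mean, the sketch below (Python, standard library only) searches headlines rather than raw text. The two sample documents and the element names bulletin, headline and body are invented for this example and do not come from any real DTD.

```python
import xml.etree.ElementTree as ET

# Two made-up documents; the element names are illustrative only.
documents = [
    "<bulletin><headline>Election results</headline>"
    "<body>Turnout was high.</body></bulletin>",
    "<bulletin><headline>Weather</headline>"
    "<body>Rain delayed the election count.</body></bulletin>",
]


def headlines_mentioning(word, docs):
    """Return headlines containing word, ignoring matches elsewhere in the document."""
    hits = []
    for doc in docs:
        root = ET.fromstring(doc)
        for headline in root.iter("headline"):
            text = headline.text or ""
            if word.lower() in text.lower():
                hits.append(text)
    return hits


print(headlines_mentioning("election", documents))  # prints ['Election results']
```

A plain text search for "election" would match both documents; because the structure says which text is a headline, the search can tell a story about the election from one that merely mentions it.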

What if we rethought the authoring process around structure? We have the formats, we know the semantics. And after years and years of Word's resource use, most people have enough desktop processing power for almost anything. So how can we apply this, and maybe get a few completely new benefits along the way?

At Simplemente, the company in Mexico City I work for, we recently sat down and thought about this. It was in the course of developing a system for a client that was using Word for a task it was entirely unsuited for, namely writing news bulletins for wire, paper and internet distribution. After some research, we figured out that XML would do exactly what we needed, but there were no tools suitable for end users without technical experience. So we figured out some ways of making one. Now that the system is in beta at the client site, we've decided to make something more of it, and Conglomerate was born. The current codebase, which you can see screenshots of at the site and which we'll be releasing some test code from very soon, is a bit messy and suboptimal because of limitations we ran into in the tools we used.

Here's a quick summary of some design goals, and what problems we ran into.

Problems and solutions:

These were some thoughts on what it'll take to get structured editing to the masses. We believe that once someone (not necessarily us, but we've started, at least) builds this framework, structured editing can come down from the hype and nest nicely on everyone's desktop. All it needs is this, plus some good example DTDs and transformation sheets, and everything from documentation writing to business letters to resumes becomes orders of magnitude easier to write.

I'd love to hear comments and suggestions on these ideas. I apologize if they're not as edited and polished as they could be, but the idea of peer review is to get feedback, so, feel free.