The W3C has a page with the original WWW proposal from Tim Berners-Lee. One of the downloads says
The "I can't test it" made me sad. There are two other files (an RTF version and an HTML version generated in 1998 from the original file). But can we open the original document?
The original document is 68,608 bytes and the file command on my Mac says it's a Microsoft Word for Macintosh 4.0 file. That matches TBL's note on the W3C page saying: "A hand conversion to HTML of the original MacWord (or Word for Mac?) document written in March 1989 and later redistributed unchanged apart from the date added in May 1990."
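If you want to repeat that first sanity check without relying on the file command, here is a minimal Python sketch. It assumes the download has been saved locally as proposal.doc; the describe helper is a name made up for illustration. It only reports the size and the leading bytes and makes no attempt to identify the format.

```python
from pathlib import Path


def describe(path, n=8):
    """Return (size in bytes, first n bytes as a hex string) for a file.

    `describe` is a hypothetical helper for illustration; it does not
    try to recognise the Word for Macintosh format.
    """
    data = Path(path).read_bytes()
    return len(data), data[:n].hex(" ")


# Example use (assumes the file was saved as proposal.doc):
#   size, head = describe("proposal.doc")
#   print(f"{size} bytes, starts with: {head}")
```

The size should come out as 68,608 bytes if the download is intact; the leading bytes are what tools like file inspect to guess the format.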
Microsoft Office for Mac came out in 1989 with System 6.0. That was Microsoft Word 4.0 so we're looking for compatibility with Microsoft Word for Macintosh 4.0. Let's see what modern software can open this. What I really want to be able to do is open it and convert it to, say, PDF with high fidelity.
Microsoft Word
Let's begin with Microsoft Word itself. I uploaded the file to Microsoft OneDrive with the extension .doc and clicked on it to open it in Microsoft Word.
Apple Pages
I switched to the Mac and hoped that Apple Pages might understand an old Microsoft Word for Macintosh file. No such luck.
Apache OpenOffice
Next let's hope open source software will come to the rescue. I downloaded the latest Apache OpenOffice and it did open the file but the formatting is gone and the diagrams are missing.
LibreOffice
OK, maybe I need different open source software, so I switched to the latest LibreOffice and it opened it. And the diagrams are crisp! Although there's something weird about the margins and there are other formatting problems.
CERN PDF
CERN makes available a PDF version of the proposal which was apparently created in 1998 using Acrobat Distiller Daemon 2.1 for SunOS/Solaris (SPARC). It has 20 pages. The LibreOffice imported version has 24 pages.
To get an overview of what's different I created a PDF from the LibreOffice version and then looked at it and the CERN PDF in the contact sheet version in Apple Preview.
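The PDF export can probably be scripted too, rather than clicking through the LibreOffice UI. This is a sketch under a few assumptions: that LibreOffice's soffice binary is on the PATH, that the file is saved as proposal.doc, and that the function names (convert_command, convert) are made up for illustration. The --headless, --convert-to and --outdir options are LibreOffice's own command-line flags.

```python
import subprocess


def convert_command(src, outdir=".", fmt="pdf"):
    """Build the soffice command line for a headless conversion."""
    return ["soffice", "--headless", "--convert-to", fmt, "--outdir", outdir, src]


def convert(src, outdir=".", fmt="pdf"):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(convert_command(src, outdir, fmt), check=True)


# Example use (assumes soffice is installed and proposal.doc exists):
#   convert("proposal.doc", outdir="out")
```

A scripted export uses the same import filter as the UI, so it should show the same margin and font size problems.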
Here's the CERN PDF:
Here's the LibreOffice-generated PDF:
Things that are different:
1. The right-hand margin is missing in the LibreOffice version.
2. The LibreOffice version is using 14 pt vs. 12 pt for most of the text.
3. The LibreOffice version has turned headers with TBL's initials in them into footers.
4. The page breaks look to be in the right places (see how the images are correctly placed towards the end); thus it's probably the font size that's the biggest problem.
5. The CERN PDF has a space under the heading and the LibreOffice version does not.
Emulation
To make sure that I knew what the actual original document looked like I decided to use Infinite Mac to boot a 1990-era Macintosh and run actual Word for Macintosh 4.0 on the original document.
That way I can see actual fonts, font sizes and layout to confirm how the document should have looked. And that's where it became obvious that the original document on the original Mac and the CERN PDF are quite different. The CERN PDF has 20 pages. On the Mac running Word for Macintosh 4.0 with A4 paper it has 22 pages. So I decided to aim to get us close to the original document on the Mac.
So... set to A4 paper and set right margin to same size as left margin. Change the first page format to be different since it doesn't have the same gutters, footers or headers. Manually change the body text from 14 pt (and other sizes) to 12 pt. Manually deal with text that breaks across pages incorrectly and other alignment problems. Fix the footer that should be a header.
In the end I got pretty close to what's visible on the Mac.
Converting this document from its original format was a bit of a victory for open source software. And a lesson in how hard document preservation is. To help preserve it a bit, and in an open format, I've uploaded my .odt version to GitHub here. It's interesting, and a little disheartening, to see that this 34-year-old document is difficult to open, and that even when opened the resulting output isn't exactly the same as the original.
PS If you're wondering why I ever started this project: I just wanted a high quality version of the diagrams in the original proposal for a presentation. It took me a lot longer than I thought it would.
PPS A comment on Hacker News pointed out that I could probably either create a PostScript file or a PDF via an emulated Mac. I was able to boot another Mac (System 7) that had Word from 1992 and Print2PDF (a driver that creates a printer that makes a PDF file) and print directly from Word for Macintosh 5.1a.
PPPS A Hacker News comment links to another conversion done using different versions of Word and a bit of fiddling around to get a really nice version of the document in modern formats.
18 comments:
LibreOffice opens it right up. Its support for old document formats is really excellent. I keep it around for just this purpose. https://imgur.com/a/JENgq6V
But I also love using BasiliskII and InfiniteMac emulators!
It's pretty unsurprising that Apache OpenOffice has trouble, given that that fork of what used to be StarOffice has been dead more or less since it joined Apache (and since long before it left the incubator), and hasn't managed even critical security releases for many years: indeed this state of total deadness is why LibreOffice exists. They literally don't even have anyone left "working" on the zombie project who knows how to build it any more.
All it has is the name. It's a scandal that it's still there, misleading people into using something more than half a decade dead, and not redirecting people straight to LibreOffice.
(The fact that in more than half a decade they have neither done any releases nor retired the project nor even done the much simpler job of simply *redirecting people to the project that is still alive* despite many people begging them to says everything you need to know about the bad faith of the remaining "contributors". They appear to see OpenOffice purely as a thing to let them say on their CVs that they are on an Apache PMC. If it was finally retired they wouldn't be able to do that.)
This is great. It would be good to file a couple of upstream bugs with LibreOffice to fix the remaining formatting issues; the document itself would probably work as a test case.
I have a Mac Plus in the shed which probably still works and would be useful for this very sort of thing. Alternatively BasiliskII and Word from here https://www.macintoshrepository.org/1110-microsoft-word-3-01-4-0-5-0-5-1-5-1a-personalize-word-1-0
This conversion software converts it great: https://archive.org/details/KeyViewPro
Here is the converted PDF: https://smallpdf.com/result#r=091f20f23de353fac21376a3a49a609c&t=share-document
not quite, look more carefully, for example it clobbers some of the diagrams. and that’s what the OP was originally after.
I’ll second the shout out for LibreOffice. I was in the same situation when opening my 1988 Masters thesis, also published in Word 4.0 (not in the same ballpark of importance!). It was also useful for opening and converting all my old PICT files. It’s all I use for the times I am forced to open an MS Office file.
Thanks for going to this effort.
Has a bug report been logged with the LibreOffice people?
Did you have any luck with pandoc? Just curious how it worked.
FYI, you aren't using Word 4.0 on that emulator. That toolbar was released in 5 if memory serves.
Even a far simpler document from roughly that time can be fairly hard to reproduce properly with modern software -- here is the tale of an ASCII table printer test from the Microsoft Word for DOS era: https://bsdly.blogspot.com/2013/11/compatibility-is-hard-chartestdoc-is.html
Despite what it says at the top of proposal.html, I don't believe this is the original (nor is it true that it was "later redistributed unchanged apart from the date added in May 1990").
I posted the details on HN https://news.ycombinator.com/item?id=39366960.
Thank you for walking us through this. This is why Digital Preservation is so important.
Looking through the original, I can see it used a LaserWriter for the original page setup. I mirrored the setup and printed to a PostScript and then generated a PDF/A. I also added the Type/Creator back into the original file and made a MacBinary that travels better over the internet. Then Word should "see" the file once decoded. Download here
I think mgiampapa is right. I tried this file in AusEaaSI (Australian Emulation-as-a-Service Infrastructure) under Mac OS 7.5.5 with Word 4, which did not properly render the proposal.doc file. In the same environment with Word 5, the file shows up with a larger title font and nice diagrams.
Cynde Moya
PDF (an old and a current version) are probably best. But maybe you should keep a .txt and a .rtf version of the document(s) as well.
I was able to open the file in the newest desktop version of Word, and it seems to render the formatting, layout, tables, etc. just fine. However, the figures are replaced by empty boxes. This is accompanied by an error stating that "Word cannot display this picture format because the correct graphics filter was not found" when opening the file, and when clicking on the image, "The program used to create this object is Microsoft Word Picture. That program is either not installed on your computer or it is not responding. To edit this object, install Microsoft Word Picture or ensure that any dialog boxes in Microsoft Word Picture are closed."
I feel like if I were able to get a copy of this "Microsoft Word Picture", or otherwise find a way to get Word to read this format (in the form of an add-on perhaps?), the figures could work fine too without needing emulation or an ancient version of Word. However, a few Google searches proved unhelpful in finding a copy and I gave up.
Agree with Sarath. I was able to open it in Word 2016 after adding the location I had it saved to the Trusted Locations in Security Center. After that it's just the image that won't play nice. "Word cannot display this picture format because the correct graphics filter was not found," which might be easy enough to fix if I had my MS Office CD handy.