Degunkifying Word Files

September 17, 2007 - 2:00am

I think messy Word files have some sort of cosmic connection with ragweed. Over the past few weeks I've received an inordinate number of e-mails from people (half of whom I don't know, and who just found my site on Google) pleading for help with Word documents that misbehave when imported into InDesign or QuarkXPress.

And over the same period I've had to keep bottles of Claritin and Windex on my desk because my hay fever is acting up and it's hard to read my laptop monitor with sneeze droplets all over it.

Or … maybe I'm just developing an allergy to bad Word files? There's a scary thought.

Here are my Handy-Dandy Tips for Degunkifying Word Files that I've been sending to these people. They're only needed when you're trying to retain the styling information in the Word file as you import it. (If you're stripping the formatting, then there's no gunk to de-anything.)

I'm confident that in most cases these methods will do the trick, not just because I use them myself, but because the people who e-mailed me often reply a few days later in an all-caps style with lots of exclamation points, such as (and I quote): "THANK YOU!!! Tip #2 did the trick!!!! You're a lifesaver!!!!"

Aww shucks.

Tip #1: Maggie the File
Did you know that Word stores all sorts of hidden information in its paragraph returns and section breaks? And in a Word document with no section breaks, the final paragraph return is also the de facto section break? These markers are particularly prone to carrying corrupted information, which can lead to strange effects in the Word document itself.

If you never actually open the Word doc in Word, you won't see the weird symptoms. That is, until you import the styled file into your layout program and then try to apply different styles.

Sometimes there's just one particular paragraph that won't "take" a style. In that case, open the file in Word and select everything in the problem child paragraph except for its final paragraph return. (If you can't see the paragraph return symbol in the text, click the Paragraph Symbol … the pilcrow! … in the toolbar to show them all.)

Then cut the selected text, leaving the empty return sitting by itself in the line. Delete the lone paragraph marker, hit Enter/Return to insert a new one, and paste the paragraph text back in. Do a Save As with a new name and try importing it into your layout again.

If you're having a problem with lots of text in the file (for example, importing it crashes the layout program … that's a problem), do the same procedure for all the text in the Word doc in one fell swoop. Open it in Word, choose Edit > Select All, and then deselect the final return by pressing Shift-Left Arrow. Cut the selection and paste it into a new, empty Word document. You'll use this document from now on, though you may need to reapply a style to the final paragraph. Save the new doc with a different name and try placing that one into your layout.

Why is this tip called "Maggie the File"? I'm just passing along a bit of insider Internet lore. On one of the editorial mailing lists I follow (editors are the world's best Word experts), a long-time list member named Maggie discovered this marker/break factoid buried in a Microsoft web page, along with the fix. Maggie posted about it a long time ago, and ever since, whenever a user on the list posts a problem they're having with a Word doc, a stock suggestion from long-time subscribers is to "Maggie the file." Now you know.

Tip #2: Convert and Reconvert
Here's another method for stripping out document corruption and style confusion, one that I found on my own on a Word support web site. It sounds eerily like the InDesign troubleshooting technique of exporting an .indd file to InDesign Interchange (.inx) format and then opening that .inx file in InDesign to convert it back to an .indd file.

Apparently, saving a Word document as an HTML Web Page (in Word, choose File > Save As Web Page), and then converting it back to a Word document, does the same thing as the InDesign indd > inx > indd technique. To convert the file back to a Word .doc format, just open the HTML file in Word, and from the Save As dialog box choose Microsoft Word as the format. The trip out to HTML/XML strips out gunk, and because Word understands its own HTML format, it can resurrect all the styles and formatting applied when you re-save the HTML file as a Word file.

For a complete guide to fixing corrupt Word files, try this Microsoft MVP page:

Tip #3: Import/Export/Import
This is the tip I use most often when dealing with Word style bloat (unused styles that come along for the ride) in InDesign or XPress. It's especially useful when you choose to map Word styles to InDesign styles (a feature not available in QuarkXPress) so you have fewer styles to deal with. I think it's another way to Maggie a file, too, which would be handy if you don't own Word.

Here's how it works. Instead of importing the .doc file into the actual layout, import it into a new, temporary layout file, making sure it imports all the Word styles as well. You don't need to flow the whole file; just one frame's worth is fine, even if the text is overset.

Then click inside the text with the Type tool and export the story to Rich Text Format (.rtf), which is like a generic text file that retains all text formatting/style sheets, and if the layout program supports it, footnotes and tables. In InDesign, you do this via File > Export; in Quark, choose File > Save Text. Be sure to choose the Rich Text Format option from either program's dialog boxes, then name and save the file somewhere handy.

Now you can close your temp layout file, no need to save it.

In the actual layout file, import the .rtf file instead of the .doc file. In most cases your superfluous paragraph and character styles will disappear, and any problems with the styled text will go away. By the way, don't bother saving the Word doc as an RTF file; it doesn't degunk nearly as well as exporting to RTF from within the layout program.

My hypothesis is that even when you set InDesign to ignore unused styles (which you can do in the Place Options dialog box), some are imported anyway because the final paragraph marker has them in its "history." So, exporting it to RTF clears that out — degunkifies it.
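If you'd like to check for yourself whether the export actually thinned out the styles, here's a minimal Python sketch that lists the style names declared in an RTF file's stylesheet group. It isn't part of the tips above, and it makes some assumptions: that the file has a single, ordinary `{\stylesheet ...}` group, and that style names don't contain unusual escapes. The function names `rtf_style_names` and `extra_styles` are made up for this illustration.

```python
import re


def _style_name(entry):
    """Reduce one stylesheet entry to its human-readable style name."""
    entry = re.sub(r"\{[^{}]*\}", "", entry)          # drop nested groups (one level deep)
    entry = re.sub(r"\\\*", "", entry)                # drop the \* destination marker
    entry = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", entry)  # drop control words such as \s1
    return entry.strip().rstrip(";").strip()


def rtf_style_names(rtf_text):
    """Return the style names found in the document's stylesheet group."""
    start = rtf_text.find(r"{\stylesheet")
    if start == -1:
        return []  # no stylesheet group at all
    names = []
    depth = 0
    entry_start = None
    i = start
    while i < len(rtf_text):
        ch = rtf_text[i]
        # Skip escaped braces and backslashes so they don't affect the depth.
        if ch == "\\" and i + 1 < len(rtf_text) and rtf_text[i + 1] in "{}\\":
            i += 2
            continue
        if ch == "{":
            depth += 1
            if depth == 2:            # each depth-2 group is one style entry
                entry_start = i + 1
        elif ch == "}":
            if depth == 2 and entry_start is not None:
                names.append(_style_name(rtf_text[entry_start:i]))
                entry_start = None
            depth -= 1
            if depth == 0:            # end of the stylesheet group
                break
        i += 1
    return [name for name in names if name]


def extra_styles(before_rtf, after_rtf):
    """Style names present in before_rtf but missing from after_rtf."""
    return sorted(set(rtf_style_names(before_rtf)) - set(rtf_style_names(after_rtf)))
```

To try it, read the exported file as text (for example `open("story.rtf", encoding="latin-1").read()`, where `story.rtf` is a placeholder name) and pass the result to `rtf_style_names`. A shorter list than you see in the Word doc's style menu suggests the round trip did its job.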

