gen_dw_format

Initial formatting for Dreamweaver pastes

Performs some basic string formatting on the HTML documents generated by pasting Word documents into Dreamweaver to help conform the documents to WCAG/WET standards. It assumes proper indentation (in Dreamweaver, you can generate this with Edit -> Code -> Apply Source Formatting).

HTML page of tool here.

The current formatting checks are as follows (the user can select which to apply). Checks that are generally applicable to all Dreamweaver-pasted documents are checked by default; checks that may only be useful for specific documents are unchecked by default.

List of checks

Format Dreamweaver footnotes:

Clean up spacing for coding style:

These don’t affect the document’s structural correctness for WET, but it’s helpful to keep your document tidy for visual clarity, string searching, and other coding purposes.

Replace Word-formatted characters/HTML entities:

Join consecutive tags of the same type:

Replace/remove/format <br> tags:

Fix formatting around the period (.), comma (,), colon (:), and semicolon (;) punctuation symbols:

Replace em/strong with other tags for italics/bold:

These checks take precedence in the given order. For example, if both “change em tags to cite tags on lines that have links” and “change all em tags to i tags” are checked, then the first check will take precedence. So on lines that have links, em tags will be changed to cite tags, and on other lines without links, em tags will be changed to i tags.
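The precedence rule above can be sketched as follows. This is only an illustration, not the tool's actual code: the option names `emToCiteOnLinkLines` and `emToI` and both function names are hypothetical, and detecting a "line with a link" via an `href` attribute is an assumption.

```javascript
// Hypothetical sketch of check precedence; names are not taken from the tool.
function swapTag(line, from, to) {
  // Matches <em>, <em class="x"> and </em>, but not longer names such as <embed>.
  const pattern = new RegExp(`<(/?)${from}(?=[\\s>])`, "gi");
  return line.replace(pattern, `<$1${to}`);
}

function replaceEmTags(html, options) {
  return html
    .split("\n")
    .map((line) => {
      // Assumption: a line "has a link" if it contains an <a ... href=...> tag.
      const hasLink = /<a\b[^>]*\bhref=/i.test(line);
      if (options.emToCiteOnLinkLines && hasLink) {
        return swapTag(line, "em", "cite"); // the earlier check wins on link lines
      }
      if (options.emToI) {
        return swapTag(line, "em", "i"); // the later check covers everything else
      }
      return line;
    })
    .join("\n");
}
```

With both options enabled, a line containing a link has its em tags changed to cite, and every other line has them changed to i.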

Add/fix/remove tag attributes:

Translate structure for French documents:

Convert manually inserted fake tags to actual tags (see below for details):

I sometimes split regex statements up into multiple calls for clarity, but a lot of these checks can be done in one or two regex statements.

This tool is not intended to perform the complete WET formatting process. It covers only the basic initial steps; further manual adjustments may be required to make the document fully WET-compliant.

Details on changing strings indicating tags to actual tags

Superscripts and subscripts

For the second-to-last check, the tool looks for the following strings, which indicate fake tags for superscripts/subscripts:

and converts them to actual tags (so that we have a sup or sub tag at that location instead).

This check is meant to be used together with some manual find-and-replace in the original Word document before pasting into Dreamweaver. When pasting a Word document into Dreamweaver, we leave styles excluded because of the unnecessary CSS bloat they add (you can turn style inclusion on or off in the Dreamweaver preferences; it is off by default). However, Dreamweaver treats superscripts and subscripts as styles, so they are copied over as regular text and have to be re-inserted manually in Dreamweaver’s generated HTML document. The easiest way to do this is to mark where the superscripts and subscripts are in the Word document before pasting.

For this tool, you should mark superscripts and subscripts in the Word document with <sup>, </sup>, <sub>, and </sub>, using the process described below.

This tool looks for strings indicating subscript/superscript tags in the HTML document (where the angle brackets <> have been converted to their HTML entities by Dreamweaver) and changes them into actual tags. Afterwards, it joins consecutive sup and sub tags.
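A minimal sketch of this conversion is shown below. It assumes the markers arrive entity-encoded as `&lt;sup&gt;`, `&lt;/sup&gt;`, `&lt;sub&gt;`, and `&lt;/sub&gt;`; the function name `convertFakeSupSub` is hypothetical, and the tool's own implementation may differ.

```javascript
// Hypothetical sketch; assumes Dreamweaver encoded < and > as &lt; and &gt;.
function convertFakeSupSub(html) {
  // Turn entity-encoded markers into real opening/closing tags.
  const converted = html.replace(
    /&lt;(\/?)(sup|sub)&gt;/gi,
    (match, slash, name) => `<${slash}${name.toLowerCase()}>`
  );
  // Join consecutive tags of the same type, e.g. <sup>1</sup><sup>0</sup> becomes <sup>10</sup>.
  // The backreference \1 ensures </sup><sub> is left alone.
  return converted.replace(/<\/(sup|sub)><\1>/g, "");
}
```

The joining step matters because the Word search described in this section wraps superscript and subscript text one character at a time, so a two-digit exponent arrives as two adjacent sup elements.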

Steps to mark superscripts and subscripts in Word

  1. Open the “replace” box in Word (ctrl+h).
  2. Select “More »” for additional options.
  3. Check “Use wildcards”. This is a pattern searcher used by Word that is similar to regex.
  4. In the “Find what” box, enter ([!(^2)]). In Word wildcards, [!...] plays the role of the negated character class [^...] in regex, so this corresponds to ([^(^2)]), where ^2 is Word’s code for a footnote/endnote mark. In other words, we are searching for superscripts that aren’t footnotes/endnotes.
  5. While in the “Find what” box, select “Format”, then “Font”.
  6. Only check the “Superscript” box. Leave the “Subscript” box unchecked; all other boxes should be filled in, but not checked.
  7. Click “OK”. You are now searching for superscript text.
  8. In the “Replace with” box, enter <sup>\1</sup>. This is equivalent to <sup>$1</sup> in regex.
  9. While in the “Replace with” box, select “Format”, then “Font”.
  10. Only check the “Superscript” box. Leave the “Subscript” box unchecked; all other boxes should be filled in, but not checked.
  11. Click “OK”. You are now replacing superscript text.
  12. Click “Replace All” to surround all superscripts with <sup> and </sup>.

Afterwards, repeat these steps for subscripts: check the “Subscript” box instead of the “Superscript” box in steps 6 and 10, and use <sub>\1</sub> in step 8.

MathML

For the last check, the tool looks for the following fake tags:

and replaces them with actual tags.

As with superscripts/subscripts, you can’t copy a Word document with equations into Dreamweaver and have the equations formatted properly (assuming you want them formatted as MathML). Since each equation has to be copied individually to be written out as MathML instead of the default linear format, this is best done with a macro, which can be found here. The linked macro replaces each equation with its MathML code; once the Word document is pasted into Dreamweaver, this tool fixes the tags of that code.

Adding checks

The steps to add a check that follows the tool’s current formatting/organization are as follows:

  1. In dw_paste_format.html: Add the new check into the form where the other checks are located. Different groups of checks are separated by two <br /> instead of one; put the check in whichever group you think makes the most sense.
  2. In dw_paste_format_helpers.js:
    • In set_default_checks(), set when it should and shouldn’t be a default check, based on how safe/useful it is, for English Word pastes, French Word pastes, English WET-formatted documents, and French WET-formatted documents.
    • In format_file(), create an if statement for the new check, positioned at the same place as where you put it in the HTML document. For the sake of consistency, put the logic for the check in a helper function even if it’s only one line.
    • Create the helper function for the logic below format_file(), positioned at the same place as where you put it in the HTML document.
  3. In README.md (this file): Add a description of the new check in the first section, positioned at the same place as where you put it in the HTML document.
  4. In sample_page.html: Add some text to test the new check with.
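As a rough illustration of step 2, a new check might be wired in as shown below. Only the name format_file() comes from this README. Its signature here, the `checks` object, and the example check `remove_empty_p` are assumptions made for the sketch, not the tool's actual code.

```javascript
// Hypothetical sketch; the real format_file() signature may differ.
function format_file(html, checks) {
  let output = html;
  // ...existing checks run here, in the same order as in the HTML form...
  if (checks.remove_empty_p) {
    // Logic lives in a helper, even though it is only one line.
    output = remove_empty_p(output);
  }
  return output;
}

// Helper defined below format_file(), in the same relative position as its checkbox.
// Example check (an assumption): delete lines holding only an empty <p></p> or <p>&nbsp;</p>.
function remove_empty_p(html) {
  return html.replace(/^[ \t]*<p>(?:\s|&nbsp;)*<\/p>[ \t]*\r?\n?/gm, "");
}
```

Keeping each check behind its own if statement and helper makes it straightforward to confirm that the order in format_file() matches the order of the checkboxes in the form.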