Formats table of contents tables generated by pasting Word documents into Dreamweaver, and adds their links to the main body following WET standards.
You should run the input document through the general Dreamweaver paste formatting tool first to reduce errors resulting from malformed documents.
I’m considering integrating this with the Dreamweaver paste formatting tool, but I haven’t yet: unlike the fairly safe checks in that tool, this tool is likely to introduce HTML structural errors. The code itself is also a lot more involved than the other checks.
Different Word documents produce very different content structures when pasted into Dreamweaver; I’ve tried to go through a few different cases and work out some patterns, but the inconsistency here means that this tool may not be generally reliable - be sure to double check the output and click through the table of contents links to check that they work.
The table of contents is a <p> tag, and each entry is separated by <br>, <br/>, or <br />, e.g.
<p>Entry 1 <br>
Entry 2 <br>
Entry 3 </p>
The table of contents is a <ul> tag, and each entry is its own <li> element, e.g.
<ul>
<li>Entry 1</li>
<li>Entry 2</li>
<li>Entry 3</li>
</ul>
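The two structures above can be split into individual entries roughly as follows. This is only a sketch: `splitEntries` and its `structure` argument are hypothetical names, not taken from the tool's source, and nested `<li>` elements are not handled.

```javascript
// Hypothetical helper: split a pasted table of contents into raw entries.
// structure is "p" (entries separated by <br>) or "ul" (one <li> per entry).
function splitEntries(tocHtml, structure) {
  if (structure === "ul") {
    // Grab each <li>...</li> pair, then strip the wrapping tags.
    var items = tocHtml.match(/<li\b[^>]*>[\s\S]*?<\/li>/gi) || [];
    return items.map(function (item) {
      return item.replace(/^<li\b[^>]*>|<\/li>$/gi, "").trim();
    });
  }
  // "p" structure: drop the outer <p> tags, then split on <br>, <br/>, <br />.
  return tocHtml
    .replace(/^\s*<p\b[^>]*>|<\/p>\s*$/gi, "")
    .split(/<br\s*\/?>/i)
    .map(function (part) { return part.trim(); })
    .filter(function (part) { return part.length > 0; });
}

console.log(splitEntries("<p>Entry 1 <br>\nEntry 2 <br>\nEntry 3 </p>", "p"));
```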
The table of contents is usually surrounded by two lines that consist of <br clear="all">. So if the start/end lines are not provided, then the tool searches for a block of text between two <br clear="all"> that contains at least two instances of entry separators (described in the first input).
If <ol lst-num> is selected, then the WET table of contents might be formatted like so:
<ol class="lst-num">
<li>Entry 1
<ol class="lst-num">
<li>Entry 2</li>
<li>Entry 3</li>
</ol>
</li>
</ol>
<li>Entry........ 5</li>
<li>Entry . 3</li>
This tool has the following three functionalities:
Individual entries in a WET table of contents table use formatting similar to this example:
<li><a href="#toc_3.1">3.1 Overview</a></li>
This entry should represent a header in the main document, which should then be linked to in the table of contents entry by ID. For the above entry, there would be the following header later in the document:
<h3 id="toc_3.1">3.1 Overview</h3>
Notice how the table of contents entry contains the link “#toc_3.1”, which links to this header with an ID of “toc_3.1”.
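The pairing between an entry and its header can be sketched as two small string builders. `makeTocEntry` and `makeHeader` are hypothetical names used only for illustration; the point is that both are built from the same ID.

```javascript
// Hypothetical helpers: both strings are built from the same id,
// so the link in the entry always points at the generated header.
function makeTocEntry(id, text) {
  return '<li><a href="#' + id + '">' + text + "</a></li>";
}

function makeHeader(level, id, text) {
  return "<h" + level + ' id="' + id + '">' + text + "</h" + level + ">";
}

console.log(makeTocEntry("toc_3.1", "3.1 Overview"));
console.log(makeHeader(3, "toc_3.1", "3.1 Overview"));
```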
Functionality 3 (where tags/lines in the main body of the document are converted to headers that are linked to by table of contents entries) is very likely to produce false positives because it replaces all tags and lines, except for <li> and <td> tags, that have the same value as each table of contents entry. For example, if there is a table of contents entry containing “3.1 Overview”, then if there are multiple paragraphs later in the document that consist solely of “3.1 Overview” or “Overview”, all of them will be replaced, even though only one of them can be the actual header.
These false positives will have to be manually fixed afterward. Since the IDs will be duplicated as well, this will show up as an error in the HTML structure (IDs have to be unique), which should make them easier to locate in Dreamweaver.
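If you would rather list the duplicated IDs than hunt for them in Dreamweaver, a check along these lines may help. It is a rough sketch, not part of the tool: `findDuplicateIds` is a hypothetical name, and it assumes IDs are written as id="..." with double quotes.

```javascript
// Hypothetical check: return every id value that appears more than once.
function findDuplicateIds(html) {
  var counts = new Map();
  var duplicates = [];
  var pattern = /\sid="([^"]+)"/gi;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    var id = match[1];
    counts.set(id, (counts.get(id) || 0) + 1);
    // Report each duplicated id once, the second time it is seen.
    if (counts.get(id) === 2) {
      duplicates.push(id);
    }
  }
  return duplicates;
}

console.log(findDuplicateIds(
  '<h3 id="toc_3.1">3.1 Overview</h3>\n<h3 id="toc_3.1">3.1 Overview</h3>'
));
```

Each ID returned marks a header that was generated more than once, so at least one of those replacements is a false positive.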
The tool adds a comment above every tag/line that is replaced which consists of the original value of the line. This is to help with figuring out whether the replaced line was a false positive or not.
<li> and <td> tags in particular are ignored by the tool in this step because headers are usually not formatted as list or table data items, so those tags are almost always false positives.
In addition, since the entire tag or line is replaced, the resulting HTML may not be well structured. For example, it may find the following lines:
<p>Overview<br>
</p>
and only replace the first line, resulting in this:
<!-- Original tag: <p>Overview<br> -->
<h3 id="toc_3.1">3.1 Overview</h3>
</p>
which contains an extra closing p tag. You will have to go through and fix any errors with the HTML structure yourself afterwards; the comments containing the original values above the lines that have been replaced should help with this as well.
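The replacement step described above might look something like the following. This is a minimal sketch under assumptions, not the tool's actual code: `replaceMatchingLines`, `escapeRegex`, and the shape of the `entry` object are all hypothetical, and the matching is done one line at a time.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch of functionality 3.
// entry: { numbering: "3.1", text: "Overview", id: "toc_3.1", level: 3 }
function replaceMatchingLines(lines, entry) {
  // The list numbering is optional in the search pattern.
  var numberingPart = entry.numbering
    ? "(?:" + escapeRegex(entry.numbering) + "\\.?\\s+)?"
    : "";
  var pattern = new RegExp(
    "^\\s*(?:<[^>]+>\\s*)*" + numberingPart + escapeRegex(entry.text) +
    "\\s*(?:<[^>]+>\\s*)*$"
  );
  var headerText = entry.numbering
    ? entry.numbering + " " + entry.text
    : entry.text;
  var header = "<h" + entry.level + ' id="' + entry.id + '">' +
    headerText + "</h" + entry.level + ">";
  var out = [];
  lines.forEach(function (line) {
    // <li> and <td> lines are skipped, as they are almost always false positives.
    var isListOrCell = /^\s*<(li|td)\b/i.test(line);
    if (!isListOrCell && pattern.test(line)) {
      out.push("<!-- Original tag: " + line.trim() + " -->");
      out.push(header);
    } else {
      out.push(line);
    }
  });
  return out;
}

console.log(replaceMatchingLines(
  ["<p>Overview<br>", "</p>"],
  { numbering: "3.1", text: "Overview", id: "toc_3.1", level: 3 }
).join("\n"));
```

Because the sketch works line by line, it reproduces the stray closing tag from the example above, which is why the comment with the original line is kept.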
When cleaning each table of contents entry, the following tags are removed because they are usually introduced unnecessarily when pasting a document into Dreamweaver, and/or may produce formatting errors if kept.
I have noticed that Dreamweaver formats its table of contents either as a p tag separated by br, or a ul list (corresponding to the options for the first input of this tool).
If this assumption is incorrect, then the tool will not work.
As mentioned earlier, I have noticed that the table of contents tables are usually surrounded by two lines that consist of <br clear="all">. So if no inputs are provided for the start/end line positions of the table of contents in the HTML document, then the tool searches for a block of text between two <br clear="all"> that contains at least two <br>, and uses that block of text as the table of contents.
If this does not properly find the lines of the HTML document that consist of the table of contents, then you should manually enter the start/end line positions instead.
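The automatic search can be pictured roughly like this. It is a sketch under assumptions: `findTocBlock` is a hypothetical name, line positions are zero-based array indexes (the tool's own inputs may be numbered differently), and only `<br>` separators are counted.

```javascript
// Hypothetical sketch: find the first block of lines that sits between two
// <br clear="all"> lines and contains at least two <br> separators.
function findTocBlock(lines) {
  var marker = /<br\s+clear="all"\s*\/?>/i;
  var separator = /<br\s*\/?>/gi;
  var start = -1; // index of the most recent marker line
  for (var i = 0; i < lines.length; i++) {
    if (!marker.test(lines[i])) {
      continue;
    }
    if (start !== -1) {
      var block = lines.slice(start + 1, i).join("\n");
      var count = (block.match(separator) || []).length;
      if (count >= 2) {
        return { start: start + 1, end: i - 1 };
      }
    }
    start = i;
  }
  return null; // nothing found: enter the start/end positions manually
}

console.log(findTocBlock([
  "<p>Intro</p>",
  '<br clear="all">',
  "<p>Entry 1 <br>",
  "Entry 2 <br>",
  "Entry 3 </p>",
  '<br clear="all">'
]));
```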
If the option for list numbering is selected:
If the option for list numbering is not selected, then header IDs are all formatted as “toc_[internal counter]”, e.g. “toc_1”.
In either case, the internal counter increments at each table of contents entry that uses it, to keep the IDs unique.
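A counter-based ID generator could be sketched as below. `makeIdGenerator` is a hypothetical name, and the numbered branch is an assumption inferred from the “toc_3.1” example earlier: entries with a list numbering reuse it, and everything else falls back to the counter.

```javascript
// Hypothetical sketch of ID generation with an internal counter.
function makeIdGenerator(useListNumbering) {
  var counter = 0;
  return function (numbering) {
    // Assumption: a numbered entry reuses its numbering, e.g. "toc_3.1".
    if (useListNumbering && numbering) {
      return "toc_" + numbering;
    }
    // Otherwise the counter is used, and only then does it increment.
    counter += 1;
    return "toc_" + counter;
  };
}

var nextId = makeIdGenerator(true);
console.log(nextId("3.1"), nextId(""), nextId(""));
```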
Most of the usefulness in the tool comes from its attempt at guessing the hierarchy of the table of contents, which it does using list numberings at the start of each entry. List numberings must be formatted with numbers and periods. The hierarchy of the table of contents is used for two things:
For indentation, each entry is compared to the previous entry to see whether its level is higher or lower; a sublist is opened if the level is higher (deeper in the hierarchy), and the previous sublist is closed if it is lower.
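One way to picture the indentation step is a stack of open list levels. This is only a sketch: `buildNestedList` and the `{ level, html }` entry shape are hypothetical, and the `lst-num` class is borrowed from the WET example earlier.

```javascript
// Hypothetical sketch: turn a flat list of entries with levels into nested lists.
// entries: [{ level: 2, html: "Entry 1" }, { level: 3, html: "Entry 2" }, ...]
function buildNestedList(entries) {
  var out = [];
  var stack = []; // levels of the lists that are currently open
  entries.forEach(function (entry) {
    if (stack.length === 0 || entry.level > stack[stack.length - 1]) {
      // First entry, or a deeper entry: open a (sub)list.
      // For a sublist, the previous <li> stays open around it.
      out.push('<ol class="lst-num">');
      stack.push(entry.level);
    } else {
      // Same level or shallower: close the previous item first.
      out.push("</li>");
      while (stack.length > 1 && entry.level < stack[stack.length - 1]) {
        stack.pop();
        out.push("</ol>");
        out.push("</li>");
      }
    }
    out.push("<li>" + entry.html);
  });
  // Close whatever is still open.
  while (stack.length > 0) {
    out.push("</li>");
    out.push("</ol>");
    stack.pop();
  }
  return out.join("\n");
}

console.log(buildNestedList([
  { level: 2, html: "Entry 1" },
  { level: 3, html: "Entry 2" },
  { level: 3, html: "Entry 3" }
]));
```

Keeping the parent `<li>` open while its sublist is written is what keeps the nested markup well formed.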
For the header level, the tool checks how many times a period followed by a number appears in the list numbering. The lowest level is 2, for h2. For example:
Any initial list numberings are set to be optional in step 3’s regex statement that searches for tags/lines consisting of table of contents entries. So if a table of contents entry consists of “Overview”, both of the following tags would match:
<p>3.1 Overview</p>
<p>Overview</p>
For entries that do not have list numberings, the level is set to be [the level of the last list numbering that did exist] + 1. If there have been no list numberings so far, then the tool uses a level of 2. For entries without a list numbering, the list numbering value itself is set to a blank string, so it will not be included in the entry.
For example, if the table of contents has the following entries:
Then the levels would be 2, 2, 3, 4, 5, 5, and 4.
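The level rules can be sketched as a single pass over the list numberings. `entryLevels` is a hypothetical name, and the input below is a made-up sequence chosen so that it produces the same levels as above, not the original example entries. An empty string stands for an entry without a list numbering.

```javascript
// Hypothetical sketch of the level rules:
// - numbered entry: 2 + number of "period followed by a number" in the numbering
// - unnumbered entry: level of the last numbered entry + 1, or 2 if there is none
function entryLevels(numberings) {
  var lastNumberedLevel = null;
  return numberings.map(function (numbering) {
    if (numbering) {
      lastNumberedLevel = 2 + (numbering.match(/\.\d+/g) || []).length;
      return lastNumberedLevel;
    }
    return lastNumberedLevel === null ? 2 : lastNumberedLevel + 1;
  });
}

console.log(entryLevels(["1", "2", "2.1", "2.1.1", "", "", "2.1.2"]));
// prints [ 2, 2, 3, 4, 5, 5, 4 ]
```

Note that the sketch does not cap the level, so deep hierarchies could go past h6.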
Some documents may not have list numberings, or the list numberings may be misformatted (not formatted with numbers and periods), so the tool wouldn’t be able to indent the table of contents or choose header levels properly. In this case, you can manually add the list numberings into the Dreamweaver document after pasting it from Word.
For example, if you had the following table of contents entries:
and you wanted Definitions to be indented one level more than Introductions, then you could manually add in list numberings yourself:
If the option to remove manual list numbering is checked, then list numberings will be excluded from table of contents entry text, as well as the headers that replace tags/lines in step 3. So after generating the table of contents links and indentation with the manual list numberings, the table of contents entries would still be formatted as so:
Note that the list numberings will still be included as optional parts of the tag/line search regex used in step 3.
This option is ignored if the option to use list numberings isn’t also selected.
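The effect of the two options on the displayed text could be sketched as follows. `splitListNumbering` and `displayText` are hypothetical names, and the pattern assumes numberings made of digits and periods followed by whitespace, so an entry such as “2020 Report” would be misread as numbered.

```javascript
// Hypothetical helper: separate a leading list numbering from the entry text.
function splitListNumbering(entryText) {
  var match = /^\s*(\d+(?:\.\d+)*)\.?\s+(.*)$/.exec(entryText);
  if (!match) {
    return { numbering: "", text: entryText.trim() };
  }
  return { numbering: match[1], text: match[2].trim() };
}

// The removal option only applies when list numberings are in use.
function displayText(entryText, useListNumbering, removeManualNumbering) {
  if (useListNumbering && removeManualNumbering) {
    return splitListNumbering(entryText).text;
  }
  return entryText.trim();
}

console.log(displayText("1.1 Definitions", true, true));
console.log(displayText("1.1 Definitions", false, true));
```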
I have structured format_toc_helpers.js as follows:
For neatness, I split step 3 of the above implementation details into two helper functions. Only one of the two helper functions is called, depending on how the ToC indentation should be generated.
Both functions return an array with the same structure. Each value in the array is an object with four properties:
- list numbering, which is extracted from the start of the entry’s content.
- link id for the link to the header, created as described earlier.
- indentation level.
- the content of the entry, passed into clean_entry() to clean it.
These four properties are what is required to create a WET-formatted ToC entry. For example, suppose we have the following ToC entry:
<li><a href="#toc_3.1">3.1 Overview</a></li>
which produces the following header:
<h3>3.1 Overview</h3>
<h3>.
This array of objects containing ToC entry properties is then used for steps 4 and 5.