What Is Thoal?

Thoal is a name derived from the acronymic spelling of the title The Heritage of American Labor. Thoal may refer to the ebook of that title (capitalized proper name), the software created to format the precursor HTML form of that ebook (all-lowercase proper name), or the Web service using a variant of that software (general term). The software thoal is a batch (i.e., non-interactive) HTML analyzer and editor, and not a very sophisticated one.

The software is intended for HTML generated by a word processor, that is, HTML used for static documents. Thoal cannot process all valid HTML documents. Notably, it requires explicit start and end tags for the HTML elements html, head, and body. Also, HTML5 does not always require elements of types a, del, ins, and map to nest (as far as I can tell), but thoal requires strict nesting and no omitted start tags. The edits made by thoal are minimal, so a textual comparison of the input and the output, as with the diff utility, will generally be useful.
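To check a document against the first requirement before submitting it, something like the following could be used. This is only a sketch in Python using the standard html.parser module, not thoal's own code; the names ExplicitTagChecker and missing_required_tags are made up for this illustration. It relies on the fact that html.parser reports only the tags literally present in the text and does not infer omitted ones.

```python
from html.parser import HTMLParser

# Elements for which explicit start and end tags are required (per the text above).
REQUIRED = ("html", "head", "body")


class ExplicitTagChecker(HTMLParser):
    """Records which of the required tags literally appear in the source."""

    def __init__(self):
        super().__init__()
        self.starts = set()
        self.ends = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag in REQUIRED:
            self.starts.add(tag)

    def handle_endtag(self, tag):
        if tag in REQUIRED:
            self.ends.add(tag)


def missing_required_tags(html_text):
    """Return the required start and end tags that are absent from html_text."""
    checker = ExplicitTagChecker()
    checker.feed(html_text)
    checker.close()
    missing = []
    for name in REQUIRED:
        if name not in checker.starts:
            missing.append("<" + name + ">")
        if name not in checker.ends:
            missing.append("</" + name + ">")
    return missing


if __name__ == "__main__":
    sample = "<html><head><title>t</title></head><body><p>x</p></html>"
    print(missing_required_tags(sample))
```

An empty list means all six tags are present. The sketch says nothing about the nesting requirement for a, del, ins, and map.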

The service Web page uses these abbreviations:

LO5: LibreOffice 5.x
WD13: Microsoft Word 2013
KDP: Kindle Direct Publishing

Epub is a popular ebook format, not an ad hoc acronym.

As you may have guessed, thoal was developed specifically for HTML generated by LibreOffice 5 and Microsoft Word 2013, and to format HTML submitted to Kindle Direct Publishing. Those word processors were the ones available to me. LibreOffice 6 was released in the summer of 2018. I admit candidly right here that I did not get the results I wanted for the product previews with KDP and Barnes and Noble Press. I used Sigil to create an epub2 file for submission to Barnes and Noble. I make no predictions about how any given HTML input will come out with any given processing options. Thoal is good for some things and not others.

FAQ

Why is the text formatting of the product preview a bit off, but the HTML from which it derives looks correct in my browser?

I don't know, and I don't know how to fix that. I have the same problem with my ebook. Reflowable format is more difficult to get right than fixed format. I don't know enough about how the product preview is generated, which may involve the issues of converting to a PDF representation.

Why does the HTML generated by LO5 look so bad in my browser?

The generated HTML is structurally sound and close to useful HTML5. The biggest potential structural problem is that similarly named paragraph styles in LO5 will generate identical class names for their class selectors. The thoal digest report will tell you if that is the case. If it is, rename your styles so that the corresponding classes are unique, perhaps by adding a suffix. Also, CSS standards require that Web browsers ignore the styling governed by a class selector whose name starts with an unescaped digit. LO5 will create that difficulty too. The thoal service has an option to escape such leading digits.
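For anyone curious what escaping a leading digit looks like, here is a small sketch in Python. It is not thoal's code, and the function name escape_leading_digit is made up for this illustration. It uses the standard CSS escape form: a backslash, the hexadecimal code point of the character, and a space that ends the escape. It handles only the plain leading-digit case described above, not other invalid starts such as a hyphen followed by a digit.

```python
def escape_leading_digit(class_name):
    """Return class_name with a leading ASCII digit written as a CSS escape.

    Example: the class name 1heading becomes \\31 heading, because the
    code point of the character 1 is hexadecimal 31.
    """
    if class_name and class_name[0] in "0123456789":
        # Backslash, hex code point, then a space to terminate the escape.
        return "\\{:x} ".format(ord(class_name[0])) + class_name[1:]
    return class_name


if __name__ == "__main__":
    for name in ("1heading", "western"):
        # Only the selector in the style sheet changes; the class attribute
        # in the HTML keeps its original value.
        print("." + escape_leading_digit(name) + " { }")
```

The escaped form belongs only in the style sheet. The class attribute in the HTML itself stays as written, for example class="1heading".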

Why is thoal only in English?

I am fluent only in English, and I have not written software that uses i18n or l10n. It would be challenging to add other languages, to say the least.

Why does thoal add explicit end tags without that option being set?

I need explicit tags to navigate the HTML text for most editing operations. Not all omitted end tags permissible by HTML5 can be detected by thoal.