A space is added between a closing angle bracket and the text that follows it.
In text, hyphen-minus, figure dash and em dash are replaced with an en dash.
In text, spaces are added before and after the en dash.
Spaces between numbers and dashes are removed, and the dash is set to a figure dash.
The space between N. and Y. is also removed: N. Y. → N.Y.
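The dash and initials rules above can be approximated with regular expressions. The following is a minimal Python sketch, not the extension's own code: it works on a plain string rather than a Writer document, the function and pattern names are hypothetical, and it assumes that a hyphen-minus counts as a text dash only when it has spaces on both sides, so hyphenated words are left alone. The closing-angle-bracket rule is not covered.

```python
import re

FIGURE_DASH = "\u2012"
EN_DASH = "\u2013"

# A dash between two words: figure/en/em dash with optional spaces around it,
# or a hyphen-minus with spaces on both sides (assumption: "well-known" is kept).
TEXT_DASH = re.compile(
    r"(?<=\S)(?:[ \t]*[\u2012\u2013\u2014][ \t]*|[ \t]+-[ \t]+)(?=\S)"
)
# Any dash between two digits, with optional spaces around it.
NUMBER_DASH = re.compile(r"(?<=\d)[ \t]*[-\u2012\u2013\u2014][ \t]*(?=\d)")
# A single capital letter with a period, followed by spaces and another initial.
LATIN_INITIAL = re.compile(r"\b([A-Z])\.[ \t]+(?=[A-Z]\.)")


def normalize_dashes_and_initials(text: str) -> str:
    # 1. Text dashes become an en dash with spaces: "a—b", "a - b" -> "a – b".
    text = TEXT_DASH.sub(" " + EN_DASH + " ", text)
    # 2. Dashes between numbers lose their spaces and become figure dashes:
    #    "1990 – 2000" -> "1990‒2000". Runs after step 1, so digits win.
    text = NUMBER_DASH.sub(FIGURE_DASH, text)
    # 3. Spaces between initials are removed: "N. Y." -> "N.Y."
    text = LATIN_INITIAL.sub(r"\1.", text)
    return text
```

For example, `normalize_dashes_and_initials("pages 10 - 20")` gives `"pages 10‒20"`, while `"a well-known fact"` is returned unchanged.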
The following rules apply to Russian text:
Spaces between initials are removed, as shown below:
А.[possible space]А. Иванов → А.А. Иванов
Иванов А.[possible space]А. → Иванов А.А.
Spaces are also removed in the following abbreviations:
и т. д. → и т.д.
и т. п. → и т.п.
т. к. → т.к.
т. е. → т.е.
т. н. → т.н.
The letters и/И followed by a «combining breve» (U+0306) are replaced with й/Й.
The letters е/Е followed by a «combining diaeresis» (U+0308) are replaced with ё/Ё.
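The Russian rules can be sketched the same way. This Python example also works on plain strings and uses hypothetical names. It assumes the abbreviations are written in lower case, as listed above, and it replaces only the four listed letter-plus-combining-mark pairs instead of applying full Unicode NFC normalization, which would also compose unrelated sequences.

```python
import re

# Capital Cyrillic initial with a period, optional spaces, then another initial.
RU_INITIAL = re.compile(r"\b([А-ЯЁ])\.[ \t]*(?=[А-ЯЁ]\.)")
# "т. д.", "т. п.", "т. к.", "т. е.", "т. н." with spaces inside.
RU_ABBREVIATION = re.compile(r"\bт\.[ \t]+([дпкен])\.")
# Base letter + combining mark -> precomposed letter.
COMBINING_FIXES = {
    "\u0438\u0306": "\u0439",  # и + combining breve -> й
    "\u0418\u0306": "\u0419",  # И + combining breve -> Й
    "\u0435\u0308": "\u0451",  # е + combining diaeresis -> ё
    "\u0415\u0308": "\u0401",  # Е + combining diaeresis -> Ё
}


def normalize_russian(text: str) -> str:
    # Compose only the listed pairs; other combining sequences are left as-is.
    for decomposed, composed in COMBINING_FIXES.items():
        text = text.replace(decomposed, composed)
    # "А. А. Иванов" -> "А.А. Иванов", "Иванов А. А." -> "Иванов А.А."
    text = RU_INITIAL.sub(r"\1.", text)
    # "и т. д." -> "и т.д." and likewise for п, к, е, н
    text = RU_ABBREVIATION.sub(r"т.\1.", text)
    return text
```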
Manual page break at document start removal
A manual page break placed at the start of a document is invisible and usually useless. However, it can cause problems during page make-up, which is why removing it is recommended in most cases.
Custom page styles removal
All custom page styles are removed. Page styles should be defined
Loading styles from template
It is good practice for all documents to have the same look, because our eyes get used to particular text formatting and fonts. This stage loads predefined styles from a template document; they replace the document’s original style definitions. The replacement relies on style naming conventions: styles are matched by name.
If the input document has custom styles that are not defined in the template, they are left unchanged. In that case, styles should be assigned manually after cleaning.
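The name-based matching can be illustrated with a small Python model in which each style is a name mapped to its properties. This is a hypothetical sketch, not how Writer stores styles, and it assumes that template styles missing from the document are added as well. The second return value lists the custom styles that would still need to be assigned by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a style is a name mapped to its properties.
Styles = Dict[str, Dict[str, str]]


def load_template_styles(document: Styles, template: Styles) -> Tuple[Styles, List[str]]:
    """Return the merged styles and the document styles the template does not define."""
    merged = dict(document)  # copy, so the input document is not modified
    # Matching is by name only: a template style replaces the same-named document style.
    # Assumption: template styles absent from the document are added too.
    merged.update(template)
    # Custom styles unknown to the template stay unchanged; report them for manual work.
    untouched = sorted(name for name in document if name not in template)
    return merged, untouched
```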
Basic macro removal
Articles and books usually need no macros, as input documents should by default contain only text and images. This stage removes macros from the document.
Advanced mode is intended for professionals and advanced users; it lets you select which cleaning stages to apply and which to skip. This mode also gives access to the additional cleaning stages described below.
Manual page breaks removal
If manual page breaks are not allowed in the document, this function removes all of them.
Is the document suitable for HTML publishing? That depends on various aspects, such as symbol codes and table and image settings. Much of this is not visible to the editor until the document has been exported to HTML and tested on various devices. This validation function was created to catch most of the problems that occur with exported documents. It saves a lot of testing time and makes HTML make-up much more stable and reliable. The checks it currently performs are described below.
Check for symbols in text
Symbols in the text are checked for membership in the Unicode “private use area”. Symbols from the “private use area” are not recommended for HTML export, because they display correctly only if the original font is available. It is better to replace them with standardized Unicode symbols.
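One way to implement such a check in Python is the standard `unicodedata` module: characters in the private use areas have the general category `Co`. The function name and the (index, code point) report format are assumptions for illustration; the same function could be run over footnote symbols and numbering markers for the two checks that follow.

```python
import unicodedata
from typing import List, Tuple


def find_private_use_symbols(text: str) -> List[Tuple[int, str]]:
    """Return (index, "U+XXXX") for every private use character in text."""
    # General category "Co" covers U+E000..U+F8FF and the supplementary
    # private use planes 15 and 16.
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]
```

For example, `find_private_use_symbols("Price \uf0b7 10")` reports `[(6, "U+F0B7")]`, pointing the editor at the symbol to replace.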
Footnote symbols check
Footnote symbols are also checked for membership in the “private use area”. Because such symbols are not recommended for HTML export, it is better to replace them with standardized symbols.
Numbering styles check
Numbering markers are also checked for membership in the “private use area”. Because such symbols are not recommended for HTML export, it is better to replace them with standardized symbols. Advanced mode provides additional information.
Check for drawings and embedded objects
Conversion to HTML currently supports neither drawings made in Writer nor any embedded objects other than formulas. Supported image formats are JPEG, PNG, TIF and SVG.
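The rule can be expressed as a simple filter. In this Python sketch the `DocObject` class and its `kind` values are a hypothetical stand-in for what the extension reads from the document, and the image format is guessed from the file extension (an assumption; a real check might inspect the file contents instead).

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List

# Image formats accepted by the HTML conversion, keyed by file extension.
SUPPORTED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".svg"}


@dataclass
class DocObject:
    """Hypothetical description of one object found in the document."""
    name: str
    kind: str            # "image", "formula", "drawing" or "embedded"
    file_name: str = ""  # only meaningful for images


def find_unsupported_objects(objects: List[DocObject]) -> List[str]:
    problems = []
    for obj in objects:
        if obj.kind == "formula":
            continue  # formulas are the only embedded objects that convert
        if obj.kind == "image":
            ext = PurePosixPath(obj.file_name).suffix.lower()
            if ext not in SUPPORTED_IMAGE_EXTENSIONS:
                problems.append(f"{obj.name}: unsupported image format '{ext or '?'}'")
            continue
        # Writer drawings and any other embedded object are not converted.
        problems.append(f"{obj.name}: {obj.kind} is not supported in HTML")
    return problems
```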
This extension can be downloaded from the LibreOffice Extensions website: https://extensions.libreoffice.org/extensions/clean-and-validate-for-publishing-with-pagination
To enable advanced mode, install the ePublishing extension, which is also available on the LibreOffice Extensions website: https://extensions.libreoffice.org/extensions/epublishing
Advanced mode can then be enabled via the menu ePublishing → «Configure cleaning».