The original converter

The W2phtml converter is a fork of the writer2latex project by Henrik Just.

Like the original project, this converter is distributed under the GPLv3 license.

Unlike the original, this converter is intended only for exporting text documents to websites, ebooks and digital libraries, and for creating semantic data in RDF format.

Unlike the original, this project can export a document with its pagination preserved, so that the resulting HTML document is almost identical to the PDF output. This is important when exporting scientific publications.

Graphical interface

The converter's graphical interface is written in Java. When you convert through the graphical interface, you do not have to worry about accidentally changing the document, which can otherwise happen because hyphenation may be rearranged and soft page breaks recalculated when a document is opened.

The graphical interface can be launched not only as a LibreOffice extension but also as a standalone application. The converter can also still be run from the command line for batch conversion of documents.

The graphical interface can save named sets of settings for different tasks; each saved set appears in its own tab.

Metadata

Metadata for documents can be defined in a spreadsheet, which you can then export to CSV format.

Metadata can also be set directly in the document using the metadata editor.

Special settings

For HTML formats, the converter can export HTML comments that tell the Greenstone3 digital library how to split the document into chapters and generate a table of contents.
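Greenstone's HTML plugin can read section structure from specially formatted comments (in Greenstone's documentation this is tied to the plugin's description_tags option): each section is wrapped in a comment containing <Section> with a Title metadata element, and closed by a comment containing </Section>. The Java sketch below shows how nested chapter headings could be turned into such comments. It assumes that comment syntax and uses a hypothetical Chapter record; it is not the converter's own code, and the exact markup W2phtml emits may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: wraps chapters in Greenstone-style section comments. */
public class GreenstoneSections {
    /** Simplified chapter model: heading level, title and already-converted HTML body. */
    record Chapter(int level, String title, String bodyHtml) {}

    static final String CLOSE = "<!--\n</Section>\n-->\n";

    static String open(String title) {
        // Escape markup characters in the title. A title containing "--" would still
        // need extra handling, since that sequence may not appear inside an HTML comment.
        String safe = title.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<!--\n<Section>\n  <Description>\n    <Metadata name=\"Title\">"
                + safe + "</Metadata>\n  </Description>\n-->\n";
    }

    static String build(List<Chapter> chapters) {
        StringBuilder out = new StringBuilder();
        Deque<Integer> openLevels = new ArrayDeque<>();
        for (Chapter c : chapters) {
            // Close every open section at the same or a deeper level, so a level-1
            // heading ends both the previous level-2 and level-1 sections.
            while (!openLevels.isEmpty() && openLevels.peek() >= c.level()) {
                out.append(CLOSE);
                openLevels.pop();
            }
            out.append(open(c.title())).append(c.bodyHtml()).append('\n');
            openLevels.push(c.level());
        }
        // Close whatever is still open at the end of the document.
        while (!openLevels.isEmpty()) {
            out.append(CLOSE);
            openLevels.pop();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(List.of(
                new Chapter(1, "Introduction", "<p>...</p>"),
                new Chapter(2, "Background", "<p>...</p>"),
                new Chapter(1, "Results", "<p>...</p>"))));
    }
}
```

With this input, "Background" is nested inside "Introduction", and "Results" starts a new top-level section, which is the structure Greenstone would use for its table of contents.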

When exporting documents with preserved pagination, you can specify CSS styles for page breaks.

Unlike the original, this converter exports almost all CSS styles inline, that is, as style attributes of HTML elements, to avoid problems when the output is imported into websites.
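To illustrate the difference: instead of a stylesheet rule for a class such as P1 plus class="P1" on the element, an inlining exporter writes the declarations straight into the element's style attribute, so the formatting presumably no longer depends on a stylesheet that may be lost during import. The Java helper below is a hypothetical sketch of that merge step, not the converter's implementation; its naive parsing is an assumption that only holds for simple declarations.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: folds a class's CSS declarations into an inline style attribute. */
public class StyleInliner {
    /** Parses "a: b; c: d" into an ordered property map (naive: no support for ';' inside values). */
    static Map<String, String> parse(String declarations) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String decl : declarations.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) continue; // skip empty or malformed entries
            String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty()) props.put(name, value);
        }
        return props;
    }

    /**
     * Builds an inline style value: class declarations first, then the element's own
     * style, so element-level declarations win, as a style attribute would in the cascade.
     */
    static String inline(String classRule, String existingStyle) {
        Map<String, String> merged = parse(classRule);
        merged.putAll(parse(existingStyle)); // overriding a key keeps its original position
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append(": ").append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String rule = "margin-left: 1cm; font-size: 0.75rem";
        String own = "font-size: 1rem";
        System.out.println("<p style=\"" + inline(rule, own) + "\">...</p>");
    }
}
```

Here the paragraph ends up with style="margin-left: 1cm; font-size: 1rem;", carrying its formatting with it wherever the HTML is pasted.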

There is also an option to split the document into parts at headings while restricting cuts to page boundaries, which may be needed to obtain individual journal articles as separate items.
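One possible reading of this option is: start a new part at a heading only when that heading is the first block on its page, so headings in the middle of a page never cause a cut. The Java sketch below implements that reading over a hypothetical, already paginated Block model (records need Java 16 or later); heading levels and the converter's real data structures are left out, and W2phtml's actual rule may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split at headings, but only where a heading starts a page. */
public class PageAwareSplitter {
    /** Simplified view of a block after pagination: text, heading flag, page number. */
    record Block(String text, boolean heading, int page) {}

    static List<List<Block>> split(List<Block> blocks) {
        List<List<Block>> parts = new ArrayList<>();
        List<Block> current = new ArrayList<>();
        int previousPage = -1;
        for (Block b : blocks) {
            boolean startsPage = b.page() != previousPage;
            // Cut only at a heading that is the first block on its page.
            if (b.heading() && startsPage && !current.isEmpty()) {
                parts.add(current);
                current = new ArrayList<>();
            }
            current.add(b);
            previousPage = b.page();
        }
        if (!current.isEmpty()) parts.add(current);
        return parts;
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(
                new Block("Article A", true, 1),
                new Block("Intro text", false, 1),
                new Block("Methods", true, 1),   // mid-page heading: no cut
                new Block("More text", false, 2),
                new Block("Article B", true, 3), // heading at the top of a page: cut
                new Block("Body", false, 3));
        for (List<Block> part : split(doc)) {
            System.out.println(part.get(0).text() + " (" + part.size() + " blocks)");
        }
    }
}
```

The example yields two parts, one starting at "Article A" with four blocks and one starting at "Article B" with two, which is the kind of per-article split the option is meant for.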

For every export format, you can set an upper resolution limit for bitmap images so that they are optimized during conversion.
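A resolution limit of this kind usually amounts to downscaling any bitmap whose larger side exceeds the limit while keeping its aspect ratio. The Java sketch below shows that idea with the standard java.awt.image and java.awt classes; the maxSide parameter is a hypothetical stand-in for the converter's setting, which may express the limit differently (for example in DPI).

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical sketch: caps the larger side of a bitmap at maxSide pixels. */
public class BitmapLimiter {
    static BufferedImage limitResolution(BufferedImage src, int maxSide) {
        int w = src.getWidth();
        int h = src.getHeight();
        int longest = Math.max(w, h);
        if (longest <= maxSide) {
            return src; // already small enough: avoid needless resampling
        }
        double scale = (double) maxSide / longest;
        int newW = Math.max(1, (int) Math.round(w * scale));
        int newH = Math.max(1, (int) Math.round(h * scale));

        // Keep an alpha channel only if the source has one (JPEG output cannot store alpha).
        int type = src.getColorModel().hasAlpha()
                ? BufferedImage.TYPE_INT_ARGB
                : BufferedImage.TYPE_INT_RGB;
        BufferedImage dst = new BufferedImage(newW, newH, type);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newW, newH, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage photo = new BufferedImage(4000, 3000, BufferedImage.TYPE_INT_RGB);
        BufferedImage limited = limitResolution(photo, 1600);
        System.out.println(limited.getWidth() + "x" + limited.getHeight());
    }
}
```

With a 1600-pixel limit, a 4000x3000 image becomes 1600x1200, while images already within the limit are returned untouched.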

By default, text sizes are saved in rem (a size relative to the font size of the root HTML element), which lets the site's settings control the scale of the text. There is still an option to save sizes in pixels instead.
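As a rough illustration of how the two units relate: CSS fixes 96px per inch and 72pt per inch, so 1pt equals 4/3 px, and 1rem equals the root element's font size, which browsers typically default to 16px. The hypothetical Java helper below converts a point size from the document into both units under that 16px assumption; it is a sketch, not the converter's code.

```java
import java.util.Locale;

/** Hypothetical helper: converts document font sizes (pt) to CSS px or rem. */
public class FontSizeUnits {
    // Assumption: the root <html> font size is the common browser default of 16px.
    static final double ROOT_FONT_PX = 16.0;

    static double ptToPx(double pt) {
        // CSS: 96px per inch and 72pt per inch, so px = pt * 96 / 72.
        return pt * 96.0 / 72.0;
    }

    static double ptToRem(double pt) {
        // rem is relative to the root element's font size.
        return ptToPx(pt) / ROOT_FONT_PX;
    }

    /** Formats a CSS length with up to four decimals, dropping trailing zeros. */
    static String css(double value, String unit) {
        String s = String.format(Locale.ROOT, "%.4f", value)
                .replaceAll("0+$", "")
                .replaceAll("\\.$", "");
        return s + unit;
    }

    public static void main(String[] args) {
        double[] sizes = {9, 12, 14, 18};
        for (double pt : sizes) {
            System.out.println(css(pt, "pt") + " -> " + css(ptToRem(pt), "rem")
                    + " or " + css(ptToPx(pt), "px"));
        }
    }
}
```

With a 16px root, 12pt becomes 16px or 1rem. If a site sets a larger root font size, every rem-based size grows with it, whereas pixel sizes stay fixed; that is the trade-off between the default and the pixel option.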

Links

Website of the project on which this converter is based.

Project source code on GitLab

Download the latest version of the converter by following this link.