Using word processors in the publishing process has both advantages and drawbacks. The advantages include availability (a word processor is installed on almost every computer), ease of use, and price (noticeably lower than that of commercial layout software).
The solution described in this article is based on the LibreOffice office suite. Besides the obvious advantages of low cost and a low entry barrier, it made it possible to solve a number of problems that were difficult to solve with professional proprietary layout software.
This technology makes it possible to produce various printed and electronic layouts from a single file. The resulting layouts are suitable not only for printing and distribution in PDF format, but also for publishing in EPUB format with pagination that matches the printed layout. The same approach also yields layouts in HTML format (again preserving pagination), used for publication in the electronic library, and in RDF format, which proved useful for importing publications into a triplestore. Thus, there is no need to prepare separate layouts for print and e-books, or to modify layouts for publication in the digital library (Greenstone3) or the triplestore.
Using ODF, an open standardized format, removed the dependence on proprietary software (InDesign) and its file format, and also simplified further work with the publication archive.
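One practical consequence of ODF being open: an ODT file is a ZIP package whose text lives in `content.xml`, so archived publications remain readable with nothing but a standard library. The sketch below is illustrative only; the helper `build_minimal_odt` is hypothetical and produces a stripped-down package (no manifest) just to demonstrate the reader.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces for text elements (text:p, text:h) and the document root.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def paragraphs_from_odt(odt_bytes: bytes) -> list[str]:
    """Return the plain text of every paragraph and heading in an ODT package."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]


def build_minimal_odt(paragraphs: list[str]) -> bytes:
    """Hypothetical helper: build a tiny, incomplete ODT package for testing.

    Paragraph text is not XML-escaped, so pass plain text only.
    """
    body = "".join(f"<text:p>{p}</text:p>" for p in paragraphs)
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        f"<office:body><office:text>{body}</office:text></office:body>"
        "</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # ODF expects "mimetype" as the first entry, stored uncompressed
        # (ZIP_STORED is ZipFile's default compression).
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", content)
    return buffer.getvalue()


if __name__ == "__main__":
    odt = build_minimal_odt(["Title", "First paragraph"])
    print(paragraphs_from_odt(odt))
```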
However, word processors have long had a poor reputation in publishing. This is due to their limited functionality and unstable operation, which applies primarily to MS Word. Moreover, fixing its bugs or extending its functionality is impossible because its source code is closed.
Preparation of materials
To make work on publications as predictable as possible, materials need preliminary processing, preferably at the earliest stage of working with authors' files. This processing can include various steps, for example:
Substitute fonts so that only those accepted by the publisher remain, and replace non-standard characters with Unicode characters.
Filter manual formatting on a whitelist basis: remove all manual formatting except what authors are allowed to use. Examples of permitted formatting: bold, italic, underline, superscript, subscript, and letter spacing.
Remove hidden links and bookmarks from the text
Correct common typing mistakes
Remove unused styles
Remove manual page breaks
Remove macros from authors' files
Remove a page break at the beginning of a document if it has one
Apply the paragraph styles used by the publisher to simplify further layout
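The font-substitution, character-replacement and whitelist-filtering steps can be sketched on a simplified model of formatted text. This is not the "Clean and validate" extension's code: the `Run` class, the allowed font and formatting sets, and the small replacement table are all hypothetical. The table assumes, as MS Word typically does, that legacy Symbol-font characters are stored in the Private Use Area at U+F000 plus the Symbol code.

```python
from dataclasses import dataclass, field

# Hypothetical publisher policy; real lists come from the house style.
ALLOWED_FONTS = {"Liberation Serif", "Liberation Sans"}
DEFAULT_FONT = "Liberation Serif"
ALLOWED_FORMATTING = {"bold", "italic", "underline", "superscript",
                      "subscript", "letter-spacing"}

# Example entries only: Symbol-font characters in the Private Use Area
# mapped to standard Unicode code points.
CHAR_REPLACEMENTS = {
    "\uf061": "\u03b1",  # Symbol "a"    -> GREEK SMALL LETTER ALPHA
    "\uf0b7": "\u2022",  # Symbol bullet -> BULLET
}


@dataclass
class Run:
    """Text with uniform character formatting (a simplified, hypothetical model)."""
    text: str
    font: str
    formatting: set[str] = field(default_factory=set)


def clean_run(run: Run) -> Run:
    text = "".join(CHAR_REPLACEMENTS.get(ch, ch) for ch in run.text)
    # Fonts outside the accepted set are replaced by the default one.
    font = run.font if run.font in ALLOWED_FONTS else DEFAULT_FONT
    # Whitelist principle: anything not explicitly allowed is dropped.
    formatting = run.formatting & ALLOWED_FORMATTING
    return Run(text, font, formatting)


if __name__ == "__main__":
    dirty = Run("\uf061-particle", "Times New Roman", {"italic", "highlight"})
    print(clean_run(dirty))
```

A whitelist is safer than a blacklist here: formatting the cleaner has never seen before is removed by default instead of slipping through to the layout.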
In addition to the preliminary processing above, it is advisable to make sure that no other problems remain in the document that could cause trouble at later stages of the process.
To automate and simplify the cleaning procedure, I wrote a LibreOffice extension, "Clean and validate". Clicking its "Clean" button automatically makes most of the necessary corrections to authors' documents; clicking the "Validation" button then confirms that the document is ready for layout. More information about this extension can be found on the extension page.
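A validation pass can be sketched directly against an ODT's `content.xml`. The sketch is not how "Clean and validate" works internally; it only flags two kinds of problems under stated assumptions: declared fonts outside a hypothetical `ALLOWED_FONTS` set, and manual page breaks, which Writer stores as automatic paragraph styles with `fo:break-before="page"` or `fo:break-after="page"`.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
ALLOWED_FONTS = {"Liberation Serif", "Liberation Sans"}  # hypothetical policy


def validate_content_xml(xml_text: str) -> list[str]:
    """Return human-readable problems found in an ODF content.xml string."""
    root = ET.fromstring(xml_text)
    problems = []
    # Fonts declared for the document.
    for face in root.iterfind("office:font-face-decls/style:font-face", NS):
        name = face.get(f"{{{NS['style']}}}name")
        if name not in ALLOWED_FONTS:
            problems.append(f"font not allowed: {name}")
    # Manual page breaks live in automatic paragraph styles.
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        props = style.find("style:paragraph-properties", NS)
        if props is None:
            continue
        for attr in ("break-before", "break-after"):
            if props.get(f"{{{NS['fo']}}}{attr}") == "page":
                name = style.get(f"{{{NS['style']}}}name")
                problems.append(f"manual page break in style {name}")
    return problems
```

An empty list means the document passed these particular checks; a real validator would cover many more rules.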
To implement functionality currently missing from LibreOffice Writer, I made the ePublishing extension for LibreOffice Writer, which, among other functions, can assemble a journal issue from a set of articles. This requires preparing an issue template and saving it in the same directory as the articles. Article files should be numbered according to their intended position in the journal. To learn more about the ePublishing extension, visit the extension page.
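The ordering rule for the assembly step can be sketched as follows. The file-name convention (a leading number such as `2_smith.odt`) and the template name `issue_template.odt` are assumptions; the article does not say exactly how ePublishing parses the numbers.

```python
import re

# Hypothetical convention: each article file starts with its position
# in the issue, e.g. "1_editorial.odt", "2_smith.odt", "10_reviews.odt".
LEADING_NUMBER = re.compile(r"^(\d+)")


def article_order(file_names, template_name="issue_template.odt"):
    """Return article file names in the order they should appear in the issue."""
    numbered = []
    for name in file_names:
        if name == template_name or not name.endswith(".odt"):
            continue
        match = LEADING_NUMBER.match(name)
        if match is None:
            raise ValueError(f"article file is not numbered: {name}")
        numbered.append((int(match.group(1)), name))
    # Sort by numeric value so "10_..." follows "2_..."; a plain string
    # sort would place "10_..." before "2_...".
    numbered.sort()
    return [name for _, name in numbered]
```

Zero-padding the numbers (`01_`, `02_`, ...) avoids the string-sorting pitfall in file managers as well.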
Since the layout file is used not only for printing but also for distributing materials in EPUB and HTML formats, it is preferable to set relative widths for images and tables during layout.
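The arithmetic behind a relative width is simple: the object's width is expressed as a share of the text area (page width minus the side margins), so it scales with the reader's screen instead of staying fixed. The page and margin sizes in the usage below are hypothetical.

```python
def text_area_width(page_width_cm, left_margin_cm, right_margin_cm):
    """Width available for text between the side margins."""
    return page_width_cm - left_margin_cm - right_margin_cm


def relative_width_percent(object_width_cm, area_width_cm):
    """Express an absolute width as a percentage of the text area, capped at 100%."""
    if area_width_cm <= 0:
        raise ValueError("text area width must be positive")
    return min(100.0, round(100.0 * object_width_cm / area_width_cm, 1))


if __name__ == "__main__":
    area = text_area_width(21.0, 2.0, 2.0)  # A4 page, hypothetical 2 cm margins
    print(relative_width_percent(8.5, area))  # 50.0
```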
Before publication, the layout must be checked to eliminate both potential legal problems and technical problems caused by the restrictions of a universal approach, in which one file must work for every output format. The checks described in this document were suitable for my case.
To print a layout on risographs using A3 paper, export the resulting document to PDF and then use any imposition software. This program is the one I wrote for my case.
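The article does not describe what its imposition program does, so the following is a sketch under one common assumption: a saddle-stitched booklet where two A4 pages are printed side by side on each side of an A3 sheet, then folded and stapled. The page count is padded to a multiple of four with blank pages (`None`).

```python
def booklet_imposition(page_count):
    """Return, per A3 sheet, the (left, right) pages for the front and back sides.

    Pages are numbered from 1; None marks a blank page added to pad the
    document to a multiple of 4.
    """
    if page_count < 1:
        raise ValueError("page_count must be positive")
    padded = (page_count + 3) // 4 * 4

    def page(n):
        return n if n <= page_count else None

    sheets = []
    for i in range(padded // 4):
        front = (page(padded - 2 * i), page(2 * i + 1))
        back = (page(2 * i + 2), page(padded - 2 * i - 1))
        sheets.append((front, back))
    return sheets
```

For an 8-page document the first sheet carries pages 8 and 1 on the front and 2 and 7 on the back; after folding, the pages read in order.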
If the CMYK color model used for offset printing is required, you can convert the document with this script, which depends on pdftops and Ghostscript and requires a PostScript file to convert black from RGB to CMYK correctly.
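The script itself is not reproduced here. The sketch below only illustrates why black needs special care: a naive, non-color-managed formula maps pure RGB black to 100% K with no cyan, magenta or yellow, whereas a color-managed conversion can turn it into a "rich black" mixing all four inks, which is undesirable for small text. This formula is a textbook illustration, not suitable for production prepress.

```python
def rgb_to_cmyk(r, g, b):
    """Naive device RGB (0..255) to CMYK (0..1) conversion, no color management."""
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)
    if k == 1:
        # Pure black: print with the black plate only.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return (c, m, y, k)
```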
If printing does not involve color illustrations, you can use this script, which relies on Ghostscript, to convert the document to the grayscale color space.
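What a grayscale conversion does per color can be illustrated with the common ITU-R BT.601 luma weighting, which weights green most heavily because the eye is most sensitive to it. This is an assumption for illustration; the weighting Ghostscript actually applies depends on its color-management settings.

```python
def rgb_to_gray(r, g, b):
    """Convert 8-bit RGB to an 8-bit gray level using ITU-R BT.601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```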
HTML, EPUB and RDF
To get layouts in HTML format for publishing on a website (for example, in the OJS system) or in an electronic library, to distribute a document in EPUB format optimized for reading on e-readers, or to export a publication in the RDF semantic data format, I made a converter that is a fork of Writer2xhtml. I implemented page splitting during conversion and conversion to RDF, and made other improvements that make it easy to obtain the layout in the required format. I named this fork Writer2phtml (p stands for paginated). I am very grateful to Henrik Just, the author of Writer2xhtml. Had he not published his work under the GPL v3 license, I might never have decided to create software for this entire project.
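One way to recover pagination from a Writer document is sketched below; it is not Writer2phtml's implementation. When saving, LibreOffice can record where pages ended in its last layout as `text:soft-page-break` elements in `content.xml`, either between paragraphs or inside a paragraph that spans two pages. The sketch groups paragraph and heading text into pages at those markers and, as a simplification, ignores lists, tables and frames.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
SOFT_BREAK = f"{{{TEXT}}}soft-page-break"
BLOCKS = {f"{{{TEXT}}}p", f"{{{TEXT}}}h"}


def _segments(element):
    """Yield text pieces of an element, with None where a soft page break occurs."""
    if element.text:
        yield element.text
    for child in element:
        if child.tag == SOFT_BREAK:
            yield None
        else:
            yield from _segments(child)
        if child.tail:
            yield child.tail


def split_into_pages(content_xml: str) -> list[list[str]]:
    """Group top-level paragraph texts into pages at soft page breaks."""
    root = ET.fromstring(content_xml)
    body = root.find(f"{{{OFFICE}}}body/{{{OFFICE}}}text")
    if body is None:
        raise ValueError("not a text document")
    pages = [[]]
    for element in body:
        if element.tag == SOFT_BREAK:
            pages.append([])
        elif element.tag in BLOCKS:
            current = []
            for piece in _segments(element):
                if piece is None:
                    # Text before the break stays on the previous page.
                    if current:
                        pages[-1].append("".join(current))
                        current = []
                    pages.append([])
                else:
                    current.append(piece)
            if current:
                pages[-1].append("".join(current))
    return pages
```

Each inner list can then be rendered as one HTML or EPUB page, so the electronic version keeps the printed page boundaries.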
After the final check, the document is saved to the archive, which means its hyphenation must be preserved exactly as it was when the document was prepared and accepted for publication. For this purpose, the ePublishing extension has a function that converts automatic hyphenation into manual hyphenation, removing the dependence on the hyphenation dictionaries used during layout. The quality of the Russian hyphenation dictionaries supplied with LibreOffice turned out to be unsatisfactory, so I made an extension with a Russian dictionary converted from TeX.
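Manual (soft) hyphens in a Writer document are the Unicode character U+00AD, which marks an allowed break point that does not depend on any dictionary. A minimal sketch of "freezing" hyphenation, assuming the character offsets where the layout broke words at line ends are already known (how ePublishing obtains them is not described here, so the `break_offsets` input is hypothetical):

```python
SOFT_HYPHEN = "\u00ad"


def freeze_hyphenation(text: str, break_offsets: list[int]) -> str:
    """Insert soft hyphens at the character offsets where the layout hyphenated."""
    result = text
    # Insert from the end so earlier offsets still point at the same characters.
    for offset in sorted(set(break_offsets), reverse=True):
        if not 0 < offset < len(text):
            raise ValueError(f"offset out of range: {offset}")
        if not (text[offset - 1].isalpha() and text[offset].isalpha()):
            raise ValueError(f"offset {offset} is not inside a word")
        result = result[:offset] + SOFT_HYPHEN + result[offset:]
    return result
```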