Help:Report generation
This document describes a workflow for converting manuscripts received in Microsoft Word format to digital publications that are disseminated through DFM's various online platforms.
Target outputs
Given a source manuscript, we will end up with the following outputs:
- Editable, version-controlled source text of the working paper (report) on the DFM Wiki
- Copies of all images used in the report, along with description and licensing metadata, on the DFM Wiki
- Metadata for all bibliographic references in the report within the DFM Zotero library
- A web-browsable version of the report on the DFM public-facing website (driedfishmatters.org)
- A downloadable PDF version of the report in the DFM Zotero library
- A brief description of the report, with cover thumbnail/preview and links to web and PDF versions, in the Working Papers listing on the DFM Wiki and public website.
Rationale
If our only goal is to distribute a manuscript shared by a research team, the simplest approach is to use the "Export as PDF" function within Microsoft Word to generate a shareable document, which can then be disseminated through the DFM mailing list and website.
The various outputs listed above are intended to add value to the reports prepared by project co-investigators and collaborators, by maximizing options for the dissemination and re-use of project data across multiple platforms. For example, this allows us to locate and re-use images embedded in various reports through the image catalogue contained within the DFM Wiki. (If the same image is used within a subsequent report, our image fingerprinting tools should be able to locate the existing version in our catalogue.)
At the same time, this workflow implements mechanisms to streamline the design and copy-editing of reports. For example, we end up with a series of reports with the same branding; consistent citation, table, and figure formatting; and valid reference data for in-text citations and image captions.
Step 1. Copy-edit and format the source document
Before converting a Word document to other formats, we need to ensure that the layout is clean and semantically valid. If authors have used direct formatting instead of semantically defined styles (e.g., boldface type for headings), the document will not convert well into wikitext, HTML, epub, or PDF. In this step we will conduct some basic copy-editing of the document.
- Correct the formatting
- Go to File > Check for issues > Inspect document. Select all of the inspection options, then press the "Inspect" button. If the inspector finds any comments, tracked changes, document properties, etc., click "Remove All" for each.
- Ensure that headings follow a nested outline: Heading 2 (chapter), Heading 3 (subsection), Heading 4 (sub-subsection).
- Remove extra line breaks (between paragraphs, before and after headings or tables, line breaks used instead of page breaks, etc.)
- Review figures
- If pictures are inside of frames, cut and paste back into the document so they are no longer contained within a frame. (Complex layouts in Word will break when converted to other formats.)
- Ensure that each image in the document has a caption that fully describes the image and includes attribution (source and license data). Place the caption text in the alt text field for each image (right-click and select "Edit alt text"). A group of images that shares a single caption should be split up, so that each image has its own caption.
- Graphs and Smart Art need to be embedded as images, instead of as editable objects. Cut and paste back into the document, selecting "paste as image".
- Review citations
- Check all in-text citations to ensure they are linked to the DFM Zotero library; create or edit citations as needed. NOTE: It is possible that contributors will have used Zotero to include citations, but will have linked the citations to a private library rather than the DFM group library. This will NOT work as we have no way of retrieving the reference data from a private library. Citations will need to be transferred to the DFM group library and updated manually in the manuscript. Citations can also be updated in the DFM Wiki; see Help:Adding Zotero citations and Help:Importing text with Zotero citations from a word processor.
- Navigate to Zotero > Document Preferences > Switch to a different word processor. This will convert all the citations into hyperlinks.
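The alt text requirement from the figure review can be checked before upload. The following is a minimal sketch, not part of docx2wiki: it assumes that Word stores each image's alt text in the descr attribute of the drawing's wp:docPr element inside word/document.xml, and the function names are hypothetical. Charts and shapes also carry a docPr element, so they may be reported too.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the wordprocessingDrawing elements that wrap pictures in a docx.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def images_missing_alt_text(document_xml):
    """Return the names of drawing objects whose alt text is empty or absent.

    Assumption: the alt text is the "descr" attribute of each wp:docPr element.
    """
    root = ET.fromstring(document_xml)
    missing = []
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        if not doc_pr.get("descr", "").strip():
            missing.append(doc_pr.get("name", "(unnamed)"))
    return missing


def check_docx(docx_path):
    """Hypothetical helper: run the check on a .docx file (a zip archive)."""
    with zipfile.ZipFile(docx_path) as archive:
        return images_missing_alt_text(archive.read("word/document.xml"))
```

Calling check_docx with the path to the manuscript returns a list of drawing names as Word labels them (for example "Picture 2"); an empty list means every drawing has alt text.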
Step 2. Upload to the wiki
The docx2wiki bot script will update the image database with hash values for any new (unrecognized) images, upload new images from the current document to the wiki, then upload the document text.
- Run the docx2wiki command.
- Use the -pagename option to provide the target pagename on the wiki (i.e., the title of the document).
- Use the -input option to provide the path to the docx manuscript.
- Optionally, use the -db option to provide the path to an image fingerprint database.
- If successful, you will see the message "Page [[<pagename>]] saved".
Here is an example of the command:
python pwb.py docx2wiki -pagename:"Dried Fish in West Bengal, India: Scoping report" -input:"/mnt/c/users/Eric/Downloads/WBG/DFM_RPT_IITK_Revised-Scoping-Report_2022-02-09_clean.docx"
The command should work fairly reliably, but it will give errors if any image lacks a caption in its alt text field. Note also that images in unrecognized formats will be ignored; currently the only recognized formats are JPEG and PNG.
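Because images in unrecognized formats are silently ignored, it can help to list them before running the command. This is a rough sketch under stated assumptions: embedded images are taken to live under word/media/ inside the docx archive, and the format is judged by file extension alone (.jpg, .jpeg, .png), which may not match how docx2wiki itself detects formats. The function names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: these extensions correspond to the JPEG and PNG formats
# that the upload script recognizes.
RECOGNIZED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def unsupported_media(archive_names):
    """Return the archive members under word/media/ that are not JPEG or PNG."""
    flagged = []
    for name in archive_names:
        path = PurePosixPath(name)
        in_media_folder = len(path.parts) > 2 and path.parts[:2] == ("word", "media")
        if in_media_folder and path.suffix.lower() not in RECOGNIZED_EXTENSIONS:
            flagged.append(name)
    return flagged


def check_docx_media(docx_path):
    """Hypothetical helper: list unsupported embedded images in a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        return unsupported_media(archive.namelist())
```

Any file reported here (for example an EMF or TIFF image) would need to be converted to JPEG or PNG in the manuscript, otherwise it will be missing from the wiki page.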
Step 3. Review and clean up the wiki page
- Insert a metadata template
- At the top of the document, insert Template:Report metadata. Fill in the required fields: authors (separated with ampersands), abstract, series, number in the series, and institution.
- Check table formatting
- Manually check the tables for formatting errors. Word sometimes stores incorrect table layout data if columns, rows, or cells have been merged and re-split, or otherwise edited in non-standard ways. If a table is affected, delete it from the wiki page, then copy and paste it from the source document into the wiki visual editor.
- If the table has a caption, move the table caption text into the actual table caption. (Word formats table captions as paragraphs.)
- Convert table headers to "Header Cell". (Word formats table headers as regular cells.)
- Check citation formatting
- Check citations. If the citation data could not be retrieved from the DFM Zotero library (there will be an error printed in red), locate the broken citation and edit the Zotero citation template within the citation field. Occasionally it is just a communication error with the Zotero server that causes the problem; purging the page (or saving again after making further edits) may fix the issue.
- Add a "Notes" header at the bottom of the document.
- If there were any footnotes or endnotes in the Word manuscript, convert them to footnotes in the wiki: use the Cite > Basic tool to insert a note at the correct location in the document, then copy and paste the footnote text into it.
Step 4. Create a cover
Step 5. Generate PDF
Step 6. Distribute
- Add to Zotero
- Add to the DFM Working Papers listing
- Publish to the DFM website
Step 7. Publicize
- Post to the DFM blog
- Post to Twitter
- Post to the DFM mailing list
- Send message to authors