Difference between revisions of "Help:Report generation"

From DFM Wiki
Revision as of 14:02, 16 May 2022

This document describes a workflow for converting manuscripts received in Microsoft Word format to digital publications that are disseminated through DFM's various online platforms.

Overview

Target outputs

Given a source manuscript, we will end up with the following outputs:

  • Editable, version-controlled source text of the working paper (report) on the DFM Wiki
  • Copies of all images used in the report, along with description and licensing metadata, on the DFM Wiki
  • Metadata for all bibliographic references in the report within the DFM Zotero library
  • A web-browsable version of the report on the DFM public-facing website (driedfishmatters.org)
  • A downloadable PDF version of the report in the DFM Zotero library
  • A brief description of the report, with cover thumbnail/preview and links to web and PDF versions, in the Working Papers listing on the DFM Wiki and public website.

Rationale

If our only goal is to distribute a manuscript shared by a research team, the simplest approach is to use the "Export as PDF" function within Microsoft Word to generate a shareable document, which can then be disseminated through the DFM mailing list and website.

The various outputs listed above are intended to add value to the reports prepared by project co-investigators and collaborators, by maximizing options for the dissemination and re-use of project data across multiple platforms. For example, this allows us to locate and re-use images embedded in various reports through the image catalogue contained within the DFM Wiki. (If the same image is used within a subsequent report, our image fingerprinting tools should be able to locate the existing version in our catalogue.)

At the same time, this workflow implements mechanisms to streamline the design and copy-editing of reports. For example, we end up with a series of reports with the same branding; consistent citation, table, and figure formatting; and valid reference data for in-text citations and image captions.

Step 1. Copy-edit and format the source document

Before converting a Word document to other formats, we need to ensure that the layout is clean and semantically valid. If authors have used direct formatting instead of semantically defined styles (e.g., boldface type for headings), the document will not convert well into wikitext, HTML, epub, or PDF. In this step we will conduct some basic copy-editing of the document.

  1. Correct formatting
    1. Go to File > Check for issues > Inspect document. Select everything then press the "Inspect" button. If there are any comments, tracked changes, document properties, etc., click "Remove All".
    2. Ensure that headings follow a nested outline: Heading 2 (chapter), Heading 3 (subsection), Heading 4 (sub-subsection).
    3. Remove extra line breaks (between paragraphs, before and after headings or tables, line breaks used instead of page breaks, etc.)
  2. Review figures
    1. If pictures are inside of frames, cut and paste back into the document so they are no longer contained within a frame. (Complex layouts in Word will break when converted to other formats.)
    2. Ensure that each image in the document has a caption that fully describes the image and includes attribution (source and license data). Place the caption text in the alt text field for each image (right-click and select "Edit alt text"). A group of images that shares a single caption should be split up, so that each image has its own caption.
    3. Graphs and Smart Art need to be embedded as images, instead of as editable objects. Cut and paste back into the document, selecting "paste as image".
  3. Review citations
    1. Check all in-text citations to ensure they are linked to the DFM Zotero library; create or edit citations as needed. NOTE: It is possible that contributors will have used Zotero to include citations, but will have linked the citations to a private library rather than the DFM group library. This will NOT work as we have no way of retrieving the reference data from a private library. Citations will need to be transferred to the DFM group library and updated manually in the manuscript. Citations can also be updated in the DFM Wiki; see Help:Adding Zotero citations and Help:Importing text with Zotero citations from a word processor.
    2. Navigate to Zotero > Document Preferences > Switch to a different word processor. This will convert all the citations into hyperlinks.
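
The heading rule in step 1.2 can be checked mechanically once the heading levels are known. The following is a minimal sketch, not part of the DFM bot scripts: `find_outline_errors` is a hypothetical helper, and it assumes the heading levels have already been read from the manuscript as a list of integers in document order.

```python
def find_outline_errors(levels, top_level=2):
    """Return (index, message) pairs for headings that break a nested outline.

    `levels` lists heading levels in document order, e.g. [2, 3, 4, 3].
    Hypothetical helper for illustration; not part of the DFM bot scripts.
    """
    errors = []
    previous = None
    for index, level in enumerate(levels):
        if level < top_level:
            # Heading 1 (or lower) is above the chapter level used in reports.
            errors.append((index, f"Heading {level} is above Heading {top_level}"))
        elif previous is None:
            if level != top_level:
                errors.append((index, f"First heading should be Heading {top_level}"))
        elif level > previous + 1:
            # A subsection may only be one level deeper than its parent.
            errors.append((index, f"Heading {level} follows Heading {previous}"))
        previous = max(level, top_level)
    return errors
```

For example, `find_outline_errors([2, 4])` reports the second heading, because a Heading 4 directly under a Heading 2 skips the subsection level.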

Step 2. Upload to the wiki

The docx2wiki bot script will update the image database with hash values for any new (unrecognized) images, upload new images from the current document to the wiki, then upload the document text.

  1. Run the docx2wiki command.
    1. Use the -pagename option to provide the target pagename on the wiki (i.e., the title of the document).
    2. Use the -input option to provide the path to the docx manuscript.
    3. Optionally, use the -db option to provide the path to an image fingerprint database.
  2. If successful, you will see the message "Page [[<pagename>]] saved".

Here is an example of the command:

python pwb.py docx2wiki -pagename:"Dried Fish in West Bengal, India: Scoping report" -input:"/mnt/c/users/Eric/Downloads/WBG/DFM_RPT_IITK_Revised-Scoping-Report_2022-02-09_clean.docx"

The command works fairly reliably; however, it will give errors if any image lacks a caption in the alt text field. Note also that images in unknown formats will be ignored; currently the only recognized formats are JPEG and PNG.
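
Because images in other formats are silently ignored, it can help to list them before running the command. The sketch below is a hypothetical pre-check, not part of docx2wiki. It assumes the usual .docx layout, in which the file is a ZIP archive with embedded media under word/media/, and it assumes that JPEG and PNG files carry the extensions .jpg, .jpeg, or .png; how docx2wiki itself detects formats is not documented here.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: JPEG and PNG images are identified by these file extensions.
RECOGNIZED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def unrecognized_images(docx_file):
    """List embedded media files that are neither JPEG nor PNG.

    `docx_file` is a path or a file-like object for the .docx manuscript.
    Hypothetical helper for illustration; not part of the DFM bot scripts.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("word/media/")
            and not name.endswith("/")
            and PurePosixPath(name).suffix.lower() not in RECOGNIZED_EXTENSIONS
        ]
```

Any file returned here (for example an EMF or TIFF image) would need to be converted to JPEG or PNG in the manuscript before upload.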

Step 3. Review and clean up the wiki page

  1. Insert a metadata template
    1. At the top of the document, insert Template:Report metadata. Fill in the required fields: authors (separated with ampersands), abstract, series, number in the series, and institution.
  2. Check table formatting
    1. Manually check the tables for formatting errors. Sometimes Word can store incorrect table layout data if columns, rows, or cells have been merged and re-split, or otherwise edited in non-standard ways. If this is an issue, delete the table then copy and paste from the source document into the wiki visual editor.
    2. If the table has a caption, move the table caption text into the actual table caption. (Word formats table captions as paragraphs.)
    3. Convert table headers to "Header Cell". (Word formats table headers as regular cells.)
  3. Check citation formatting
    1. Check citations. If the citation data could not be retrieved from the DFM Zotero library (there will be an error printed in red), locate the broken citation and edit the Zotero citation template within the citation field as needed. Typically, this type of error will be caused by a citation that is missing from the DFM library. Occasionally the error message will indicate there was a communication error with the Zotero server; purging the page (or saving again after making further edits) may fix the issue.
    2. Add a "Notes" header at the bottom of the document.
    3. If there were any footnotes or endnotes in the Word manuscript, convert those to footnotes in the wiki using the tool Cite > Basic and copying/pasting the footnote text into a note at the correct location in the document.

Step 4. Create a cover

  1. Download one of the report covers from the DFM Wiki to use as a template. For example, the file linked in the thumbnail is the cover for the Myanmar Dried Fish Consumption Survey.
    Myanmar dried fish consumption survey report cover
  2. Open the SVG file in Inkscape and modify the text (title and authors), background image, and partner organization logo. The background image in the template is cropped using a clipping shape; right-click then "Release Clip", insert a new image, send it below the shape (PGDN), and right-click then "Set Clip".
  3. Upload the source file to the DFM Wiki, placing it in Category:Report covers.
  4. From Inkscape, export the image as PNG, with a resolution of 96 dpi for web output. (This should give an image of 816x1056 pixels.)
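
The expected pixel size in step 4 is the page size multiplied by the export resolution. As a quick check, the sketch below assumes the cover template is a US Letter page (8.5 x 11 inches); that page size is inferred from the 816x1056 figure and is not stated by the template itself.

```python
def export_size_px(width_in, height_in, dpi=96):
    """Return the (width, height) in pixels of a page exported at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US Letter, 8.5 x 11 inches.
print(export_size_px(8.5, 11))
```

If the exported PNG has different dimensions, the document size or the export resolution in Inkscape probably differs from the template.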

Step 5. Generate PDF

The conversion to PDF is a semi-automated, two-stage process. First, we retrieve the page from the DFM Wiki along with all the original-resolution images. (If the Word manuscript contained low-resolution versions of images already in the DFM Wiki, the higher-resolution ones will be used instead.) Second, we run the source files through the open source Calibre conversion utility, with a supplied stylesheet, to create a formatted PDF document.

IMPORTANT: The version of Calibre in the Ubuntu repositories may give errors due to incorrect dependencies. Install directly from the project website, using the instructions at https://calibre-ebook.com/download_linux.

  1. Create a working folder.
    1. Copy the cover image (PNG format) to this folder and save with the filename cover.png.
    2. Copy or download the files SSHRC_CRSH_logo.svg and CC-BY-SA_icon.svg to this folder (or any other images that are referenced in the license and acknowledgement text for the report).
  2. Run the report bot script.
    1. Set -title to the page title corresponding to the report on the DFM Wiki.
    2. Set -outdir to the directory path for the working folder.
    3. Set -zotero_library to 2183860.
    4. Set -address to the DFM project address, using <BR> codes for line breaks.
    5. Set -acknowledgements to the SSHRC acknowledgement message, plus any additional text appropriate to include in the front matter.
    6. Set -license to the licensing message for Creative Commons BY-SA 4.0 International or other appropriate license.
  3. Run the ebook-convert script.

Report script example

Here is a fully working example of the report bot script. In general, only the -title and -outdir options need to be set.

python pwb.py report -title:"Dried fish in Cambodia: Literature review" -outdir:"/mnt/c/Users/Eric/Downloads/KHM/" -zotero_library:2183860 -address:"Dried Fish Matters Project<br> Department of Anthropology, Faculty of Arts <br> 432 Fletcher Argue Building, 15 Chancellor Circle<br> The University of Manitoba, Winnipeg, MB, R3T 2N2<br> CANADA<br> <br> dried.fish.matters@umanitoba.ca" -acknowledgements:"<img src=\"SSHRC_CRSH_logo.svg\"><br> This work draws on research supported by the Social Sciences and Humanities Research Council of Canada." -license:"<img src=\"CC-BY-SA_icon.svg\"><br> This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit <a href=\"http://creativecommons.org/licenses/by-sa/4.0/\"> creativecommons.org/licenses/by-sa/4.0/ </a>."
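
Since only -title and -outdir usually change, the command can also be assembled from Python, which avoids the backslash-escaped quotes needed on the shell command line. This is a minimal sketch under stated assumptions: `build_report_command` and `run_report` are hypothetical helpers, and the option names are taken from the example above. When the arguments are passed as a list, no shell is involved, so quotes inside the HTML fragments need no escaping.

```python
import subprocess


def build_report_command(title, outdir, zotero_library, address_lines,
                         acknowledgements, license_text):
    """Assemble the argument list for the report bot script.

    Hypothetical helper; option names follow the command-line example above.
    """
    # The report script expects <br> codes between address lines.
    address = "<br> ".join(address_lines)
    return [
        "python", "pwb.py", "report",
        f"-title:{title}",
        f"-outdir:{outdir}",
        f"-zotero_library:{zotero_library}",
        f"-address:{address}",
        f"-acknowledgements:{acknowledgements}",
        f"-license:{license_text}",
    ]


def run_report(command):
    """Run the assembled command from the directory containing pwb.py."""
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(command, check=True)
```

Each option is passed as a single list element, which is what the shell produces from `-title:"..."` after it strips the quotes.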

Ebook convert script example

Here is a fully working example of the calibre ebook-convert script. This can be saved and run from a shell script file; no customizations are needed.

ebook-convert index.html report.pdf --output-profile tablet --extra-css "*{font-family:\"Georgia\";text-align:justify} p, dl, ol, ul, h1, h2, h3, h4, h5, h6{ max-width:80%; box-sizing: border-box;} p, li, dd, dt{line-height: 1.6em; margin-top:0.4em; margin-bottom:0.4em;} #frontmatter p {margin:2em auto; max-width:100%} div{ margin: 0pt; border: 0pt; padding:0pt;} img{margin-bottom:1em;} .fullwidth{width:100%; height:auto;} .thumbcaption{color:grey;} .thumb{page-break-inside:avoid} article table{font-size:small; page-break-inside:avoid; border-collapse:collapse; vertical-align:top; text-align:center; margin-bottom:2em;} th{border-bottom:solid 1pt black;border-top:solid 1pt black;} td{vertical-align:top}  article td, article th{padding:0.4em 0.5em;text-align:center;} caption{color:grey; margin-bottom:1em; margin-top:1em; font-size:normal}   .thumb{margin:1em auto;} .reference::before{content:\" \"} a{text-decoration:none;} h2{margin: 2em 0;} h1,h2,h3,h4,h5,h6{color:royalblue;} .calibre-pdf-toc .level-0, .calibre-pdf-toc .level-1 {font-size: initial;}" --chapter "//h:h1" --chapter-mark both --level1-toc "//h:h2" --level2-toc "//h:h3" --toc-threshold 12  --pdf-add-toc --toc-title "Contents" --base-font-size 12  --pdf-default-font-size 16 --pdf-footer-template "<footer style=\"justify-content: space-between;margin:0 0.4em; \"> <p>_TITLE_</p><p><span>_PAGENUM_</span></p> <script> if (_PAGENUM_ == 0) { document.currentScript.parentNode.innerHTML = ''} else { document.currentScript.parentNode.querySelector('span').innerHTML = _PAGENUM_; } </script></footer>" --pdf-page-number-map "if (n < 2) 0; else n - 2;" --cover cover.png --pdf-hyphenate
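
The --pdf-page-number-map expression and the script in the footer template work together: the map renumbers pages, and the footer is blanked wherever the mapped number is 0. The sketch below restates that logic in Python for illustration only. It assumes that n is the physical page number counted from 1, so the reading that the first two pages are the cover and front matter is an inference, not something the command states.

```python
def printed_page_number(n):
    """Python equivalent of the map "if (n < 2) 0; else n - 2;"."""
    return 0 if n < 2 else n - 2


def footer_is_shown(n):
    """The footer script clears the footer when the mapped number is 0."""
    return printed_page_number(n) != 0


# Physical pages 1 to 5 and the numbers printed in their footers.
for page in range(1, 6):
    print(page, printed_page_number(page), footer_is_shown(page))
```

Under that assumption, the first two physical pages carry no footer and the third physical page is printed as page 1.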

Step 6. Distribute

  1. Add to Zotero
    1. Create a new "Report" item in the collection "*DFM Reports and publications > DFM Working Papers" of the DFM Zotero group library.
    2. Fill in the metadata: title, authors, report number, report type ("Working Paper"), series title ("Dried Fish Matters"), institution ("The University of Manitoba / <partner organization>"), and date.
    3. In the "Extra" field, enter the text "cover: <filename>" where <filename> is the name of the cover image file on the DFM wiki. For example: cover: File:Gujarat policy review report cover.svg
    4. Upload the PDF file as an attachment.
  2. Update the DFM Working Papers listing
    1. Run the command: python pwb.py zotero2wiki -key:QJiaTK7SzNDiuMELwlPggobh -user_id:2183860 -collection:BXHG7UDL -library:group -pagename:'DFM Working Papers'
  3. Publish to the DFM website
    1. Run the wiki2html bot script. (This can also be run from a shell script.)
    2. Upload to the server. From the directory containing the wiki export, run the command: rsync -avz ./ ethrift@driedfishmatters.org:public_html/pub/
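
The "cover: <filename>" convention in step 1.3 is a simple key and value line in the Zotero "Extra" field. The sketch below shows one way such a line could be written and read back. Both functions are hypothetical, and how zotero2wiki actually parses the field is an assumption; only the "cover: <filename>" format comes from the instructions above.

```python
def cover_extra_line(filename):
    """Format the line to enter in the Zotero "Extra" field."""
    return f"cover: {filename}"


def parse_cover(extra):
    """Return the cover filename from an "Extra" field, or None if absent.

    Hypothetical parser; the real zotero2wiki script may behave differently.
    """
    for line in extra.splitlines():
        # Split on the first colon only, so "File:..." stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cover":
            return value.strip()
    return None
```

Splitting on the first colon only matters here because wiki filenames such as File:Gujarat policy review report cover.svg contain a colon themselves.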

Wiki2html script

python pwb.py wiki2html -category:Public -address:"<p><b>Dried Fish Matters Project</b><BR> Department of Anthropology, Faculty of Arts<BR> 432 Fletcher Argue Building, 15 Chancellor Circle<BR> The University of Manitoba, Winnipeg, MB, R3T 2N2<BR> CANADA</p><p>dried.fish.matters@umanitoba.ca</p>" -credits:"This website draws on research supported by the Social Sciences and Humanities Research Council of Canada." -homepage:"https://driedfishmatters.org" -logo_url:https://driedfishmatters.org/dfm/wp-content/uploads/2020/08/DFM-LOGO_500px-1.png

Step 7. Publicize

  1. Post to the DFM blog - See examples of prior announcements.
  2. Post to Twitter
  3. Post to the DFM mailing list
  4. Send message to authors