Difference between revisions of "Help:Report generation"
EricThrift (talk | contribs) (Created page with "== Manual preparation of source document == Before converting a Word document to other formats, we need to ensure that the layout is clean and semantically valid. If authors h...") |
Latest revision as of 12:12, 7 December 2023
This document describes a workflow for converting manuscripts received in Microsoft Word format to digital publications that are disseminated through DFM's various online platforms.
Overview
Target outputs
Given a source manuscript, we would like to end up with the following outputs:
- Editable, version-controlled source text of the working paper (report) on the DFM Wiki
- Copies of all images used in the report, along with description and licensing metadata, on the DFM Wiki
- Metadata for all bibliographic references in the report within the DFM Zotero library
- A web-browsable version of the report on the DFM public-facing website (driedfishmatters.org)
- A downloadable PDF version of the report in the DFM Zotero library
- A brief description of the report, with cover thumbnail/preview and links to web and PDF versions, in the Working Papers listing on the DFM Wiki and public website.
Rationale
If our only goal is to distribute a manuscript shared by a research team, the simplest approach is to use the "Export as PDF" function within Microsoft Word to generate a shareable document, which can then be disseminated through the DFM mailing list and website.
The various outputs listed above are intended to maximize options for the dissemination and re-use of project data across multiple platforms. For example, this allows us to locate and re-use images embedded in various reports through the image catalogue contained within the DFM Wiki. (If the same image is used within a subsequent report, our image fingerprinting tools should be able to locate the existing version in our catalogue.)
These goals support the FAIR principles for data management, which are explicitly mentioned within the SSHRC Data Management Policy:
- Findability - data (including images and tabular data embedded within reports) can be searched and located in a catalogue; they have persistent identifiers
- Accessibility - users are able to access the data from a trusted source
- Interoperability - data can be used across different platforms and systems
- Reusability - data have usage licenses, provenance details, and technical features supporting reuse in future research or other works
In addition to supporting these principles, this workflow implements mechanisms to streamline the design and copy-editing of reports. For example, we end up with a series of reports with the same branding; consistent citation, table, and figure formatting; and valid reference data for in-text citations and image captions.
Step 1. Copy-edit and format the source document
Before converting a Word document to other formats, we need to ensure that the layout is clean and semantically valid. If authors have used direct formatting instead of semantically defined styles (e.g., boldface type for headings), the document will not convert well into wikitext, HTML, epub, or PDF. In this step we will conduct some basic copy-editing of the document.
- Correct formatting
  - Go to File > Check for issues > Inspect document. Select everything then press the "Inspect" button. If there are any comments, tracked changes, document properties, etc., click "Remove All".
  - Ensure that headings follow a nested outline: Heading 2 (chapter), Heading 3 (subsection), Heading 4 (sub-subsection).
  - Remove extra line breaks (between paragraphs, before and after headings or tables, line breaks used instead of page breaks, etc.).
- Review figures
  - If pictures are inside of frames, cut and paste back into the document so they are no longer contained within a frame. (Complex layouts in Word will break when converted to other formats.)
  - Ensure that each image in the document has a caption that fully describes the image and includes attribution (source and license data). Place the caption text for images in the alt text field for each image (right-click and "Edit alt text"). A group of images sharing a single caption should be split up, so that each image has its own caption.
  - Graphs and SmartArt need to be embedded as images rather than as editable objects. Cut and paste back into the document, selecting "paste as image".
- Review citations
  - Check all in-text citations to ensure they are linked to the DFM Zotero library; create or edit citations as needed. NOTE: It is possible that contributors will have used Zotero to include citations, but will have linked the citations to a private library rather than the DFM group library. This will NOT work as we have no way of retrieving the reference data from a private library. Citations will need to be transferred to the DFM group library and updated manually in the manuscript. Citations can also be updated in the DFM Wiki; see Help:Adding Zotero citations and Help:Importing text with Zotero citations from a word processor.
  - Navigate to Zotero > Document Preferences > Switch to a different word processor. This will convert all the citations into hyperlinks.
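The figure checks above can be partly automated before the upload step. The following is a minimal sketch, not part of the DFM tooling: it assumes that Word stores each drawing's alt text in the `descr` attribute of a `wp:docPr` element inside `word/document.xml`, and that embedded media live under `word/media/`. The function names are hypothetical. Note that charts and other non-picture drawings also carry a `docPr` element, so the first check may flag objects that still need to be pasted back as images.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Namespace of the wp:docPr element in WordprocessingML drawings.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
# Formats the upload step is documented to recognize (JPEG and PNG).
RECOGNIZED = {".jpg", ".jpeg", ".png"}


def images_missing_alt_text(docx):
    """Return the names of drawings in the document body with empty alt text."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    missing = []
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        if not (doc_pr.get("descr") or "").strip():
            missing.append(doc_pr.get("name", "(unnamed)"))
    return missing


def unrecognized_media(docx):
    """Return embedded media files whose extension is not JPEG or PNG."""
    with zipfile.ZipFile(docx) as zf:
        names = zf.namelist()
    return [
        name
        for name in names
        if name.startswith("word/media/")
        and PurePosixPath(name).suffix.lower() not in RECOGNIZED
    ]
```

Both functions accept a file path or an open binary file object. An empty list from each suggests the manuscript is ready for conversion as far as figures are concerned.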
Step 2. Upload to the wiki
The docx2wiki bot script can update the image database with hash values for any new (unrecognized) images, upload new images from the current document to the wiki, then upload the document text. Alternatively, the text of the report can be copied and pasted manually into a wiki page, and the images uploaded through the Upload Wizard then inserted individually into the report wiki page along with their associated captions.
- Run the docx2wiki command.
  - Use the -pagename option to provide the target pagename on the wiki (i.e., the title of the document).
  - Use the -input option to provide the path to the docx manuscript.
  - Optionally, use the -db option to provide the path to an image fingerprint database.
- If successful, you will see the message "Page [[<pagename>]] saved".
Here is an example of the command:
python pwb.py docx2wiki -pagename:"<REPORT TITLE>" -input:"../report/<FILENAME>.docx" -nohashes:true
The command should work fairly reliably; however, it will give errors if any image lacks a caption in its alt text field. Note also that images in unknown formats will be ignored; currently the only recognized formats are JPEG and PNG.
For a working example of this script, see the version on SharePoint.
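To illustrate the idea of an image database keyed by hash values, here is a small sketch. It is an assumption-laden illustration rather than the docx2wiki implementation: it uses an exact SHA-256 digest of the file contents, which only matches byte-identical files, whereas the project's fingerprinting tools may use a different method. The function names and the in-memory dictionary are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Exact-content fingerprint: SHA-256 hex digest of the image bytes."""
    return hashlib.sha256(data).hexdigest()


def register_images(db: dict, images: dict) -> list:
    """Record unseen images in db and return the filenames that are new.

    db maps fingerprint -> filename already in the catalogue (hypothetical layout).
    images maps candidate filename -> raw bytes extracted from the manuscript.
    """
    new_files = []
    for name, data in images.items():
        digest = fingerprint(data)
        if digest not in db:
            db[digest] = name          # first time this content has been seen
            new_files.append(name)     # so it would need to be uploaded
    return new_files
```

Only the filenames returned by `register_images` would need uploading; an image whose digest is already in the database can be referenced by its existing catalogue name instead.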
Step 3. Review and clean up the wiki page
- Insert a metadata template
  - At the top of the document, insert Template:Report metadata. Fill in the required fields: authors (separated with ampersands), abstract, series, number in the series, and institution.
- Check table formatting
  - Manually check the tables for formatting errors. Sometimes Word can store incorrect table layout data if columns, rows, or cells have been merged and re-split, or otherwise edited in non-standard ways. If this is an issue, delete the table then copy and paste from the source document into the wiki visual editor.
  - If the table has a caption, move the table caption text into the actual table caption. (Word formats table captions as paragraphs.)
  - Convert table headers to "Header Cell". (Word formats table headers as regular cells.)
- Check citation formatting
  - Check citations. If the citation data could not be retrieved from the DFM Zotero library (there will be an error printed in red), locate the broken citation and edit the Zotero citation template within the citation field as needed. Typically, this type of error will be caused by a citation that is missing from the DFM library. Occasionally the error message will indicate there was a communication error with the Zotero server; purging the page (or saving again after making further edits) may fix the issue.
  - Add a "Notes" header at the bottom of the document.
  - If there were any footnotes or endnotes in the Word manuscript, convert those to footnotes in the wiki using the tool Cite > Basic and copying/pasting the footnote text into a note at the correct location in the document.
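The table checks above can be given a first pass in code before reviewing each table by eye. This sketch assumes the page source is available as plain wikitext and relies only on standard MediaWiki table markup (`{|` opens a table, `|+` is a caption, `!` starts a header cell, `|}` closes the table). The function name is hypothetical, and nested tables are not handled.

```python
def audit_tables(wikitext: str) -> list:
    """Return (table_number, problems) for tables lacking a caption or header cells."""
    reports = []
    in_table = False
    count = 0
    has_caption = has_header = False
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line.startswith("{|"):
            # A new table begins: reset what we know about it.
            in_table = True
            count += 1
            has_caption = has_header = False
        elif in_table and line.startswith("|}"):
            problems = []
            if not has_caption:
                problems.append("no caption")
            if not has_header:
                problems.append("no header cells")
            if problems:
                reports.append((count, problems))
            in_table = False
        elif in_table and line.startswith("|+"):
            has_caption = True
        elif in_table and line.startswith("!"):
            has_header = True
    return reports
```

Tables are numbered in the order they appear on the page, so a result such as `(2, ["no caption", "no header cells"])` points at the second table.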
Step 4. Create a cover
- Download one of the report covers from the DFM Wiki to use as a template. For example, the file linked in the thumbnail is the cover for the Myanmar Dried Fish Consumption Survey.
- Open the SVG file in Inkscape and modify the text (title and authors), background image, and partner organization logo. The background image in the template is cropped using a clipping shape; right-click then "Release Clip", insert a new image, send it below the shape (PGDN), and right-click then "Set Clip".
- Upload the source file to the DFM Wiki, placing it in Category:Report covers.
- From Inkscape, export the image as PNG, with a resolution of 96 dpi for web output. (This should give an image of 816x1056 pixels.)
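The pixel dimensions quoted above follow from multiplying the page size in inches by the export resolution. The sketch below assumes the cover template is US Letter (8.5 x 11 inches); that page size is inferred from the 816x1056 figure, not stated in the template itself.

```python
def export_size(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel dimensions of a page exported at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter at the web resolution used for the cover thumbnail.
print(export_size(8.5, 11, 96))  # (816, 1056)
```

If the exported PNG has different dimensions, the document size or the export resolution in Inkscape is probably not what the template expects.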
Step 5. Generate PDF
It is possible to generate a very basic PDF using the wiki2pdf script on our server: click on the "Print view" link in the sidebar of a page in the wiki to create a basic printable version, then print/save as PDF from the browser. We have experimented with different stylesheets to support printing in more of a book or report-style format; however, it is difficult to automate the production of a document that contains a cover, front matter, and table of contents, and we need additional logic to process images and links.
The conversion to PDF for the most recent reports has been done as a semi-automated, two-stage process. First, we retrieve the page from the DFM Wiki along with all the original-resolution images. (If the Word manuscript contained low-resolution versions of images already in the DFM Wiki, the higher-resolution ones will be used instead.) Second, we run the source files through the open source Calibre conversion utility, with a supplied stylesheet, to create a formatted PDF document.
IMPORTANT: The version of Calibre in the Ubuntu repositories may give errors due to incorrect dependencies. Install directly from the project website, using the instructions at https://calibre-ebook.com/download_linux.
- Create a working folder.
  - Copy the cover image (PNG format) to this folder and save with the filename cover.png.
  - Copy or download the files SSHRC_CRSH_logo.svg and CC-BY-SA_icon.svg to this folder (or any other images that are referenced in the license and acknowledgement text for the report).
- Run the report bot script.
  - Set -title to the page title corresponding to the report on the DFM Wiki.
  - Set -outdir to the directory path for the working folder.
  - Set -zotero_library to 2183860.
  - Set -address to the DFM project address, using <BR> codes for line breaks.
  - Set -acknowledgements to the SSHRC acknowledgement message, plus any additional text appropriate to include in the front matter.
  - Set -license to the licensing message for Creative Commons BY-SA 4.0 International or other appropriate license.
- Run the ebook_convert script.
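Before running the scripts, it can help to confirm that the working folder contains the files listed in the first step. This is a minimal sketch; the function name is hypothetical, and the default file list simply repeats the three filenames given above (extend it if the license or acknowledgement text references other images).

```python
from pathlib import Path

# Filenames named in the working-folder step above.
REQUIRED = ("cover.png", "SSHRC_CRSH_logo.svg", "CC-BY-SA_icon.svg")


def missing_inputs(folder, required=REQUIRED):
    """Return the required filenames that are not present in the working folder."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]
```

An empty list means the folder has everything the conversion expects to find.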
Report script example
Here is a fully working example of the report bot script. In general, only the -title and -outdir options need to be customized; the rest of the command can be used as-is.
python pwb.py report -title:"Dried fish in Cambodia: Literature review" -outdir:"/mnt/c/Users/Eric/Downloads/KHM/" -zotero_library:2183860 -address:"Dried Fish Matters Project<br> Department of Anthropology, Faculty of Arts <br> 432 Fletcher Argue Building, 15 Chancellor Circle<br> The University of Manitoba, Winnipeg, MB, R3T 2N2<br> CANADA<br> <br> dried.fish.matters@umanitoba.ca" -acknowledgements:"<img src=\"SSHRC_CRSH_logo.svg\"><br> This work draws on research supported by the Social Sciences and Humanities Research Council of Canada." -license:"<img src=\"CC-BY-SA_icon.svg\"><br> This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit <a href=\"http://creativecommons.org/licenses/by-sa/4.0/\"> creativecommons.org/licenses/by-sa/4.0/ </a>."
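Because the option values contain spaces, quotation marks, and HTML, quoting them correctly on a shell command line is error-prone. One alternative, sketched here as an assumption rather than as part of the DFM scripts, is to assemble the same options as a Python argument list and pass it to `subprocess.run`, which bypasses shell quoting entirely. The function name `build_report_command` is hypothetical; the option names are the ones used in the command above.

```python
import subprocess


def build_report_command(title, outdir, address, acknowledgements, license_text,
                         zotero_library="2183860"):
    """Assemble the report bot command as an argument list (no shell quoting needed)."""
    return [
        "python", "pwb.py", "report",
        f"-title:{title}",
        f"-outdir:{outdir}",
        f"-zotero_library:{zotero_library}",
        f"-address:{address}",
        f"-acknowledgements:{acknowledgements}",
        f"-license:{license_text}",
    ]


def run_report(**options):
    """Run the command from the pywikibot directory; raises if the script fails."""
    subprocess.run(build_report_command(**options), check=True)
```

With this approach the HTML in the address, acknowledgement, and license strings can be written with ordinary quotes, since each list element reaches the script as a single argument.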
Ebook convert script example
Here is a fully working example of the Calibre ebook-convert script. This can be saved and run from a shell script file; no customizations are needed.
ebook-convert index.html report.pdf --output-profile tablet --extra-css "*{font-family:\"Georgia\";text-align:justify} p, dl, ol, ul, h1, h2, h3, h4, h5, h6{ max-width:80%; box-sizing: border-box;} p, li, dd, dt{line-height: 1.6em; margin-top:0.4em; margin-bottom:0.4em;} #frontmatter p {margin:2em auto; max-width:100%} div{ margin: 0pt; border: 0pt; padding:0pt;} img{margin-bottom:1em;} .fullwidth{width:100%; height:auto;} .thumbcaption{color:grey;} .thumb{page-break-inside:avoid} article table{font-size:small; page-break-inside:avoid; border-collapse:collapse; vertical-align:top; text-align:center; margin-bottom:2em;} th{border-bottom:solid 1pt black;border-top:solid 1pt black;} td{vertical-align:top} article td, article th{padding:0.4em 0.5em;text-align:center;} caption{color:grey; margin-bottom:1em; margin-top:1em; font-size:normal} .thumb{margin:1em auto;} .reference::before{content:\" \"} a{text-decoration:none;} h2{margin: 2em 0;} h1,h2,h3,h4,h5,h6{color:royalblue;} .calibre-pdf-toc .level-0, .calibre-pdf-toc .level-1 {font-size: initial;}" --chapter "//h:h1" --chapter-mark both --level1-toc "//h:h2" --level2-toc "//h:h3" --toc-threshold 12 --pdf-add-toc --toc-title "Contents" --base-font-size 12 --pdf-default-font-size 16 --pdf-footer-template "<footer style=\"justify-content: space-between;margin:0 0.4em; \"> <p>_TITLE_</p><p><span>_PAGENUM_</span></p> <script> if (_PAGENUM_ == 0) { document.currentScript.parentNode.innerHTML = ''} else { document.currentScript.parentNode.querySelector('span').innerHTML = _PAGENUM_; } </script></footer>" --pdf-page-number-map "if (n < 2) 0; else n - 2;" --cover cover.png --pdf-hyphenate
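The page numbering in this command is the least obvious part: `--pdf-page-number-map` rewrites the page number with the JavaScript expression `if (n < 2) 0; else n - 2;`, and the footer template blanks itself whenever the mapped number is 0. The following Python mirror of that logic is only an illustration, and it assumes that `n` counts physical pages starting from 1.

```python
def printed_page_number(n: int) -> int:
    """Python mirror of the JavaScript map 'if (n < 2) 0; else n - 2;'."""
    return 0 if n < 2 else n - 2


def footer_text(n: int) -> str:
    """Mirror of the footer script: an empty footer when the mapped number is 0."""
    mapped = printed_page_number(n)
    return "" if mapped == 0 else str(mapped)


# Show what the footer would contain on the first few physical pages.
for page in range(1, 6):
    print(page, repr(footer_text(page)))
```

Under that assumption, the first two physical pages carry no footer, and numbering starts at 1 on the third page.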
Step 6. Distribute
- Add to Zotero
  - Create a new "Report" item in the collection "*DFM Reports and publications > DFM Working Papers" of the DFM Zotero group library.
  - Fill in the metadata: Title, authors, report number, report type ("Working Paper"), series title ("Dried Fish Matters"), institution ("The University of Manitoba / <partner organization>"), and date.
  - In the "Extra" field, enter the text "cover: <filename>", where <filename> is the name of the cover image file on the DFM wiki. For example: cover: File:Gujarat policy review report cover.svg
  - Upload the PDF file as an attachment.
- Update the DFM Working Papers listing
  - Run the command:
    python pwb.py zotero2wiki -key:QJiaTK7SzNDiuMELwlPggobh -user_id:2183860 -collection:BXHG7UDL -library:group -pagename:'DFM Working Papers'
- Publish to the DFM website
  - Run the wiki2html bot script. (This can also be run from a shell script.)
  - Upload to the server. From the directory containing the wiki export, run the command:
    rsync -avz ./ ethrift@driedfishmatters.org:public_html/pub/
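The "cover: <filename>" convention in the Zotero "Extra" field is a simple key-value line, so a script that builds the Working Papers listing can recover the cover filename with a few lines of parsing. The sketch below is an assumption about how such a lookup might work, not the zotero2wiki implementation; the function name is hypothetical.

```python
def cover_from_extra(extra: str):
    """Return the value of the 'cover:' line in a Zotero Extra field, or None."""
    for line in extra.splitlines():
        # Split on the first colon only, so 'File:...' in the value is preserved.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cover":
            return value.strip()
    return None
```

For the example given in the Zotero step, this would return `File:Gujarat policy review report cover.svg`. Splitting on the first colon only is what keeps the `File:` prefix intact.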
Wiki2html script
python pwb.py wiki2html -category:Public -address:"<p><b>Dried Fish Matters Project</b><BR> Department of Anthropology, Faculty of Arts<BR> 432 Fletcher Argue Building, 15 Chancellor Circle<BR> The University of Manitoba, Winnipeg, MB, R3T 2N2<BR> CANADA</p><p>dried.fish.matters@umanitoba.ca</p>" -credits:"This website draws on research supported by the Social Sciences and Humanities Research Council of Canada." -homepage:"https://driedfishmatters.org" -logo_url:https://driedfishmatters.org/dfm/wp-content/uploads/2020/08/DFM-LOGO_500px-1.png
Step 7. Publicize
- Post to the DFM blog - See examples of prior announcements.
- Post to Twitter
- Post to the DFM mailing list
- Send message to authors
Additional Steps
Following recent updates to the website and Zotero, the PDF file of the report must be uploaded to the website (https://driedfishmatters.org/dfm/wp-admin/media-new.php). Instead of attaching the PDF to the Zotero entry, attach the link to the newly uploaded file.