Building combined documents in a GitLab pipeline

No problem to report here, just something I learned with the help of the folks on Matrix.

Solution

I’m able to combine several HedgeDoc pads into one document in a GitLab pipeline using pandoc, like so:

image: 
  name: pandoc/latex:latest
  entrypoint: ["/bin/sh", "-c"]

build:
  script:
    - pandoc -f markdown https://hedgedoc.ru/I2lmzjqyjP7oA/download https://hedgedoc.ru/Maasd0C919rg/download -o report.pdf
  artifacts:
    paths:
      - "report.pdf"

First, I pull the official pandoc/latex image from Docker Hub and redefine its entrypoint so that it accepts regular shell commands. Then I pull two example pads, both set to the EDITABLE permission, and write the combined output to report.pdf. This file is kept as a pipeline artifact for download.

Scenario ideas

I always wanted to do this with student work: let them write alone or in groups, and regularly collect their work into a book or anthology. I can also imagine a group writing the parts of a research application and combining them into a Word file with the same kind of pipeline.
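
For the Word scenario, a minimal sketch could look like the following. It reuses the PDF pipeline above and only changes the output file name, since pandoc picks the output format from the extension. The pad URLs are hypothetical placeholders, not real pads.

```yaml
image:
  name: pandoc/latex:latest
  entrypoint: ["/bin/sh", "-c"]

build:
  script:
    # Hypothetical pad URLs: replace them with the /download links of your own pads.
    # pandoc concatenates the inputs in the order given and infers docx from the extension.
    - pandoc -f markdown https://hedgedoc.example.org/part-one/download https://hedgedoc.example.org/part-two/download -o report.docx
  artifacts:
    paths:
      - "report.docx"
```

The order of the URLs on the command line is the order of the parts in the final document, so each group's pad can simply be listed in the sequence it should appear.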

Thanks

Thanks to the community on Matrix that helped me get this done!

That is great! I didn’t realize one could do that!

That is awesome.

I have a workflow for authoring academic documents in Markdown called sciquill, which hosts the Markdown files on GitHub and uses a GitHub Action to build various output types. I recently discovered HedgeDoc and am thinking about how to merge the two. Your idea is related.

Nice job! I have been doing something with practically the same purpose. Look:

The .gitlab-ci.yml is:

image: texlive/texlive

before_script:
  - export DEBIAN_FRONTEND=noninteractive
  - apt-get -y update && apt-get -y upgrade
  - apt-get -y install curl hunspell hunspell-es linkchecker neofetch pandoc ruby uuid-runtime
  - gem install bibtex-ruby httparty nokogiri
  - mkdir kindlegen && cd kindlegen && wget https://archive.org/download/kindlegen_linux_2.6_i386_v2_9.tar/kindlegen_linux_2.6_i386_v2_9.tar.gz && tar -xvzf kindlegen_linux_2.6_i386_v2_9.tar.gz && mv kindlegen /usr/local/bin/ && cd .. && rm -rf kindlegen
  # Tools for specific purposes, you can ignore them
  - wget https://gitlab.com/snippets/1917492/raw -O /usr/local/bin/baby-biber && chmod +755 /usr/local/bin/baby-biber
  - wget https://gitlab.com/snippets/1917490/raw -O /usr/local/bin/export-pdf && chmod +755 /usr/local/bin/export-pdf
  - wget https://gitlab.com/snippets/1917487/raw -O /usr/local/bin/texti && chmod +755 /usr/local/bin/texti
  # We prefer to do the ebook with this legacy tool for compatibility purposes
  - (cd ~ && mkdir .pecas && cd .pecas && git clone --depth 1 https://gitlab.com/programando-libreros/herramientas/pecas-legacy.git . && bash install.sh) && source ~/.profile

pages:
  stage: deploy
  script:
    - mkdir public/
    # Test 1: gather info about the software and hardware
    - cp index.html public/ && cd public/
    - printf "\n# neofetch\n" >> log.txt
    - neofetch | sed 's/\x1B\[[0-9;\?]*[a-zA-Z]//g' >> log.txt
    - printf "\n# uname -a\n" >> log.txt
    - uname -a >> log.txt
    - printf "\n# apt list --installed\n" >> log.txt
    - apt list --installed >> log.txt
    - printf "\n# ls /sbin\n" >> log.txt
    - ls /sbin >> log.txt
    - printf "\n# ls /bin\n" >> log.txt
    - ls /bin >> log.txt
    - printf "\n# ls /usr/bin\n" >> log.txt
    - ls /usr/bin >> log.txt
    - printf "\n# ls /usr/local/bin\n" >> log.txt
    - ls /usr/local/bin >> log.txt
    - printf "\n# tlmgr list --only-installed\n" >> log.txt
    - tlmgr list --only-installed >> log.txt
    - printf "\n# ruby --version\n" >> log.txt
    - ruby --version >> log.txt
    - printf "\n# gem list\n" >> log.txt
    - gem list >> log.txt
    - printf "\n# kindlegen\n" >> log.txt
    - kindlegen >> log.txt
    - printf "\n# texti\n" >> log.txt
    - texti >> log.txt
    - printf "\n# pc-doctor\n" >> log.txt
    - pc-doctor >> log.txt
    # Test 2: publish an existent repo
    - git clone --depth 1 https://gitlab.com/NikaZhenya/maestria-investigacion.git && cd maestria-investigacion/tesis && ./generate-all && cd ..
    - rm -rf .g* administrativo apuntes bibliografia protocolo && cd ..
    # Test 3: publish a HedgeDoc pad! (actually anything that has raw MD)
    - mkdir pad && cd pad && wget https://pad.programando.li/Pgs01Hr3QgWtgspU6YbhkA/download -O pad.md
    - pandoc pad.md -s -o index.html 
    - pandoc pad.md -o pad.pdf 
    - pandoc pad.md -o pad.epub 
    - pandoc pad.md -o pad.docx 
    #- kindlegen pad.epub # It works, but if it ends with warnings, the job fails
  artifacts:
    paths:
      - public
  only:
    - master

As you can see, I decided to use the texlive/texlive container because our needs involve heavy use of LaTeX. I still haven’t added pandoc-citeproc, but that is because we are going to deploy our own container based on the texlive one. I think something like 40% of the job time could be saved if we already had a container with all the needed tools (we are probably also going to add other publishing systems that use MD, like Jekyll, Pelican and Hugo).

Another thing I am going to work on this weekend is adding a variable so the pipeline can be used with any MD URL, like HedgeDoc links.
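
A rough sketch of how that could look, assuming a CI/CD variable called PAD_URL (a hypothetical name, nothing is implemented yet) and the same image and before_script as in the file above, so that wget and pandoc are already installed:

```yaml
variables:
  # Hypothetical variable name. The default is the pad used in Test 3;
  # it can be overridden in the project's CI/CD settings or when running a pipeline manually.
  PAD_URL: "https://pad.programando.li/Pgs01Hr3QgWtgspU6YbhkA/download"

pages:
  stage: deploy
  script:
    - mkdir -p public/pad && cd public/pad
    # Fetch the raw Markdown from whatever URL the variable points to
    - wget "$PAD_URL" -O pad.md
    - pandoc pad.md -s -o index.html
    - pandoc pad.md -o pad.pdf
  artifacts:
    paths:
      - public
  only:
    - master
```

With a default in the file, the pipeline still works untouched, and anyone can point it at another pad without editing .gitlab-ci.yml.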

Test 2 might interest you: it is a complete research thesis (in Spanish) with a site and so on.

Cheers, nice to see ppl working on the same things!

The second publishing revolution has just begun 🙂