# Dockter: a Docker image builder for researchers

> ✨ Help us [choose a better name](https://github.com/stencila/dockter/issues/37) for this project! ✨

[![Build status](https://travis-ci.org/stencila/dockter.svg?branch=master)](https://travis-ci.org/stencila/dockter)
[![Code coverage](https://codecov.io/gh/stencila/dockter/branch/master/graph/badge.svg)](https://codecov.io/gh/stencila/dockter)
[![Greenkeeper badge](https://badges.greenkeeper.io/stencila/dockter.svg)](https://greenkeeper.io/)
[![NPM](http://img.shields.io/npm/v/@stencila/dockter.svg?style=flat)](https://www.npmjs.com/package/@stencila/dockter)
[![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://stencila.github.io/dockter/)
[![Chat](https://badges.gitter.im/stencila/stencila.svg)](https://gitter.im/stencila/stencila)

Docker is a useful tool for creating reproducible computing environments. But creating truly reproducible Docker images can be difficult, even if you already know how to write a `Dockerfile`.

Dockter makes it easier for researchers to create Docker images for their research projects. Dockter generates a `Dockerfile` and builds an image, for _your_ project, based on _your_ source code.

> 🦄 Dockter is in early development. Features that are not yet implemented are indicated by a unicorn emoji. Usually they have a link next to them, like this 🦄 [#2](https://github.com/stencila/dockter/issues/2), indicating the relevant issue where you can help make the feature a reality. It's [readme driven development](http://tom.preston-werner.com/2010/08/23/readme-driven-development.html) with calls to action to chase after mythical vaporware creatures! So hip.

<!-- Automatically generated TOC. Don't edit, use `make docs` instead -->

<!-- toc -->

- [Features](#features)
  * [Builds a Docker image for your project sources](#builds-a-docker-image-for-your-project-sources)
    + [R](#r)
    + [Python](#python)
    + [Node.js](#nodejs)
    + [JATS](#jats)
    + [Jupyter](#jupyter)
  * [Quicker re-installation of language packages](#quicker-re-installation-of-language-packages)
    + [An example](#an-example)
  * [Generates structured meta-data for your project](#generates-structured-meta-data-for-your-project)
  * [Easy to pick up, easy to throw away](#easy-to-pick-up-easy-to-throw-away)
- [Install](#install)
  * [CLI](#cli)
    + [Windows](#windows)
    + [MacOS](#macos)
    + [Linux](#linux)
  * [Package](#package)
- [Use](#use)
  * [Compile a project](#compile-a-project)
  * [Build a Docker image](#build-a-docker-image)
  * [Execute a Docker image](#execute-a-docker-image)
- [Contribute](#contribute)
- [See also](#see-also)
- [FAQ](#faq)
- [Acknowledgments](#acknowledgments)

<!-- tocstop -->

## Features

### Builds a Docker image for your project sources

Dockter scans your project folder and builds a Docker image for it. If the folder already has a `Dockerfile`, Dockter will build the image from that. If not, Dockter will scan the source code files in the folder and generate one for you. Dockter currently handles R, Python and Node.js source code. A project can have a mix of these languages.
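
To make that decision concrete, here is a minimal Python sketch (Dockter itself is written in TypeScript). It assumes languages are recognised from requirement-file names and source-file extensions; the `REQUIREMENT_FILES` and `SOURCE_EXTENSIONS` tables and the `plan_build` function are hypothetical illustrations, not Dockter's actual code:

```python
from pathlib import Path

# Hypothetical lookup tables (illustration only): file name / extension -> language.
REQUIREMENT_FILES = {
    "DESCRIPTION": "R",
    "requirements.txt": "Python",
    "package.json": "Node.js",
}
SOURCE_EXTENSIONS = {".R": "R", ".Rmd": "R", ".py": "Python", ".js": "Node.js"}


def plan_build(filenames):
    """Hypothetical helper: pick the Dockerfile to build and the languages to handle."""
    names = set(filenames)
    # A handwritten Dockerfile is used as-is; otherwise one is generated.
    dockerfile = "Dockerfile" if "Dockerfile" in names else ".Dockerfile"
    languages = {lang for fname, lang in REQUIREMENT_FILES.items() if fname in names}
    for name in names:
        lang = SOURCE_EXTENSIONS.get(Path(name).suffix)
        if lang:
            languages.add(lang)
    return {"dockerfile": dockerfile, "languages": languages}


plan = plan_build(["main.R", "analysis.py", "README.md"])
print(plan["dockerfile"], sorted(plan["languages"]))
```

A folder with `main.R` and `analysis.py` but no `Dockerfile` is treated as a mixed R and Python project with a generated `.Dockerfile`.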

#### R

If the folder contains an R package [`DESCRIPTION`](http://r-pkgs.had.co.nz/description.html) file then Dockter will install the R packages listed under `Imports` into the image, e.g.

```
Package: myrproject
Version: 1.0.0
Date: 2017-10-01
Imports:
  ggplot2
```

The `Package` and `Version` fields are required in a `DESCRIPTION` file. The `Date` field is used to define which daily CRAN snapshot (hosted by MRAN) to install packages from. MRAN daily snapshots began [2014-09-08](https://cran.microsoft.com/snapshot/2014-09-08), so the date should be on or after that.
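
`DESCRIPTION` files use the DCF ("Debian Control File") format: `Field: value` lines, with indented continuation lines. As a rough illustration, the Python sketch below reads the example above and turns its `Date` into a snapshot URL following the pattern of the MRAN link above. `parse_dcf`, `imports` and `snapshot_url` are hypothetical helpers, not part of Dockter:

```python
import datetime

# First MRAN daily snapshot, as stated above.
MRAN_FIRST_SNAPSHOT = datetime.date(2014, 9, 8)


def parse_dcf(text):
    """Hypothetical helper: parse DCF text into a dict of field -> value."""
    fields, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            # Continuation line: append it to the previous field's value.
            fields[key] = (fields[key] + " " + line.strip()).strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def imports(fields):
    """Package names from the comma-separated Imports field, minus version specs."""
    raw = fields.get("Imports", "")
    return [item.split("(")[0].strip() for item in raw.split(",") if item.strip()]


def snapshot_url(fields):
    """Snapshot URL for the Date field, rejecting dates before MRAN began."""
    date = datetime.date.fromisoformat(fields["Date"])
    if date < MRAN_FIRST_SNAPSHOT:
        raise ValueError(f"no MRAN snapshot exists before {MRAN_FIRST_SNAPSHOT}")
    return f"https://cran.microsoft.com/snapshot/{date.isoformat()}"


description = """Package: myrproject
Version: 1.0.0
Date: 2017-10-01
Imports:
  ggplot2
"""
fields = parse_dcf(description)
print(imports(fields))       # ['ggplot2']
print(snapshot_url(fields))  # https://cran.microsoft.com/snapshot/2017-10-01
```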

If the folder does not contain a `DESCRIPTION` file then Dockter will scan all the R files (files with the extension `.R` or `.Rmd`) in the folder for package import or usage statements, like `library(package)` and `package::function()`, and create a `.DESCRIPTION` file for you.
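
A rough Python sketch of that scan, assuming a simple regular-expression approach (Dockter's own analysis may differ). It picks up `library(...)` and `require(...)` calls plus `pkg::fun` and `pkg:::fun` references; `r_packages` is a hypothetical name:

```python
import re

# library(pkg) / require(pkg), optionally with the package name quoted.
CALL_RE = re.compile(r"""\b(?:library|require)\(\s*['"]?([A-Za-z][A-Za-z0-9.]*)""")
# pkg::fun and pkg:::fun
NAMESPACE_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z._]")


def r_packages(source):
    """Hypothetical helper: sorted, de-duplicated package names in R source text."""
    found = set(CALL_RE.findall(source)) | set(NAMESPACE_RE.findall(source))
    return sorted(found)


main_r = 'library(ggplot2)\nrequire("dplyr")\nlubridate::now()\n'
print(r_packages(main_r))  # ['dplyr', 'ggplot2', 'lubridate']
```

Being purely pattern-based, this sketch also matches names inside comments and strings, and base packages such as `stats` would still need to be filtered out before writing `.DESCRIPTION`.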

Dockter checks if any of your dependencies (or dependencies of dependencies, or dependencies of...) require system packages (e.g. `libxml-dev`) and installs those too. No more trial and error of build, fail, add dependency, repeat... cycles!
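
Following "dependencies of dependencies" is a graph traversal. The Python sketch below does a breadth-first walk and collects system packages along the way. The `depends` and `sysreqs` dictionaries are made-up toy data standing in for lookups against package databases such as `crandb` and `sysreqsdb` (see Acknowledgments), and `system_requirements` is a hypothetical helper:

```python
from collections import deque


def system_requirements(root, depends, sysreqs):
    """Hypothetical helper: system packages needed by `root` and all transitive deps."""
    seen, queue, needed = {root}, deque([root]), set()
    while queue:
        package = queue.popleft()
        needed.update(sysreqs.get(package, []))
        for dep in depends.get(package, []):
            if dep not in seen:  # visit each package once, even in diamond or cyclic graphs
                seen.add(dep)
                queue.append(dep)
    return sorted(needed)


# Toy data for illustration only: not real package metadata.
depends = {"myproject": ["pkgA"], "pkgA": ["pkgB", "pkgC"], "pkgB": ["pkgC"], "pkgC": []}
sysreqs = {"pkgB": ["libssl-dev"], "pkgC": ["libxml-dev"]}
print(system_requirements("myproject", depends, sysreqs))  # ['libssl-dev', 'libxml-dev']
```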

#### Python

If the folder contains a [`requirements.txt`](https://pip.readthedocs.io/en/1.1/requirements.html) file, or a 🦄 [#4](https://github.com/stencila/dockter/issues/4) [`Pipfile`](https://github.com/pypa/pipfile), Dockter will copy it into the Docker image and use `pip` to install the specified packages.

If the folder does not contain either of those files then Dockter will scan all the folder's `.py` files for `import` statements and create a `.requirements.txt` file for you.
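
For Python source, the standard library's `ast` module can find imports without running any code. A minimal sketch, assuming that approach (not necessarily what Dockter does); `python_imports` and `third_party` are hypothetical helpers, and `sys.stdlib_module_names` (Python 3.10+) is used to drop standard-library modules. Note that an import name is not always the name of the package to install (`sklearn` is installed as `scikit-learn`, for example), so a real tool also needs a mapping step that is not shown here:

```python
import ast
import sys

# Standard-library module names (Python 3.10+; an empty set on older versions).
STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def python_imports(source):
    """Hypothetical helper: top-level module names imported by Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) refer to the project's own modules.
            names.add(node.module.split(".")[0])
    return sorted(names)


def third_party(names):
    """Drop standard-library modules; what remains are candidate requirements."""
    return [name for name in names if name not in STDLIB]


src = "import os\nimport pandas as pd\nfrom numpy.linalg import inv\nfrom . import utils\n"
print(python_imports(src))               # ['numpy', 'os', 'pandas']
print(third_party(python_imports(src)))  # ['numpy', 'pandas'] on Python 3.10+
```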

#### Node.js

If the folder contains a [`package.json`](https://docs.npmjs.com/files/package.json) file, Dockter will copy it into the Docker image and use `npm` to install the specified packages.

If the folder does not contain a `package.json` file, Dockter will scan all the folder's `.js` files for `require` calls and create a `.package.json` file for you.
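
A comparable Python sketch for JavaScript, assuming a regular expression over `require('...')` calls with literal string arguments. It skips relative paths and keeps the `@scope/name` form of scoped packages; `npm_packages` is a hypothetical helper, and Node.js built-in modules such as `fs` would still need filtering:

```python
import re

# require('name') or require("name") with a literal string argument.
REQUIRE_RE = re.compile(r"""\brequire\(\s*['"]([^'"]+)['"]\s*\)""")


def npm_packages(source):
    """Hypothetical helper: npm package names required by JavaScript source."""
    names = set()
    for spec in REQUIRE_RE.findall(source):
        if spec.startswith((".", "/")):
            continue  # a local file, not an npm package
        parts = spec.split("/")
        # Scoped packages look like @scope/name; others are the first path segment.
        names.add("/".join(parts[:2]) if spec.startswith("@") else parts[0])
    return sorted(names)


js = """
const _ = require('lodash')
const fp = require('lodash/fp')
const helper = require('./helper')
const dockter = require("@stencila/dockter")
"""
print(npm_packages(js))  # ['@stencila/dockter', 'lodash']
```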

#### JATS

If the folder contains any [JATS](https://en.wikipedia.org/wiki/Journal_Article_Tag_Suite) files (`.xml` files with `<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) ...`), 🦄 [#52](https://github.com/stencila/dockter/issues/52) Dockter will scan reproducible elements defined in the [Dar JATS extension](https://github.com/substance/dar/blob/master/DarArticle.md) for any package import statements (e.g. Python `import`, R `library`, or Node.js `require`) and install the necessary packages into the image.

#### Jupyter

If the folder contains any Jupyter [`.ipynb`](http://jupyter.org/) files, 🦄 [#9](https://github.com/stencila/dockter/issues/9) Dockter will scan the code cells in those files for any package import statements (e.g. Python `import`, R `library`, or Node.js `require`) and install the necessary packages into the image. It will also 🦄 [#10](https://github.com/stencila/dockter/issues/10) add the necessary Jupyter kernels to the built Docker image.
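
Since this is still a planned feature, here is only a hedged Python sketch of how code cells could be pulled out of an `.ipynb` file and scanned. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose `source` is a string or a list of lines) and a Python kernel; `notebook_python_imports` is a hypothetical helper:

```python
import ast
import json


def notebook_python_imports(notebook_text):
    """Hypothetical helper: top-level modules imported in a notebook's code cells."""
    notebook = json.loads(notebook_text)
    names = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        # Drop IPython magics (%) and shell escapes (!), which are not Python.
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still are not plain Python
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return sorted(names)


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["import this_is_prose"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["%matplotlib inline\n", "import pandas as pd\n", "from sklearn import svm\n"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 2,
})
print(notebook_python_imports(example))  # ['pandas', 'sklearn']
```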

### Quicker re-installation of language packages

If you have built a Docker image before, you'll know that it can be frustrating waiting for *all* your project's dependencies to reinstall when you simply add or remove one of them.

The reason this happens is that Docker caches the result of each instruction in a `Dockerfile` as a filesystem layer. When you update a requirements file, Docker invalidates the layer that copied it into the image and throws away all the subsequent layers, including the one where you previously installed your dependencies. That means that all those packages need to get reinstalled.
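
A simplified Python model of that cache behaviour: each layer's cache key is derived from its parent's key plus the instruction and, for `COPY`, the contents of the copied files. This illustrates the idea only and is not Docker's real algorithm; `layer_keys` and `invalidated` are hypothetical names:

```python
import hashlib


def layer_keys(instructions, files):
    """Hypothetical helper: one cache key per instruction, chained from the previous key.

    `files` maps file names to their contents, for the sources of COPY lines.
    """
    keys, parent = [], ""
    for instruction in instructions:
        digest = hashlib.sha256(parent.encode())
        digest.update(instruction.encode())
        op, *args = instruction.split()
        if op == "COPY":
            for src in args[:-1]:  # the last argument is the destination
                digest.update(files[src].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def invalidated(old_keys, new_keys):
    """Indices of layers that must be rebuilt: the first changed one onwards."""
    for i, (old, new) in enumerate(zip(old_keys, new_keys)):
        if old != new:
            return list(range(i, len(new_keys)))
    return list(range(len(old_keys), len(new_keys)))


dockerfile = ["FROM python:3.7.0", "COPY requirements.txt .",
              "RUN pip install -r requirements.txt"]
before = layer_keys(dockerfile, {"requirements.txt": "pandas==0.23.0\n"})
after = layer_keys(dockerfile, {"requirements.txt": "pandas==0.23.1\n"})
print(invalidated(before, after))  # [1, 2]: the COPY layer and the pip install layer
```

Because every key includes its parent's key, one changed `COPY` forces every later layer, including the expensive `pip install`, to be rebuilt.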

Dockter takes a different approach. It leaves the installation of language packages to the language package managers: Python's [`pip`](https://pypi.org/project/pip/), Node.js's `npm`, and R's `install.packages`. These package managers are good at the job they were designed for: checking which packages need to be updated and updating only those. The result is much faster rebuilds, especially for R packages, which often involve compilation.

Dockter does this by looking for a special `# dockter` comment in a `Dockerfile`. Instead of throwing away layers, it executes all instructions after this comment in the same layer, thus reusing packages that were previously installed.
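
To make the split concrete, here is a minimal Python sketch, assuming a plain-text split on a line that reads exactly `# dockter`. It ignores line continuations (`\`) and other Dockerfile syntax, `split_at_dockter_comment` is a hypothetical helper, and how Dockter then runs the remaining instructions in a single layer is not shown:

```python
def split_at_dockter_comment(dockerfile_text):
    """Hypothetical helper: split a Dockerfile at a line reading exactly '# dockter'.

    Returns (head, tail): head is the text Docker builds as usual; tail is the
    list of instructions to be run together in one layer afterwards.
    """
    lines = dockerfile_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "# dockter":
            head = "\n".join(lines[:i]).rstrip() + "\n"
            tail = [ln.strip() for ln in lines[i + 1:]
                    if ln.strip() and not ln.strip().startswith("#")]
            return head, tail
    return dockerfile_text, []  # no special comment: a plain Docker build


dockerfile = """FROM python:3.7.0

# dockter

COPY requirements.txt .
RUN pip install -r requirements.txt
"""
head, tail = split_at_dockter_comment(dockerfile)
print(tail)  # ['COPY requirements.txt .', 'RUN pip install -r requirements.txt']
```

Applied to the `Dockerfile` in the example below, the head is just the `FROM` line and the tail is the `COPY` and `RUN` pair, which lines up with the `Step 1/1` and `Dockter 1/2`, `Dockter 2/2` lines in the build output.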

#### An example

Here's a simple motivating [example](fixtures/tests/py-pandas). It's a Python project with a `requirements.txt` file which specifies that the project depends upon `pandas` which, to ensure reproducibility, is pinned to version `0.23.0`:

```
pandas==0.23.0
```

The project also has a `Dockerfile` which specifies which Python version we want to use, copies `requirements.txt` into the image, and uses `pip` to install the packages:

```Dockerfile
FROM python:3.7.0

COPY requirements.txt .
RUN pip install -r requirements.txt
```

You can build a Docker image for that project using Docker:

```bash
docker build .
```

Docker will download the base Python image (if you don't yet have it), download five packages (`pandas` and its four dependencies) and install them. This took over 9 minutes when we ran it.

Now, let's say that we want the latest version of `pandas`, so we increment the version in the `requirements.txt` file:

```
pandas==0.23.1
```

When we do `docker build .` again to update the image, Docker notices that the `requirements.txt` file has changed and so throws away that layer and all subsequent ones. This means that it will download and install *all* the necessary packages again, including the ones that we previously installed. For a more contrived illustration of this, simply add a space to one of the lines in the `requirements.txt` file and notice how the package install gets repeated all over again.

Now, let's add a special `# dockter` comment to the `Dockerfile` before the `COPY` instruction:

```Dockerfile
FROM python:3.7.0

# dockter

COPY requirements.txt .
RUN pip install -r requirements.txt
```

The comment is ignored by Docker but tells `dockter` to run all subsequent instructions in a single filesystem layer. Build the image with `dockter` instead of `docker`:

```bash
dockter build .
```

Now, if you change the `requirements.txt` file, instead of reinstalling everything again, `pip` will only reinstall what it needs to: the updated `pandas` version. The output looks like:

```
Step 1/1 : FROM python:3.7.0
 ---> a9d071760c82
Successfully built a9d071760c82
Successfully tagged dockter-5058f1af8388633f609cadb75a75dc9d:system
Dockter 1/2 : COPY requirements.txt requirements.txt
Dockter 2/2 : RUN pip install -r requirements.txt
Collecting pandas==0.23.1 (from -r requirements.txt (line 1))

  <snip>

Successfully built pandas
Installing collected packages: pandas
  Found existing installation: pandas 0.23.0
    Uninstalling pandas-0.23.0:
      Successfully uninstalled pandas-0.23.0
Successfully installed pandas-0.23.1
```

### Generates structured meta-data for your project

Dockter uses [JSON-LD](https://json-ld.org/) as its internal data structure. When it parses your project's source code it generates a JSON-LD tree using vocabularies from [schema.org](https://schema.org) and [CodeMeta](https://codemeta.github.io/index.html).

For example, it will parse a `Dockerfile` into a schema.org [`SoftwareSourceCode`](https://schema.org/SoftwareSourceCode) node, extracting meta-data about the Dockerfile.

Dockter also fetches meta-data on your project's dependencies, which could be used to generate a complete software citation for your project. Here is a truncated excerpt:

```json
{
  "name": "myproject",
  "datePublished": "2017-10-19",
  "description": "Regression analysis for my data",
  "softwareRequirements": [
    {
      "description": "\nFunctions to Accompany J. Fox and S. Weisberg,\nAn R Companion to Applied Regression, Third Edition, Sage, in press.",
      "name": "car",
      "urls": [
        "https://r-forge.r-project.org/projects/car/",
        "https://CRAN.R-project.org/package=car",
        "http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/index.html"
      ],
      "authors": [
        {
          "name": "John Fox",
          "familyNames": [
            "Fox"
          ],
          "givenNames": [
            "John"
          ]
        },
```
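
As a hedged illustration of the mapping, the Python sketch below turns parsed `DESCRIPTION` fields into a schema.org `SoftwareSourceCode` node. The property names mirror the excerpt above, but the function name and exact shape are assumptions, and Dockter's real output carries much more (authors, URLs and so on):

```python
import json


def description_to_jsonld(fields, requirements):
    """Hypothetical helper: a schema.org SoftwareSourceCode node from DESCRIPTION fields."""
    node = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": fields["Package"],
        "version": fields["Version"],
        "softwareRequirements": [
            {"@type": "SoftwareSourceCode", "name": name} for name in requirements
        ],
    }
    # Optional DESCRIPTION fields map onto optional schema.org properties.
    if "Date" in fields:
        node["datePublished"] = fields["Date"]
    if "Description" in fields:
        node["description"] = fields["Description"]
    return node


fields = {"Package": "myproject", "Version": "1.0.0", "Date": "2017-10-19",
          "Description": "Regression analysis for my data"}
print(json.dumps(description_to_jsonld(fields, ["car"]), indent=2))
```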

### Easy to pick up, easy to throw away

Dockter is designed to make it easier to get started creating Docker images for your project. But it's also designed not to get in your way or restrict you from using bare Docker. You can easily, and individually, override any of the steps that Dockter takes to build an image.

- *Code analysis*: To stop Dockter doing code analysis and take over specifying your project's package dependencies, just remove the leading `.` from the `.DESCRIPTION`, `.requirements.txt` or `.package.json` file that Dockter generates.

- *Dockerfile generation*: Dockter aims to generate readable Dockerfiles that conform to best practices. They include comments on what each section does and are a good way to start learning how to write your own Dockerfiles. To stop Dockter generating a `.Dockerfile`, and start editing it yourself, just rename it to `Dockerfile`.

- *Image build*: Dockter-managed builds use a special comment in the `Dockerfile`, which Docker ignores, so you can stop using Dockter altogether and build the same image using Docker (it will just take longer if you change your project dependencies).
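
The override rule in the first two points boils down to "a file without the leading `.` wins over the generated one". A minimal Python sketch of that precedence, assuming only that naming convention; `MANAGED` and `choose` are hypothetical names working on a set of file names:

```python
# File names whose generated versions Dockter prefixes with a '.' (per the list above).
MANAGED = ["DESCRIPTION", "requirements.txt", "package.json", "Dockerfile"]


def choose(existing, name):
    """Hypothetical helper: prefer a user-maintained file over the generated one."""
    if name in existing:
        return name  # you have taken over this file
    generated = "." + name
    return generated if generated in existing else None


folder = {".DESCRIPTION", ".Dockerfile", "Dockerfile", "main.R"}
for name in MANAGED:
    print(name, "->", choose(folder, name))
```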

## Install

Dockter is available as a pre-compiled, standalone command line tool (CLI), or as a Node.js package. In both cases, if you want to use Dockter to build Docker images, you will need to [install Docker](https://docs.docker.com/install/) if you don't already have it.

### CLI

#### Windows

To install the latest release of the `dockter` command line tool, download `dockter-win-x64.zip` for the [latest release](https://github.com/stencila/dockter/releases/), extract it, and place the executable somewhere on your `PATH`.

#### MacOS

To install the latest release of the `dockter` command line tool to `/usr/local/bin`, just run:

```bash
curl -L https://unpkg.com/@stencila/dockter/install-latest-macos.sh | bash
```

Or, if you'd prefer to do things manually, download `dockter-macos-x64.tar.gz` for the [latest release](https://github.com/stencila/dockter/releases/) and then run:

```bash
tar xvf dockter-macos-x64.tar.gz
sudo mv -f dockter /usr/local/bin
```

#### Linux

To install the latest release of the `dockter` command line tool to `~/.local/bin/`, just run:

```bash
curl -L https://unpkg.com/@stencila/dockter/install-latest-linux.sh | bash
```

Or, if you'd prefer to do things manually, or place Dockter elsewhere, download `dockter-linux-x64.tar.gz` for the [latest release](https://github.com/stencila/dockter/releases/) and then run:

```bash
tar xvf dockter-linux-x64.tar.gz
mv -f dockter ~/.local/bin/ # or wherever you like
```

### Package

If you want to integrate Dockter into another application or package, it is also available as a Node.js package:

```bash
npm install @stencila/dockter
```

## Use

The command line tool has three primary commands: `compile`, `build` and `execute`. To get an overview of the commands available, use the `--help` option:

```bash
dockter --help
```

To get more detailed help on a particular command, also include the command name, e.g.

```bash
dockter compile --help
```

### Compile a project

The `compile` command compiles a project folder into a specification of a software environment. It scans the folder for source code and package requirement files, parses them, and creates an `.environ.jsonld` file. This file contains the information needed to build a Docker image for your project.

For example, let's say your project folder has a single R file, `main.R`, which uses the R package `lubridate` to print out the current time:

```R
lubridate::now()
```

Let's compile that project and inspect the compiled software environment. Change into the project directory and run the `compile` command:

```bash
dockter compile
```

You should find three new files, created by Dockter, in the folder:

- `.DESCRIPTION`: an R package description file containing a list of the R packages required and other meta-data

- `.environ.jsonld`: a JSON-LD document containing structured meta-data on your project and all of its dependencies

- `.Dockerfile`: a `Dockerfile` generated from `.environ.jsonld`

To stop Dockter generating any of these files and start editing it yourself, remove the leading `.` from the name of the file you want to take over creating.

### Build a Docker image

Usually, you'll compile and build a Docker image for your project in one step using the `build` command. This command runs the `compile` command and builds a Docker image from the generated `.Dockerfile` (or handwritten `Dockerfile`):

```bash
dockter build
```

After the image has finished building you should have a new Docker image on your machine, called `rdate`:

```bash
> docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
rdate               latest              545aa877bd8d        About a minute ago   766MB
```

If you want to build your image with bare Docker, rename `.Dockerfile` to `Dockerfile` and run `docker build .` instead. This might be a good approach when you have finished the exploratory phase of your project (i.e. there is little or no churn in your package dependencies) and want to create a more final image.

### Execute a Docker image

You can use Docker to run the created image, or use Dockter's `execute` command to compile, build and run your image in one step:

```bash
> dockter execute
2018-10-23 00:58:39
```

## Contribute

We 💕 all contributions: ideas 💡, bug reports 🐛, documentation 🗎, code 💾. See [CONTRIBUTING.md](CONTRIBUTING.md) for more details.

## See also

There are several other projects that create Docker images from source code and/or requirements files, including:

- [`alibaba/derrick`](https://github.com/alibaba/derrick)
- [`jupyter/repo2docker`](https://github.com/jupyter/repo2docker)
- [`Gueils/whales`](https://github.com/Gueils/whales)
- [`o2r-project/containerit`](https://github.com/o2r-project/containerit)
- [`openshift/source-to-image`](https://github.com/openshift/source-to-image)
- [`ViDA-NYU/reprozip`](https://github.com/ViDA-NYU/reprozip)

Dockter is similar to `repo2docker`, `containerit`, and `reprozip` in that it is aimed at researchers doing data analysis (and supports R), whereas most other tools are aimed at software developers (and don't support R). Dockter differs from these projects principally in that it:

- performs static code analysis for multiple languages to determine package requirements.

- uses package databases to determine package system dependencies and generate linked meta-data (`containerit` does this for R).

- provides quicker installation of language package dependencies (which can be useful during research projects where dependencies often change).

- by default, but optionally, installs Stencila packages so that Stencila client interfaces can execute code in the container.

`reprozip` and its extension `reprounzip-docker` may be a better choice if you want to share your existing local environment as a Docker image with someone else.

`containerit` might suit you better if you only need support for R and don't want managed package installation.

`repo2docker` is likely to be a better choice if you want to run Jupyter notebooks or RStudio in your container and don't need source code scanning to detect your requirements.

If you don't want to build a Docker image and just want a tool that helps determine the package dependencies of your source code, check out:

- Node.js: [`detective`](https://github.com/browserify/detective)
- Python: [`modulefinder`](https://docs.python.org/3.7/library/modulefinder.html)
- R: [`requirements`](https://github.com/hadley/requirements)


## FAQ

*Why go to the effort of generating a JSON-LD intermediate representation instead of writing a Dockerfile directly?*

Having an intermediate representation of the software environment allows this data to be used for other purposes (e.g. software citations, publishing, archiving). It also allows us to reuse much of this code for build targets other than Docker (e.g. Nix) and sources other than code files (e.g. a GUI).

*Why is Dockter a Node.js package?*

We've implemented this as a Node.js package for easier integration into Stencila's Node.js-based desktop and cloud deployments.

*Why is Dockter implemented in TypeScript?*

We chose TypeScript because its type-checking and type annotations reduce the number of runtime errors and improve developer experience.

*I'd love to help out! Where do I start?*

See [CONTRIBUTING.md](CONTRIBUTING.md) (OK, so this isn't asked *that* frequently. But it's worth a try, eh? :woman_shrugging:)

## Acknowledgments

Dockter was inspired by similar tools for researchers including [`binder`](https://github.com/binder-project/binder), [`repo2docker`](https://github.com/jupyter/repo2docker) and [`containerit`](https://github.com/o2r-project/containerit). It relies on many great open source projects, in particular:

- [`crandb`](https://github.com/metacran/crandb)
- [`dockerode`](https://www.npmjs.com/package/dockerode)
- [`docker-file-parser`](https://www.npmjs.com/package/docker-file-parser)
- [`pypa`](https://warehouse.pypa.io)
- [`sysreqsdb`](https://github.com/r-hub/sysreqsdb)
- and of course, [Docker](https://www.docker.com/)