# 📑 Schema

**Extensions to schema.org to support semantic, composable, parameterize-able and executable documents**

<br>

[![Build Status](https://dev.azure.com/stencila/stencila/_apis/build/status/stencila.schema?branchName=master)](https://dev.azure.com/stencila/stencila/_build/latest?definitionId=9&branchName=master)
[![Code coverage](https://codecov.io/gh/stencila/schema/branch/master/graph/badge.svg)](https://codecov.io/gh/stencila/schema)
[![Netlify](https://img.shields.io/netlify/b0e0d714-29f1-4ad1-8a7d-1af7799fb85b)](https://app.netlify.com/sites/stencila-schema/deploys)
[![Community](https://img.shields.io/badge/join-community-green.svg)](https://discord.gg/uFtQtk9)

|  |  |  |
| --- | --- | --- |
| JSON-LD | [![Context](https://img.shields.io/badge/json--ld-%40context-success)](https://schema.stenci.la/stencila.jsonld) | [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://schema.stenci.la) |
| JSON Schema | [![Schema](https://img.shields.io/badge/json%20schema-v1-success)](https://unpkg.com/browse/@stencila/schema@1/dist/) | [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://schema.stenci.la) |
| TypeScript/JavaScript | [![NPM](https://img.shields.io/npm/v/@stencila/schema.svg?style=flat)](https://www.npmjs.com/package/@stencila/schema) | [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://stencila.github.io/schema/ts/docs) |
| Python | [![PyPI](https://img.shields.io/pypi/v/stencila-schema.svg)](https://pypi.org/project/stencila-schema) | [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://stencila.github.io/schema/python/docs) |
| R | [![CRAN](https://www.r-pkg.org/badges/version/stencilaschema)](https://cran.r-project.org/web/packages/stencilaschema/) | [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://stencila.github.io/schema/r/docs) |
| Rust | [![Crates.io](https://img.shields.io/crates/v/stencila-schema)](https://crates.io/crates/stencila-schema) | [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://docs.rs/stencila-schema/latest/stencila_schema/) |

<br>

## 👋 Introduction

This is the Stencila Schema, an extension to [schema.org](https://schema.org) to support semantic, composable, parameterize-able and executable documents (we call them _stencils_ for short). It also provides implementations of schema.org types (and our extensions) for several languages including JSON Schema, TypeScript, Python and R. It is a central part of our platform and is used throughout our open-source tools as the data model for executable documents.

### Why an extension to schema.org?

Schema.org is _"a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond."_ It is used by most major search engines to provide richer, more semantic search results. More and more websites are using the schema.org vocabulary, and there is increasing uptake in the research community, e.g. bioschemas.org and codemeta.github.io.

The [schema.org vocabulary](https://schema.org/docs/full.html) encompasses many varied concepts and topics. Of particular relevance to Stencila are types for research outputs such as [`ScholarlyArticle`](https://schema.org/ScholarlyArticle), [`Dataset`](https://schema.org/Dataset) and [`SoftwareSourceCode`](https://schema.org/SoftwareSourceCode), and their associated metadata, e.g. [`Person`](https://schema.org/Person) and [`Organization`](https://schema.org/Organization).

However, schema.org does not provide types for the _content_ of research articles. This is where our extensions come in. This schema adds types (and some properties to existing types) so that a complete, executable research article can be represented. These extension types include "static" _nodes_ such as [`Paragraph`](https://schema.stenci.la/paragraph), [`Heading`](https://schema.stenci.la/heading) and [`Figure`](https://schema.stenci.la/figure), and "dynamic" nodes involved in execution such as [`CodeChunk`](https://schema.stenci.la/codechunk) and [`Parameter`](https://schema.stenci.la/parameter).

### It's about names, not formats

An important aspect of schema.org and similar vocabularies is that they really just define a shared way of naming things. They are format agnostic. As schema.org says, it can be used with _"many different encodings, including RDFa, Microdata and JSON-LD"_.

We extend this philosophy to the encoding of executable articles, allowing them to be encoded in several existing document formats. For example, the following very small [`Article`](https://schema.stenci.la/article), containing only one [`Paragraph`](https://schema.stenci.la/paragraph) and no metadata, can be represented in Markdown:

```md
Hello world!
```

as YAML,

```yaml
type: Article
content:
  - type: Paragraph
    content:
      - Hello world!
```

as a Jupyter Notebook,

```json
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {
    "title": ""
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": ["Hello world!"]
    }
  ]
}
```

as JSON-LD,

```json
{
  "@context": "https://schema.stenci.la/v1/jsonld/",
  "type": "Article",
  "content": [
    {
      "type": "Paragraph",
      "content": ["Hello world!"]
    }
  ]
}
```

or as HTML with Microdata,

```html
<article itemscope="" itemtype="https://schema.org/Article">
  <p itemscope="" itemtype="https://schema.stenci.la/Paragraph">Hello world!</p>
</article>
```

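To make the "names, not formats" idea concrete, here is a minimal Python sketch that holds the article above as a plain tree of named nodes and writes it out in two of the encodings shown. The functions `to_markdown` and `to_jsonld` are hypothetical helpers written for this illustration; they are not part of this repository or of Encoda, and they only handle the tiny tree used here.

```python
import json

# The same Article as in the examples above, as a tree of named nodes.
article = {
    "type": "Article",
    "content": [{"type": "Paragraph", "content": ["Hello world!"]}],
}


def to_markdown(node):
    """Render an Article of plain-text Paragraphs as Markdown (illustrative only)."""
    if isinstance(node, str):
        return node
    if node["type"] == "Paragraph":
        return "".join(to_markdown(child) for child in node["content"])
    if node["type"] == "Article":
        # Paragraphs are separated by a blank line in Markdown.
        return "\n\n".join(to_markdown(child) for child in node["content"])
    raise ValueError(f"Unsupported node type: {node['type']}")


def to_jsonld(node):
    """Serialize the tree as JSON-LD by adding the `@context` shown above."""
    return json.dumps({"@context": "https://schema.stenci.la/v1/jsonld/", **node}, indent=2)


print(to_markdown(article))
print(to_jsonld(article))
```

The node names (`Article`, `Paragraph`, `content`) stay the same in both outputs; only the encoding changes.
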
This repository does not deal with format conversion per se. Please see [Encoda](https://github.com/stencila/encoda) for that. However, when developing our schema.org extensions, we aimed not to reinvent the wheel and to maintain consistency and compatibility with existing _schemas_ for representing document content. Those include:

- [JATS XML](https://jats.nlm.nih.gov/)
- [MDAST](https://github.com/syntax-tree/mdast)
- [Open Document Format](http://docs.oasis-open.org/office/v1.2/OpenDocument-v1.2-part1.html)
- [Pandoc Types](https://github.com/jgm/pandoc-types)

### But, sometimes (often) we need more than just names

Despite its name, schema.org does not define strong rules around the _shape_ of data, as, say, a database schema or XML schema does. All the properties of schema.org types are optional, and although they have "expected types", these are not enforced. In addition, properties can be singular values or arrays, but always have a singular name. For example, an `Article` has an `author` property which could be undefined, a string, a `Person` or an `Organization`, or an array of `Person` or `Organization` items.

This flexibility makes a lot of sense for the primary purpose of schema.org: semantic annotation of other content. However, for use as an internal data model, as in Stencila, it can result in a lot of defensive code to check exactly which of these alternatives a property value is. And writing more code than you need to is A Bad Thing™.

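As a rough illustration of the defensive code this leads to, the Python sketch below coerces a schema.org-style `author` value into one predictable shape. The function `normalize_authors` is hypothetical, and treating a bare string as a person's name is an assumption made for this example, not a rule from schema.org or from this schema.

```python
def normalize_authors(author):
    """Coerce a loosely typed `author` value into a list of typed dicts."""
    if author is None:
        return []
    # A single value and an array of values are both allowed, so unify them.
    items = author if isinstance(author, list) else [author]
    normalized = []
    for item in items:
        if isinstance(item, str):
            # Assumption for this sketch: a bare string is a person's name.
            normalized.append({"type": "Person", "name": item})
        elif isinstance(item, dict) and item.get("type") in ("Person", "Organization"):
            normalized.append(item)
        else:
            raise TypeError(f"Unexpected author value: {item!r}")
    return normalized


print(normalize_authors("Jane Doe"))
print(normalize_authors([{"type": "Organization", "name": "Stencila"}, "John Doe"]))
```

Every consumer of the data would need a check like this for every such property, which is the overhead a stricter schema is meant to remove.
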
Instead, we wanted a schema that placed some restrictions on the shape of executable documents. This has flow-on benefits for developer experience, such as type inference and checking. To achieve this, the Stencila Schema defines schema.org types using JSON Schema. Yes, that's a lot of "schemas", but bear with us...

### Using JSON Schema for validation and type safety

[JSON Schema](https://json-schema.org/) is _"a vocabulary that allows you to annotate and validate JSON documents"_. It is a draft internet standard which, like schema.org, is seeing growing adoption, e.g. [schemastore.org](https://www.schemastore.org/json/).

In Stencila Schema, when we define a type of document node, either a schema.org type or an extension, we define it:

- as a JSON Schema document, with restrictions on the cardinality, type and shape of its properties
- using schema.org type and property names, pluralized as appropriate to avoid confusion

For example, an `Article` is defined to have an optional `authors` property (note the `s` this time) which is always an array whose items are either a `Person` or an `Organization`.

```json
{
  "title": "Article",
  "@id": "schema:Article",
  "description": "An article, including news and scholarly articles.",
  "properties": {
    "authors": {
      "@id": "schema:author",
      "description": "The authors of this creative work.",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "$ref": "Person.schema.json"
          },
          {
            "$ref": "Organization.schema.json"
          }
        ]
      }
    }
...
```

_To keep things simpler, this is a stripped down version of the actual [`Article.schema.json`](https://schema.stenci.la/Article.schema.json)._

With a JSON Schema, we are able to:

- use a JSON Schema validator to check that content meets the schema
- generate types (i.e. `interface` and `class` elements) matching the schema in other languages.

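The two points above can be sketched in Python. This is a hand-written illustration under stated assumptions, not the generated bindings and not a real JSON Schema validator: the classes `Person`, `Organization` and `Article` are simplified stand-ins, and `check_authors` is a hypothetical function that mirrors only the `authors` rule from the stripped-down schema above, using each item's `type` field to tell a `Person` from an `Organization`.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Person:
    name: str


@dataclass
class Organization:
    name: str


@dataclass
class Article:
    # Always a list, mirroring `"type": "array"` in the schema above.
    authors: List[Union[Person, Organization]] = field(default_factory=list)


def check_authors(node: dict) -> List[str]:
    """Return a list of problems with the `authors` property of a raw node."""
    authors = node.get("authors", [])  # the property is optional
    if not isinstance(authors, list):
        return ["authors must be an array"]
    allowed = ("Person", "Organization")
    return [
        f"authors[{index}] must be a Person or Organization"
        for index, item in enumerate(authors)
        if not (isinstance(item, dict) and item.get("type") in allowed)
    ]


print(check_authors({"type": "Article", "authors": "Jane Doe"}))
print(check_authors({"type": "Article", "authors": [{"type": "Person", "name": "Jane Doe"}]}))
```

With the shape fixed by the schema, code that consumes an `Article` can iterate over `authors` directly instead of checking for strings, single values or missing properties.
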
### But, JSON Schema can be a pain to write

JSON can be quite fiddly to write by hand. And JSON Schema lacks a way to easily express parent-child relationships between types. For these reasons, we define types using YAML with custom keywords such as `extends`, and generate JSON Schema, and ultimately bindings for each language, from those.

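The sketch below shows one way an `extends` keyword could be resolved into flat property sets. It is an assumption about the general approach, not the build code used in this repository. The function `resolve_extends` is hypothetical, the definitions are written as Python dictionaries rather than parsed from YAML, and the properties given to `Thing`, `CreativeWork` and `Article` are simplified for illustration.

```python
def resolve_extends(name, definitions):
    """Return a JSON Schema-like dict for `name` with inherited properties merged in."""
    definition = definitions[name]
    parent = definition.get("extends")
    inherited = resolve_extends(parent, definitions)["properties"] if parent else {}
    return {
        "title": name,
        # Child properties override inherited ones with the same name.
        "properties": {**inherited, **definition.get("properties", {})},
    }


# Simplified stand-ins for what the YAML type definitions might contain.
definitions = {
    "Thing": {"properties": {"name": {"type": "string"}}},
    "CreativeWork": {"extends": "Thing", "properties": {"authors": {"type": "array"}}},
    "Article": {"extends": "CreativeWork", "properties": {"content": {"type": "array"}}},
}

schema = resolve_extends("Article", definitions)
print(sorted(schema["properties"]))
```

Each type only states what it adds to its parent, and the flattened result is what a plain JSON Schema document would have to spell out in full.
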
## 📜 Documentation

Documentation is available at https://schema.stenci.la/.

Alternatively, you may want to directly consult the type definitions (`*.yaml` files) and documentation (`*.md` files) in the [`schema`](schema) directory.

## 🚀 Usage

### JSON-LD context

A JSON-LD `@context` is generated from the JSON Schema sources and published at https://schema.stenci.la/stencila.jsonld.

Individual files are published for each extension type, e.g. https://schema.stenci.la/CodeChunk.jsonld, and each extension property, e.g. https://schema.stenci.la/rowspan.jsonld.

### Programming language bindings

Bindings for this schema, in the form of installable packages, are currently generated for:

- [Python](https://stencila.github.io/schema/python/docs)
- [R](https://stencila.github.io/schema/r/docs)
- [TypeScript](https://stencila.github.io/schema/ts/docs)

Depending on the capabilities of the host language, these packages expose type definitions as well as utility functions for constructing valid Stencila Schema nodes. Each package has its own documentation, auto-generated from the code.

## 🛠 Contributing

We 💕 contributions! All contributions are welcome: ideas 🤔, examples 💡, bug reports 🐛, documentation 📖, code 💻, questions 💬.

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for a guide on how to contribute to the schema definitions. See the `README.md` file in each language sub-folder, e.g. [`python`](python), for advice on developing the language bindings, and [this issue](https://github.com/stencila/schema/issues/256) for how to add yourself or others to the following _important_ table:

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
  <tr>
    <td align="center"><a href="http://has100ideas.com"><img src="https://avatars0.githubusercontent.com/u/57006?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Mac Cowell</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=100ideas" title="Code">💻</a> <a href="#ideas-100ideas" title="Ideas, Planning, & Feedback">🤔</a></td>
    <td align="center"><a href="http://toki.io"><img src="https://avatars1.githubusercontent.com/u/10161095?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Jacqueline</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=jwijay" title="Code">💻</a> <a href="https://github.com/stencila/schema/commits?author=jwijay" title="Documentation">📖</a> <a href="#ideas-jwijay" title="Ideas, Planning, & Feedback">🤔</a></td>
    <td align="center"><a href="https://github.com/beneboy"><img src="https://avatars1.githubusercontent.com/u/292725?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Ben Shaw</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=beneboy" title="Code">💻</a> <a href="#ideas-beneboy" title="Ideas, Planning, & Feedback">🤔</a> <a href="#infra-beneboy" title="Infrastructure (Hosting, Build-Tools, etc)">🚇</a> <a href="https://github.com/stencila/schema/commits?author=beneboy" title="Documentation">📖</a></td>
    <td align="center"><a href="http://ketch.me"><img src="https://avatars2.githubusercontent.com/u/1646307?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Alex Ketch</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=alex-ketch" title="Code">💻</a> <a href="https://github.com/stencila/schema/commits?author=alex-ketch" title="Documentation">📖</a> <a href="#design-alex-ketch" title="Design">🎨</a></td>
    <td align="center"><a href="https://github.com/nokome"><img src="https://avatars0.githubusercontent.com/u/1152336?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Nokome Bentley</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=nokome" title="Code">💻</a> <a href="https://github.com/stencila/schema/commits?author=nokome" title="Documentation">📖</a> <a href="#ideas-nokome" title="Ideas, Planning, & Feedback">🤔</a></td>
    <td align="center"><a href="https://github.com/asisiuc"><img src="https://avatars0.githubusercontent.com/u/17000527?v=4?s=100" width="100px;" alt=""/><br /><sub><b>asisiuc</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=asisiuc" title="Code">💻</a> <a href="#ideas-asisiuc" title="Ideas, Planning, & Feedback">🤔</a></td>
    <td align="center"><a href="https://github.com/apawlik"><img src="https://avatars2.githubusercontent.com/u/2358535?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Aleksandra Pawlik</b></sub></a><br /><a href="https://github.com/stencila/schema/commits?author=apawlik" title="Code">💻</a> <a href="https://github.com/stencila/schema/commits?author=apawlik" title="Documentation">📖</a> <a href="#ideas-apawlik" title="Ideas, Planning, & Feedback">🤔</a></td>
  </tr>
  <tr>
    <td align="center"><a href="https://vsoch.github.io"><img src="https://avatars0.githubusercontent.com/u/814322?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Vanessasaurus</b></sub></a><br /><a href="#ideas-vsoch" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/stencila/schema/commits?author=vsoch" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/rgieseke"><img src="https://avatars.githubusercontent.com/u/198537?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Robert Gieseke</b></sub></a><br /><a href="#ideas-rgieseke" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/stencila/schema/commits?author=rgieseke" title="Code">💻</a> <a href="https://github.com/stencila/schema/commits?author=rgieseke" title="Documentation">📖</a></td>
  </tr>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

## 🙏 Acknowledgments

Thanks to the developers of the existing schemas and open source tools we use, or have been inspired by, including:

- [BioSchemas](https://bioschemas.org/)
- [CiTO](http://www.sparontologies.net/ontologies/cito)
- [CodeMeta](https://codemeta.github.io)
- [JSON Schema](https://json-schema.org/)
- [Schema.org](https://schema.org)