# MWoffliner

MWoffliner is a tool for making a local offline HTML snapshot of any
online [MediaWiki](https://mediawiki.org) instance. It goes through
all online articles (or a selection if specified) and create the
corresponding [ZIM](https://openzim.org) file. It has mainly been
tested against Wikimedia projects like
[Wikipedia](https://wikipedia.org) and
[Wiktionary](https://wiktionary.org) --- but it should also work for
any recent MediaWiki.

Read [CONTRIBUTING.md](./CONTRIBUTING.md) to know more about
MWoffliner development.

User Help is available in the for a a
[FAQ](https://github.com/openzim/mwoffliner/wiki/Frequently-Asked-Questions).

[![NPM](https://nodei.co/npm/mwoffliner.png)](https://www.npmjs.com/package/mwoffliner)

[![npm](https://img.shields.io/npm/v/mwoffliner.svg)](https://www.npmjs.com/package/mwoffliner)
[![node](https://img.shields.io/node/v/mwoffliner.svg)](https://www.npmjs.com/package/mwoffliner)
[![Docker](https://ghcr-badge.egpl.dev/openzim/mwoffliner/latest_tag?label=container)](https://ghcr.io/openzim/mwoffliner)
[![Build Status](https://github.com/openzim/mwoffliner/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/openzim/mwoffliner/actions/workflows/ci.yml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/openzim/mwoffliner/branch/main/graph/badge.svg)](https://codecov.io/gh/openzim/mwoffliner)
[![CodeFactor](https://www.codefactor.io/repository/github/openzim/mwoffliner/badge)](https://www.codefactor.io/repository/github/openzim/mwoffliner)
[![License](https://img.shields.io/npm/l/mwoffliner.svg)](LICENSE)
[![Join Slack](https://img.shields.io/badge/Join%20us%20on%20Slack%20%23mwoffliner-2EB67D)](https://slack.kiwix.org)

## Features

- Scrape with or without image thumbnail
- Scrape with or without audio/video multimedia content
- S3 cache (optional)
- Image size optimiser / Webp converter
- Scrape all articles in namespaces or title list based
- Specify additional/non-main namespaces to scrape

Run `mwoffliner --help` to get all the possible options.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) (or Docker-based engine)
- amd64 architecture

## Installation

The recommended way to install and run `mwoffliner` is using the pre-built Docker container:

```sh
docker pull ghcr.io/openzim/mwoffliner
```

<details>
<summary>Run software locally / Build from source</summary>

### Prerequisites for local execution

- *NIX Operating System (GNU/Linux, macOS, ...)
- [Redis](https://redis.io/)
- [NodeJS](https://nodejs.org/en/) version 24 (we support only one single Node.JS version, other versions might work or not)
- [Libzim](https://github.com/openzim/libzim) (On GNU/Linux & macOS we automatically download it)
- Various build tools which are probably already installed on your
  machine (packages `libjpeg-dev`, `libglu1`, `autoconf`, `automake`, `gcc` on
  Debian/Ubuntu)

... and an online MediaWiki with its API available.

### Installation methods

#### Build your own container

1. Clone the repository locally:

   ```sh
   git clone https://github.com/openzim/mwoffliner.git && cd mwoffliner
   ```

1. Build the image:

   ```sh
   docker build . -f docker/Dockerfile -t ghcr.io/openzim/mwoffliner
   ```

#### Run the software locally using NPM

> [!WARNING]
> Local installation requires several system dependencies (see above). Using the Docker image is strongly recommended to avoid setup issues.

1. Install latest released MWoffliner version from NPM (use `-g` to install globally):

   ```sh
   npm i -g mwoffliner
   ```

   > [!WARNING] 
   > Note that you might need to run this command with the `sudo` command, depending
   how your `npm` / OS is configured. `npm` permission checking can be a bit annoying for a
   newcomer. Please read the documentation carefully if you hit problems: https://docs.npmjs.com/cli/v7/using-npm/scripts#user

</details>

## Usage

### Using Docker (Recommended)

```sh
# Get help
docker run -v $(pwd)/out:/out -ti ghcr.io/openzim/mwoffliner mwoffliner --help
```

```sh
# Create a ZIM for https://bm.wikipedia.org
docker run -v $(pwd)/out:/out -ti ghcr.io/openzim/mwoffliner \
       mwoffliner --mwUrl=https://bm.wikipedia.org --adminEmail=foo@bar.net
```

<details>
<summary>Using NPM / Local Install</summary>

```sh
# Get help
mwoffliner --help
```

```sh
# Create a ZIM for https://bm.wikipedia.org
mwoffliner --mwUrl=https://bm.wikipedia.org --adminEmail=foo@bar.net
```

</details>

To use MWoffliner with a S3 cache, you should provide a S3 URL like
this:
```sh
--optimisationCacheUrl="https://wasabisys.com/?bucketName=my-bucket&keyId=my-key-id&secretAccessKey=my-sac"
```

## Contribute

If you've retrieved mwoffliner source code (e.g. with a git clone of our repo), you can then install and run it locally (including with your local modifications):

```bash
npm i
npm run mwoffliner -- --help
```

Detailed [contribution documentation and guidelines](CONTRIBUTING.md) are available.

## API

MWoffliner provides also an API and therefore can be used as a NodeJS
library. Here a stub example that could go in your index.mjs file:
```javascript
import * as mwoffliner from 'mwoffliner';

const parameters = {
    mwUrl: "https://es.wikipedia.org",
    adminEmail: "foo@bar.net",
    verbose: true,
    format: "nopic",
    articleList: "./articleList"
};
mwoffliner.execute(parameters); // returns a Promise
```

## Background

Complementary information about MWoffliner:

* MediaWiki software is used by thousands of wikis, the most
  famous ones being the Wikimedia ones, including [Wikipedia](https://wikipedia.org).
* MediaWiki is a PHP wiki runtime engine.
* Wikitext is the name of the markup language that MediaWiki uses.
* MediaWiki includes a parser for WikiText into HTML, and this
  parser creates the HTML pages displayed in your browser.
* Have a look at the scraper [functional architecture](docs/functional_architecture.md)

License
-------

[GPLv3](https://www.gnu.org/licenses/gpl-3.0) or later, see
[LICENSE](LICENSE) for more details.

Acknowledgements
--------

This project received funding through [NGI Zero Core](https://nlnet.nl/core), a fund established by [NLnet](https://nlnet.nl/) with financial support from the European Commission's [Next Generation Internet](https://ngi.eu/) program. Learn more at the [NLnet project page](https://nlnet.nl/project/MWOffliner).

[<img width="20%" alt="NLnet foundation logo" src="https://github.com/user-attachments/assets/22233242-ec49-4540-a0af-b70725cedbee" />](https://nlnet.nl/)
[<img width="20%" alt="NGI Zero Logo" src="https://github.com/user-attachments/assets/1bbbda57-dc6f-4902-ae29-236e5e89228f" />](https://nlnet.nl/core)
