# hyperlog-index

Forking indexes for [hyperlog](https://npmjs.com/package/hyperlog)

Built on the map/reduce pattern: hyperlog-index calls a map function for every node inserted into the hyperlog, building the index incrementally.

# example

## forking key/value store

Using hyperlog-index, we can build a key/value store backed by a hyperlog
that implements a [multi-value register conflict strategy](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type#Others):

``` js
var level = require('level')
var indexer = require('hyperlog-index')
var hyperlog = require('hyperlog')
var sub = require('subleveldown')
var mkdirp = require('mkdirp')

var minimist = require('minimist')
var argv = minimist(process.argv.slice(2), {
  default: { d: '/tmp/kv.db' }
})
mkdirp.sync(argv.d)

var hdb = level(argv.d + '/h')
var idb = level(argv.d + '/i')
var log = hyperlog(hdb, { valueEncoding: 'json' })
var db = sub(idb, 'x', { valueEncoding: 'json' })

var dex = indexer({
  log: log,
  db: sub(idb, 'i'),
  map: function (row, next) {
    // This method reduces our new state. In this example, db is used for the state.
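    db.get(row.value.k, function (err, doc) {
      // A missing key is expected on the first write; pass other errors on.
      if (err && !err.notFound) return next(err)
      if (!doc) doc = {}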
      row.links.forEach(function (link) {
        delete doc[link]
      })
      doc[row.key] = row.value.v
      db.put(row.value.k, doc, next)
    })
  }
})

if (argv._[0] === 'get') {
  dex.ready(function () {
    db.get(argv._[1], function (err, values) {
      if (err) console.error(err)
      else console.log(values)
    })
  })
} else if (argv._[0] === 'put') {
  // Structure `doc` as expected by `map` above
  var doc = { k: argv._[1], v: argv._[2] }
  dex.ready(function () {
    db.get(doc.k, function (err, values) {
      // Link the new entry to the "parents", from the current index, if any
      log.add(Object.keys(values || {}), doc, function (err, node) {
        if (err) console.error(err)
      })
    })
  })
} else if (argv._[0] === 'sync') {
  var r = log.replicate()
  process.stdin.pipe(r).pipe(process.stdout)
  r.on('end', function () { process.stdin.pause() })
}
```

Each key maps to an object of hashes to values:

```
$ node kv.js -d /tmp/db1 put A beep
$ node kv.js -d /tmp/db1 put A boop
$ node kv.js -d /tmp/db1 get A
{ '06e4130fc5f2392cb8bdb065d18eaa523d716f2c61b4877853340a5cc727fb42': 'boop' }
```

Meanwhile, a second database may have additional edits:

```
$ node kv.js -d /tmp/db2 put A whatever
$ node kv.js -d /tmp/db2 put B hey
```

When these two databases are merged together, the key at `A` has two values:

```
$ dupsh 'node kv.js -d /tmp/db1 sync' 'node kv.js -d /tmp/db2 sync'
$ node kv.js -d /tmp/db1 get A
{ '06e4130fc5f2392cb8bdb065d18eaa523d716f2c61b4877853340a5cc727fb42': 'boop',
  cba756b45e279ae5c3f3ebc8cfe0d50e1f2205e37a4443ce9e0e5a41491c234c: 'whatever' }
```

The `B` key has only a single element:

```
$ node kv.js -d /tmp/db1 get B
{ '53a374617fb8839b6f19646d6658188a4fc08d19f35c084dab835847532a3468': 'hey' }
```

`A` has two values because linking new nodes to old ones happens in `put`, not during replication.
A new update that links to both existing values merges them into a single value:

```
$ node kv.js -d /tmp/db1 put A whatboop
$ node kv.js -d /tmp/db1 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
```

These merges are also communicated over replication:

```
$ dupsh 'node kv.js -d /tmp/db1 sync' 'node kv.js -d /tmp/db2 sync'
$ node kv.js -d /tmp/db2 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
```

The index can be destroyed (and recalculated) at any time:

```
$ rm -rf /tmp/db1/i
$ node kv.js -d /tmp/db1 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
```

This is a useful strategy when you need to change the code that builds your indexes.
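Rebuilding works in this example because the index state is derived entirely from the log entries. As a rough in-memory illustration (not using hyperlog-index itself), the sketch below assumes made-up hashes `h1` to `h4` and a hypothetical `applyRow` helper that mirrors the `map` function from `kv.js`. Replaying the rows from scratch reproduces both the forked and the merged state:

```javascript
// Hypothetical pure version of the map step from kv.js: drop the entries this
// row links to (they are superseded), then record the row's own value.
function applyRow (state, row) {
  var doc = Object.assign({}, state[row.value.k])
  row.links.forEach(function (link) { delete doc[link] })
  doc[row.key] = row.value.v
  var next = Object.assign({}, state)
  next[row.value.k] = doc
  return next
}

// Replay a list of rows from an empty index.
function rebuild (rows) {
  return rows.reduce(function (state, row) {
    return applyRow(state, row)
  }, {})
}

// Made-up log: h2 and h3 are concurrent values for A, and h4 links to both.
var rows = [
  { key: 'h1', links: [], value: { k: 'A', v: 'beep' } },
  { key: 'h2', links: ['h1'], value: { k: 'A', v: 'boop' } },
  { key: 'h3', links: [], value: { k: 'A', v: 'whatever' } },
  { key: 'h4', links: ['h2', 'h3'], value: { k: 'A', v: 'whatboop' } }
]

var forked = rebuild(rows.slice(0, 3)) // before the merging put
var merged = rebuild(rows)             // after the merging put

console.log(forked.A)
console.log(merged.A)
```

Changing `applyRow` and calling `rebuild` again corresponds to deleting the index directory and letting the indexer run over the whole log with the new code.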

**Note:** The included example (`example/kv.js`) assumes the value is a JSON object,
so its command line `put` format looks more like this:

```
$ node example/kv.js -d /tmp/db1 put A '{"baap":"boop"}'
```

**Note:** If you are primarily interested in a key/value index like the one in
this example, check out [hyperkv](https://www.npmjs.com/package/hyperkv).


# api

``` js
var indexer = require('hyperlog-index')
```

## var dex = indexer(opts)

Create a new hyperlog index instance `dex` from:

* `opts.log` - a hyperlog instance (required)
* `opts.db` - a level instance (required)
* `opts.map` - an indexing function `function (row, next) {}`

You can have as many indexes as you like on the same log, just create more `dex`
instances on sublevels.

## opts.map(row, next)

The indexing function `opts.map` runs for each `row` in the log. It should
write its computed indexes to durable storage and call `next(err)` when it is
finished.
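The `(row, next)` contract can be sketched without any dependencies. Everything below is an assumption made for illustration: `memoryStore` is a hypothetical stand-in for a level instance (it is not durable and its callbacks fire synchronously), and `run` is a toy driver, not how hyperlog-index schedules rows. The point is only the shape of the indexing function: do the write, forward storage errors, and call `next` exactly once.

```javascript
// Hypothetical in-memory store with a level-style callback API. Callbacks fire
// synchronously to keep the sketch short; a real level instance is async.
function memoryStore () {
  var data = {}
  return {
    data: data,
    get: function (key, cb) {
      if (!(key in data)) {
        var err = new Error('not found')
        err.notFound = true
        return cb(err)
      }
      cb(null, data[key])
    },
    put: function (key, value, cb) {
      data[key] = value
      cb(null)
    }
  }
}

var store = memoryStore()

// Indexing function with the (row, next) shape: count the updates per key.
function map (row, next) {
  store.get(row.value.k, function (err, count) {
    // A missing key just means this is the first update for it.
    if (err && !err.notFound) return next(err)
    store.put(row.value.k, (count || 0) + 1, next)
  })
}

// Toy driver (not hyperlog-index): feed rows one at a time and stop at the
// first error passed to next.
function run (rows, done) {
  var i = 0
  ;(function step (err) {
    if (err || i === rows.length) return done(err || null)
    map(rows[i++], step)
  })()
}

var result
run([
  { key: 'h1', links: [], value: { k: 'A', v: 'beep' } },
  { key: 'h2', links: ['h1'], value: { k: 'A', v: 'boop' } },
  { key: 'h3', links: [], value: { k: 'B', v: 'hey' } }
], function (err) { result = err })

console.log(store.data)
```

With a real index, `store` would be a level instance or sublevel, so the counts survive restarts.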

## dex.ready(fn)

Registers the callback `fn()` to fire when the indexes have "caught up" to the
latest known change in the hyperlog. The `fn()` function fires exactly once. You
may call `dex.ready()` multiple times with different functions.

## dex.pause()

Pause calculating the indexes. `dex.ready()` will not fire until the indexes
have been resumed.

## dex.resume()

Resume calculation of the indexes after `dex.pause()`.

## dex.on('error', function (err) {})

If the underlying system generates an error, you can catch it here.

# install

With [npm](https://npmjs.org) do:

```
npm install hyperlog-index
```

# license

MIT
