# `/saxen/` parser

[![Build Status](https://travis-ci.com/nikku/saxen.svg?branch=master)](https://travis-ci.com/nikku/saxen)
[![Codecov](https://img.shields.io/codecov/c/github/nikku/saxen.svg)](https://codecov.io/gh/nikku/saxen)


A tiny, super fast, namespace aware [sax-style](https://en.wikipedia.org/wiki/Simple_API_for_XML) XML parser written in plain JavaScript.


## Features

* (optional) entity decoding and attribute parsing
* (optional) namespace aware
* element / attribute normalization in namespaced mode
* tiny (`2.6Kb` minified + gzipped)
* [pretty damn fast](https://github.com/nikku/js-sax-parser-tests)


## Usage

```javascript
var {
  Parser
} = require('saxen');

var parser = new Parser();

// enable namespace parsing: element prefixes will
// automatically be adjusted to the ones configured here;
// elements in other namespaces will still be processed
parser.ns({
  'http://foo': 'foo',
  'http://bar': 'bar'
});

parser.on('openTag', function(elementName, attrGetter, decodeEntities, selfClosing, getContext) {

  elementName;
  // with prefix, i.e. foo:blub

  var attrs = attrGetter();
  // { 'bar:aa': 'A', ... }
});

parser.parse('<blub xmlns="http://foo" xmlns:bar="http://bar" bar:aa="A" />');
```


## Supported Hooks

We support the following parse hooks:

* `openTag(elementName, attrGetter, decodeEntities, selfClosing, contextGetter)`
* `closeTag(elementName, decodeEntities, selfClosing, contextGetter)`
* `error(err, contextGetter)`
* `warn(warning, contextGetter)`
* `text(value, decodeEntities, contextGetter)`
* `cdata(value, contextGetter)`
* `comment(value, decodeEntities, contextGetter)`
* `attention(str, decodeEntities, contextGetter)`
* `question(str, contextGetter)`

In contrast to `error`, `warn` receives recoverable errors, such as malformed attributes.
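
A minimal sketch of how the two diagnostic hooks could be wired up side by side. `collectDiagnostics` is a hypothetical helper, not part of `/saxen/`; it relies only on the `on(name, callback)` registration shown in the usage example and on the hook signatures listed above. The shape of the value returned by `contextGetter` is not assumed here, it is stored as-is.

```javascript
// hypothetical helper, not part of saxen: records everything
// reported through the `error` and `warn` hooks
function collectDiagnostics(parser) {
  var diagnostics = { errors: [], warnings: [] };

  parser.on('error', function(err, contextGetter) {
    // non-recoverable problem
    diagnostics.errors.push({ problem: err, context: contextGetter() });
  });

  parser.on('warn', function(warning, contextGetter) {
    // recoverable problem, e.g. a malformed attribute
    diagnostics.warnings.push({ problem: warning, context: contextGetter() });
  });

  return diagnostics;
}
```

With a real parser you would call `collectDiagnostics(parser)` before `parser.parse(...)` and inspect the returned object afterwards.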

In [proxy mode](#proxy-mode), `openTag` and `closeTag` receive a view of the current element instead of the raw element name. In addition, element attributes are not passed to `openTag` as a getter. Instead, they are exposed via `element.attrs`:

* `openTag(element, decodeEntities, selfClosing, contextGetter)`
* `closeTag(element, selfClosing, contextGetter)`


## Namespace Handling

In namespace mode, the parser adjusts tag and attribute namespace prefixes before
passing the element's name to `openTag` or `closeTag`. To enable that, you need to
configure default prefixes for well-known namespaces:

```javascript
parser.ns({
  'http://foo': 'foo',
  'http://bar': 'bar'
});
```

To skip the adjustment and still process namespace information:

```javascript
parser.ns();
```
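
To make the idea of prefix adjustment concrete, here is a self-contained illustration for element names only. It is not `/saxen/`'s implementation: `adjustElementName`, `declared` (prefix to URI, as declared via `xmlns` in the document, with `''` standing for the default namespace) and `wellKnown` (URI to configured prefix, as passed to `parser.ns`) are hypothetical names chosen for this sketch.

```javascript
function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// illustration only, NOT how saxen does it internally:
// rewrite an element name so that it carries the prefix
// configured for its namespace URI
function adjustElementName(name, declared, wellKnown) {
  var idx = name.indexOf(':');
  var prefix = idx === -1 ? '' : name.slice(0, idx);
  var localName = idx === -1 ? name : name.slice(idx + 1);

  // undeclared prefix: nothing to adjust
  if (!has(declared, prefix)) {
    return name;
  }

  var uri = declared[prefix];

  // namespace without a configured prefix: keep the name as written
  if (!has(wellKnown, uri)) {
    return name;
  }

  return wellKnown[uri] + ':' + localName;
}

var declared = { '': 'http://foo', bar: 'http://bar' };
var wellKnown = { 'http://foo': 'foo', 'http://bar': 'bar' };

adjustElementName('blub', declared, wellKnown); // 'foo:blub'
```

The last line mirrors the usage example: `blub` lives in the default namespace `http://foo`, for which the prefix `foo` is configured.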


## Proxy Mode

In this mode, the first argument passed to `openTag` and `closeTag` is an object that exposes more internal XML parse state. This needs to be explicitly enabled by instantiating the parser with `{ proxy: true }`.

```javascript
// instantiate parser with proxy=true
var parser = new Parser({ proxy: true });

parser.ns({
  'http://foo-ns': 'foo'
});

parser.on('openTag', function(el, decodeEntities, selfClosing, getContext) {
  el.originalName; // root
  el.name; // foo:root
  el.attrs; // { 'xmlns:foo': ..., id: '1' }
  el.ns; // { xmlns: 'foo', foo: 'foo', foo$uri: 'http://foo-ns' }
});

parser.parse('<root xmlns:foo="http://foo-ns" id="1" />');
```

Proxy mode comes with a performance penalty of roughly five percent.

__Caution!__ For performance reasons the exposed element is a simple view into the current parser state. Because of that, its values change as the parser advances and the view itself cannot be cached. If you would like to retain a persistent copy of the values, create a shallow clone:

```javascript
parser.on('openTag', function(el) {
  var copy = Object.assign({}, el);
  // copy, ready to keep around
});
```
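
The difference between keeping the view and keeping a clone can be reproduced without a parser. The sketch below uses a stand-in, `createLiveView`, which is a hypothetical model of a view backed by getters and makes no claim about how `/saxen/` builds its element internally.

```javascript
// stand-in for a live element view; a model for illustration,
// not saxen's actual internals
function createLiveView(state) {
  return Object.create({}, {
    name: { enumerable: true, get: function() { return state.name; } },
    attrs: { enumerable: true, get: function() { return state.attrs; } }
  });
}

var state = { name: 'foo:root', attrs: { id: '1' } };
var el = createLiveView(state);

var kept = el;                    // keeps a reference to the view
var copy = Object.assign({}, el); // reads each getter once

// simulate the parser advancing to the next element
state.name = 'foo:child';
state.attrs = { id: '2' };

kept.name; // 'foo:child'
copy.name; // 'foo:root'
```

Note that the clone is shallow: nested objects such as `attrs` are copied by reference, so the copy only stays stable as long as those objects are replaced rather than mutated in place.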


## Non-Features

`/saxen/` lacks some features known in other XML parsers such as [sax-js](https://github.com/isaacs/sax-js):

* no support for parsing loose documents, such as arbitrary HTML snippets
* no support for text trimming
* no automatic entity decoding
* no automatic attribute parsing

...and that is ok ❤.
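
Decoding and trimming can still be done on demand inside a hook. The following sketch wraps both into a `text` handler; `createTextHandler` and `onText` are hypothetical names, and it assumes that the `decodeEntities` argument accepts the raw string and returns the decoded one.

```javascript
// hypothetical helper: builds a `text` hook that decodes entities
// and trims whitespace only when text is actually reported
function createTextHandler(onText) {
  return function(value, decodeEntities) {
    var text = decodeEntities(value).trim();

    // skip whitespace-only text nodes
    if (text) {
      onText(text);
    }
  };
}
```

It would be registered via `parser.on('text', createTextHandler(function(text) { /* ... */ }))`.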


## Credits

We build on the awesome work done by [easysax](https://github.com/vflash/easysax).

`/saxen/` is named after [Sachsen](https://en.wikipedia.org/wiki/Saxony), a federal state of Germany. So geht sächsisch!

## LICENSE

MIT
