Reflib
======
Reference library processing for Node.

---

**NOTE: This library is now depreciated in favour of [@IEBH/RefLib](https://github.com/IEBH/RefLib) which is ES6 and Browser compatible, has numerous fixes and is much more efficient**

---


This is the internal component to parse and output reference libraries. If you would like something with a user interface you may wish to look at one of the following:

* [reflib-cli](https://github.com/hash-bang/Reflib-CLI) - The command line interface to Reflib
* [Systematic Review Accelerator](http://crebp-sra.com) - Online tools to manage reference libraries
* [reflib-util](https://github.com/hash-bang/Reflib-utils) - Utilities to work with Reflib references or libraries

This module is the main API, for individual drivers [see the relevent NPM module](https://www.npmjs.com/search?q=reflib).

RefLib currently supports the following format for read / write operations:

* [EndNote XML](https://github.com/hash-bang/Reflib-EndNoteXML)
* [CSV](https://github.com/hash-bang/Reflib-CSV)
* [JSON](https://github.com/hash-bang/Reflib-JSON)
* [MEDLINE / PubMed](https://github.com/hash-bang/Reflib-MEDLINE)
* [RIS](https://github.com/hash-bang/Reflib-RIS)
* [TSV](https://github.com/hash-bang/Reflib-TSV)


This module is a ported version of the original [Reflib for PHP](https://github.com/hash-bang/RefLib)


API
===

parse(driver, content, [options], [callback])
---------------------------------------------
The main parser function. This will take a string, buffer or stream to process and return an emitter which should call `ref` for each reference found.

```javascript
var reflib = require('reflib');
reflib.parse('endnotexml', fs.readFileSync('./test/data/endnote.xml'))
	.on('error', function(err) {
		console.log('ERROR:', err);
	})
	.on('ref', function(ref) {
		console.log('FOUND A REFERENCE', ref);
	})
	.on('progress', function(current, max) {
		console.log('Reading position', current);
	})
	.on('end', function() {
		console.log('All done');
	});
```

The `options` parameter is an optional object of properties.

| Option          | Type    | Default | Description                                                                                       |
|-----------------|---------|---------|---------------------------------------------------------------------------------------------------|
| `fixes`         | Object  | `{}`    | Object containing fixes behaviour to apply to each returned reference                             |
| `fixes.authors` | Boolean | `false` | Apply the behaviour of `reflib.fix.authors(ref)` before returning the reference via event handler |
| `fixes.dates`   | Boolean | `false` | Apply the behaviour of `reflib.fix.dates(ref)` before returning the reference via event handler   |
| `fixes.pages`   | Boolean | `false` | Apply the behaviour of `reflib.fix.pages(ref)` before returning the reference via event handler   |


For example, the below imports a file while enabling all fixes:

```javascript
reflib.parse('endnotexml', fs.readFileSync('./test/data/endnote.xml'), {
	fixes: {
		authors: true,
		dates: true,
		pages: true,
	},
}).on('ref', function(ref) { /* ... */ });
```

If the final, optional `callback` parameter is specified the *entire* library will be returned as an array in the form `callback(error, references)`. Due to the shear size of some libraries this method is **not** recommended unless you know your RAM can safely hold this potentially huge arrray.

```javascript
reflib.parse('endnotexml', fs.readFileSync('./test/data/endnote.xml'), function(err, refs) {
	console.log('Error is', err);
	console.log('Refs are', refs);
});
```

parseFile(path, [options], [callback])
--------------------------------------
This is a shortcut of the `identify()` and `parse()` methods together to have RefLib read and process a file:

```javascript
var reflib = require('reflib');
reflib.parseFile('./test/data/endnote.xml')
	.on('error', function(err) {
		console.log('ERROR:', err);
	})
	.on('ref', function(ref) {
		console.log('FOUND A REFERENCE', ref);
	})
	.on('progress', function(current, max) {
		console.log('Reading position', current);
	})
	.on('end', function() {
		console.log('All done');
	});
```

See the `parse()` function for a description of supported options.

If the final, optional `callback` is specified the function returns in the same way as `parse()`.

NOTE: In order to correctly fire the `progress` event `parseFile()` defaults to using `fs.readFile` instead of `fs.createReadStream()` this is because buffers have a *known* length and streams have an *unknown* length. If you wish to read very large files you may wish to use the `parse()` event with `fs.createReadStream()` manually.

NOTE: Use `reflib.promises.parseFile()` for the promisable version of this function.


output(options)
---------------
Output a reference library.

The options object must at least contain `stream` and `content` properties. Other options supported are:

| Option           | Type                   | Description                                                                                     |
|------------------|------------------------|-------------------------------------------------------------------------------------------------|
| stream           | Stream.Writable stream | The stream object to output content into                                                        |
| format           | String                 | The driver to use when formatting the data                                                      |
| content          | Array or Object or Callback | The reference library to output. If an array each item is used in turn, if an object a single item is output, if a callback this is called with the arguments (next, batchNo) until it returns null. The callback function can return a single object or an array |
| defaultType      | String                 | Some libraries must have a reference type for each reference, if that is omitted use this value |
| encode           | Callback               | Overridable callback to use on each reference output                                            |
| escape           | Callback               | Overridable callback to use when encoding text                                                  |
| fields           | Undefined, string, array, true | If undefined only supported fields are output, if an array only those specified fields are output, if true all fields even those not recognised are output. If the input is a string it is split into an array as a CSV |


See the output tests of individual drivers for more examples.


outputFile(path, refs, [options], [callback])
----------------------------------
This is a shortcut of the `identify()` and `output()` methods together to have RefLib setup a stream and dump refs into a file.

`refs` can be an array of references, a single object or a callback to provide references. See the `output()` function for more information.

```javascript
var reflib = require('reflib');
reflib.outputFile('./test/data/endnote.xml', refs)
	.on('error', function(err) {
		console.log('ERROR:', err);
	})
	.on('end', function() {
		console.log('All done');
	});
```

The final `callback` parameter is optional. If it is specified it is attached automatically as a listener on the 'error' and 'end' events.

NOTE: Use `reflib.promises.outputFile()` for the promisable version of this function.


identify(path)
--------------
Function to return the supported driver from a file name.

```javascript
reflib.identify('./test/data/endnote.xml');
// -> 'endnotexml'
```

refTypes
--------
A collection of all supported reference types.

NOTE: This is based off the EndNote specification. If anything is missing please contact the author.

```javascript
var reflib = require('reflib');

console.log(reflib.types)
// e.g
// [..., {id: 'journalArticle', title: 'Journal Article'}, ...]
```

promises
--------
Object containing Promise compatible versions for all the internal functionality. e.g. `reflib.promises.parseFile()`


supported
---------
A collection of all supported drivers.

```javascript
var reflib = require('reflib');

console.log(reflib.supported)
// e.g
// {id: 'endnotexml', name: 'EndNote XML file', ext: ['.xml'], driver [object]}
```

fix.authors(reference)
----------------------
Verify that the author information for an incomming reference is correct.

This function will attempt to split mangled author fields up if the `authors` field contains exactly one entry which itself contains the `;` character. Some databases don't split this field up correctly and this fix will attempt to correct the array contents to what it should be.


fix.dates(reference)
--------------------
Attempt to correct the date format of incomming references.

This function has the following behaviour:

1. If the reference has a complete date format (e.g. 15/02/2016) the fields, `date`, `month` and `year` will be created
2. If the reference is missing the full date but contains a `year` and `month` those two fields will be stored with `date` removed
3. If the reference only has a `month` field that will be stored with `date` removed
4. If the reference only has a `year` field that will be stored with `year` removed

In all cases `date` will be a JavaScript date object, `year` will be a JavaScript four digit number, `month` will be the three letter, capitalized month format (e.g. `Jan`, `Dec`).


fix.pages(reference)
--------------------
Attempt to reformat different reference page formats into absolute ones.

For example `123-4` becomes `123-124`


Reference format
================
The following documents the individual reference format used by Reflib.


Reference fields
----------------
Each reference is made up of the following fields. Each field is optional and may or may not be supported by each Reflib driver.

| Field             | Type               | Description              | Aliases |
|-------------------|--------------------|--------------------------|---------|
| recNumber         | Number             | The sorting number of the reference |
| type              | String             | A supported [reference type](#reference-types) (e.g. journalArticle) |
| title             | String             | The reference's main title |
| journal           | String             | The reference's secondary title, this is usually the journal for most published papers |
| authors           | Array (of Strings) | An array of each Author in the originally specified format |
| date              | Date or String     | Depending on how much information can be extracted this could either be a year (e.g. '2015'), a date (e.g. '12th Feb') or a full JS date (if [Moment](http://momentjs.com) understands its format) |
| urls              | Array (of Strings) | An array of each URL for the reference |
| pages             | String             | The page reference, usually in the format `123-4` |
| volume            | String             |
| number            | String             |
| isbn              | String             | | ISSN |
| abstract          | String             |
| label             | String             |
| caption           | String             |
| notes             | String             |
| address           | String             |
| researchNotes     | String             |
| keywords          | Array (of Strings) | Any tags that apply to the reference | tags |
| accessDate        | String             |
| accession         | String             |
| doi               | String             |
| section           | String             |
| language          | String             |
| researchNotes     | String             |
| databaseProvider  | String             |
| database          | String             |
| workType          | String             |
| custom1           | String             |
| custom2           | String             |
| custom3           | String             |
| custom4           | String             |
| custom5           | String             |
| custom6           | String             |
| custom7           | String             |

Reference Types
---------------
A reference type can be one of the following. Each is translated from and to its individual drivers own supported format (for example if using EndNoteXML 'dataset' is translated to 'Dataset.' with EndNote ID 59 automatically).

```
aggregatedDatabase
ancientText
artwork
audiovisualMaterial
bill
blog
book
bookSection
case
catalog
chartOrTable
classicalWork
computerProgram
conferencePaper
conferenceProceedings
dataset
dictionary
editedBook
electronicArticle
electronicBook
electronicBookSection
encyclopedia
equation
figure
filmOrBroadcast
generic
governmentDocument
grant
hearing
journalArticle
legalRuleOrRegulation
magazineArticle
manuscript
map
music
newspaperArticle
onlineDatabase
onlineMultimedia
pamphlet
patent
personalCommunication
report
serial
standard
statute
thesis
unknown
unpublished
web
```

Credits
=======
Developed in part for the [Bond University Institute for Evidence-Based Healthcare](https://iebh.bond.edu.au).
Please contact [the author](mailto:matt_carter@bond.edu.au) with any issues.
