There are a lot of APIs in Node, but some of them are more important than others. These core APIs will form the backbone of any Node app, and you’ll find yourself using them again and again.
The first API we are going to look at is
the Events API. This is
because, while abstract, it is a fundamental piece of making every other
API work. By having a good grip on this API, you’ll be able to use all the
other APIs effectively.
If you’ve ever programmed JavaScript in the browser, you’ll have used events before. However, the event model used in the browser comes from the DOM rather than JavaScript itself, and a lot of the concepts in the DOM don’t necessarily make sense out of that context. Let’s look at the DOM model of events and compare it to the implementation in Node.
The DOM has a user-driven event model based on user interaction, with a set of interface elements arranged in a tree structure (HTML, XML, etc.). This means that when a user interacts with a particular part of the interface, there is an event and a context, which is the HTML/XML element on which the click or other activity took place. That context has a parent and potentially children. Because the context is within a tree, the model includes the concepts of bubbling and capturing, which allow elements either up or down the tree to receive the event that was called.
For example, in an HTML list, a click event on
an <li> can be intercepted, during the capture phase, by a
listener on the <ul> that is its
parent. Conversely, after the <li> receives the event, it bubbles
back up to that same <ul> listener. Because JavaScript
objects don’t have this kind of tree structure, the model in Node is much
simpler.
Because the
event model is tied to the DOM in browsers, Node created the EventEmitter class to provide
some basic event functionality. All event functionality in Node revolves
around EventEmitter, which is
designed to be an interface class for other classes to extend. It
would be unusual to call an EventEmitter instance directly.
EventEmitter has a handful of methods, the
main two being on and emit. The class provides these methods for use
by other classes. The on
method creates an event listener for an event, as shown in Example 4-1.
The on
method takes two parameters: the name of the event to listen for and the
function to call when that event is emitted. Because
EventEmitter is an interface pseudoclass, the class
that inherits from EventEmitter is
expected to be invoked with the new keyword.
Let’s look at Example 4-2 to see how we create a
new class as a listener.
We begin this example by including the
util module so we can use the inherits method.
inherits provides a way for the
EventEmitter class to add its methods to the Server class we created. This
means all new instances of Server can be used as
EventEmitters.
We then include the events module. However, we want to access just
the specific EventEmitter class
inside that module. Note how EventEmitter is capitalized to show it is a
class. We didn’t use a createEventEmitter method, because we aren’t
planning to use an EventEmitter directly. We simply
want to attach its methods to the Server class we are going to make.
Once we have included the modules we need,
the next step is to create our basic Server class. This offers just one simple
function, which logs a message when it is initialized. In a real
implementation, we would decorate the Server class prototype with the functions that
the class would use. For the sake of simplicity, we’ve skipped that. The
important step is to use util.inherits
to add EventEmitter as a superclass
of our Server class.
When we want to use the Server class, we instantiate it with new Server(). This instance of Server will have access to the methods in the
superclass (EventEmitter), which
means we can add a listener to our instance using the on method.
Right now, however, the event listener we
added will never be called, because the abc event isn’t fired. We can fix this by
adding the code in Example 4-3 to emit the event.
Firing the event listener is as simple as calling the emit method that the Server instance inherited from EventEmitter. It’s important to note that
these events are instance-based. There are no
global events. When you call the on method, you attach to a specific EventEmitter-based object. Even the various
instances of the Server class don’t
share events. The instance s from the code in
Example 4-3 will not share the same events as
another Server instance, such as one
created by var z = new Server();.
An important part of using events is dealing with callbacks. Chapter 3 looks at best practices in much more depth, but we’ll look here at the mechanics of callbacks in Node. They use a few standard patterns, but first let’s discuss what is possible.
When calling emit, in
addition to the event name, you can also pass an arbitrary list of
parameters. Example 4-4 includes three such
parameters. These will be passed to the function listening to the event.
When you receive a request event from
the http server, for example, you
receive two parameters: req and
res. When the request event was
emitted, those parameters were passed as the second and third arguments
to the emit.
It is important to understand how Node calls
the event listeners because it will affect your programming style. When
emit() is called with arguments, the
code in Example 4-5 is used to call each
event listener.
This code uses both of the JavaScript
methods for calling a function from code. If emit() is passed with three or fewer
arguments, the method takes a shortcut and uses call. Otherwise, it uses the slower apply to pass all the arguments as an array. The important thing to recognize here,
though, is that Node makes both of these calls using the this argument directly. This means that the
context in which the event listeners are called is the context of
EventEmitter—not
their original context. Using Node REPL, you can see what is happening
when things get called by EventEmitter (Example 4-6).
Example 4-6. The changes in context caused by EventEmitter
> var EventEmitter = require('events').EventEmitter,
... util = require('util');
>
> var Server = function() {};
> util.inherits(Server, EventEmitter);
> Server.prototype.outputThis = function(output) {
... console.log(this);
... console.log(output);
... };
[Function]
>
> Server.prototype.emitOutput = function(input) {
... this.emit('output', input);
... };
[Function]
>
> Server.prototype.callEmitOutput = function() {
... this.emitOutput('innerEmitOutput');
... };
[Function]
>
> var s = new Server();
> s.on('output', s.outputThis);
{ _events: { output: [Function] } }
> s.emitOutput('outerEmitOutput');
{ _events: { output: [Function] } }
outerEmitOutput
> s.callEmitOutput();
{ _events: { output: [Function] } }
innerEmitOutput
> s.emit('output', 'Direct');
{ _events: { output: [Function] } }
Direct
true
>

The sample output first sets up a Server class. It includes functions to
emit the output event. The outputThis method is attached to the output event as an event listener. When we
emit the output event from various contexts, we stay
within the scope of the EventEmitter
object, so the value of this that
s.outputThis has access to is the one
belonging to the EventEmitter. Consequently, the
this variable must be passed in as a
parameter and assigned to a variable if we wish to make use of it in
event callback functions.
One of the core tasks of Node.js is to act as a web server. This is such a key part of the system that when Ryan Dahl started the project, he rewrote the HTTP stack for V8 to make it nonblocking. Although both the API and the internals for the original HTTP implementation have morphed a lot since it was created, the core activities are still the same. The Node implementation of HTTP is nonblocking and fast. Much of the code has moved from C into JavaScript.
HTTP uses a pattern that is common in Node.
Pseudoclass factories provide an easy way to create a new server.[7] The http.createServer()
method provides us with a new instance of the HTTP
Server class, which is the class we use to define the
actions taken when Node receives incoming HTTP requests. Besides the
Server class itself, the other main pieces of the HTTP
module (and of Node modules in general) are the events the Server class
fires and the data structures that are passed to the callbacks. Knowing
about these three kinds of component allows you to use the HTTP module
well.
Acting as an HTTP server is probably the most common current use case for Node. In Chapter 1, we set up an HTTP server and used it to serve a very simple request. However, HTTP is a lot more multifaceted than that. The server component of the HTTP module provides the raw tools to build complex and comprehensive web servers. In this chapter, we are going to explore the mechanics of dealing with requests and issuing responses. Even if you end up using a higher-level server such as Express, many of the concepts it uses are extensions of those defined here.
As we’ve already seen, the first step in
using HTTP servers is to create a new server using the http.createServer() method. This returns a new instance of the Server class, which
has only a few methods because most of the functionality is going to be
provided through using events. The http server class has six events and three
methods. The other thing to notice is how most of the methods are used
to initialize the server, whereas events are used during its
operation.
Let’s start by creating the smallest basic HTTP server code we can in Example 4-7.
This example is not
good code. However, it illustrates some important points. We’ll fix the
style shortly. The first thing we do is require the http module. Notice how we can chain methods
to access the module without first assigning it to a variable. Many
things in Node return a function,[8] which allows us to invoke those functions immediately.
From the included http module, we
call createServer. This doesn’t have
to take any arguments, but we pass it a function to attach to the
request event. Finally, we tell the
server created with createServer to
listen on port 8125.
We hope you never write code like this in real situations, but it does show the flexibility of the syntax and the potential brevity of the language. Let’s be a lot more explicit about our code. The rewrite in Example 4-8 should make it a lot easier to understand and maintain.
This example implements the minimal web
server again. However, we’ve started assigning things to named
variables. This not only makes the code easier to read than when it’s
chained, but also means you can reuse it. For example, it’s not uncommon
to use http more than once in a file.
You want to have both an HTTP server and an HTTP client, so reusing the
module object is really helpful. Even though JavaScript doesn’t force
you to think about memory, that doesn’t mean you should thoughtlessly
litter unnecessary objects everywhere. So rather than use an anonymous
callback, we’ve named the function that handles the request event. This is less about memory usage
and more about readability. We’re not saying you shouldn’t use anonymous
functions, but if you can lay out your code so it’s easy to find, that
helps a lot when maintaining it.
Remember to look at Part I of the book for more help with programming style. Chapters 1 and 2 deal with programming style in particular.
Because we didn’t pass the request event listener as part of the factory
method for the http Server object, we
need to add an event listener explicitly. Calling the on method from EventEmitter does this. Finally, as with the
previous example, we call the listen method with the port we want to
listen on. The http class provides
other functions, but this example illustrates the most important
ones.
The http
server supports a number of events, which are associated with either the
TCP or HTTP connection to the client. The connection and close events indicate the buildup or teardown of a TCP connection to a
client. It’s important to remember that some clients will be using HTTP
1.1, which supports keepalive. This means that their TCP connections may
remain open across multiple HTTP requests.
The request, checkContinue, upgrade, and clientError events are associated with HTTP requests. We’ve already used the
request event, which signals a new
HTTP request.
The checkContinue event is a special case.
It allows you to take more direct control of an HTTP request in which
the client streams chunks of data to the server. As the client sends
data to the server, it will check whether it can continue, at which
point this event will fire. If an event handler is created for this
event, the request event will
not be emitted.
The upgrade event is emitted when a client asks
for a protocol upgrade. The http
server will deny HTTP upgrade requests unless there is an event handler
for this event.
Finally, the clientError event passes on any error events
sent by the client.
The HTTP server emits a number of events. The
most common one is request, but you
can also get events associated with the TCP connection for the request, as well as with
other parts of the request life cycle.
When a new TCP stream is created for a
request, a connection event is
emitted. This event passes the TCP stream for the request as a
parameter. The stream is also available as a request.connection variable for each request
that happens through it. However, only one connection event will be emitted for each
stream. This means that many requests
can happen from a client with only one connection
event.
Node is also great when you want to make
outgoing HTTP connections. This is useful in many contexts, such as
using web services, connecting to document store databases, or just
scraping websites. You can use the same http module when doing HTTP requests, but
should use the http.ClientRequest
class. There are two factory methods for this class: a
general-purpose one and a convenience method. Let’s take a look at the
general-purpose case in Example 4-9.
The first thing you can see is that an
options object defines a lot of the
functionality of the request. We must provide the host name (although an IP address is also
acceptable), the port, and the
path. The method is optional and defaults to a value of
GET if none is specified. In essence,
the example is specifying that the request should be an HTTP GET request to http://www.google.com/ on port 80.
The next thing
we do is use the options object to
construct an instance of http.ClientRequest using
the factory method http.request().
This method takes an options
object and an optional callback argument. The passed callback listens to
the response event, and when a response
event is received, we can process the results of the request. In the
previous example, we simply output the response object to the console.
However, it’s important to notice that the body of the HTTP request is
actually received via a stream in the response object. Thus, you can subscribe to
the data event of the response object to get the data as it becomes
available (see the section Readable streams for more
information).
The final important point to notice is that
we had to end() the request. Because this was a GET request, we didn’t write any data to the
server, but for other HTTP methods,
such as PUT or POST, you may need to. Until we call the
end() method, request won’t initiate the HTTP request, because it doesn’t know whether
it should still be waiting for us to send data.
Since GET is such a common HTTP use case, there is a special factory method to support it in
a more convenient way, as shown in Example 4-10.
This example of http.get() does exactly the same thing as
the previous example, but it’s slightly more concise. We’ve dropped the
method attribute of the options
object, and left out the call to request.end() because it’s implied.
If you run the previous two examples, you
are going to get back raw Buffer
objects. As described later in this chapter, a Buffer is a special
class defined in Node to support the storage of arbitrary, binary
data. Although it’s certainly possible to work with these, you often
want a specific encoding, such as UTF-8 (an encoding for Unicode
characters). You can specify this with the response.setEncoding() method (see Example 4-11).
Example 4-11. Comparing raw Buffer output to output with a specified encoding
> var http = require('http');
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) {
... console.log(res);
... res.on('data', function(c) { console.log(c); });
... });
> <Buffer 3c 21 64 6f 63 74 79 70
...
65 2e 73 74>
<Buffer 61 72 74 54 69
...
69 70 74 3e>
>
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) {
... res.setEncoding('utf8');
... res.on('data', function(c) { console.log(c); });
... });
> <!doctype html><html><head><meta http-equiv="content-type
...
load.t.prt=(f=(new Date).getTime());
})();
</script>
>

In the first case, we do not call ClientResponse.setEncoding(), and we get
chunks of data in Buffers. Although
the output is abridged in the printout, you can see that it isn’t just
a single Buffer, but that several
Buffers have been returned with
data. In the second example, the data is returned as UTF-8 because we
specified res.setEncoding('utf8').
The chunks of data returned from the server are still the same, but
are given to the program as strings
in the correct encoding rather than as raw Buffers. Although the printout may not make
this clear, there is one string for
each of the original Buffers.
Not all HTTP is GET. You might also need to call POST,
PUT, and other HTTP methods that alter data on the other
end. This is functionally the same as making a GET request, except you are going to write
some data upstream, as shown in Example 4-12.
This example
is very similar to Example 4-10, but uses
the http.ClientRequest.write() method. This
method allows you to send data upstream, and as explained earlier, it
requires you to explicitly call http.ClientRequest.end() to indicate
you’re finished sending data. Whenever ClientRequest.write() is called, the data is
sent upstream (it isn’t buffered), but the server will not respond
until ClientRequest.end() is
called.
You can stream data to a server using
ClientRequest.write() by coupling
the writes to the data event of a
Stream. This is ideal if you need
to, for example, send a file from disk to a remote server over
HTTP.
The ClientResponse object stores a variety of information about the request. In general,
it is pretty intuitive. Some of its obvious properties that are often
useful include statusCode (which contains the HTTP
status) and headers (which is
the response headers object). Also hung off of ClientResponse are various streams and
properties that you may or may not want to interact with directly.
The URL
module provides tools for easily parsing and dealing with URL
strings. It’s extremely useful when you have to deal with URLs. The
module offers three methods: parse,
format, and resolve. Let’s start by looking at Example 4-13,
which demonstrates parse
using Node REPL.
Example 4-13. Parsing a URL using the URL module
> var URL = require('url');
> var myUrl = "http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash";
> myUrl
'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
> parsedUrl = URL.parse(myUrl);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query: 'with=query&param=that&are=awesome'
, pathname: '/some/url/'
}
> parsedUrl = URL.parse(myUrl, true);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query:
{ with: 'query'
, param: 'that'
, are: 'awesome'
}, pathname: '/some/url/'
}
>

The first thing we do, of course, is require
the URL module. Note that the names
of modules are always lowercase. We’ve created a url as a string containing all the parts that
will be parsed out. Parsing is really easy: we just call the parse method from the URL module on the string. It returns a data
structure representing the parts of the parsed URL. The components it
produces are:
The href
is the full URL that was originally
passed to parse. The protocol is the
protocol used in the URL (e.g.,
http://, https://, ftp://, etc.). host is the fully qualified hostname of the
URL. This could be as simple as the
hostname for a local server, such as print
server, or a fully qualified domain name such as www.google.com. It might also include a port
number, such as 8080, or username and
password credentials like un:pw@ftpserver.com. The various parts of the
hostname are broken down further into auth, containing just the user credentials;
port, containing just the port; and
hostname, containing the hostname
portion of the URL. An important
thing to know about hostname is that
it is still the full hostname, including the top-level domain (TLD;
e.g., .com, .net, etc.) and the specific server. If the
URL were http://sport.yahoo.com/nhl, hostname would not give you just the TLD
(yahoo.com) or just the host
(sport), but the entire hostname
(sport.yahoo.com). The URL module doesn’t have the capability to
split the hostname down into its components, such as domain or
TLD.
The next set of components of the URL
relates to everything after the host.
The pathname is the entire filepath
after the host. In http://sport.yahoo.com/nhl, it is /nhl. The next component is the search component, which stores the HTTP GET parameters in the URL. For example,
if the URL were http://mydomain.com/?foo=bar&baz=qux, the
search component would be ?foo=bar&baz=qux. Note the inclusion of
the ?. The query parameter is similar to the search component. It contains one of two
things, depending on how parse was
called.
parse
takes two arguments: the url string
and an optional Boolean that determines whether the query string should be parsed using the
querystring module, discussed in the
next section. If the second argument is false, query will just contain a string similar to
that of search but without the
leading ?. If you don’t pass anything
for the second argument, it defaults to false.
The final component is the fragment portion of the URL. This is the part
of the URL after the #. Commonly,
this is used to refer to named anchors in HTML pages. For instance, http://abook.com/#chapter2 might refer to the
second chapter on a web page hosting a whole book. The hash component in this case would contain
#chapter2. Again, note the included
# in the string. Some sites, such as
http://twitter.com, use more complex
fragments for AJAX applications, but the same rules apply. So the URL
for the Twitter mentions account, http://twitter.com/#!/mentions, would have a
pathname of / but a hash of #!/mentions.
The querystring module is a very simple helper module to deal with query strings.
As discussed in the previous section, query strings are the parameters
encoded at the end of a URL. However, when reported back as just a
JavaScript string, the parameters are fiddly to deal with. The querystring module provides an easy way to
create objects from the query strings. The main methods it offers are parse and
decode, but some internal helper
functions (such as escape,
unescape, unescapeBuffer, encode, and stringify) are also exposed. If you have a
query string, you can use parse to
turn it into an object, as shown in Example 4-14.
Here, the class’s parse function turns the query string into an
object in which the properties are the keys and the values correspond to
the ones in the query string. You should notice a few things, though.
First, the numbers are returned as strings, not numbers. Because
JavaScript is loosely typed and will coerce a string into a number in a
numerical operation, this works pretty well. However, it’s worth bearing
in mind for those times when that coercion doesn’t work.
Additionally, it’s important to note that
you must pass the query string without the leading ? that demarks it in the URL. A typical URL
might look like http://www.bobsdiscount.com/?item=304&location=san+francisco.
The query string starts with a ? to
indicate where the filepath ends, but if you include the ? in the string you pass to parse, the first key will start with a
?, which is almost certainly not what
you want.
This library is really useful in a bunch of
contexts because query strings are used in situations other than URLs.
When you get content from an HTTP
POST that is x-form-encoded, it
will also be in query string form. All the browser manufacturers have
standardized around this approach. By default, forms in HTML will send
data to the server in this way also.
The querystring module is also used as a helper
module to the URL module.
Specifically, when decoding URLs, you can ask URL to turn the query string into an object
for you rather than just a string. That’s described in more detail in
the previous section, but the parsing that is done uses the parse method from querystring.
Another important part of querystring is encode (Example 4-15).
This function takes a query string’s key-value pair object and
stringifies it. This is really useful when you’re working with HTTP requests, especially POST data. It makes it easy to work with a
JavaScript object until you need to send the data over the wire and then
simply encode it at that point. Any JavaScript object can be used, but
ideally you should use an object that has only the data that you want in
it because the encode method will add
all properties of the object. However, if the property value isn’t a
string, Boolean, or number, it won’t be serialized and the key will just
be included with an empty value.
I/O is one of the core pieces that makes Node different from other frameworks. This section explores the APIs that provide nonblocking I/O in Node.
Many components in Node provide continuous
output or can process continuous input. To make these components act in
a consistent way, the stream API
provides an abstract interface for them. This API provides common
methods and properties that are available in specific implementations of
streams. Streams can be readable, writable, or both. All streams
are EventEmitter
instances, allowing them to emit events.
The readable stream API is a set of methods and events that provides
access to chunks of data as they are sent by an underlying data
source. Fundamentally, readable streams are about emitting data events. These events represent the
stream of data as a stream of events. To make this manageable, streams
have a number of features that allow you to configure how much data
you get and how fast.
The basic stream in Example 4-16 simply reads data from a file in chunks.
Every time a new chunk is made available, it is exposed to a callback
in the variable called data. In
this example, we simply log the data to the console. However, in real
use cases, you might either stream the data somewhere else or spool it
into bigger pieces before you work on it. In essence, the data event simply
provides access to the data, and you have to figure out what to do
with each chunk.
Let’s look in more detail at one of the common patterns used in dealing with streams. The spooling pattern is used when we need an entire resource available before we deal with it. We know it’s important not to block the event loop for Node to perform well, so even though we don’t want to perform the next action on this data until we’ve received all of it, we don’t want to block the event loop. In this scenario (Example 4-17), we use a stream to get the data, but use the data only when enough is available. Typically this means when the stream ends, but it could be another event or condition.
The filesystem module is obviously very helpful because you need it in order to access files on disk. It closely mimics the POSIX style of file I/O. It is a somewhat unique module in that all of the methods have both asynchronous and synchronous versions. However, we strongly recommend that you use the asynchronous methods, unless you are building command-line scripts with Node. Even then, it is often much better to use the async versions, even though doing so adds a little extra code, so that you can access multiple files in parallel and reduce the running time of your script.
The main issue that people face while dealing with asynchronous calls is ordering, and this is especially true with file I/O. It’s common to want to do a number of moves, renames, copies, reads, or writes at one time. However, if one of the operations depends on another, this can create issues because return order is not guaranteed. This means that the first operation in the code could happen after the second operation in the code. Patterns exist to make ordering easy. We talked about them in detail in Chapter 3, but we’ll provide a recap here.
Consider the case of reading and then deleting a file (Example 4-18). If the delete (unlink) happens before the read, it will be impossible to read the contents of the file.
Notice that we are using the asynchronous methods, and although we have created callbacks, we haven’t written any code that defines in which order they get called. This often becomes a problem for programmers who are not used to programming in event loops. This code looks OK on the surface and sometimes it will work at runtime, but sometimes it won’t. Instead, we need to use a pattern in which we specify the ordering we want for the calls. There are a few approaches. One common approach is to use nested callbacks. In Example 4-19, the asynchronous call to delete the file is nested within the callback to the asynchronous function that reads the file.
This approach is often very effective for discrete sets of operations. In our example with just two operations, it’s easy to read and understand, but this pattern can potentially get out of control.
Although Node is JavaScript, it is
JavaScript out of its usual environment. For instance, the browser
requires JavaScript to perform many functions, but manipulating binary
data is rarely one of them. Although JavaScript does support bitwise
operations, it doesn’t have a native representation of binary data. This
is especially troublesome when you also consider the limitations of the
number type system in JavaScript, which might otherwise lend itself to
binary representation. Node introduces the Buffer class to make
up for this shortfall when you’re working with binary data.
Buffers are an extension to the V8 engine,
which means that they have their own set of pitfalls. Buffers are
actually a direct allocation of memory, which may mean a little or a
lot, depending on your experience with lower-level computer languages.
Unlike the data types in JavaScript, which abstract some of the ugliness
of storing data, Buffer provides
direct memory access, warts and all. Once a Buffer is created, it is a fixed size.
If you want to add more data, you must clone the Buffer into a larger
Buffer. Although some of these features may seem
frustrating, they allow Buffer to
perform at the speed necessary for many data operations on the server.
It was a conscious design choice to trade off some programmer
convenience for performance.
We thought it was important to include this quick primer on working with binary data for those who haven’t done much of it, or as a refresher for those of us who haven’t in a long time (which was true for us when we started working with Node). Computers, as almost everyone knows, work by manipulating states of “on” and “off.” We call this a binary state because there are only two possibilities. Everything in computers is built on top of this, which means that working directly with binary can often be the fastest method on the computer. To do more complex things, we collect “bits” (each representing a single binary state) into groups of eights, often called an octet or, more commonly, a byte.[9] This allows us to represent bigger numbers than just 0 or 1.
By creating sets of 8 bits, we are able to represent any number from 0 to 255. The rightmost bit represents 1, but then we double the value of the number represented by each bit as we move left. To find out what number it represents, we simply sum the numbers in column headers (Example 4-20).
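For instance, the byte 10010101 has its 128, 16, 4, and 1 bits set, which sum to 149. You can check this kind of arithmetic in JavaScript with parseInt(), which accepts a radix:

```javascript
// Convert a string of bits to a number by parsing it as base 2.
// 10010101 = 128 + 16 + 4 + 1 = 149
var value = parseInt('10010101', 2);
console.log(value); // 149
```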
You’ll also see the use of hexadecimal notation, or “hex,” a lot. Because bytes
need to be easily described and a string of eight 0s and 1s isn’t very
convenient, hex notation has become popular. Binary notation is base
2, in that there are only two possible states per digit (0 or 1). Hex
uses base 16, and each digit in hex can have a value from 0 to F,
where the letters A through F (or their lowercase equivalents) stand
for 10 through 15, respectively. What’s very convenient about hex is
that with two digits we can represent a whole byte. The right digit
represents 1s, and the left digit represents 16s. If we wanted to
represent decimal 149, it is (16 x 9) + (5 x
1), or the hex value 95.
In JavaScript, you can create a number from a hex value using the
notation 0x in front of the hex
value. For instance, 0x95 is
decimal 149. In Node, you’ll commonly see Buffers represented by hex values in console.log()
output or Node REPL. Example 4-22 shows how you
could store 3-octet values (such as an RGB color value) as a
Buffer.
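Example 4-22 isn't reproduced here, but a minimal sketch of the idea might look like the following (the color values are made up for illustration):

```javascript
// Store one RGB color in a 3-byte Buffer, one octet per channel.
// These particular channel values are hypothetical.
var color = new Buffer(3);
color[0] = 0x95; // red: 149
color[1] = 0x20; // green: 32
color[2] = 0xd9; // blue: 217
console.log(color); // printed in hex: <Buffer 95 20 d9>
```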
So how does binary relate to other kinds of data? Well, we’ve seen how binary can represent numbers. In network protocols, it’s common to specify a certain number of bytes to convey some information, using particular bits in fixed places to indicate specific things. For example, in a DNS request, the first two bytes are used as a number for a transaction ID, whereas the next byte is treated as individual bits, each used to indicate whether a specific feature of DNS is being used in this request.
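As a rough sketch of that DNS layout (the byte values here are invented, and only the QR flag is pulled out of the flags byte), reading those fields from a Buffer looks like this:

```javascript
// Hypothetical first four bytes of a DNS request.
var packet = new Buffer([0xab, 0xcd, 0x01, 0x00]);

// Bytes 0-1 form the 16-bit transaction ID, big-endian.
var txnId = (packet[0] << 8) | packet[1]; // 0xabcd

// Byte 2 is treated as individual bits; the top bit is the QR flag
// (0 for a query, 1 for a response).
var qr = (packet[2] & 0x80) >> 7;
```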
The other extremely common use of binary is to represent strings. The two most common “encoding” formats for strings are ASCII and UTF (typically UTF-8). These encodings define how the bits should be converted into characters. We’re not going to go into too much of the gory detail, but essentially, encodings work by having a lookup table that maps the character to a specific number represented in bytes. To convert the encoding, the computer has to simply convert from the number to the character by looking it up in a conversion table.
ASCII characters (some of which are nonvisible “control characters,” such as Return) are always exactly 7 bits each, so they can be represented by values from 0 to 127. The eighth bit in a byte is often used to extend the character set to represent various choices of international characters (such as ȳ or ȱ).
UTF is a little more complex. Its character set has a lot more characters, including many international ones. Each character in UTF-8 is represented by at least 1 byte, but sometimes up to 4. Essentially, the first 128 values are good old ASCII, whereas the others are pushed further down in the map and represented by higher numbers. When a less common character is referenced, the first byte uses a number that tells the computer to check out the next byte to find the real address of the character starting on the second sheet of its map. If the character isn’t on the second sheet of the map, the second byte tells the computer to look at the third, and so on. This means that in UTF-8, the length of a string measured in characters isn’t necessarily the same as its length in bytes, whereas in ASCII the two are always equal.

It is important to remember that once you copy things to a Buffer, they will be stored as their binary
representations. You can always convert the binary representation in
the buffer back into other things, such as strings, later. So a
Buffer is defined only by its size,
not by the encoding or any other indication of its meaning.
Given that Buffer is opaque, how big does it need to be
in order to store a particular string of input? As we’ve said, a UTF
character can occupy up to 4 bytes, so to be safe, you should define a
Buffer to be four times the size of
the largest input you can accept, measured in UTF characters. There
may be ways you can reduce this burden; for instance, if you limit
your input to European languages, you’ll know there will be at most 2
bytes per character.
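A sketch of that worst-case sizing, where maxChars is a hypothetical limit your application enforces on input length:

```javascript
// Assumed application-level limit on input, measured in characters.
var maxChars = 256;
// Worst case for UTF-8: 4 bytes per character.
var buf = new Buffer(maxChars * 4);
console.log(buf.length); // 1024
```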
Buffers
can be created using three possible parameters: the length of
the Buffer in bytes, an array of bytes to copy into
the Buffer, or a string to copy into the
Buffer. The first and last methods are by far the
most common. There aren’t too many instances where you are likely to
have a JavaScript array of bytes.[10]
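The three forms look like this (the byte values and string are arbitrary):

```javascript
var byLength = new Buffer(4);           // allocate 4 bytes
var byArray = new Buffer([0xde, 0xad]); // copy an array of bytes
var byString = new Buffer('hello');     // copy a string (UTF-8 by default)
console.log(byLength.length, byArray.length, byString.length); // 4 2 5
```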
Creating a Buffer of a particular size is a very common scenario and easy to deal with.
Simply put, you specify the number of bytes as your argument when
creating the Buffer (Example 4-23).
As you can see from the previous example,
when we create a Buffer we get a
matching number of bytes. However, because the Buffer is just getting an allocation of
memory directly, it is uninitialized and the
contents are left over from whatever happened to occupy them before.
This is unlike all the native JavaScript types, which initialize all
memory so that when you create a new primitive or object, it doesn’t
assign whatever was already in the memory space to the primitive or
object you just created. Here is a good way to think about it. If you
go to a busy cafe and you want a table, the fastest way to get one is
to sit down as soon as some other people vacate one. However, although
it’s fast, you are left with all their dirty dishes and the detritus
from their meals. You might prefer to wait for one of the staff to
clear the table and wipe it down before you sit. This is a lot like
Buffers versus native types.
Buffers do very little to make
things easy for you, but they do give you direct and fast access to
memory. If you want to have a nicely zeroed set of bits, you’ll need
to do it yourself (or find a helper library).
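To wipe the table down yourself, you can overwrite the whole Buffer with zeros using fill():

```javascript
// A fresh Buffer's contents are whatever happened to be in memory.
var buf = new Buffer(4);
// After fill(), every byte is a known 0.
buf.fill(0);
```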
Creating a Buffer using byte length is most common when
you are working with things such as network transport protocols that
have very specifically defined structures. When you know exactly how
big the data is going to be (or you know exactly how big it could be)
and you want to allocate and reuse a Buffer for performance reasons, this is the
way to go.
Probably the most common way to use a
Buffer is to create it with a
string of either ASCII or UTF-8 characters. Although a Buffer can hold any data, it is particularly
useful for I/O with character data because the constraints we’ve
already seen on Buffer can make
their operations much faster than operations on regular strings. So
when you are building really highly scalable apps, it’s often worth
using Buffers to hold strings. This
is especially true if you are just shunting the strings around the
application without modifying them. Therefore, even though strings
exist as primitives in JavaScript, it’s still very common to keep
strings in Buffers in Node.
When we create a Buffer with a string, as shown in Example 4-24, it defaults to UTF-8. That is, if you
don’t specify an encoding, it will be considered a UTF-8 string. That
is not to say that Buffer pads the
string to fit any Unicode character (blindly allocating 4 bytes per
character), but rather that it will not truncate characters. In this
example, we can see that when taking a string with just lowercase
alpha characters, the Buffer uses
the same byte structure, whatever the encoding, because they all fall
in the same range. However, when we have an “é,” it’s encoded as 2
bytes in the default UTF-8 case or when we specify UTF-8 explicitly.
If we specify ASCII, the character is truncated to a single byte.
Example 4-24. Creating Buffers using strings
> new Buffer('foobarbaz');
<Buffer 66 6f 6f 62 61 72 62 61 7a>
> new Buffer('foobarbaz', 'ascii');
<Buffer 66 6f 6f 62 61 72 62 61 7a>
> new Buffer('foobarbaz', 'utf8');
<Buffer 66 6f 6f 62 61 72 62 61 7a>
> new Buffer('é');
<Buffer c3 a9>
> new Buffer('é', 'utf8');
<Buffer c3 a9>
> new Buffer('é', 'ascii');
<Buffer e9>
>
Node offers a number of operations to simplify working with strings and Buffers. First, you don’t need to compute
the length of a string before creating a Buffer to hold it; just assign the string as
the argument when creating the Buffer. Alternatively, you can use the Buffer.byteLength() method. This method
takes a string and an encoding and returns the string’s length in
bytes, rather than in characters as String.length does.
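A quick sketch of the difference between the two measures:

```javascript
var s = 'café';
console.log(s.length);                     // 4 characters
console.log(Buffer.byteLength(s, 'utf8')); // 5 bytes, because 'é' takes 2
```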
You can also write a string to an existing
Buffer. The Buffer.write() method writes a string to a specific index of a Buffer. If there is room in the Buffer starting from the specified offset,
the entire string will be written. Otherwise, characters are truncated
from the end of the string to fit the Buffer. In either case, Buffer.write() will return the number of
bytes that were written. In the case of UTF-8 strings, if a whole
character can’t be written to the Buffer, none of the bytes for that character
will be written. In Example 4-25, because the
Buffer is too small for even one
non-ASCII character, it ends up empty.
In a single-byte
Buffer, it’s possible to write an “a” character,
and doing so returns 1, indicating
that 1 byte was written. However, trying to write a “é” character
fails because it requires 2 bytes, and the method returns
0 because nothing was written.
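Example 4-25 isn’t reproduced here, but the behavior it describes can be sketched like this:

```javascript
var one = new Buffer(1);    // a single-byte Buffer
var wrote = one.write('a'); // 'a' fits in 1 byte
console.log(wrote);         // 1
var none = one.write('é');  // 'é' needs 2 bytes in UTF-8
console.log(none);          // 0: no partial character is written
```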
There is a little more complexity to
Buffer.write(), though. If
possible, when writing UTF-8, Buffer.write() will terminate the character
string with a NUL character.[11] This is much more significant when writing into the
middle of a larger Buffer.
In Example 4-26,
after creating a Buffer that is 5
bytes long (which could have been done directly using the string), we
write the character f to the entire
Buffer. f is the character code 0x66 (102 in
decimal). This makes it easy to see what happens when we write the
characters “ab” to the Buffer
starting with an offset of 1. The zeroth character is left as f. At positions 1 and 2, the characters
themselves are written, 61 followed by 62. Then Buffer.write() inserts a terminator, in this
case a null character of 0x00.
Borrowed from the Firebug debugger in
Firefox, the simple console.log command allows you
to easily output to stdout without using any modules (Example 4-27). It also offers some pretty-printing
functionality to help enumerate objects.
[7] When we talk about a pseudoclass, we are referring to the definition found in Douglas Crockford’s JavaScript: The Good Parts (O’Reilly). From now on, we will use “class” to refer to a “pseudoclass.”
[8] This works in JavaScript because it supports first-class functions.
[9] There is no “standard” size of byte, but the de facto size that virtually everyone uses nowadays is 8 bits. Therefore, octets and bytes are equivalent, and we’ll be using the more common term byte to mean specifically an octet.
[10] It’s very memory-inefficient, for one thing. If you store each byte as a number, for instance, you are using a 64-bit memory space to represent 8 bits.
[11] This generally just means a binary 0.