This chapter covers a number of APIs that you’ll almost certainly use regularly, though not as much as those discussed in Chapter 4.
Programmers, like end users, normally want to
refer to things by their domain names instead of their IP addresses. The
DNS module provides this lookup facility to you, but it is also used under
the hood whenever you are able to use a domain name—for example, in
HTTP clients.
The dns
module consists of two main methods and a number of convenience
methods. The two main methods are resolve(), which turns a domain name into a DNS
record, and reverse(), which turns an IP address into a domain. All of the other methods
in the dns module are more specialized
forms of these methods.
dns.resolve() takes three arguments: a string containing the domain to resolve, a string naming the type of record requested, and a callback.
The domain can include subdomains, such as www.yahoo.com. The www is technically a hostname, but the system will resolve it for you.
The record type requires a little more understanding of DNS. Most people are familiar with the “address” or A record type. This type of record maps a domain name (as defined for the first argument) to an IPv4 address. The “canonical name,” or CNAME, records allow you to create an alias of an A record or another CNAME. For example, www.example.com might be a CNAME of the A record at example.com. MX records point to the mail server for a domain for the use of SMTP. When you email person@domain.com, the MX record for domain.com tells your email server where to send their mail. Text records, or TXT, are notes attached to a domain. They have been used for all kinds of functions. The final type supported by this library is service, or SRV, records, which provide information on the services available at a particular domain.
The callback receives the response from the DNS server. The prototype will be shown in Example 5-2.
As shown in Example 5-1, calling dns.resolve() is easy, although the callback may
be slightly different from other callbacks you’ve used so far.
We called dns.resolve() with the domain and a record type
of A, along with a trivial callback
that prints results. The first argument of the callback is an error
object. If an error occurs, the object will be non-null, and we can
consult it to see what went wrong. The second argument is a list of the
records returned by the query.
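As a rough sketch of the call just described (not the book's original listing), the following resolves the A records for a domain given on the command line. The helper describeRecords() and the file name resolve.js are made up for illustration; only dns.resolve() and its (err, records) callback come from the dns module.

```javascript
const dns = require('dns');

// Hypothetical helper (not part of the dns module): turns the two callback
// arguments into one printable line.
function describeRecords(domain, err, records) {
  if (err) {
    // err is non-null when the query failed; err.code holds a short reason.
    return domain + ' failed: ' + (err.code || err.message);
  }
  return domain + ' -> ' + records.join(', ');
}

// The domain comes from the command line, so the sketch performs no network
// I/O unless asked to, e.g. `node resolve.js example.com`.
const domain = process.argv[2];
if (domain) {
  dns.resolve(domain, 'A', (err, records) => {
    console.log(describeRecords(domain, err, records));
  });
} else {
  console.log('usage: node resolve.js <domain>');
}
```

Keeping the formatting in a separate function means the error branch can be exercised without a DNS server.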
There are convenience methods for all the
types of records listed earlier. For example, rather than calling resolve('example.com', 'MX',
callback), you can
call resolveMx('example.com',
callback) instead (see Example 5-2). The
API also provides resolve4() and
resolve6() methods, which resolve
IPv4 and IPv6 address records, respectively.
Because resolve() usually returns a list containing many
IP addresses, there is also a convenience method called dns.lookup() that
returns just one IP address from an A record query (see Example 5-3). The method takes a domain, an IP family
(4 or 6), and a callback. However,
unlike dns.resolve(), it always
returns a single address. If you don’t pass an address family, it defaults to the
network interface’s current setting.
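A minimal sketch of dns.lookup(), assuming that localhost is answered from the local hosts file so that no network access is needed. The helper describeLookup() is a hypothetical name used only to format the result.

```javascript
const dns = require('dns');

// Hypothetical helper: format the three callback arguments of dns.lookup().
function describeLookup(err, address, family) {
  if (err) {
    return 'lookup failed: ' + err.code;
  }
  return address + ' (IPv' + family + ')';
}

// The second argument asks for an IPv4 address only. Unlike dns.resolve(),
// the callback receives a single address string, not a list.
dns.lookup('localhost', 4, (err, address, family) => {
  console.log(describeLookup(err, address, family));
});
```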
Cryptography is used in lots of places for a variety of tasks. Node uses the OpenSSL library as the basis of its cryptography. This is because OpenSSL is already a well-tested, hardened implementation of cryptographic algorithms. But you have to compile Node with OpenSSL support in order to use the methods in this section.
The crypto module enables a number of different tasks. First, it powers the SSL/TLS parts of Node. Second, it contains hashing algorithms such as MD5 or SHA-1 that you might want to use in your application. Third, it allows you to use HMAC.[12] Fourth, it provides ciphers to encrypt data. Finally, it contains public key cryptographic functions to sign data and verify signatures.
Each of these functions is contained within a class (or classes), which we’ll look at in the following sections.
Hashes are used for a few important functions, such as obfuscating data in a way
that allows it to be validated or providing a small checksum for a much
larger piece of data. To use hashes in Node, you should create a
Hash object using the factory method
crypto.createHash(). This returns a new Hash
instance using a specified hashing algorithm. Most popular algorithms
are available. The exact ones depend on your version of OpenSSL, but
common ones include MD5, SHA1, SHA256, SHA512, and RIPEMD.
These algorithms all have different advantages and disadvantages. MD5, for example, is used in many applications but has a number of known flaws, including collision issues.[13] Depending on your application, you can pick either a widely deployed algorithm such as MD5 or (preferably) the newer SHA1, or a less universal but more hardened algorithm such as RIPEMD, SHA256, or SHA512.
Once you have a Hash object, you can add data to it by calling hash.update()
with the data to be hashed (Example 5-4). You can keep
updating a Hash with more data until
you want to output it; the data you add to the hash is simply
concatenated to the data passed in previous calls. To output the hash,
call the hash.digest()
method. This will output the digest of the data that was input into the
hash with hash.update(). No
more data can be added after you call hash.digest().
Notice that the output of the digest is a
bit weird. That’s because it’s the binary representation. More commonly,
a digest is printed in hex. We can do that by adding 'hex' as a parameter to hash.digest(), as in
Example 5-5.
Example 5-5. The lifespan of hashes and getting hex output
> var md5 = crypto.createHash('md5');
> md5.update('foo');
{}
> md5.digest();
'¬½\u0018ÛLÂø\\íïeOÌĤØ'
> md5.digest('hex');
Error: Not initialized
at [object Context]:1:5
at Interface.<anonymous> (repl.js:147:22)
at Interface.emit (events.js:42:17)
at Interface._onLine (readline.js:132:10)
at Interface._line (readline.js:387:8)
at Interface._ttyWrite (readline.js:564:14)
at ReadStream.<anonymous> (readline.js:52:12)
at ReadStream.emit (events.js:59:20)
at ReadStream._emitKey (tty_posix.js:280:10)
at ReadStream.onData (tty_posix.js:43:12)
> var md5 = crypto.createHash('md5');
> md5.update('foo');
{}
> md5.digest('hex');
'acbd18db4cc2f85cedef654fccc4a4d8'
>
When we call hash.digest()
again, we get an error. This is because once hash.digest() is
called, the Hash object is finalized
and cannot be reused. We need to create a new instance of Hash and use that instead. This time we get
the hex output that is often more useful. The options for
hash.digest() output are binary (default), hex, and base64.
Because data in hash.update() calls
is concatenated, the code samples in Example 5-6
are equivalent.
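A short sketch of that equivalence: hashing 'foobar' in one update() call and in two calls of 'foo' and 'bar' should give the same digest. The variable names are illustrative only.

```javascript
const crypto = require('crypto');

// One update() call with the whole input...
const whole = crypto.createHash('md5');
whole.update('foobar');
const wholeDigest = whole.digest('hex');

// ...and two update() calls with the same input split in half.
const pieces = crypto.createHash('md5');
pieces.update('foo');
pieces.update('bar');
const piecesDigest = pieces.digest('hex');

console.log(wholeDigest);
console.log(piecesDigest === wholeDigest); // true: the inputs were concatenated
```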
It is also important to know that although
hash.update() looks
a lot like a stream, it isn’t really. You can easily hook a stream to
hash.update(), but
you can’t use stream.pipe().
HMAC combines the hashing algorithms with a cryptographic key in order
to stop a number of attacks on the integrity of the signature. This
means that HMAC uses both a hashing algorithm (such as the ones
discussed in the previous section) and an encryption key. The HMAC API
in Node is virtually identical to the Hash API. The only difference is that the
creation of an hmac object requires a
key as well as a hash algorithm.
crypto.createHmac() returns an instance of Hmac,
which offers update() and
digest() methods that work
identically to the Hash methods we
saw in the previous section.
The key required to create an Hmac object is a PEM-encoded key, passed as a string. As shown in Example 5-7, it is easy to create a key on the command
line using OpenSSL.
This example creates an RSA key in PEM format
and puts it into a file, in this case called key.pem. We also could have called the same
functionality directly from Node using the child_process module (discussed later in this chapter) if we omitted the
-out key.pem option; with this
approach, we would get the results on the stdout stream. Instead we are
going to import the key from the file and use it to create an Hmac object and a
digest (Example 5-8).
This example uses fs.readFileSync()
because a lot of the time, loading keys will be a server setup task. As
such, it’s fine to load the keys synchronously (which might slow down
server startup time) because you aren’t serving clients yet, so blocking
the event loop is OK. In general, other than the use of the encryption
key, using an Hmac is exactly
like using a Hash.
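To see the Hmac API without needing key.pem on disk, here is a small sketch that uses an inline placeholder secret instead. This is an assumption made for the sake of a self-contained example; crypto.createHmac() accepts any string or Buffer as its key. The helper hmacHex() is a made-up name.

```javascript
const crypto = require('crypto');

// Assumption: a made-up secret string stands in for the contents of key.pem.
const key = 'not-a-real-secret';

// Hypothetical helper: create an Hmac, add the data, and return a hex digest.
function hmacHex(secret, data) {
  const hmac = crypto.createHmac('sha256', secret);
  hmac.update(data);
  return hmac.digest('hex');
}

const digest = hmacHex(key, 'foo');
console.log(digest);
// The same key and data give the same digest; a different key does not.
console.log(digest === hmacHex(key, 'foo'));           // true
console.log(digest === hmacHex('another key', 'foo')); // false
```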
The public key cryptography functions are split into four classes: Cipher, Decipher, Sign, and Verify. Like all the other classes in crypto, they have factory methods. Cipher encrypts data, Decipher
decrypts data, Sign creates
a cryptographic signature for data, and Verify validates cryptographic signatures.
For the HMAC operations, we used a private key. For the operations in this section, we are going to use both the public and private keys. Public key cryptography has matched sets of keys. One, the private key, is kept by the owner and is used to decrypt and sign data. The other, the public key, is made available to other parties. The public key can be used to encrypt data that only the private key owner can read, or to verify the signature of data signed with the private key.
Let’s extract the public key of the private key we generated to do the HMAC digests (Example 5-9). Node expects public keys in certificate format, which requires you to input additional “information.” But you can leave all the information blank if you like.
Example 5-9. Extracting a public key certificate from a private key
Enki:~ $ openssl req -key key.pem -new -x509 -out cert.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgets Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, YOUR name) []:
Email Address []:
Enki:~ $ ls cert.pem
cert.pem
Enki:~ $
We simply ask OpenSSL to read in the private key, and then output the public key
into a new file called cert.pem in
X509 certificate format. All of the operations in crypto expect keys in PEM format.
The Cipher class provides a wrapper for encrypting data using a private
key. The factory method to create a cipher takes an algorithm and the
private key. The algorithms supported come from those compiled into
your OpenSSL implementation; the examples in this section use blowfish.
Many modern cryptographic algorithms use
block ciphers. This means that the output is always in
standard-size “blocks.” The block sizes vary between algorithms:
blowfish, for example, uses
8-byte (64-bit) blocks. This is significant when you are using the Cipher API because the API will always
output fixed-size blocks. This helps prevent information from being
leaked to an attacker about the data being encrypted or the specific
key being used to do the encryption.
Like Hash and Hmac, the Cipher API also uses the update() method to input data. However, update()
works differently when used in a cipher. First, cipher.update() returns a block of encrypted
data if it can. This is where block size becomes important. If the
amount of data in the cipher plus the amount of data passed to
cipher.update() is enough to create
one or more blocks, the encrypted data will be returned. If there
isn’t enough to form a block, the input will be stored in the cipher.
Cipher also has a new method,
cipher.final(), which replaces the digest()
method. When cipher.final() is
called, any remaining data in the cipher will be returned encrypted,
but with enough padding to make sure the block size is reached (see
Example 5-10).
Example 5-10. Ciphers and block size
> var crypto = require('crypto');
> var fs = require('fs');
>
> var pem = fs.readFileSync('key.pem');
> var key = pem.toString('ascii');
>
> var cipher = crypto.createCipher('blowfish', key);
>
> cipher.update(new Buffer(4), 'binary', 'hex');
''
> cipher.update(new Buffer(4), 'binary', 'hex');
'ff57e5f742689c85'
> cipher.update(new Buffer(4), 'binary', 'hex');
''
> cipher.final('hex')
'96576b47fe130547'
>
To make the example easier to read, we
specified the input and output formats. The input and output formats
are both optional and will be assumed to be binary unless specified.
For this example, we specified a binary input format because we’re
passing a new Buffer (containing whatever random
junk was in memory), along with hex output to produce something easier
to read. You can see that the first time we call cipher.update(), with 4 bytes of data, we
get back an empty string. The second time, because we have enough data
to generate a block, we get the encrypted data back as hex. When we
call cipher.final(), there isn’t
enough data to create a full block, so the output is padded and a full
(and final) block is returned. If we sent more data than would fit in
a single block, cipher.final()
would output as many blocks as it could before padding. Because
cipher.final() is just for
outputting existing data, it doesn’t accept an input format.
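The same buffering behavior can be sketched on newer Node releases, where crypto.createCipher() and the Buffer constructor are deprecated and blowfish may not be available. The sketch below is an adaptation under those assumptions, not the listing above: it uses crypto.createCipheriv() with AES-128 in CBC mode (16-byte blocks), a random key and IV, and zero-filled input from Buffer.alloc().

```javascript
const crypto = require('crypto');

// Assumptions: AES-128 in CBC mode (16-byte blocks) with a random key and IV.
const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(16);
const cipher = crypto.createCipheriv('aes-128-cbc', key, iv);

// Each update() gets 8 bytes; a block is emitted only once 16 have arrived.
// The input encoding is ignored for Buffers, so it is passed as undefined.
const first = cipher.update(Buffer.alloc(8), undefined, 'hex');
const second = cipher.update(Buffer.alloc(8), undefined, 'hex');
const third = cipher.update(Buffer.alloc(8), undefined, 'hex');
// final() pads the 8 buffered bytes up to one last full block.
const last = cipher.final('hex');

// Bytes returned by each call.
console.log([first, second, third, last].map((s) => s.length / 2)); // [ 0, 16, 0, 16 ]
```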
The Decipher class is almost the exact inverse of the Cipher class. You can pass encrypted data to
a Decipher object using decipher.update(), and it will stream the data into blocks until it can
output the unencrypted data. You might think that since cipher.update() and cipher.final() always give fixed-length
blocks, you would have to give perfect blocks to Decipher, but luckily it will buffer the
data. Thus, you can pass it data you got off some other I/O transport,
such as the disk or network, even though this might give you block
sizes different from those used by the encryption algorithm.
Let’s take a look at Example 5-11, which demonstrates encrypting data and then decrypting it.
Example 5-11. Encrypting and decrypting text
> var crypto = require('crypto');
> var fs = require('fs');
>
> var pem = fs.readFileSync('key.pem');
> var key = pem.toString('ascii');
>
> var plaintext = new Buffer('abcdefghijklmnopqrstuv');
> var encrypted = "";
> var cipher = crypto.createCipher('blowfish', key);
> ..
> encrypted += cipher.update(plaintext, 'binary', 'hex');
> encrypted += cipher.final('hex');
>
> var decrypted = "";
> var decipher = crypto.createDecipher('blowfish', key);
> decrypted += decipher.update(encrypted, 'hex', 'binary');
> decrypted += decipher.final('binary');
>
> var output = new Buffer(decrypted);
>
> output
<Buffer 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76>
> plaintext
<Buffer 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76>
>
It is important to make sure both the
input and output formats match up for both the plain text and the
encrypted data. It’s also worth noting that in order to get a
Buffer, you’ll have to make one from the strings
returned by Cipher and Decipher.
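To illustrate that Decipher buffers whatever chunk sizes it is given, here is a hedged round-trip sketch. As an assumption for current Node releases, it uses crypto.createCipheriv() and crypto.createDecipheriv() with AES-128-CBC and a random key and IV rather than the blowfish calls shown above, and it feeds the ciphertext back in two uneven pieces.

```javascript
const crypto = require('crypto');

// Assumptions: AES-128-CBC with a throwaway random key and IV.
const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(16);
const plaintext = Buffer.from('abcdefghijklmnopqrstuv');

const cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
let encrypted = cipher.update(plaintext, undefined, 'hex');
encrypted += cipher.final('hex');

// Feed the ciphertext back in two uneven pieces (5 bytes, then the rest).
// Neither piece is a whole number of blocks, so Decipher has to buffer.
const decipher = crypto.createDecipheriv('aes-128-cbc', key, iv);
const decrypted = Buffer.concat([
  decipher.update(encrypted.slice(0, 10), 'hex'),
  decipher.update(encrypted.slice(10), 'hex'),
  decipher.final(),
]);

console.log(decrypted.equals(plaintext)); // true
```

Leaving out the output encoding makes update() and final() return Buffers, which avoids converting strings back into a Buffer afterward.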
Signatures verify that some data has been authenticated by the signer
using the private key. However, unlike with HMAC, the public key can
be used to authenticate the signature. The API for Sign is nearly identical to that for HMAC
(see Example 5-12). crypto.createSign() is used to make a sign
object. createSign() takes only the signing
algorithm. sign.update() allows
you to add data to the sign object.
When you want to create the signature, call sign.sign() with a private key to sign the data.
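A self-contained sketch of signing, assuming a throwaway RSA key pair generated in-process with crypto.generateKeyPairSync() instead of the key.pem and cert.pem files. To show that the signature is usable, it is also checked with the public half of the pair. The helper isValid() is a hypothetical name.

```javascript
const crypto = require('crypto');

// Assumption: a throwaway 2048-bit RSA key pair generated in-process instead
// of reading key.pem and cert.pem from disk.
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

const data = 'abcdef';

// Add data with update(), then sign it with the private key.
const sign = crypto.createSign('RSA-SHA256');
sign.update(data);
const signature = sign.sign(privateKey, 'hex');

// Hypothetical helper: check a message against the signature made above.
function isValid(message) {
  const verify = crypto.createVerify('RSA-SHA256');
  verify.update(message);
  return verify.verify(publicKey, signature, 'hex');
}

console.log(isValid(data));     // true
console.log(isValid('abcdeg')); // false: the data no longer matches
```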
The Verify API works the same way (see Example 5-13). Use verify.update() to add data, and when you have added all the data to be
verified against the signature, call verify.verify() to validate the signature. It takes the cert (the public key), the signature, and
the format of the signature.
Example 5-13. Verifying signatures
> var crypto = require('crypto');
> var fs = require('fs');
>
> var privatePem = fs.readFileSync('key.pem');
> var publicPem = fs.readFileSync('cert.pem');
> var key = privatePem.toString();
> var pubkey = publicPem.toString();
>
> var data = "abcdef"
>
> var sign = crypto.createSign('RSA-SHA256');
> sign.update(data);
{}
> var sig = sign.sign(key, 'hex');
>
> var verify = crypto.createVerify('RSA-SHA256');
> verify.update(data);
{}
> verify.verify(pubkey, sig, 'hex');
1
Although Node abstracts a lot of things from the operating system, you are still running in an operating system and may want to interact more directly with it. Node allows you to interact with system processes that already exist, as well as create new child processes to do work of various kinds. Although Node itself is generally a “fat” thread with a single event loop, you are free to start other processes (threads) to do work outside of the event loop.
The process module enables you to get information about and change the
settings of the current Node process. Unlike most modules, the process module is global and is always
available as the variable process.
process
is an instance of EventEmitter,
so it provides events based on system calls to the Node
process. The exit event provides a final hook before the Node process exits (see
Example 5-14). Importantly, the event loop will
not run after the exit event, so
only code without callbacks will be executed.
Because the loop isn’t going to run again,
the setTimeout() code will never be
evaluated.
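A minimal sketch of an exit listener along these lines. The messages are made up; the point is that the timer registered inside the listener never fires because the event loop has already stopped.

```javascript
process.on('exit', () => {
  // The event loop has stopped, so this timer is registered but never fires.
  setTimeout(() => {
    console.log('This will not run');
  }, 0);
  // Synchronous code still runs.
  console.log('About to exit.');
});

console.log('Main script finished.');
```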
An extremely useful event provided by process
is uncaughtException (Example 5-15). After you’ve spent any time with Node,
you’ll find that exceptions that hit the main event loop will kill
your Node process. In many use cases, especially servers that are
expected to never be down, this is unacceptable. The uncaughtException event provides an
extremely brute-force way of catching these exceptions. It’s really a
last line of defense, but it’s extremely useful for that
purpose.
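The following is a sketch of such a handler, not the book's original listing. One assumption to note: the call to the nonexistent function is deferred with setImmediate() so that the rest of the file finishes loading first, whereas the discussion that follows describes calling it inline.

```javascript
process.on('uncaughtException', (err) => {
  console.log('Caught exception: ' + err.message);
});

// Scheduled before the failure, so it still fires afterward.
setTimeout(() => {
  console.log('Timer still ran.');
}, 50);

// Assumption: the bad call is deferred so this file can finish loading.
setImmediate(() => {
  nonexistentFunc(); // throws a ReferenceError; nothing below this line runs
  console.log('Never printed.');
});
```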
Let’s break
down what’s happening. First, we create an event listener for uncaughtException. This is not a smart handler;
it simply outputs the exception to stdout. If this Node script were
running as a server, stdout could easily be used to save the log into
a file and capture these errors. However, because it captures the
event for a nonexistent function, Node will not exit, but the standard
flow is still disrupted. We know that all the JavaScript runs once,
and then any callbacks will be run each time their event listener
emits an event. In this scenario, because nonexistentFunc() will throw an exception,
no code following it will be called. However, any code that has
already been run will continue to run. This means that the setTimeout() callback will still fire. This is
significant when you’re writing servers. Let’s consider some more code
in this area, shown in Example 5-16.
This code creates a simple HTTP server and then listens for any uncaught exceptions at the process level. In our HTTP server, the callback deliberately calls a bad function after we’ve sent the HTTP response. Example 5-17 shows the console output for this script.
Example 5-17. Output of Example 5-16
Enki:~ $ node ex-test.js
{ stack: [Getter/Setter],
arguments: [ 'badLoggingCall' ],
type: 'not_defined',
message: [Getter/Setter] }
{ stack: [Getter/Setter],
arguments: [ 'badLoggingCall' ],
type: 'not_defined',
message: [Getter/Setter] }
{ stack: [Getter/Setter],
arguments: [ 'badLoggingCall' ],
type: 'not_defined',
message: [Getter/Setter] }
{ stack: [Getter/Setter],
arguments: [ 'badLoggingCall' ],
type: 'not_defined',
message: [Getter/Setter] }
When we start the example script, the
server is available, and we have made a number of HTTP requests to it.
Notice that the server doesn’t shut down at any point. Instead, the
errors are logged using the function attached to the uncaughtException
event. However, we are still serving complete HTTP requests. Why? Node
stopped only the callback that threw, so that callback never went on to call console.log(). The error affected only
the request in which it occurred and the server kept running, so any other code was
unaffected by the exception encapsulated in one specific code
path.
It’s important to understand the way that listeners are implemented in Node. Let’s take a look at Example 5-18.
Example 5-18. The abbreviated listener code for EventEmitter
EventEmitter.prototype.emit = function(type) {
...
var handler = this._events[type];
...
} else if (isArray(handler)) {
var args = Array.prototype.slice.call(arguments, 1);
var listeners = handler.slice();
for (var i = 0, l = listeners.length; i < l; i++) {
listeners[i].apply(this, args);
}
return true;
...
};
After an event is emitted, one of the
checks in the runtime handler is to see whether there is an array of
listeners. If there is more than one listener, the runtime calls the
listeners by looping through the array in order. This means that the
first attached listener will be called first with apply(), then the second, and so on. What’s
important to note here is that all listeners on
the same event are part of the same code path. So an uncaught
exception in one callback will stop execution for all other callbacks
on the same event. However, an uncaught exception in one instance of
an event won’t affect other events.
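This behavior can be sketched with a plain EventEmitter. The event names ping and pong are made up, and the exception is caught with try/catch around emit() rather than being left to reach uncaughtException, purely to keep the sketch synchronous.

```javascript
const EventEmitter = require('events');

const emitter = new EventEmitter();
const calls = [];

// Two listeners on the same event: the first one throws.
emitter.on('ping', () => {
  calls.push('first ping listener');
  throw new Error('boom');
});
emitter.on('ping', () => calls.push('second ping listener'));

// A listener on a different event.
emitter.on('pong', () => calls.push('pong listener'));

try {
  emitter.emit('ping');
} catch (err) {
  calls.push('caught ' + err.message);
}
emitter.emit('pong');

// The second ping listener never ran, but the pong listener did.
console.log(calls);
```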
We also get access to a number of system
events through process. When the
process gets a signal, it is exposed to Node via events emitted by
process. An operating system can
generate a lot of POSIX system events, which can be found in the
sigaction(2) manpage. Really common ones
include SIGINT, the interrupt signal. Typically, a SIGINT is what happens when you press
Ctrl-C in the terminal on a running process. Unless you handle the
signal events via process, Node
will just perform the default action; in the case of a SIGINT, the
default is to immediately kill the process. You can change default
behavior (except for a couple of signals that can never get caught)
through the process.on() method
(Example 5-19).
To make sure Node doesn’t exit on its own, we read from stdin (described in Operating system input/output) so the Node process continues to run. If you Ctrl-C the program while it’s running, the operating system (OS) will send a SIGINT to Node, which will be caught by the SIGINT event handler. Here, instead of exiting, we log to the console.
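A variation on this idea that can run unattended, offered as a sketch under two assumptions: the process sends SIGINT to itself with process.kill() instead of waiting for Ctrl-C, and a short timer (rather than reading stdin) keeps the event loop alive long enough for the signal to arrive. It relies on POSIX signal delivery.

```javascript
let interrupts = 0;

process.on('SIGINT', () => {
  interrupts += 1;
  console.log('Got SIGINT; ignoring it.');
});

// Assumption: rather than waiting for Ctrl-C, the process signals itself.
process.kill(process.pid, 'SIGINT');

// Keep the event loop alive briefly so the signal can be delivered.
setTimeout(() => {
  console.log('Still running after ' + interrupts + ' SIGINT(s).');
}, 200);
```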
Process contains a lot
of meta-information about the Node process. This can be very helpful
when you need to manage your Node environment from within the process.
There are a number of properties that contain immutable (read-only)
information about Node, such as:
process.version
Contains the version number of the instance of Node you are running.
process.installPrefix
Contains the install path (/usr/local, ~/local, etc.) used during installation.
process.platform
Lists the platform on which Node is currently running. The output will specify the kernel (linux2, darwin, etc.) rather than “Redhat ES3,” “Windows 7,” “OSX 10.7,” etc.
process.uptime()
Returns the number of seconds the process has been running.
There are also a number of things that you
can get and set about the Node process. When the process runs, it does
so with a particular user and group. You can get these and set them with process.getgid(), process.setgid(), process.getuid(), and process.setuid(). These can be very useful for making sure
that Node is running in a secure way. It’s worth noting that the set
methods take either the numerical ID of the group or user, or the
group name or username itself. However, if you pass a name, the
methods do a blocking lookup to turn the group/username into an ID,
which takes a little time.
The process
ID, or PID, of the running Node instance is also available as
the process.pid property. You can set the title that
Node displays to the system using the process.title property. Whatever is set
in this property will be displayed in the ps command. This
can be extremely useful when you are running multiple Node processes
in a production environment. Instead of having a lot of processes
called node, or possibly node app.js, you can set names intelligently
for easy reference. When one process is hogging CPU or RAM, it’s great
to have a quick idea of which one is doing so.
Other available information includes process.execPath, which shows the execution
path of the current Node binary (e.g., /usr/local/bin/node). The current working
directory (to which all files opened will be relative) is accessible
with process.cwd(). The working directory is the directory you were in when Node
was started. You can change it using process.chdir() (this will throw an exception if the directory is unreadable
or doesn’t exist). You can also get the memory usage of the current
Node process using process.memoryUsage(). This returns an object specifying the size of the memory
usage in a couple of ways: rss
shows how much RAM is being used, and vsize shows the total memory used, including
both RAM and swap. You’ll also get some V8 stats: heapTotal and heapUsed show how much memory V8 has
allocated and how much it is actively using.
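A quick sketch that prints several of these values. It sticks to properties that are assumed to exist on current Node releases as well, which is why process.installPrefix and the vsize field are left out. The title 'my-node-worker' is a made-up name.

```javascript
const info = {
  version: process.version,
  platform: process.platform,
  pid: process.pid,
  execPath: process.execPath,
  cwd: process.cwd(),
  uptimeSeconds: process.uptime(),
};
console.log(info);

// Memory figures are reported in bytes.
const memory = process.memoryUsage();
console.log(
  'rss: ' + memory.rss +
  ', heapUsed: ' + memory.heapUsed +
  ' of ' + memory.heapTotal
);

// Give the process a recognizable name for tools such as ps.
process.title = 'my-node-worker'; // hypothetical name
```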
There are a number of places where you can
interact with the OS (besides making changes to the Node process in
which the program is running) from process. One of the main ones is having
access to the standard OS I/O streams. stdin is the default input stream to the process, stdout is the
process’s output stream, and stderr is its error stream. These are exposed with process.stdin,
process.stdout, and process.stderr, respectively. process.stdin is a readable stream, whereas
process.stdout and process.stderr are writable streams.
stdin is a really useful device for interprocess communication. It’s
used to facilitate things such as piping in the shell. When we type
cat file.txt | node program.js,
it will be the stdin stream that receives the data from the cat command.
Because process is always available, the process.stdin stream is always initialized
in any Node process. But it starts out in a paused state, where Node
can write to it but you can’t read from it. Before attempting to
read from stdin, call its resume() method (see Example 5-20). Until then, Node will just fill the
read buffer for the stream and then stop until you are ready to deal
with it. This approach avoids data loss.
We ask process.stdin to resume(), set the encoding to UTF-8, and
then set a listener to push any data sent to process.stdout. When the process.stdin sends the end event, we pass that on to the process.stdout stream. We could also
easily do this with the stream pipe()
method, as in Example 5-21, because stdin and
stdout are both real streams.
stderr is used to output exceptions and problems with program execution.
In POSIX systems, because it is a separate stream, output logs and
error logs can be easily redirected to different destinations. This
can be very desirable, but in Node it comes with a couple of
caveats. When you write to stderr, Node guarantees that the write
will happen. However, unlike a regular stream, this is done as a
blocking call. Typically, calls to stream.write()
return a Boolean value indicating whether Node was able to write to
the kernel buffer. With process.stderr this will always be true,
but it might take a while to return, unlike the regular write(). Typically, it will be very fast,
but the kernel buffer may sometimes be full and hold up your
program. This means that it is generally inadvisable to write a lot
to stderr in a production system, because it may block real
work.
One final thing to note is that process.stderr is always a UTF-8 stream.
Any data you write to process.stderr will be interpreted as
UTF-8 without you having to set an encoding. Moreover, you are not
able to change the encoding here.
Another place where Node programmers
often touch the operating system is to retrieve the arguments passed
when their program is started. argv is an array containing the
command-line arguments, starting with the node command itself (see Examples 5-22 and 5-23).
Example 5-23. Running Example 5-22
There are a few things to notice here.
First, the process.argv array
is simply a split of the command line based on
whitespace. If there are many characters of whitespace between two
arguments, they count as only a single split. The check for
whitespace is written as \s+ in a
regular expression (regex). This doesn’t count for whitespace in
quotes, however. Quotes can be used to keep tokens together. Also,
notice how the first file argument is expanded. This means you can
pass a relative file argument on the command line, and it will appear as
its absolute pathname in argv.
This is also true for special characters, such as using ~ to refer to the home directory. Only the
first argument is expanded this way.
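A small sketch for inspecting argv yourself. Try running it with a few arguments, some of them quoted; the variable name userArgs is illustrative.

```javascript
// Print every entry of argv with its position.
process.argv.forEach((value, index) => {
  console.log(index + ': ' + value);
});

// Everything after the node binary and the script path is user input.
const userArgs = process.argv.slice(2);
console.log('user arguments: ' + JSON.stringify(userArgs));
```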
argv
is extremely helpful for writing command-line scripts, but it’s
pretty raw. There are a number of community projects that extend its
support to help you easily write command-line applications,
including support for automatically enabling features, writing
inline help systems, and other more advanced features.
If you’ve done work with JavaScript in browsers, you’ll be familiar
with setTimeout(). In Node, we have
a much more direct way to access the event loop and defer work that is
extremely useful. process.nextTick() creates a callback to be executed on the next “tick,” or
iteration of the event loop. While it is implemented as a queue, it
will supersede other events. Let’s explore that a little bit in Example 5-24.
This example creates an HTTP server. The
request event listener on the server creates a
callback using process.nextTick().
No matter how many requests we make to the HTTP server, the “tick”
will always occur on the next pass of the event loop. Unlike other
callbacks, nextTick() callbacks are
not a single event and thus are not subject to the usual callback
exception brittleness, as shown in Examples 5-25 and 5-26.
Example 5-26. Results of Example 5-25
Despite the deliberate error, unlike other
event callbacks on a single event, each of the ticks is isolated.
Let’s walk through the code. First, we set an exception handler to
catch any exceptions. Next, we set a number of callbacks on process.nextTick(). Each of these callbacks
outputs to the console; however, the second has a deliberate error.
Finally, we log a message to the console. When Node runs the program,
it evaluates all the code, which includes outputting 'End of
1st loop'. Then it calls the callbacks on nextTick() in order. First
'tick' is outputted, and then we throw an error.
This is because we hit our deliberate mistake on the next tick. The
error causes process to emit() an uncaughtException event, which runs our function to output the error to the
console. Because we threw an error, 'tock' was not
outputted to the console. However, 'tick tock'
still is. This is because every time nextTick() is called, each callback is
created in isolation. You could consider the execution of events to be
emit(), which is called inline in
the current pass of event loop; nextTick(), which is called at the beginning
of the event loop in preference to other events; and finally, other
events in order at the beginning of the event loop.
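The ordering can be sketched by recording when each piece of code runs. This is a simplified illustration with made-up labels: inline code runs first, then the nextTick() callback, then the timer.

```javascript
const order = [];

// A timer with no delay still waits for the event loop.
setTimeout(() => {
  order.push('timeout');
  console.log(order.join(' -> ')); // main -> tick -> timeout
}, 0);

// Queued after the timer was set, but run before it.
process.nextTick(() => {
  order.push('tick');
});

// Inline code runs before either callback.
order.push('main');
```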
The child_process module allows you to create child processes of your main Node
process. Because Node has only one event loop in a single process,
sometimes it is helpful to create child processes. For example, you
might do this to make use of more cores of your CPU, because a single
Node process can use only one of the cores. Or, you could use child_process to launch other programs and let
Node interact with them. This is extremely helpful when you’re writing
command-line scripts.
There are two main methods in child_process. spawn() creates a child process with its own stdin, stdout, and stderr
file descriptors. exec() creates
a child process and passes the result to a callback when
the process is complete. This is an extremely versatile way to create
child processes, a way that is still nonblocking but doesn’t require you
to write extra code in order to steam forward.
All child processes have some common
properties. They each contain properties for stdin, stdout, and stderr,
which we discussed in Operating system input/output. There is
also a pid property that contains the OS process ID of the child. Children
emit the exit event when they exit.
Other data events are available via the stream
methods of child_process.stdin,
child_process.stdout, and child_process.stderr.
Let’s start with exec() as the most straightforward use case.
Using exec(), you can create a
process that will run some program (possibly another Node program) and
then return the results for you in a callback (Example 5-27).
When you call exec(), you can pass a shell command for the
new process to run. Note that the entire command is a string. If you
need to pass arguments to the shell command, they should be
constructed into the string. In the example, we passed ls the -l
argument to get the long form of the output. You can also include
complicated shell features, such as | to pipe commands. Node will return the
results of the final command in the pipeline.
The callback function receives three
arguments: an error object, the result of stdout, and the result of
stderr. Notice that just calling ls
will run it in the current working directory of Node, which you can
retrieve by running process.cwd().
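As a rough illustration of the points above, the following sketch passes a complete shell command, pipe included, to exec() and reads the three callback arguments. It assumes a Unix-like system with ls and wc available; runShell is a hypothetical helper name, not part of Node.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: run a shell command and hand back its trimmed stdout.
function runShell(command, done) {
  exec(command, (error, stdout, stderr) => {
    if (error) return done(error);
    done(null, stdout.trim(), stderr);
  });
}

// The whole command, including arguments and the pipe, is one string.
runShell('ls -l | wc -l', (error, lineCount) => {
  if (error) return console.error('Command failed:', error.message);
  console.log('Lines of ls -l output in ' + process.cwd() + ': ' + lineCount);
});
```

Only the output of the last command in the pipeline (wc) reaches the callback.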
It’s important to understand the
difference between the first and third arguments. The error object
returned will be null unless an
error status code is returned from the child process or there was
another exception. When the child process exits, it passes a status up
to the parent process. In Unix, for example, this is 0 for success and
an 8-bit number greater than 0 for an error. The error object is also
used when the command called doesn’t meet the constraints that Node
places on it. When an error code is returned from the child process,
the error object will contain the error code and stderr. However, when
a process is successful, there may still be data on stderr.
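The difference between the error object and stderr is easier to see side by side. The sketch below assumes a Unix-like shell; the two commands are chosen purely for illustration.

```javascript
const { exec } = require('child_process');

// Writes to stderr but exits with status 0, so error is null.
exec('echo "just a warning" 1>&2', (error, stdout, stderr) => {
  console.log('error is null:', error === null, '| stderr:', JSON.stringify(stderr));
});

// Exits with status 3, so error is set and error.code holds the status.
exec('exit 3', (error) => {
  console.log('exit status reported as error.code:', error.code);
});
```

The two callbacks may fire in either order, because the child processes run independently.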
exec()
takes an optional second argument with an options
object. By default, this object contains the properties shown in Example 5-28.
encoding: The character encoding used for the stdout and stderr output.
timeout: The number of milliseconds the process can run before Node kills it.
killSignal: The signal to use to terminate the process if it exceeds the timeout or the buffer size limit.
maxBuffer: The maximum amount of data, in bytes, that stdout or stderr each may grow to.
setsid: Whether to create a new session inside Node for the process.
cwd: The initial working directory for the process (where null uses Node’s current working directory).
env: The process’s environment variables. By default, the child inherits the environment variables of the parent.
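A minimal sketch of passing an options object follows, assuming a current Node release on a Unix-like system. The GREETING variable is hypothetical. The sketch copies process.env explicitly because, in current releases, an env object you supply replaces the default environment rather than extending it.

```javascript
const { exec } = require('child_process');
const os = require('os');

const options = {
  cwd: os.tmpdir(),                             // run in the temp directory
  env: { ...process.env, GREETING: 'hello' },   // parent's variables plus one more
  timeout: 5000,                                // give up after five seconds
};

exec('echo "$GREETING from $(pwd)"', options, (error, stdout) => {
  if (error) return console.error(error.message);
  console.log(stdout.trim());
});
```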
Let’s set some of the options to put
constraints on a process. First, let’s try restricting the
Buffer size of the response, as demonstrated in
Example 5-29.
In this example, you can see that when we
set a tiny maxBuffer (just 1
kilobyte), running ls quickly
exhausted the available space and threw an error. It’s important to
check for errors so that you can deal with them in a sensible way. You
don’t want to cause an actual exception by trying to access resources
that are unavailable because you’ve restricted the child_process. If the child_process returns with an error, its
stdin and stdout properties will be unavailable and attempts to access them will
throw an exception.
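The error check described above might look like the following sketch. It assumes a current Node release, where maxBuffer is counted in bytes; the limit of 8 and the echoed text are arbitrary.

```javascript
const { exec } = require('child_process');

// Allow only 8 bytes of output; the command below produces more than that.
exec('echo "this output is far too long"', { maxBuffer: 8 }, (error, stdout) => {
  if (error) {
    // Handle the failure instead of assuming the output is complete.
    console.error('Command stopped early:', error.message);
    return;
  }
  console.log(stdout);
});
```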
It’s also possible to stop a Child after a set amount of time, as shown
in Example 5-30.
This example defines a deliberately
long-running process (counting from 1 to 100,000 in a shell script),
but we also set a short timeout.
Notice that we also specified a killSignal. By default, the kill signal
is SIGTERM, but we used SIGKILL to show the feature.[14] When we get the error back, notice there is a killed property that tells us that Node
killed the process and that it didn’t exit voluntarily. This is also
true for the previous example. Because it didn’t exit on its own,
there isn’t a code property or some
of the other properties of a system error.
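A comparable sketch for the timeout case is shown here. It assumes a Unix-like system with the sleep command and uses sleep in place of the counting script described above.

```javascript
const { exec } = require('child_process');

const options = { timeout: 200, killSignal: 'SIGKILL' };

// sleep would run for two seconds, but Node kills it after 200 ms.
exec('sleep 2', options, (error) => {
  if (error && error.killed) {
    console.log('Killed by Node with ' + error.signal);
  }
});
```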
spawn()
is very similar to exec().
However, it is a more general-purpose method that
requires you to deal with streams and their callbacks yourself. This
makes it a lot more powerful and flexible, but it also means that more
code is required to do the kind of one-shot system calls we
accomplished with exec(). This
means that spawn() is most often
used in server contexts to create subcomponents of a server and is the
most common way people make Node work with multiple cores on a single
machine.
Although it performs the same function as
exec(), the API for spawn() is slightly different (see Examples 5-31 and 5-32). The first argument is still the
command to start the process with, but unlike exec(), it is not a command string; it’s
just the executable. The process’s arguments are passed in an array as
the (optional) second argument to spawn(). It’s like an inverse of process.argv: instead of the command being split() across spaces, you provide an array
to be join()ed with spaces.
Finally, spawn() also takes an options object as the
final argument. Some of these options are the same as for exec(), but we’ll cover that in more detail
shortly.
In this example, we’re using the Unix
program cat, which simply echoes
back whatever input it gets. You can see that, unlike exec(), we don’t issue a callback to
spawn() directly. That’s because we
are expecting to use the Streams
provided by the Child class to get
and send data. We named the variable with the instance of Child “cat,” and so we can access cat.stdout to set events on the stdout
stream of the child process. We set a listener on cat.stdout to watch for any data events, and
we set a listener on the child
itself in order to watch for the exit event. We can send our new child data using stdin by accessing its
child.stdin stream. This is just a regular writable stream. However,
as a behavior of the cat program,
when we close stdin, the process exits. This might not be true for all
processes, but it is true for cat,
which exists only to echo back data.
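The stream handling described above can be sketched as follows, assuming a Unix-like system with cat installed. echoThroughCat is a hypothetical helper, and the sketch listens for the close event in addition to exit so that all buffered output has arrived before the result is used.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: send text through cat and collect what comes back.
function echoThroughCat(text, done) {
  const cat = spawn('cat');              // executable only; no callback here
  let echoed = '';

  cat.stdout.on('data', (chunk) => { echoed += chunk; });
  cat.on('exit', (code) => console.log('cat exited with code ' + code));
  // 'close' fires once the process has ended and its streams are finished.
  cat.on('close', () => done(echoed));

  cat.stdin.write(text);
  cat.stdin.end();                       // closing stdin makes cat exit
}

echoThroughCat('Cat says hello\n', (echoed) => process.stdout.write(echoed));
```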
The options that can be passed to spawn() aren’t exactly the same as exec(). This is because you are expected to
manage more things by hand with spawn(). The env, setsid, and cwd properties are all options for spawn(), as are uid and gid, which set the user ID and the group ID,
respectively. Like process, setting
the uid or the gid to a username or a group name will block
briefly while the user or group is looked up. There is one more option
for spawn() that doesn’t exist for
exec(): you can set custom file
descriptors that will be given to the new child process. Let’s take
some time to cover this topic because it’s a little complex.
A file descriptor in Unix is a way of keeping track of which programs are doing what with which files. Because Unix lets many programs run at the same time, there needs to be a way to make sure that when they interact with the filesystem they don’t accidentally overwrite someone else’s changes. The file descriptor table keeps track of all the files that a process wants to access. The kernel might lock a particular file to stop two programs from writing to the file at the same time, as well as other management functions. A process will look at its file descriptor table to find the file descriptor representing a particular file and pass that to the kernel to access the file. The file descriptor is simply an integer.
The important thing is that the name
“file descriptor” is a little deceptive because it doesn’t represent
only pure files; network and other sockets are also allocated file
descriptors. Unix has interprocess communications (IPC) sockets that
let processes talk to each other. We’ve been calling them stdin,
stdout, and stderr. This is interesting because spawn() lets us specify file descriptors
when starting a new child process. This means that instead of the OS
assigning a new file descriptor, we can ask child processes to share
an existing file descriptor with the parent process. That file
descriptor might be a network socket to the Internet or just the
parent’s stdin, but the point is that we have a powerful way of
delegating work to child processes.
How does this work in practice? When
passing the options object to spawn(), we can specify customFds to pass our own three file
descriptors to the child instead of
them creating a stdin, stdout, and stderr file descriptor (Examples 5-33 and
5-34).
Example 5-34. Running the previous example and piping in data to stdin
Enki:~ $ echo "foo"
foo
Enki:~ $ echo "foo" | node
readline.js:80
tty.setRawMode(true);
^
Error: ENOTTY, Inappropriate ioctl for device
at new Interface (readline.js:80:9)
at Object.createInterface (readline.js:38:10)
at new REPLServer (repl.js:102:16)
at Object.start (repl.js:218:10)
at Function.runRepl (node.js:365:26)
at startup (node.js:61:13)
at node.js:443:3
Enki:~ $ echo "foo" | cat
foo
Enki:~ $ echo "foo" | node fds.js
foo
Enki:~ $
The file descriptors 0, 1, and
2 represent stdin, stdout, and
stderr, respectively. In this example, we create a child and pass it stdin, stdout, and stderr
from the parent Node process. We can test this wiring using the
command line. The echo command
outputs a string “foo.” If we pass that directly to node with a pipe (stdout to stdin), we get
an error. We can, however, pass it to the cat command, which echoes it back. Also, if
we pipe to the Node process running our script, it echoes back. This
is because we’ve hooked up the stdin, stdout, and stderr of the Node
process directly to the cat command
in our child process. When the main Node process gets data on stdin,
it gets passed to the cat child
process, which echoes it back on the shared stdout. One thing to note
is that once you wire up the Node process this way, the child process
loses its child.stdin, child.stdout, and child.stderr file descriptor references.
This is because once you pass the file descriptors to the process,
they are duplicated and the kernel handles the data passing.
Consequently, Node isn’t in between the process and the file
descriptors (FDs), so you cannot add events to those streams (see
Examples 5-35 and 5-36).
Example 5-36. Results of the test
Enki:~ $ echo "foo" | node fds.js
node.js:134
throw e; // process.nextTick error, or 'error' event on first tick
foo
^
TypeError: Cannot call method 'on' of null
at Object.<anonymous> (/Users/croucher/fds.js:3:14)
at Module._compile (module.js:404:26)
at Object..js (module.js:410:10)
at Module.load (module.js:336:31)
at Function._load (module.js:297:12)
at Array.<anonymous> (module.js:423:10)
at EventEmitter._tickCallback (node.js:126:26)
Enki:~ $
When custom file descriptors are
specified, the streams are literally set to null and are completely inaccessible from
the parent. It is still preferable in many cases, though, because
routing through the kernel is much faster than using something like
stream.pipe() with Node to connect
the streams together. However, stdin, stdout, and stderr aren’t the
only file descriptors worth connecting to child processes. A very
common use case is connecting network sockets to a number of children,
which allows for multicore utilization.
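The customFds option belongs to the Node releases this chapter describes; it was later deprecated and removed in favor of the stdio option. The following is a minimal sketch of the same idea with stdio, assuming a current Node release on a Unix-like system. It uses echo rather than cat so that the child does not wait on the parent’s stdin.

```javascript
const { spawn } = require('child_process');

// Share the parent's stdin, stdout, and stderr (file descriptors 0, 1, and 2).
const child = spawn('echo', ['foo'], { stdio: 'inherit' });

// The kernel now connects the child to the shared descriptors directly,
// so Node exposes no stream objects for them.
console.log(child.stdin, child.stdout, child.stderr);

child.on('exit', (code) => console.log('echo exited with code ' + code));
```

The console.log() call prints null null null, which matches the behavior described above: there is nothing to attach listeners to.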
Say we are creating a website, a game server, or anything that has to deal with a bunch of traffic. We have this great server that has a bunch of processors, each of which has two or four cores. If we simply started a Node process running our code, we’d have just one core being used. Although CPU isn’t always the critical factor for Node, we want to make sure we get as close to the CPU bound as we can. We could start a bunch of Node processes with different ports and load-balance them with Nginx or Apache Traffic Server. However, that’s inelegant and requires us to use more software. We could create a Node process that creates a bunch of child processes and routes all the requests to them. This is a bit closer to our optimal solution, but with this approach we just created a single point of failure because only one Node process routes all the traffic. This isn’t ideal. This is where passing custom FDs comes into its own. In the same way that we can pass the stdin, stdout, and stderr of a master process, we can create other sockets and pass those in to child processes. However, because we are passing file descriptors instead of messages, the kernel will deal with the routing. This means that although the master Node process is still required, it isn’t bearing the load for all the traffic.
assert is a core library that provides the basis for testing code. Node’s
assertions work pretty much like the same feature in other languages and
environments: they allow you to make claims about objects and function
calls and send out messages when the assertions are violated. These
methods are really easy to get started with and provide a great way to
unit test your code’s features. Node’s own tests are written with assert.
Most assert
methods come in pairs: one method providing the positive test and the
other providing the negative one. For
instance, Example 5-37 shows equal() and notEqual(). The methods take two arguments: the first is the actual
value, and the second is the expected value. An optional third argument supplies the message to report when the assertion fails.
Example 5-37. Basic assertions
> var assert = require('assert');
> assert.equal(1, true, 'Truthy');
> assert.notEqual(1, true, 'Truthy');
AssertionError: Truthy
at [object Context]:1:8
at Interface.<anonymous> (repl.js:171:22)
at Interface.emit (events.js:64:17)
at Interface._onLine (readline.js:153:10)
at Interface._line (readline.js:408:8)
at Interface._ttyWrite (readline.js:585:14)
at ReadStream.<anonymous> (readline.js:73:12)
at ReadStream.emit (events.js:81:20)
at ReadStream._emitKey (tty_posix.js:307:10)
at ReadStream.onData (tty_posix.js:70:12)
>
The most obvious thing here is that when an
assert method doesn’t pass, it throws
an exception. This is a fundamental principle in the test suites. When a
test suite runs, it should just run, without throwing an exception. If
that is the case, the test is successful.
There are just a few assertions. equal() and notEqual() check for the == equality
and != inequality
operators. This means they test weakly for truthy and
falsy values, as Crockford termed them. In brief,
when tested as a Boolean, falsy values consist of false, 0, empty strings (i.e., ""), null,
undefined, and NaN. All other values are truthy. A string such
as "false" is truthy. A string
containing "0" is also truthy. As such,
equal() and notEqual() are fine to compare simple values
(strings, numbers, etc.) with each other, but you should be careful
checking against Booleans to ensure you got the result you wanted.
The strictEqual() and notStrictEqual() methods test equality with ===
and !==, which will ensure that only
actual values of true and false are treated as true and false,
respectively. The ok() method, shown in
Example 5-38, is a shorthand for testing whether
something is truthy; it is equivalent to calling equal() with the value converted to a Boolean (!!value) and true.
Example 5-38. Testing whether something is truthy with assert.ok( )
> assert.ok('This is a string', 'Strings that are not empty are truthy');
> assert.ok(0, 'Zero is not truthy');
AssertionError: Zero is not truthy
at [object Context]:1:8
at Interface.<anonymous> (repl.js:171:22)
at Interface.emit (events.js:64:17)
at Interface._onLine (readline.js:153:10)
at Interface._line (readline.js:408:8)
at Interface._ttyWrite (readline.js:585:14)
at ReadStream.<anonymous> (readline.js:73:12)
at ReadStream.emit (events.js:81:20)
at ReadStream._emitKey (tty_posix.js:307:10)
at ReadStream.onData (tty_posix.js:70:12)
>
Often the things you want to compare aren’t
simple values, but objects. JavaScript doesn’t have a way to let objects
define equality operators on themselves, and even if it did, people often wouldn’t define the operators.
So the deepEqual() and
notDeepEqual() methods provide a
way of deeply comparing object values. Without going into too many of the
gory details, these methods perform a few checks. If any check fails, the
test throws an exception. The first test checks whether the values simply
match with the === operator. Next, the
values are checked to see whether they are Buffers and, if so, they are checked for their
length, and then checked byte by byte. Next, if the object types don’t
match with the == operator, they can’t
be equal. Finally, if the arguments are objects, more extensive tests are
done, comparing the prototypes of the two objects and the number of
properties, and then recursively performing deepEqual() on each property.
The important point here is that deepEqual() and notDeepEqual() are extremely helpful and
thorough, but also potentially expensive. You should try to use them only
when needed. Although these methods will attempt to do the most efficient
tests first, it can still take a bit longer to find an inequality. If you
can provide a more specific reference, such as the property of an object
rather than the whole object, you can significantly improve the
performance of your tests.
The next assert methods are throws() and doesNotThrow(). These check whether a particular block of code does or
doesn’t throw an exception. You can check for a specific exception or just
whether any exception is thrown. The methods are pretty straightforward,
but have a few options that are worth reviewing.
It might be easy to overlook these tests, but handling exceptions is an essential part of writing robust JavaScript code, so you should use the tests to make sure the code you write throws exceptions in all the correct places. Chapter 3 offers more information on how to deal with exceptions well.
To pass blocks of code to throws() and doesNotThrow(), wrap them in functions that take
no arguments (see Example 5-39). The exception
being tested for is optional. If one isn’t passed, throws() will just check whether any exception
happened, and doesNotThrow() will
ensure that an exception hasn’t been thrown. If a specific error is
passed, throws() will check that the
specified exception and only that exception was thrown. If any other
exceptions are thrown or the exception isn’t thrown, the test will not
pass. For doesNotThrow(), when an error
is specified, it will continue without error if any exception other than
the one specified in the argument is thrown. If an exception matching the
specified error is thrown, it will cause the test to fail.
Example 5-39. Using assert.throws( ) and assert.doesNotThrow( ) to check for exception handling
> assert.throws(
... function() {
... throw new Error("Seven Fingers. Ten is too mainstream.");
... });
> assert.doesNotThrow(
... function() {
... throw new Error("I lived in the ocean way before Nemo");
... });
AssertionError: "Got unwanted exception (Error).."
at Object._throws (assert.js:281:5)
at Object.doesNotThrow (assert.js:299:11)
at [object Context]:1:8
at Interface.<anonymous> (repl.js:171:22)
at Interface.emit (events.js:64:17)
at Interface._onLine (readline.js:153:10)
at Interface._line (readline.js:408:8)
at Interface._ttyWrite (readline.js:585:14)
at ReadStream.<anonymous> (readline.js:73:12)
at ReadStream.emit (events.js:81:20)
>
There are four ways to specify the type of error to look for or avoid. Pass one of the following:
Comparison function: The function should take the exception error as its single argument. In the function, compare the exception actually thrown to the one you expect to find out whether there is a match. Return true if there is a match and false otherwise.
Regular expression: The library will compare the regex to the error message to find a match using the regex.test() method in JavaScript.
String: The library will directly compare the string to the error message.
Object constructor: The library will perform an instanceof test on the exception. If the exception is an instance of the constructor, then the exception matches. This can be used to make throws() and doesNotThrow() very flexible.
The vm, or
Virtual Machine, module allows you to run arbitrary chunks of code and get a
result back. It has a number of features that allow you to change the
context in which the code runs. This can be useful to act as a kind of
faux sandbox. However, the code is still running in the same Node process,
so you should be cautious. vm is
similar to eval(), but offers some more
features and a better API for managing code. It doesn’t have the ability
to interact with the local scope in the way that eval() does,
however.
There are two ways to run code with vm. Running the code “inline” is similar to
using eval(). The second way is to
precompile the code into a vm.Script
object. Let’s have a look at Example 5-40, which demonstrates running code inline
using vm.
So far, vm
looks a lot like eval(). We pass some
code to it, and we get a result back. However, vm doesn’t interact with local scope in the same
way that eval() does. Code run with
eval() will behave as if it were truly
inline and replaces the eval() method
call. But calls to vm methods will not
interact with the local scope. So eval() can change the surrounding context,
whereas vm cannot, as shown in Example 5-41.
Example 5-41. Accessing the local scope to show the differences between vm and eval( )
> var vm = require('vm'),
... e = 0,
... v = 0;
> eval('e=e+1');
1
> e
1
> vm.runInThisContext('v=v+1');
ReferenceError: v is not defined
at evalmachine.<anonymous>:1:1
at [object Context]:1:4
at Interface.<anonymous> (repl.js:171:22)
at Interface.emit (events.js:64:17)
at Interface._onLine (readline.js:153:10)
at Interface._line (readline.js:408:8)
at Interface._ttyWrite (readline.js:585:14)
at ReadStream.<anonymous> (readline.js:73:12)
at ReadStream.emit (events.js:81:20)
at ReadStream._emitKey (tty_posix.js:307:10)
>
> vm.runInThisContext('v=0');
0
> vm.runInThisContext('v=v+1');
1
> v
0
We’ve created two variables, e and v. When
we use the e variable with eval(), the end result of the statement applies
back to the main context. However, when we try the same thing with
v and vm.runInThisContext(), we get an exception
because we refer to v on the right side
of the equals sign, and that variable is not defined. Whereas eval() runs in the local scope, vm does not.
The vm subsystem actually
maintains its own local context that persists from one invocation of
vm to another. Thus, if we create
v within the scope of the vm, the variable subsequently is available to
later vm invocations, maintaining the
state in which the first vm left it.
However, the variable from the vm has
no impact on v in the local scope of
the main event loop.
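The same behavior can be reproduced outside the REPL. This is a minimal sketch, assuming a current Node release; localCounter and vmCounter are hypothetical names, and the comparison is wrapped in a function so that localCounter is unambiguously a local variable.

```javascript
const vm = require('vm');

function compareScopes() {
  let localCounter = 0;                  // local to this function

  // eval() sees the local variable and changes it.
  eval('localCounter = localCounter + 1');

  // vm does not see the local variable at all.
  const seenByVm = vm.runInThisContext('typeof localCounter');

  // Variables created through vm persist between vm calls.
  vm.runInThisContext('var vmCounter = 0');
  const vmCounter = vm.runInThisContext('vmCounter = vmCounter + 1');

  return { localCounter, seenByVm, vmCounter };
}

console.log(compareScopes());
```

The result is { localCounter: 1, seenByVm: 'undefined', vmCounter: 1 }.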
It’s also possible to pass a preexisting
context to vm. This context will be
used in place of the default context.
Example 5-42 uses
vm.runInNewContext(), which takes a context object as a second argument. The scope
of that object becomes the context for the code we run with vm. If we keep passing the same object from one call to
the next, the code can modify the context each time. Because the context is an ordinary object, it also
remains available to the calling code in the global scope.
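A minimal sketch of passing a context object, assuming a current Node release, is shown here. The context object and its count property are made up for illustration.

```javascript
const vm = require('vm');

const context = { count: 1 };            // hypothetical context object

// Each call runs against the same object, so changes accumulate.
const first = vm.runInNewContext('count = count + 1', context);
const second = vm.runInNewContext('count = count * 10', context);

// The calling code can read the modified context directly.
console.log(first, second, context.count);
```

This prints 2 20 20: the second call sees the change made by the first, and the calling code sees both.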
You can also compile vm.Script objects (Example 5-43). These save a
piece of code that you can then run repeatedly. At runtime, you can choose
the context to be applied. This is helpful when you are repeatedly running
the same code against multiple contexts.
Example 5-43. Compiling code into a script with vm
> var vm = require('vm');
> var fs = require('fs');
>
> var code = fs.readFileSync('example.js');
> code.toString();
'console.log(output);\n'
>
> var script = vm.createScript(code);
> script.runInNewContext({output:"Kick Ass"});
ReferenceError: console is not defined
at undefined:1:1
at [object Context]:1:8
at Interface.<anonymous> (repl.js:171:22)
at Interface.emit (events.js:64:17)
at Interface._onLine (readline.js:153:10)
at Interface._line (readline.js:408:8)
at Interface._ttyWrite (readline.js:585:14)
at ReadStream.<anonymous> (readline.js:73:12)
at ReadStream.emit (events.js:81:20)
at ReadStream._emitKey (tty_posix.js:307:10)
> script.runInNewContext({"console":console,"output":"Kick Ass"});
Kick Ass
This example reads in a JavaScript file that
contains the simple command console.log(output);. We compile this into a
script object, which means we can then
run script.runInNewContext() on the script and pass in a context. We deliberately
triggered an error to show that, just as when running vm.runInNewContext(), you need to pass in the
objects to which you refer (such as the console object); otherwise, even basic global
functions are not available. It’s also worth noting that the exception is
thrown from undefined:1:1.
All the vm run commands take a
filename as an optional final argument. It doesn’t change the
functionality, but allows you to set the name of the file that appears in
a message when an error is thrown. This is useful if you load a lot of
files from disk and run them because it tells you which piece of code
threw an error. The parameter is totally arbitrary, so you could use
whatever string is meaningful to help you debug the code.
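In current Node releases, the vm.createScript() call used in Example 5-43 has given way to the vm.Script constructor, which takes the filename in an options object. The following sketch assumes such a release; shout.js is an arbitrary label, not a file on disk.

```javascript
const vm = require('vm');

// Compile once; 'shout.js' is an arbitrary label that shows up in stack traces.
const script = new vm.Script('output.toUpperCase()', { filename: 'shout.js' });

// Run the same compiled code against different contexts.
console.log(script.runInNewContext({ output: 'first context' }));
console.log(script.runInNewContext({ output: 'second context' }));

// Without output in the context, the error points at our chosen filename.
let failure = null;
try {
  script.runInNewContext({});
} catch (error) {
  failure = error;
}
console.log(failure.message, '| stack mentions shout.js:', failure.stack.includes('shout.js'));
```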
[12] Hash-based Message Authentication Code (HMAC) is a cryptographic way of verifying data. It is often used like hashing algorithms to verify that two pieces of data match, but it also verifies that the data hasn’t been tampered with.
[13] It’s possible to deliberately make two pieces of data with the same MD5 checksum, which for some purposes can make the algorithm less desirable. More modern algorithms are less prone to this, although people are finding similar problems with SHA1 now.
[14] SIGKILL can be invoked in the shell through kill -9.