1 |
|
2 | This document describes some examples of where various features
|
3 | of pull streams are used in simple real-world examples.
|
4 |
|
5 | Much of the focus here is handling the error cases. Indeed,
|
6 | distributed systems are _all about_ handling the error cases.
|
7 |
|
8 | # A simple source that ends correctly. (read, end)
|
9 |
|
10 | A normal file (source) is read, and sent to a sink stream
|
11 | that computes some aggregation upon that input such as
|
12 | the number of bytes, or number of occurances of the `\n`
|
13 | character (i.e. the number of lines).
|
14 |
|
15 | The source reads a chunk of the file at each time it's called,
|
16 | there is some optimium size depending on your operating system,
|
17 | file system, physical hardware,
|
18 | and how many other files are being read concurrently.
|
19 |
|
20 | When the sink gets a chunk, it iterates over the characters in it
|
21 | counting the `\n` characters. When the source returns `end` to the
|
22 | sink, the sink calls a user provided callback.
|
23 |
|
24 | # A source that may fail. (read, err, end)
|
25 |
|
26 | A file is downloaded over http and written to a file.
|
27 | The network should always be considered to be unreliable,
|
28 | and you must design your system to recover if the download
|
29 | fails. (For example if the wifi were to cut out).
|
30 |
|
31 | The read stream is just the http download, and the sink
|
32 | writes it to a temporary file. If the source ends normally,
|
33 | the temporary file is moved to the correct location.
|
34 | If the source errors, the temporary file is deleted.
|
35 |
|
36 | (You could also write the file to the correct location,
|
37 | and delete it if it errors, but the temporary file method has the advantage
|
38 | that if the computer or process crashes it leaves only a temporary file
|
39 | and not a file that appears valid. Stray temporary files can be cleaned up
|
40 | or resumed when the process restarts.)
|
41 |
|
42 | # A sink that may fail
|
43 |
|
44 | If we read a file from disk, and upload it, then the upload is the sink that may error.
|
45 | The file system is probably faster than the upload and
|
46 | so it will mostly be waiting for the sink to ask for more data.
|
47 | Usually the sink calls `read(null, cb)` and the source retrives chunks of the file
|
48 | until the file ends. If the sink errors, it then calls `read(true, cb)`
|
49 | and the source closes the file descriptor and stops reading.
|
50 | In this case the whole file is never loaded into memory.
|
51 |
|
52 | # A sink that may fail out of turn.
|
53 |
|
54 | A http client connects to a log server and tails a log in realtime.
|
55 | (Another process will write to the log file,
|
56 | but we don't need to worry about that.)
|
57 |
|
58 | The source is the server's log stream, and the sink is the client.
|
59 | First the source outputs the old data, this will always be a fast
|
60 | response, because that data is already at hand. When the old data is all
|
61 | written then the output rate may drop significantly because the server (the source) will
|
62 | wait for new data to be added to the file. Therefore,
|
63 | it becomes much more likely that the sink will error (for example if the network connection
|
64 | drops) while the source is waiting for new data. Because of this,
|
65 | it's necessary to be able to abort the stream reading (after you called
|
66 | read, but before it called back). If it was not possible to abort
|
67 | out of turn, you'd have to wait for the next read before you can abort
|
68 | but, depending on the source of the stream, the next read may never come.
|
69 |
|
70 | # A through stream that needs to abort.
|
71 |
|
72 | Say we wish to read from a file (source), parse each line as JSON (through),
|
73 | and then output to another file (sink).
|
74 | If the parser encounters illegal JSON then it will error and,
|
75 | if this parsing is a fatal error, then the parser needs to abort the pipeline
|
76 | from the middle. Here the source reads normaly, but then the through fails.
|
77 | When the through finds an invalid line, it should first abort the source,
|
78 | and then callback to the sink with an error. This way,
|
79 | by the time the sink receives the error, the entire stream has been cleaned up.
|
80 |
|
81 | (You could abort the source and error back to the sink in parallel.
|
82 | However, if something happened to the source while aborting, for the user
|
83 | discover this error they would have to call the source again with another callback, as
|
84 | situation would occur only rarely users would be inclined to not handle it leading to
|
85 | the possiblity of undetected errors.
|
86 | Therefore, as it is better to have one callback at the sink, wait until the source
|
87 | has finished cleaning up before callingback to the pink with an error.)
|
88 |
|
89 | In some cases you may want the stream to continue, and the the through stream can just ignore
|
90 | an any lines that do not parse. An example where you definately
|
91 | want a through stream to abort on invalid input would be an encrypted stream, which
|
92 | should be broken into chunks that are encrypted separately.
|