UNPKG

pull-stream/docs/examples.md

Version:

4.59 kBMarkdownView Raw

1
2This document describes some examples of where various features
3of pull streams are used in simple real-world examples.
4
5Much of the focus here is handling the error cases. Indeed,
6distributed systems are _all about_ handling the error cases.
7
8# A simple source that ends correctly. (read, end)
9
10A normal file (source) is read, and sent to a sink stream
11that computes some aggregation upon that input such as 
12the number of bytes, or number of occurances of the `\n`
13character (i.e. the number of lines).
14
15The source reads a chunk of the file at each time it's called,
16there is some optimium size depending on your operating system,
17file system, physical hardware,
18and how many other files are being read concurrently.
19
20When the sink gets a chunk, it iterates over the characters in it
21counting the `\n` characters. When the source returns `end` to the
22sink, the sink calls a user provided callback.
23
24# A source that may fail. (read, err, end)
25
26A file is downloaded over http and written to a file.
27The network should always be considered to be unreliable,
28and you must design your system to recover if the download
29fails. (For example if the wifi were to cut out).
30
31The read stream is just the http download, and the sink
32writes it to a temporary file. If the source ends normally,
33the temporary file is moved to the correct location.
34If the source errors, the temporary file is deleted.
35
36(You could also write the file to the correct location,
37and delete it if it errors, but the temporary file method has the advantage
38that if the computer or process crashes it leaves only a temporary file
39and not a file that appears valid. Stray temporary files can be cleaned up
40or resumed when the process restarts.)
41
42# A sink that may fail
43
44If we read a file from disk, and upload it, then the upload is the sink that may error.
45The file system is probably faster than the upload and
46so it will mostly be waiting for the sink to ask for more data.
47Usually the sink calls `read(null, cb)` and the source retrives chunks of the file
48until the file ends. If the sink errors, it then calls `read(true, cb)`
49and the source closes the file descriptor and stops reading.
50In this case the whole file is never loaded into memory.
51
52# A sink that may fail out of turn.
53
54A http client connects to a log server and tails a log in realtime.
55(Another process will write to the log file,
56but we don't need to worry about that.)
57
58The source is the server's log stream, and the sink is the client.
59First the source outputs the old data, this will always be a fast
60response, because that data is already at hand. When the old data is all
61written then the output rate may drop significantly because the server (the source) will
62wait for new data to be added to the file. Therefore,
63it becomes much more likely that the sink will error (for example if the network connection
64drops) while the source is waiting for new data. Because of this,
65it's necessary to be able to abort the stream reading (after you called
66read, but before it called back). If it was not possible to abort
67out of turn, you'd have to wait for the next read before you can abort
68but, depending on the source of the stream, the next read may never come.
69
70# A through stream that needs to abort.
71
72Say we wish to read from a file (source), parse each line as JSON (through),
73and then output to another file (sink).
74If the parser encounters illegal JSON then it will error and,
75if this parsing is a fatal error, then the parser needs to abort the pipeline
76from the middle. Here the source reads normaly, but then the through fails.
77When the through finds an invalid line, it should first abort the source,
78and then callback to the sink with an error. This way,
79by the time the sink receives the error, the entire stream has been cleaned up.
80
81(You could abort the source and error back to the sink in parallel.
82However, if something happened to the source while aborting, for the user
83discover this error they would have to call the source again with another callback, as
84situation would occur only rarely users would be inclined to not handle it leading to
85the possiblity of undetected errors.
86Therefore, as it is better to have one callback at the sink, wait until the source
87has finished cleaning up before callingback to the pink with an error.)
88
89In some cases you may want the stream to continue, and the the through stream can just ignore
90an any lines that do not parse. An example where you definately
91want a through stream to abort on invalid input would be an encrypted stream, which
92should be broken into chunks that are encrypted separately.

1
2	`This document describes some examples of where various features`
3	`of pull streams are used in simple real-world examples.`
4
5	`Much of the focus here is handling the error cases. Indeed,`
6	`distributed systems are _all about_ handling the error cases.`
7
8	`# A simple source that ends correctly. (read, end)`
9
10	`A normal file (source) is read, and sent to a sink stream`
11	`that computes some aggregation upon that input such as`
12	the number of bytes, or number of occurances of the `\n`
13	`character (i.e. the number of lines).`
14
15	`The source reads a chunk of the file at each time it's called,`
16	`there is some optimium size depending on your operating system,`
17	`file system, physical hardware,`
18	`and how many other files are being read concurrently.`
19
20	`When the sink gets a chunk, it iterates over the characters in it`
21	counting the `\n` characters. When the source returns `end` to the
22	`sink, the sink calls a user provided callback.`
23
24	`# A source that may fail. (read, err, end)`
25
26	`A file is downloaded over http and written to a file.`
27	`The network should always be considered to be unreliable,`
28	`and you must design your system to recover if the download`
29	`fails. (For example if the wifi were to cut out).`
30
31	`The read stream is just the http download, and the sink`
32	`writes it to a temporary file. If the source ends normally,`
33	`the temporary file is moved to the correct location.`
34	`If the source errors, the temporary file is deleted.`
35
36	`(You could also write the file to the correct location,`
37	`and delete it if it errors, but the temporary file method has the advantage`
38	`that if the computer or process crashes it leaves only a temporary file`
39	`and not a file that appears valid. Stray temporary files can be cleaned up`
40	`or resumed when the process restarts.)`
41
42	`# A sink that may fail`
43
44	`If we read a file from disk, and upload it, then the upload is the sink that may error.`
45	`The file system is probably faster than the upload and`
46	`so it will mostly be waiting for the sink to ask for more data.`
47	Usually the sink calls `read(null, cb)` and the source retrives chunks of the file
48	until the file ends. If the sink errors, it then calls `read(true, cb)`
49	`and the source closes the file descriptor and stops reading.`
50	`In this case the whole file is never loaded into memory.`
51
52	`# A sink that may fail out of turn.`
53
54	`A http client connects to a log server and tails a log in realtime.`
55	`(Another process will write to the log file,`
56	`but we don't need to worry about that.)`
57
58	`The source is the server's log stream, and the sink is the client.`
59	`First the source outputs the old data, this will always be a fast`
60	`response, because that data is already at hand. When the old data is all`
61	`written then the output rate may drop significantly because the server (the source) will`
62	`wait for new data to be added to the file. Therefore,`
63	`it becomes much more likely that the sink will error (for example if the network connection`
64	`drops) while the source is waiting for new data. Because of this,`
65	`it's necessary to be able to abort the stream reading (after you called`
66	`read, but before it called back). If it was not possible to abort`
67	`out of turn, you'd have to wait for the next read before you can abort`
68	`but, depending on the source of the stream, the next read may never come.`
69
70	`# A through stream that needs to abort.`
71
72	`Say we wish to read from a file (source), parse each line as JSON (through),`
73	`and then output to another file (sink).`
74	`If the parser encounters illegal JSON then it will error and,`
75	`if this parsing is a fatal error, then the parser needs to abort the pipeline`
76	`from the middle. Here the source reads normaly, but then the through fails.`
77	`When the through finds an invalid line, it should first abort the source,`
78	`and then callback to the sink with an error. This way,`
79	`by the time the sink receives the error, the entire stream has been cleaned up.`
80
81	`(You could abort the source and error back to the sink in parallel.`
82	`However, if something happened to the source while aborting, for the user`
83	`discover this error they would have to call the source again with another callback, as`
84	`situation would occur only rarely users would be inclined to not handle it leading to`
85	`the possiblity of undetected errors.`
86	`Therefore, as it is better to have one callback at the sink, wait until the source`
87	`has finished cleaning up before callingback to the pink with an error.)`
88
89	`In some cases you may want the stream to continue, and the the through stream can just ignore`
90	`an any lines that do not parse. An example where you definately`
91	`want a through stream to abort on invalid input would be an encrypted stream, which`
92	`should be broken into chunks that are encrypted separately.`