# Benchpress

Benchpress is a framework for e2e performance tests.
See [here for an example project](https://github.com/angular/benchpress-tree).

# Why?

So-called "micro benchmarks" essentially use a stopwatch in the browser to measure time
(e.g. via `performance.now()`). This approach is limited to time, and in some cases memory
(Chrome with special flags), as a metric. It does not let you measure:

- rendering time: e.g. the time the browser spends laying out or painting elements. This can be used,
  for example, to test the performance impact of stylesheet changes.
- garbage collection: e.g. how long the browser paused script execution, and how much memory was collected.
  This can be used to stabilize script execution time, as garbage collection times are usually very
  unpredictable. This data can also be used to measure and improve memory usage of applications,
  as the amount of garbage collected directly affects garbage collection time.
- the split between script execution time and waiting: e.g. to measure only the client-side time
  spent in a complex user interaction, ignoring backend calls.
- fps: to assert the smoothness of scrolling and animations.

This kind of data is already available in the DevTools of modern browsers. However, there is no standard
way to use those tools in an automated fashion to measure web app performance, especially not across platforms.

Benchpress tries to fill this gap, i.e. it provides automated access to all kinds of performance metrics.


# How it works

Benchpress uses webdriver to read out the so-called "performance log" of browsers. It contains all kinds
of interesting data, e.g. when a script started/ended executing, when gc started/ended, or when the browser
painted something to the screen, ...

As browsers differ, benchpress has plugins that normalize these events.

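For illustration, here is roughly how such a performance log can be read with the selenium-webdriver
Node.js module, independently of benchpress (a minimal sketch, assuming Chrome and a locally served page):

```
var webdriver = require('selenium-webdriver');

// Enable Chrome's "performance" log before building the driver.
var prefs = new webdriver.logging.Preferences();
prefs.setLevel(webdriver.logging.Type.PERFORMANCE, webdriver.logging.Level.ALL);

var driver = new webdriver.Builder()
    .forBrowser('chrome')
    .setLoggingPrefs(prefs)
    .build();

driver.get('http://myserver/index.html');

// Each entry is a JSON-encoded DevTools event, e.g. a script execution,
// gc or paint event; benchpress normalizes these per browser.
driver.manage().logs().get(webdriver.logging.Type.PERFORMANCE)
    .then(function(entries) {
      entries.forEach(function(entry) {
        console.log(JSON.parse(entry.message));
      });
    });
```
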
# Features

* Provides a loop (the so-called "Sampler") that executes the benchmark multiple times
* Automatically waits/detects until the browser is "warm"
* Reporters provide a normalized way to store results:
  - console reporter
  - file reporter
  - Google BigQuery reporter (coming soon)
* Supports micro benchmarks as well, via `console.time()` / `console.timeEnd()`
  - `console.time()` / `console.timeEnd()` mark the timeline in the DevTools, so it makes sense
    to use them in micro benchmarks to visualize and understand them, with or without benchpress.
  - running micro benchmarks in benchpress leverages the existing reporters,
    the sampler and the auto-warmup feature of benchpress.


# Supported browsers

* Chrome on all platforms
* Mobile Safari (iOS)
* Firefox (work in progress)


# How to write a benchmark

A benchmark in benchpress consists of an application under test
and a benchmark driver. The application under test is the
actual application, made up of the html/css/js that should be tested.
The benchmark driver is a webdriver test that interacts with the
application under test.


## A simple benchmark

Let's assume we want to measure the script execution time, as well as the render time,
that it takes to fill a container element with a complex html string.

The application under test could look like this:

```
index.html:

<button id="reset" onclick="reset()">Reset</button>
<button id="fill" onclick="fill()">fill innerHTML</button>
<div id="container"></div>
<script>
  var container = document.getElementById('container');
  var complexHtmlString = '...'; // TODO

  function reset() { container.innerHTML = ''; }

  function fill() {
    container.innerHTML = complexHtmlString;
  }
</script>
```

A benchmark driver could look like this:

```
// A runner contains the shared configuration
// and can be shared across multiple tests.
var runner = new Runner(...);

driver.get('http://myserver/index.html');

var resetBtn = driver.findElement(By.id('reset'));
var fillBtn = driver.findElement(By.id('fill'));

runner.sample({
  id: 'fillElement',
  // Prepare is optional...
  prepare: () => {
    resetBtn.click();
  },
  execute: () => {
    fillBtn.click();
    // Note: if fill() triggered asynchronous work, we would need
    // to wait here until it has finished (see the sketch below).
  }
});
```

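If `fill()` did trigger asynchronous work, such a wait could look like this (a sketch using
selenium-webdriver's `wait`; the `window.fillDone` flag is hypothetical and would have to be
set by the application under test):

```
runner.sample({
  id: 'fillElementAsync',
  execute: () => {
    fillBtn.click();
    // Poll until the (hypothetical) completion flag is set, so the
    // measurement also covers the asynchronous part of the work.
    driver.wait(function() {
      return driver.executeScript('return window.fillDone === true;');
    }, 5000);
  }
});
```
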
## Measuring in the browser

The application under test can also take measurements on its own, e.g.:

```
index.html:

<button id="measure" onclick="measure()">Measure document.createElement</button>
<script>
  function measure() {
    console.time('createElement*10000');
    for (var i = 0; i < 10000; i++) {
      document.createElement('div');
    }
    console.timeEnd('createElement*10000');
  }
</script>
```

When the `measure` button is clicked, it marks the timeline and creates 10000 elements.
It uses the special name `createElement*10000` to tell benchpress that the measured time
covers 10000 calls to createElement and that benchpress should take the average for it,
i.e. report the measured time divided by 10000.

A test driver for this could look like this:

```
driver.get('.../index.html');

var measureBtn = driver.findElement(By.id('measure'));

runner.sample({
  id: 'createElement test',
  microMetrics: {
    'createElement': 'time to create an element (ms)'
  },
  execute: () => {
    measureBtn.click();
  }
});
```

When looking at the DevTools Timeline, we see the marker as well:

![Marked Timeline](marked_timeline.png)

# Smoothness Metrics

Benchpress can also measure the "smoothness" of scrolling and animations. For this, it can collect the following set of metrics:

- `frameTime.mean`: mean frame time in ms (target: 16.6ms for 60fps)
- `frameTime.worst`: worst frame time in ms
- `frameTime.best`: best frame time in ms
- `frameTime.smooth`: percentage of frames that hit 60fps

To collect these metrics, you need to execute `console.time('frameCapture')` and `console.timeEnd('frameCapture')`, either in your benchmark application or in your benchmark driver via webdriver. The metrics listed above are only collected between those two calls, so it is recommended to wrap them as closely as possible around the action you want to evaluate in order to get accurate measurements.

In addition, one extra binding needs to be passed to benchpress in tests that want to collect these metrics:

    benchpress.sample(bindings: [bp.bind(bp.Options.CAPTURE_FRAMES).toValue(true)], ... )

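Putting this together, a benchmark driver could mark the capture window around the action under
test like this (a sketch: the `animate` button is hypothetical, and passing `bindings` per
sample call is an assumption based on the call above):

```
var animateBtn = driver.findElement(By.id('animate'));

runner.sample({
  id: 'animationSmoothness',
  bindings: [bp.bind(bp.Options.CAPTURE_FRAMES).toValue(true)],
  execute: () => {
    // Only frames rendered between the two marks contribute to frameTime.*.
    driver.executeScript("console.time('frameCapture');");
    animateBtn.click();
    driver.executeScript("console.timeEnd('frameCapture');");
  }
});
```
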
# Requests Metrics

Benchpress can also record the number of requests sent and the number of "encoded" bytes received since [window.performance.timing.navigationStart](http://www.w3.org/TR/navigation-timing/#dom-performancetiming-navigationstart):

- `receivedData`: number of bytes received since the last navigation start
- `requestCount`: number of requests sent since the last navigation start

To collect these metrics, you need the following corresponding extra bindings:

    benchpress.sample(bindings: [
      bp.bind(bp.Options.RECEIVED_DATA).toValue(true),
      bp.bind(bp.Options.REQUEST_COUNT).toValue(true)
    ], ... )

# Best practices

* Use normalized environments
  - metrics that depend on the performance of the execution environment must be measured on a normalized machine
  - e.g. a real mobile device whose cpu frequency is set to a fixed value
    * see our [build script](https://github.com/angular/angular/blob/master/scripts/ci/android_cpu.sh)
    * this requires root access, e.g. via a userdebug build of Android on a Google Nexus device
      (see [here](https://source.android.com/source/building-running.html) and [here](https://source.android.com/source/building-devices.html#obtaining-proprietary-binaries))
  - e.g. a calibrated machine that does not run background jobs, has a fixed cpu frequency, ...

* Use relative comparisons
  - relative comparisons are less likely to change over time and help to interpret the results of benchmarks
  - e.g. compare an example written using a ui framework against a hand-coded example and track the ratio

* Assert post-commit for commit ranges
  - running benchmarks can take some time, so running them before every commit is usually too slow
  - when a regression is detected for a commit range, use bisection to find the problematic commit

* Repeat benchmarks multiple times in a fresh window
  - run the same benchmark multiple times in a fresh window and then take the minimum of the average values of each benchmark run

* Use force gc with care
  - forcing gc can skew the script execution time and gcTime numbers,
    but might be needed to get stable gc time / gc amount numbers

* Open a new window for every test
  - browsers (e.g. chrome) might keep JIT statistics over page reloads and optimize pages differently depending on what has been loaded before

# Detailed overview

![Overview](overview.png)

Definitions:

* valid sample: a sample that represents the world that should be measured in a good way
* complete sample: the sample of all measured values collected so far

Components:

* Runner
  - contains a default configuration
  - creates a new injector for every sample call, via which all other components are created

* Sampler
  - gets data from the metrics
  - reports measured values immediately to the reporters
  - loops until the validator is able to extract a valid sample out of the complete sample (see below)
  - reports the valid sample and the complete sample to the reporters

* Metric
  - gets measured values from the browser
  - e.g. reads out performance logs, DOM values, JavaScript values

* Validator
  - extracts a valid sample out of the complete sample of all measured values
  - e.g. wait until there are 10 samples and take them as the valid sample (this would include warmup time)
  - e.g. wait until the regression slope through the last 10 measured values of the `scriptTime` metric is >= 0, i.e. the values for the `scriptTime` metric are no longer decreasing (see the sketch after this list)

* Reporter
  - reports measured values, the valid sample and the complete sample to backends
  - e.g. a reporter that prints to the console, a reporter that reports values into Google BigQuery, ...

* WebDriverAdapter
  - abstraction over the used webdriver client
  - one implementation for every webdriver client,
    e.g. one for the selenium-webdriver Node.js module, dart async webdriver, dart sync webdriver, ...

* WebDriverExtension
  - implements additional methods that are not standardized in the webdriver protocol, using the WebDriverAdapter
  - provides functionality like force gc, or reading out performance logs in a normalized format
  - one implementation per browser, e.g. one for Chrome, one for mobile Safari, one for Firefox
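
The regression-slope idea behind the last Validator example can be illustrated with a small sketch
(not benchpress's actual implementation): fit a least-squares line through the last N values of a
metric and treat the sample as valid once the slope is no longer negative.

```
// Least-squares slope of `values` over their indices 0..n-1.
function regressionSlope(values) {
  var n = values.length;
  var meanX = (n - 1) / 2;
  var meanY = values.reduce(function(a, b) { return a + b; }, 0) / n;
  var num = 0, den = 0;
  for (var i = 0; i < n; i++) {
    num += (i - meanX) * (values[i] - meanY);
    den += (i - meanX) * (i - meanX);
  }
  return num / den;
}

// Accept the last `sampleSize` measurements once scriptTime stops decreasing.
function isValidSample(scriptTimes, sampleSize) {
  if (scriptTimes.length < sampleSize) return false;
  var last = scriptTimes.slice(-sampleSize);
  return regressionSlope(last) >= 0;
}

// During warmup, scriptTime typically decreases (negative slope, not valid yet);
// once it has flattened out, the slope reaches >= 0 and the sample is accepted.
```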