# Benchpress

Benchpress is a framework for e2e performance tests.
See [here for an example project](https://github.com/angular/benchpress-tree).

# Why?

So-called "micro benchmarks" essentially use a stopwatch in the browser to measure time
(e.g. via `performance.now()`). This approach is limited to time, and in some cases memory
(Chrome with special flags), as a metric. It does not let you measure:

- rendering time: e.g. the time the browser spends laying out or painting elements. This can be used,
  for example, to test the performance impact of stylesheet changes.
- garbage collection: e.g. how long the browser paused script execution, and how much memory was collected.
  This can be used to stabilize script execution time, as garbage collection times are usually very
  unpredictable. This data can also be used to measure and improve memory usage of applications,
  as the amount of garbage collected directly affects garbage collection time.
- the split between script execution time and waiting: e.g. to measure only the client-side time
  spent in a complex user interaction, ignoring backend calls.
- fps: to assert the smoothness of scrolling and animations.

This kind of data is already available in the DevTools of modern browsers. However, there is no standard
way to use those tools in an automated fashion to measure web app performance, especially not across platforms.

Benchpress tries to fill this gap, i.e. it provides automated access to all kinds of performance metrics.


# How it works

Benchpress uses webdriver to read out the so-called "performance log" of browsers. It contains all kinds
of interesting data, e.g. when a script started/ended executing, when gc started/ended, or when the browser
painted something to the screen, ...

As browsers differ, benchpress has plugins that normalize these events.

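For illustration, here is roughly how such a performance log can be read with the selenium-webdriver
Node.js module, independently of benchpress (a minimal sketch, assuming Chrome and a locally served page):

```
var webdriver = require('selenium-webdriver');

// Enable Chrome's "performance" log before building the driver.
var prefs = new webdriver.logging.Preferences();
prefs.setLevel(webdriver.logging.Type.PERFORMANCE, webdriver.logging.Level.ALL);

var driver = new webdriver.Builder()
    .forBrowser('chrome')
    .setLoggingPrefs(prefs)
    .build();

driver.get('http://myserver/index.html');

// Each entry is a JSON-encoded DevTools event, e.g. a script execution,
// gc or paint event; benchpress normalizes these per browser.
driver.manage().logs().get(webdriver.logging.Type.PERFORMANCE)
    .then(function(entries) {
      entries.forEach(function(entry) {
        console.log(JSON.parse(entry.message));
      });
    });
```
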
# Features

* Provides a loop (the so-called "Sampler") that executes the benchmark multiple times
* Automatically waits/detects until the browser is "warm"
* Reporters provide a normalized way to store results:
  - console reporter
  - file reporter
  - Google BigQuery reporter (coming soon)
* Supports micro benchmarks as well, via `console.time()` / `console.timeEnd()`
  - `console.time()` / `console.timeEnd()` mark the timeline in the DevTools, so it makes sense
    to use them in micro benchmarks to visualize and understand them, with or without benchpress.
  - running micro benchmarks in benchpress leverages the existing reporters,
    the sampler and the auto-warmup feature of benchpress.


# Supported browsers

* Chrome on all platforms
* Mobile Safari (iOS)
* Firefox (work in progress)


# How to write a benchmark

A benchmark in benchpress consists of an application under test
and a benchmark driver. The application under test is the
actual application, made up of the html/css/js that should be tested.
The benchmark driver is a webdriver test that interacts with the
application under test.


## A simple benchmark

Let's assume we want to measure the script execution time, as well as the render time,
that it takes to fill a container element with a complex html string.

The application under test could look like this:

```
index.html:

<button id="reset" onclick="reset()">Reset</button>
<button id="fill" onclick="fill()">fill innerHTML</button>
<div id="container"></div>
<script>
  var container = document.getElementById('container');
  var complexHtmlString = '...'; // TODO

  function reset() { container.innerHTML = ''; }

  function fill() {
    container.innerHTML = complexHtmlString;
  }
</script>
```

A benchmark driver could look like this:

```
// A runner contains the shared configuration
// and can be shared across multiple tests.
var runner = new Runner(...);

driver.get('http://myserver/index.html');

var resetBtn = driver.findElement(By.id('reset'));
var fillBtn = driver.findElement(By.id('fill'));

runner.sample({
  id: 'fillElement',
  // Prepare is optional...
  prepare: () => {
    resetBtn.click();
  },
  execute: () => {
    fillBtn.click();
    // Note: if fill() triggered asynchronous work, we would need
    // to wait here until it has finished (see the sketch below).
  }
});
```

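If `fill()` did trigger asynchronous work, such a wait could look like this (a sketch using
selenium-webdriver's `wait`; the `window.fillDone` flag is hypothetical and would have to be
set by the application under test):

```
runner.sample({
  id: 'fillElementAsync',
  execute: () => {
    fillBtn.click();
    // Poll until the (hypothetical) completion flag is set, so the
    // measurement also covers the asynchronous part of the work.
    driver.wait(function() {
      return driver.executeScript('return window.fillDone === true;');
    }, 5000);
  }
});
```
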
## Measuring in the browser

The application under test can also take measurements on its own, e.g.:

```
index.html:

<button id="measure" onclick="measure()">Measure document.createElement</button>
<script>
  function measure() {
    console.time('createElement*10000');
    for (var i = 0; i < 10000; i++) {
      document.createElement('div');
    }
    console.timeEnd('createElement*10000');
  }
</script>
```

When the `measure` button is clicked, it marks the timeline and creates 10000 elements.
It uses the special name `createElement*10000` to tell benchpress that the measured time
covers 10000 calls to createElement and that benchpress should take the average for it,
i.e. report the measured time divided by 10000.

A test driver for this could look like this:

```
driver.get('.../index.html');

var measureBtn = driver.findElement(By.id('measure'));

runner.sample({
  id: 'createElement test',
  microMetrics: {
    'createElement': 'time to create an element (ms)'
  },
  execute: () => {
    measureBtn.click();
  }
});
```

When looking at the DevTools Timeline, we see the marker as well:

![Marked Timeline](marked_timeline.png)

# Smoothness Metrics

Benchpress can also measure the "smoothness" of scrolling and animations. For this, it can collect the following set of metrics:

- `frameTime.mean`: mean frame time in ms (target: 16.6ms for 60fps)
- `frameTime.worst`: worst frame time in ms
- `frameTime.best`: best frame time in ms
- `frameTime.smooth`: percentage of frames that hit 60fps

To collect these metrics, you need to execute `console.time('frameCapture')` and `console.timeEnd('frameCapture')`, either in your benchmark application or in your benchmark driver via webdriver. The metrics listed above are only collected between those two calls, so it is recommended to wrap them as closely as possible around the action you want to evaluate in order to get accurate measurements.

In addition, one extra binding needs to be passed to benchpress in tests that want to collect these metrics:

    benchpress.sample(bindings: [bp.bind(bp.Options.CAPTURE_FRAMES).toValue(true)], ... )

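Putting this together, a benchmark driver could mark the capture window around the action under
test like this (a sketch: the `animate` button is hypothetical, and passing `bindings` per
sample call is an assumption based on the call above):

```
var animateBtn = driver.findElement(By.id('animate'));

runner.sample({
  id: 'animationSmoothness',
  bindings: [bp.bind(bp.Options.CAPTURE_FRAMES).toValue(true)],
  execute: () => {
    // Only frames rendered between the two marks contribute to frameTime.*.
    driver.executeScript("console.time('frameCapture');");
    animateBtn.click();
    driver.executeScript("console.timeEnd('frameCapture');");
  }
});
```
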
# Requests Metrics

Benchpress can also record the number of requests sent and the number of "encoded" bytes received since [window.performance.timing.navigationStart](http://www.w3.org/TR/navigation-timing/#dom-performancetiming-navigationstart):

- `receivedData`: number of bytes received since the last navigation start
- `requestCount`: number of requests sent since the last navigation start

To collect these metrics, you need the following corresponding extra bindings:

    benchpress.sample(bindings: [
      bp.bind(bp.Options.RECEIVED_DATA).toValue(true),
      bp.bind(bp.Options.REQUEST_COUNT).toValue(true)
    ], ... )

# Best practices

* Use normalized environments
  - metrics that depend on the performance of the execution environment must be measured on a normalized machine
  - e.g. a real mobile device whose cpu frequency is set to a fixed value
    * see our [build script](https://github.com/angular/angular/blob/master/scripts/ci/android_cpu.sh)
    * this requires root access, e.g. via a userdebug build of Android on a Google Nexus device
      (see [here](https://source.android.com/source/building-running.html) and [here](https://source.android.com/source/building-devices.html#obtaining-proprietary-binaries))
  - e.g. a calibrated machine that does not run background jobs, has a fixed cpu frequency, ...

* Use relative comparisons
  - relative comparisons are less likely to change over time and help to interpret the results of benchmarks
  - e.g. compare an example written using a ui framework against a hand-coded example and track the ratio

* Assert post-commit for commit ranges
  - running benchmarks can take some time, so running them before every commit is usually too slow
  - when a regression is detected for a commit range, use bisection to find the problematic commit

* Repeat benchmarks multiple times in a fresh window
  - run the same benchmark multiple times in a fresh window and then take the minimum of the average values of each benchmark run

* Use force gc with care
  - forcing gc can skew the script execution time and gcTime numbers,
    but might be needed to get stable gc time / gc amount numbers

* Open a new window for every test
  - browsers (e.g. chrome) might keep JIT statistics over page reloads and optimize pages differently depending on what has been loaded before

# Detailed overview

![Overview](overview.png)

Definitions:

* valid sample: a sample that represents the world that should be measured in a good way
* complete sample: the sample of all measured values collected so far

Components:

* Runner
  - contains a default configuration
  - creates a new injector for every sample call, via which all other components are created

* Sampler
  - gets data from the metrics
  - reports measured values immediately to the reporters
  - loops until the validator is able to extract a valid sample out of the complete sample (see below)
  - reports the valid sample and the complete sample to the reporters

* Metric
  - gets measured values from the browser
  - e.g. reads out performance logs, DOM values, JavaScript values

* Validator
  - extracts a valid sample out of the complete sample of all measured values
  - e.g. wait until there are 10 samples and take them as the valid sample (this would include warmup time)
  - e.g. wait until the regression slope through the last 10 measured values of the `scriptTime` metric is >= 0, i.e. the values for the `scriptTime` metric are no longer decreasing (see the sketch after this list)

* Reporter
  - reports measured values, the valid sample and the complete sample to backends
  - e.g. a reporter that prints to the console, a reporter that reports values into Google BigQuery, ...

* WebDriverAdapter
  - abstraction over the used webdriver client
  - one implementation for every webdriver client,
    e.g. one for the selenium-webdriver Node.js module, dart async webdriver, dart sync webdriver, ...

* WebDriverExtension
  - implements additional methods that are not standardized in the webdriver protocol, using the WebDriverAdapter
  - provides functionality like force gc, or reading out performance logs in a normalized format
  - one implementation per browser, e.g. one for Chrome, one for mobile Safari, one for Firefox
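
The regression-slope idea behind the last Validator example can be illustrated with a small sketch
(not benchpress's actual implementation): fit a least-squares line through the last N values of a
metric and treat the sample as valid once the slope is no longer negative.

```
// Least-squares slope of `values` over their indices 0..n-1.
function regressionSlope(values) {
  var n = values.length;
  var meanX = (n - 1) / 2;
  var meanY = values.reduce(function(a, b) { return a + b; }, 0) / n;
  var num = 0, den = 0;
  for (var i = 0; i < n; i++) {
    num += (i - meanX) * (values[i] - meanY);
    den += (i - meanX) * (i - meanX);
  }
  return num / den;
}

// Accept the last `sampleSize` measurements once scriptTime stops decreasing.
function isValidSample(scriptTimes, sampleSize) {
  if (scriptTimes.length < sampleSize) return false;
  var last = scriptTimes.slice(-sampleSize);
  return regressionSlope(last) >= 0;
}

// During warmup, scriptTime typically decreases (negative slope, not valid yet);
// once it has flattened out, the slope reaches >= 0 and the sample is accepted.
```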