# Technical implementation of text / document selections and highlights

## Selections

### DOM selection

A good starting point is to listen for user selection events in the DOM tree, such as `selectstart`:

```javascript
win.document.addEventListener("selectstart", (evt) => {
    // ...
});
```

Note: there are instances where the `selectstart` event is not raised. The cause has not been identified.

Note: the `selectionchange` event may be problematic when the DOM selection API is used programmatically as a post-processing step (for example, to normalize multiple selections to a single range), as this can cause duplicate events and infinite loops. Code example:

```javascript
win.document.addEventListener("selectionchange", (evt) => {
    const selection = win.getSelection();
    if (selection) {
        // NORMALIZE_TO_SINGLE_RANGE() is a placeholder for application logic
        const range = NORMALIZE_TO_SINGLE_RANGE(/* ... */);
        selection.removeAllRanges(); // => triggers selectionchange again!
        selection.addRange(range); // => triggers selectionchange again!
    }
});
```

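One possible way to avoid the re-entrancy problem described above is to make the handler idempotent, so that it only rewrites the selection when more than one range is present. The following is a minimal sketch, not the navigator's actual implementation: `createSelectionChangeHandler`, `getSelection` and `normalizeToSingleRange` are hypothetical names introduced for illustration.

```javascript
// Returns a "selectionchange" handler that only touches the selection when it
// holds more than one range, so the events caused by its own changes are ignored.
// getSelection: () => Selection-like object (hypothetical injection point)
// normalizeToSingleRange: (selection) => single range, or undefined
function createSelectionChangeHandler(getSelection, normalizeToSingleRange) {
    return function onSelectionChange() {
        const selection = getSelection();
        if (!selection || selection.rangeCount <= 1) {
            return false; // empty or already a single range: nothing to do
        }
        const range = normalizeToSingleRange(selection);
        selection.removeAllRanges();
        if (range) {
            selection.addRange(range);
        }
        return true; // the selection was rewritten
    };
}
```

In a browser context this could be registered with `win.document.addEventListener("selectionchange", createSelectionChangeHandler(() => win.getSelection(), normalize))`, where `normalize` is the application's own normalization function. The events raised by `removeAllRanges()` and `addRange()` then reach the handler with a range count of 0 or 1 and are ignored, which ends the loop.
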
As shown above, `selection.removeAllRanges()` can be used to clear existing user selections in the DOM. This may be useful when the document viewport has shifted past visible user selections (this is particularly relevant in "structured" paginated mode, but the same logic applies to a "looser" scroll view). From a UX perspective, such hidden selections can be confusing, so the application may decide to void selections that disappear out of view.

Note that a selection can exist in a "collapsed" state, effectively a "cursor" with no actual content. Depending on the consumer logic, such a selection may need to be ignored. Code example:

```javascript
// Inside a function, such as a selection event handler:
const selection = window.getSelection();
if (selection) {
    if (selection.isCollapsed) {
        return; // cursor only, no selected content
    }
}
```

Getting the raw text from a DOM selection is easy, but it might be necessary to clean up the text (e.g. whitespace), and to filter out selections that are deemed "empty". Example:

```javascript
// Inside a function, such as a selection event handler:
const selection = window.getSelection();
if (selection) {
    if (selection.isCollapsed) {
        return;
    }
    const rawText = selection.toString();
    // replace line breaks and runs of whitespace with single spaces
    const cleanText = rawText.trim().replace(/\n/g, " ").replace(/\s\s+/g, " ");
    if (cleanText.length === 0) {
        return;
    }
}
```

DOM selections contain ranges, and should have `anchorNode` (+ `anchorOffset`) and `focusNode` (+ `focusOffset`):

```javascript
// Inside a function, such as a selection event handler:
const selection = window.getSelection();
if (selection) {
    if (!selection.anchorNode || !selection.focusNode) {
        return; // no usable boundary points
    }
}
```

A DOM selection can contain a single range, in which case it is used as-is. Multiple ranges can be normalized into one (ensuring document order):

```javascript
// Inside a function, such as a selection event handler:
const selection = window.getSelection();
if (selection) {
    if (!selection.anchorNode || !selection.focusNode) {
        return;
    }

    const range = selection.rangeCount === 1 ? selection.getRangeAt(0) :
        createOrderedRange(selection.anchorNode, selection.anchorOffset, selection.focusNode, selection.focusOffset);
    if (!range || range.collapsed) {
        return;
    }
}
```

There are multiple ways to ensure the order of selection ranges. Here is an example `createOrderedRange()` function:

```javascript
function createOrderedRange(startNode, startOffset, endNode, endOffset) {

    const range = new Range(); // document.createRange()
    range.setStart(startNode, startOffset);
    range.setEnd(endNode, endOffset);
    if (!range.collapsed) {
        return range;
    }

    // the range also collapses when the end precedes the start,
    // so try again with the boundary points swapped
    const rangeReverse = new Range(); // document.createRange()
    rangeReverse.setStart(endNode, endOffset);
    rangeReverse.setEnd(startNode, startOffset);
    if (!rangeReverse.collapsed) {
        return rangeReverse;
    }

    return undefined;
}
```

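Another way to obtain document order, sketched here under stated assumptions, is to compare the two boundary points directly before building any range. Each boundary point is turned into a path of child indexes from the root, followed by its offset, and the two paths are compared element by element. The function names below are hypothetical, and the sketch assumes both nodes belong to the same tree.

```javascript
// Builds the list of child indexes leading from the root down to `node`.
// Works with any objects exposing `parentNode` and `childNodes` (DOM nodes do).
function pathFromRoot(node) {
    const path = [];
    let current = node;
    while (current.parentNode) {
        path.unshift(Array.prototype.indexOf.call(current.parentNode.childNodes, current));
        current = current.parentNode;
    }
    return path;
}

// Returns a negative number when boundary point A precedes B in document
// order, a positive number when it follows B, and 0 when they are identical.
function compareBoundaryPoints(nodeA, offsetA, nodeB, offsetB) {
    const a = pathFromRoot(nodeA).concat(offsetA);
    const b = pathFromRoot(nodeB).concat(offsetB);
    const shared = Math.min(a.length, b.length);
    for (let i = 0; i < shared; i++) {
        if (a[i] !== b[i]) {
            return a[i] - b[i];
        }
    }
    // one path is a prefix of the other: the boundary in the ancestor comes first
    return a.length - b.length;
}

// Orders the anchor and focus boundary points without creating a Range object.
function orderBoundaryPoints(anchorNode, anchorOffset, focusNode, focusOffset) {
    const backwards = compareBoundaryPoints(anchorNode, anchorOffset, focusNode, focusOffset) > 0;
    return backwards ?
        { startNode: focusNode, startOffset: focusOffset, endNode: anchorNode, endOffset: anchorOffset } :
        { startNode: anchorNode, startOffset: anchorOffset, endNode: focusNode, endOffset: focusOffset };
}
```

The ordered boundary points could then be passed to `range.setStart()` and `range.setEnd()`. Compared with the trial-and-error approach, this avoids allocating a second range, at the cost of walking up the tree twice.
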
At that point, we have a DOM range object. We now want to serialize it into a JSON data structure that can be used for persistent storage.

### Range serialization

The `convertRange()` function:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/selection.ts#L229-L373

...returns an `IRangeInfo` object which is a direct "translation" of the DOM range (unlike CFI which has its own indexing rules and representation conventions):

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/common/selection.ts#L13-L40

In a nutshell: CSS Selectors are used to reliably encode references to DOM elements in a web-friendly manner (i.e. not CFI). In the case of DOM text nodes, the direct parent is referenced, and the child offset is stored (zero-based integer index).

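To make the encoding described above more concrete, here is a minimal sketch of how one boundary point of a range might be serialized. It is not the `convertRange()` implementation: the field names (`elementCssSelector`, `childTextNodeIndex`, `offset`) and the injected `getCssSelector` function are assumptions made for illustration, and the linked source remains the reference for the actual `IRangeInfo` shape.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3; // same value as Node.TEXT_NODE

// Serializes one boundary point (container node + offset) of a DOM range.
// `getCssSelector` is an injected (hypothetical) function that returns a
// unique CSS selector for an element.
function serializeBoundary(container, offset, getCssSelector) {
    if (container.nodeType === ELEMENT_NODE) {
        return {
            elementCssSelector: getCssSelector(container),
            childTextNodeIndex: -1, // the container is the element itself
            offset,
        };
    }
    if (container.nodeType === TEXT_NODE && container.parentNode &&
        container.parentNode.nodeType === ELEMENT_NODE) {
        const parent = container.parentNode;
        return {
            elementCssSelector: getCssSelector(parent),
            // zero-based index of the text node among all child nodes of its parent
            childTextNodeIndex: Array.prototype.indexOf.call(parent.childNodes, container),
            offset,
        };
    }
    return undefined; // other node types are not handled in this sketch
}

// Serializes both ends of a range into a plain, JSON-friendly object.
function serializeRange(range, getCssSelector) {
    const start = serializeBoundary(range.startContainer, range.startOffset, getCssSelector);
    const end = serializeBoundary(range.endContainer, range.endOffset, getCssSelector);
    return start && end ? { start, end } : undefined;
}
```

The result contains only strings and integers, so it can be passed to `JSON.stringify()` for storage without further processing.
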
A CFI reference is also created for good measure, but this is not actually critical to the inner workings of the selection/highlight mechanisms.

Note that the `convertRange()` function takes two additional parameters (external functions) which are used to encode CSS selectors and CFI representations.

The CFI "generator" implementation is currently simplistic (elements only). Code excerpt (blacklist handling removed, for brevity):

```javascript
export const computeCFI = (node) => {

    if (node.nodeType !== Node.ELEMENT_NODE) {
        return undefined;
    }

    let cfi = "";

    // walk up the tree, prepending one CFI step per ancestor element
    let currentElement = node;
    while (currentElement.parentNode && currentElement.parentNode.nodeType === Node.ELEMENT_NODE) {
        const currentElementParentChildren = currentElement.parentNode.children;
        let currentElementIndex = -1;
        for (let i = 0; i < currentElementParentChildren.length; i++) {
            if (currentElement === currentElementParentChildren[i]) {
                currentElementIndex = i;
                break;
            }
        }
        if (currentElementIndex >= 0) {
            // CFI element steps are even numbers: 2, 4, 6, ...
            const cfiIndex = (currentElementIndex + 1) * 2;
            cfi = cfiIndex +
                (currentElement.id ? ("[" + currentElement.id + "]") : "") +
                (cfi.length ? ("/" + cfi) : "");
        }
        currentElement = currentElement.parentNode;
    }

    return "/" + cfi;
};
```

...however, there is a prototype (low development priority) CFI generator for character-level CFI ranges:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/selection.ts#L283-L361

The pseudo-canonical unique CSS Selectors are generated using a TypeScript port of an external library called "finder":

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/common/cssselector2.ts#L8

The original CSS Selectors algorithm was borrowed from the Chromium code, but the "finder" lib turned out to be a better choice (uniqueness is VERY important, along with the blacklisting capabilities):

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/common/cssselector.ts#L9

Naturally, range serialization must be bidirectional. Hence the `convertRangeInfo()` function, which performs the reverse transformation of `convertRange()`:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/selection.ts#L375-L416

As you can see, the conversion is straightforward and reliable, with no CFI edge-case juggling.

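The reverse direction can be sketched in a few lines as well. This is an illustration only, not the `convertRangeInfo()` code: the serialized shape used here (`start` / `end` objects with `elementCssSelector`, `childTextNodeIndex` and `offset`) is a hypothetical stand-in for the real `IRangeInfo` fields, and `createRange` is injected so the sketch does not depend on a global document.

```javascript
// Resolves one serialized boundary point back to a container node and offset.
// `root` is anything exposing querySelector() (a Document in practice).
function resolveBoundary(root, boundary) {
    const element = root.querySelector(boundary.elementCssSelector);
    if (!element) {
        return undefined; // the selector no longer matches anything
    }
    if (boundary.childTextNodeIndex < 0) {
        return { container: element, offset: boundary.offset };
    }
    const child = element.childNodes[boundary.childTextNodeIndex];
    return child ? { container: child, offset: boundary.offset } : undefined;
}

// Rebuilds a range from serialized data, or returns undefined when either
// boundary point cannot be resolved in the current document.
// createRange: e.g. () => document.createRange()
function deserializeRange(root, info, createRange) {
    const start = resolveBoundary(root, info.start);
    const end = resolveBoundary(root, info.end);
    if (!start || !end) {
        return undefined;
    }
    const range = createRange();
    range.setStart(start.container, start.offset);
    range.setEnd(end.container, end.offset);
    return range;
}
```

Returning `undefined` rather than throwing lets the caller decide what to do with stored ranges that no longer match the document, for example after the content has changed.
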
## Highlights

### Client rectangles (aggregated atomic bounding boxes)

Once a DOM range is obtained (either directly from a user selection, or from a deserialized range object), the "client rectangles" (bounding boxes) can be normalized to prevent overlap (which would otherwise result in rendering artefacts due to combined opacity factors), and to minimize their number (as this would otherwise impact performance).

The `getClientRectsNoOverlap()` function implements the necessary logic:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/common/rect-utils.ts#L35-L39

The differences between `range.getClientRects()` and `getClientRectsNoOverlap(range)` are very obvious. The former often generates many duplicates, overlaps, unnecessarily fragmented boxes, etc.

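As a rough illustration of this kind of normalization, the sketch below drops duplicate or nested rectangles and merges rectangles that touch on the same line. It is deliberately simpler than `getClientRectsNoOverlap()` and is not derived from it: partial overlaps are not split, the function names are hypothetical, and rectangles are assumed to be plain `{ left, top, width, height }` objects.

```javascript
// True when `inner` lies within `outer`, allowing a small tolerance in pixels.
function contains(outer, inner, tolerance) {
    return inner.left >= outer.left - tolerance &&
        inner.top >= outer.top - tolerance &&
        inner.left + inner.width <= outer.left + outer.width + tolerance &&
        inner.top + inner.height <= outer.top + outer.height + tolerance;
}

// Removes duplicates and rectangles fully covered by another rectangle.
function removeContainedRects(rects, tolerance = 1) {
    let kept = [];
    for (const rect of rects) {
        if (kept.some((other) => contains(other, rect, tolerance))) {
            continue; // duplicate of, or nested inside, a rectangle already kept
        }
        // drop rectangles kept earlier that the new one fully covers
        kept = kept.filter((other) => !contains(rect, other, tolerance));
        kept.push(rect);
    }
    return kept;
}

// Merges rectangles that sit on the same line and touch or overlap horizontally.
// Assumes rectangles on one line share (almost) the same top and height.
function mergeSameLineRects(rects, tolerance = 1) {
    const sorted = [...rects].sort((a, b) => a.top - b.top || a.left - b.left);
    const merged = [];
    for (const rect of sorted) {
        const last = merged[merged.length - 1];
        const sameLine = last !== undefined &&
            Math.abs(last.top - rect.top) <= tolerance &&
            Math.abs(last.height - rect.height) <= tolerance;
        if (sameLine && rect.left <= last.left + last.width + tolerance) {
            const right = Math.max(last.left + last.width, rect.left + rect.width);
            last.width = right - last.left; // extend the previous rectangle
        } else {
            merged.push({ ...rect }); // copy, so the input is left untouched
        }
    }
    return merged;
}
```

Calling `mergeSameLineRects(removeContainedRects(rects))` on the output of `range.getClientRects()` would typically yield one rectangle per line of text, which is usually far fewer elements to render.
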
### Rendering, CSS coordinates in paginated and scroll views

The `createHighlightDom()` function implements a particular strategy for encapsulating rendered highlights inside a hierarchy of DOM elements, including the individual client rectangles that make up the entire shape, as well as the surrounding bounding box (single rectangular shape):

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L353-L487

Note how `pointer-events` is set to `none` on DOM elements used to render highlights, so that neither bounding boxes nor aggregated client rectangles interfere with document-level user interaction (publication HTML). Yet, rendered highlights must react to pointing device / mouse hover and click. This is done using event delegation on the document's body:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L242-L252

...see the `processMouseEvent()` function:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L113-L233

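Because the highlight elements ignore pointer events, the delegated handler has to work out by itself which highlight, if any, lies under the pointer. A minimal hit-testing sketch is shown below. It is not the `processMouseEvent()` logic: the `highlights` array, its `rects` property and the function names are assumptions made for illustration.

```javascript
// Returns true when the point (x, y) falls inside the rectangle.
function rectContainsPoint(rect, x, y) {
    return x >= rect.left && x < rect.left + rect.width &&
        y >= rect.top && y < rect.top + rect.height;
}

// Finds the highlight under the pointer, or undefined when there is none.
// Later highlights are tested first, on the assumption that they are drawn on top.
// highlights: array of { id, rects: [{ left, top, width, height }, ...] }
function findHighlightAt(highlights, x, y) {
    for (let i = highlights.length - 1; i >= 0; i--) {
        const highlight = highlights[i];
        if (highlight.rects.some((rect) => rectContainsPoint(rect, x, y))) {
            return highlight;
        }
    }
    return undefined;
}
```

A single `mousemove` or `click` listener on `document.body` could call `findHighlightAt(highlights, ev.clientX, ev.clientY)`, provided the stored rectangles are expressed in the same coordinate space as the event coordinates.
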
Also note how CSS `position` must be `relative` on the HTML `body` element. This is critical for the coordinate system to work:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L383

Furthermore, `position` must be `fixed` for CSS columns, and `absolute` for both reflow scroll and fixed layout:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L439

In Electron/Chromium, depending on whether the document is rendered in scroll or column-paginated mode, there is an offset to take into account when computing coordinates for rendering:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L393-L406

There is also a scaling factor for fixed-layout documents:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L408

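The positioning rules above (position mode, offset, scale) could be combined along the following lines. This is only a sketch under assumptions: `toHighlightStyle` and its parameters are hypothetical, the way the offsets and the scale are obtained is left to the caller, and the sign convention (offset subtracted, then scaled) is a guess that should be checked against the linked source.

```javascript
// Converts a client rectangle into CSS values for a highlight element.
// `xOffset` / `yOffset` stand for the mode-dependent offset (scroll vs. paginated)
// and `scale` for the fixed-layout factor; all three are supplied by the caller.
function toHighlightStyle(clientRect, { paginated, xOffset = 0, yOffset = 0, scale = 1 }) {
    return {
        // "fixed" for CSS columns, "absolute" for scroll view and fixed layout
        position: paginated ? "fixed" : "absolute",
        left: `${(clientRect.left - xOffset) * scale}px`,
        top: `${(clientRect.top - yOffset) * scale}px`,
        width: `${clientRect.width * scale}px`,
        height: `${clientRect.height * scale}px`,
    };
}
```

The returned object only contains CSS property names and string values, so it could be applied to an element with `Object.assign(element.style, toHighlightStyle(rect, options))`.
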
Note that the `DEBUG_VISUALS` condition is checked in the `highlight.ts` code to determine when to render special styles for debugging the highlights. In production mode, this code is not used.

Finally, notice how highlights are given unique identifiers so that they can be managed by the navigator instance:

https://github.com/readium/r2-navigator-js/blob/59c593511502eb460b8252f807a6e11dfebb952e/src/electron/renderer/webview/highlight.ts#L335-L336

This way, rendered highlights can be destroyed when the document formatting changes (e.g. font size), which triggers a complete text reflow and therefore requires recreating the character-level highlights using updated coordinates (newly-generated client rectangles).

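The destroy-and-recreate cycle can be pictured with a small registry keyed by highlight identifier. The class below is a hypothetical sketch rather than the navigator's own bookkeeping: the counter-based identifiers and the `render` callback (which draws a highlight and returns a function that removes it) are assumptions, and the actual identifier scheme is the one at the link above.

```javascript
class HighlightRegistry {
    // render: (id, rangeInfo) => function that removes the rendered highlight
    constructor(render) {
        this.render = render;
        this.entries = new Map(); // id => { rangeInfo, dispose }
        this.nextId = 0;
    }

    // Renders a new highlight and returns its identifier.
    create(rangeInfo) {
        const id = `highlight-${this.nextId++}`;
        this.entries.set(id, { rangeInfo, dispose: this.render(id, rangeInfo) });
        return id;
    }

    // Removes one highlight, both from the registry and from the rendering.
    destroy(id) {
        const entry = this.entries.get(id);
        if (entry) {
            entry.dispose();
            this.entries.delete(id);
        }
    }

    // To be called after a reflow (e.g. font size change): every highlight is
    // removed and rendered again from its stored range data, keeping its id.
    recreateAll() {
        for (const [id, entry] of this.entries) {
            entry.dispose();
            entry.dispose = this.render(id, entry.rangeInfo);
        }
    }
}
```

Storing the serialized range alongside each identifier is what makes the recreation possible: the stale rectangles are discarded and new ones are computed from the same range data.
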