# CUID

Collision-resistant ids optimized for horizontal scaling and sequential lookup performance.

Currently available for Node, browsers, Ruby, and .NET (see ports below; more ports are welcome).

`cuid()` returns a short random string with some collision-busting measures. The result is safe to use as an HTML element ID and as a unique key for server-side record lookups.

## Example

Node style. For the stand-alone browser version, just leave off the `require` line or use component.io.

```js
var cuid = require('cuid');
console.log( cuid() );

// Example output (a new value on every call):
// ch72gsb320000udocl363eofy
```

## Installing

```
$ npm install --save cuid
```

Install with [component(1)](http://component.io):

```
$ component install dilvie/cuid
```


### Broken down

**c - h72gsb32 - 0000 - udoc - l363eofy**

The groups, in order, are:

* 'c' - identifies this as a cuid and allows you to use it in HTML element IDs. The fixed value helps keep the ids sequential.
* Timestamp
* Counter - a single process might generate the same random string more than once. The weaker the pseudo-random source, the higher the probability, and the problem gets worse as processors get faster. The counter rolls over if its value gets too big.
* Client fingerprint
* Pseudo random (`Math.random()` in JavaScript)
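To make the breakdown concrete, here is a minimal sketch of how the five groups could be concatenated. It is not the library's source. The base-36 encoding, the 4-character counter and fingerprint blocks, the two 4-character random blocks, and every name in it (`sketchCuid`, `nextCount`, `randomBlock`, `pad`) are assumptions chosen to match the example id above. The fingerprint is simply passed in.

```js
// Minimal sketch, not the cuid source. Assumed: base-36 encoding,
// 4-char counter/fingerprint blocks, and an 8-char random group.
const BASE = 36;
const BLOCK_SIZE = 4;
const DISCRETE_VALUES = Math.pow(BASE, BLOCK_SIZE); // 36^4 = 1679616

// Left-pad with zeros, keeping the last `size` characters.
function pad(str, size) {
  const padded = '000000000' + str;
  return padded.slice(padded.length - size);
}

let counter = 0;

// Hypothetical counter: returns the current value, then increments,
// rolling over to 0 before it would need more than BLOCK_SIZE digits.
function nextCount() {
  counter = counter < DISCRETE_VALUES ? counter : 0;
  counter += 1;
  return counter - 1;
}

// One block of pseudo-random data from Math.random().
function randomBlock() {
  return pad(Math.floor(Math.random() * DISCRETE_VALUES).toString(BASE), BLOCK_SIZE);
}

// Hypothetical generator: 'c' + timestamp + counter + fingerprint + random.
function sketchCuid(fingerprint) {
  const letter = 'c';
  const timestamp = Date.now().toString(BASE);
  const count = pad(nextCount().toString(BASE), BLOCK_SIZE);
  const random = randomBlock() + randomBlock();
  return letter + timestamp + count + pad(fingerprint, BLOCK_SIZE) + random;
}

console.log(sketchCuid('udoc'));
```

In this sketch the counter only separates ids created by the same process in the same millisecond; the fingerprint and random groups are what keep different hosts apart.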

## Fingerprints

**In browsers**, the first chars are obtained from the user agent string (which is fairly unique) and the number of supported mimeTypes (which is also fairly unique, except in IE, which always returns 0).
That string is concatenated with a count of the variables in the global scope (which is also fairly unique), and the result is trimmed to 4 chars.

**In Node**, the first two chars are extracted from `process.pid`. The next two chars are extracted from the hostname.
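One way the two fingerprint recipes above might look in code is sketched below. The function names are hypothetical, and several details are assumptions rather than the library's behavior. The browser version uses the user agent's *length*, keeps the first four characters of the result, and takes its inputs as parameters so it can run outside a browser. The Node version folds the hostname into two characters by summing its character codes.

```js
const os = require('os');

// Browser-style sketch. The inputs stand in for navigator.userAgent,
// navigator.mimeTypes.length, and a count of global variables.
function browserFingerprint(userAgent, mimeTypeCount, globalCount) {
  const head = (mimeTypeCount + userAgent.length).toString(36);
  const tail = globalCount.toString(36);
  // Assumption: "trimmed to 4 chars" keeps the first four, left-padding short results.
  return (head + tail).padStart(4, '0').slice(0, 4);
}

// Node-style sketch: two base-36 chars from the pid, two from the hostname.
function nodeFingerprint(pid = process.pid, hostname = os.hostname()) {
  const pidPart = pid.toString(36).padStart(2, '0').slice(-2);
  // Assumed hashing step: fold the hostname into a number by summing char codes.
  let sum = hostname.length + 36;
  for (const ch of hostname) sum += ch.charCodeAt(0);
  const hostPart = sum.toString(36).padStart(2, '0').slice(-2);
  return pidPart + hostPart;
}

console.log(nodeFingerprint());
```

In a browser you might call `browserFingerprint(navigator.userAgent, navigator.mimeTypes.length, Object.keys(window).length)`. `Object.keys(window)` counts only the global object's own enumerable properties, so it approximates "variables in the global scope".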


## Motivation

Modern web applications have different requirements than applications from just a few years ago. Our modern unique identifiers have a stricter list of requirements that cannot all be satisfied by any existing version of the GUID/UUID specifications:

### Horizontal scalability

Today's applications don't run on a single machine.

Applications might need to support online/offline capability, which means we need a way for clients on different hosts to generate ids that won't collide with ids generated by other hosts, even if they're not connected to the network.

Many pseudo-random generators use the time in ms as their seed. As a result, random IDs generated in separate processes (such as cloned virtual machines or client browsers) lack sufficient entropy to guarantee against collisions. Application developers have reported v4 UUID collisions causing problems in their applications when ID generation is distributed across so many machines that lots of IDs are generated in the same millisecond.

Each new client exponentially increases the chance of collision in the same way that each new character in a random string exponentially reduces the chance of collision. Successful apps scale at hundreds or thousands of new clients per day, so fighting the lack of entropy by adding random characters is a losing strategy.

Because of the nature of this problem, it's possible to build an app from the ground up and scale it to a million users before the problem rears its head. By the time you notice it (when your peak-hour use requires dozens of ids to be created per ms), if your db doesn't have unique constraints on the id because you thought your guids were safe, you're in a world of hurt. Your users start to see data that doesn't belong to them because the db just returns the first ID match it finds.

Alternatively, suppose you've played it safe and only let your database create ids. Writes only happen on a master database, and load is spread out over read replicas. But under this kind of strain, you have to start scaling your database writes horizontally, too. Then either your application starts to crawl (if the db is smart enough to guarantee unique ids between write hosts), or you start getting id collisions between different db hosts, so your write hosts don't agree about which ids represent which data.
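The seeding problem described in this section is easy to demonstrate. The sketch below uses a toy linear congruential generator (the constants 1664525 and 1013904223 are the well-known Numerical Recipes values, not what any JavaScript engine uses) seeded with a millisecond timestamp. Two hosts that seed in the same millisecond produce identical "random" ids. `makeLcg` and `randomId` are hypothetical names.

```js
// Toy linear congruential generator (Numerical Recipes constants). It only
// illustrates time-based seeding; JavaScript engines use other algorithms.
function makeLcg(seed) {
  let state = seed >>> 0; // keep the low 32 bits of the seed
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // float in [0, 1)
  };
}

// Build an id of `length` base-36 characters from a generator.
function randomId(rng, length = 8) {
  let id = '';
  for (let i = 0; i < length; i++) {
    id += Math.floor(rng() * 36).toString(36);
  }
  return id;
}

// Two hosts (say, cloned VMs) that seed from the clock in the same millisecond.
const sameMillisecond = 1347577392926;
const hostA = makeLcg(sameMillisecond);
const hostB = makeLcg(sameMillisecond);

console.log(randomId(hostA) === randomId(hostB)); // true: identical "random" ids
```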


### Performance

Because entities might need to be generated in high-performance loops, id generation should be fast. That means no waiting around for asynchronous entropy pool requests or cross-process/cross-network communication; in the browser, that kind of waiting slows performance to the point of impracticality. All sources of entropy need to be fast enough for synchronous access.

Even worse, when the database is the only guarantee that ids are unique, clients are forced to send incomplete records to the database and wait for a network round trip before they can use the ids in any algorithm. Forget about fast client performance; it simply isn't possible.

That situation has caused some clients to create ids that are only usable in a single client session (such as an in-memory counter). When the database returns the real id, the client has to do some juggling logic to swap out the id being used.

If client-side ID generation were stronger, the chances of collision would be much smaller, and the client could send complete records to the db for insertion without waiting for a full round trip to finish before using the ID.
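The client-side alternative described above can be sketched as follows, under stated assumptions. `createRecord` is a hypothetical function, and `generateId` (for example `cuid`), `saveRecord` (for example an HTTP POST), and `localStore` are injected stand-ins rather than any particular library's API. The point is the order of operations: the complete record, id included, exists and is usable before the network call finishes.

```js
// Hypothetical sketch: build a complete record on the client, use its id
// immediately, and persist it in the background.
function createRecord(fields, { generateId, saveRecord, localStore }) {
  const record = { id: generateId(), ...fields };
  // The id is usable right away: no round trip, no temporary id to swap later.
  localStore.set(record.id, record);
  // Persist asynchronously; the server stores the same id the client already uses.
  const saved = Promise.resolve(saveRecord(record));
  return { record, saved };
}

// Usage with simple stand-ins:
const localStore = new Map();
let n = 0;
const { record } = createRecord(
  { title: 'Buy milk' },
  {
    generateId: () => 'cexample' + n++, // stand-in for cuid()
    saveRecord: async (r) => r,          // stand-in for a network call
    localStore,
  }
);
console.log(localStore.get(record.id).title); // Buy milk
```

Because the server stores the same id the client has been using all along, there is no temporary id to swap out when the save completes.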


#### Sequential IDs

[Sequential ids can enhance performance](http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid) for database transactions for a variety of reasons. Ids should be suitable for use as high-performance database primary keys. Pure pseudo-random variants don't meet this requirement.
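Why does a leading timestamp make ids roughly sequential? A small sketch, assuming (as in the breakdown above) that the timestamp group is a base-36 millisecond count. `1347577392926` is the value the example id's `h72gsb32` group decodes to, and `timestampGroup` is a hypothetical name. The ordering argument only holds while timestamps have the same number of digits; eight base-36 digits cover roughly 1972 through 2059.

```js
// Encode a millisecond timestamp in base 36, as the breakdown above suggests.
function timestampGroup(ms) {
  return ms.toString(36);
}

const earlier = 'c' + timestampGroup(1347577392926);
const later = 'c' + timestampGroup(1347577392927);

// For equal-length base-36 strings, string order matches numeric order,
// because '0'-'9' sort before 'a'-'z' in UTF-16 code units.
console.log(earlier < later); // true
console.log([later, earlier].sort()[0] === earlier); // true
```

Ids generated by different hosts in the same millisecond still interleave by counter and fingerprint, so the result is roughly sequential rather than strictly ordered.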


#### Tiny

Somewhat related to performance, an algorithm to generate an ID should require only a tiny implementation. This is especially important for thick-client JavaScript applications.


### Security

Client-visible ids often need enough random data that it becomes practically impossible to guess valid IDs from an existing, known id. That makes simple sequential ids unusable as client-side generated database keys.


#### Portability

Most stronger forms of the UUID/GUID algorithms require access to OS services that are not available in browsers, which makes them impossible to implement there as specified.


# Features of cuids

## Scalable

Because of the timestamp and the counter, cuid is really good at generating unique IDs on one machine.

Because of the fingerprints, cuid is also good at preventing collisions between multiple clients.


## Fast

Because cuids can be safely generated synchronously, you can generate a lot of them quickly. Since you're unlikely to get a collision, you don't have to wait for a round trip to the database just to insert a complete record.

Because cuids are sequential, database primary key performance gets a significant boost.

Weighing in at less than 1 KB minified and compressed, the cuid source should be suitable for even the lightest-weight mobile clients. It will not have a significant impact on the download time of your app, particularly if you follow best practices and concatenate it with the rest of your code to avoid the latency hit of an extra file request.

## Secure

Cuids contain enough random data and moving parts to make guessing another id from an existing one practically impossible. This also opens up a way to detect abuse attempts: if a client requests large blocks of ids that don't exist, there's a good chance that the client is malicious or trying to get at data that doesn't belong to it.
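The abuse-detection idea could start as simply as the sketch below, which counts failed lookups per client and flags a client once the count reaches a threshold. `MissTracker`, `lookup`, the in-memory `Map`, and the threshold values are all hypothetical. Counts never expire here, whereas a real deployment would likely use a time window and shared storage.

```js
// Hypothetical helper: counts lookups of ids that don't exist, per client,
// and flags a client once its miss count reaches a threshold.
// Simplification: counts never expire; a real system would use a time window.
class MissTracker {
  constructor(threshold = 100) {
    this.threshold = threshold;
    this.misses = new Map(); // clientId -> number of failed lookups
  }

  // Record one lookup. Returns true if the client should be flagged.
  recordLookup(clientId, found) {
    if (found) return false;
    const count = (this.misses.get(clientId) || 0) + 1;
    this.misses.set(clientId, count);
    return count >= this.threshold;
  }
}

// Usage: consult the tracker whenever a lookup by id comes back empty.
const records = new Map([['ch72gsb320000udocl363eofy', { owner: 'alice' }]]);
const tracker = new MissTracker(3);

function lookup(clientId, id) {
  const record = records.get(id);
  if (tracker.recordLookup(clientId, record !== undefined)) {
    console.log(`flagging ${clientId} for review`);
  }
  return record;
}
```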


## Portable

The only part of a cuid that might be hard to replicate between different clients is the fingerprint. It's easy to override the fingerprint method in order to port cuid to different clients. Cuid already works stand-alone in browsers, as a Node module, or with Applitude, so you can use cuid wherever you need it.

The algorithm is also easy to reproduce in other languages. You are encouraged to port it to whatever language you see fit.
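A port to a new platform mainly needs a replacement fingerprint. The sketch below assumes the platform exposes some stable per-device string and folds it into four base-36 characters with a simple multiply-by-31 string hash. Both the hash and the name `fingerprintFromString` are illustrative choices, not part of cuid.

```js
// Hypothetical replacement fingerprint for a platform without process.pid or
// a hostname: fold any stable per-device string into 4 base-36 characters.
function fingerprintFromString(deviceInfo) {
  let hash = 0;
  for (const ch of deviceInfo) {
    // Simple multiply-by-31 string hash, kept in the unsigned 32-bit range.
    hash = (Math.imul(hash, 31) + ch.charCodeAt(0)) >>> 0;
  }
  // 36^4 = 1679616 distinct 4-character values.
  return (hash % 1679616).toString(36).padStart(4, '0');
}

// 'device-1234' is a made-up example of a per-device identifier.
console.log(fingerprintFromString('device-1234'));
```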

### Ports:

* JavaScript (Browsers, Browsers + [Applitude](https://github.com/dilvie/applitude), Node)
* [CUID for Ruby](https://github.com/iyshannon/cuid) - [Ian Shannon](https://github.com/iyshannon)
* [CUID for .NET](https://github.com/moonpyk/ncuid) - [Clément Bourgeois](https://github.com/moonpyk)


# Short URLs

Need a smaller ID? `cuid.slug()` is for you. Weighing in at only 8 characters, `.slug()` is a great solution for short URLs.

Just be aware:

* They're less likely to be sequential. Stick to full cuids for database lookups, if possible.

* They have less random data, less room for the counter, and less room for the fingerprint, which makes them more likely to collide or be guessed, especially as CPU speeds increase.

Don't use them if guessing an existing ID would expose confidential information to malicious users. For example, if you're providing a service like Google Drive or Dropbox, which hosts users' private files, I would prefer `cuid()` over `.slug()` for private collaboration URLs, for security reasons.
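The trade-off is easy to quantify. Assuming a hypothetical 8-character slug split evenly into 2-character groups (the real `.slug()` layout may differ), each group can hold only 36^2 values, compared with 36^4 for the full cuid's counter and 36^8 for its random data, as in the example breakdown above:

```js
// Hypothetical 8-character slug layout (the real .slug() format may differ):
// 2 characters each for timestamp, counter, fingerprint, and random data.
const slugLayout = { timestamp: 2, counter: 2, fingerprint: 2, random: 2 };
// Group sizes of a full cuid, per the example breakdown near the top.
const cuidLayout = { counter: 4, fingerprint: 4, random: 8 };

// Number of distinct values a group of n base-36 characters can hold.
const capacity = (n) => 36 ** n;

console.log(capacity(slugLayout.counter)); // 1296: the counter rolls over every 1,296 ids
console.log(capacity(cuidLayout.counter)); // 1679616
console.log(capacity(slugLayout.random));  // 1296 possible random values
console.log(capacity(cuidLayout.random));  // 2821109907456
```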


# Questions

### Is this a replacement for GUID / UUID?

No. Cuid is great for the use case it was designed for: generating ids for applications that need to scale past tens or hundreds of new entities per second across multiple id-generating hosts. In other words, if you're building a web or mobile app and want the assurance that your choice of id standard isn't going to slow you down, cuid is for you.

However, if you need to obscure the order of id generation, or if it's potentially problematic for others to know the precise time that an id was generated, you'll want to go with something different.

Cuids should not be considered cryptographically secure (but neither should most GUID algorithms; make sure yours uses a crypto library before you rely on it).
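To see why timing information is exposed, note that anyone holding an id can read the timestamp back out. This sketch assumes, as in the breakdown above, that the eight characters after the leading `c` are a base-36 millisecond timestamp. `approxCreationTime` is a hypothetical name, and since the groupings are not an API contract (see "Why are there no dashes?"), treat it strictly as an illustration of the exposure, not as something to build on.

```js
// Hypothetical helper: assumes the 8 characters after the leading 'c'
// are a base-36 millisecond timestamp, as in the breakdown above.
function approxCreationTime(id) {
  return new Date(parseInt(id.slice(1, 9), 36));
}

// Anyone holding the example id can recover roughly when it was created.
console.log(approxCreationTime('ch72gsb320000udocl363eofy').toISOString());
```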


### Why don't you use sha1, md5, etc.?

A SHA-1 implementation in JavaScript is about 300 lines by itself, uncompressed, and it comes at considerable performance cost while providing little benefit. For contrast, the cuid source code weighs in at less than 100 lines of code, uncompressed. MD5 has similar issues.


### Why are there no dashes?

Almost all web-technology identifiers allow numbers and letters (though some require you to begin with a letter, hence the 'c' at the beginning of a cuid). However, dashes are not allowed in some identifier names, so removing the dashes between groups makes the ids more portable. Also, identifier groupings should not be relied on in your application. Removing them should discourage application developers from trying to extract data from a cuid.

The cuid specification should not be considered an API contract. Code that relies on the groupings as laid out here should be considered brittle and should not be used in production.
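As a quick illustration of the portability point, the check below tests for a conservative identifier shape: a leading letter followed only by letters and digits. The regex is an assumed lowest common denominator, not the exact rule of HTML, CSS, or JavaScript, and the dashed and hex strings are made-up counterexamples.

```js
// Conservative identifier shape: starts with a letter, then letters/digits only.
const isPlainIdentifier = (s) => /^[A-Za-z][A-Za-z0-9]*$/.test(s);

console.log(isPlainIdentifier('ch72gsb320000udocl363eofy'));        // true
console.log(isPlainIdentifier('c-h72gsb32-0000-udoc-l363eofy'));    // false (dashes)
console.log(isPlainIdentifier('550e8400e29b41d4a716446655440000')); // false (leading digit)
```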


### [Submit a Question or Comment](https://github.com/dilvie/cuid/issues/new?title=Question)


### Credit

Created by Eric Elliott, author of ["Programming JavaScript Applications" (O'Reilly)](http://www.tkqlhce.com/click-7037282-11260198?url=http%3A%2F%2Fshop.oreilly.com%2Fproduct%2F0636920024231.do%3Fcmp%3Daf-code-book-product_cj_9781449338220_%7BPID%7D&cjsku=0636920024231)

Thanks to [Tout](http://tout.com/) for support and production testing.