Designing Scalable Systems
Many systems, from small to large, now need globally unique identifiers (think of social security numbers or bank account numbers), so generating them is an important problem in distributed computing.
The simplest ID generator is a shared counter that is incremented on each call. Another naive solution is to derive the ID from the current timestamp. Neither works well in a distributed system, for the following reasons.
- Multiple independent servers can generate the same ID.
- A timestamp-based generator can return the same ID for two consecutive requests that fall within the same clock tick.
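Both failure modes are easy to reproduce. The sketch below is only an illustration: `CounterServer` and `timestamp_id` are hypothetical names, and the two "servers" are simply two objects in the same process.

```python
import itertools


class CounterServer:
    """A server that hands out IDs from its own local counter."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_id(self):
        return next(self._counter)


def timestamp_id(now_ms):
    """Naive ID: just the timestamp in milliseconds."""
    return now_ms


# Two independent servers with no coordination: both start at 1.
server_a, server_b = CounterServer(), CounterServer()
ids_a = [server_a.next_id() for _ in range(3)]
ids_b = [server_b.next_id() for _ in range(3)]
counter_collisions = sorted(set(ids_a) & set(ids_b))

# Two requests that arrive within the same millisecond get the same ID.
same_tick = 1_700_000_000_000  # an arbitrary example timestamp
timestamp_collision = timestamp_id(same_tick) == timestamp_id(same_tick)

print(counter_collisions)   # [1, 2, 3]
print(timestamp_collision)  # True
```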
Several large tech companies have built their own solutions. Let’s look at them.
A distributed ID generator must meet the following requirements.
- IDs cannot be arbitrarily long. Let’s assume they are 64 bits long.
- IDs increase with time.
- An ID generator should sequentially generate unique IDs across the cluster.
- Ability to generate over 10,000 unique IDs per second.
Universally Unique Identifiers (UUIDs) are a well-known concept that has been used in software for years. A UUID is a 128-bit number, usually written in hexadecimal, that is unique for practical purposes.
Four versions of UUIDs are in common use.
- UUID1 uses the MAC address and a timestamp to achieve effective uniqueness.
- UUID3 and UUID5 hash a namespace together with an application-provided text string to generate the UUID (UUID3 uses MD5, and UUID5 uses SHA-1).
- UUID4 uses a pseudo-random number generator to generate the UUID.
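As a quick illustration, Python's standard `uuid` module exposes these versions directly. The namespace and the name `"example.com"` are arbitrary choices for the example.

```python
import uuid

# Version 1: timestamp plus node (usually the MAC address).
u1 = uuid.uuid1()

# Versions 3 and 5: hash of a namespace UUID and a name, so the same
# inputs always produce the same UUID.
u3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
u5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

# Version 4: random bits.
u4 = uuid.uuid4()

for u in (u1, u3, u4, u5):
    # Print the version, the canonical hex form, and the size in bits.
    print(u.version, u, len(u.bytes) * 8)
```

Every value is 128 bits wide. Only versions 3 and 5 are reproducible from their inputs.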
UUID has the following advantages and disadvantages:
- Any server can independently generate a unique ID without any coordination.
- Effectively unique: duplicates are possible but extremely unlikely.
- They are large and complex to index.
- Generated IDs are not ordered.
- 128 bits may be too large for an ID in some use cases.
MySQL generates IDs using the AUTO_INCREMENT column attribute.
- Guarantees ordering and uniqueness.
- Simple to use: the database generates the ID for you.
- Not horizontally scalable since you only have one instance.
- Not fault-tolerant since you only have one instance to generate IDs.
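The pattern can be tried without a MySQL server. The sketch below uses SQLite from Python's standard library purely as a stand-in: SQLite spells the attribute `AUTOINCREMENT`, while MySQL spells it `AUTO_INCREMENT`. The `users` table and the `insert_user` helper are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in here; in MySQL the column would be
# declared with AUTO_INCREMENT instead of AUTOINCREMENT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)


def insert_user(name):
    """Insert a row and return the ID the database assigned to it."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


ids = [insert_user(n) for n in ("alice", "bob", "carol")]
print(ids)  # [1, 2, 3]
```

Every ID comes from the same database instance. That is why the IDs are ordered and unique, and also why this approach neither scales out nor survives the loss of that instance.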
MongoDB uses an ObjectId as the default value of the _id field of each document. It is generated when the document is created.
A MongoDB ObjectId is a 12-byte (96-bit) value, usually written in hexadecimal, that consists of a 4-byte epoch timestamp in seconds, a 3-byte machine identifier, a 2-byte process ID, and a 3-byte counter that starts from a random value.
This is smaller than a 128-bit UUID, but still larger than a typical MySQL auto-increment column (a 64-bit integer).
- Each application thread creates IDs on its own, which reduces contention and removes single points of failure in ID creation. The IDs remain time-sortable because the first component of the ID is a timestamp.
- To establish adequate uniqueness assurances, additional storage space (96 bits or more) is usually required.
- Some UUID types have no natural order and are random.
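To make the 12-byte layout concrete, here is a minimal sketch that packs the four fields described above. It is not MongoDB's implementation: `object_id_like` is a hypothetical helper, and deriving the machine identifier from a hash of the hostname is an assumption made for the example.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# The counter starts from a random 3-byte value.
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))
# 3-byte machine identifier derived from the hostname (an assumption).
_machine = hashlib.sha256(socket.gethostname().encode()).digest()[:3]


def object_id_like(now=None):
    """Build a 12-byte ID: timestamp | machine | pid | counter."""
    seconds = int(time.time() if now is None else now)
    ts = struct.pack(">I", seconds & 0xFFFFFFFF)             # 4 bytes, big-endian
    pid = struct.pack(">H", os.getpid() & 0xFFFF)            # 2 bytes
    count = (next(_counter) & 0xFFFFFF).to_bytes(3, "big")   # 3 bytes
    return ts + _machine + pid + count


oid = object_id_like()
print(oid.hex(), len(oid))  # 24 hex characters, 12 bytes
```

Because the big-endian timestamp comes first, IDs created in different seconds sort by creation time when compared as raw bytes.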
Flickr developed ticket servers to generate distributed primary keys. They rely on the auto-increment feature of a single, centralized database.
- It performs admirably in a sharded database and at scale.
- Short length, good indexing, and does not degrade query speed while dealing with massive datasets.
- Because all nodes rely on this table for the next ID, there is a single point of failure.
- Ticket servers can become a write bottleneck at scale: when the number of writes per second is very high, the ticket server is overloaded and performance drops.
- Extra machines are required to run the ticket servers.
- With a single database there is a single point of failure, and with several databases we can no longer guarantee that IDs are sortable by time.
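The trade-off in the last point can be simulated in a few lines. One common arrangement is to run two ticket servers with the same increment but different offsets, so that one issues odd IDs and the other even IDs. The sketch below assumes that arrangement; `TicketServer` is an in-memory stand-in for a database, and the round-robin rule in `get_ticket` is hypothetical.

```python
class TicketServer:
    """In-memory stand-in for one auto-increment database."""

    def __init__(self, offset, increment):
        self._next = offset
        self._increment = increment

    def next_id(self):
        ticket = self._next
        self._next += self._increment
        return ticket


# Two servers: one issues odd IDs, the other even IDs.
servers = [
    TicketServer(offset=1, increment=2),
    TicketServer(offset=2, increment=2),
]


def get_ticket(request_number):
    """Round-robin between servers (a hypothetical balancing rule)."""
    return servers[request_number % len(servers)].next_id()


balanced = [get_ticket(i) for i in range(4)]
print(balanced)  # [1, 2, 3, 4]

# If one server receives more traffic, IDs are still unique but are
# no longer issued in request order.
skewed = [servers[0].next_id(), servers[0].next_id(), servers[1].next_id()]
print(skewed)  # [5, 7, 6]
```

Uniqueness survives because the two servers never share a residue modulo 2. Ordering survives only while the load is perfectly balanced.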
A well-known ID generator is Snowflake, created by Twitter.
- Roughly ordered and unique.
- Can handle high throughput.
- No coordination between machines is needed when generating an ID.
- 64-bit number.
- Can horizontally scale by adding more machines.
- Still not perfectly ordered. If two different machines each generate an ID, it’s unclear which one was generated first.
- More complex to maintain, since Snowflake requires machines to do the work that a UUID library does in-process.
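A Snowflake-style generator fits in a short class. The sketch below assumes the commonly documented layout of 41 bits of millisecond timestamp, 10 bits of machine ID, and 12 bits of per-millisecond sequence, which this article does not spell out. `SnowflakeLike` is a hypothetical name and this is not Twitter's code. The epoch constant is the one usually cited for Snowflake, but any fixed instant in the past would work here.

```python
import threading
import time

EPOCH_MS = 1288834974657  # assumed custom epoch; any fixed past instant works
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _now_ms():
    return time.time_ns() // 1_000_000


class SnowflakeLike:
    def __init__(self, machine_id):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = _now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = _now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeLike(machine_id=1)
ids = [gen.next_id() for _ in range(5)]
print(ids)
```

IDs from one generator are strictly increasing. IDs from two generators created in the same millisecond are ordered by machine ID rather than by real arrival order, which is the "roughly ordered" caveat above.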
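Sonyflake is a distributed unique ID generator inspired by Twitter’s Snowflake. It uses a different bit assignment: 39 bits for time in units of 10 msec, 8 bits for a sequence number, and 16 bits for a machine ID.
As a result, Sonyflake has the following advantages and disadvantages: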
- The lifetime (174 years) is longer than that of Snowflake (69 years)
- It can run on more distributed machines (2¹⁶) than Snowflake (2¹⁰)
- It can generate 2⁸ IDs per 10 msec at most in a single machine/thread (slower than Snowflake)
However, if you want a higher generation rate on a single host, you can run multiple Sonyflake ID generators concurrently using goroutines, each with its own machine ID.
Since each of the implementations above has drawbacks, you can also write a custom ID generator to meet your specific requirements.
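The lifetime and throughput figures follow from the bit widths. The arithmetic below assumes Sonyflake uses 39 bits of time in 10 msec units with an 8-bit sequence, and Snowflake uses 41 bits of time in 1 msec units with a 12-bit sequence. It approximates a year as 365 days.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000

# Snowflake (assumed): 41 bits of time in 1 ms units.
snowflake_years = (2 ** 41) * 1 / MS_PER_YEAR
# Sonyflake (assumed): 39 bits of time in 10 ms units.
sonyflake_years = (2 ** 39) * 10 / MS_PER_YEAR

# Peak IDs per second from a single generator.
snowflake_rate = (2 ** 12) * 1000  # 12-bit sequence per 1 ms
sonyflake_rate = (2 ** 8) * 100    # 8-bit sequence per 10 ms

print(int(snowflake_years), int(sonyflake_years))  # 69 174
print(snowflake_rate, sonyflake_rate)              # 4096000 25600
```

A single Sonyflake generator tops out at 25,600 IDs per second, which still clears the 10,000 IDs per second requirement stated earlier.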