How Do Tech Giants Design and Implement a Distributed ID Generator | by Anh Dang | Geek Culture | Oct, 2021


Designing Scalable Systems

Anh Dang
Photo by Noah Näf on Unsplash

Systems of every size now need globally unique identifiers, which makes ID generation an important problem in distributed computing. Familiar examples include social security numbers and bank account numbers.

In general, an ID generator can be implemented simply as a shared counter that is incremented on each call. Another naive solution is to generate an ID as a function of the timestamp. Both are poor solutions in a distributed setting because of the following problems.

  • Multiple independent servers can generate the same ID.
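The collision problem can be illustrated with a minimal sketch. The names `timestamp_id` and `LocalCounter` and the clock reading are hypothetical, chosen only for illustration, and the sketch assumes each server keeps its own clock or counter without coordinating with the others.

```python
def timestamp_id(now_ms: int) -> int:
    """Naive generator: the ID is just the current time in milliseconds."""
    return now_ms


class LocalCounter:
    """Naive generator: a per-server counter that is not shared across servers."""

    def __init__(self) -> None:
        self.value = 0

    def next_id(self) -> int:
        self.value += 1
        return self.value


# Two independent servers handling a request in the same millisecond.
same_instant_ms = 1_633_046_400_000  # hypothetical clock reading
id_from_a = timestamp_id(same_instant_ms)
id_from_b = timestamp_id(same_instant_ms)
timestamp_collision = id_from_a == id_from_b

# Two independent servers, each starting its own counter from zero.
server_a, server_b = LocalCounter(), LocalCounter()
counter_collision = server_a.next_id() == server_b.next_id()

print(timestamp_collision, counter_collision)  # True True
```

In both cases the two servers hand out the same value, so some extra ingredient (a machine identifier, randomness, or a central authority) is needed to keep IDs unique.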

Tech giants have implemented their own solutions. Let’s look at them now!

A distributed ID generator must meet the following requirements.

  • IDs can’t be of arbitrary length. Let’s assume they are 64 bits long.

UUID

Universally Unique Identifiers (UUIDs) are a well-known concept that has been used in software for years. UUIDs are 128-bit numbers, usually written in hexadecimal, that are unique for all practical purposes.

There are several versions of UUIDs.

  • UUID1 uses the MAC address and a timestamp to achieve effective uniqueness.
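As a short illustration, Python's standard `uuid` module can generate UUIDs without any coordination between servers. Version 4 (random) is shown alongside version 1 purely as a second example; it is not one of the items listed above.

```python
import uuid

# Version 1: built from the host's MAC address (or a random node ID when
# no MAC address is available) and a timestamp.
u1 = uuid.uuid1()

# Version 4: built from random bits.
u4 = uuid.uuid4()

print(u1.version, u4.version)  # 1 4
print(len(u4.hex) * 4)         # 128 (32 hex digits, 4 bits each)
print(u4.int < 2**128)         # True
```

The 128-bit size is visible here: it is twice the 64-bit budget assumed in the requirements.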

UUID has the following advantages and disadvantages:

Advantages

  • Any server can independently generate a unique ID without any coordination.

Disadvantages

  • They are large and complex to index.

MySQL

MySQL generates IDs using AUTO_INCREMENT.

Advantages

  • Guarantees ordering and uniqueness.

Disadvantages

  • Not horizontally scalable since you only have one instance.
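The behaviour of a single auto-incrementing database can be sketched without a MySQL server. The example below is a stand-in, not MySQL itself: it uses SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` from Python's standard library, and the `users` table is a hypothetical name.

```python
import sqlite3

# In-memory SQLite database standing in for a single MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

ids = []
for name in ("alice", "bob", "carol"):
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    ids.append(cursor.lastrowid)  # ID assigned by the database

print(ids)  # [1, 2, 3]
conn.close()
```

The IDs are ordered and unique precisely because one database assigns all of them, which is also why the approach does not scale horizontally.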

MongoDB

MongoDB uses ObjectIds as the default value of the _id field of each document; the value is generated when the document is created.

MongoDB ObjectIds are 12-byte (96-bit) values, usually shown as hexadecimal strings. They consist of a 4-byte epoch timestamp in seconds, a 3-byte machine identifier, a 2-byte process ID, and a 3-byte counter that starts at a random value.

This is smaller than a 128-bit UUID. However, it is still larger than what we would typically find in a single MySQL auto-increment column (a 64-bit integer).
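The 12-byte layout described above can be sketched by packing the four fields by hand. This is an illustration of the layout only, not MongoDB's driver code; `make_object_id` and its inputs are hypothetical, and a real application would use the driver's own ObjectId type.

```python
import os
import struct
import time


def make_object_id(timestamp: int, machine: bytes, pid: int, counter: int) -> bytes:
    """Pack the four fields into 12 bytes, big-endian, timestamp first."""
    if len(machine) != 3:
        raise ValueError("machine identifier must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp)               # 4-byte seconds since epoch
        + machine                                  # 3-byte machine identifier
        + struct.pack(">H", pid & 0xFFFF)          # 2-byte process ID
        + (counter & 0xFFFFFF).to_bytes(3, "big")  # 3-byte counter
    )


# Hypothetical inputs for illustration.
machine = b"\x01\x02\x03"
counter_start = int.from_bytes(os.urandom(3), "big")  # counter starts at a random value

oid = make_object_id(int(time.time()), machine, os.getpid(), counter_start)
print(len(oid), oid.hex())  # 12 bytes, shown as 24 hex characters
```

Because the timestamp occupies the leading bytes, comparing two such values byte by byte orders them by creation second first.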

Advantages

  • Each application thread creates IDs independently, reducing points of failure and contention in ID creation. The IDs remain time-sortable because the first component of the ID is a timestamp.

Flickr

Flickr developed ticket servers to generate distributed primary keys. They use a centralized auto-increment feature in a single database.

Advantages

  • It performs admirably in a sharded database and at scale.

Disadvantages

  • Because all nodes rely on this table for the next ID, there is a single point of failure.
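The ticket-server idea can be sketched as one central counter that every shard asks for its next primary key. This is an in-memory stand-in under stated assumptions, not Flickr's implementation: `TicketServer` and `shard_worker` are hypothetical names, and a lock plays the role of the database serializing inserts.

```python
import threading


class TicketServer:
    """In-memory stand-in for a central auto-increment table."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self) -> int:
        # The lock plays the role of the database serializing inserts.
        with self._lock:
            self._last_id += 1
            return self._last_id


def shard_worker(server: TicketServer, count: int, out: list) -> None:
    """A shard asks the central server for each new primary key."""
    for _ in range(count):
        out.append(server.next_id())


server = TicketServer()
results = [[] for _ in range(4)]  # one list per hypothetical shard
threads = [
    threading.Thread(target=shard_worker, args=(server, 1000, results[i]))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_ids = [i for shard in results for i in shard]
print(len(all_ids), len(set(all_ids)))  # 4000 4000
```

Every shard gets unique keys, but every request passes through the single `server` object, which mirrors the single point of failure noted above.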

Twitter

There’s a famous ID generator called Snowflake, created by Twitter.

Advantages

  • Roughly ordered and unique.

  • Can handle high throughput.

  • No need for machine coordination.

  • 64-bit number.

  • Can horizontally scale by adding more machines.

Disadvantages

  • Still not perfectly ordered. If two different machines generate two IDs, it’s unclear which was generated first.

  • It adds maintenance complexity, since Snowflake requires dedicated machines to generate IDs, whereas UUIDs need only a library.
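A Snowflake-style generator can be sketched as follows. The sketch assumes the commonly described layout of a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence. The class name `SnowflakeLike` and the custom epoch are hypothetical, and this is not Twitter's code.

```python
import threading
import time

EPOCH_MS = 1_609_459_200_000  # hypothetical custom epoch: 2021-01-01 UTC
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLike:
    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = time.time_ns() // 1_000_000
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence
            )


gen_a = SnowflakeLike(machine_id=1)
gen_b = SnowflakeLike(machine_id=2)
ids = [gen_a.next_id() for _ in range(5)] + [gen_b.next_id() for _ in range(5)]
print(len(set(ids)) == len(ids))  # True
```

IDs from one generator increase over time, but two generators that produce IDs in the same millisecond are ordered only by machine ID, which is the "roughly ordered" caveat in the list above.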

Sony

Sonyflake is a distributed unique ID generator inspired by Twitter’s Snowflake.

Compared with Snowflake, Sonyflake has the following advantages and disadvantages:

  • The lifetime (174 years) is longer than that of Snowflake (69 years).

However, if you want a higher generation rate on a single host, you can easily run multiple Sonyflake ID generators concurrently using goroutines.
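The lifetime figures can be checked with a little arithmetic. The calculation assumes the documented bit layouts: Snowflake uses a 41-bit timestamp in 1 ms units, while Sonyflake uses a 39-bit timestamp in 10 ms units and an 8-bit sequence number. Leap years are ignored.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # ignoring leap years

# Snowflake: 41-bit timestamp counted in 1 ms units.
snowflake_years = (2**41 * 1) / MS_PER_YEAR

# Sonyflake: 39-bit timestamp counted in 10 ms units.
sonyflake_years = (2**39 * 10) / MS_PER_YEAR

# Sonyflake: 8-bit sequence number per 10 ms unit, per generator.
sonyflake_ids_per_second = 2**8 * (1000 // 10)

print(int(snowflake_years), int(sonyflake_years))  # 69 174
print(sonyflake_ids_per_second)                    # 25600
```

The coarser 10 ms time unit is what buys the longer lifetime, and the smaller sequence field is why a single generator's rate is limited.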

Since each of the implementations above has some disadvantages, you can write a custom ID generator to meet your specific requirements.

Easy, right?
