System Design I

§System Design I

eg. Design a url shortening service, like bit.ly

§Process
  • Step 1: [Scope the problem]

    • Use Cases

      1. Shortening: take a url => return a shorter url
      2. Redirection: take a short url => redirect to the original url
      3. Custom url: allow users to pick their available shortened URL
      4. High availability of the system

      Others:

      1. Analytics: to look at usage statistics
      2. Automatic link expiration: automatically expire link or stay alive forever
      3. Manual link removal: the user wants to remove the link
      4. UI vs API: design a web service or a website, or a URL shortening API
    • Constraints

      • the amount of traffic & data: how many the requests a mouth should handle and how many new URLs per month should the site handle => per second
      • approcimation eg. active users: Facebook, 1.3billion; Twitter, 650 million, new tweets made per day, 500 million
      1. => 1.5BN new tweets => All shortened URL per month 1.5BN => Site below the top 3: shorten 300M per monty => We: 100M new urls per month

      2. assume average lifetime: 1-2 weeks ~ 10 days

        & assume 1 click per day,

        & 20% got much more traffic than the rest 80% => 100 mln * 10 days * 1 click per day = 1 BN requests per month => 400+ requests per second

      3. 10% from shortening, 90% from redirection => 40 shortens, 360 redirections

      4. Total urls: 5 years X 12 months X 100M = 6 BN url

      5. 500 bytes per URL => total storage of urls over 5 yrs: 6BN X 500 bytes = 3TB

      6. log(62,6BN) => 6 bytes per hash => hases: 6BN X 6 bytes = 36GB

      7. New data written per second: 40 X (500+6) = 12K

      8. Data read per second: 360 X (500+6) = 180K

      => pic

  • Step 2: [Sketch up an] Abstract design

    sketch the important components and the connections between them

    1. Aplication Design
      1. Shortening service: generate on your hash; check if it’s in the datastore, not -> store a new mapping, or -> generate new hash …
      2. Redirection service:
      3. Custom URL: take a hash from user, check if it’s a shorten URL word, check if it’s in the datastore, not -> store, or -> return a message
    2. Data storage layer (keep track of the hash => url mappings)
      1. Acts like a hash table: stores new mappings and retrieve a value given a key

    hashed_url = convert_to_base62(md5(original_url + random_salt))[:6]

  • Step 3: [Think about the] bottlenecks

    eg. Perhaps your system needs a load balancer and many machines behind it to handle the user requests. Or maybe the data is so huge that you need to distribute your database on multiple machines. What are some of the downsides that occur from doing that? Is the database too slow and does it need some in-memory caching?

    1. traffic: not very challenging since it’s lightweight; think how we quickly look up URL
    2. lots of data
  • Step 4: [Address these bottlenecks by using the fundamentals principles of scalable system design] Scaling your abstract design