X4_Design Tiny URL | Skyline Problem

§Design A Tiny URL system

§System Design
  • is always about trade-off && weighing pros and cons && constraints
  • is also about sacrificing what is not needed && optimizing what is most needed
§Problem
  • Ask requirement first: uique URL <-> unique alias

  • Ask Read or Write heavy: eg. Twitter: read heavy

  • Constraints

    1. Amount of data to store
    2. Amount of traffic — QPS or the query-per-second rate(read)
    3. How many new URLs per second(write)
    4. Peak time / Hot spot(->Region)
    5. Projection (10X times) Scalability
  • Numbers, eg.

    1. Daily active users: 1,000,000
    2. Insert: Per day: 1,000,000 * 1%(function usage) * 10(function frequency) = 100,000
    3. Loop up: Per day: 1,000,000 * 100%(function usage) * 3(function frequency) = 3,000,000
  • URLs: (lower case alphabet 26+upper case alphabet 26+ digit10)^length(=6) ~= 56.8 billions

  • Design 接口: REST APIs

    • Status Code: 1xx Informational responses; 2XX Success, 200 success, 201 created; 3XX Redirection, 301 moved permanently; 4xx Client errors; 5xx Server errors
    • Google URL Shortener, json
  • 存储: SQL or NoSQL?

    1. 是否需要丰富的 SQL Query? NoSQL的SQL Query丰富度不如SQL,不过目前差距正在缩小,比如 MongoDB。此题不需要,该因素可以忽略。
    2. 是否追求效率?大多数Web Framework与SQL数据库兼容得很好(自带ORM(Object,-> SQL dialect),意味着可以少些很多代码。此题没太多业务逻辑,该因素可以忽略。
    3. 是否需要支持Transaction? NoSQL不支持Transaction,但是可以在业务层做处理保证。
    4. 对QPS要求高不高?
      NoSQL性能高,MondoDB可以到10k级别,MySQL只能在k这个级别。
    5. 对Scalability的要求有多高?sharding,replica。
    • Number

      Average size of longURL = 100 bytes

      Average size of shortURL = 4 bytes(int)

      State = 4 byte

      Daily new URL = 3,000,000 * 108 = 324MB => Yearly new URL = 324*365 = 118GB

    • NoSQL,

      1. Randomly generate 6 characters containing [A-Z, a-z, 0-9]
      2. Use consistent hashing for partition: Partition By Key Or Consistent Hashing; Vitural nodes; Replication for high avaialbility
      3. Use NoSQL’s distributed key-value store
      4. Eventually consistent: quorum; others return checksum(eg. SHA1), if it is same with data -> no need to lookup
      5. Short URL already exists? What to do with conflicts? Retry
      • Write Optimization 1
        • Use centralized “Counter” to generate unique URL
        • similar to auto-increment id in SQL
        • Uniqueness guaranteed
      • Write Optimization 2
        • Round robin to different URL generators, each governing a dedicate key space
        • Partition key spaces such as “AXXXXX”, “BXXXXX” and etc.
        • Similar to sharding - avoid conflicts
    • SQL, id(primary key), long_url(unique index), created_at

      • Map 6 characters containing [A-Z, a-z, 0-9] to an integer
      • Base Conversion problem: base 62
      • More of single machine solution
      • Schema?
  • Other Factors to Consider

    1. High availability

    2. How to cache? Preload / Replace

      First use sharding(分片,根据region) to partition different regions

      You may have China DB and US DB for example

      Use redis / memcached LRU cache for repeated query

    3. Reuse popular short urls

      Store those popular urls based on clicks. Use the same short url

    4. Custom URLs

      (1) Use another table for custom URLs (SQL), since SQL use id rather than short_url

        • Need to look up two tables now for read/write
        • Atomicity not guaranteed in NoSQL

      (2) Use the same table with additional boolean column (NoSQL), short_url, long_url, created_at, custom(true/false)

        • Same look up, need to query the boolean column
        • Atomicity still guaranteed
        • Not working well for SQL solution, which is auto id based
    5. Analytics (usage of URLs), eg. referrers, browsers, countires, patforms

    6. Automatic link expiration, add last_read_time

    7. Manual link removal

      in NoSQL, such in disk, Append only operation is faster, add alive(true/false)

    8. UI v.s. API

    9. User’s historical short urls, eg. with account system

  • SQL index: use B+ tree index primary key, everytime use binary tree to find

    • When primary key is declared, DBMS automatically creates unique index

    • Often need additional indexes

    • Using CREATE INDEX command, SQL indexes can be created on basis of any selected attribute

    • Composite index

      • Index based on two or more attributes
      • Often used to prevent data duplication

§218. The Skyline Problem

  • reference

  • OOD

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    class Building implements Comparable<Building> {
    int left;
    int right;
    int height;
    public Building(int left, int right, int height) {
    this.left = left;
    this.right = right;
    this.height = height;
    }
    @Override
    public int compareTo(Building o) {
    if (this.height != o.height) {
    return Integer.compare(this.height, o.height);
    }
    if (this.left != o.left) {
    return Integer.compare(this.left, o.left);
    }
    return Integer.compare(this.right, o.right);
    }
    @Override
    public String toString() {
    return "Building [left=" + left + ", right=" + right + ", height=" + height + "]";
    }
    }
  • Other test case

    • what if left meets right (B0[1 5 10], B1[5 9 10], B2[9 12 20], B3[12 1510]) -> enter first, otherwise it will add to result (wrong)
    • What if start at same and end at same (B0[1 17 3], B1[1 3 8], B2[1 4 2], B3[13 17 8], B4[14 17 2]) -> when compare: if left1=left2, first enter larger height, if right1=right2, first leave smaller height
  • code

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    	    /**
    * The Skyline Problem
    * city's skyline is the outer contour of the silhouette formed by all the buildings in that city when viewed from a distance. Now suppose you are given the locations and height of all the buildings as shown on a cityscape photo (Figure A), write a program to output the skyline formed by these buildings collectively (Figure B).
    * The geometric information of each building is represented by a triplet of integers [Li, Ri, Hi], where Li and Ri are the x coordinates of the left and right edge of the ith building, respectively, and Hi is its height. It is guaranteed that 0 鈮� Li, Ri 鈮� INT_MAX, 0 < Hi 鈮� INT_MAX, and Ri - Li > 0. You may assume all buildings are perfect rectangles grounded on an absolutely flat surface at height 0.
    For instance, the dimensions of all buildings in Figure A are recorded as: [ [2 9 10], [3 7 15], [5 12 12], [15 20 10], [19 24 8] ] .
    The output is a list of "key points" (red dots in Figure B) in the format of [ [x1,y1], [x2, y2], [x3, y3], ... ] that uniquely defines a skyline. A key point is the left endpoint of a horizontal line segment. Note that the last key point, where the rightmost building ends, is merely used to mark the termination of the skyline, and always has zero height. Also, the ground in between any two adjacent buildings should be considered part of the skyline contour.
    For instance, the skyline in Figure B should be represented as:[ [2 10], [3 15], [7 12], [12 0], [15 10], [20 8], [24, 0] ].
    Notes:
    The number of buildings in any input list is guaranteed to be in the range [0, 10000].
    The input list is already sorted in ascending order by the left x position Li.
    The output list must be sorted by the x position.
    There must be no consecutive horizontal lines of equal height in the output skyline. For instance, [...[2 3], [4 5], [7 5], [11 5], [12 7]...] is not acceptable; the three lines of height 5 should be merged into one in the final output as such: [...[2 3], [4 5], [12 7], ...]
    https://briangordon.github.io/2014/08/the-skyline-problem.html
    */

    public List<int[]> getSkyline(int[][] buildings) {
    if (buildings.length == 0) {
    return Collections.emptyList();
    }
    Building[] lefts = new Building[buildings.length];
    Building[] rights = new Building[buildings.length];
    for (int i = 0; i < buildings.length; i++) {
    int[] b = buildings[i];
    Building building = new Building(b[0], b[1], b[2]);
    rights[i] = lefts[i] = building;
    }
    //sort according to left
    Arrays.sort(lefts, (a, b) -> {
    if (a.left != b.left) {
    return Integer.compare(a.left, b.left);
    }
    return Integer.compare(b.height, a.height);
    });
    //sort according to right
    Arrays.sort(rights, (a, b) -> {
    if (a.right != b.right) {
    return Integer.compare(a.right, b.right);
    }
    return Integer.compare(a.height, b.height);
    });

    System.out.println(Arrays.toString(lefts));
    System.out.println(Arrays.toString(rights));
    List<int[]> result = new ArrayList<>();
    TreeSet<Building> set = new TreeSet<>(); // will use compareTo to compare
    int leftsIndex = 1;
    int rightsIndex = 0;
    int h = lefts[0].height; // max height of buildings in set
    int top = h; // last seen h
    set.add(lefts[0]);
    result.add(new int[] { lefts[0].left, top });
    int index = 0;
    while (rightsIndex < buildings.length) {
    if (leftsIndex == buildings.length || rights[rightsIndex].right < lefts[leftsIndex].left) {
    index = rights[rightsIndex].right;
    System.out.println("Remove " + rights[rightsIndex]);
    set.remove(rights[rightsIndex++]);
    } else {
    index = lefts[leftsIndex].left;
    System.out.println("Add " + lefts[leftsIndex]);
    set.add(lefts[leftsIndex++]);
    }
    System.out.println(set);
    h = set.isEmpty() ? 0 : set.last().height;
    if (h != top) {
    top = h;
    result.add(new int[] { index, top });
    }
    }
    return result;
    }
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    //
    /*
    sol1: h[mostL...mostR]
    sol2:
    maintain a PQ <height,end time>
    cur = 0,
    nexT = PQ top end time;
    if cur.begT > next // it ends before others coming
    pop all the buildings in Pq whose enttime <= nextT;
    else{
    push all the buildings at the same time,cur++
    nexT = cur.beg
    }
    //nexT = PQ's heightest building's end time
    curH = PQ's heightest or 0;
    if != last height
    add curH,nexT

    */


    class Solution {
    public:
    vector<pair<int, int>> getSkyline(vector<vector<int>>& buildings) {
    vector<pair<int,int>> ans;
    priority_queue<pair<int,int>> aliveB;
    int cur = 0, nexT = 0, curH = 0, len = buildings.size();
    while (!aliveB.empty() || (cur<len)){
    if (aliveB.empty())
    nexT = buildings[cur][0];
    else
    nexT = aliveB.top().second;
    if ((cur>=len)||(buildings[cur][0]>nexT))
    while (!aliveB.empty() && aliveB.top().second<= nexT)
    aliveB.pop();
    else{
    nexT = buildings[cur][0];
    while ((cur<len)&&(buildings[cur][0]==nexT)){
    aliveB.push(make_pair(buildings[cur][2],buildings[cur][1]));
    cur++;
    }
    }
    if (aliveB.empty())
    curH = 0;
    else
    curH = aliveB.top().first;
    if ((ans.empty())||(ans.back().second!=curH))
    ans.push_back(make_pair(nexT, curH));
    }
    return ans;
    }
    };
  • Interval: 253. Meeting Rooms II