Zhuo Han


  • 关于

  • 首页

  • 分类

  • 归档

  • 标签

  • 搜索
close

Modernize Old Movies through Colorization and Animation

发表于 2019-04-30   |   分类于 Course   |     |   阅读次数

Team: DeeperFour

Motivation

It is a pity to watch old classic movies are being forgotten by people due to limited shooting technology and the relatively boring black-and-white presentations. Therefore, we decided to make those films more enjoyable by filling them with vivid colors by virtue of existing colorization networks, and creating an animated cartoon version thanks to style-transfer techniques. However, existing colorization and animation networks work on independent images. In order to make the final output video smooth and well displayed, we added another image transfer network with a ConvLSTM layer to increase the consistency of the video through short-term and long-term temporal loss between consecutive frames. Training the animation network with different artist’s style, our work can render any black-and-white film with realistic and colorization and cartoon style in specific art style.

Previous Works

DeOldify is a deep learning based model, which presents an approach for colorization, espeically for photos taken with old/bad equipment. It is inspired and a combination of Self-Attention Generative Adversarial Network[1], Progressive Growing of GANs[2], Two Time-Scale Update Rule[3].

ConvLSTM, based on a deep recurrent network, proposes a solution to enforcing temporal consistency in a video after image processing algorithms applied. In this work, they minimize the temporal loss(short-term loss and long-term loss) for temporal stability and a perceptual loss for perceptual similarity.

CartoonGAN[4] works on transforming real-world photos into cartoon style images with two novel losses: semantic content loss for style variation and the edge-promoting adversarial loss for preserving clear edges.

CartoonGAN architecture

Datasets

We gathered four Makoto Shinkai’s works: The Place Promised in Our Early Days(雲のむこう、約束の場所), Five Centimeters per Second(秒速5センチメートル), The Garden of Words(言の葉の庭), and *Your Name(*君の名は). First, we clipped the video to frames on a pace of 1 frame per 3 seconds, and then discarded black frames and frames from crew lists. Second, we resized and cropped all frames into size $256\times 256$. And in order to make the generator to generate clearer contours, we augmented the data by dilating and smoothing the edges(regions that were found by Canny edge detector[5]).

Architectural Design

§Initial Idea

Basic Implementation

Initially, we implemented a straight-forward pipeline and let colorization and animation work as building blocks inside of it; we used original video frames as inputs, and let them pass through colorization and animation blocks and then stream together the output frames back to a video.

One of the problems for this approach is that the output video was not stable enough because of significant color gaps and strobes. Therefore we decided to use the initial approach as baseline and moved on to find ways to stabilize the video output.

Unstable Sequential Frames

§Final Solution: Pipeline with Combined Models and Consistency Loss Functions

Description
To stabilize the video, we need to make the color and the tone between consecutive frames and of a tandem of frames consistent with each other. We decided to add consistency loss during the training process. The inspiration came from Wei-Sheng Lai’s paper Learning Blind Video Temporal Consistency[6]. We used a combined loss function of perception loss, short-term and long term temporal loss. VGG perception loss was used to guarantee the quality of the colorized frame, temporal loss was designed to increase the consistency of adjacent frames. Detailed model architecture and loss functions are shown below.

Model Architecture

Detailed Model Architecture

The above picture shows the model architecture and here are some details in every iteration:

  • Sequential black-and-white frames, denoted as $I_1, …, I_{t-1}, I_{t}$ and frames colorized by pre-trained DeOldify model, denoted as $P_1, …, P_{t-1}, P_t$ are taken as inputs.
  • Colorized frames: $P_1, …, P_{t-1}, P_t$, are passed through Image Transformation Network, which as a ConvLSTM inside. The network will a function $O_t = P_t + F(P(t))$ to generate the colorized, stabilized output frame, $O_t$.
  • Meanwhile, $I_1, …, I_{t-1}, I_{t}$ are passed through a pre-trained FlowNet network, which will generate an optimal flow.
  • The optimal flow will be taken, together with colorized frames: $P _ 1, …, P _ {t-1}, P _ t$, as inputs to the warping layer, which will generate frames $ \hat {O} _ 1, … , \hat {O} _ {t-1}, \hat {O} _ t$, which will work as previous inputs in calculating LSTM loss.
  • Parameter M, is computed, based on occlusion estimation, which takes black-and-white frames, $I _ 1, …, I _ {t-1}, I _ {t}$ as inputs. This parameter controls the altitude of the consistency loss based on how different two sequential frames are(If two frames are really different, we will have a really small M, to guarantee long term and short term loss stay at a reasonable scale).
  • Calculate temporal loss based on short term loss function and long term loss function.
  • Calculate VGG perceptual loss by passing through a VGG network.
  • Calculate overall loss, and try to minimize this loss at every iterations.

Loss Function

$$\mathcal {L} _ {p}= \sum ^{T} _ {t=2} \sum ^{N} _ {i=1} \sum _ l \Vert \phi _ l(O _ t^{(i)})- \phi _ l(P _ t^{(i)})\Vert _ 1$$

$$\mathcal {L} _ {st}= \sum ^{T} _ {t=2} \sum ^{N} _ {i=1}M _ {t \Rightarrow {}t-1}^{(i)} \Vert O _ t^{(i)}- \hat {O} _ {t-1}^{(i)}\Vert _ 1$$

$$\mathcal {L} _ {lt}= \sum ^{T} _ {t=2} \sum ^{N} _ {i=1}M _ {t \Rightarrow {}t-1}^{(i)} \Vert O _ t^{(i)}- \hat {O} _ {1}^{(i)} \Vert _ 1$$

$$\mathcal {L}= \lambda _ p \mathcal {L} _ p + \lambda _ {st} \mathcal {L} _ {st}+ \lambda _ {lt} \mathcal {L} _ {lt}$$

Notations: $l$ denotes layers, and $\phi_l$ is the activation of the layer $l$. T is the number of frames in the batch, N is the total number of pixels.

VGG loss was used to make sure that the quality of the image corresponds well to human perception.

Short term loss minimize the pixel-by-pixel L-1 loss between two consecutive inputs, making them consistent. The input here are the previous frame that was warped by the pre-trained FlowNet, which will guide the image transformation network learn the transformation of current frame being processed.

Long term loss sums up pair-wise distance of a sequence of frames in the current batch to the first image of the current batch, the distance is the same pixel-by-pixel L-1 loss as used in the short term loss. Memory and speed constrained the length of frames coherence we consider, but we can always decrease batch size to conform with the limits.

Overall loss is a weighted sum of the above losses, the weight hyper-parameters from the paper[6] was used here, because training is too time consuming to tune them. $\lambda_p=10, \lambda_{st} = 100, \lambda_{lt} = 100$.

Training of the stabilization

For stabilization, we selected 60 videos with 4,209 frames in total named DAVIS-gray[7], which contains a lot of moving objects. And also additional 80 videos with 21,526 frames named VIDEVO-gray videos[8] as our training set. All frames were processed to be 480x480.

Limited by time, we only managed to finish training for 30 epoch, 1000 batch per epoch, which took around 4 days. Considering the memory limits we had, the long-term loss was handled on a 10 image period.

Evaluation

It is difficult to quantify the quality of synthesized images, especially cartoons and videos. Here, we used the warping metric[6] to measure the temporal stability on the output videos.
The metric is based on the flow warping error between two frames,

$$E _ {warp(V _ t, V _ {t+1} ) } = \frac {1} { \sum ^N _ {i=1} M _ t^{ (i) } } \sum ^N _ {i=1} M _ t ^{ (i) } ||V _ t^{ (i) } - \hat {V} _ {t+1} ^ { (i) } || ^ 2 _ 2$$

where $\hat{V} _ {t+1}$ is the warped frame $V _ {t+1}$ and $M_t \in {0, 1}$ is a non-occlusion mask indicating non-occluded regions, which is based the occlusion detection method in[9].
The average warping error over the entire sequence is calculated as:

$$E _ {warp(V)} = \frac {1}{T-1} \sum ^{T-1} _ {t=1} E _ {warp(V _ t, V _ {t+1})}$$

Quantitative evaluation on temporal warping error

Videos Our model Baseline(original processed video)
psycho_kettle 0.000293 0.000688
psycho_kiss 0.000271 0.000782
BeachByDrone 0.000188 0.000375
Cat 0.000409 0.000996
CloudsTimelapse 0.000181 0.000293
TajMahal 0.000138 0.000260
WestminsterBridge 0.000224 0.000450
SanFrancisco 0.000561 0.000739
MotherAndSon 0.000295 0.001414
WomanWorking 0.000099 0.000337
Average 0.000266 0.000633

Results

§Psycho videos

Psycho Section I
your browser does not support the video tag

Baseline Video

your browser does not support the video tag

Our Stable Video

your browser does not support the video tag

Cartoon Style

Psycho Section II
your browser does not support the video tag

Baseline Video

your browser does not support the video tag

Our Stable Video

your browser does not support the video tag

Cartoon Style

§More results

your browser does not support the video tag

BeachByDrone

your browser does not support the video tag

CloudsTimelapse

your browser does not support the video tag

Cat

your browser does not support the video tag

MotherAndSon

your browser does not support the video tag

WomanWorking

your browser does not support the video tag

FatherAndSon

your browser does not support the video tag

SanFrancisco

your browser does not support the video tag

WestminsterBridge

your browser does not support the video tag

TajMahal

Experiments and ideas we tried

§Choosing between Colorization Models

We also tried another colorization model, for example an implementation of GAN model based on[10]. And we eventually chose DeOldify as our model because it renders more vivid colors, thus will be better for animation in next phase.

Results of Using Johnson’s Network

Results of Using Johnson’s Network

Results of Using DeOldify

Results of Using DeOldify

§Tune DeOldify

Previously, we tried to do some transfer learning on DeOldify with video frames to improve its capability to work on video colorization. But the training failed because of “cuda is out of memory” on Google Cloud Platform. Therefore, we stayed with the pre-trained model.

§GAN for Stabilization

As discussed with Professor Lim, we thought about adding a stabilization structure to CartoonGAN and a stabilization discriminator. We enriched our original dataset with its consecutive frames (5 frames, 10 frames, and 20 frames), we calculated pixal-based squared distance between every two consecutive frames to make sure there are no scene-cut inside our dataset(Afterwards, we realized that we shouldn’t do that, we manually made our dataset more biased). We utilized a dual-discriminator structure[11], one is the same as CartoonGAN, to determine whether the frame-by-frame input is a real cartoon and the otherone is used to determine whether the sequence input is a consecutive cartoon using short-term and long-term loss[6][12]. However, this GAN is incredible harder to train, we downsize our batch size from 6 to 3 to 1, and reduce consecutive frames from 5 to 3, then we managed to get rid of Cuda’s out of memory error. And training speed is very slow(couldn’t finish even one epoch in a day), and still had many hyper-parameter to tune. It seemed unworthy to finish it to check if it works and tune everything, therefore we dropped this idea.

References


  1. Han Zhang et al. “Self-Attention Generative Adversarial Networks”. In: arXiv e-prints, arXiv:1805.08318 (May 2018), arXiv:1805.08318. arXiv: 1805.08318 [stat.ML]. ↩

  2. Karras, Tero, Timo Aila, Samuli Laine, and Jaakko Lehtinen. “Progressive growing of gans for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017). ↩

  3. Heusel, Martin, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. “Gans trained by a two time-scale update rule converge to a nash equilibrium.” arXiv preprint arXiv:1706.08500 12, no. 1 (2017). ↩

  4. Yang Chen, Yu-Kun Lai, and Yong-Jin Liu. “CartoonGAN: Generative Adversarial Networks for Photo Cartoonization”. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018), pp. 9465– 9474. ↩

  5. Canny, John. “A computational approach to edge detection.” In Readings in computer vision, pp. 184-203. Morgan Kaufmann, 1987. ↩

  6. Wei-Sheng Lai et al. “Learning Blind Video Temporal Consistency”. In: CoRR abs/1808.00449 (2018). arXiv: 1808.00449. url: http://arxiv.org/abs/1808. 00449. ↩ ↩ ↩ ↩

  7. https://davischallenge.org/index.html ↩

  8. https://www.videvo.net ↩

  9. Xingjian, S., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In: NIPS (2015) ↩

  10. Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. “Perceptual losses for real-time style transfer and super-resolution.” In European conference on computer vision, pp. 694-711. Springer, Cham, 2016. ↩

  11. Nguyen, Tu, Trung Le, Hung Vu, and Dinh Phung. “Dual discriminator generative adversarial nets.” In Advances in Neural Information Processing Systems, pp. 2670-2680. 2017. ↩

  12. Ruder, Manuel, Alexey Dosovitskiy, and Thomas Brox. “Artistic style transfer for videos and spherical images.” International Journal of Computer Vision 126, no. 11 (2018): 1199-1219. ↩

Summer Lec 4-5

发表于 2018-09-05   |   分类于 Program , Leetcode   |     |   阅读次数

§Summer Lec 4: Special Data Structures

§problem

§Flatten 2D Vector
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class Vector2D {
private:
vector<vector<int>> vec2d;
int rowIdx = 0, colIdx = 0;
public:
Vector2D(vector<vector<int>>& vec2d) {
this->vec2d = vec2d;
}
/*
OR
vector<vector<int>>::iterator i, iEnd;
vector<int>::iterator j;
Vector2D(vector<vector<int>>& vec2d) {
i = vec2d.begin();
iEnd = vec2d.end();
}
*/


int next() {
if (!hasNext())
return NULL;
else
return vec2d[rowIdx][colIdx++];
}

bool hasNext() {
while (rowIdx < vec2d.size()) {
if (colIdx < vec2d[rowIdx].size())
return true;
colIdx = 0;
++rowIdx;
}
return false;
}
};

/**
* Your Vector2D object will be instantiated and called as such:
* Vector2D i(vec2d);
* while (i.hasNext()) cout << i.next();
*/

§Follow up

Java:

runtiome Exception/unchecked, eg. throw new NoSuchElementException()

checked Exception -> catch or declare

Uber interview, Implement Token Bucket. eg.sleep throws InterruptedException

remove()

§Iterator Design Pattern in Java

implement next(), hasNext(), remove()

good for Lazy Loading: File I/O, DAO, Big Data

§pancake sort
§Max Chunks To Make Sorted, Max Chunks To Make Sorted II
§shallow copy / deep copy
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
//sol 2
// find (0..n)max/(n..0)min, if at this position, max <= min ->add chunk
// max < min ->must add chunk, max == min->can add
// a0[p0]a1[p1]a2...[pn-2]pn-1
int n = arr.size(), res = 1;
vector<int> maxN(n), minN(n);
int maxn = maxN[0] = arr[0];
int minn = minN[n-2] = arr[n-1];
for (int i = 1; i < n-1; ++i)
maxN[i] = maxn = max(maxn, arr[i]);
for (int i = n-3; i >= 0; --i)
minN[i] = minn = min(minn, arr[i+1]);
for (int i = 0; i < n-1; ++i)
if (maxN[i]<=minN[i])
++res;
§76. Minimum Window Substring
§380. Insert Delete GetRandom O(1), 381. Insert Delete GetRandom O(1) - Duplicates allowed

Load Balancer: Randomly choose a server. Runder Robin

Array: O(1) getRandom

HashMap: O(1) insert and remove

1
2
3
4
5
vector<int> nums;
unordered_map<int, int> map;

vector<pair<int,int>> nums; //<num,map-idx>
unordered_map<int, vector<int>> map; //num, nums-idxs

§Summer Lec 5: DP & Airbnb

§Dynamic programming

  1. Deduction formula
  2. Initialization
  3. Space improvement

§problem

§Edit Distance

#####Airbnb interview questions

  • K Edit Distance, KEditDistance.java: Trie + DP

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    class TrieNode{
    public:
    vector<TrieNode*> child;
    bool isEnd;
    string word;
    TrieNode(){
    child.resize(26);
    isEnd = false;
    word = "";
    }
    ~TrieNode(){
    for (TrieNode *p : child)
    if (p)
    delete p;
    }
    };
    class Solution {
    private:
    void insert(string& word, TrieNode* root){
    TrieNode* cur = root;
    for (char c:word){
    int i = c - 'a';
    if (cur->child[i] == NULL)
    cur->child[i] = new TrieNode();
    cur = cur->child[i];
    }
    cur->isEnd = true;
    cur->word = word;
    }
    void dfs(TrieNode* cur, string& target, int k, vector<int>& dp, vector<string>& res){
    int l = target.length();
    if (cur->isEnd && dp[l] <= k)
    res.push_back(cur->word);

    vector<int> nextdp(l+1);
    nextdp[0] = dp[0]+1;
    for (int i = 0; i < 26; ++i)
    if (cur->child[i]){
    for (int j = 1; j <= l; ++j)
    //nextdp[j]: target j-1 ~ trie with c
    if (target[j-1]-'a' == i)
    //dp[j-1]: taret j-2 ~ trie without c
    nextdp[j] = dp[j-1];
    else
    //taret j-2 ~ trie without c ; j-1 ~ without c, j-2 ~ with c
    //add c; replace one letter, convert c ~ target[j-1]; add target[j-1]
    nextdp[j] = min(min(dp[j-1],dp[j]),nextdp[j-1])+1;
    dfs(cur->child[i], target, k, nextdp, res);
    }
    }
    public:
    /**
    * @param words: a set of stirngs
    * @param target: a target string
    * @param k: An integer
    * @return: output all the strings that meet the requirements
    */

    vector<string> kDistance(vector<string> &words, string &target, int k) {
    TrieNode* root = new TrieNode();
    int l = target.length();
    for (string& word:words){
    //prune
    //if (word.length() <= l+k && word.length() >= l-k)
    insert(word,root);

    }
    vector<int> dp(l+1);
    iota(dp.begin(),dp.end(),0);
    vector<string> res;
    dfs(root, target, k, dp, res);
    delete root;
    return res;
    }
    };

    ​

  • Cheapest Flights within K Stops, CheapestFlightsWithinKStops.java

    • Follow up: output the route
§221. Maximal Square

DP

§85. Maximal Rectangle, 764. Largest Plus Sign

§stack/string : Basic Calculator

§224. Basic Calculator, 227. Basic Calculator II
§772. Basic Calculator III
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
class Solution {
public:
//res pre op num/() ch
int calculate(string s) {
s = "+("+s+")++";
stack<int> stRes;
stack<int> stPre; //pre num
stack<char> stOp;
int res = 0, pre = 0, num = 0;
char op = '+';
for (char ch : s){
if (isdigit(ch))
num = num*10 + ch-'0';
else if (ch!=' '){
if (ch == '('){
stRes.push(res), res = 0;
stPre.push(pre), pre = 0;
stOp.push(op), op = '+';
}
else if (ch ==')'){
if (op=='+' || op=='-')
num = res + pre + (','-op)*num;
else if (op=='*')
num = res + pre*num;
else if (op=='/')
num = res + pre/num;
res = stRes.top(), stRes.pop();
pre = stPre.top(), stPre.pop();
op = stOp.top(), stOp.pop();
}
else {
if (op=='+' || op =='-'){
res += pre;
pre = (','-op)*num;
}
else if (op=='*')
pre *= num;
else if (op=='/')
pre /= num;

op = ch;
num = 0;
}
//cout << res <<","<< pre <<","<< num <<","<< op <<endl;
}
}
return res;
}
};
§770. Basic Calculator IV - undo

Summer Lec 1-3

发表于 2018-08-31   |   分类于 Program , Leetcode   |     |   阅读次数

§Lec 1 - Trie / Prefix Tree

§Materials

https://en.wikipedia.org/wiki/Trie

https://www.geeksforgeeks.org/trie-insert-and-search/

PPT

§Tries

  • Variable length keys

  • The decision on what path to follow is taken based on portion of the key

  • Static environment, fast retrieval but large space overhead

  • Applications: dictionaries, text searching

  • Trade space for runtime

  • Costly operation to build Trie: assume updating or rebuilding is rare

§Problem

§Implement Trie (Prefix Tree)
  • Use Array/HashMap

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    class TrieNode{
    //boolean isWord;
    //String word; //pointer to word
    List<String> words; //contains pointers to passing words
    //TrieNode[] children;
    Map<Character, TrieNode> children;

    public TrieNode(){
    this.children = new HashMap<>();
    }
    }
  • In trie implementation, we should confirm that we have frequent search but less frequent add operation

  • 谷歌变种

    • compress: apply, apple; a->p->p->l->… => app->…
    • compound word: cat, copy, copycat; root->cat, root->copy->cat=> copy -back-> cat
§Longest Word in Dictionary
1
2
3
4
5
6
7
8
9
int index = -1; //4bits, save space than pointer(use string, 4bits-32 / 8bits-64) 
/*
=> space improvement: boolean isWord;
public String dfs(TrieNode cur, StringBuilder temp, String[] ans)
temp.append(entry.getKey());
dfs
temp.deleteCharAt(temp.length()-1);// restore
*/

cur = cur.children.computeIfAbsent(c, k -> new TrieNode()); //if c not exist, return k
§Runtime

Trie1, boolean isWord, 需要StringBuilder,平均每个node有k child,有l层

Tire2, Stirng word,same runtime

Tire3,List words.,最耗空间,runtime最快,不需要dfs

§Prefix and Suffix Search
1
2
3
4
5
6
7
8
apple
->a*->ap*->...(n-1)
->*e->*le->...
->a*e
->ap*e->app*e->...(n-2)
->a*le->a*ple->...
->ap*le
->...
1
2
3
4
5
6
7
8
space:
n is avg length of one word O(n)
only prefix/only suffix: (n-1) + n-3 + … -> O(n^2)
prefix+suffix: O(n)
m words with avg length of n
worst case => no overlapping => O(m*n^2)

本题,Trie 的每一个 TrieNode 对应的是 Prefix+Suffix 的一种组合并不是某一个word或几个words

https://leetcode.com/problems/map-sum-pairs/

https://leetcode.com/problems/word-search-ii/

§Lec 2/3 - Graph: Graph Search, Union Find & Topological Sort

§Materials

Graph: Data Structure and Algorithms

Graph and Its Representations

Topological Sorting

§Graph

edges, Adjacency Matrix, Adjacency List (use map)

§Problem

§谷歌难题:maximum number of edges in graph without a cycle

undirected graph, n vertexes: n-1

directed graph, n vertexes: n-1 + n-2 + … + 1 = (n-1)n/2

Number of Connected Components in an Undirected Graph

//bfs. iterate: use heap

//dfs. recursive: stack overflow

//union-find: less memory

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
//dfs | bfs
class Solution {
public:
void dfs(vector<vector<int>>& adjL, vector<bool>& visited, int cur, int& res){
visited[cur] = true;
for (auto v : adjL[cur])
if (!visited[v])
dfs(adjL, visited, v, res);
}
int countComponents(int n, vector<pair<int, int>>& edges) {
//adjlist
vector<vector<int>> adjL = vector<vector<int>>(n,vector<int>(0));
for (auto edge:edges){
adjL[edge.first].push_back(edge.second);
adjL[edge.second].push_back(edge.first);
}
vector<bool> visited = vector<bool>(n, false);
int res = 0;

//dfs
for (int i = 0; i < n; ++i)
if (!visited[i])
dfs(adjL, visited, i, ++res);

//bfs
vector<bool> visited = vector<bool>(n, false);
int cur, visitedN = 0;
queue<int> q;
while (!q.empty() || visitedN!=n){
if (q.empty()){
for (int i = 0; i < n; ++ i)
if (!visited[i]){
cur = i;
++ res;
break;
}
}
else{
cur = q.front();
q.pop();
if (visited[cur])
continue;
}
visited[cur] = true, ++visitedN;
for (auto v : adjL[cur])
q.push(v);
}

return res;
}
};
//union-find
class Solution {
public:
int find(vector<int>& par, int cur){
return par[cur]==cur ? cur : par[cur] = find(par, par[cur]); //compress path
}
int countComponents(int n, vector<pair<int, int>>& edges) {
//union-find
vector<int> par(n);
iota(begin(par), end(par), 0); //0..n-1
for (auto edge:edges){
int u = find(par, edge.first), v = find(par, edge.second);
par[u] = v;
n -= (v!=u);
}
return n;
}
};
§Clone Graph

Evalualte Division

FB&GG high frequency. 谷歌Variation: Currency Exchange

  1. Process equations and values to build graph
    numMap: map<string, map<string, double>>
    visitedSet: set<string>
    visitKey = num + “:” + denom //如果分子分母有:,这里handle
  2. Process query by the graph

Follow Up: currency rate update->concurrency同步->synchronization/multi-thread

ConcurrentHashMap - https://quip.com/aCoTAb1SQfnD - Lec 16: Multithreading (English)

CopyOnWroteMap (suit for write Rarely)

§Topological Sort

To create a topological sort from a DAG

  1. Remove vertex that has indegree of 0
  2. Update other indegrees of other vertices who are neighbors of removed vertex
  3. Go to 1

cyclic graph -> halt

Alien Dictionary

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
class Solution {
public:
string alienOrder(vector<string>& words) {
if (words.size()==0)
return "";
unordered_map<char, unordered_set<char>> map;
unordered_map<char, int> degree;
//build graph
for (auto word:words)
for (auto ch:word)
degree[ch] = 0;
for (int i = 0; i < words.size()-1; ++i){
string cur = words[i];
string next = words[i+1];
int len = min(cur.length(),next.length());
if (cur.length()>next.length() && cur.substr(0,len).compare(next)==0)
return "";//illegal
for (int j = 0; j < len; ++j){
char c1 = cur[j], c2 = next[j];
if (c1!=c2){
if (map.find(c1)==map.end())
map[c1] = unordered_set<char>(0);
if (map[c1].count(c2)==0){
map[c1].insert(c2);
degree[c2] += 1;
}
break;
}
}
}

//topological
string res = "";
char ch;
queue<char> q;
for (auto itr = degree.begin(); itr != degree.end(); ++itr){
if ((*itr).second == 0)
q.push((*itr).first);
}
while (!q.empty()){
ch = q.front();
q.pop();
res += ch;
if (map.find(ch)!=map.end()){
for (auto next:map[ch]){
degree[next] -= 1;
if (degree[next] == 0)
q.push(next);
}
}
}
if (res.length()!=degree.size()) // existcircle
return "";
return res;
}
};

https://leetcode.com/problems/course-schedule/

https://leetcode.com/problems/course-schedule-ii/

§Union-Find
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class UnionFind {
int[] id, size;
int count;

public UnionFind(int len) {
this.id = new int[len];
this.size = new int[len];
this.count = len;
}

public boolean find(int p, int q) {
return root(p) == root(q);
}

public void union(int p, int q) {
int pi = root(p), qi = root(q);
if (this.size[pi] < this.size[qi]) {
this.id[pi] = qi;
this.size[qi] += this.size[pi];
} else {
this.id[qi] = pi;
this.size[pi] += this.size[qi];
}
this.count--;
}

public int root(int i) {
while (i != id[i]) {
id[i] = id[id[i]]; // path compression
i = id[i];
}
return i;
}
}

Graph Valid Tree

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Solution {
public:
int find(vector<int>& par, int cur){
return par[cur]==cur ? cur : par[cur] = find(par, par[cur]); //compress path
}
bool validTree(int n, vector<pair<int, int>>& edges) {
vector<int> par(n);
iota(begin(par), end(par), 0); //0..n-1
for (auto edge:edges){
int u = find(par, edge.first), v = find(par, edge.second);
par[u] = v;
if (v==u)
return false; //circle
n -= 1;
}
return n==1;//connected
}
};

Redundant Connection, Redundant Connection II

  1. If there is a node having two parents. Store them as candidates A and B, and set the second edge invalid.
    [5,1], 5 is parent of 1

  2. Perform normal union find.

    1. It is now valid tree => return candidate B.

    2. It is still not valid tree

      1. Candidates not existing => there is a loop, return current edge.
        [5,1],[1,2],[2,3],[3,4],[4,5], loop => current[4,5]
      2. Candidates existing => return candidate A instead of B.
        [1,5],[4,5],[1,2],[2,3],[3,4], 5 has 2 parents => remove[4,5]
        => union find, f(1)=5, f(2)=5, f(3)=5, f(4)=5, f(5)=5 => remove[1,5] instead

【Amazon-OA2面经】城市连接问题,即MST
给十几个城市供电,连接不同城市的花费不同,让花费最小同时连到所有的边。给出一系列connection类,里面是edge两端的城市名和它们之间的一个cost,找出要你挑一些边,把所有城市连接起来并且总花费最小。不能有环,最后所以城市要连成一个连通块。

Algorithm Steps: (Kruskal’s Algorithm)

  • Sort the graph edges with respect to their weights
  • Start adding edges to the MST from the edge with the smallest weight until the edge of the largest weight
  • Only add edges which doesn’t form a cycle , edges which connect only disconnected components
§Optimal Account Balancing, Social Network Graph Database

pre-process; non-zero debt -> DFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class Solution {
private:
vector<int> debt;
public:
int helper(int cur){
while (cur < debt.size() && debt[cur]==0)
++cur;
int res = INT_MAX;
for (int i = cur+1, prev = 0; i < debt.size(); ++i)
if (debt[i] != prev && debt[i]*debt[cur] < 0){ // skip already tested or same sign debt
debt[i] += debt[cur];
res = min(res, helper(cur+1)+1);
debt[i] -= debt[cur];
prev = debt[i];
}
return res < INT_MAX? res : 0;
}
int minTransfers(vector<vector<int>>& transactions) {
unordered_map<int, long> balance;
for(auto t: transactions)
balance[t[0]] -= t[2], balance[t[1]] += t[2];
for(auto& b: balance)
if(b.second != 0)
debt.push_back(b.second);
return helper(0);
}
};
§Minimum Height Tree

keep deleting leaves layer-by-layer, until reach the root.

Specifically, first find all the leaves, then remove them. After removing, some nodes will become new leaves. So we can continue remove them. Eventually, there is only 1 or 2 nodes left. If there is only one node left, it is the root. If there are 2 nodes, either of them could be a possible root.

Time Complexity: Since each node will be removed at most once, the complexity is O(n).

§Implement HashMap

https://www.geeksforgeeks.org/implementing-our-own-hash-table-with-separate-chaining-in-java/

X7_OOD

发表于 2018-04-07   |   分类于 System Design , Program , Leetcode   |     |   阅读次数

§Design Food Delivery App

  • OOD fundamental concepts: Inheritance, encapsulation, abstrction and polymorphism
    First POJO(Plain Old Java Object), Then methods

  • Scenario, Use Cases

    1. User can search different restaurant
    2. User can select a restaurant
    3. User sees a menu
    4. Restaurant can change the menu any time
    5. User adds an item from menu
    6. User orders the food
    7. User can track the order in real time
    8. User can cancel the order
    9. User pays for the order
  • Solution

    Design Patterns involved in the design of this app:

    1. Builder Design Pattern (For adding food item and ordering)
    2. Iterator Pattern (User Sees Menu)
    3. Observer Pattern (Track an order in Real Time)
    4. (Command Design Pattern) Order and cancellation of Food
  • Major Classes

    User

    Restaurant

    Menu

    Item

    Order

  • Patterns

    • Builder Pattern

      reference: I, II

    • Iterator Pattern

    • Observer Pattern
      Subject —— notify() ——> Observers
      ​ <— subscribe(), unsubscribe() —
      eg. When DeliveryStatus changes, we can use observery pattern to notify users

      kafka:queue, pub/sub

    • Command Pattern
      eg. In order to pulic class CancelFood: IFoodOrderCommands and public class OrderFood: IFoodOrderCommands, so we can use command pattern: public interface IFoodOrderCommands

      java: runnable, callable

  • Summary

    1
    2
    3
    4
    class{ /*write class signature first,*/
    //implement
    //call servers
    }

§Reading Materials

  1. OODmock interview: Design a Vending Machine
  2. OODmock interview:How to design meeting room schedulingsystem
  3. Amazon Interview Question for SDE-2s | Design OO food delivery app catering to use cases
  4. Design OO food delivery app with C# & Design Patterns

§Problem

§760. Find Anagram Mappings | unordered_map
§Longest Common Substring | DP
§718. Maximum Length of Repeated Subarray | DP

double for loop

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
if (A[i-1] == B[j-1]){
dp[i][j] = 1 + dp[i-1][j-1];
update max;
}
//Space improvement
// Since we need f[i-1][j-1], we have to update from right
if (n > m)
return findLength(B,A);
vector<int> dp(n+1,0);
for (int i = 1; i <= m; ++i)
for (int j = n; j >= 1; --j)
if (A[i-1] == B[j-1]){
dp[j] = 1+dp[j-1];
res = max(res, dp[j]);
}
else
dp[j] = 0;
§ Bloomberg 高频题 | 航班规划 | Greedy

给你一堆人从纽约飞各个地方开会的 cost。比如
A 去 城市SF, LA的 cost 分别是 200, 300
B 去城市SF, LA的 cost 分别是 200, 400
C 去城市SF, LA的 cost 分别是 100, 400
D 去城市SF, LA的 cost 分别是 320, 210
然后保证一半的人去 SF,一半的人去 LA,使得总的 cost 最小

  1. Represent the Input: Two Dimensional Array / Matrix

  2. Greedy => number not equal
    if constSF>constLA, numLA++
    else if constLA>constSF, numSF++
    else numSame++ // Equal cost

  3. Switch smallest difference

  4. Check Legal

    1
    2
    3
    4
    if (arr = null || arr.length == 0)
    throw new IllegalArgumentException();
    if (m%2 != 0)
    throw new IllegalArgumentException();

    Or use Guava: Preconditions.checkArgument

X6_DP & Knapsack

发表于 2018-03-26   |   分类于 Program , Leetcode   |     |   阅读次数
§Problem
  • 322. Coin Change

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    class Solution {
    public:
    int coinChange(vector<int>& coins, int amount) {
    int n = coins.size();
    if (n == 0)
    return -1;
    sort(coins.rbegin(),coins.rend());
    vector<int> dp(amount+1, amount+1);
    dp[0] = 0;
    for (auto coin : coins)
    for (int w = coin; w <= amount; ++w)
    dp[w] = min(dp[w], dp[w-coin]+1);
    return dp[amount] > amount ? -1 : dp[amount];
    }
    };

    Others:

    1
    2
    3
    4
    // if coin[1000, 2000], there is no need for i starting at 0
    // coins[0] + 1 may overflow if coins[0] == Integer.MAX_VALUE
    for (int i = coins[0]; i <= amount; i++) {
    for (int coin : coins) {
  • 377. Combination Sum IV, similar

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    class Solution {
    public:
    int combinationSum4(vector<int>& nums, int target) {
    int n = nums.size();
    if (n == 0)
    return 0;
    sort(nums.begin(),nums.end());
    vector<int> dp(target+1, 0);
    dp[0] = 1;
    /* remove duplicate.
    eg. 1 1 2 = 1 2 1, since 1 will not consider again
    for (auto num : nums)
    for (int w = num; w <= target; ++w)
    */

    for (int w = nums[0]; w <= target; ++w)
    for (auto num : nums){
    if (w < num)
    break;
    dp[w] += dp[w-num];
    }
    return dp[target];
    }
    };
  • DP

    1. Deduction Formula
    2. Initialization
    3. Space Improvement (Bonus)
    • Audible 题:BugdetShopping

    n dollars, bundleQuantities array, bundleCosts array

    eg. Helen has n = 50 dollars to purchase notebooks from 2 stores described by bundleQunantities = [20, 19] and bundleCost = [24, 20]. She makes the following purchases:
    ​ One bundle of 20 notebooks from shop 0 at a cost 24 dollars and has 50 – 24 = 26 dollars left.
    ​ Another bundle of 20 notebooks from shop 0 at a cost 24 dollars and has 26 – 24 = 2 dollars left.
    Helen can’t afford any more notebooks, so we return the total number of notebooks she was able to purchase: 20 + 20 = 40.

    1
    2
    3
    // f(i) is the maximum number of notebook given i dollar
    f[i] = max(f[i – b.cost] + b.quantity) given different b
    maxf = 0, maxf = max(maxf, f[i])
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    int max = 0;
    int[] f = new int[n + 1];
    for (int i = bundles[0].cost; i < f.length; i++) {
    for (Bundle b : bundles) {
    if (b.cost > i) {
    break;
    }

    if (i == b.cost) {
    f[i] = Math.max(f[i], b.quantity);
    max = Math.max(max, f[i]);
    }

    int c = f[i - b.cost];
    if (c == 0) {
    continue;
    }

    f[i] = Math.max(f[i], c + b.quantity);
    System.out.println(i + "=" + f[i]);
    max = Math.max(max, f[i]);
    }
    }

Coin Change is unlimited, while Knapsack is limited

  • 背包问题

    1. Backpack
      Given n items with size Ai, an integer m denotes the size of a backpack. How full you can fill this backpack?
    1
    2
    3
    i = [0, A.length)
    j = [m, A[i]]
    result[j] = max(result[j-A[i]]+A[i], result[j])
    1. Backpack II
      Given n items with size Ai and value Vi, and a backpack with size m. What’s the maximum value can you put into the backpack?
      Example
      Given 4 items with size [2, 3, 5, 7] and value [1, 5, 2, 4], and a backpack with size 10. The maximum value is 9.

      1
      2
      3
      i = [0, A.length)
      j = [m, A[i]]
      result[j] = max(result[j-A[i]]+v[i], result[j])
    2. https://en.wikipedia.org/wiki/Knapsack_problem
      https://www.geeksforgeeks.org/knapsack-problem
      https://brilliant.org/wiki/backpack-problem

  • 背包问题变种

    • 474. Ones and Zeroes

      maximize chossedn string -> m’s 1, n’s 0

      1
      2
      3
      for k=1..l
      i=m..s0, j=n..s1
      dp[i][j] = max(dp[i][j],dp[i-s0][j-s1]+1)// wo/w s
    • 494. Target Sum

      +/- ai = s

      1
      2
      3
      4
      5
      6
      7
      sum(P) - sum(N) = target           
      sum(P) + sum(N) + sum(P) - sum(N) = target + sum(P) + sum(N)
      2 * sum(P) = target + sum(nums)

      for num:nums
      for w = (t+s)/2 ... num
      dp[w] += dp[w-num]
    • 416. Partition Equal Subset Sum

      find if the array can be partitioned into two subsets, that sum is equal

      1
      2
      3
      4
      5
      sum(subset1) = sum/2

      for num:nums
      for w = sum/2 ... num
      dp[w] = dp[w]|dp[w-num]
§Thinking
  • DP

    1. 先枚举物品,再枚举容量。
      eg. [1, 2, 3] 组合4时,(1, 1, 2) == (1, 2, 1)。因为同一个物体不会再次考虑

      1
      2
      3
      4
      5
      /* remove duplicate.
      eg. 1 1 2 = 1 2 1, since 1 will not consider again
      for (auto num : nums)
      for (int w = num; w <= target; ++w)
      */

    2. 先枚举容量,再枚举物品。

      1
      2
      for (int w = nums[0]; w <= target; ++w)
      for (auto num : nums)
  • 背包问题:

    1. 枚举物品
    2. 枚举容量
      增序枚举,重复装入某个物品,而且尽可能多的,使价值最大
      逆序枚举:物品至多装一次
  • 背包九讲

Responsive Design

发表于 2018-03-24   |   分类于 Web   |     |   阅读次数

Lydna | Lerning Responsive Design

§Responsive Web Design - Ethan Marcotte
  1. Fluid Grid for flexible layouts
  2. Media Queries to help you adapt content to specific screen sizes
  3. Flexible Images and media that respond to changes in screen sizes as well.
  • HTML: clean code, proper semantics HTMLs

    CSS: media queries, fluid layout, transitions css graphics

    JavaScript: resourse loading, adaptive media, device functionality

§Concepts
  • viewport(have a minimum scaling factor) > the screen’s resolution

    Mobile browsers will display a web page within the viewport ans the shrink that viewport down until the content fits within the width of the screen itself. While this usually results in tiny pages, it does allow the user to see the page in its entirety and decide which areas of the page they’d like to zoom up to, and read.

    => how to control both the viewport size and its initial scale factor.

  • Controlling Viewports

    1. viewport meta tag met name="viewport" content="width=480, initial-scale=1">

    2. @viewport CSS rule @viewport{width: 480px; zoom: 1;}

      should be placed before media queries

    • device width: instructs the browser to set the viewport to the exact size as the available screen pixels.

      And the browser automatically sets initial scale to 100%.

      In fact, on iOS devices, there’s a bug with initial scale that affects pages when the orientation changes, so many people advise leaving initial scale off entirely unless you need to set the value to something other than 100%.

    • zoom: 2, -> 200%

    • control how much zoom

      eg. minimum-scale=1 maximum-scale=2 in tag, min-zoom:1; max-zoom:2in CSS. The user can scale up to 200% but can’t scale down pass 100%.

      eg. user-scalable=no or user-zoom:fixed;

  • Screen density is the number of pixels within a physical area of a screen, and newer screens are featuring higher and higher screen densities every year

    1. hardware pixels

      eg. Let’s say you had a line of text that was 30 pixels high on a lower-density screen. The same text mapped to the higher- density screen would now appear twice as small.

    2. reference pixels - “CSS pixel”

      eg. If you’ve specified that an element be 50 pixels wide, it might map to the first screen at exactly 50 pixels if the hardware and reference pixels were the same.
      On a higher-density screen, however, the device would actually display the same element over 100 hardware pixels to ensure that it remains the same visual size.

      • scaling factor - device-pixel-ratio, the ratio between reference pixels and hardware pixels
  • Designing for multiple screen densities

    • Optimize graphics for higher-density displays - Reduce bitmap graphics

      1. SVG

        Often larger than bitmaps

        Graphics are redrawn with each screen change

      2. CSS

        Complex artwork is difficult

        Often require non-semantic markup

        Property support isn’t universal

      3. use icon fonts

    • Screen-density targeting option

      1. Don’t target screen, but use high-res graphocs where needed
        issue1: Higher-resolution graphics are larger in size -> some of your users to download larger images than they really need.
        issue2: Not all screens use the same scaling factors -> you’d have to account for the largest size needed, which again leads to unnecessary file sizes in most instances.

      2. target

        if you are targeting CSS background images, you can use media queries to target specific device-pixel-ratios. -> JavaScript

  • Media Queries

    @media [not | only] screen [and] (expression){} in CSS

    media="screen and (min-width: 480px), screen and (orientation: landscape)" in tag

    modified: @import url(example.css) screen and (max-width: 480px);

  • breakpoint: A breakpoint indicates the moment your layout switches from one layout to another, and is usually triggered by the width of a screen.

    1. @media only screen and (max-width: 480px){} // mobile, eg. 480/16 = 30em
    2. @media only screen and (min-width:481px) and (max-width: 768px){} //tablet, 30em 48em
    3. @media only screen and (min-width: 769px) //desktop, 48em

    if use em, the width of the screen is set relative to the device’s font size.

  • Fluid grid

    eg. Grid systems

    downsides of frameworks: class-based, non-semantic, container/clearing elements, large in size/extra code

  • Making images responsive

    1. define with different @media

      but, in earlier versions of Android, all of the CSS background images would be downloaded, not just the ones needed for the appropriate media query.

    2. modify your .htaccess file, link to their responsive images JavaScript file, and add a query string on any image that requires a larger desktop version of that image.

      but, more complex solution, JS parse slower load time(and often replacement images are requested after the original image is already downloaded.)

      And that’s the evolution of browser prefetching: speed up load time, but may download elements that will be replaced.

    ->

    1. WHTWQ: add a new attribute called srcset, eg. <img … src="" srcset="1.png 800w, 2.png 600w">
      benefits: straight forward, uncluttered, no new html elements, backwards compatible
    2. w3c: <picture><source..><source..><img></picture>
      benefits: acts as parent for range of elements. multiple source options with alternate images, image properties, and media queries. user agent selects matching images
  • Responsive forms
    start by considering the mobile context first
    eg. use <label><input.. placehold="balabala" required="required"></label>
    Makes submitting data easier, automates input when possible, and even combines form elements to require less input from the user.

    1. combile address into a single input ans use placeholder text to show format.
    2. single dropdown menu instead of using a radio button grouping of multiple choices
  • Performance

    • Network connections, bandwidth speed, network latency
    • keep code lean
      1. use standards compliant code
      2. elimicate extraneous markup
      3. keep external scripts and styles lean and efficient
    • limit requests
      1. eg. If you’re linking to multiple external scripts, try combining them into a single file.
        Or tools: Filament Group’s QuickConcat, Enhance.js, which can combine files at runtime into single request.
      2. Combine images and icons into sprites -> use same images for multiple
      3. Use data URI: allows you to embed images directly into your CSS without having to request the image from the server
      4. Evaluate any external resource to make sure that you’re actually using them.
      5. use a JavaScript library in one context but not the other.
§Responsice Design Strategies
  • Designing for the appropriate: targeting the appropriate context in which the site will be used.
  • Planning a responsie design
    develop a content strategy, think about breakpoints
    tweak point -> allow you to refine your designs throughout the scaling proces
    device-specific functionality: geo location, instant messaging, touch sensitivity
    layout considerations: plan out your navigation, creat consistent coding standards
  • Building responsive mockups
  • Developing a context strategy for responsive sites
    goas: emphasize important content, make relationship clear, make it accessible on small screens
    start with content -> think about prioritize/purpose -> generate a set of rules that govern how content should be displayed and shifted as screen sizes change
  • Understanding the mobile context
  • Designing for mobile capabilities
    location, motion/accelerometer, built in compass, media, touch screen, phone
  • Creating flexible HTML
    1. write clean code
    2. use html5 if possible
    3. explore microformats
    4. standards for each content area
    5. creat smaller sections of content, reassess layout goals/scripting
  • Testing responsive designs
§Conclution
  • Fluid grid framework

    • benefits:
      speed up development
      help keep code consistent
      solid starting code base
      can add unnecessary code
      some require non-semantic code
  • eg. Twitter’s Bootstrap: it’s a collection of HTML, CSS, and JavaScript that includes a fluid grid, starter layouts, UI components, and much, much more.

  • eg. Skeleton: lightweight, includes a fluid grid, basic typography controls, and default styling for forms and buttons.

  • Responsice design tools

X5_Interval & Sliding Window

发表于 2018-03-22   |   分类于 Program , Leetcode   |     |   阅读次数
§Problem
  • Merge Intervals

    • follow up

      list in java is not thread safe: synchronized

      concurrent list java

    • note

      1. 先确认signature,再确认requirement
        work through test,画图说明
      2. compare的时候要用Integer.compare,而不是相减,会溢出
      3. collection.sort是modified merge sort(因为stable,保持相对顺序,eg. 1 2 2’ 3不会出现1 2’ 2 3)
        object use merge sort,primitive type use quick sort
      4. 用collections.emptyList(),是系统初始化好的,不用开辟额外的空间。如果用new Arraylist<>()会占新的空间
      5. Runtime, 要根据代码nlogn
      6. test case copy到代码附近
      7. 定义res时候直接最下面写return res。显得有条理
      8. 先定义cur,而不是每次a.get(i ),更清晰
    • FB变种,raindrop

      rope 0-1, [0.2,0.5], [0.75,1.0], [0.3,0.75]

      1. deal with float
        private static final float EPSILON = 1e-10
        Math.abs(a-b) < EPSILON

      2. If drops are given on by one. Not all in the begining. -> 57. insert Interval

        boolean isWet, 0-1 all is wet

        注意处理边界,如果drop>1

  • Comparable vs Comparator in Java

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    // 定义在class内部
    class Movie implements Comparable<Movie>{
    public int compareTo(Movie m){}
    }
    Collection.sort(list);

    // 定义专门的compare函数
    class RatingCompare implements Comparator<Movie>{
    pulic int compare(Movie m1, Movie m2){}
    }
    RatingCompare ratingCompare = new Rating Compare();
    Collection.sort(list, ratingCompare);
  • Amazon OA Window Sum

    给定数组,给定长度,算出数组里面,在这个长度下,分别的连续和

    1
    2
    3
    4
    > Given array: [1,2,3,4,5,6,7]
    > Given window: 3
    > Output: [6,9,12,15,18]
    >

    Sliding Window

  • Hard: 689. Maximum Sum of 3 Non-Overlapping Subarrays

    find 3, fixed subarray size = k => linear time

    else , not linear time, different sol, is similar to the buy and sell stock problem.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    1, 2, 1, 2, 6, 7, 5, 1
    ...........[s1 ].....
    maxFromLeft maxFromRight
    1, 2, 1, 2, 6, 7, 5, 1
    presum:3, 3, 3, 8,13,12, 6
    maxFL: 0, 0, 0, 3, 4 <-index
    3, 3, 3, 8, 13 <-value
    maxFR: 0, 0, 0, 0, 4, 5, 6 <-index
    ., ., ., .,13,12, 6 <-value

    s1: [k, n-2k]; s0:[0,i-k]; s2:[i+k,n-1]
    sum = presum[maxFromLeft[i - k]] + presum[i] + presum[maxFromRight[i + k]]
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    class Solution {
    public:
    vector<int> maxSumOfThreeSubarrays(vector<int>& nums, int k) {
    int n = nums.size();
    vector<long> presum(n,0);
    long sum = 0;
    for (int i = 0; i < n; ++i){
    if (i>=k){
    presum[i-k] = sum;
    sum -= nums[i-k];
    }
    sum += nums[i];
    }
    presum[n-k] = sum;
    vector<int> maxFL(n,0), maxFR(n,0);
    sum = 0; int idx = 0;
    for (int i = 0; i <= n-3*k; ++i){
    if (presum[i] > sum){
    sum = presum[i];
    idx = i;
    }
    maxFL[i] = idx;
    }
    sum = 0; idx = 0;
    for (int i = n-k; i >= 2*k; --i){
    if (presum[i] > sum){
    sum = presum[i];
    idx = i;
    }
    maxFR[i] = idx;
    }
    sum = 0; idx = 0;
    for (int i = k; i <= n-2*k; ++i){
    long tmp = presum[maxFL[i-k]]+presum[i]+presum[maxFR[i+k]];
    if (tmp > sum){
    sum = tmp;
    idx = i;
    }
    }
    return {maxFL[idx-k], idx, maxFR[idx+k]};
    }
    };

    ​

  • Interfaces and Inheritance in Java

    Interface inheitance: An Interface can extend other interface.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    //Class: Smple —implements—> Interface: intFB —extends—>Interface: intFA
    interface intFA{
    void geekName();
    }
    interface intFB extends intFA{
    void geekInstitute()
    }
    class sample implements intFB{

    @override
    public void geekName(){...}
    @override
    public void geekInstitute(){...}
    }
  • Airbnb: 759. Employee Free Time

    We are given a list schedule of employees, which represents the working time for each employee.

    Each employee has a list of non-overlapping Intervals, and these intervals are in sorted order.

    Return the list of finite intervals representing common, positive-length free time for all employees, also in sorted order.

    Example 1:

    1
    2
    3
    4
    5
    6
    7
    8
    > Input: schedule = [[[1,2],[5,6]],[[1,3]],[[4,10]]]
    > Output: [[3,4]]
    > Explanation:
    > There are a total of three employees, and all common
    > free time intervals would be [-inf, 1], [3, 4], [10, inf].
    > We discard any intervals that contain inf as they aren't finite.
    >
    >

    Example 2:

    1
    2
    3
    > Input: schedule = [[[1,3],[6,7]],[[2,4]],[[2,5],[9,12]]]
    > Output: [[5,6],[7,9]]
    >
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    //similar to 252, 253
    class Solution {
    public:
    vector<Interval> employeeFreeTime(vector<vector<Interval>>& schedule) {
    vector<int> starts, ends;
    for (auto s:schedule)
    for (auto i:s){
    starts.push_back(i.start);
    ends.push_back(i.end);
    }
    sort(starts.begin(),starts.end());
    sort(ends.begin(),ends.end());
    vector<Interval> ans;
    for (int i = 0; i < starts.size() - 1; ++i)
    if (ends[i] < starts[i+1])
    ans.push_back(Interval(ends[i],starts[i+1]));
    // ==> starts[ends[i].index] < starts[i+1]
    // ==> there is i starts and they have already ended
    // ==> there is free interval x before next i+1 start
    return ans;
    }
    };
  • 56. Merge Intervals | Sort

    Given a collection of intervals, merge all overlapping intervals.

    For example,
    Given [1,3],[2,6],[8,10],[15,18],
    return [1,6],[8,10],[15,18].

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    class Solution {
    public:
    vector<Interval> merge(vector<Interval>& intervals) {
    if (intervals.empty())
    return {};
    sort(intervals.begin(), intervals.end(), [](Interval a, Interval b){return a.start < b.start;});
    int end = intervals[0].end, beg = intervals[0].start;
    vector<Interval> ans;
    for (int i=1; i<intervals.size(); i++){
    if (intervals[i].start<=end)
    end = max(end,intervals[i].end);
    else {
    ans.push_back(Interval(beg,end));
    beg = intervals[i].start;
    end = intervals[i].end;
    }

    }
    if (beg!=-1)
    ans.push_back(Interval(beg,end));
    return ans;
    }
    };
  • 23. Merge k Sorted Lists

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    struct compare{
    bool operator()(const ListNode* l1, const ListNode* l2){
    return l1->val > l2->val;
    }
    };
    class Solution{
    public:
    ListNode* mergeKLists(vector<ListNode*>& lists){
    priority_queue<ListNode*, vector<ListNode*>, compare> q;
    //for (ListNode* l:lists)
    for (auto l:lists)
    if (l)
    q.push(l);
    if (q.empty())
    return NULL;
    ListNode* ans = q.top();
    ListNode* h = ans;
    q.pop();
    if (h->next) q.push(h->next);
    while (!q.empty()){
    h->next = q.top();
    q.pop();
    h = h->next;
    if (h->next) q.push(h->next);
    }
    return ans;
    }
    };

    Or, binary use Merge2Lists

§Materials

Meeting Rooms II -Interval 所有题型总结归纳

  • 683. K Empty Slots | sliding window

    There is a garden with N slots. In each slot, there is a flower. The N flowers will bloom one by one in N days. In each day, there will be exactly one flower blooming and it will be in the status of blooming since then.

    Given an array flowers consists of number from 1 to N. Each number in the array represents the place where the flower will open in that day.

    For example, flowers[i] = x means that the unique flower that blooms at day i will be at position x, where i and x will be in the range from 1 to N.

    Also given an integer k, you need to output in which day there exists two flowers in the status of blooming, and also the number of flowers between them is k and these flowers are not blooming.

    If there isn’t such day, output -1.

    Example 1:

    1
    2
    3
    4
    5
    6
    > Input: 
    > flowers: [1,3,2]
    > k: 1
    > Output: 2
    > Explanation: In the second day, the first and the third flower have become blooming.
    >

    Example 2:

    1
    2
    3
    4
    5
    > Input: 
    > flowers: [1,2,3]
    > k: 1
    > Output: -1
    >

    Note:

    1. The given array will be in the range [1, 20000].
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    /*
    flowers[i] = x : the unique flower that blooms at day i+1 will be at position x
    => day[x-1] = i+1

    x1 = f[i], x2 = f[j], x1 < x2
    sum( x1 < f[max(i,j)+1...N-1] < x2) == k
    =>
    sliding window
    left = x, right = x+k+1
    day[x+1]...day[x+k] > max(day[left],day[right])
    */

    class Solution {
    public:
    int kEmptySlots(vector<int>& flowers, int k) {
    int ans = INT_MAX;
    int n = flowers.size();
    vector<int> days(n,0);
    for (int i = 0; i < n; ++i)
    days[flowers[i]-1] = i + 1;
    int left = 0, right = k+1;
    int day = max(days[left],days[right]);
    for (int i = 1; right < n; ++i){
    if (days[i] > day)
    continue;
    if (i == right) // find one
    ans = min(ans,day);
    left = i;
    right = i+k+1;
    day = max(days[left],days[right]);
    }
    return ans == INT_MAX ? -1 : ans;
    }
    };
  • 252. Meeting Rooms | Sort

    Given an array of meeting time intervals consisting of start and end times [[s1,e1],[s2,e2],...] (si < ei), determine if a person could attend all meetings.

    For example,
    Given [[0, 30],[5, 10],[15, 20]],
    return false.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    /**
    * Definition for an interval.
    * struct Interval {
    * int start;
    * int end;
    * Interval() : start(0), end(0) {}
    * Interval(int s, int e) : start(s), end(e) {}
    * };
    */

    // sort with start time, itv[i].end>itc[i+1].start -> overloap
    class Solution {
    public:
    bool canAttendMeetings(vector<Interval>& intervals) {
    int n = intervals.size();
    if (n == 0)
    return true;
    auto comp = [](Interval& a, Interval& b){
    return a.start < b.start;
    };
    sort(intervals.begin(),intervals.end(),comp);
    for (int i = 0; i < n-1; ++i)
    if (intervals[i].end > intervals[i+1].start)
    return false;
    return true;
    }
    };
  • 253. Meeting Rooms II

    Given an array of meeting time intervals consisting of start and end times [[s1,e1],[s2,e2],...] (si < ei), find the minimum number of conference rooms required.

    For example,
    Given [[0, 30],[5, 10],[15, 20]],
    return 2.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    /*
    sort with start & sort with end
    i, j <= i
    if si < ej <=> i==j | when i comes, j is not end
    => overlap && ej<=i.end
    => since ej<=i.end, add another room -> i, next i
    ele next j
    */

    class Solution {
    public:
    int minMeetingRooms(vector<Interval>& intervals) {
    vector<int> starts, ends;
    for (auto i:intervals){ // store with idx
    starts.push_back(i.start);
    ends.push_back(i.end);
    }
    sort(starts.begin(),starts.end());
    sort(ends.begin(),ends.end());
    int ans = 0, j = 0;
    for (int i = 0; i < intervals.size(); ++i)
    if (starts[i] < ends[j])
    ++ ans; // rnum[i.idx] = ans++;
    else // can schedule after j
    ++ j; // rnum[i.idx] = rnum[j++.idx]
    return ans;
    }
    };

    follow up问题:把每个 meeting room 中的 intervals 打印出来

  • Given two lists of intervals, find their overlapping intervals. For example: l1: [1,5], [7,10], [12,18], [22,24], l2: [3,8], [13,15], [16,17], [18,21], [22,23]
    returns [3,5], [7,8], [13,15], [16,17], [18,18], [22,23]

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    // if there is something wrong, please correct me
    two pointers: i, j
    if Ai overlap Bj,
    cout << overlap
    if ie = je, i++, j++,
    else
    if is <= js, i++, je = ie + 1
    else j++, ie = je +1
    else
    if is < ie <= js < je, i++
    else j++
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    //sort with start & sort with end, then
    int count = 0, start = 0;
    for (int i = 0, j = 0; j < ends.length;) {
    if (i < begs.length && begs[i] < ends[j]) {
    if (++count == 2) {
    // enter overlap
    start = begs[i];
    }
    i++;
    } else if (i == begs.length || begs[i] > ends[j]) {
    if (--count == 1) {
    // exit overlap
    res.add(new Interval(start, ends[j]));
    }
    j++;
    } else {
    // begs[i] == ends[j]
    if (count > 1) {
    // already in overlap
    continue;
    }
    res.add(new Interval(begs[i++], ends[j++]));
    }
    }
    return res;
  • N个员工,每个员工有若干个interval表示在这段时间是忙碌的。求所有员工都不忙的intervals。每个subarray都已经sort好。
    举例: [
    [[1, 3], [6, 7]],
    [[2, 4]],
    [[2, 3], [9, 12]].
    ]
    返回 [[4, 6], [7, 9]]

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    //sort with start & sort with end, then
    int count = 0, start = 0;
    for (int i = 0, j = 0; i < begs.length;) {
    if (begs[i] < ends[j]) {
    if (++count == 1) {
    // exit free time
    if (start > 0) {
    res.add(new Interval(start, begs[i]));
    }
    }
    i++;
    } else if (begs[i] > ends[j]) {
    if (--count == 0) {
    // start free zone
    start = ends[j];
    }
    j++;
    } else {
    // begs[i] == ends[j]
    i++;
    j++;
    }
    System.out.println(res);
    }
    return res;

    ​

  • ​

What happens when you type an URL in the browser

发表于 2018-03-21   |   分类于 Web   |     |   阅读次数
§Reference I
  • what happens when you type in a URL in browser
  • What happens when you type an URL in the browser and press enter?
§Process

Attention: this is an extremely rough and oversimplified sketch, assuming the simplest possible HTTP request (no HTTPS, no HTTP2, no extras), simplest possible DNS, no proxies, single-stack IPv4, one HTTP request only, a simple HTTP server on the other end, and no problems in any step.

  1. browser checks cache; if requested object is in cache and is fresh, skip to #9

    1. Browser cache. The browser maintains a repository of DNS records for a fixed duration for websites you have previously visited
  2. browser asks OS for server’s IP address

    1. OS cache. the browser would make a system call (i.e. gethostname on Windows) to your underlying computer OS

    2. Router cache.

    3. ISP cache. Your ISP maintains its’ own DNS server which includes a cache of DNS records.

  3. OS makes a DNS lookup and replies the IP address to the browser

    DNS(Domain Name System) is a database that maintains the name of the website (URL) and the particular IP address it links to. The main purpose of DNS is human-friendly navigation.

    a recursive search since the search will continue repeatedly from DNS server to DNS server until it either finds the IP address we need or returns an error response saying it was unable to find it.

    call the ISP’s DNS server a DNS recursor -> other DNS servers are called name servers

    eg. DNS recursor -> root name server -> .com domain name server -> google.com name server, find the matching IP address

    ​

  4. browser opens a TCP connection to server (this step is much more complex with HTTPS)

    TCP/IP three-way handshake: the client and the server exchange SYN(synchronize) and ACK(acknowledge)

  5. browser sends the HTTP request through TCP connection

    GET/POST request, also contain additional information such as browser identification (User-Agent header), types of requests that it will accept (Acceptheader), and connection headers asking it to keep the TCP connection alive for additional requests.

    Example

  6. browser receives HTTP response and may close the TCP connection, or reuse it for another request

    The server response contains the web page you requested as well as the status code, compression type (Content-Encoding), how to cache the page (Cache-Control), any cookies to set, privacy information, etc.

    Example

  7. browser checks if the response is a redirect or a conditional response (3xx result status codes), authorization request (401), error (4xx and 5xx), etc.; these are handled differently from normal responses (2xx)

  8. if cacheable, response is stored in cache

  9. browser decodes response (e.g. if it’s gzipped)

  10. browser determines what to do with response (e.g. is it a HTML page, is it an image, is it a sound clip?)

  11. browser renders response, or offers a download dialog for unrecognized types

    First, it will render the bare bone HTML skeleton.

    Then it will check the HTML tags and sends out GET requests for additional elements on the web page, such as images, CSS stylesheets, JavaScript files etc. These static files are cached by the browser so it doesn’t have to fetch them again the next time you visit the page.

    At the end, you’ll see the page

§Reference II
  • 从输入URL到页面加载的过程?如何由一道题完善自己的前端知识体系!
  • 从输入 URL 到页面加载完成的过程中都发生了什么事情?
§Details
  1. 从浏览器接收url到开启网络请求线程(这一部分可以展开浏览器的机制以及进程与线程之间的关系)

    • 浏览器多进程:Browser进程(主控),第三方插件进程,GPU进程,浏览器渲染进程(内核,默认每个Tab页面一个进程)
    • 多线程的浏览器内核
    • 解析URL:protocol://host:port/path/query#fragment
  2. 开启网络线程到发出一个完整的http请求(这一部分涉及到dns查询,tcp/ip请求,五层因特网协议栈等知识)

    • DNS解析 -> IP Adress

    • tcp/ip请求构建:3次握手规则建立连接,断开连接时的四次挥手

    • 五层因特网协议栈:

      1. 应用层(dns,http) DNS解析成IP并发送http请求
      2. 传输层(tcp,udp) 建立tcp连接(三次握手)
      3. 网络层(IP,ARP) IP寻址
      4. 数据链路层(PPP) 封装成帧
      5. 物理层(利用物理介质传输比特流) 物理传输(然后传输的时候通过双绞线,电磁波等各种介质)

      OSI七层框架:物理层、数据链路层、网络层、传输层、会话层、表示层、应用层

      1. 表示层:主要处理两个通信系统中交换信息的表示方式,包括数据格式交换,数据加密与解密,数据压缩与终端类型转换等
      2. 会话层:它具体管理不同用户和进程之间的对话,如控制登陆和注销过程
  3. 从服务器接收到请求到对应后台接收到请求(这一部分可能涉及到负载均衡,安全拦截以及后台内部的处理等等)

    负载均衡:用户发起的请求都指向调度服务器(反向代理服务器,譬如安装了nginx控制负载均衡),然后调度服务器根据实际的调度算法,分配不同的请求给对应集群中的服务器执行,然后调度器等待实际服务器的HTTP响应,并将它反馈给用户

  4. 后台和前台的http交互(这一部分包括http头部、响应码、报文结构、cookie等知识,可以提下静态资源的cookie优化,以及编码解码,如gzip压缩等)

    Example

  5. 单独拎出来的缓存问题,http的缓存(这部分包括http缓存头部,etag,catch-control等)

    强缓存(200 from cache)与协商缓存(304)

  6. 浏览器接收到http数据包后的解析流程(解析html-词法分析然后解析成dom树、解析css生成css规则树、合并成render树,然后layout、painting渲染、复合图层的合成、GPU绘制、外链资源的处理、loaded和domcontentloaded等)

    1. 解析HTML,构建DOM树

    2. 解析CSS,生成CSS规则树

    3. 合并DOM树和CSS规则,生成render树

    4. 布局render树(Layout(reflow)/reflow),负责各元素尺寸、位置的计算

    5. 绘制render树(paint),绘制页面像素信息

    6. 浏览器会将各层的信息发送给GPU,GPU会将各层合成(composite),显示在屏幕上

  7. CSS的可视化格式模型(元素的渲染规则)

    1. 包含块(Containing Block)

    2. 控制框(Controlling Box)

    3. BFC(Block Formatting Context)

      如何触发BFC?

      1. 根元素
      2. float属性不为none
      3. position为absolute或fixed
      4. display为inline-block, flex, inline-flex,table,table-cell,table-caption
      5. overflow不为visible
    4. IFC(Inline Formatting Context):行内元素自身如何显示以及在框内如何摆放的渲染规则

  8. JS引擎解析过程(JS的解释阶段,预处理阶段,执行阶段生成执行上下文,VO,作用域链、回收机制等等)

    • 即时编译(JIT-Just In Time compiler)
      1. 读取代码,进行词法分析(Lexical analysis),然后将代码分解成词元(token)
      2. 对词元进行语法分析(parsing),然后将代码整理成语法树(syntax tree)
      3. 使用翻译器(translator),将代码转为字节码(bytecode)
      4. 使用字节码解释器(bytecode interpreter),将字节码转为机器码
  • JS的预处理阶段:Semicolon Insertion规则,分号补全,变量提升

  • JS的执行阶段

    • 变量对象(Variable object,VO)

    • 作用域链(Scope chain)

    • this:this的值只取决中进入上下文时的情况

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      var baz = 200;
      var bar = {
      baz: 100,
      foo: function() {
      console.log(this.baz);
      }
      };
      var foo = bar.foo;

      // 进入环境:global
      foo(); // 200,严格模式中会报错,Cannot read property 'baz' of undefined

      // 进入环境:global bar
      bar.foo(); // 100
  • 回收机制

    • simple GC:mark and sweep(标记清除),引用计数
    • 分代回收(Generation GC)
      • 多回收“临时对象”区(young generation)
      • 少回收“持久对象”区(tenured generation)
      • 减少每次需遍历的对象,从而减少每次GC的耗时
  1. 其它(可以拓展不同的知识模块,如跨域,web安全,hybrid模式等等内容)

X4_Design Tiny URL | Skyline Problem

发表于 2018-03-17   |   分类于 System Design , Program , Leetcode   |     |   阅读次数

§Design A Tiny URL system

§System Design
  • is always about trade-off && weighing pros and cons && constraints
  • is also about sacrificing what is not needed && optimizing what is most needed
§Problem
  • Ask requirement first: uique URL <-> unique alias

  • Ask Read or Write heavy: eg. Twitter: read heavy

  • Constraints

    1. Amount of data to store
    2. Amount of traffic — QPS or the query-per-second rate(read)
    3. How many new URLs per second(write)
    4. Peak time / Hot spot(->Region)
    5. Projection (10X times) Scalability
  • Numbers, eg.

    1. Daily active users: 1,000,000
    2. Insert: Per day: 1,000,000 * 1%(function usage) * 10(function frequency) = 100,000
    3. Loop up: Per day: 1,000,000 * 100%(function usage) * 3(function frequency) = 3,000,000
  • URLs: (lower case alphabet 26+upper case alphabet 26+ digit10)^length(=6) ~= 56.8 billions

  • Design 接口: REST APIs

    • Status Code: 1xx Informational responses; 2XX Success, 200 success, 201 created; 3XX Redirection, 301 moved permanently; 4xx Client errors; 5xx Server errors
    • Google URL Shortener, json
  • 存储: SQL or NoSQL?

    1. 是否需要丰富的 SQL Query? NoSQL的SQL Query丰富度不如SQL,不过目前差距正在缩小,比如 MongoDB。此题不需要,该因素可以忽略。
    2. 是否追求效率?大多数Web Framework与SQL数据库兼容得很好(自带ORM(Object,-> SQL dialect),意味着可以少些很多代码。此题没太多业务逻辑,该因素可以忽略。
    3. 是否需要支持Transaction? NoSQL不支持Transaction,但是可以在业务层做处理保证。
    4. 对QPS要求高不高?
      NoSQL性能高,MondoDB可以到10k级别,MySQL只能在k这个级别。
    5. 对Scalability的要求有多高?sharding,replica。
    • Number

      Average size of longURL = 100 bytes

      Average size of shortURL = 4 bytes(int)

      State = 4 byte

      Daily new URL = 3,000,000 * 108 = 324MB => Yearly new URL = 324*365 = 118GB

    • NoSQL,

      1. Randomly generate 6 characters containing [A-Z, a-z, 0-9]
      2. Use consistent hashing for partition: Partition By Key Or Consistent Hashing; Vitural nodes; Replication for high avaialbility
      3. Use NoSQL’s distributed key-value store
      4. Eventually consistent: quorum; others return checksum(eg. SHA1), if it is same with data -> no need to lookup
      5. Short URL already exists? What to do with conflicts? Retry
      • Write Optimization 1
        • Use centralized “Counter” to generate unique URL
        • similar to auto-increment id in SQL
        • Uniqueness guaranteed
      • Write Optimization 2
        • Round robin to different URL generators, each governing a dedicate key space
        • Partition key spaces such as “AXXXXX”, “BXXXXX” and etc.
        • Similar to sharding - avoid conflicts
    • SQL, id(primary key), long_url(unique index), created_at

      • Map 6 characters containing [A-Z, a-z, 0-9] to an integer
      • Base Conversion problem: base 62
      • More of single machine solution
      • Schema?
  • Other Factors to Consider

    1. High availability

    2. How to cache? Preload / Replace

      First use sharding(分片,根据region) to partition different regions

      You may have China DB and US DB for example

      Use redis / memcached LRU cache for repeated query

    3. Reuse popular short urls

      Store those popular urls based on clicks. Use the same short url

    4. Custom URLs

      (1) Use another table for custom URLs (SQL), since SQL use id rather than short_url

        • Need to look up two tables now for read/write
        • Atomicity not guaranteed in NoSQL

      (2) Use the same table with additional boolean column (NoSQL), short_url, long_url, created_at, custom(true/false)

        • Same look up, need to query the boolean column
        • Atomicity still guaranteed
        • Not working well for SQL solution, which is auto id based
    5. Analytics (usage of URLs), eg. referrers, browsers, countires, patforms

    6. Automatic link expiration, add last_read_time

    7. Manual link removal

      in NoSQL, such in disk, Append only operation is faster, add alive(true/false)

    8. UI v.s. API

    9. User’s historical short urls, eg. with account system

  • SQL index: use B+ tree index primary key, everytime use binary tree to find

    • When primary key is declared, DBMS automatically creates unique index

    • Often need additional indexes

    • Using CREATE INDEX command, SQL indexes can be created on basis of any selected attribute

    • Composite index

      • Index based on two or more attributes
      • Often used to prevent data duplication

§218. The Skyline Problem

  • reference

  • OOD

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    class Building implements Comparable<Building> {
    int left;
    int right;
    int height;
    public Building(int left, int right, int height) {
    this.left = left;
    this.right = right;
    this.height = height;
    }
    @Override
    public int compareTo(Building o) {
    if (this.height != o.height) {
    return Integer.compare(this.height, o.height);
    }
    if (this.left != o.left) {
    return Integer.compare(this.left, o.left);
    }
    return Integer.compare(this.right, o.right);
    }
    @Override
    public String toString() {
    return "Building [left=" + left + ", right=" + right + ", height=" + height + "]";
    }
    }
  • Other test case

    • what if left meets right (B0[1 5 10], B1[5 9 10], B2[9 12 20], B3[12 1510]) -> enter first, otherwise it will add to result (wrong)
    • What if start at same and end at same (B0[1 17 3], B1[1 3 8], B2[1 4 2], B3[13 17 8], B4[14 17 2]) -> when compare: if left1=left2, first enter larger height, if right1=right2, first leave smaller height
  • code

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    	    /**
    * The Skyline Problem
    * city's skyline is the outer contour of the silhouette formed by all the buildings in that city when viewed from a distance. Now suppose you are given the locations and height of all the buildings as shown on a cityscape photo (Figure A), write a program to output the skyline formed by these buildings collectively (Figure B).
    * The geometric information of each building is represented by a triplet of integers [Li, Ri, Hi], where Li and Ri are the x coordinates of the left and right edge of the ith building, respectively, and Hi is its height. It is guaranteed that 0 鈮� Li, Ri 鈮� INT_MAX, 0 < Hi 鈮� INT_MAX, and Ri - Li > 0. You may assume all buildings are perfect rectangles grounded on an absolutely flat surface at height 0.
    For instance, the dimensions of all buildings in Figure A are recorded as: [ [2 9 10], [3 7 15], [5 12 12], [15 20 10], [19 24 8] ] .
    The output is a list of "key points" (red dots in Figure B) in the format of [ [x1,y1], [x2, y2], [x3, y3], ... ] that uniquely defines a skyline. A key point is the left endpoint of a horizontal line segment. Note that the last key point, where the rightmost building ends, is merely used to mark the termination of the skyline, and always has zero height. Also, the ground in between any two adjacent buildings should be considered part of the skyline contour.
    For instance, the skyline in Figure B should be represented as:[ [2 10], [3 15], [7 12], [12 0], [15 10], [20 8], [24, 0] ].
    Notes:
    The number of buildings in any input list is guaranteed to be in the range [0, 10000].
    The input list is already sorted in ascending order by the left x position Li.
    The output list must be sorted by the x position.
    There must be no consecutive horizontal lines of equal height in the output skyline. For instance, [...[2 3], [4 5], [7 5], [11 5], [12 7]...] is not acceptable; the three lines of height 5 should be merged into one in the final output as such: [...[2 3], [4 5], [12 7], ...]
    https://briangordon.github.io/2014/08/the-skyline-problem.html
    */

    public List<int[]> getSkyline(int[][] buildings) {
    if (buildings.length == 0) {
    return Collections.emptyList();
    }
    Building[] lefts = new Building[buildings.length];
    Building[] rights = new Building[buildings.length];
    for (int i = 0; i < buildings.length; i++) {
    int[] b = buildings[i];
    Building building = new Building(b[0], b[1], b[2]);
    rights[i] = lefts[i] = building;
    }
    //sort according to left
    Arrays.sort(lefts, (a, b) -> {
    if (a.left != b.left) {
    return Integer.compare(a.left, b.left);
    }
    return Integer.compare(b.height, a.height);
    });
    //sort according to right
    Arrays.sort(rights, (a, b) -> {
    if (a.right != b.right) {
    return Integer.compare(a.right, b.right);
    }
    return Integer.compare(a.height, b.height);
    });

    System.out.println(Arrays.toString(lefts));
    System.out.println(Arrays.toString(rights));
    List<int[]> result = new ArrayList<>();
    TreeSet<Building> set = new TreeSet<>(); // will use compareTo to compare
    int leftsIndex = 1;
    int rightsIndex = 0;
    int h = lefts[0].height; // max height of buildings in set
    int top = h; // last seen h
    set.add(lefts[0]);
    result.add(new int[] { lefts[0].left, top });
    int index = 0;
    while (rightsIndex < buildings.length) {
    if (leftsIndex == buildings.length || rights[rightsIndex].right < lefts[leftsIndex].left) {
    index = rights[rightsIndex].right;
    System.out.println("Remove " + rights[rightsIndex]);
    set.remove(rights[rightsIndex++]);
    } else {
    index = lefts[leftsIndex].left;
    System.out.println("Add " + lefts[leftsIndex]);
    set.add(lefts[leftsIndex++]);
    }
    System.out.println(set);
    h = set.isEmpty() ? 0 : set.last().height;
    if (h != top) {
    top = h;
    result.add(new int[] { index, top });
    }
    }
    return result;
    }
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    //
    /*
    sol1: h[mostL...mostR]
    sol2:
    maintain a PQ <height,end time>
    cur = 0,
    nexT = PQ top end time;
    if cur.begT > next // it ends before others coming
    pop all the buildings in Pq whose enttime <= nextT;
    else{
    push all the buildings at the same time,cur++
    nexT = cur.beg
    }
    //nexT = PQ's heightest building's end time
    curH = PQ's heightest or 0;
    if != last height
    add curH,nexT

    */


    class Solution {
    public:
    vector<pair<int, int>> getSkyline(vector<vector<int>>& buildings) {
    vector<pair<int,int>> ans;
    priority_queue<pair<int,int>> aliveB;
    int cur = 0, nexT = 0, curH = 0, len = buildings.size();
    while (!aliveB.empty() || (cur<len)){
    if (aliveB.empty())
    nexT = buildings[cur][0];
    else
    nexT = aliveB.top().second;
    if ((cur>=len)||(buildings[cur][0]>nexT))
    while (!aliveB.empty() && aliveB.top().second<= nexT)
    aliveB.pop();
    else{
    nexT = buildings[cur][0];
    while ((cur<len)&&(buildings[cur][0]==nexT)){
    aliveB.push(make_pair(buildings[cur][2],buildings[cur][1]));
    cur++;
    }
    }
    if (aliveB.empty())
    curH = 0;
    else
    curH = aliveB.top().first;
    if ((ans.empty())||(ans.back().second!=curH))
    ans.push_back(make_pair(nexT, curH));
    }
    return ans;
    }
    };
  • Interval: 253. Meeting Rooms II

System Design II

发表于 2018-03-17   |   分类于 System Design   |     |   阅读次数
§Scalability
  • concepts

    • Vertical scaling: CPU(more cores, L2 Cache…), Disk(PATA, SATA, SAS…; RAID), RAM…
    • Horizontal scaling: get a bunch of slower or at least cheaper machines instand of plural number of machines
    • Caching
    • Load balancing: traffic come from people -> distributed or balanced
    • Database replication
      • have master-slave; in case of database dies; if more read heavy > write heavy => select statements go to server 2,3,4, any inserts, updates or deletes go to go to server 1
      • have master-master: can write any server; if connection down, you can read from master
    • Database partitioning

    ​

    • Using NoSQL instead of scaling a relational database
    • Being asynchronous
    • how to deal with the two major bottlenecks: handling a lot of users, and handling a lot of data
  • eg.

    • Uber: A nice article about how Uber had to scale fast, about breaking your service into many micro services spread across many repos
    • Facebook: How Facebook handles 800,000 simultaneous viewers on a live stream
    • Twitter: How Twitter handles 3,000 image uploads per second and why the old ways it used would not work nowadays. Twitter subcomponents: Storing data (video | text), and Timeline (video | text).
    • Salesforce: A relatively short example from Salesforce.
    • Google, Youtube (video | text), Tumblr, StackOverflow, and Datashift.
  • Warp-up

    You first build a high-level architecture by identifying the constraints and use cases, sketching up the major components and the relationships between them, and thinking about the system’s bottlenecks. You then apply the right scalability patterns that will take care of these bottlenecks in the context of the system’s constraints.

    • eg. Scalable design

      1. Application Service layer

        • Start with one machine
        • Measure how far it takes us (load tests)
        • Add a load balance + a cluster of machines over time: to deal with spike-y traffic, to increase availability
      2. Data storage

        1. Billions of objects

        2. Each object is fairly smaill(<1K)

        3. There are no relationships between the objects

        4. Reads are 9x more frequent than writes (360 reads, 40 writes per second)

        5. 3TBs of urls, 36GB of hases

        MySQL:

        • Widely used
        • Mature technology
        • Clear acling paradigms(sharding, master/slave replication, master/master replication)
        • Used by Facebook, Twitter, Google, etc.
        • Index lookups are very fast

        mappings

        • hash: varchar(6)
        • original_url: varchar(512)
        1. Create a uniquue index on the hash(36GB+). We want to hold it in memory.

        2. Option 1: Vertival sacling of the MySQL machine

        3. Option 2: Partition the data: 5 partitions, 600GB of data, 8GB of indexes

        4. Master-slave (reading from the slaves, writes to the master)

        =>

        1. Use one MySQL table with two varchar fields
        2. Create a uniquue index on the hash(36GB+). We want to hold it in memory to speed up lookups.
        3. Vertival sacling of the MySQL machine for a while
        4. Eventually, partition the data by taking the first chr of the hash mod the number of partitions. Thinks about a master-slave setup.
  • Example: The Twitter Problem, The Summarization Problem

123
Zoey HAN

Zoey HAN

CS | Web | Write | Music

30 日志
12 分类
21 标签
GitHub Weibo facebook csdn
© 2016 - 2019 Zoey HAN
由 Hexo 强力驱动
主题 - NexT.Pisces