Re: [問題] Google Interview Question (4)

作者: stimim (qqaa)   2013-03-08 16:51:34
剛剛和 Leon 討論了一下,發現我們對題目的了解有一點不同
原題目:
Given a document and a query of K words, how do u find the smallest window
that covers all the words at least once in that document? (given you know the
inverted lists of all K words, that is, for each word, you have a list of all
its occurrrences). This one is really hard. Could someone propose an
algorithm in O(n)?
我們可以確定:
1. document is given
2. 知道我們關心的是哪 K 個字
3. 這 K 個字的 occurrence list 也是給定的
問題是,n 指的是什麼?
1. n = document size (in number of words)
在這種情況下, O(n) 是可能的,方法基本上就是 Leon 提出來的。
補充一個 O(document size) 的做法
// all indices starts from 1
Given occ (list of list of occurrences)
array = [-1, ..., -1] (size = document size)
i = 0
for list in occ
i = i + 1
for index in list
array[index] = i
count = [0, 0, ..., 0] (size = K)
nKinds = 0
start = 1
end = 1
minWindowSize = INF
while end <= document size
while end <= document size and nKinds < K
word = array[end]
end = end + 1
if word == -1 // we are not interested
continue
count[word] = count[word] + 1
if count[word] == 1
nKinds = nKinds + 1
while nKinds == K
word = array[start]
if word == -1 // we are not interested
start = start + 1
continue
count[word] = count[word] - 1
if count[word] == 0 // check this window
nKinds = nKinds - 1
if end - start < minWindowSize
minWindowSize = end - start
start = start + 1
return minWindowSizejk
2. n = 這 K 個字出現的總次數
Ex:
K = 3, 要找出包含 {a, b, c} 的最小 window
occurrences:
a = { 1, 1000, 2500}
b = {400, 1001, 2000}
c = {500, 1002, 1500}
document size = 2500 甚至更高 (其他的位置是其他的字)
n = 3 + 3 + 3 = 9 而不是 2500
在這種情況下,已知可以做到 O(n logK) (RockLee 寫的方法)
不過,在這種狀況下,給定原本的 document 似乎就沒什麼用了
而且在讀入 input 時,
讀入原本的 document 會花掉 O(document size) 的時間
所以在和 Leon 討論時,他對這種解釋方法有一點懷疑
作者: scwg ( )   2013-03-09 03:21:00
我 (ledia 應該也是) 自動採用 2. 的解釋, 因為 inverselist 可以 offline 建好, 之後這個演算法可以重複多次

Links booklink

Contact Us: admin [ a t ] ucptt.com