DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017) Veli Mäkinen 12/05/2017 1
COURSE STRUCTURE
7 weeks: video lecture -> demo lecture -> study group -> exercise
Video lecture: overview, main concepts, algorithm animations, simple examples
Demonstration lecture (blackboard): proofs, model solutions, corner cases, derivations, problem solving
Study group: problem solving in groups following the model from the lecture
Exercise session: problem solving on your own
BACKGROUND
Balanced trees, recursion, merge sort, big-O notation, shortest paths in graphs, topological sort, connected components, spanning trees, Bellman-Ford, Dijkstra, Floyd-Warshall
Chapters 1, 2, 3, 22, 23, 24, 25
We will revisit these when needed
TOPICS
Week I: Simple recursions and their analysis (4-4.1, 4.3-4.5, 7.1). Overview of amortized analysis (Cartesian tree construction, dynamic arrays 17.4.2).
Week II: More complex recurrences for divide-and-conquer type problems (4.2, 9.3)
Week III: Network flows, with an aim to introduce reductions, and shortest paths (26-26.3). Simple dynamic programming tasks like segmentation (1d clustering).
Week IV: More complex dynamic programming, like that related to trees (15.2, 15.5)
Week V: NP-hardness without formalities, example reductions, approximation algorithms (34 and 35, lightweight)
Week VI: NP-hardness with formalities, including Cook's theorem and encodings (34, heavyweight)
Week VII: Randomized algorithms, touching on all the previous topics in some way (5, 7 + relevant parts of other chapters)
ANALYSIS OF RECURRENCES & AMORTIZED ANALYSIS
ANALYSIS OF RECURRENCES
Three methods:
Substitution method (pp. 83-87)
Recursion-tree method (pp. 88-93)
Master method (pp. 93-96)
Quicksort (Chapter 7, animation)
We will continue with this topic on Week II, bringing advanced recursive algorithms into the game
QUICKSORT
[animation: the array 4 7 8 1 3 6 5 2 9 is partitioned around pivot 4, then both sides recursively, until sorted]
A bad pivot causes the recursion tree to be skewed: O(n^2) worst case. We learn next week how to select the median as pivot in linear time!
QUICKSORT WITH PERFECT PIVOT
log n levels, O(n) work on each level => O(n log n) time
This is called the recursion-tree method.
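The recursion can be sketched in a few lines of Python. This is a minimal illustration, not the in-place partitioning of Chapter 7: it takes the first element as pivot (as in the slide's animation) and builds the two sides as new lists.

```python
# Minimal quicksort sketch: partition around the first element,
# recurse on both sides. Worst case O(n^2) if the pivot is always bad,
# O(n log n) with a perfect (median) pivot.
def quicksort(a):
    if len(a) <= 1:
        return a
    pivot = a[0]
    smaller = [x for x in a[1:] if x < pivot]
    larger = [x for x in a[1:] if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([4, 7, 8, 1, 3, 6, 5, 2, 9]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```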
QUICKSORT WITH PERFECT PIVOT
Running time can also be stated as T(n) = 2T(n/2) + O(n), with base case T(1) = O(1). We can use the substitution method to show that T(n) = O(n log n).
Substitution method:
1. Assume by induction that the guessed bound holds true for inputs shorter than n.
2. Substitute the recurrences with the bounds assumed true by induction.
3. Show that the bound holds also for n.
4. Check that the induction base cases also hold.
SUBSTITUTION METHOD EXAMPLE
Observation: big-O notation is not compatible with the substitution method, as we need more exact claims for the induction to work. Hence, to solve T(n) = 2T(n/2) + O(n), we claim T(n) ≤ cn log n, where c is some constant. We also assume n = 2^k for some integer k > 0 (why is this fine to assume?).
1. Assume by induction T(n/2) ≤ c(n/2) log(n/2).
2. T(n) = 2T(n/2) + O(n) ≤ cn log n - cn + an, for some constant a.
3. We notice that T(n) ≤ cn log n for any c ≥ a.
4. T(1) = a by definition and T(2) = 4a by definition; T(2) ≤ c·2 log 2 = 2c, so we can pick e.g. c = 2a to make the base case hold. Thus T(n) ≤ cn log n holds by induction.
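The induction above is the proof; as a numeric sanity check (an illustration only, not a proof), one can tabulate the exact recurrence with a hypothetical constant a = 1 and verify the bound cn log n with c = 2a for powers of two:

```python
# Sanity check of the substitution-method bound: compute
# T(n) = 2*T(n/2) + a*n with T(1) = a exactly, and compare against
# c*n*log2(n) with c = 2a, as derived on the slide. a = 1 is an
# arbitrary choice for the hidden constant of the O(n) term.
import math

a = 1.0

def T(n):
    # exact recurrence value, n assumed a power of two
    if n == 1:
        return a
    return 2 * T(n // 2) + a * n

for k in range(1, 11):
    n = 2 ** k
    assert T(n) <= 2 * a * n * math.log2(n)
print("T(n) <= 2a*n*log2(n) holds for n = 2, 4, ..., 1024")
```

Note that the bound is tight at n = 2 (T(2) = 4a = 2a·2·log 2), which is exactly why the base case forces c = 2a.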
MASTER METHOD
The pattern for analysing a recurrence by the substitution method is usually quite similar, yet we will see more complex examples. The Master Theorem characterizes many cases of recurrences of the type T(n) = aT(n/b) + f(n). Depending on the relationship between a, b, and f(n), three different outcomes for T(n) follow. This gives a Master method for solving recurrences of this kind: check which of the three cases your recurrence belongs to, if any. We will state this theorem in the demonstration lecture and practice its use.
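As a preview, the case analysis can be sketched for the common special case f(n) = Θ(n^d). This is a simplification: the full theorem (CLRS Section 4.5) also covers non-polynomial f(n) and needs a regularity condition in the third case.

```python
# Sketch of the master-method case analysis for T(n) = a*T(n/b) + Theta(n^d).
# The critical exponent is log_b(a); which side of it d falls on decides
# whether the leaves, all levels equally, or the root dominate the cost.
import math

def master_case(a, b, d):
    """Return the asymptotic solution of T(n) = a*T(n/b) + Theta(n^d)."""
    crit = math.log(a, b)                # critical exponent log_b a
    if d < crit:
        return f"Theta(n^{crit:g})"      # case 1: leaves dominate
    if math.isclose(d, crit):
        return f"Theta(n^{d:g} log n)"   # case 2: every level costs the same
    return f"Theta(n^{d:g})"             # case 3: root dominates

print(master_case(2, 2, 1))  # mergesort / perfect-pivot quicksort: Theta(n^1 log n)
print(master_case(4, 2, 1))  # Theta(n^2)
print(master_case(3, 2, 2))  # Theta(n^2)
```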
AMORTIZED ANALYSIS
Consider algorithms whose running time can be expressed as (time of a step) * (number of steps) = t_step * #steps = t_total.
E.g. linked list: O(1) append * n items added = O(n).
Sometimes a single step can take a long time, but the total time is much smaller than what this simple analysis gives: work done on heavy steps can be charged to the light steps.
Amortized cost of a step = t_total / #steps
Examples: Cartesian tree construction (separate pdf, animation), dynamic array (17.4.2, animation)
CARTESIAN TREE
[figure: CT(A) for A = 7 9 1 5 8 3 4 2 3.5]
CARTESIAN TREE CONSTRUCTION
[animation: CT(A) built left to right, inserting 7, 9, 1, 5, 8, 3, ...]
CARTESIAN TREE
Comparing a new item to all items on the rightmost path may take O(n) time.
[figure: CT(A) for A = 7 9 1 5 8 3 4 2 3.5]
But after comparing with an old item, you either do a local rearrangement to insert the new item, or you never compare with that old item again (by-pass). The total running time is proportional to #by-passes + #insertions, which are both O(n). Hence, the amortized cost of modifying CT(A[1..n-1]) into CT(A[1..n]) is O(1).
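The rightmost path can be kept on a stack, which makes the by-pass argument visible in code. A sketch for the min-heap-ordered variant (smallest value at the root, as in the slide's example); the representation as parent/left/right index arrays is a choice of this sketch, not from the slides:

```python
# Left-to-right Cartesian tree construction. The stack holds the indices
# on the current rightmost path (values increasing towards the top).
# Each index is pushed once and popped ("by-passed") at most once,
# so the total work over n insertions is O(n): amortized O(1) per item.
def cartesian_tree(A):
    n = len(A)
    parent, left, right = [-1] * n, [-1] * n, [-1] * n
    stack = []
    for i in range(n):
        last = -1
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()          # by-pass: never compared again
        if last != -1:                  # popped subtree becomes left child
            left[i] = last
            parent[last] = i
        if stack:                       # local rearrangement under stack top
            right[stack[-1]] = i
            parent[i] = stack[-1]
        stack.append(i)
    root = stack[0]                     # bottom of stack holds the minimum
    return root, parent, left, right

A = [7, 9, 1, 5, 8, 3, 4, 2, 3.5]      # the slide's example array
root, parent, left, right = cartesian_tree(A)
print(A[root])  # 1: the minimum of A is the root
```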
DYNAMic ARRAY / TABLE
Double the capacity when the array is full; shrink (contract) it to half only when it is a quarter full. (Shrinking already at half occupancy is a bad idea: alternating insertions and deletions near the boundary could then trigger a copy on every operation.)
After the last expansion to size n, there are at least n/4 deletions before shrinking. Before the next doubling, there are at least n/2 insertions.
Think of charging 2 copies on each insertion / deletion. Before each doubling / shrinking you have then already paid for the copy work. Constant amortized insert / delete in an array.
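A small experiment (an illustration of the claim, not a proof) that counts element copies under the double-when-full, halve-when-quarter-full policy and checks that total copy work stays linear in the number of operations:

```python
# Count element copies in a dynamic array that doubles at full capacity
# and halves at quarter occupancy. With 2 copies charged per operation,
# total copies should never exceed a small constant times #operations.
class DynArray:
    def __init__(self):
        self.cap, self.n, self.copies = 1, 0, 0

    def _resize(self, new_cap):
        self.copies += self.n            # copy every current element
        self.cap = new_cap

    def insert(self):
        if self.n == self.cap:
            self._resize(2 * self.cap)   # doubling
        self.n += 1

    def delete(self):
        self.n -= 1
        if self.cap > 1 and self.n <= self.cap // 4:
            self._resize(self.cap // 2)  # contract at quarter occupancy

d = DynArray()
ops = 0
for _ in range(1000):
    d.insert(); ops += 1
for _ in range(900):
    d.delete(); ops += 1
print(d.copies, "copies over", ops, "operations")
print(d.copies <= 3 * ops)  # True: copy work is O(#operations)
```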
STRATEGIES FOR AMORTIZED ANALYSIS
Aggregate method: Show that each step grows some quantity that is bounded. The bound on the quantity can be used to show that the total time used for all steps is proportional to that same bound. In Cartesian tree construction, each operation adds one to #by-passes or #insertions. Both are bounded by n, and hence the total number of operations is at most 2n.
Accounting method: Pay for the expensive operations in advance by charging the cheap operations. Then show that any sequence of operations has more credit in the bank account than the number of true operations. In the dynamic array we pay 2 copy operations at each insertion or deletion. Consider any sequence of operations after a shrink / doubling to size n until the next a) shrink or b) doubling. In case a), n/4 deletions have gathered a deposit of n/2, which is sufficient to copy n/4 elements to a new location. In case b), n/2 insertions have gathered a deposit of n, which is sufficient to copy n elements to a new location.
STRATEGIES FOR AMORTIZED ANALYSIS
Potential method: Let p(t) ≥ 0 be a potential of the data structure after t operations, with p(0) = 0. Let at(t) = c(t) + p(t) - p(t-1) be the amortized time of operation t, where c(t) is the actual cost of that operation. By telescoping cancellation one can see that the sum of the amortized times of n operations is c(1) + c(2) + ... + c(n) + p(n), and thus an upper bound for the actual running time. To show e.g. that the total running time is linear, it is sufficient to show that the amortized time of each type of operation is constant! This kind of analysis requires a good guess for p(t).
Consider a dynamic array with insertions only. Let p(t) = 2m - n, where n is the size of the array and m is the number of elements:
For insertions not causing doubling, at(t) = 1 + (2m - n) - (2(m-1) - n) = 3.
For insertions causing doubling to size n, at(t) = (n/2 + 1) + (2(n/2 + 1) - n) - (2(n/2) - n/2) = 3.
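The two cases can also be checked numerically (an illustration, starting from a hypothetical size-1 array; the first step's potential 2·0 - 1 is slightly negative, which only helps the bound):

```python
# Numeric check of the potential argument: with p = 2*m - cap, each
# insertion into a doubling array has amortized cost
# (actual cost) + (new potential) - (old potential) = 3,
# whether or not the insertion triggers a doubling.
def amortized_insert_costs(num_ops):
    cap, m = 1, 0
    costs = []
    for _ in range(num_ops):
        old_p = 2 * m - cap
        if m == cap:
            actual = m + 1          # copy m elements, then write one
            cap *= 2
        else:
            actual = 1              # just write one element
        m += 1
        costs.append(actual + (2 * m - cap) - old_p)
    return costs

print(set(amortized_insert_costs(100)))  # {3}
```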
AMORTIZED ANALYSIS VS COMPLEXITY
Amortized analysis is a technique to analyse the (worst-case) complexity of an algorithm. E.g. Cartesian tree construction takes linear worst-case time.
Amortized complexity refers to operations on data structures: any series of n operations takes t_total time, hence one operation takes amortized t_total / n time. One can talk about the amortized complexity or amortized cost of an operation. Some subset of the supported operations might even have good worst-case bounds.
E.g. insert / delete on dynamic arrays have amortized complexity O(1): any series of n intermixed insertions / deletions takes O(n) worst-case time.
AMORTIZED ANALYSIS ELSEWHERE
String processing algorithms, Period II, Autumn 2018: Knuth-Morris-Pratt, Aho-Corasick, suffix tree construction, LCP array construction
We will also come back to this, connecting Cartesian trees with dynamic programming, on Weeks III-IV