THEORY

Algorithms


Posted on 16 December 2021, by Kevin Mora



The following are my notes, based on the book "Algorithms, 4th Edition" by Robert Sedgewick and Kevin Wayne.

The structure of writing a good program in computer science is similar to the structure of writing a proof in mathematics... I think, "Computers are like telescopes through which we could better investigate the computational nature of reality."


An algorithm is a method for doing something on a computer; a data structure is a layout for memory that represents some sort of data. You can't really write anything without both, so asking which one is "compulsory" is just silly: they're both vital. Moreover, you can't really split the two apart. Data structures are only interesting because of the algorithms they enable: when we say a hash table has O(1) insertion, what we really mean is that there is an O(1) algorithm to insert something into a hash table. On the flip side, most algorithms are inextricably tied to some data structure.

A data structure is a structure that stores some form of data. The goal of a data structure is to store the data in a form that makes it easy for the programmer to read or update. An algorithm is a set of steps taken to achieve a particular goal; this goal can be anything, not necessarily related to programming – "algorithm" is just a fancy technical name for that idea. A good algorithm is one which uses data structures that suit its needs, to make sure that its sequence of steps completes in the least amount of time, using the least amount of resources.





Developing Usable Algorithms

  • Model the problem.
  • Find an algorithm to solve it.
  • Fast enough? Fits in memory?
  • If not, figure out why.
  • Find a way to address the problem.
  • Iterate until satisfied.

This is the scientific approach to designing and analyzing algorithms, where we build mathematical models to try and understand what's going on, and then we do experiments to validate those models and help us improve things.





Monte Carlo Simulation

A Monte Carlo Simulation is a way of approximating the value of a function where calculating the actual value is difficult or impossible. It uses random sampling to define constraints on the value and then makes a sort of "best guess." A simple Monte Carlo Simulation can be used to calculate the value for π.

If you had a circle and a square where the length of a side of the square was the same as the diameter of the circle, the ratio of the area of the circle to the area of the square would be π/4. So, if you put this circle inside the square and select many random points inside the square, the number of points inside the circle divided by the number of points inside the square and the circle would be approximately π/4.
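The estimate above can be sketched in a few lines of Java (a minimal sketch: the class name, the fixed seed, and the trial count are mine, chosen for reproducibility):

```java
import java.util.Random;

public class MonteCarloPi {
    // Estimate pi by sampling random points in the unit square and counting
    // how many land inside the quarter circle of radius 1: the fraction of
    // hits approximates pi/4, so multiplying by 4 approximates pi.
    public static double estimate(int trials, long seed) {
        Random rng = new Random(seed);   // fixed seed for reproducible runs
        int inside = 0;
        for (int t = 0; t < trials; t++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) inside++;   // point is inside the circle
        }
        return 4.0 * inside / trials;
    }

    public static void main(String[] args) {
        System.out.println(MonteCarloPi.estimate(1_000_000, 42));   // close to 3.14159...
    }
}
```

With a million samples the estimate is typically within a few thousandths of π; the accuracy improves only with the square root of the number of trials.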





Union Find - Dynamic Connectivity

Given a set of n objects (their location doesn't really matter), we're going to use numbers to model them – zero through n-1. Then we have the idea of a connection between two objects: given two objects, provide a connection between them. The key part of the problem is the connected query, which simply asks: is there a path connecting the two objects? So that's our problem: intermix union commands and connected queries, where we need to efficiently support those operations for a large number of objects.

[Figure: dynamic connectivity example on 10 objects]

CONNECTION        DESCRIPTION
connected(8, 9)   even if we don't have a direct connection between eight and nine, there is a path from 8 to 3 to 4 to 9
union(5, 0)       direct connection
union(7, 2)       direct connection
union(6, 1)       direct connection
union(0, 1)       redundant connection, but still a connection
connected(0, 7)   since there's a path, they are connected





Modeling Connections

We assume "is connected to" is an equivalence relation:

  • Reflexive: p is connected to p.
  • Symmetric: if p is connected to q, then q is connected to p.
  • Transitive: if p is connected to q and q is connected to r, then p is connected to r.




Union Find - Data Type

Efficient data structure for union-find:

  • Number of objects N can be huge.
  • Number of operations M can be huge.
  • Find queries and union commands may be intermixed.
public class UF
UF(int N)                         // initialize union-find with N objects (0 to N-1)
void union(int p, int q)          // add connection between p and q
boolean connected(int p, int q)   // are p and q in the same component?
int find(int p)                   // component identifier for p(0 to N-1)
int count()                       // number of components




Union Find - Applications

  • Pixels in a digital photo
  • Computers in a network
  • Friends in a social network
  • Transistors in a computer chip
  • Elements in a mathematical set
  • Variable names in a Fortran program
  • Metallic sites in a composite system




Union Find - Quick Find Algorithm

Interpretation: p and q are connected iff ("iff" stands for "if and only if") they have the same id. Each entry in the array contains a number that corresponds to one of the connected components.

Integer array id[] of size N.

[Figure: quick-find example]

public class QuickFindUF {
    private int[] id;

    public QuickFindUF(int N) {
        id = new int[N];                          // set id of each object to itself
        for (int i = 0; i < N; i++) {             // (N array accesses)
            id[i] = i;
        }
    }

    public boolean connected(int p, int q) {
        return id[p] == id[q];                    // check whether p and q are in the
    }                                             // same component (2 array accesses)

    public void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {     // change all entries with id[p] to id[q]
            if (id[i] == pid) id[i] = qid;        // (at most 2N + 2 array accesses)
        }
    }
}

// With QuickFindUF it is very fast to check whether two objects are
// connected. It is much more expensive to connect two objects, since
// union has to scan the entire array.

In particular, if you have n union commands on n objects – which is not unreasonable, since they're either connected or not – quick-find takes quadratic (n^2) time. Quadratic time is much too slow, and we therefore can't accept quadratic-time algorithms for large problems.

The reason is that they don't scale: as computers get faster and bigger, quadratic algorithms actually get slower relative to the machine. A rough standard for now is that computers run billions of operations per second and hold billions of entries in main memory, which means you can touch everything in main memory in about a second. That relationship held when computers had only a few thousand words of memory, and it holds now that they have billions or more.

With huge memory we can address huge problems, so we could have billions of objects and hope to do billions of union commands on them. The problem is that quick-find would then need about 10^18 operations – array accesses, touches of memory. Quadratic-time algorithms are too slow, and they don't scale: you might get a new computer that's ten times as fast and ten times as big, but if you then address a problem ten times as big, a quadratic algorithm runs ten times as slow...





Union Find – Quick Union Algorithm

So quick-find is too slow for huge problems. How are we going to do better? Our first attempt is an alternative called "Quick-Union." This algorithm tries to avoid doing work until it has to – it uses the same data structure, an array id[] of size N, but now with a different interpretation. To merge components containing two items p and q that are in different components, all we do is set the id of p's root to the id of q's root.

  • We are going to think of that array as representing a set of trees.
  • Each entry in the array is going to contain a reference to its parent in the tree.
  • Each entry in the array has associated with it a root – that's the root of its tree.
  • Elements that are all by themselves, in their own connected component, point to themselves.
  • A union operation only involves changing one entry in the array.
  • By changing just one value in the array we can get two large components merged together.
---     0	1	2	3	4	5	6	7	8	9
id[]    0 	1 	1	8 	3 	5 	5 	7 	8 	8

// Based on the array above, how many elements are in the
// same connected component as 8?
// - Answer: 4.
//
// Elements 3, 8, and 9 have root 8, and element 4 points to 3,
// which is in 8's tree; therefore 4 is connected to 8 as well.
// - Final set: {3, 4, 8, 9}


[Figure: quick-union forest for the id[] array above]

// PROBLEM #1
Executing the instruction union(3, 1) results in the update
of a single array element in the diagram above.

Calling union(3, 1) changes the value at index 8
(the root of 3). The new value at that index is: 1.

// PROBLEM #2
Executing the instruction union(5, 9) results in the update
of a single array element in the diagram above.

Calling union(5, 9) changes the value at index 5
(the root of 5). The new value at that index is: 8 (the root of 9).

public class QuickUnionUF {
    private int[] id;

    public QuickUnionUF(int N){
        id = new int[N];
        for(int i = 0; i < N; i++) {               // set id of each object to
            id[i] = i;                             // itself (N array accesses)
        }
    }

    public int root(int i) {
        while(i != id[i]) i = id[i];               // chase parent pointers until reach root
        return i;                                  // (depth of i array accesses)
    }

    public boolean connected(int p, int q){        // check if p and q have same root
        return root(p) == root(q);                 // (depth of p and q array accesses)
    }

    public void union(int p, int q){ 
        int i = root(p);                           // change root of p to point to root of q
        int j = root(q);                           // (depth of p and q array accesses)
        id[i] = j;
    }
}

Quick-Union makes union faster than Quick-Find, but it's also too slow overall. The defect of Quick-Union is that the trees can get too tall, which makes the find operation too expensive – it's going to be too slow if you have a lot of operations.





Union Find – Improvements

The weighted Quick-Union algorithm makes sure that the smaller tree gets linked to the root of the larger tree.

  • Modify quick-union to avoid tall trees.
  • Keep track of size of each tree (number of objects).
  • Balance by linking root of smaller tree to root of larger tree.
[Figure: weighted quick-union links the smaller tree below the larger one]
  • Data structure: same as quick-union, but maintain extra array sz[i] to count number of objects in the tree rooted at i.
  • Find / connected: identical to quick-union – i.e. root(p) == root(q).
  • Union: modify quick-union to link root of smaller tree to root of larger tree, and update the sz[] array.
[Figure: quick-union vs. weighted quick-union forests]

Image above represents the quick-union algorithm and the weighted quick-union algorithm (100 sites, 88 union() operations).
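The bullets above can be sketched as a class (a minimal sketch: it follows the same structure as the QuickUnionUF class earlier, with the sz[] array as the only addition; the class name is mine):

```java
public class WeightedQuickUnionUF {
    private int[] id;   // id[i] = parent of i
    private int[] sz;   // sz[i] = number of objects in the tree rooted at i

    public WeightedQuickUnionUF(int N) {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;       // each object starts as its own root...
            sz[i] = 1;       // ...in a tree of size 1
        }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];   // chase parent pointers to the root
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);      // identical to quick-union
    }

    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }   // link root of the
        else               { id[j] = i; sz[i] += sz[j]; }   // smaller tree under
    }                                                       // the larger one
}
```

The only change from plain quick-union is the size comparison in union(), and it is exactly that comparison that guarantees depth at most lg N.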

We can analyze the running time mathematically. Find takes time proportional to how far down the node is in the tree (proportional to the depth of p); union takes constant time, given the roots. And we can show it's guaranteed that the depth of any node in the tree is at most the logarithm to the base two of N.

[Figure: weighted quick-union tree of depth at most lg N]

When using the weighted Quick-Union algorithm the depth of any node x in the tree is at most lg N ("lg" stands for "base-2 logarithm").

[Figure: tree T1 containing x merged into tree T2]

Pf. What causes the depth of object x to increase?

  • Increases by 1 when tree T1 containing x is merged into another tree T2.
  • The size of the tree containing x at least doubles since |T2| ≥ |T1|.
  • Size of tree containing x can double at most lg N times.

ALGORITHM     INITIALIZE   UNION   CONNECTED
quick-find    N            N       1
quick-union   N            N       N
weighted QU   N            lg N    lg N

A further improvement is path compression: just after computing the root of p, point the nodes on the path at that root.

  • Two-pass implementation: add a second loop to find() to set the id[] of each examined node to the root.
  • Simpler one-pass variant (path halving): make every other node on the path point to its grandparent.
public int find(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]];   // only one extra line of code!
        i = id[i];
    }
    return i;
}

// Literally no reason not to.
// Keeps tree almost completely flat.
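The two-pass variant can be sketched as a small self-contained class (a minimal sketch; the class name is mine, and for clarity it omits the weighting from the previous section, which a production implementation would keep):

```java
public class PathCompressionUF {
    private final int[] id;   // id[i] = parent of i

    public PathCompressionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;   // each object is its own root
    }

    // Two-pass find: first locate the root, then make every node
    // on the search path point directly at it.
    public int find(int p) {
        int root = p;
        while (root != id[root]) root = id[root];   // pass 1: find the root
        while (p != root) {                         // pass 2: compress the path
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    public void union(int p, int q)        { id[find(p)] = id[find(q)]; }
    public boolean connected(int p, int q) { return find(p) == find(q); }
}
```

After a single find(p), every node that was examined points straight at the root, so repeated queries on the same component become nearly constant time.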

Is there a linear-time algorithm for union-find? People looked for one for a long time, and it actually works out that we can prove no such algorithm exists. There's a lot of theory behind the algorithms we use, and knowing that theory helps us decide which algorithms to choose in practice and where to concentrate our effort in trying to find better ones. It was eventually proved by Fredman and Saks that there is no linear-time algorithm for the union-find problem.

Proposition [Hopcroft-Ulman, Tarjan]:

  • Starting from an empty data structure, any sequence of M union-find ops on N objects makes ≤ c (N + M lg* N) array accesses. Analysis can be improved to N + M α(M, N).

ALGORITHM                        WORST-CASE TIME
quick-find                       M N
quick-union                      M N
weighted QU                      N + M log N
QU + path compression            N + M log N
weighted QU + path compression   N + M lg* N

Weighted quick-union with path compression is, in practice, close enough to linear to enable the solution of huge problems – problems that could not otherwise be addressed. For example, with a billion (10^9) union operations on a billion (10^9) objects, the computation that might have taken thirty years with quick-find completes in six seconds with this algorithm.

What's most important to recognize here is that it's the algorithm design that enables the solution to the problem. A faster computer wouldn't help much: you could spend millions on a supercomputer and maybe get it done in six years instead of thirty, or in two months. With a fast algorithm, you can do it in seconds on your own PC.





Union Find – Percolation

Write a program to estimate the value of the percolation threshold via Monte Carlo simulation.

  • N-by-N grids of sites (each square represents a site).
  • White site = open; black site = closed.
  • Each site is open with probability p (or blocked with probability 1-p).
  • System percolates iff top and bottom are connected by open sites.

A system percolates if you can find a way to get from the top to the bottom through white squares. This is an example of a mathematical model where the problem is very well articulated – what is that threshold value? – but nobody knows the mathematical solution. The only solution we have comes from a computational model, where we run simulations to determine the value of that probability. Those simulations are enabled only by fast union-find algorithms; that's our motivating example for why we might need them. So we run a so-called Monte Carlo simulation: initialize the whole grid to be blocked (i.e., filled with black sites) and then randomly open sites.





Percolation Threshold Estimate

  • Create an object for each site and name them 0 to ((N^2) - 1)
  • Sites are in same component if connected by open sites.
  • Percolates iff any site on bottom row is connected to any site on top row [brute-force algorithm: N^2 calls to connected()].

Clever trick:

  • Introduce 2 virtual sites (and connections to top and bottom).
  • Percolates iff virtual top site is connected to virtual bottom site [more efficient algorithm: only 1 call to connected()].

What we want to do is check whether any site in the bottom row is connected to any site in the top row, and use union-find for that. The problem is that the brute-force approach would be quadratic: it would make N^2 calls to connected(), checking each site on the top against each site on the bottom – still too slow. Instead, we create a virtual site on the top and one on the bottom. Then, when we want to know whether the system percolates, we just check whether the virtual top site is connected to the virtual bottom site.

So how do we model opening a new site? To open a site we just connect it to all of its adjacent open sites. That's a few calls to union, but it's easy to implement. And with that simple relationship, we can use exactly the code we developed to run a simulation for this connectivity problem. That's where we get the result: by running enough simulations for a big enough n, we get an accurate estimate of the percolation threshold.
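The scheme above can be sketched as a small class (a minimal sketch with a weighted quick-union inlined for self-containment; the class and method names are mine, and a full assignment-quality version would also track "full" sites and guard against backwash, which this ignores):

```java
public class Percolation {
    private final int n;
    private final boolean[] open;
    private final int[] parent;       // inlined weighted quick-union
    private final int[] size;
    private final int top, bottom;    // the two virtual sites

    public Percolation(int n) {
        this.n = n;
        open = new boolean[n * n];
        parent = new int[n * n + 2];  // n*n grid sites + 2 virtual sites
        size = new int[n * n + 2];
        for (int i = 0; i < parent.length; i++) { parent[i] = i; size[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != parent[i]) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }
        else                   { parent[j] = i; size[i] += size[j]; }
    }

    // Open site (row, col), 0-indexed, and connect it to its open neighbors.
    public void open(int row, int col) {
        int s = row * n + col;
        if (open[s]) return;
        open[s] = true;
        if (row == 0)     union(s, top);       // top row -> virtual top
        if (row == n - 1) union(s, bottom);    // bottom row -> virtual bottom
        if (row > 0     && open[s - n]) union(s, s - n);
        if (row < n - 1 && open[s + n]) union(s, s + n);
        if (col > 0     && open[s - 1]) union(s, s - 1);
        if (col < n - 1 && open[s + 1]) union(s, s + 1);
    }

    public boolean percolates() {
        return root(top) == root(bottom);      // the single connected() check
    }
}
```

The Monte Carlo driver then opens random blocked sites until percolates() returns true, and records the fraction of open sites as one sample of the threshold.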

So, what is the percolation threshold p*? About 0.592746 for large square lattices – a constant known only via simulation. Fast algorithms enable accurate answers to scientific questions: with a slow union-find algorithm we couldn't run the simulation for very big problems, and we wouldn't get a very accurate answer.





Analysis of Algorithms

The idea of understanding the running time of a computation goes way back – probably even before Babbage.

Babbage said, "... as soon as an analytical engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise by what course of calculation can these results be arrived at by the machine in the shortest time." If you look at Babbage's machine, the Analytical Engine, it has a crank on it, and literally the concern Babbage had in knowing how long a computation would take was: how many times do we have to turn the crank? It's not that different in today's world. The crank may be something electronic happening a billion times a second, but we're still asking how many times some discrete operation has to be performed to get a computation done.

The practical reason we want to analyze and understand algorithms is to avoid performance bugs. There's a long list of success stories in designing algorithms with better performance – enabling the solution of problems that wouldn't have been solved otherwise.

  • One of the first and most famous is the so-called FFT algorithm: an algorithm for breaking down the waveform of n samples of a signal into periodic components. It's at the basis of DVDs, JPEGs, and many other applications. There's an easy way to do it that takes time proportional to n^2, but the FFT takes only n log n steps, and the difference between n log n and n^2 is the difference between being able to solve a large problem and not being able to solve it. A lot of the digital media technology we have today is enabled by that fast algorithm.




Scientific Method in CS

  • Observe some feature of the natural world – in this case, it's going to be the running time of our program on a computer.
  • Hypothesize some model that's consistent with the observations.
  • Predict events using the hypothesis – usually a running time for a larger problem size, or on a different computer.
  • Verify the predictions by making more observations.
  • Validate until we're comfortable that our model, hypothesis, and observations all agree.
  • Experiments must be reproducible & hypothesis must be falsifiable.

The feature of the natural world that we're studying is the running time of a program on a particular computer. This changes the algorithm from an abstraction into an actual physical process – electrons racing around inside the computer.





Coping with Dependence on Inputs

For many problems, the running time can vary widely depending on the input.

  • Input models: we can carefully model the kind of input to be processed. This approach is challenging because the model may be unrealistic.
  • Worst-case performance guarantees: running time of a program is less than a certain bound (as a function of the input size), no matter what the input. Such a conservative approach might be appropriate for the software that runs a nuclear reactor or a pacemaker or the brakes in your car.
  • Randomized algorithms: one way to provide a performance guarantee is to introduce randomness, e.g., quicksort and hashing. Every time you run the algorithm, it will take a different amount of time. These guarantees are not absolute, but the chance that they are invalid is less than the chance your computer will be struck by lightning. Thus, such guarantees are as useful in practice as worst-case guarantees.
  • Amortized analysis: for many applications, the algorithm input might be not just data, but the sequence of operations performed by the client. Amortized analysis provides a worst-case performance guarantee on a sequence of operations.
Proposition 0: in the linked-list implementation of Bag, Stack, 
and Queue, all operations take constant time in the worst case.

Proposition 1: in the resizing-array implementation of Bag, Stack, and Queue,
starting from an empty data structure, any sequence of N operations takes time
proportional to N in the worst case (amortized constant time per operation).




Turing Complete

A Turing Complete system means a system in which a program can be written that will find an answer (although with no guarantees regarding runtime or memory).

Alan Turing created a machine that could take a program, run that program, and show some result. But he had to create different machines for different programs, so he devised the "Universal Turing Machine" that can take any program and run it. Programming languages are similar to those machines (although virtual): they take programs and run them. A programming language is called "Turing complete" if, given enough time and memory, it can run any program (irrespective of the language) that a Turing machine can run.

For example: Let's say there is a program that takes 10 numbers and adds them. A Turing machine can easily run this program. But now imagine that for some reason your programming language can't perform the same addition. This would make it "Turing incomplete" (so to speak). On the other hand, if it can run any program that the universal Turing machine can run, then it's Turing complete.

Most modern programming languages (e.g. Java, JavaScript, Perl, etc.) are all Turing complete because they each implement all the features required to run programs like addition, multiplication, if-else conditions, return statements, ways to store/retrieve/erase data and so on... Turing and other early computer scientists would build a specific machine each time they wanted to solve a specific problem. We’re used to one machine that can be forever reprogrammed.

I wrote about Turing Completeness here.


Problem & Observations

Given N distinct integers, how many triples sum to exactly zero?


// Triples Sum Brute-force Algorithm

public class ThreeSum {
    public static int count(int[] a) {
        int N = a.length;
        int count = 0;
        for (int i = 0; i < N; i++)
            for (int j = i+1; j < N; j++)
                for (int k = j+1; k < N; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args) {
        In in = new In(args[0]);          // In and StdOut are from the
        int[] a = in.readAllInts();       // book's stdlib (stdlib.jar)
        StdOut.println(count(a));
    }
}




How to Time a Program?


public class Stopwatch  // (part of: stdlib.jar)
Stopwatch()             // create a new stopwatch
double elapsedTime()    // time since creation (in seconds)

// Implementation:
public static void main(String[] args) {
     In in = new In(args[0]);
     int[] a = in.readAllInts();
     Stopwatch stopwatch = new Stopwatch();   // 1
     StdOut.println(ThreeSum.count(a));
     double time = stopwatch.elapsedTime();   // 2
     StdOut.println("elapsed time " + time);  // 3
}

If this were a scientific problem where we were counting something that happens in the natural world – the number of ants in an anthill, say – we'd have only a few data points, and we'd try to understand what's going on by plotting running time on the y-axis against problem size on the x-axis. What scientists usually do, because many problems fall into this class, is make it a log-log plot.

On a log-log plot you'll very often get a straight line, and the slope of that line is the key to what's going on. In this case the slope is three. You can run a regression to fit a straight line through the data points, and then it's not difficult to show that if you get a straight line with slope b, your function is proportional to a N^b. That's called the power law, and it holds for many, many scientific problems, including the running times of most algorithms.

Here's a little of the math. A straight line on a log-log plot (with powers of two) means that lg(T(N)) = b lg N + c. We have our empirical values of b and c, and if you raise 2 to the power of each side of that equation, you get T(N) = a N^b, where a = 2^c. So just from observation we have a pretty good model for the running time of our program: doing the math, the running time comes out to about 10^-10 × N^3 seconds. We can use that hypothesis to make predictions – just plug in different values of N. It says the brute-force three-sum algorithm will take about 400 seconds for N = 16,000. That's plenty of time to wait, so we can invest in running the experiment, and sure enough we come pretty close (408 seconds). Now we can make predictions for 32,000, or whatever else we're interested in. The model lets us predict without investing the expense of running the experiments – and in the great majority of computer algorithms, running time does follow a power law.
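The slope calculation above can be sketched directly (a hypothetical helper of my own naming; a real experiment would average over several doubling steps rather than use a single pair of measurements):

```java
public class PowerLaw {
    // Estimate the exponent b in T(N) = a * N^b from two measurements
    // taken at problem sizes N and 2N: on a log-log plot the slope between
    // the two points is lg(T(2N) / T(N)).
    public static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);   // base-2 log of the ratio
    }

    public static void main(String[] args) {
        // Synthetic data from the hypothesis T(N) = 1e-10 * N^3:
        double t1 = 1e-10 * Math.pow( 8000, 3);   // ~51.2 seconds at N = 8,000
        double t2 = 1e-10 * Math.pow(16000, 3);   // ~409.6 seconds at N = 16,000
        System.out.println(PowerLaw.exponent(t1, t2));   // ≈ 3.0
    }
}
```

Doubling the problem size and taking the log of the time ratio is exactly the "doubling hypothesis" shortcut: it recovers the exponent b without needing the constant a at all.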

System independent effects (determine the exponent in the power law):

  • Algorithm.
  • Input data.

System dependent effects (determine the constant in the power law):

  • Hardware: CPU, memory, cache, …
  • Software: compiler, interpreter, garbage collector, …
  • System: operating system, network, other apps, …

Bad news: it is difficult to get precise measurements. Good news: experiments are much easier and cheaper than in other sciences.





Mathematical Models

In the 60s, computer systems were becoming really complicated for the first time, and computer scientists were concerned about whether we'd actually be able to understand what's going on. Knuth was very direct in saying that this is something we certainly can do: we can calculate the total running time of a program by identifying all the basic operations, figuring out the cost of each, figuring out the frequency of execution, and summing cost times frequency over all the operations. You have to analyze the program to determine the set of operations; the cost depends on the machine and the system, as we discussed before. The frequency leads us to mathematics, because it depends on the algorithm and the input data. From Knuth we know that, in principle, we can get accurate mathematical models for the performance of algorithms and programs.

If you wanted to run experiments in the 60s or so, you would actually have to look at the computer's manual (every computer came with a manual that said precisely how long each instruction would take). But nowadays, it's a little more complicated, so there's some way to determine the costs of the basic operations.

How many instructions as a function of input size N?

“It is convenient to have a measure of the amount of work involved in a computing process, even though it be a very crude one. We may count up the number of times that various elementary operations are applied in the whole process and then given them various weights. We might, for instance, count the number of additions, subtractions, multiplications, divisions, recording of numbers, and extractions of figures from tables. In the case of computing with matrices most of the work consists of multiplications and writing down numbers, and we shall therefore only attempt to count the number of multiplications and recordings.” — Alan Turing

Turing knew that we want a measure of the amount of work involved in the process. He recognized that you don't necessarily want to go through it in full detail – a crude estimate is still helpful. You could count up the number of times every operation is applied, give each a weight, and sum things up that way. But maybe we should just count the operations that are most expensive – that's what Turing said in 1947, and realistically that's what we do nowadays. Rather than counting every little detail, we take some basic operation – the most expensive one, or the one executed most often, the one whose cost times frequency is highest – and use it as a proxy for running time.

Essentially, we make the hypothesis that the running time grows like a constant times the count of that operation – in this case we pick array accesses. That's the first simplification. The second simplification is to ignore low-order terms in the formulas we derive. There's an easy way to do that, called tilde notation: the idea is that when N is large in a formula like 1/6 N^3 + lower-order terms, the N^3 term is much, much larger than everything else. Therefore, we use a cost model and tilde notation to simplify counts. Counting some basic operation as a proxy for the running time is the best way to describe the cost model.
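A quick sanity check of the tilde approximation (a small sketch of my own; the class name is hypothetical): the ThreeSum triple loop executes its if test exactly N(N-1)(N-2)/6 times, and tilde notation says that count is ~ N^3/6.

```java
public class TildeCheck {
    // Count exactly how many times the innermost ThreeSum test
    // would run for an input of size n.
    public static long exactCount(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    count++;                       // one triple examined
        return count;
    }

    public static void main(String[] args) {
        int n = 100;
        long exact = exactCount(n);                // n(n-1)(n-2)/6 = 161,700
        double tilde = (double) n * n * n / 6.0;   // ~ n^3/6 = 166,666.67
        System.out.println(exact + " vs ~" + tilde);
    }
}
```

Even at N = 100 the tilde approximation is within about 3% of the exact count, and the relative error shrinks as N grows – which is exactly why dropping the lower-order terms is safe for large N.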

In principle, accurate mathematical models are available. In practice, formulas can be complicated, advanced mathematics might be required, and exact models best left for experts. For all the algorithms we'll try to communicate a reasonable approximate model that can be used to describe the running time.

Approximate models in this course: T(N) ~ c N^3





Order of Growth

When we analyze algorithms, not too many different functions actually arise, and that property lets us classify algorithms by their performance as the problem size grows. The good news is that only these few functions turn up for the algorithms we're interested in. We can craft things with other growth functions, and there are counterexamples, but a great number of the algorithms we consider are described by the few functions plotted here. And when we talk about order of growth, we're not talking about the leading constant: normally we say the running time of the algorithm is proportional to, say, n log n.

The set of functions: 1, log N, N, N log N, N^2 , N^3 , and 2^N suffices to describe the order of growth of most common algorithms.

If: f(N) ~ c g(N) for some constant c > 0, then the order of growth of f(N) is g(N).

  • Ignores leading coefficient.
  • Ignores lower-order terms.
// The order of growth of the running time of this code is N^3:

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        for (int k = j+1; k < N; k++)
            if (a[i] + a[j] + a[k] == 0)
                count++;





Moore’s Law Squared

In the order-of-growth classifications, we need linear or linearithmic algorithms to keep pace with Moore's law – that's what it takes to keep solving huge problems as machines grow.

"Three orders of magnitude in machine speed and three orders of magnitude in algorithmic speed add up to six orders of magnitude in solving power. A model that might have taken a year to solve 10 years ago can now solve in less than 30 seconds."





Binary Search: Implementation & Mathematical Analysis

public static int binarySearch(int[] a, int key) {
    int lo = 0, hi = a.length-1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (key < a[mid]) hi = mid - 1; // one "3-way compare"
        else if (key > a[mid]) lo = mid + 1; // one "3-way compare"
        else return mid;
     }
    return -1;
}

// Invariant: if key appears in the array a[], then a[lo] ≤ key ≤ a[hi].
  • Proposition: binary search uses at most 1 + lg N key compares to search in a sorted array of size N.
  • Def.: T(N) = # key compares to binary search a sorted subarray of size ≤ N.
  • Binary search recurrence: T(N) ≤ T(N / 2) + 1 for N > 1, with T(1) = 1.

Proof sketch: assume N is a power of 2.

T(N) ≤ T(N / 2) + 1 [ given ] 
	≤ T (N / 4) + 1 + 1 [ apply recurrence to first term ] 
	≤ T (N / 8) + 1 + 1 + 1 [ apply recurrence to first term ] 
	⋮ 
	≤ T (N / N) + 1 + 1 + … + 1 [ stop applying, T(1) = 1 ] 
	= 1 + lg N

The sorting-based N^2 log N algorithm for 3-SUM is significantly faster in practice than the brute-force N^3 algorithm.

  • With N = 8000,
  • brute-force takes 51.1 seconds;
  • sorting-based takes 0.96 seconds.

Guiding principle: typically, better order of growth ⇒ faster in practice.





Analysis of Memory

Computer scientists think of a million as 2^20 and a billion as 2^30, because those are the numbers of distinct values that fit into 20 and 30 bits, which keeps everything consistent with their calculations. In recent years we've mostly switched to a model where machines are 64-bit and pointers are eight bytes. That lets us address much more memory, but pointers use more space, and the transition initially caused problems because programs were using far more space than people expected. We will not have to go through this kind of transition again, because 64 bits is enough to address anything we might ever need to address – 2^64 is a huge number.

  • Bit: 0 or 1.
  • Byte: 8 bits.
  • Megabyte (MB): 1 million or 2^20 bytes.
  • Gigabyte (GB): 1 billion or 2^30 bytes.

64-bit machine: we assume a 64-bit machine with 8-byte pointers.

  • Can address more memory.
  • Pointers use more space. ... some JVMs "compress" ordinary object pointers to 4 bytes to avoid this cost.




Java Virtual Machine

A Java Virtual Machine (JVM) is free to store data any way it pleases internally, big or little endian, with any amount of padding or overhead, though primitives must behave as if they had the official sizes. Example: the JVM or native compiler might decide to store a boolean[] in 64-bit long chunks like a BitSet. It does not have to tell you, so long as the program gives the same answers.

  • It might allocate some temporary Objects on the stack.
  • It may optimize some variables or method calls totally out of existence replacing them with constants.
  • It might version methods or loops - i.e., compile two versions of a method, each optimized for a certain situation, then decide up front which one to call.

Then of course the hardware and OS have multilayer caches: on-chip cache, SRAM cache, DRAM cache, the ordinary RAM working set, and backing store on disk. Your data may be duplicated at every cache level. All this complexity means you can only very roughly predict RAM consumption.

Integer:

  • The 16-byte result is a little worse than expected because an int value can fit into just 4 extra bytes. Using an Integer costs me a 300 percent memory overhead compared to when I can store the value as a primitive type.

Long:

  • 16 bytes also – clearly, an actual object size on the heap is subject to low-level memory alignment done by a particular JVM implementation for a particular CPU type. It looks like a Long is 8 bytes of Object overhead, plus 8 bytes more for the actual long value. In contrast, Integer had an unused 4-byte hole, most likely because the JVM I use forces object alignment on an 8-byte word boundary.

String:

  • A String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead. For a non-empty String of size 10 characters or less, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length), ranges from 100 to 400 percent.

Multidimensional Arrays:

  • Offers another surprise. Developers commonly employ constructs like int[dim1][dim2] in numerical and scientific computing. In an int[dim1][dim2] array instance, every nested int[dim2] array is an Object in its own right. Each adds the usual 16-byte array overhead. When I don't need a triangular or ragged array, that represents pure overhead. The impact grows when array dimensions greatly differ.
e.g.,:
    an int[128][2] instance takes 3,600 bytes. Compared to 
    the 1,040 bytes an int[256] instance uses (which has the same capacity), 
    3,600 bytes represent a 246 percent overhead. In the extreme case of 
    byte[256][1], the overhead factor is almost 19! Compare that to the C/C++ 
    situation in which the same syntax does not add any storage overhead.

Some garbage-collection schemes may impose a minimum object size which is separate from padding. During garbage collection, once an object gets copied from an old location to a new one, the old location may not need to hold the data for the object any more, but it will need to hold a reference to the new location; it may also need to store a reference to the old location of the object wherein the first reference was discovered and the offset of that reference within the old object (since the old object may still contain references that haven't yet been processed).


Memory Usage

Typical Memory Usage for Objects in Java.

  • Object Overhead: 16 bytes.
  • Reference: 8 bytes.
  • Padding: each object uses a multiple of 8 bytes.
// A Date object uses 32 bytes of memory.

public class Date {        // 16 bytes (object overhead)
    private int day;       // 4 bytes (int)
    private int month;     // 4 bytes (int)
    private int year;      // 4 bytes (int)
    ...                    // 4 bytes (padding)
}                          // --------------------------
                           // 32 bytes

Total memory usage for a data type value:

  • Primitive type: 4 bytes for int, 8 bytes for double, …
  • Object reference: 8 bytes.
  • Array: 24 bytes + memory for each array entry.
  • Object: 16 bytes + memory for each instance variable.
  • Padding: round up to multiple of 8 bytes.
  • String: 32 bytes + (24 + 2N) bytes. This is for the array that contains the characters – hence, a total of 56 + 2N bytes.
  • Linked list: a nested non-static (inner) class such as the Node class requires an extra 8 bytes of overhead (for a reference to the enclosing instance).

Shallow memory usage: don't count referenced objects. Deep memory usage: if array entry or instance variable is a reference, count memory (recursively) for referenced object.

Depending on context, we may or may not count the memory references by an object (recursively). For example, we count the memory for the char[] array in the memory for a String object because this memory is allocated when the string is created. But, we would not ordinarily count the memory for the String objects in a StackOfStrings object because the String objects are created by the client.





Empirical Analysis

  • Execute program to perform experiments.
  • Assume power law and formulate a hypothesis for running time.
  • Model enables us to make predictions.




Mathematical Analysis

  • Analyze algorithm to count frequency of operations.
  • Use tilde notation to simplify analysis.
  • Model enables us to explain behavior. Scientific method.
  • Mathematical model is independent of a particular system; applies to machines not yet built.
  • Empirical analysis is necessary to validate mathematical models and to make predictions.

Recommended Lecture: Analysis of Algorithms.





Big O Notation

In a nutshell, Big O is used to classify algorithms based on their scalability, and to estimate the worst-case running time as a function of the input size. It allows us to understand how the resource requirements of a program or algorithm scale. Remember, complexity affects performance, but not the other way around.


Counting Statements of an Algorithm

private static int sum (int[] numbers){
    int sum = 0;
    for(int i = 0; i < numbers.length; i++){
        sum = sum + numbers[i];
    }
    return sum;
}
  • The for-loop condition is evaluated one extra time at the very end, which gives us two extra steps:
  • one to access the array length, and one to compare it with the control variable.
  • Thus we add 2 to the 3 steps outside of the loop.
  • Outside loop: 3 + 2 = 5
  • Loop body: 6 steps per iteration
  • Number of steps: 6N + 5
  • A linear time function.

We need to predict which method takes more work and thus more time for sufficiently large input sizes. For small input sizes, we can’t predict whether one method outperforms another solely based on the fact that one requires constant time and the other linear time.

Expression ~ tilde notation:
14 N^3 - 4 N^2 + 7 ~ 14N^3
N^2 + 345 ~ N^2
0.5 N^3 ~ 0.5 N^3
0.5 N^3 + 999999 N^2 + 3000N ~ 0.5 N^3
123 N^2 + 42N -13 ~ 123 N^2
40 N^2 ~ 40 N^2
Assume you have a method that requires 19 steps to execute.
What is the corresponding Big-O?
// O(1)
Assume you have a method that requires 13n + 7 steps to execute.
What is the corresponding Big-O?
// O(N)

O(1) –> constant –> no N
O(N) –> there's a single N –> linear time
O(N^2) –> quadratic time
O(log N) –> logarithmic
O(N log N) –> linearithmic

// Binary search is O(log N)
// Doubling or halving at each step means a log in terms of algorithms
// Linear search is O(N)





Identifying the Big O

  • In order to keep the code segments concise, they don't always include the declaration/initialization of the variable.
  • Some code segments show a method but many don't. Always assume that the code is called and executed in an appropriate context.

Problem A is linear (N + N/2 + N/4 + ...).

int sum = 0;
for (int n = N; n > 0; n /= 2)
   for (int i = 0; i < n; i++) 
      sum++;
      
      
Problem B is linear (1 + 2 + 4 + 8 + ...).
int sum = 0;
for (int i = 1; i < N; i *= 2)
   for(int j = 0; j < i; j++)
      sum++;
      
      
Problem C is linearithmic (the outer loop loops lg N times).      
int sum = 0;
for (int i = 1; i < N; i *= 2)
   for (int j = 0; j < N; j++)
      sum++;

int x = 0;
while (x < n) {
   for(int i = 0; i < n; i++) {
      a[i]++;
   }
   x++;
}

// O(N^2)

for(int i = 1; i < n; i *= 2) {
   StdOut.print(i + " ");
}
StdOut.println();

// O(log N)

for(int i = n; i > 0; i /= 2) {
   StdOut.print("* ");
}
StdOut.println();

// O(log N)

int x = n * n;

// O(1)

int count = 0;
for(int i = n; i > 0; i--) {
   for(int j = 1; j < n; j *= 2) {
      count++;
   }
}

// O(N log N)

// a and b are two integer arrays of size n (a is sorted)
for (int i = 0; i < n; i++) {
   int index = BinarySearch.indexOf(a, b[i]);
   if (index >= 0) {
     System.out.println(b[i] + " was found on index " + index);
   }
}

// O(N log N)

for(int i = 1; i < n; i *= 2) {
   for(int j = 3; j < 8; j++) {
      a[i] += StdRandom.uniform(j);
   }
}

// O(log N)

int x = 0;
while (x < n) {
   for(int i = 0; i < x; i++) {
      System.out.print("|.");
   }
   System.out.println();
   x++;
}

// O(N^2)

for(int i = n-1; i > 0; i /= 2) {
   if (i % 2 == 0)
      a[i]++;
   else
      a[i]--;
}

// O(log N)

if (n < 10)
   System.out.println("0" + n);
else if (n < 100)
   System.out.println(n);

// O(1)

public static void printAll(int[] a) {
   int N = a.length;
   for (int i = 0; i < N; i++) {
      for (int j = i+1; j < N; j++) {
         for (int k = j; k < 100; k++) {
            if (a[i] + a[j] + a[k] == 0) {
               StdOut.println(a[i] + " " + a[j] + " " + a[k]);
            }
         }
      }
   }
}

// O(N^2)

public int compareTo(Point2D that) {
   if (this.y < that.y) return -1;
   if (this.y > that.y) return +1;
   if (this.x < that.x) return -1;
   if (this.x > that.x) return +1;
   return 0;
}

// O(1)

private void sink(int k) {
   while (2*k <= N) {
      int j = 2*k;
      if (j < N && greater(j, j+1)) j++;
      if (!greater(k, j)) break;
      exch(k, j);
      k = j;
   }
}

// O(log N)

public static void printAll(int[] a) {
   int N = a.length;
   Arrays.sort(a);
   for (int i = 0; i < N; i++) {
      for (int j = i+1; j < N; j++) {
         int k = Arrays.binarySearch(a, -(a[i] + a[j]));
         if (k > j) StdOut.println(a[i] + " "+ a[j] + " "+ a[k]);
       }
    }
}

// O(N^2 log N)

public String toString() {
   String s = "";
   for (int i = 0; i < N; i++)
      s = s + data[i] + " ";
   return s;
}

// O(N^2) – each concatenation copies the whole string built so far;
// use a StringBuilder for linear time





Stacks and Queues – Linked List & Arrays

A List is a data structure represented as an ordered collection of values (a.k.a. items, entries, elements), where a value may occur more than once. Lists can add/remove/find elements, as well as many other operations like concatenation.

Inserting an element at a specific location of an array is a fairly expensive operation, and an array's size is fixed when you create it. If you allocate too much space and use too little, you waste memory.

Linked List is a sequence of nodes such that each node contains a reference or a link to the next node in the list. Every element in the list (known as a node) has two parts – one contains data/info (e.g. String, int, Object), and the other part contains the reference (known as next, contains some memory address pointing to the next node in the list).

  • In order to determine where a Linked List starts we use a reference to the beginning of the list (head).
  • Linked Lists can actually be re-sized once they have been created.
  • A List is an ordered collection, able to have duplicates.

Determine the Big-O for ArrayList operations:

  Operation                                      Cost as O( )
  read (anywhere in the array)                   O(1)
  add/remove (at the logical end of the array)   O(1)
  add/remove (in the interior of the array)      O(N)
  resize                                         O(N)
  find by position                               O(1)
  find by target (element)                       O(N)

It can feel counterintuitive that adding/removing in the interior of the array has a linear order of growth. That's because the remaining elements need to be shifted.

Resizing an array requires creating a new array and copying the elements over. That has a linear order of growth.

Finding a target also takes a linear order of growth. Since there is no indication that the array is sorted, we need to look at each element until we find it or determine that it is not there.

Determine the Big-O for LinkedList operations:

  Operation                                  Singly Linked List   Doubly Linked List
  read (anywhere in the linked list)         O(N)                 O(N)
  add/remove (at the head)                   O(1)                 O(1)
  add/remove (at the tail)                   O(N)                 O(1)
  add/remove (in the interior)               O(N)                 O(N)
  resize                                     O(N)                 O(N)
  find by position                           O(N)                 O(N)
  find by target (element)                   O(N)                 O(N)

Reading anywhere in the linked list takes a linear order of growth 
because we need to get to the element that should be read by navigating 
from the first element to the next and the next etc. until we are there. 
In the case of a doubly-linked list, we could also start at the end, 
but that still leaves us with n/2 steps in the worst case.
The time/effort required to insert an element at 
the beginning of an array list is best described with:
// O(n)

The time/effort required to insert an element at 
the end of a linked list is best described with:
// O(n)

Assume you need keep track of 100 000 elements 
(the order matters). New elements are rarely added, 
but you constantly need to read and overwrite 
existing elements. What should you use?
// ArrayList




Array Resizing

  • When should you increase the array size? When it is full.
  • How big should the new array be? Twice the size.
  • When should you reduce the array size? When it is a quarter full.
  • How big should the new array be? Half the size.
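The four answers above can be sketched as a resizing-array stack (class and method names are mine, not the book's exact code); it also nulls out popped slots, which matters for the loitering discussion later on:

```java
// A minimal resizing-array stack of strings: doubles when full,
// halves when one-quarter full.
class ResizingArrayStackSketch {
    private String[] s = new String[1];  // start with capacity 1
    private int n = 0;                   // number of items on the stack

    public boolean isEmpty() { return n == 0; }

    public void push(String item) {
        if (n == s.length) resize(2 * s.length);          // full: double
        s[n++] = item;
    }

    public String pop() {
        String item = s[--n];
        s[n] = null;                                      // avoid loitering
        if (n > 0 && n == s.length / 4) resize(s.length / 2);  // quarter full: halve
        return item;
    }

    private void resize(int capacity) {
        String[] copy = new String[capacity];
        for (int i = 0; i < n; i++) copy[i] = s[i];
        s = copy;
    }
}
```

Because pushes double and pops halve only at a quarter full, any sequence of N operations takes time proportional to N (constant amortized time per operation).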

Stacks

  • In the context of a stack we call the adding and removing of elements "push and pop."
  • Stack removes the element that was most recently added.
  • Queue removes the element that was least recently added.
  • Last In First Out (LIFO) is the best description of a stack.
  • Pushing an item on the stack takes constant time.
  • Popping an item from the stack takes constant time.
  • Checking whether the stack is empty takes constant time.
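Those constant-time guarantees follow directly from the linked-node layout – a minimal sketch (names are mine, not the book's exact code):

```java
// A linked-node stack: push and pop touch only the first node,
// so each takes constant time.
class LinkedStackSketch {
    private Node first;  // top of the stack

    private class Node {
        String item;
        Node next;
    }

    public boolean isEmpty() { return first == null; }

    public void push(String item) {   // O(1): new node becomes the head
        Node oldFirst = first;
        first = new Node();
        first.item = item;
        first.next = oldFirst;
    }

    public String pop() {             // O(1): remove the head
        String item = first.item;
        first = first.next;
        return item;
    }
}
```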
// Assume you have a stack as described below:

------
12
------
25
------
123
------
5
------
17
------

// What happens when the following statement is executed: push(17);
// The element 17 is added to the stack.
// Assume you have a large amount of data that needs to be added / removed from a stack.
// The job should be completed as quickly as possible.

// Which implementation should you use? The stack that is implemented using 
// linked nodes or the stack that is implemented using an array?
// Answer: The stack that uses an array is faster.




Loitering

  • Loitering prevents the garbage collector from reclaiming memory that is no longer used.
  • Loitering is the idea that we keep references to an object that is no longer used.
  • Loitering is an issue when removing an element from a stack that is implemented with an array.
  • When are you most likely to encounter loitering? When removing elements from a data structure that uses internally an array to store the elements.




Modular Programming

  • Many clients can reuse the same implementation.
  • Completely separate the interface and the implementation.
  • Allows us to create libraries of algorithms and data structures that can be reused to build more complicated algorithms and data structures.
// Node is an inner class of LinkedStackOfStrings. 
// It is defined like this:

private class Node {
   String item;
   Node next;
}

// How many bytes of memory does a stack node use 
// (excluding the string itself, which is owned by the client)?
// Answer: 40 bytes (16 object overhead + 8 inner-class reference
// + 8 for the String reference + 8 for the Node reference).




Queues

  • For adding and removing elements of a queue we use the following names: enqueue and dequeue.
  • First In First Out (FIFO) is the best description of a queue.
// Assume you have a queue as represented below:

------------------------------------------------------------------------
    null    null    the    best    of    times    null    null    null
------------------------------------------------------------------------
     0       1       2      3       4      5       6        7      8
------------------------------------------------------------------------

// What happens if the method dequeue() is called?
// The word "the" gets removed 
// The array element on index 2 is set to null 
// The field head gets updated

Suppose that you implement a queue using a null-terminated singly-linked list, maintaining a reference to the item least recently added (the front of the list) but not maintaining a reference to the item most recently added (the end of the list). What are the worst case running times for enqueue and dequeue? Linear time for enqueue and constant time for dequeue.
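The linear enqueue disappears if we also keep a reference to the most recently added node – a sketch (names are mine) of a linked queue with constant-time enqueue and dequeue:

```java
// A linked queue tracking both ends: first (least recently added)
// and last (most recently added).
class LinkedQueueSketch {
    private Node first, last;

    private class Node {
        String item;
        Node next;
    }

    public boolean isEmpty() { return first == null; }

    public void enqueue(String item) {  // O(1): append at the tail
        Node oldLast = last;
        last = new Node();
        last.item = item;
        if (oldLast == null) first = last;  // queue was empty
        else oldLast.next = last;
    }

    public String dequeue() {           // O(1): remove from the head
        String item = first.item;
        first = first.next;
        if (first == null) last = null; // avoid loitering at the tail
        return item;
    }
}
```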


Resizing Arrays

  • The best guideline to shrink an array is to halve the size of the array when it is 1/4 full.
  • Every operation takes guaranteed constant time: linked-list implementation.
  • Less wasted space: resizing-array implementation.
  • Doubling the size of the array when it is full, and halving the size when it is 1/2 full leads to thrashing.

Assume a client needs to use a stack in a situation where immediate response is of critical importance (e.g. when implementing the landing of an auto-pilot). Which implementation should be used? A linked-list implementation.

// Consider the following code that defines a ResizingArrayStackOfStrings:

public class ResizingArrayStackOfStrings {
  private String[] s;
  private int N;
  ...
}

// How much memory does the stack use in the worst case as a function of N?
// ~ 32 N bytes

// If the array is completely full, how many bytes does the stack use to store N strings? 
// ~ 8 N bytes




Iterators & Applications

  • Which of the following best describes a Bag? Order doesn't matter.
  • The Bag API could be implemented as a Stack without a push method, or as a Queue without an enqueue method.
  • Having lots of operations in the same API is not a good idea, since it makes it difficult to know much about the performance.
  • The interface Iterable has one single method: iterator.
  • Iterators allow us to access the elements of an array or collection.
  • Iterators have three methods: hasNext(), next(), remove().
  • Making data structures iterable allows the client to write elegant, and compact code.
// Iterable has a method that returns an Iterator.
public interface Iterable<Item> {
 Iterator<Item> iterator();
}

// An Iterator has methods: hasNext() and next().
public interface Iterator<Item> {
 boolean hasNext();
 Item next();
 void remove();    // remove() is an optional method
}

Stack Iterator - Linked list implementation

import java.util.Iterator;

public class Stack<Item> implements Iterable<Item> {
 ...
 public Iterator<Item> iterator() { 
     return new ListIterator(); 
 }
 private class ListIterator implements Iterator<Item> {
     private Node current = first;

     public boolean hasNext() { 
         return current != null; 
     }
     public void remove() { 
         /* not supported */ 
     }
     public Item next() {
         Item item = current.item;
         current = current.next;
         return item;
     }
 }

}

Stack Iterator - Array Implementation

public class Stack<Item> implements Iterable<Item> {
 …
 public Iterator<Item> iterator() { 
    return new ReverseArrayIterator(); 
 }
 private class ReverseArrayIterator implements Iterator<Item> {
     private int i = N;

     public boolean hasNext() { 
         return i > 0; 
     }
     public void remove() { 
     /* not supported */ 
     }
     public Item next() { 
         return s[--i]; 
     }
 }

}
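Because the stack implements Iterable, client code can use Java's for-each loop, which the compiler turns into calls to iterator(), hasNext(), and next(). A small sketch, using a standard-library deque as a stand-in iterable:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ForEachDemo {
    // Works for any Iterable<Integer>; the for-each loop below is
    // sugar for iterator()/hasNext()/next().
    static int sum(Iterable<Integer> items) {
        int total = 0;
        for (int x : items)
            total += x;
        return total;
    }
}
```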




Recursion

Recursion allows us to write clear and concise code.

  • A base case is something that can be solved directly.
  • Recursive methods/functions can, indeed, call themselves.
  • You can always use an explicit stack to remove recursion.

There are four common pitfalls that need to be avoided:

  • Missing base case
  • Excessive memory usage
  • No guaranteed convergence (that's when there are situations when the base case might not be reached).
  • Excessive re-computation (when the same calculations are performed over and over again).
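Excessive re-computation is easiest to see with Fibonacci numbers: the naive recursion solves the same subproblems exponentially many times, while caching each result once (memoization) makes it linear. A sketch:

```java
class FibSketch {
    // Exponential time: fib(n-1) and fib(n-2) recompute
    // overlapping subproblems over and over.
    static long fibNaive(int n) {
        if (n <= 1) return n;  // base cases
        return fibNaive(n - 1) + fibNaive(n - 2);
    }

    // Linear time: each subproblem is computed at most once and cached.
    static long fibMemo(int n, long[] cache) {
        if (n <= 1) return n;
        if (cache[n] != 0) return cache[n];
        cache[n] = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
        return cache[n];
    }
}
```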

Arithmetic Expressions

Goal: evaluate infix expressions.

( 1 + ( ( 2 + 3 ) * ( 4 * 5 ) ) )




Two-Stack Algorithm

Dijkstra's two-stack algorithm is used to evaluate fully parenthesized arithmetic expressions. More precisely, it computes the same value as it would if each operator occurred after its two operand values (as in postfix notation).

  • Value: push onto the value stack.
  • Operator: push onto the operator stack.
  • Left parenthesis: ignore.
  • Right parenthesis: pop an operator and two values; push the result of applying that operator to those values onto the value stack.

Applications of Dijkstra's algorithm:

  • PostScript, Forth, calculators, the Java Virtual Machine, …
public class Evaluate {

    public static void main(String[] args) {
        Stack<String> ops = new Stack<String>();
        Stack<Double> vals = new Stack<Double>();

        while (!StdIn.isEmpty()) {
            String s = StdIn.readString();
            if (s.equals("(")) ;                 // ignore left parenthesis
            else if (s.equals("+")) ops.push(s);
            else if (s.equals("*")) ops.push(s);
            else if (s.equals(")")) {
                String op = ops.pop();
                if (op.equals("+")) vals.push(vals.pop() + vals.pop());
                else if (op.equals("*")) vals.push(vals.pop() * vals.pop());
            }
            else vals.push(Double.parseDouble(s));
        }
        StdOut.println(vals.pop());
    }
}

Correctness argument: whenever the algorithm encounters an operator surrounded by two values within parentheses, it leaves the result of applying that operator on the value stack...

( 1 + ( ( 2 + 3 ) * ( 4 * 5 ) ) )

... as if the original input were:

( 1 + ( 5 * ( 4 * 5 ) ) )

Repeating the argument:

( 1 + ( 5 * 20 ) )
( 1 + 100 )
101

// Extensions: 
// more ops, precedence order, associativity.

Observation:

All of the parentheses are redundant!
1 2 3 + 4 5 * * + 
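With the parentheses gone, a single value stack suffices: push numbers, and on each operator pop two values and push the result. A sketch handling only + and * (names are mine):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PostfixSketch {
    // Evaluates a whitespace-separated postfix expression with + and *.
    static double eval(String expr) {
        Deque<Double> vals = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            if (tok.equals("+")) vals.push(vals.pop() + vals.pop());
            else if (tok.equals("*")) vals.push(vals.pop() * vals.pop());
            else vals.push(Double.parseDouble(tok));  // a number
        }
        return vals.pop();
    }
}
```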





Elementary Sorts – Comparable<E> & Comparator<T>

When your class implements Comparable, the compareTo method of the class defines the "natural" ordering of that object. That method is contractually expected (though not strictly enforced) to be consistent with other methods on that object; for example, compareTo should return 0 exactly when the .equals() comparison returns true. A Comparator is its own definition of how to compare two objects, and can be used to compare objects in a way that might not align with the natural ordering. For example, Strings are generally compared alphabetically, so "a".compareTo("b") uses an alphabetical comparison. If you wanted to compare Strings by length, you would need to write a custom comparator.

In short, there isn't much difference. They are both ends to similar means. In general implement comparable for natural order, (natural order definition is obviously open to interpretation), and write a comparator for other sorting or comparison needs.
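For instance, here is a sketch of the string-length comparator mentioned above (the constant name is mine):

```java
import java.util.Comparator;

class ByLengthDemo {
    // Orders strings shortest-first, instead of the natural
    // (lexicographic) order.
    static final Comparator<String> BY_LENGTH =
        Comparator.comparingInt(String::length);
}
```

Passing it as the second argument to Arrays.sort() sorts by length instead of alphabetically.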

  • Callbacks are references to executable code.
  • In Java it is possible to write a sort method that knows how to compare items of different types like Double, String, and File. One way to do that is by using types that implement a certain interface. What interface is it? Comparable <Item>.
  • This is the sort problem: Rearrange an array of N items into ascending order according to a defined key that is part of the item.
  • Professor Sedgewick explains why he uses only the static methods less and exchange to refer to data in his sorting algorithms. What was the reason? It guarantees that the array includes the same data before and after the sort.
  • When implementing the compareTo method certain rules have to be followed:
  • v.compareTo(w) needs to throw an exception if either v or w is null.
  • v.compareTo(w) needs to return 0 if v is equal to w.
  • compareTo needs to define a total order.




Generic Syntax

  • Include type parameters (no raw types) <>.
  • e.g., List<String> list = new ArrayList<>();
  • Type arguments need to be reference types.
  • e.g., List&lt;Integer&gt; is allowed, but List&lt;int&gt; is not.




Comparable<E>

Specifies the natural order. Method: compareTo. How many parameters does compareTo have? One. Which values does compareTo return? Positive, zero, or negative if this object is greater than, equal to, or less than the argument passed.





Comparator<T>

Specifies an alternative order (not the natural order). Method: compare. How many parameters does compare have? Two. Which values does compare return? Positive, zero, or negative if the first argument is greater, equal, or less than the second argument.

- Class String provides a comparator. 
How is it different from the natural order? 
It ignores the case (upper-case / lower-case).

- Class Collections provides a comparator. 
How is it different from the natural order? 
It orders in reverse order (back to front based on the natural order).




Selection Sort

The algorithm works as follows:

1: Find the minimum value in the list
2: Swap it with the value in the first position
3: Repeat the steps above for the remainder of the list 
(starting at the second position and advancing each time)
  • Best and Worst: О(n^2).
  • Selection sort has minimal data movement. It requires a linear number of exchanges.
  • In selection sort the index i increments from 0 to N-1. At any given time all elements with an index smaller than i are in final order.
  • Assume you use selection sort on an array that is already sorted. What performance can you expect? Quadratic time.
  • Selection sort has 2 invariants. Assumption: ↑ scans from left to right.
  • No entry to right of ↑ is smaller than any entry to the left of ↑.
  • Entries on the left of ↑ (including ↑) are fixed and in ascending order.

Selection sort takes quadratic time no matter what, but performs only one swap per pass – a linear number of exchanges in total. That makes it a good choice when we want to minimize swaps or are concerned about memory writes.
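The three steps can be sketched in Java; this is an int-array version to keep it short (the book's version is generic over Comparable):

```java
class SelectionSortSketch {
    static void sort(int[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int min = i;                  // index of smallest remaining entry
            for (int j = i + 1; j < n; j++)
                if (a[j] < a[min]) min = j;
            int tmp = a[i];               // one exchange per pass:
            a[i] = a[min];                // N exchanges in total
            a[min] = tmp;
        }
    }
}
```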





Insertion Sort

  • The performance of insertion sort depends on the order of the elements in the original array. If the array is in ascending order, insertion sort makes (N-1) compares and 0 exchanges.
  • A partially sorted array is an array where the number of inversions is <= c N. For partially sorted arrays, insertion sort runs in linear time.
  • Insertion sort has 2 invariants. Assumption: ↑ scans from left to right.
  • Entries on the left of ↑ (including ↑) are in ascending order.
  • Entries to right of ↑ have not been seen.
In order to sort a randomly ordered array with no duplicate keys, 
insertion sort uses ~ 1/4 N^2 compares and ~ 1/4 N^2 exchanges on average.

Mostly sorted array = O(N)
Not sorted array = O(N^2)

Insertion Sort asks the question: 
Is my neighbor > than me?
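That question, in code – an int-array sketch (the book's version is generic over Comparable):

```java
class InsertionSortSketch {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++)
            // swap a[i] leftward until it is no smaller than its left neighbor
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
    }
}
```

On an already-sorted array the inner loop never runs, giving the N-1 compares and 0 exchanges noted above.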





Merge Sort

Runtime: N lg N. Mergesort uses extra space proportional to N. A sorting algorithm is in-place if it uses no more than c log N extra memory.

Merge sort is fast, but it requires more memory. Bottom-up mergesort requires no recursion.

It is important to not create the auxiliary array in the recursive method because that could lead to extensive cost of extra array creation and poor performance. In order to be able to use merge sort all the elements in the array do not need to be unique (merge sort can thus have duplicates).
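A top-down mergesort sketch for an int array, with the auxiliary array allocated once outside the recursion as recommended above (names are mine):

```java
class MergeSortSketch {
    static void sort(int[] a) {
        int[] aux = new int[a.length];   // allocate once, not per recursive call
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);           // sort left half
        sort(a, aux, mid + 1, hi);       // sort right half
        merge(a, aux, lo, mid, hi);      // merge the two sorted halves
    }

    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];  // copy to auxiliary array
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) a[k] = aux[j++];              // left half exhausted
            else if (j > hi) a[k] = aux[i++];          // right half exhausted
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else a[k] = aux[i++];                      // ties go left: stable
        }
    }
}
```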

In general it is a good idea for Java programmers to use assertions.

  • Assertions throw an exception if the boolean condition is false.
  • Assertions should be used to check internal invariants.
  • Assertions help to detect logic bugs.
  • Assertions document code.
// PROBLEM #1
For tiny arrays mergesort is too complicated. 
There is too much overhead with the recursive calls.
One way to make mergesort even more efficient is to 
cut off at a certain array size and use insertion sort.

Which cutoff value can increase the 
performance of mergesort by about 20%? 
// A: 7.

// PROBLEM #2
Assume you are merging the elements from the 
auxiliary array back to the original array.

What happens if the elements on index i and j are identical? 
// A: The element on index i gets copied and i gets incremented.

Merge sort is a prototype for the algorithm design paradigm called "divide and conquer."





Merge Sort – Comparator

Comparator allows us to specify an alternative order in which elements of a given type T should be sorted, and it also decouples the definition of the type T from the definition of what it means to compare two objects of that type.

  • Comparable: used to implement the natural order; the definition of the compareTo method has to be inside the data type T.
  • Comparator: used if you want to sort a given data type T by a number of different criteria; the definition of the compare method can be outside the data type T.

How to use a Comparator with the sort method of class Arrays? Create a Comparator object and pass it as the second argument to Arrays.sort().





Merge Sort – Stability & Sorting Complexity

Lower bound: any compare-based sorting algorithm must use at least ~ N lg N compares in the worst case. Merge sort is optimal with respect to the number of compares, but not with respect to space usage. If additional information is known about the input elements, the N lg N lower bound might not hold. A sorting algorithm is stable if equal items are never moved past each other.

Stable algorithms:

  • Merge Sort
  • Insertion Sort - what makes them stable?
  • They preserve the relative order of items with equal keys.
  • They ensure that if an array has been sorted by a given key and then we sort it by a second field it maintains the original sort for all elements that have equal keys in that second field.

Which of the following should you do if you want to ensure that your implementation of the merge method results in a stable mergesort? If two keys are equal, take the element from the left subarray.

In the context of computational complexity, an upper bound is a cost guarantee provided by some algorithm that solves problem X; it is an upper bound on how difficult it is to solve X. What does it mean if we say that sorting has an upper bound of ~ N lg N? It means that there is at least one algorithm that can sort in time proportional to N lg N.

// PROBLEM #1
Assume you are eager to discover the next big break-through 
in algorithms. A team that works on the development of a sorting 
algorithm that guarantees ⅓ N lg N compares offers you the 
opportunity to join them as a research assistant.

Should you accept? 
// A: No. Compare-based sorting has a lower bound of ~ N lg N compares,
// so a guarantee of ⅓ N lg N compares is impossible.




Quick Sort

  • Random shuffling can make it extremely unlikely that the worst case happens.
  • Many duplicates can lead to quadratic run-time if not implemented correctly.
  • In the worst case the number of compares is quadratic.

Mergesort requires fewer compares than quicksort. In practice, however, quicksort is significantly faster than mergesort because it does less data movement.

  • Step 1: Shuffle the array.
  • Step 2: Partition the array so that, for some j: entry a[j] is in place; no larger entry is to the left of j; no smaller entry is to the right of j.
  • Step 3: Sort each subarray recursively.

Quicksort partitioning demo – repeat until the i and j pointers cross:

  • Scan i from left to right so long as a[i] < a[lo].
  • Scan j from right to left so long as a[j] > a[lo].
  • Exchange a[i] with a[j].

When the pointers cross, exchange a[lo] with a[j].

private static int partition(Comparable[] a, int lo, int hi) {
    int i = lo, j = hi+1;
    while (true) {
        while (less(a[++i], a[lo]))   // find item on left to swap
            if (i == hi) break;
        while (less(a[lo], a[--j]))   // find item on right to swap
            if (j == lo) break;
        if (i >= j) break;            // check if pointers cross        
        exch(a, i, j);                // swap
    }
    exch(a, lo, j);    // swap with partitioning item
    return j;          // return index of item now known to be in place
}

Partitioning in-place: using an extra array makes partitioning easier (and stable), but it is not worth the cost. Terminating the loop: testing whether the pointers cross is trickier than it might seem. Equal keys: when duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key. Preserving randomness: shuffling is needed for the performance guarantee; an equivalent alternative is to pick a random partitioning item in each subarray.

  • Quicksort: gets the sort done in-place; needs to be shuffled before sorting.
  • Mergesort: is stable; requires extra space for sorting.
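A complete quicksort built around a partition method like the one above might look as follows. This is a self-contained sketch (class name and helpers are mine, mirroring the course style), with the less/exch helpers written out and a Knuth shuffle for the performance guarantee:

```java
import java.util.Arrays;
import java.util.Random;

public class Quick {
    public static void sort(Comparable[] a) {
        shuffle(a);                  // preserve the probabilistic guarantee
        sort(a, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);          // sort the left subarray
        sort(a, j + 1, hi);          // sort the right subarray
    }

    private static int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }
    private static void exch(Object[] a, int i, int j) { Object t = a[i]; a[i] = a[j]; a[j] = t; }

    private static void shuffle(Object[] a) {     // Knuth shuffle
        Random rnd = new Random();
        for (int i = 1; i < a.length; i++) exch(a, i, rnd.nextInt(i + 1));
    }

    public static void main(String[] args) {
        Integer[] a = {3, 1, 4, 1, 5, 9, 2, 6};
        sort(a);
        System.out.println(Arrays.toString(a));  // [1, 1, 2, 3, 4, 5, 6, 9]
    }
}
```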

Assume you partition an array that includes duplicate keys.
When i or j reach a key that is equal to the partitioning key, we should stop.

Assume you have an ordered array of unique elements (no duplicates). 
You attempt to quicksort this array without shuffling.
Where will the first partitioning element (the first pivot) end up? At the front.

Quicksort is faster than Mergesort. But how much faster?
Let's assume sorting a very large array with mergesort takes 36 minutes.
How long would it approximately take to sort the same array with quicksort? 24 minutes.

Assume you have an array of distinct elements and you just partitioned it exactly once.
Where is the smallest item in the array? Consider what 
happens if the pivot is the smallest element...
It is to the left of the partitioning element (pivot), or it is the pivot itself in the left-most position.




Quick Sort – Selection & Duplicate Keys

  • QuickSelect is similar to Quicksort. However, we only go in one sub-array or the other depending on where index k is.
  • QuickSelect takes linear time on average.
  • QuickSelect relies on random shuffling.
  • QuickSelect finds the k-th smallest item in a given array.
  • The worst case performance of QuickSelect is quadratic.
// Assume you have an implementation of quicksort that
// does the random shuffling before the actual sort starts.
// Quicksort can still go quadratic.
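QuickSelect as described can be sketched like this (on int keys for brevity; select returns the k-th smallest, 0-based, and recurses into only the side that contains index k):

```java
import java.util.Random;

public class QuickSelect {
    public static int select(int[] a, int k) {
        shuffle(a);                          // QuickSelect relies on random shuffling
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;      // k is in the right subarray
            else if (j > k) hi = j - 1;      // k is in the left subarray
            else            return a[k];
        }
        return a[k];
    }

    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < a[lo]) if (i == hi) break;
            while (a[lo] < a[--j]) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    private static void shuffle(int[] a) {
        Random rnd = new Random();
        for (int i = 1; i < a.length; i++) exch(a, i, rnd.nextInt(i + 1));
    }

    public static void main(String[] args) {
        int[] a = {7, 10, 4, 3, 20, 15};
        System.out.println(select(a, 2));  // third-smallest: 7
    }
}
```

Because only one subarray is visited per partition, the expected work is N + N/2 + N/4 + … ~ 2N, i.e. linear on average.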

// Using randomized quicksort with 3-way 
// partitioning can reduce the runtime 
// from linearithmic to linear for a broad 
// class of applications.
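A sketch of that 3-way partitioning (Dijkstra's scheme) on int keys: entries equal to the pivot collect in the middle and are never recursed into, which is what makes many-duplicate inputs fast.

```java
import java.util.Arrays;

public class Quick3way {
    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        int v = a[lo];                        // partitioning key
        while (i <= gt) {
            if      (a[i] < v) exch(a, lt++, i++);
            else if (a[i] > v) exch(a, i, gt--);
            else               i++;           // equal keys stay in the middle
        }
        sort(a, lo, lt - 1);                  // a[lo..lt-1] < v
        sort(a, gt + 1, hi);                  // a[gt+1..hi] > v
    }

    private static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] a = {2, 2, 1, 3, 2, 1, 3, 2};
        sort(a);
        System.out.println(Arrays.toString(a));  // [1, 1, 2, 2, 2, 2, 3, 3]
    }
}
```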




Quick Sort – System Sorts

  • Arrays.sort() has both a method for data types that implement Comparable and a method that uses a Comparator.
  • Arrays.sort() is intended to be a general purpose sorting method. Which sorting algorithm is used to implement it? Quicksort for primitive types and Mergesort for objects.
  • Sorting algorithms are used in a wide variety of applications:
  • Identify statistical outliers.
  • Find duplicates in mailing list.
  • Data compression.
  • Display Google PageRank results.
// Assume you have tens of millions of sensor measurements that have already been sorted.
// Every day about 1000 new measurements are added.
// Before the update can be considered complete the list has to be sorted again. 

// You want to find the fastest algorithm that uses the least amount of memory.
// What sorting algorithm should you use? Insertion sort.
// Assume you need a sorting algorithm for a huge amount of data. 
// Because of the size of your data set you want to use the 
// fastest algorithm that uses the least amount of memory.

// Which sorting algorithm should you choose? Quicksort




Priority Queue – Binary Heap & API

  • Binary heaps are complete binary trees represented in arrays.
  • Keys are stored in the nodes satisfying the heap order property.
  • Describes insertion into a heap: add node at end, then swim it up.
  • The largest key is stored at position 1.
  • The parent of the node at index k is at index k/2.
  • Order-of-growth of running time for a priority queue implemented with a binary heap: insert: log N; del-max: log N; max: 1.
  • A binary tree is either empty or a node with two links: one to a left binary tree and one to a right binary tree.
  • Complete Trees:
  • Complete trees are perfectly balanced, except for the bottom level.
  • The height only increases when N is a power of 2.

To remove an element from the heap, use the sink method. When removing an element it is important to avoid loitering. There is no need to create defensive copies of immutable instance fields when implementing a heap: immutability already guarantees that the client cannot change a key once it has been added to the heap.
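The operations above can be sketched as a max-oriented heap on int keys, using the 1-based array layout described earlier (parent of k is at k/2); resizing and error handling are kept minimal:

```java
import java.util.Arrays;

public class MaxPQ {
    private int[] pq = new int[8];   // pq[0] unused (1-based heap)
    private int n = 0;

    public void insert(int x) {
        if (n + 1 == pq.length) pq = Arrays.copyOf(pq, 2 * pq.length);
        pq[++n] = x;
        swim(n);                     // restore heap order bottom-up
    }

    public int delMax() {
        int max = pq[1];             // largest key sits at position 1
        exch(1, n--);
        sink(1);                     // restore heap order top-down
        pq[n + 1] = 0;               // avoid loitering
        return max;
    }

    private void swim(int k) {
        while (k > 1 && pq[k / 2] < pq[k]) { exch(k / 2, k); k = k / 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && pq[j] < pq[j + 1]) j++;  // pick the larger child
            if (pq[k] >= pq[j]) break;
            exch(k, j);
            k = j;
        }
    }

    private void exch(int i, int j) { int t = pq[i]; pq[i] = pq[j]; pq[j] = t; }

    public static void main(String[] args) {
        MaxPQ pq = new MaxPQ();
        for (int x : new int[]{4, 9, 1, 7}) pq.insert(x);
        System.out.println(pq.delMax());  // 9
    }
}
```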

There are two ways to implement priority queues – ordered and unordered arrays. However, they have significant drawbacks:

  • Ordered Arrays are too slow during insertion.
  • Unordered Arrays are too slow when deleting.

Applications that use priority queues:

  • Maintain largest M values in a sequence.
  • Huffman Codes.
  • Dijkstra's Algorithm.

Java immutable data types: String, Integer, Vector, Color...

Java mutable data types: StringBuilder, Randomized Queue, Queue, Stack, Priority Queue...





Elementary Symbol Tables – API & Implementations

  • A symbol table that is implemented with an unordered linked list uses the equals method to compare keys.
  • A symbol table that is implemented with an ordered array uses the compareTo method to compare keys.
  • Insert and Search are the basic operations of symbol tables.
  • Use immutable types for symbol table keys.
  • When implementing an overridden equals method for user-defined types, it is good practice to compare first the fields that are most likely to differ.
  • The type of the value can be any generic type. For the Key type we make one of the following assumptions: either it is Comparable and uses compareTo(); or it is a generic type that uses equals() to test equality; or it is a generic type that uses equals() to test equality and hashCode() to scramble the keys.
  • The most expensive ordered symbol table operation in a binary search implementation is insert.
// PROBLEM #1
A symbol table can be implemented by using an ordered array.
Assume you have such a symbol table and N elements have already been inserted.
What is the average cost of searching and inserting? 

Answer: log N and N / 2

// PROBLEM #2
A symbol table can be implemented by using an unordered linked list.
Assume you have such a symbol table and N elements have already been inserted.
What is the average cost of searching and inserting?

Answer: N / 2 and N




Elementary Symbol Tables – BST & Ordered Operations

  • An Ordered Array has the best guaranteed performance when searching for an element in a symbol table.
  • All keys in the left subtree are smaller than the key of the parent node.
  • In a BST, each node has a left and a right subtree.
  • A BST is a binary tree in symmetric order.
  • A BST can be null.
  • Description of an insertion into a BST: if less, go left; if greater, go right; if null, insert.
  • In Java, a BST is a reference to a root Node. Generally, each node consists of 4 fields: a key, a value, a left subtree, and a right subtree.
  • The number of compares that is required to search for an element is: depth of node + 1
  • If your goal is to create an efficient binary search tree, then the best input set is: all elements in random order.
  • For a symbol table that is implemented with a BST, all ordered symbol table operations except for the iteration take time proportional to: the height of the tree.
  • In order to provide an efficient implementation of size(), rank(), and select(), we must add a field count to each node.
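A minimal BST sketch with int keys illustrating the insertion rule (if less, go left; if greater, go right; if null, insert) and the per-node count field used by size(); names are illustrative:

```java
public class BST {
    private Node root;

    private static class Node {
        int key, val, count = 1;
        Node left, right;
        Node(int key, int val) { this.key = key; this.val = val; }
    }

    public void put(int key, int val) { root = put(root, key, val); }

    private Node put(Node x, int key, int val) {
        if (x == null) return new Node(key, val);        // if null, insert
        if      (key < x.key) x.left  = put(x.left, key, val);   // if less, go left
        else if (key > x.key) x.right = put(x.right, key, val);  // if greater, go right
        else                  x.val = val;                       // equal: update value
        x.count = 1 + size(x.left) + size(x.right);
        return x;
    }

    public int size() { return size(root); }
    private int size(Node x) { return x == null ? 0 : x.count; }

    public Integer get(int key) {
        Node x = root;
        while (x != null) {
            if      (key < x.key) x = x.left;
            else if (key > x.key) x = x.right;
            else return x.val;
        }
        return null;   // search miss
    }

    public static void main(String[] args) {
        BST st = new BST();
        st.put(5, 50); st.put(2, 20); st.put(8, 80);
        System.out.println(st.get(2) + " " + st.size());  // 20 3
    }
}
```

The number of compares for a search is the depth of the node plus one, which is why the shape of the tree (and hence the insertion order) matters.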
One of the following is not a BST. Which one is it?
The answer is D, since its keys are not arranged in symmetric order.

In which order are the elements of this BST returned when 
using the iterator to access all the elements?

1 3 4 6 7 8 10 13 14




Hibbard Deletion

1. Search for the node t that contains the key k.
2. If t has 2 children, find the successor x of t.
3. Delete x from t's right subtree.
4. Put x in the position where t used to be.

When alternating inserting and deleting elements using Hibbard Deletion, over time the BST becomes less balanced. Deleting an element from a BST using Hibbard deletion takes time proportional to sqrt(N).





Balanced Search Trees

  • When we add a new node, we always add it with a red link.
  • The height of a left-leaning red-black BST is at most 2 lg N.
  • There are two invariants – things that are always true for a left-leaning red-black BST: symmetric order and perfect black balance.
  • When connecting two elements (keys) with a red connection, the larger key is the root.
  • In a left-leaning red-black BST red links are always left leaning, and black links can be left-leaning or right-leaning.
  • No node has 2 red links connected to it.
  • Left-leaning red-black BSTs are a representation of a 2-3 tree as a BST.
  • A 3-node is implemented by connecting two nodes with a red link.
  • "Perfect black balance" means that every path from the root to a null link has the same number of black links.
  • If the color is set to true (RED) that means that the incoming link is red.
  • Sometimes we need to take a left-leaning red link and temporarily lean it right.
  • For left-leaning red-black BSTs the implementation of many symbol table operations like search, ceiling, selection, etc. are identical to the implementation in elementary BSTs. The color of the link is ignored.
  • The same identical code is used to implement searching in elementary BSTs and in left-leaning red-black BSTs; nonetheless left-leaning red-black BSTs have a better search performance (are faster) than elementary BSTs.
  • 2-3 trees are a generalization of binary search trees (BSTs).
  • 2-3 trees allow for 2-nodes as well as 3-nodes.
  • Every path from the root to a leaf has the same length.
  • 2-3 trees are an implementation of balanced trees with guaranteed logarithmic performance.
  • An in-order traversal of a 2-3 tree yields the keys in ascending order.
  • Characteristics of a 3-node in a 2-3 tree: all keys in the left subtree are smaller than the smaller key; all keys in the middle subtree are between the two keys; all keys in the right subtree are larger than the larger key.
PROBLEMS/ANALYSIS

During the insertion of elements (keys) we sometimes end up with 
red links that are leaning in the wrong direction. 
What should be done?
––> Perform one or more rotations. This will re-align the tree.
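A sketch of the left rotation used for that re-alignment. The Node class and constants are illustrative (following the course convention of storing the color bit on the link coming in from the parent); this is not the full insertion code.

```java
public class RotationSketch {
    private static final boolean RED = true, BLACK = false;

    static class Node {
        int key;
        Node left, right;
        boolean color;           // color of the link from the parent
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    // h has a red right link; after the rotation the red link leans left.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;        // x's left subtree moves under h
        x.left = h;
        x.color = h.color;       // x takes over h's link color
        h.color = RED;           // the rotated link becomes red, now leaning left
        return x;                // x replaces h as the subtree root
    }

    public static void main(String[] args) {
        Node e = new Node(5, BLACK);
        e.right = new Node(9, RED);              // right-leaning red link
        Node root = rotateLeft(e);
        System.out.println(root.key + " " + (root.left.color == RED));  // 9 true
    }
}
```

The rotation touches a constant number of links regardless of N, which is exactly what 'local' means in the answer above.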

Assume you have a left-leaning red-black BST that includes N elements (keys).
All rotations in a left-leaning red-black tree are local operations.
What does 'local' mean in this context?
––> Rotations take a constant amount of operations that is independent of N.

While performing certain operations in left-leaning red-black BSTs you 
sometimes encounter a node that has 2 red links coming out.
––> That corresponds to a (temporary) 4-node of a 2-3 search tree.

What causes the height of a 2-3 tree to increase?
––> Splitting a temporary 4-node at the root.

Assume you have a 2-3 tree, that contains N elements (N keys).
Which big-O best describes the splitting of a temporary 4-node?
––> O(1).




Left Leaning Red-Black BSTs

A red-black tree is a kind of self-balancing binary search tree. Each node stores an extra bit representing "color" ("red" or "black"), used to ensure that the tree remains balanced during insertions and deletions.

// TODO





Geometric App - 1D Search

  • When processing geometric data using symbol tables and BSTs, the idea is that the keys are geometric data; not just simple strings or numbers.
  • 1D range search is an extension of the ordered symbol table API. The new features are range search and range count.
  • 1D range search examples:
  • Find all points on a line that are greater than n1 and less than n2.
  • Find all conference rooms that accommodate at least 300 people but no more than 500.
  • Find all employees that earn between $100,000 and $250,000 in a database.




Ordered Array

  • Ordered Array: fast range count and range search, but O(N) on insert.
  • Binary Search Tree: O(log N) on insert and range count, O(R + log N) on range search (where R is the number of matching keys).
  • Unordered List: fast insert, but O(N) on range count and range search.

// Which of the following methods implements 
// 1D range count? Answer: C.

// A) 
public int rangeCount (Key lo, Key hi) {
    if (contains (hi)) {
        return hi - lo + 1;
    } else {
        return hi - lo;
    }
}

// B)
public int rangecount (Key lo, Key hi) {
    if (contains(hi)) {
        return rank(hi) - rank(lo);
    } else {
        return rank(hi) - rank(lo) - 1;
    }
}

// C)
public int rangeCount (Key lo, Key hi) {
    if (contains(hi)) {
        return rank(hi) - rank(lo) + 1;
    } else {
        return rank(hi) - rank(lo);
    }
}

// D)
public int rangeCount (Key lo, Key hi) {
    return min(hi) - max(lo);
}




Geometric App - KD Trees

2D Trees:

  • On odd levels of a 2d tree, use the y-coordinate as the key (divide top | bottom).
  • On even levels of a 2d tree, use the x-coordinate as the key (divide left | right).
  • 2d trees recursively partition the plane into two half-planes.

KD Trees:

  • Allow us to do efficient processing of sets of points in space.
  • Are an extension of BSTs.
  • Are very flexible and useful.
  • In the grid implementation we divide the space into an M-by-M grid of squares.
  • If M is too big, it will require too much memory.
  • If M is too small, it will require too much time.
  • A typical nearest neighbor search in a 2d tree is log N.
  • The grid implementation works great if the points are evenly distributed – typically this is not the case.
Assume we have N points in a given space and 
we want to divide the space up into a M x M grid.

// What would be a good rule of thumb when choosing 
// the right size of M for the M x M grid?
––> M should be sqrt(N).




Balanced Search Trees - Geometric App & Line Segment Intersection

  • In the sweep-line algorithm we sweep a vertical line from left to right.
  • When we execute the sweep-line algorithm and we hit the right end point of a horizontal line segment, we remove the line segment from the BST.
  • When a vertical line segment is hit, perform a range search on the BST to find all intersecting horizontal lines.
  • In the sweep-line algorithm, we consider each x-coordinate as an event.
  • Whenever we hit the left end point of a horizontal line segment, we insert the y-coordinate of that line segment into a BST.
// Given N horizontal and vertical line segments, find all intersections.
// Match the order of growth of the running time to the algorithm.

   Quadratic ––> Check all pairs of line segments for intersection.
Linearithmic ––> sweep-line algorithm.




Hash Tables

A hash table is a data structure that allows for the very fast retrieval of data, no matter how much data there is. For that reason hash tables are widely used in database indexing, compilers, authentication, caching, compilation error checking and much more. The insertion, deletion and retrieval of data from a hash table occurs in constant time (best case).

Consider a simple one-dimensional array. To find an item in it you could employ a brute-force approach such as a linear search, checking each item in turn; for a very big array this could take a very long time. Now suppose you want to retrieve a value from an element of the array and you happen to know the index number of that element: you can look up the value very quickly, and the time needed is independent of the size of the array and of the element's position within it. But how can you know which element of the array contains the value? The answer is that each index number is calculated using the value itself, so that the index number is in some way related to the data.

Solving a collision by placing an item somewhere other than its calculated address is called open addressing, because every location is open to any item. Open addressing can use a variety of techniques to decide where to place an item. Linear probing: if the calculated address is occupied, a linear search is used to find the next available slot. If linear probing reaches the end of the array and still can't find a free space, it may cycle around to the beginning of the array and continue searching from there. To look up an item in such a hash table, the hash function is applied again; but if there were collisions and some items are not at their calculated positions, then finding an item will also involve linear probing, that is, a linear search.

The more items there are in a hash table, the more likely you are to get collisions when you insert even more data. One way to deal with this is to make the hash table bigger than needed for the total amount of data, so that, say, only 70% of the hash table is ever occupied.

The ratio of the number of stored items to the size of the array is known as the load factor. If the hash table is implemented as a resizable dynamic data structure, it can be made to grow automatically when the load factor reaches a certain threshold. In an ideal world, every item would be stored in the hash table according to its calculated index; in this best case the time taken to find any particular item is always the same. In the worst case, depending on the nature of the data used to calculate the index values and on the appropriateness of the hash algorithm, some items may require a linear search of the whole table in order to be found.

Another way to deal with collisions is known as "separate chaining": each array position holds a pointer to the first node of a linked list of items that hash to that position. With linear probing, by contrast, resolving a collision means trying the next slot along, and if necessary the next and then the next, until an empty slot is eventually found. However, this can result in what is known as primary clustering: keys bunch together inside the array while large portions of it remain unoccupied. One mitigation is to look at, say, every third slot along until a free space is found (a plus-three rehash).

When choosing or writing a hash function, there are some objectives to bear in mind:

  • It should minimize collisions so less time is spent on collision resolution – ultimately data retrieval will be quicker.
  • It should give you a uniform distribution of hash values – therefore the data will be spread across the hash table as uniformly as possible.
  • It should be easy to calculate, and it should include a suitable method able to resolve any collisions that may occur.




Hash Table – Separate Chaining

  • Separate chaining is a collision resolution strategy that makes use of elementary linked lists.
  • Hashing is preferred over red-black trees if we have short keys, where the hash function is easy to compute, and we don't need any of the ordered symbol table operations.
  • Linear probing typically has better cache performance.
  • Separate chaining is less sensitive to poorly-designed hash functions.
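A minimal separate-chaining sketch (fixed number of chains M, Object keys for brevity; a real implementation would resize and use generics). New items with new keys go to the front of their chain, as in the answer below:

```java
public class SeparateChainingST {
    private static final int M = 97;          // number of chains (fixed here)
    private final Node[] st = new Node[M];

    private static class Node {
        Object key, val;
        Node next;
        Node(Object key, Object val, Node next) { this.key = key; this.val = val; this.next = next; }
    }

    private int hash(Object key) { return (key.hashCode() & 0x7fffffff) % M; }

    public Object get(Object key) {
        for (Node x = st[hash(key)]; x != null; x = x.next)
            if (key.equals(x.key)) return x.val;
        return null;                          // reached end of chain: miss
    }

    public void put(Object key, Object val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) { x.val = val; return; }   // update in place
        st[i] = new Node(key, val, st[i]);                    // insert at front of chain
    }

    public static void main(String[] args) {
        SeparateChainingST st = new SeparateChainingST();
        st.put("L", 11);
        st.put("P", 10);
        System.out.println(st.get("P"));  // 10
    }
}
```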

  • Left-leaning red-black BST: you write code for an auto-pilot, and it is very important that all symbol table look-ups execute in the specified time.
  • Hash Table: you write code for a game that uses a lot of animation, and speed is your first concern when selecting a symbol table implementation.

// Assume a new item with a new key is added to 
// the hash table, that uses separate chaining.

// What is the description of the process?
--> map key to integer i 
--> insert new item at the front of i-th chain

// -------------------------------------------------------------------

// Assume you have 1000 items that need to be stored in a hash 
// table and you use separate chaining. Which array size should 
// you choose for your hash table if you want constant time searches,
// and you don't want to allocate too much extra memory?

--> 200
// For complicated keys it can be expensive to calculate a good hash code.
// There are different ways to respond to such a situation. What options should you consider?

- Consider using left-leaning red-black BSTs for your symbol table implementation. 
- Store the hash code so it has to be calculated only once.

// -------------------------------------------------------------------

// The performance of hash tables is based on the uniform hashing assumption.
// This can cause issues in a number of situations.
// Below you find 2 examples where the hash function can cause a problem:

- If the hash code function is known and exploited to create a denial-of-service attack.
- If a guaranteed performance is required (e.g. hard real time systems).




Hash Table – Linear Probing & Hash Functions

  • The basic plan of hashing is the following: we use the key to calculate the appropriate array index, and that's where we'll store the item.
  • Hash functions calculate a table index based on the key. Each table index should be equally likely for each key.
  • Java provides customized implementations of hashCode for classes that are commonly used as keys like Integer, Double, String, etc. In Java the type double is 8 bytes long. However, the hashCode returned is a 4 byte int. How is this 4 byte (32 bit) integer calculated? Return xor of the most significant 32 bits with the least significant 32 bits.
  • The calculation of the hash code should not only be well distributed but also efficient (fast). In order to optimize the performance of the String hashCode implementation, Java adds a field to class String to store the hash value.
  • The method call key.hashCode() returns a value between −2^31 and 2^31 − 1. However, the array that we use to store our items has length M. Which expression is best suited to transform the hash code to a valid index of our array? Answer: (key.hashCode() & 0x7fffffff) % M.
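That expression can be checked directly. The masking is needed because hashCode() may be negative, and even Math.abs overflows for Integer.MIN_VALUE; the example string below is a well-known String whose hash code is exactly Integer.MIN_VALUE.

```java
public class HashIndex {
    // Map an arbitrary hash code to a valid index in [0, M).
    static int index(Object key, int M) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public static void main(String[] args) {
        String s = "polygenelubricants";          // hashCode() == Integer.MIN_VALUE
        System.out.println(s.hashCode() < 0);     // true
        System.out.println(index(s, 100));        // a valid index in [0, 100)
    }
}
```

Using plain `% M` on a negative hash code would yield a negative remainder in Java, hence the sign-bit mask.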

The birthday problem states that, when tossing balls uniformly into M bins, we can expect two balls in the same bin after about sqrt(π M / 2) tosses. Once we have approximately 23 people in a room, we can expect that two of them celebrate their birthdays on the same day.

PROBLEM #1
// There are a number of issues that come with hashing.
// Below you find four options.
// Only three of them describe issues related to hashing.

    – compute the hash function 
    – override compareTo
    – test for equality
    – collision resolution

// override compareTo, does NOT belong here.

PROBLEM #2
// Here are 3 suggestions to create a simple hash function for employees. 
// Which one should you choose?

    – Use the zip code as a hash function for employees
    – Use the last 4 digits of the social security number as a hash function for employees 
    – Use the area code as a hash function for employees

// Use the last 4 digits of the social security number as a hash function for employees.
// All classes in Java inherit the method hashCode(), which returns an integer.
// The Java language has built-in rules for the implementation of hashCode.

// Below you find some statements regarding the method hashCode. 
// One of them is false. Which of the following statements is NOT true?

    if ( x.equals(y))
         then we know for sure that x.hashCode() == y.hashCode()

    if (x.hashCode() != y.hashCode()) 
         then we know for sure that (! x.equals(y))

    if (! x.equals(y))
         then we know for sure that x.hashCode() != y.hashCode()

// Option 3 is false.
Assume you have a linear probing hash table with uniform hashing.
The hash table has size M and the number of keys in the table is N (0 < N < M)

--> The average number of misses before a key can be inserted depends on N / M.

// -------------------------------------------------------------------

If the array is too large compared to the 
number of keys, we use up too much memory.
If the number of keys is too large compared to 
the size of the array, the search is too slow.

Let's assume that the size of the array is M and the number of keys is N.

Therefore,
    if N / M ~ 0.5, the number of probes for a search hit is about 3/2.





About linear probing hash tables:

  • If the given index is free the element (key) is stored right there.
  • If the given index is NOT free the element (key) is stored in the next available index.
  • The hash code is used to map the key to an index of the array.
  • Elements (keys) are stored in an array.

The best description of searching in a linear probing hash table is:

  • Calculate index ( i ) based on hash code of the key.
  • If the element is on index i, it is found; otherwise check the next index until the element is found, or an empty index is reached.
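That search procedure, together with the insertion rule above, in a minimal fixed-size sketch (no resizing, so it assumes the table never fills; Object keys for brevity):

```java
public class LinearProbingST {
    private static final int M = 16;          // table size (fixed in this sketch)
    private final Object[] keys = new Object[M];
    private final Object[] vals = new Object[M];

    private int hash(Object key) { return (key.hashCode() & 0x7fffffff) % M; }

    public void put(Object key, Object val) {
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % M)  // probe, wrapping around
            if (keys[i].equals(key)) break;   // key already present: overwrite
        keys[i] = key;
        vals[i] = val;
    }

    public Object get(Object key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(key)) return vals[i];
        return null;                          // empty slot reached: search miss
    }

    public static void main(String[] args) {
        LinearProbingST st = new LinearProbingST();
        st.put("S", 0); st.put("E", 1); st.put("A", 2);
        System.out.println(st.get("E") + " " + st.get("X"));  // 1 null
    }
}
```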
// CLASS EXERCISE
0  1  2  3  4  5
               E         
   A           E
   S           E
   Y           E
   Y           Q
   Y     U     Q
   Y     U     E
   S     U     E
   S  T  U     E
   S  T  I     E
   S  T  O     E
   S  N  O     E


E    (5 * 13) % 6 = 5
A    (1 * 13) % 6 = 1
S    (19 * 13) % 6 = 1
Y    (25 * 13) % 6 = 1
Q    (17 * 13) % 6 = 5
U    (21 * 13) % 6 = 3
E    (5 * 13) % 6 = 5
S    (19 * 13) % 6 = 1
T    (20 * 13) % 6 = 2
I    (9 * 13) % 6 = 3
O    (15 * 13) % 6 = 3
N    (14 * 13) % 6 = 2




Searching App – Sets, Indexing & Sparse Vectors

  • Matrices are often implemented with 2D arrays. However, if the matrix includes many zeros it is more efficient to use an alternative implementation – something like sparse vectors.
  • Vectors are often implemented using an array. However, sometimes it is more efficient to implement a vector with a symbol table. A good reason to do so is that the vector is sparse.
  • File indexing application: for its implementation we use a symbol table, where the key is of type String, and the value is of type Set<File>.
  • Sets can be used to implement exception filters. Exception filters filter a given text based on elements in a list.
  • Sometimes we want to print all words except the ones on the list – this is called black listing.
  • Sometimes we want to print only the words from that list – this is called white listing.

Examples of exception filters:

  • block all websites that are inappropriate for kids.
  • find all transactions that used a stolen credit card.
  • spell checker (list all words that are not in the dictionary).

Description of the dictionary application:

  • It reads in keys from StdIn and finds the corresponding entries from the dictionary.
  • It allows the user to specify which columns include the keys and values.
  • It allows the user to specify a csv file that includes the dictionary data.

About csv files:

  • Different csv files can have different numbers of columns.
  • csv stands for Comma Separated Values.
  • All rows in a given csv file have the same number of columns.
  • A set is a collection of distinct keys.
  • A set has only keys but no associated values.




Undirected Graphs – Introduction & API

An undirected graph is a set of nodes and a set of links between the nodes. Each node is called a vertex, each link is called an edge, and each edge connects two vertices. The order of the two connected vertices is unimportant. An undirected graph is a finite set of vertices together with a finite set of edges.

  • Two vertices are connected if there is a path between them.
  • What is the degree of a vertex v? The number of vertices connected directly to v.
  • Cycle: a path whose first and last vertices are the same.
  • Path: a sequence of vertices connected by edges.
  • Undirected graph: a set of vertices connected pairwise by edges.
  • Adjacency matrix: a V x V matrix of boolean values that indicates for each pair of vertices whether they are connected. This representation isn't used that often in real-world applications, since the matrix takes too much memory.
The Graph API provides an overloaded constructor that reads in a graph from an input stream.

What's the best way to describe the expected file format? 
v1, .. vn are integers representing vertices that are connected by edges.

Answer:
i) number of vertices
ii) number of edges
iii) one line per edge, each with a pair of vertices: v1 v2

// ------------------------------------------------------------------------------------------

Prof. Sedgewick's graph API represents vertices with integers between 0 and V - 1.

However, sometimes graphs use other types like letters or 
Strings to represent vertices. How can we use Prof. Sedgewick's 
graph classes when the client provides the vertices as Strings?
e.g.: 
    graph data where the vertices are cities and the connections indicate 
    whether there exists a train connection between the cities.

Answer:
// You could use a symbol table that matches the 
// strings to integers and then use his graph classes

There are three common ways to represent a graph. Cost comparison (memory; add edge; edge between v and w; iterating over vertices adjacent to v):

  • List of edges: E; 1; E; E.
  • Adjacency matrix: V^2; 1; 1; V.
  • Adjacency lists: E + V; 1; degree(v); degree(v).

Iterating over all the vertices that are adjacent to vertex v:

  • List of edges: you have to go through all the edges.
  • Adjacency matrix: you have to go through all the vertices.
  • Adjacency lists: you only have to go through the adjacent vertices.

An adjacency list represents a graph as an array of linked lists. The index of the array represents a vertex, and each element in its linked list represents the other vertices that form an edge with that vertex. The simplest adjacency list needs a node data structure to store a vertex and a graph data structure to organize the nodes. We stay close to the basic definition of a graph: a collection of vertices and edges {V, E}. For simplicity we use an unlabeled graph as opposed to a labeled one, i.e. the vertices are identified by their indices 0, 1, 2, 3, and we use Java Collections to store the array of linked lists.

Advantages:

  • An adjacency list is efficient in terms of storage because we only need to store the values for the edges.
  • For a sparse graph with millions of vertices and edges, this can mean a lot of saved space.
  • It also helps to find all the vertices adjacent to a vertex easily.

Disadvantages:

  • Checking whether two particular vertices are adjacent is slower than with an adjacency matrix, because all the vertices in the list must be examined to answer the query.
// We can represent this graph in the form of a 
// linked list on a computer as shown below:
0 –> 1 –> 2 –> 3
1 –> 0 –> 2
2 –> 0 –> 1
3 –> 0

0, 1, 2, 3 are the vertices and each of them forms 
a linked list with all of its adjacent vertices. 

For instance:
    vertex 1 has two adjacent vertices 0 and 2. 
Therefore:
    1 is linked with 0 and 2 in the figure above.
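A minimal Java sketch of this array-of-linked-lists representation (names are illustrative, not the book's Graph class):

```java
import java.util.LinkedList;
import java.util.List;

// An unlabeled, undirected graph stored as an array of linked lists:
// one list of adjacent vertices per vertex index.
class AdjListGraph {
    private final List<Integer>[] adj;

    @SuppressWarnings("unchecked")
    AdjListGraph(int vertices) {
        adj = new List[vertices];
        for (int v = 0; v < vertices; v++) adj[v] = new LinkedList<>();
    }

    // Each undirected edge is stored twice, once per endpoint.
    void addEdge(int v, int w) {
        adj[v].add(w);
        adj[w].add(v);
    }

    Iterable<Integer> adj(int v) { return adj[v]; }

    public static void main(String[] args) {
        AdjListGraph g = new AdjListGraph(4);
        g.addEdge(0, 1); g.addEdge(0, 2); g.addEdge(0, 3); g.addEdge(1, 2);
        // Vertex 1 is adjacent to 0 and 2, as in the lists above.
        assert g.adj(1).toString().equals("[0, 2]");
        System.out.println("ok");
    }
}
```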




Undirected Graphs – Connected Components

  • Determine whether v1 is connected to v2 in constant time (after linear-time preprocessing).
  • Partition vertices into connected components.
  • They are like Union-Find where all the trees are flat.
  • They use depth first search, not breadth first search, to find the connected components.




Class Connected Components & Class Path (Depth First Search)

  • Both have a field of type boolean[] that keeps track whether a vertex has already been visited.
  • Both have a field of type int[] that associates a number with the vertices. However, which number is associated with the vertices differs.
  • Class Path stores the number of the previous vertex.
  • Class CC stores the component number.




Connected Components & Union Find

  • Both allow us to find out whether two vertices are connected.
  • Union Find provides more functionality – it allows us to add vertices and to check whether they are connected in arbitrary order.
  • Connected Components starts out with the finished graph and only checks whether two vertices are connected.
  • Only Connected Components can determine in constant time whether two vertices are connected.
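The constant-time query in the last bullet can be sketched like this: a single DFS pass labels each vertex with a component number, after which connected(v, w) is just an array comparison. This is a simplified sketch of the book's CC class, on a bare adjacency list:

```java
import java.util.ArrayList;
import java.util.List;

// Connected components via DFS: id[v] is the component number of v.
class CC {
    private final int[] id;
    private final boolean[] marked;
    private int count;

    CC(List<List<Integer>> adj) {
        id = new int[adj.size()];
        marked = new boolean[adj.size()];
        for (int v = 0; v < adj.size(); v++)
            if (!marked[v]) { dfs(adj, v); count++; }
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        id[v] = count;                       // label with component number
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w);
    }

    boolean connected(int v, int w) { return id[v] == id[w]; }  // O(1)
    int count() { return count; }

    public static void main(String[] args) {
        // two components: {0, 1} and {2}
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < 3; v++) adj.add(new ArrayList<>());
        adj.get(0).add(1); adj.get(1).add(0);
        CC cc = new CC(adj);
        assert cc.count() == 2;
        assert cc.connected(0, 1) && !cc.connected(0, 2);
        System.out.println("ok");
    }
}
```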

Depth First Search: puts unvisited vertices on a stack.

Breadth First Search: puts unvisited vertices on a queue.





Undirected Graphs – Breadth First Search & Depth First Search


Breadth First Search

  • Breadth first search examines vertices in increasing distance from s.
  • Breadth first search is not a recursive algorithm.
  • Breadth first search solves the shortest path problem.

Breadth First Algorithm Design

  • put s on the queue and mark s as visited;
  • repeat until the queue is empty:
  • remove vertex v from the queue;
  • add to the queue all unmarked vertices adjacent to v;
  • mark them as visited.
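The queue-based design above can be sketched as follows, using plain java.util collections and illustrative names; distTo[v] ends up as the shortest-path distance from s:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// BFS: mark s, enqueue it, then repeatedly dequeue a vertex and
// enqueue its unmarked neighbours, one "ring" of distance at a time.
class BfsDemo {
    static int[] bfs(List<List<Integer>> adj, int s) {
        int[] distTo = new int[adj.size()];
        java.util.Arrays.fill(distTo, -1);   // -1 means "not reached"
        Queue<Integer> queue = new ArrayDeque<>();
        distTo[s] = 0;                       // mark s as visited
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {       // unmarked neighbour
                    distTo[w] = distTo[v] + 1;
                    queue.add(w);
                }
            }
        }
        return distTo;
    }

    public static void main(String[] args) {
        // edges 0-1, 0-2, 1-3: shortest distances from 0 are 0, 1, 1, 2
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < 4; v++) adj.add(new ArrayList<>());
        int[][] edges = {{0, 1}, {0, 2}, {1, 3}};
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        int[] d = bfs(adj, 0);
        assert d[0] == 0 && d[1] == 1 && d[2] == 1 && d[3] == 2;
        System.out.println("ok");
    }
}
```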

Depth First Search

  • In order to implement depth-first search we use a class called Path, which has two methods: pathTo and hasPathTo.
  • It mimics maze exploration.
  • It does not look at all adjacent vertices before it looks at vertices that are farther away from v.
  • It marks v as visited and recursively visits all unmarked vertices that are adjacent to v.
  • It is a systematic search through a graph.

Depth First Algorithm Design

  • create a Graph object (G);
  • pass G to a graph-processing routine;
  • query the graph-processing routine for information.
Class Path has two arrays (fields) that keep track of vertex 
related information while executing the depth-first search.
- an array that stores for each vertex whether the vertex has already been visited
- an array that stores the prior vertex (where we came from)
Class Path returns the path as an Iterable<Integer>. 
Iterable is an interface. 
What is actually returned is an instance of a class 
that implements the interface Iterable<E>.

// What is the type of the variable 
// that stores and return the path?
// Stack<Integer>
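A compact sketch of the Path idea with the two arrays (marked[] and edgeTo[]) and a Stack<Integer> for the path. Names are illustrative; note that java.util.Stack iterates oldest-first, so the printed path here reads from v back to s, unlike the book's Stack:

```java
import java.util.List;
import java.util.Stack;

// Recursive DFS that records, for each vertex, the vertex we came from.
class DfsPaths {
    private final boolean[] marked;  // has the vertex been visited?
    private final int[] edgeTo;      // previous vertex on the path
    private final int s;

    DfsPaths(List<List<Integer>> adj, int s) {
        this.s = s;
        marked = new boolean[adj.size()];
        edgeTo = new int[adj.size()];
        dfs(adj, s);
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) { edgeTo[w] = v; dfs(adj, w); }
    }

    boolean hasPathTo(int v) { return marked[v]; }

    // Walk the edgeTo[] links from v back to s, pushing as we go.
    Iterable<Integer> pathTo(int v) {
        if (!hasPathTo(v)) return null;
        Stack<Integer> path = new Stack<>();
        for (int x = v; x != s; x = edgeTo[x]) path.push(x);
        path.push(s);
        return path;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = new java.util.ArrayList<>();
        for (int v = 0; v < 4; v++) adj.add(new java.util.ArrayList<>());
        int[][] edges = {{0, 1}, {1, 2}};   // vertex 3 is isolated
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        DfsPaths p = new DfsPaths(adj, 0);
        assert p.hasPathTo(2) && !p.hasPathTo(3);
        assert p.pathTo(2).toString().equals("[2, 1, 0]");
        System.out.println("ok");
    }
}
```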




Directed Graphs – Introduction & API


Digraph Examples:

  • Create a graph that shows the spread of an infectious disease.
  • Create a graph that shows all placed phone calls within a company.
  • Create a graph that models the street connections of lower Manhattan.
  • Create a graph that shows how to navigate through a maze IS NOT an example of a digraph.

Digraph: a set of vertices connected pairwise by directed edges.
Indegree: the number of arrows (directed edges) coming in to the vertex.
Outdegree: the number of arrows (directed edges) leaving the vertex.

Both the Graph API and the Digraph API include the following method:

public Iterable<Integer> adj (int v)
  • In class Graph it returns all the vertices adjacent to a given vertex.
  • In class Digraph it returns all vertices that can be reached directly from v.

Many real-world graphs are large and sparse. Because of that, we use an adjacency list to implement the Digraph. The Digraph has a method called reverse. It reverses the whole graph.





Directed Graphs – Digraph Search

Breadth first search (BFS) computes the shortest path from s to all other vertices in linear time.

  • What is the mark-sweep algorithm used for? Garbage collection.
  • Every undirected graph is a digraph with edges in both directions.
  • Sometimes we need the shortest path from any vertex in a given set of sources (vertices) to any of the other vertices.
  • For this, we can use BFS but when initializing the queue we must enqueue all source vertices.
  • Sometimes we need to use breadth first search (BFS) but we don't know all the vertices because there are too many.
  • What should be done? Use BFS with an implicit digraph (enqueue vertices as they are discovered).

Depth First Search allows us to write simple solutions to a number of digraph problems.

The following can use DFS as a basis for a simple solution:

  • Reachability.
  • Finding a path from v1 to v2.
  • Detecting directed cycles.

NOTE: a problem like finding the shortest path from v1 to v2 cannot be solved with depth first search.




Directed Graphs – Topological Sort

Topological order:

  • A digraph has a topological order, if and only if it has no directed cycle.
  • Cycle detection is an important application of depth first search in digraphs.
  • The topological sorting problem works on a DAG – it redraws the DAG so that all edges point upwards.

About DAGs:

  • No cycles.
  • Directed edges.
  • Used in topological sorting.

In order to implement the topological sort we do the following:

  • Run DFS.
  • Return the vertices in reverse post-order.
  • While running DFS, vertices are pushed on the stack as soon as they are done.

Assume you use a digraph to schedule a set of tasks that have precedence constraints; in such a graph the tasks should be the vertices and the constraints should be the edges.
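The reverse post-order recipe above can be sketched as follows (illustrative, not the book's Topological class); each vertex is pushed on the stack as soon as its recursive call finishes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

// Topological sort of a DAG: DFS, then reverse post-order.
class TopoSort {
    static List<Integer> order(List<List<Integer>> adj) {
        boolean[] marked = new boolean[adj.size()];
        Stack<Integer> postorder = new Stack<>();
        for (int v = 0; v < adj.size(); v++)
            if (!marked[v]) dfs(adj, v, marked, postorder);
        List<Integer> topo = new ArrayList<>();
        while (!postorder.isEmpty()) topo.add(postorder.pop());
        return topo;
    }

    private static void dfs(List<List<Integer>> adj, int v,
                            boolean[] marked, Stack<Integer> postorder) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w, marked, postorder);
        postorder.push(v);   // vertex is done: push it now
    }

    public static void main(String[] args) {
        // DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < 4; v++) adj.add(new ArrayList<>());
        adj.get(0).add(1); adj.get(0).add(2);
        adj.get(1).add(3); adj.get(2).add(3);
        List<Integer> topo = order(adj);
        // every edge must point forward in the order
        assert topo.indexOf(0) < topo.indexOf(1);
        assert topo.indexOf(1) < topo.indexOf(3);
        assert topo.indexOf(2) < topo.indexOf(3);
        System.out.println("ok");
    }
}
```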





Minimum Spanning Trees (MST)

A minimum spanning tree is the minimum-weight set of edges that connects all the vertices in a weighted graph. An edge-weighted graph is a graph where we associate a weight or cost with each edge. A spanning tree of such a graph is a connected, acyclic subgraph that contains all the vertices. A minimum spanning tree (MST) of an edge-weighted graph is a spanning tree whose weight (the sum of the weights of its edges) is no larger than the weight of any other spanning tree.



MST – Assumptions

  • The graph is connected: the spanning-tree condition in our definition implies that the graph must be connected for an MST to exist. If a graph is not connected, we can adapt our algorithms to compute the MSTs of each of its connected components, collectively known as a minimum spanning forest.
  • The edge weights are not necessarily distances: geometric intuition is sometimes beneficial, but the edge weights can be arbitrary.
  • The edge weights may be zero or negative: if the edge weights are all positive, it suffices to define the MST as the subgraph with minimal total weight that connects all the vertices.
  • The edge weights are all different: if edges can have equal weights, the minimum spanning tree may not be unique. Making this assumption simplifies some of our proofs, but all of our algorithms work properly even in the presence of equal weights.




MST – Underlying Principles

Adding an edge that connects two vertices in a tree creates a unique cycle. Removing an edge from a tree breaks it into two separate subtrees.

Visit the documentation to learn more.





MST – Edge Weighted Graph

An edge-weighted graph is a vertex-indexed array of bags of edges. Each edge is stored exactly twice.

  • Class Edge implements the interface Comparable<Edge>.
  • Class Edge has a method either() that returns either one of the endpoints.
  • Class Edge has a method other(int v) that returns the other endpoint that is not v.
  • Class Edge does not have a getter for each of the endpoints.
// How to print all the edges of a minimum spanning tree?

EdgeWeightedGraph G = new EdgeWeightedGraph(in);
MST mst = new MST(G);

// Use the iterator mst.edges() to print all edges.




MST – Greedy Algorithms

The cut property states that, given any cut, the crossing edge of minimum weight is in the MST. In the context of minimum spanning trees, a cut in a graph is a partition of its vertices into 2 nonempty sets; a crossing edge connects a vertex in one set with a vertex in the other.


The following method colors black all edges in the MST of any connected edge-weighted graph with V vertices: starting with all edges colored gray, find a cut with no black crossing edges, color its minimum-weight edge black, and continue until V-1 edges have been colored black.

2 simplifying assumptions for minimum spanning trees (MST):
// a) edge weights are distinct
// b) graph is connected

One of the statements below is NOT true.
Which one is the incorrect statement?

    If b) is not true a minimum spanning forest is computed 
    If a) is not true there can be multiple MSTs 
    If b) is not true an MST does not exist 
    If a) and b) are true an MST exists and it is unique

// The incorrect statement is: 
// If b) is not true an MST does not exist.
The greedy algorithm does compute the MST.
However, choosing the cut and finding the min-weight edge can be 
difficult and expensive - especially in very large graphs.

The video lists 3 efficient implementations that differ in 
the way they choose the cut and find the min-weight edge. 
Two of them are listed below. Select BOTH of those 
implementations among the options below.

// Kruskal's algorithm & Prim's algorithm
In the greedy MST algorithm we start with all edges colored gray.
Then we do the following:

    find a cut with no black crossing edges;
    color its min-weight edge black.




MST – Euclidean MST

  • There are N points in the plane.
  • The graph is an implicit dense graph.
  • We want to find the subset of edges that connects all the points and has minimal total length.




MST – Kruskal's Algorithm

The time to compute a minimum spanning tree using Kruskal's algorithm is proportional to E log E. Such an algorithm can be used to group V objects into k clusters. It is also true that a randomized linear-time minimum spanning tree algorithm exists.





Implementation of Kruskal's Algorithm

  • Consider the edges in ascending order of weight.
  • Add the next edge to the MST unless doing so would create a cycle.
  • Possible data structures for the implementation: Queue<Edge>, MinPQ<Edge>, and UF.
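Those steps can be sketched as follows, with sorting standing in for MinPQ<Edge> and a tiny union-find standing in for UF (names are illustrative, not the book's KruskalMST):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Kruskal's algorithm: take edges in ascending weight order, rejecting
// any edge whose endpoints are already connected (it would form a cycle).
class Kruskal {
    static class Edge {
        int v, w; double weight;
        Edge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; }
    }

    static double mstWeight(int vertices, List<Edge> edges) {
        edges.sort(Comparator.comparingDouble(e -> e.weight));
        int[] parent = new int[vertices];            // union-find forest
        for (int i = 0; i < vertices; i++) parent[i] = i;
        double total = 0;
        int taken = 0;
        for (Edge e : edges) {
            int rv = find(parent, e.v), rw = find(parent, e.w);
            if (rv == rw) continue;                  // would create a cycle
            parent[rv] = rw;                         // union the components
            total += e.weight;
            if (++taken == vertices - 1) break;      // MST is complete
        }
        return total;
    }

    // find with path halving
    private static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(0, 1, 1.0));
        edges.add(new Edge(1, 2, 2.0));
        edges.add(new Edge(0, 2, 3.0));  // skipped: 0 and 2 already connected
        assert mstWeight(3, edges) == 3.0;
        System.out.println("ok");
    }
}
```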




MST – Prim's Algorithm

Prim's algorithm always connects to the tree the edge that is closest to the tree.





Eager Implementation of Prim's Algorithm

  • Start with vertex 0 and greedily grow tree T.
  • Add the minimum-weight edge that has exactly one endpoint in T.
  • Repeat until T includes V-1 edges.

In such an implementation of Prim's algorithm, the priority queue used has at most one entry per vertex.




Lazy Implementation of Prim's Algorithm

The implementation of Lazy Prim's algorithm computes the minimum spanning tree in time proportional to E log E. However, it requires extra space proportional to E. It is called lazy implementation because it allows edges on the priority queue that are obsolete.

  • Start with vertex 0 and greedily grow tree T.
  • Add the minimum-weight edge that has exactly one endpoint in T.
  • Repeat until T includes V-1 edges.
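A sketch of the lazy scheme, using java.util.PriorityQueue in place of the book's MinPQ (names are illustrative). Obsolete edges are simply discarded when they surface, which is exactly why the implementation is called lazy:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Lazy Prim: whenever a vertex joins the tree, throw all of its
// incident edges on the priority queue; skip edges whose endpoints
// are both already in the tree when they are polled.
class LazyPrim {
    static class Edge {
        int v, w; double weight;
        Edge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; }
    }

    static double mstWeight(int vertices, List<Edge>[] adj) {
        boolean[] marked = new boolean[vertices];
        PriorityQueue<Edge> pq =
            new PriorityQueue<>(Comparator.comparingDouble(e -> e.weight));
        double total = 0;
        visit(adj, 0, marked, pq);
        while (!pq.isEmpty()) {
            Edge e = pq.poll();
            int w = marked[e.v] ? e.w : e.v;   // endpoint not yet in the tree
            if (marked[w]) continue;           // obsolete edge: discard it
            total += e.weight;
            visit(adj, w, marked, pq);
        }
        return total;
    }

    private static void visit(List<Edge>[] adj, int v,
                              boolean[] marked, PriorityQueue<Edge> pq) {
        marked[v] = true;
        for (Edge e : adj[v]) {
            int other = (e.v == v) ? e.w : e.v;
            if (!marked[other]) pq.add(e);
        }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        List<Edge>[] adj = new List[3];
        for (int v = 0; v < 3; v++) adj[v] = new ArrayList<>();
        Edge[] edges = { new Edge(0, 1, 1.0), new Edge(1, 2, 2.0), new Edge(0, 2, 3.0) };
        for (Edge e : edges) { adj[e.v].add(e); adj[e.w].add(e); }
        assert mstWeight(3, adj) == 3.0;   // the 3.0 edge becomes obsolete
        System.out.println("ok");
    }
}
```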

In order to be able to efficiently update elements on the priority queue, we use an indexed priority queue called IndexMinPQ.

  • IndexMinPQ allows the client to update the key by specifying the index and the new value.
  • IndexMinPQ has a method contains that checks whether a given index is on the priority queue.
  • IndexMinPQ has only one constructor, which specifies the maximum number of keys that will ever be in the priority queue; it does not also have a default constructor.
  • IndexMinPQ uses parallel arrays to access quickly the information that is needed

Prim's algorithm requires V insert, V delete-min, and E decrease-key operations.

  • The array implementation is optimal for dense graphs.
  • The binary heap implementation is much faster for sparse graphs.
  • In performance-critical situations consider a 4-way heap. Its running time depends on the implementation of the priority queue.




Shortest Paths – Dijkstra's Algorithm

Dijkstra's algorithm computes a shortest paths tree (SPT) in any edge-weighted digraph that has no negative weights. Dijkstra's algorithm and Prim's algorithm are essentially the same. The main difference is the rule that is used to select the next vertex for the tree.

Dijkstra's algorithm chooses the closest vertex to the source via a directed path. Prim's algorithm chooses the closest vertex to the tree via an undirected edge. Dijkstra's algorithm, Prim's algorithm, Kruskal's algorithm, DFS, and BFS all compute a graph's spanning tree.

  • In Dijkstra's algorithm each edge e = v -> w is relaxed exactly once (when v is relaxed).
  • Dijkstra's algorithm does not compute the shortest paths from all vertices to a given destination vertex.
  • Dijkstra's algorithm computes a spanning tree.
  • Dijkstra's algorithm computes a shortest paths tree in any edge-weighted digraph with non-negative weights.




Implementation of Dijkstra's Algorithm

To implement Dijkstra's algorithm, we use an IndexMinPQ as our data structure. The performance of Dijkstra's algorithm depends on the implementation of the priority queue. In practice, for large sparse graphs, most people use a binary heap or a 4-way heap.

  • Consider vertices in increasing order of distance from s.
  • Add vertex to tree and relax all edges pointing from that vertex.
  • DirectedEdge is an immutable class.
  • Whenever we add a new edge to an EdgeWeightedDigraph, we add the new edge to the adjacency list of the from-vertex (v).
  • The length of a path is proportional to the sum of the weights.
  • In order to compute the minimum spanning tree, you must use an edge-weighted graph.
  • In order to find the shortest path from one vertex to another, you must use an edge-weighted digraph.

To relax an edge v –> w means:

  • Test whether the best known way from s to w is to go from s to v, then take the edge from v to w, and, if so, update our data structures.
// Assume you have an object of type SP. It has been instantiated like this:

SP sp = new SP (G, s);
// Assume that v is a vertex of graph G.

// What does the following method call return?
sp.pathTo(v)

// All edges along the shortest path from s to v.
// Assume G is an edge-weighted digraph.

// The array distTo[] includes the shortest path distances from s if and only if:
distTo[s] = 0
// for each vertex v, distTo[v] is the length of some path from s to v.

// Which of the following options below is the missing condition of the proposition above?
for each vertex w, distTo[w] <= distTo[v] + e.weight()
for each edge e = v -> w, distTo[w] <= distTo[v] + e.weight() 
for each vertex v, distTo[w] <= distTo[v] + e.weight() 
for each edge e = v -> w, distTo[v] <= distTo[w] + e.weight()

// Answer: 
for each edge e = v -> w, distTo[w] <= distTo[v] + e.weight()
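Dijkstra's algorithm is repeated relaxation in exactly this sense. A sketch using a plain java.util.PriorityQueue instead of the book's IndexMinPQ (stale queue entries are skipped rather than decreased in place; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Dijkstra's algorithm: repeatedly take the closest unsettled vertex
// and relax every edge pointing out of it.
class Dijkstra {
    static class DirectedEdge {
        int from, to; double weight;
        DirectedEdge(int from, int to, double weight) {
            this.from = from; this.to = to; this.weight = weight;
        }
    }

    static double[] shortestDistances(List<DirectedEdge>[] adj, int s) {
        double[] distTo = new double[adj.length];
        java.util.Arrays.fill(distTo, Double.POSITIVE_INFINITY);
        distTo[s] = 0.0;
        PriorityQueue<double[]> pq =          // entries: {vertex, distance}
            new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        pq.add(new double[]{s, 0.0});
        while (!pq.isEmpty()) {
            double[] entry = pq.poll();
            int v = (int) entry[0];
            if (entry[1] > distTo[v]) continue;        // stale entry: skip
            for (DirectedEdge e : adj[v]) {
                // relax e = v -> w: is going via v better than best known?
                if (distTo[e.to] > distTo[v] + e.weight) {
                    distTo[e.to] = distTo[v] + e.weight;
                    pq.add(new double[]{e.to, distTo[e.to]});
                }
            }
        }
        return distTo;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        List<DirectedEdge>[] adj = new List[3];
        for (int v = 0; v < 3; v++) adj[v] = new ArrayList<>();
        adj[0].add(new DirectedEdge(0, 1, 1.0));
        adj[1].add(new DirectedEdge(1, 2, 1.0));
        adj[0].add(new DirectedEdge(0, 2, 5.0));
        double[] d = shortestDistances(adj, 0);
        assert d[2] == 2.0;   // 0 -> 1 -> 2 beats the direct 0 -> 2 edge
        System.out.println("ok");
    }
}
```

On termination distTo[] satisfies exactly the optimality conditions quoted above: distTo[s] = 0 and, for each edge e = v -> w, distTo[w] <= distTo[v] + e.weight().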




Generic Algorithm to Compute the Shortest Paths Tree from S

  • Initialize distTo[s] = 0 and distTo[v] = infinity for all other vertices v;
  • Repeat until optimality conditions are satisfied;
  • Relax any edge.

Let G be an edge-weighted digraph and s a designated vertex of G. A shortest-paths tree (SPT) is a subgraph of G that contains all the vertices reachable from s and that forms a directed tree rooted at s such that every tree path is a shortest path in the digraph. For each G and s there exists a corresponding shortest-paths tree.





Jargons & Resources

  • Real-time computing describes problems with hard time constraints, where a computation is considered to have failed not just if it gives the wrong results, but also if it obtains those results outside of specified time bounds. A real-time system as a whole often is required to have fast operation as well as predictable behavior.
  • Connected components: maximal sets of objects that are mutually connected.
  • Concurrent computing: a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts. This is a property of a system—whether a program, computer, or a network—where there is a separate execution point or "thread of control" for each process. A concurrent system is one where a computation can advance without waiting for all other computations to complete.
  • Parallel computing: type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism.

Lectures from the book "Algorithms, 4th Edition" and API's documentation.