Notes on Algorithms and Data Structures
1 Mathematical Background
We need to establish a relative ranking among functions, measuring the relative rate of growth of two functions.
1. $T(N) = O(f(N))$ if there exist positive constants $c$ and $n_0$ such that $T(N) \le c\,f(N)$ when $N \ge n_0$.
2. $T(N) = \Omega(g(N))$ if there exist positive constants $c$ and $n_0$ such that $T(N) \ge c\,g(N)$ when $N \ge n_0$.
3. $T(N) = \Theta(h(N))$ if and only if $T(N) = O(h(N))$ and $T(N) = \Omega(h(N))$.
In other words, when $T(N) = O(f(N))$, $f(N)$ is an upper bound on $T(N)$; conversely, when $T(N) = \Omega(g(N))$, $g(N)$ is a lower bound on $T(N)$.
Tips: do not include constants or low-order terms inside a Big-O; we can take the limit $\lim_{N \to \infty} f(N)/g(N)$ to determine the relative growth rate of two functions (L'Hôpital's rule may be needed).
When measuring algorithms we care about time, but we cannot compare raw running times directly, because computers differ in speed; we usually count the algorithm's steps instead.
2 List, Stack and Queue
One of the basic rules concerning programming is that no routine should ever exceed a page. This is accomplished by breaking the program down into many modules.
An abstract data type (ADT) is a set of operations. Abstract data types are mathematical abstractions. Nowhere in an ADT's definition is there any mention of HOW the set of operations is implemented.
2.1 List ADT
The list ADT (Abstract Data Type) supports operations such as MakeEmpty, Insert, Delete, and Find.
`Node* MakeEmpty()` — pay attention to the return type `Node*`, which means this function returns a pointer to a `Node`. It is usually used as:
```c
Node *head = MakeEmpty();
```
If you want a function to modify a pointer, you need to pass `Node **PtrToHead`, because if you use `Node *head`, the function will only copy the value of the pointer, and whatever you do to this pointer inside the function only takes effect there. That means your pointer `head` will not change when the function returns. So you need to pass the address of the pointer to change `head`, as in:
```c
void MakeEmpty(Node **PtrToHead); // YES
```
`void Insert(Node *head, ElementType X)` — we can summarize the `Insert()` function as three steps:
- Find position to insert
- Malloc a new node and assign value to it ( include value and pointer )
- Reconnect the list ( when you are not sure about it, drawing a picture will make you understand what you need to do )
Deletion (`Delete()`) follows almost the same steps.
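A minimal sketch of the three steps above, assuming an `int` payload and that the caller has already found the node `P` to insert after:

```c
#include <stdlib.h>

typedef int ElementType;
typedef struct Node { ElementType val; struct Node *next; } Node;

/* Insert X after node P (the position found in step 1). */
void Insert(Node *P, ElementType X) {
    /* step 2: malloc a new node and assign value to it */
    Node *tmp = malloc(sizeof(Node));
    tmp->val = X;
    /* step 3: reconnect the list */
    tmp->next = P->next;
    P->next = tmp;
}
```

Note that inserting after a header node keeps the code uniform: there is no special case for the front of the list.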
Exchange of two cells in a list
To exchange two cells in a list, we need four pointers to track the two cells and reconnect them in the list. The operations can be performed in two steps (exchange `pos1` and `pos2`):
- Find the nodes at positions `pos1 - 1` and `pos2 - 1`, and assign them to `p1` and `p2`
- Exchange the `pos1` and `pos2` links (both the link pointing to each cell and the link pointing to its next):
  - Keep a backup of `p1->next` and `p1->next->next`: assign `p1->next` to `temp`, and assign `temp->next` to `temp_next`
  - Assign `p2->next` to `p1->next`, and assign `temp` to `p2->next`
  - Assign `p1->next->next` to `temp->next`, and assign `temp_next` to `p1->next->next`
The code is shown below:
```c
void Exchange(Node *head, int pos1, int pos2){ /* ... */ }
```
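Following the steps above, a possible sketch (assuming `int` values, a list with a header node, and non-adjacent 1-based positions `pos1 < pos2`):

```c
typedef struct Node { int val; struct Node *next; } Node;

/* Exchange the cells at positions pos1 and pos2 (1-based, non-adjacent,
   pos1 < pos2), following the two steps described above. */
void Exchange(Node *head, int pos1, int pos2) {
    Node *p1 = head, *p2 = head;
    for (int i = 1; i < pos1; i++) p1 = p1->next;   /* node at pos1 - 1 */
    for (int i = 1; i < pos2; i++) p2 = p2->next;   /* node at pos2 - 1 */
    Node *temp = p1->next;          /* backup of p1->next */
    Node *temp_next = temp->next;   /* backup of p1->next->next */
    p1->next = p2->next;            /* pos1's predecessor now links to pos2's cell */
    p2->next = temp;                /* pos2's predecessor now links to pos1's cell */
    temp->next = p1->next->next;    /* old pos1 cell inherits pos2's successor */
    p1->next->next = temp_next;     /* old pos2 cell inherits pos1's successor */
}
```

Drawing the four pointers before coding, as the notes suggest, makes the reconnection order obvious.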
Reverse - Nonrecursive
To reverse a singly linked list in O(N), three pointers are necessary: `p1` and `p2` point to the two ends of the already-reversed part, and `p3` points to the next cell.
```c
void Reverse_Nonrecursive(){ /* ... */ }
```
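A sketch of the three-pointer reversal, assuming `int` values and a list without a header node:

```c
#include <stddef.h>

typedef struct Node { int val; struct Node *next; } Node;

/* Reverse a singly linked list in O(N).
   p1 is the head of the reversed part, p2 the current cell, p3 the next cell. */
Node *Reverse_Nonrecursive(Node *head) {
    Node *p1 = NULL, *p2 = head, *p3;
    while (p2 != NULL) {
        p3 = p2->next;   /* remember the rest of the list */
        p2->next = p1;   /* reverse the current link */
        p1 = p2;         /* advance the reversed part */
        p2 = p3;         /* advance to the next cell */
    }
    return p1;           /* p1 is the new head */
}
```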
Reverse - recursive
We have talked about reversing a singly linked list nonrecursively. Now we will discuss the recursive version.
In the recursive version, it is important to be clear about how we advance along the list and how the recursion ends.
It is easy to see that if we want to reverse a list of N cells, we can first reverse the last N-1 cells. When the list has only one or two cells, we can easily reverse it.
```c
Node *Reverse(Node *head){ /* ... */ }
```
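A sketch of the recursive version under the same assumptions (no header node):

```c
#include <stddef.h>

typedef struct Node { int val; struct Node *next; } Node;

/* Recursively reverse a list: first reverse the last N - 1 cells,
   then hook the old head onto the end. */
Node *Reverse(Node *head) {
    if (head == NULL || head->next == NULL)
        return head;                     /* 0 or 1 cell: already reversed */
    Node *newHead = Reverse(head->next); /* reverse the tail */
    head->next->next = head;             /* old successor now points back */
    head->next = NULL;                   /* old head becomes the new tail */
    return newHead;
}
```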
Example 1: The Polynomial ADT
We can use a list to represent polynomials. Each term of the polynomial is contained in one cell, and the cells are sorted in decreasing order of exponents.
```c
struct Polynomial{ /* ... */ };
```
Once we have the polynomial ADT, we can represent a single-variable polynomial and do some computation.
Example 2: Radix Sort
Radix sort is sometimes called card sort.
2.2 Stack ADT
A stack is a list with the restriction that inserts and deletes can be performed in only one position, namely the end of the list, called the top.
The fundamental operations: push (insert) and pop (delete).
A stack is also called a LIFO (last in, first out) list.
Linked List Implementation of Stacks
- MakeEmpty
- Push is implemented as an insertion into the front of a linked list, where the front of the list serves as the top of the stack.
- Top is performed by examining the element in the front of the linked list.
- Pop is implemented as a deletion from the front of the list.
The drawback of this implementation is that the calls to malloc and free are expensive, especially in comparison with the pointer manipulation routines.
Array Implement of Stacks
This is a popular solution. The only potential hazard with this strategy is that we need to declare an array size ahead of time.
- Initialization: `tos` (top of stack) equals -1
- Push: increment `tos`, then `Stack[tos] = x`
- Pop: return `Stack[tos]`, then decrement `tos`
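These operations can be sketched as follows; the fixed capacity `MAX_STACK` and the assert-based error testing are assumptions:

```c
#include <assert.h>

#define EMPTY_TOS (-1)
#define MAX_STACK 100

typedef int ElementType;

typedef struct {
    int tos;                         /* top of stack, -1 when empty */
    ElementType data[MAX_STACK];
} Stack;

void StackInit(Stack *s) { s->tos = EMPTY_TOS; }
int  StackIsEmpty(const Stack *s) { return s->tos == EMPTY_TOS; }

void Push(Stack *s, ElementType x) {
    assert(s->tos + 1 < MAX_STACK);  /* error testing: overflow */
    s->data[++s->tos] = x;           /* increment tos, then Stack[tos] = x */
}

ElementType Pop(Stack *s) {
    assert(!StackIsEmpty(s));        /* error testing: underflow */
    return s->data[s->tos--];        /* return Stack[tos], then decrement tos */
}
```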
One problem that affects the efficiency of implementing stacks is error testing. Our linked list implementation carefully checked for errors.
2.3 Queue
In a queue, insertion is done at one end, whereas deletion is performed at the other end.
Like stack, queue can be implemented by list or array.
Linked List Implementation
Array Implementation
In the array implementation, the queue data structure has an array `queue[]` and positions `front` and `rear`, which represent the two ends of the queue. We also keep track of `q_size`, the number of elements actually in the queue. Each operation changes `front`, `rear`, `q_size`, and `queue[]`.
- enqueue: to enqueue an element x, we increment `q_size` and `rear`, then set `queue[rear] = x`
- dequeue: to dequeue an element, we return `queue[front]`, then increment `front` and decrement `q_size`
Remember to use circular array. And there are two warnings about the circular array implementation of queues.
- First, it is important to check the queue for emptiness, because a dequeue when the queue is empty will return an undefined value.
- Secondly, some programmers use different ways of representing the front and rear of a queue.
In applications where you are sure that the number of enqueues is never larger than the size of the queue, the wraparound is obviously not necessary.
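A sketch of the circular-array queue; the capacity `QUEUE_CAP` and the 0/1 return codes for error reporting are assumptions:

```c
#define QUEUE_CAP 4

typedef struct {
    int front, rear, q_size;
    int data[QUEUE_CAP];
} Queue;

void QueueInit(Queue *q) { q->front = 0; q->rear = QUEUE_CAP - 1; q->q_size = 0; }
int  QueueIsEmpty(const Queue *q) { return q->q_size == 0; }

/* increment with wraparound: this is what makes the array circular */
static int Succ(int i) { return (i + 1) % QUEUE_CAP; }

int Enqueue(Queue *q, int x) {
    if (q->q_size == QUEUE_CAP) return 0;   /* full */
    q->q_size++;
    q->rear = Succ(q->rear);
    q->data[q->rear] = x;
    return 1;
}

int Dequeue(Queue *q, int *x) {
    if (QueueIsEmpty(q)) return 0;          /* check for emptiness first */
    *x = q->data[q->front];
    q->front = Succ(q->front);
    q->q_size--;
    return 1;
}
```

Keeping `q_size` explicitly sidesteps the ambiguity between a full and an empty queue when `front` and `rear` coincide.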
3 Trees
A tree is a data structure for which the running time of most operations is O(log N) on average. Trees in general are very useful abstractions in computer science.
3.1 Implement of Trees
One way to implement a tree would be to have in each node, besides its data, a pointer to each child of the node. The ADT is shown below:
```c
struct tree_node{ /* ... */ };
```
Application of the Trees
One of the popular uses is the directory structure in many common OS, including UNIX, Linux…
The Traversal of the Trees
- Preorder traversal: work at a node is performed before (pre) its children are processed.
- Postorder traversal: work at a node is performed after (post) its children are processed.
3.2 Binary Tree
A binary tree is a tree in which no node can have more than two children. A property of a binary tree that is sometimes important is that the depth of an average binary tree is considerably smaller than N. The average depth is $O(\sqrt{N})$, and for a special type of binary tree, namely the binary search tree, the average depth is $O(\log N)$.
Implementation of Binary Tree
```c
struct TreeNode{ /* ... */ };
```
The Expression Tree
We can use a tree to represent an expression. The leaves of an expression tree are operands, and the other nodes contain operators.
Implementation of Expression Trees
We have seen how to use a stack to convert an infix expression to postfix. Based on this algorithm, we can construct an expression tree:
- Convert the infix expression to postfix using a stack
- Read one symbol at a time, and create a node for it
- If the symbol is an operand, push its node onto a stack and continue
- If the symbol is an operator, pop two trees from the stack, make them the operator's children, and push the resulting tree back
3.3 The Search Tree ADT - Binary Search Trees
Let us assume that :
- Each node in the tree is assigned a key value
- All the keys are distinct; we deal with duplicates later.
The property that makes a binary tree into a binary search tree is that for every node X in the tree, the values of all the keys in its left subtree are smaller than the key value in X, and the values of all the keys in its right subtree are larger than the key value in X.
The search-tree ADT supports MakeEmpty, Find, FindMin, FindMax, Insert, and Delete.
Find
Find returns a pointer to the node in T that has key X, or NULL if there is no such node. The structure of the tree makes this simple: if T is NULL, just return NULL; otherwise, check the key value of T, and if it is X, return T; otherwise, recurse into the appropriate child of T.
What's more, we can easily find the min or max by just searching the left or right child repeatedly.
```c
TreeNode *Find(TreeNode *T, KeyType X){ /* ... */ }
```
Insert
To insert X into tree T, we proceed down the tree as you would with a Find.
- If x is found, do nothing.
- Otherwise, insert X at the last spot on the path traversed.
Duplicates can be handled by keeping an extra field in the node to record the frequency of occurrence. Although this adds space to the tree, it is worth it (better than adding duplicates to the tree, which may make the tree very deep).
```c
TreeNode *Insert(TreeNode *T, KeyType X){ /* ... */ }
```
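A sketch of the insertion just described, proceeding down the tree exactly as Find would:

```c
#include <stdlib.h>

typedef int KeyType;
typedef struct TreeNode { KeyType key; struct TreeNode *left, *right; } TreeNode;

/* Insert X: if X is found, do nothing; otherwise attach it
   at the last spot on the path traversed. */
TreeNode *Insert(TreeNode *T, KeyType X) {
    if (T == NULL) {                  /* last spot on the path */
        T = malloc(sizeof(TreeNode));
        T->key = X;
        T->left = T->right = NULL;
    } else if (X < T->key) {
        T->left = Insert(T->left, X);
    } else if (X > T->key) {
        T->right = Insert(T->right, X);
    }                                 /* X == T->key: do nothing */
    return T;
}
```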
Delete
When we try to delete a node in tree T, we will meet four situations:
- if we can not find X in tree T, just do nothing
- if the node is a leaf, just free it.
- if the node has one child, we need to replace this node with its child and free this node;
- if the node has two children, we need to find a proper node to replace this node and free this node.
- The general strategy is to replace the key value of this node with the smallest key of the right subtree, and recursively delete that node.
The deletion can be separated into two steps: first search for the node with key value X, then delete it.
```c
TreeNode *Delete(TreeNode *T, KeyType X){ /* ... */ }
```
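A sketch covering the four situations; the small `Insert` helper is included only so the tree can be built with heap-allocated nodes (Delete frees them):

```c
#include <stdlib.h>

typedef int KeyType;
typedef struct TreeNode { KeyType key; struct TreeNode *left, *right; } TreeNode;

TreeNode *Insert(TreeNode *T, KeyType X) {   /* helper to build a test tree */
    if (T == NULL) {
        T = malloc(sizeof(TreeNode));
        T->key = X; T->left = T->right = NULL;
    } else if (X < T->key) T->left = Insert(T->left, X);
    else if (X > T->key)   T->right = Insert(T->right, X);
    return T;
}

static TreeNode *FindMin(TreeNode *T) {
    while (T->left != NULL) T = T->left;
    return T;
}

TreeNode *Delete(TreeNode *T, KeyType X) {
    if (T == NULL) return NULL;              /* not found: do nothing */
    if (X < T->key) {
        T->left = Delete(T->left, X);
    } else if (X > T->key) {
        T->right = Delete(T->right, X);
    } else if (T->left != NULL && T->right != NULL) {
        /* two children: take the smallest key of the right subtree,
           then recursively delete that node */
        TreeNode *min = FindMin(T->right);
        T->key = min->key;
        T->right = Delete(T->right, min->key);
    } else {
        /* leaf or one child: splice the node out and free it */
        TreeNode *old = T;
        T = (T->left != NULL) ? T->left : T->right;
        free(old);
    }
    return T;
}
```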
3.4 AVL-Tree
As we calculated above, the expected depth of a binary search tree is $O(\log N)$. However, it is hard to build a tree and keep its depth close to $\log N$ without supervision. A good idea is to add a field recording the height of each node. Given this new variable, we can define a good binary tree as one in which, for every node, the heights of the left and right subtrees differ by at most 1.
When an insertion or deletion makes the tree unbalanced, the way to correct it is to rotate the tree. A single rotation suffices in the simple situations, whereas we need a double rotation in the complicated ones.
Nodes of AVL Tree
If an AVL tree has height $h$, the minimum number of nodes $S(h)$ is: $S(h) = S(h-1) + S(h-2) + 1$, with $S(0) = 1$ and $S(1) = 2$.
Situations
There are four situations in which an insertion makes the tree unbalanced. If node A is unbalanced, the insertion was one of:
- insert x into the left child of the left subtree of A
- insert x into the right child of the left subtree of A
- insert x into the right child of the right subtree of A
- insert x into the left child of the right subtree of A
Cases 1 and 3 are considered outside insertions, and cases 2 and 4 are considered inside insertions.
When an outside insertion makes the tree unbalanced, we need a single rotation, whereas an inside insertion needs a double rotation.
```c
// static: a static function is only visible in the file that declares it.
```
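A sketch of the single rotation for case 1 (the outside left-left case); the `height` field follows the convention that an empty tree has height -1, and a double rotation can be built from two single rotations:

```c
#include <stddef.h>

typedef struct AvlNode { int key, height; struct AvlNode *left, *right; } AvlNode;

static int Height(AvlNode *n) { return n == NULL ? -1 : n->height; }
static int Max(int a, int b)  { return a > b ? a : b; }

/* Rotate k2's left child k1 up; returns the new root of this subtree. */
static AvlNode *SingleRotateWithLeft(AvlNode *k2) {
    AvlNode *k1 = k2->left;
    k2->left = k1->right;      /* k1's right subtree moves under k2 */
    k1->right = k2;            /* k2 becomes k1's right child */
    k2->height = Max(Height(k2->left), Height(k2->right)) + 1;
    k1->height = Max(Height(k1->left), k2->height) + 1;
    return k1;
}
```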
3.5 Tree Traversals
- Inorder traversal: process the left subtree first, then perform processing at the current node, finally process the right subtree.
- Preorder traversal: the node is processed before its children.
- Postorder traversal: process both subtrees first, before processing the node itself.
3.6 B-Tree
A B-Tree of order m is a tree with the following structural properties:
- The root is either a leaf or has between 2 and m children.
- All non-leaf nodes (except the root) have between ⌈m/2⌉ and m children
- All leaves are at the same depth.
4 Hashing
We will discuss the hash table ADT, which supports only a subset of the operations allowed by binary search trees.
The implementation of hash tables is frequently called hashing. Hashing is a technique used for performing insertions, deletions, and finds in constant average time.
We must choose a hash function, decide what to do when two keys hash to the same value (a collision), and choose the size of the table.
```c
// index for table
```
4.1 Hash Function
It is usually a good idea to ensure that the table size is prime. If the input keys are integers, then simply returning `Key % TableSize` is a common strategy. If the input keys are strings, one option is to add up the ASCII values of the characters in the string.
Horner's rule: evaluate $\sum_{i=0}^{N-1} key[i] \cdot 32^{N-1-i}$ incrementally as $h = 32h + key[i]$:
```c
Index Hash(const char *key, int TableSize){ /* ... */ }
```
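A sketch of the string hash computed by Horner's rule, using the common multiplier 32 (a left shift by 5):

```c
typedef unsigned int Index;

/* Treat the characters as coefficients of a polynomial in 32 and
   evaluate it with Horner's rule, then reduce mod TableSize. */
Index Hash(const char *key, int TableSize) {
    unsigned int hashVal = 0;
    while (*key != '\0')
        hashVal = (hashVal << 5) + *key++;   /* hashVal * 32 + character */
    return hashVal % TableSize;
}
```

Unsigned overflow in C wraps around modulo $2^{32}$, so long keys are handled without any special case.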
4.2 Open Hashing (Separate Chaining)
In open hashing, each table element is a linked list. If memory is tight, we can omit the list headers; generally we add them. New elements are sometimes inserted at the front of the list, since this is convenient and also because it frequently happens that recently inserted elements are the most likely to be accessed in the near future.
4.3 Closed Hashing (Open Addressing)
Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists. In closed hashing, if a collision occurs, alternate cells are tried until an empty cell is found.
Formalization: cells $h_0(X), h_1(X), \dots$ are tried in succession, where $h_i(X) = (Hash(X) + f(i)) \bmod TableSize$, with $f(0) = 0$.
The hash function maps the key to a number; the function f is the collision-resolution strategy. Three examples of f:
- Linear probing: $f(i) = i$. If a collision occurs, we choose the next available spot.
- Quadratic probing: this is what we would expect — the collision function is quadratic. The popular choice is $f(i) = i^2$.
- Double hashing: $f(i) = i \cdot hash_2(X)$. The function $hash_2$ must never evaluate to zero, and it is also important to make sure all cells can be probed. A function such as $hash_2(X) = R - (X \bmod R)$, with R a prime smaller than the table size, will work well.
4.4 ReHashing
4.5 Extendible Hashing
5 Heap - Priority Queues
The heap must allow at least two operations: Insert and DeleteMin (return and remove the minimum element in the heap).
Heap ADT
InitHeap: build an empty binary heap. Because of the heap-order property, we store in data[0] a sentinel value smaller than any data. The first element is inserted at position 1.
```c
Heap *InitHeap( int MaxSize ){ /* ... */ }
```
Insert: to keep the smallest value at position 1 (the top), we must insert new data carefully. A way to insert properly is to find the right position first: make a new empty cell at the rear (`H->data[H->Size + 1]`), compare the inserted value X with the empty cell's parent (at `i / 2`), and if X is smaller than the parent, move the parent down into position i. Keep moving up until X is no smaller than the parent; the index i where this stops is where X is inserted.
```c
void Insert( ElementType X, Heap *H ){ /* ... */ }
```
DeleteMin: compared with insertion, deleting an element is a little more complicated. When we delete the minimum, an empty cell is left at the top of the binary heap, and we need to move elements to fill it. One strategy is to move the smaller child up into the empty cell; after the move there is a new empty cell below, so we repeat the move until the last element of the heap can be placed into the hole.
```c
ElementType DeleteMin(Heap *H){ /* ... */ }
```
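The Insert (percolate up) and DeleteMin (percolate down) just described can be sketched together; using `INT_MIN` as the sentinel and assuming a non-empty heap in `DeleteMin` are simplifications:

```c
#include <stdlib.h>
#include <limits.h>

typedef int ElementType;
typedef struct { int Capacity, Size; ElementType *data; } Heap;

Heap *InitHeap(int MaxSize) {
    Heap *H = malloc(sizeof(Heap));
    H->data = malloc((MaxSize + 1) * sizeof(ElementType));
    H->Capacity = MaxSize;
    H->Size = 0;
    H->data[0] = INT_MIN;       /* sentinel smaller than any data */
    return H;
}

/* percolate up: move parents down until X can sit below its parent */
void Insert(ElementType X, Heap *H) {
    int i;
    for (i = ++H->Size; H->data[i / 2] > X; i /= 2)
        H->data[i] = H->data[i / 2];
    H->data[i] = X;
}

/* percolate down: pull the smaller child up into the hole at the top */
ElementType DeleteMin(Heap *H) {
    ElementType min = H->data[1];
    ElementType last = H->data[H->Size--];
    int i, child;
    for (i = 1; i * 2 <= H->Size; i = child) {
        child = i * 2;
        if (child != H->Size && H->data[child + 1] < H->data[child])
            child++;                         /* pick the smaller child */
        if (last > H->data[child])
            H->data[i] = H->data[child];
        else
            break;
    }
    H->data[i] = last;                       /* the last element fills the hole */
    return min;
}
```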
6 Sorting
Sort an array of elements. To simplify matters, we will assume in our examples that the array contains only integers.
6.1 Insertion Sort – O(N^2)
Insertion sort consists of N-1 passes. For pass P = 2 through N, insertion sort ensures that the elements in positions 1 through P are in sorted order. This algorithm is the simplest but not very efficient.
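A sketch of these passes, using 0-based indices:

```c
/* Pass P ensures A[0..P] is sorted: slide larger elements one step
   right, then drop the saved element into the hole. */
void InsertionSort(int A[], int N) {
    for (int P = 1; P < N; P++) {
        int tmp = A[P];
        int j;
        for (j = P; j > 0 && A[j - 1] > tmp; j--)
            A[j] = A[j - 1];
        A[j] = tmp;
    }
}
```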
6.2 ShellSort
6.3 HeapSort
Every time, pop the minimum element and save it at the rear. After N-1 passes, we get the ordered array.
A `DeleteMin` costs $O(\log N)$, and there are N calls to `DeleteMin`, so the total cost is about $O(N \log N)$.
```c
void HeapSort(ElementType A[], int Size){ /* ... */ }
```
6.4 MergeSort
Mergesort runs in $O(N \log N)$ worst-case running time, and the number of comparisons used is nearly optimal. It is a fine example of a recursive algorithm.
The basic idea of this sort is to divide the array into two parts recursively, and merge the two divided parts into a temporary array in order. Obviously, every time we are about to merge two parts, each part is already in order.
The implementation can be divided into three parts:
```c
typedef int ElementType;
```
The details of these functions are shown below:
```c
void MergeSort(ElementType A[], int N){ /* ... */ }
```
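A sketch of the three parts (driver, recursive sort, merge), assuming a single temporary array allocated once by the driver:

```c
#include <stdlib.h>
#include <string.h>

typedef int ElementType;

/* merge two sorted runs A[l..m] and A[m+1..r] through Tmp */
static void Merge(ElementType A[], ElementType Tmp[], int l, int m, int r) {
    int i = l, j = m + 1, k = l;
    while (i <= m && j <= r)
        Tmp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
    while (i <= m) Tmp[k++] = A[i++];
    while (j <= r) Tmp[k++] = A[j++];
    memcpy(A + l, Tmp + l, (size_t)(r - l + 1) * sizeof(ElementType));
}

static void MSort(ElementType A[], ElementType Tmp[], int l, int r) {
    if (l >= r) return;
    int m = l + (r - l) / 2;
    MSort(A, Tmp, l, m);        /* divide: sort the two halves recursively */
    MSort(A, Tmp, m + 1, r);
    Merge(A, Tmp, l, m, r);     /* conquer: merge the two sorted halves */
}

void MergeSort(ElementType A[], int N) {
    ElementType *Tmp = malloc((size_t)N * sizeof(ElementType));
    if (Tmp != NULL) {
        MSort(A, Tmp, 0, N - 1);
        free(Tmp);
    }
}
```

Allocating the temporary array once in the driver, rather than in every `Merge`, avoids repeated malloc/free overhead.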
6.5 Quick Sort
Quicksort is like mergesort: we choose an element as the pivot and partition the array into two parts, then repeat these operations recursively. The key point is how to choose the pivot. One way is to choose the first element; although simple to code, this algorithm gets into trouble if the data is in reversed order, costing $O(N^2)$ in the worst case. Another way is called median-of-three: we compare the elements in the first, middle, and last positions, sort them in place, and keep the median as the pivot. This makes the worst case much less likely.
Recursive Implementation
The classic implementation is recursive, as shown below.
```c
typedef int ElementType;
```
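A sketch of quicksort with median-of-three pivot selection; handling ranges of fewer than three elements directly is an assumption (many texts instead switch to insertion sort below a cutoff):

```c
typedef int ElementType;

static void Swap(ElementType *a, ElementType *b) {
    ElementType t = *a; *a = *b; *b = t;
}

/* Sort A[left], A[center], A[right] in place, then hide the pivot
   (the median) at A[right - 1]. */
static ElementType Median3(ElementType A[], int left, int right) {
    int center = (left + right) / 2;
    if (A[center] < A[left])  Swap(&A[left],   &A[center]);
    if (A[right]  < A[left])  Swap(&A[left],   &A[right]);
    if (A[right]  < A[center]) Swap(&A[center], &A[right]);
    Swap(&A[center], &A[right - 1]);
    return A[right - 1];
}

static void QSort(ElementType A[], int left, int right) {
    if (right - left < 2) {             /* 1 or 2 elements: sort directly */
        if (right - left == 1 && A[right] < A[left]) Swap(&A[left], &A[right]);
        return;
    }
    ElementType pivot = Median3(A, left, right);
    int i = left, j = right - 1;
    for (;;) {                          /* partition around the pivot */
        while (A[++i] < pivot) {}
        while (A[--j] > pivot) {}
        if (i < j) Swap(&A[i], &A[j]);
        else break;
    }
    Swap(&A[i], &A[right - 1]);         /* restore the pivot */
    QSort(A, left, i - 1);
    QSort(A, i + 1, right);
}

void QuickSort(ElementType A[], int N) { QSort(A, 0, N - 1); }
```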
Non-Recursive Implementation
In theory, we can convert every recursive implementation into a non-recursive one by treating the recursive calls as pushes and pops on a stack. So it is with quicksort.
Consider a stack whose elements are the two ends (left, right) of the sub-array to be sorted. In the recursive case, we just call the QuickSort function and pass `left` and `right`; instead, we can push and pop these bounds on an explicit stack. (By the way, the code shown below doesn't include the implementation of the stack.)
```c
typedef struct Cell{ /* ... */ } Cell;
```
7 The Disjoint Set ADT
8 Graph Algorithm
In this chapter, we discuss several common problems in graph theory. Not only are these algorithms useful in practice, they are interesting because in many real-life applications they are too slow unless careful attention is paid to the choice of data structures.
8.1 Definitions
A graph $G = (V, E)$ consists of a set of vertices, V, and a set of edges, E. Each edge is a pair $(v, w)$, where $v, w \in V$. If the pair is ordered, then the graph is directed. Directed graphs are sometimes referred to as digraphs. Sometimes an edge has a third component, known as either a weight or a cost.
A path in a graph is a sequence of vertices $w_1, w_2, \dots, w_n$ such that $(w_i, w_{i+1}) \in E$ for $1 \le i < n$. The length of such a path is the number of edges on the path, which is equal to n-1. We allow a path from a vertex to itself; if this path contains no edges, then its length is 0. This is a convenient way to define an otherwise special case. If the graph contains an edge (v, v) from a vertex to itself, then the path v, v is sometimes referred to as a loop.
A cycle in a directed graph is a path of length at least 1 such that $w_1 = w_n$. This cycle is simple if the path is simple. For undirected graphs, we require that the edges be distinct. The logic of these requirements is that the path u, v, u in an undirected graph should not be considered a cycle, because (u, v) and (v, u) are the same edge. In a directed graph, these are different edges, so it makes sense to call this a cycle.
DAG: Directed Acyclic Graph
An undirected graph is connected if there is a path from every vertex to every vertex. A directed graph with this property is called strongly connected. If a directed graph is not strongly connected, but the underlying graph (without direction on the arcs) is connected, then the graph is said to be weakly connected. A complete graph is a graph in which there is an edge between every pair of vertices.
8.2 Representation of Graphs
We will consider directed graphs. One simple way to represent a graph is to use a two-dimensional array, known as an adjacency matrix representation. For each edge (u, v), we set `A[u][v] = 1`; otherwise the entry in the array is 0. If the edge has a weight associated with it, then we can set `A[u][v]` equal to the weight and use either a very large or a very small weight as a sentinel to indicate nonexistent edges.
When the graph is dense, in other words, when there are many edges in the graph, an adjacency matrix is an appropriate representation. However, if the graph is sparse, a better solution is an adjacency list representation. For each vertex, we keep a list of all adjacent vertices. The space requirement is then O(|E| + |V|).
In the adjacency list representation, the leftmost structure is merely an array of header cells. If the edges have weights, then this additional information is also stored in the list cells.
Adjacency lists are the standard way to represent graphs. Undirected graphs can be similarly represented; each edge (u, v) appears in two lists, so the space usage essentially doubles. A common requirement in graph algorithms is to find all vertices adjacent to some given vertex v, and this can be done, in time proportional to the number of such vertices found, by a simple scan down the appropriate adjacency list.
Also, in real-life applications, the vertices have names, which are unknown at compile time, instead of numbers. Since we cannot index an array by an unknown name, we must provide a mapping of names to numbers. And that is what we have done before: hashing! We can map each name into a hash table. With this transformation, all the graph algorithms use only the internal numbers.
8.3 Topological Sort
A topological sort is an ordering of vertices in a directed acyclic graph such that if there is a path from $v_i$ to $v_j$, then $v_j$ appears after $v_i$ in the ordering.
A simple algorithm to find a topological ordering is first to find any vertex with no incoming edges. We can then print this vertex and remove it, along with its edges, from the graph. Then we apply the same strategy to the rest of the graph.
Formalization:
We define the indegree of a vertex v as the number of edges (u, v). We compute the indegrees of all vertices in the graph, assuming that the indegree array is initialized and that the graph has been read into an adjacency list.
```c
void Topsort(Graph G){ /* ... */ }
```
Scanning down the array every time we call `FindNewVertexOfIndegreeZero()` is inefficient. We can keep all the vertices of indegree 0 in a special box to remove this inefficiency, which can be implemented with a stack or a queue.
```c
void TopSort(Graph G){ /* ... */ }
```
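A sketch of the queue-based version; the adjacency-matrix graph type and the fixed bound `MAX_V` are simplifying assumptions (the text uses adjacency lists):

```c
#define MAX_V 16

/* tiny graph type for illustration: adj[u][v] == 1 means an edge u -> v */
typedef struct { int n; int adj[MAX_V][MAX_V]; } Graph;

/* Queue-based topological sort. Fills order[] and returns 1 on success;
   returns 0 if the graph contains a cycle. */
int TopSort(const Graph *G, int order[]) {
    int indegree[MAX_V] = {0};
    int queue[MAX_V], head = 0, tail = 0, cnt = 0;
    for (int u = 0; u < G->n; u++)          /* compute all indegrees */
        for (int v = 0; v < G->n; v++)
            if (G->adj[u][v]) indegree[v]++;
    for (int v = 0; v < G->n; v++)          /* the "special box" of indegree 0 */
        if (indegree[v] == 0) queue[tail++] = v;
    while (head < tail) {
        int u = queue[head++];
        order[cnt++] = u;                   /* "print" u */
        for (int v = 0; v < G->n; v++)      /* remove u's outgoing edges */
            if (G->adj[u][v] && --indegree[v] == 0)
                queue[tail++] = v;
    }
    return cnt == G->n;                     /* cnt < n means a cycle exists */
}
```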
8.4 Shortest-Path Algorithm
The input is a weighted graph: associated with each edge $(v_i, v_j)$ is a cost $c_{i,j}$ to traverse the arc. The cost of a path $v_1 v_2 \dots v_n$ is $\sum_{i=1}^{n-1} c_{i,i+1}$; this is referred to as the weighted path length. The unweighted path length is merely the number of edges on the path, namely n-1.
Unweighted Shortest Paths
There is an undirected graph G. The input is a vertex s in G, and we want to find the shortest path from s to all other vertices. We are only interested in the number of edges contained on the path, so there are no weights on the edges. (We can just assign every edge a weight of 1.)
Breadth-First Search: It operates by processing vertices in layers; the vertices closest to the start are evaluated first, and the most distant vertices are evaluated last. This is much the same as a level-order traversal for trees.
Weighted Shortest Path
If the graph is weighted, the problem becomes harder, but we can still use the ideas from the unweighted case.
We keep all of the same information as before. Thus, each vertex is marked as either known or unknown. A tentative distance is kept for each vertex, as before.
The general method to solve the single-source shortest-path problem is known as Dijkstra's algorithm, which is a greedy algorithm. Greedy algorithms generally solve a problem in stages by doing what appears to be the best thing at each stage.
Dijkstra's algorithm: it proceeds in stages, just like the unweighted shortest-path algorithm. At each stage, Dijkstra's algorithm selects a vertex v which has the smallest $d_v$ among all the unknown vertices, and declares that the shortest path from s to v is known. The remainder of a stage consists of updating the values $d_w$ for the vertices w adjacent to v.
In the weighted case, we set $d_w = d_v + c_{v,w}$ if this new value would be an improvement. Put simply, the algorithm decides whether the cost to w is lower when vertex v is on the path.
Graph ADT:
To implement Dijkstra's algorithm, we need to implement the undirected weighted graph first.
Dijkstra Algorithm
The basic idea is to read the graph and save it as an adjacency list. Then traverse the adjacency list, update `Dist`, and set `Known` to 1; whenever `Dist` is updated, we set `Path` to record the previous vertex.
```c
static void ReadGraph(Graph *G, Table *T){ /* ... */ }
```
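A sketch of Dijkstra's algorithm with `Dist`, `Known`, and `Path` as described above; replacing the adjacency-list `ReadGraph` with a small cost matrix (0 meaning no edge) is an assumption made to keep the example self-contained:

```c
#include <limits.h>

#define NUM_V 5
#define INF INT_MAX

/* O(|V|^2) Dijkstra on an adjacency-matrix graph.
   Dist[v] is the tentative distance from s, Known marks finished vertices,
   Path[v] records the previous vertex on the shortest path. */
void Dijkstra(int cost[NUM_V][NUM_V], int s, int Dist[NUM_V], int Path[NUM_V]) {
    int Known[NUM_V] = {0};
    for (int v = 0; v < NUM_V; v++) { Dist[v] = INF; Path[v] = -1; }
    Dist[s] = 0;
    for (int stage = 0; stage < NUM_V; stage++) {
        /* select the unknown vertex v with the smallest Dist */
        int v = -1;
        for (int u = 0; u < NUM_V; u++)
            if (!Known[u] && Dist[u] != INF && (v == -1 || Dist[u] < Dist[v]))
                v = u;
        if (v == -1) break;         /* remaining vertices are unreachable */
        Known[v] = 1;
        /* update: Dist[w] = Dist[v] + cost(v, w) if it is an improvement */
        for (int w = 0; w < NUM_V; w++)
            if (!Known[w] && cost[v][w] > 0 && Dist[v] + cost[v][w] < Dist[w]) {
                Dist[w] = Dist[v] + cost[v][w];
                Path[w] = v;
            }
    }
}
```

The linear scan for the minimum is the greedy selection; a priority queue would bring the running time down to $O(|E| \log |V|)$.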
9 Algorithm
9.1 Greedy Algorithm
In each phase, a decision is made that appears to be good, without regard for future consequences. Generally, this means that some local optimum is chosen. This "take what you can get now" strategy is the source of the name for this class of algorithms. When the algorithm works correctly, we hope that the local optimum is equal to the global optimum.
A Simple Scheduling Problem
We are given jobs $j_1, j_2, \dots, j_N$, with known running times $t_1, t_2, \dots, t_N$, respectively, and a single processor. What is the best way to schedule these jobs in order to minimize the average completion time?
Nonpreemptive scheduling: the schedule that arranges the jobs shortest-job-first always yields an optimal schedule.
Huffman Codes - File Compression
Approximate Bin Packing
- On-line Algorithm
- Off-line Algorithm
9.2 Divide and Conquer
A common technique used to design algorithms is divide and conquer.
- Divide: smaller problems are solved recursively (except, of course, base cases).
- Conquer: the solution to the original problem is then formed from the solutions to the subproblems.
Closest-Points Problem
The Selection Problem