Notes on Algorithms and Data Structures
1 Mathematical Background
We need to establish a relative ranking among functions, measuring the relative rate of growth of two functions.
1. $T(N) = O(f(N))$ if there exist positive constants $c$ and $n_0$ such that $T(N) \le c\,f(N)$ when $N \ge n_0$.
2. $T(N) = \Omega(g(N))$ if there exist positive constants $c$ and $n_0$ such that $T(N) \ge c\,g(N)$ when $N \ge n_0$.
3. $T(N) = \Theta(h(N))$ if and only if $T(N) = O(h(N))$ and $T(N) = \Omega(h(N))$.
In other words, when $T(N) = O(f(N))$, $f(N)$ is an upper bound on $T(N)$; conversely, when $T(N) = \Omega(g(N))$, $g(N)$ is a lower bound on $T(N)$.
Tips: do not include constants or low-order terms inside a Big-O; we can take the limit $\lim_{N \to \infty} f(N)/g(N)$ to determine the relative growth rate of two functions (L'Hôpital's rule may be needed).
When measuring algorithms we care about time, but we cannot compare raw running times directly, because computers differ in speed; we usually count the algorithm's steps instead.
2 List, Stack and Queue
One of the basic rules concerning programming is that no routine should ever exceed a page. This is accomplished by breaking the program down into many modules.
An abstract data type (ADT) is a set of operations. Abstract data types are mathematical abstractions. Nowhere in an ADT's definition is there any mention of HOW the set of operations is implemented.
2.1 List ADT
The list ADT (Abstract Data Type) supports operations such as MakeEmpty, Insert, Delete, and Find.
`Node* MakeEmpty()` — pay attention to the return type `Node*`, which means this function returns a pointer to a `Node`. It is usually used as:
```c
Node *head = MakeEmpty();
```
If you want a function to modify a pointer, you need to pass `Node **PtrToHead`, because if you use `Node *head`, the function will only copy the value of the pointer, and whatever you do to this pointer inside the function only takes effect there. That means your pointer `head` will not change when the function returns. So you need to pass the address of the pointer to change `head`, as in:
```c
void MakeEmpty(Node **PtrToHead); // YES
```
`void Insert(Node *head, ElementType X)` — we can summarize the `Insert()` function as three steps:
- Find position to insert
- Malloc a new node and assign value to it ( include value and pointer )
- Reconnect the list ( when you are not sure about it, drawing a picture will make you understand what you need to do )
Deletion (`Delete()`) follows almost the same steps.
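A minimal sketch of the three steps above, assuming an `int` payload and that the caller has already found the node `P` to insert after:

```c
#include <stdlib.h>

typedef int ElementType;
typedef struct Node { ElementType val; struct Node *next; } Node;

/* Insert X after node P (the position found in step 1). */
void Insert(Node *P, ElementType X) {
    /* step 2: malloc a new node and assign value to it */
    Node *tmp = malloc(sizeof(Node));
    tmp->val = X;
    /* step 3: reconnect the list */
    tmp->next = P->next;
    P->next = tmp;
}
```

Note that inserting after a header node keeps the code uniform: there is no special case for the front of the list.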
Exchange of two cells in a list
To exchange two cells in a list, we need four pointers to track the two cells and reconnect them in the list. The operations can be performed in two steps (exchange `pos1` and `pos2`):
- Find the nodes at positions `pos1 - 1` and `pos2 - 1`, and assign them to `p1` and `p2`
- Exchange the `pos1` and `pos2` links (both the link pointing to each cell and the link pointing to its next):
  - Keep a backup of `p1->next` and `p1->next->next`: assign `p1->next` to `temp`, and assign `temp->next` to `temp_next`
  - Assign `p2->next` to `p1->next`, and assign `temp` to `p2->next`
  - Assign `p1->next->next` to `temp->next`, and assign `temp_next` to `p1->next->next`
The code is shown below:
```c
void Exchange(Node *head, int pos1, int pos2){ /* ... */ }
```
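Following the steps above, a possible sketch (assuming `int` values, a list with a header node, and non-adjacent 1-based positions `pos1 < pos2`):

```c
typedef struct Node { int val; struct Node *next; } Node;

/* Exchange the cells at positions pos1 and pos2 (1-based, non-adjacent,
   pos1 < pos2), following the two steps described above. */
void Exchange(Node *head, int pos1, int pos2) {
    Node *p1 = head, *p2 = head;
    for (int i = 1; i < pos1; i++) p1 = p1->next;   /* node at pos1 - 1 */
    for (int i = 1; i < pos2; i++) p2 = p2->next;   /* node at pos2 - 1 */
    Node *temp = p1->next;          /* backup of p1->next */
    Node *temp_next = temp->next;   /* backup of p1->next->next */
    p1->next = p2->next;            /* pos1's predecessor now links to pos2's cell */
    p2->next = temp;                /* pos2's predecessor now links to pos1's cell */
    temp->next = p1->next->next;    /* old pos1 cell inherits pos2's successor */
    p1->next->next = temp_next;     /* old pos2 cell inherits pos1's successor */
}
```

Drawing the four pointers before coding, as the notes suggest, makes the reconnection order obvious.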
Reverse - Nonrecursive
To reverse a singly linked list in O(N), three pointers are necessary: `p1` and `p2` point to the two ends of the already-reversed part, and `p3` points to the next cell.
```c
void Reverse_Nonrecursive(){ /* ... */ }
```
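A sketch of the three-pointer reversal, assuming `int` values and a list without a header node:

```c
#include <stddef.h>

typedef struct Node { int val; struct Node *next; } Node;

/* Reverse a singly linked list in O(N).
   p1 is the head of the reversed part, p2 the current cell, p3 the next cell. */
Node *Reverse_Nonrecursive(Node *head) {
    Node *p1 = NULL, *p2 = head, *p3;
    while (p2 != NULL) {
        p3 = p2->next;   /* remember the rest of the list */
        p2->next = p1;   /* reverse the current link */
        p1 = p2;         /* advance the reversed part */
        p2 = p3;         /* advance to the next cell */
    }
    return p1;           /* p1 is the new head */
}
```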
Reverse - recursive
We have talked about reversing a singly linked list nonrecursively. Now we will discuss the recursive version.
In the recursive version, it is important to be clear about how we advance along the list and how the recursion ends.
It is easy to see that if we want to reverse a list of N cells, we can first reverse the last N-1 cells. When the list has only one or two cells, we can easily reverse it.
```c
Node *Reverse(Node *head){ /* ... */ }
```
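A sketch of the recursive version under the same assumptions (no header node):

```c
#include <stddef.h>

typedef struct Node { int val; struct Node *next; } Node;

/* Recursively reverse a list: first reverse the last N - 1 cells,
   then hook the old head onto the end. */
Node *Reverse(Node *head) {
    if (head == NULL || head->next == NULL)
        return head;                     /* 0 or 1 cell: already reversed */
    Node *newHead = Reverse(head->next); /* reverse the tail */
    head->next->next = head;             /* old successor now points back */
    head->next = NULL;                   /* old head becomes the new tail */
    return newHead;
}
```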
Example 1: The Polynomial ADT
We can use a list to represent polynomials. Each term of the polynomial is contained in one cell, and the cells are sorted in decreasing order of exponents.
```c
struct Polynomial{ /* ... */ };
```
Once we have the polynomial ADT, we can represent a single-variable polynomial and do some computation.
Example 2: Radix Sort
Radix sort is sometimes called card sort.
2.2 Stack ADT
A stack is a list with the restriction that inserts and deletes can be performed in only one position, namely the end of the list, called the top.
The fundamental operations: push (insert) and pop (delete).
A stack is also called a LIFO (last in, first out) list.
Linked List Implementation of Stacks
- MakeEmpty
- Push is implemented as an insertion into the front of a linked list, where the front of the list serves as the top of the stack.
- Top is performed by examining the element in the front of the linked list.
- Pop is implemented as a deletion from the front of the list.
The drawback of this implementation is that the calls to malloc and free are expensive, especially in comparison with the pointer manipulation routines.
Array Implement of Stacks
This is a popular solution. The only potential hazard with this strategy is that we need to declare an array size ahead of time.
- Initialization: `tos` (top of stack) equals -1
- Push: increment `tos`, then `Stack[tos] = x`
- Pop: return `Stack[tos]`, then decrement `tos`
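These operations can be sketched as follows; the fixed capacity `MAX_STACK` and the assert-based error testing are assumptions:

```c
#include <assert.h>

#define EMPTY_TOS (-1)
#define MAX_STACK 100

typedef int ElementType;

typedef struct {
    int tos;                         /* top of stack, -1 when empty */
    ElementType data[MAX_STACK];
} Stack;

void StackInit(Stack *s) { s->tos = EMPTY_TOS; }
int  StackIsEmpty(const Stack *s) { return s->tos == EMPTY_TOS; }

void Push(Stack *s, ElementType x) {
    assert(s->tos + 1 < MAX_STACK);  /* error testing: overflow */
    s->data[++s->tos] = x;           /* increment tos, then Stack[tos] = x */
}

ElementType Pop(Stack *s) {
    assert(!StackIsEmpty(s));        /* error testing: underflow */
    return s->data[s->tos--];        /* return Stack[tos], then decrement tos */
}
```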
One problem that affects the efficiency of implementing stacks is error testing. Our linked list implementation carefully checked for errors.
2.3 Queue
In a queue, insertion is done at one end, whereas deletion is performed at the other end.
Like stack, queue can be implemented by list or array.
Linked List Implementation
Array Implementation
In the array implementation, the queue data structure has an array `queue[]` and positions `front` and `rear`, which represent the two ends of the queue. We also keep track of `q_size`, the number of elements actually in the queue. Each operation changes `front`, `rear`, `q_size`, and `queue[]`.
- enqueue: to enqueue an element x, we increment `q_size` and `rear`, then set `queue[rear] = x`
- dequeue: to dequeue an element, we return `queue[front]`, then increment `front` and decrement `q_size`
Remember to use circular array. And there are two warnings about the circular array implementation of queues.
- First, it is important to check the queue for emptiness, because a dequeue when the queue is empty will return an undefined value.
- Secondly, some programmers use different ways of representing the front and rear of a queue.
In applications where you are sure that the number of enqueues is never larger than the size of the queue, the wraparound is obviously not necessary.
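A sketch of the circular-array queue; the capacity `QUEUE_CAP` and the 0/1 return codes for error reporting are assumptions:

```c
#define QUEUE_CAP 4

typedef struct {
    int front, rear, q_size;
    int data[QUEUE_CAP];
} Queue;

void QueueInit(Queue *q) { q->front = 0; q->rear = QUEUE_CAP - 1; q->q_size = 0; }
int  QueueIsEmpty(const Queue *q) { return q->q_size == 0; }

/* increment with wraparound: this is what makes the array circular */
static int Succ(int i) { return (i + 1) % QUEUE_CAP; }

int Enqueue(Queue *q, int x) {
    if (q->q_size == QUEUE_CAP) return 0;   /* full */
    q->q_size++;
    q->rear = Succ(q->rear);
    q->data[q->rear] = x;
    return 1;
}

int Dequeue(Queue *q, int *x) {
    if (QueueIsEmpty(q)) return 0;          /* check for emptiness first */
    *x = q->data[q->front];
    q->front = Succ(q->front);
    q->q_size--;
    return 1;
}
```

Keeping `q_size` explicitly sidesteps the ambiguity between a full and an empty queue when `front` and `rear` coincide.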
3 Trees
A tree is a data structure for which the running time of most operations is O(log N) on average. Trees in general are very useful abstractions in computer science.
3.1 Implement of Trees
One way to implement a tree would be to have in each node, besides its data, a pointer to each child of the node. The ADT is shown below:
```c
struct tree_node{ /* ... */ };
```
Application of the Trees
One of the popular uses is the directory structure in many common OS, including UNIX, Linux…
The Traversal of the Trees
- Preorder traversal: work at a node is performed before (pre) its children are processed.
- Postorder traversal: work at a node is performed after (post) its children are processed.
3.2 Binary Tree
A binary tree is a tree in which no node can have more than two children. A property of a binary tree that is sometimes important is that the depth of an average binary tree is considerably smaller than N. The average depth is $O(\sqrt{N})$, and for a special type of binary tree, namely the binary search tree, the average depth is $O(\log N)$.
Implementation of Binary Tree
```c
struct TreeNode{ /* ... */ };
```
The Expression Tree
We can use a tree to represent an expression. The leaves of an expression tree are operands, and the other nodes contain operators.
Implementation of Expression Trees
We have seen how to use a stack to convert an infix expression to postfix. Based on this algorithm, we can construct an expression tree:
- Convert the infix expression to postfix using a stack
- Read one symbol at a time, and create a node for it
- If the symbol is an operand, push its node onto a stack and continue
- If the symbol is an operator, pop two trees from the stack, make them the operator's children, and push the resulting tree back
3.3 The Search Tree ADT - Binary Search Trees
Let us assume that :
- Each node in the tree is assigned a key value
- All the keys are distinct; we deal with duplicates later.
The property that makes a binary tree into a binary search tree is that for every node X in the tree, the values of all the keys in its left subtree are smaller than the key value in X, and the values of all the keys in its right subtree are larger than the key value in X.
The search-tree ADT supports MakeEmpty, Find, FindMin, FindMax, Insert, and Delete.
Find
Find returns a pointer to the node in T that has key X, or NULL if there is no such node. The structure of the tree makes this simple: if T is NULL, just return NULL; otherwise, check the key value of T, and if it is X, return T; otherwise, recurse into the appropriate child of T.
What's more, we can easily find the min or max by just searching the left or right child repeatedly.
```c
TreeNode *Find(TreeNode *T, KeyType X){ /* ... */ }
```
Insert
To insert X into tree T, we proceed down the tree as you would with a Find.
- If x is found, do nothing.
- Otherwise, insert X at the last spot on the path traversed.
Duplicates can be handled by keeping an extra field in the node to record the frequency of occurrence. Although this adds space to the tree, it is worth it (better than adding duplicates to the tree, which may make the tree very deep).
```c
TreeNode *Insert(TreeNode *T, KeyType X){ /* ... */ }
```
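A sketch of the insertion just described, proceeding down the tree exactly as Find would:

```c
#include <stdlib.h>

typedef int KeyType;
typedef struct TreeNode { KeyType key; struct TreeNode *left, *right; } TreeNode;

/* Insert X: if X is found, do nothing; otherwise attach it
   at the last spot on the path traversed. */
TreeNode *Insert(TreeNode *T, KeyType X) {
    if (T == NULL) {                  /* last spot on the path */
        T = malloc(sizeof(TreeNode));
        T->key = X;
        T->left = T->right = NULL;
    } else if (X < T->key) {
        T->left = Insert(T->left, X);
    } else if (X > T->key) {
        T->right = Insert(T->right, X);
    }                                 /* X == T->key: do nothing */
    return T;
}
```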
Delete
When we try to delete a node in tree T, we will meet four situations:
- if we can not find X in tree T, just do nothing
- if the node is a leaf, just free it.
- if the node has one child, we need to replace this node with its child and free this node;
- if the node has two children, we need to find a proper node to replace this node and free this node.
- The general strategy is to replace the key value of this node with the smallest key of the right subtree, and recursively delete that node.
The deletion can be separated into two steps: first search for the node with key value X, then delete it.
```c
TreeNode *Delete(TreeNode *T, KeyType X){ /* ... */ }
```
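A sketch covering the four situations; the small `Insert` helper is included only so the tree can be built with heap-allocated nodes (Delete frees them):

```c
#include <stdlib.h>

typedef int KeyType;
typedef struct TreeNode { KeyType key; struct TreeNode *left, *right; } TreeNode;

TreeNode *Insert(TreeNode *T, KeyType X) {   /* helper to build a test tree */
    if (T == NULL) {
        T = malloc(sizeof(TreeNode));
        T->key = X; T->left = T->right = NULL;
    } else if (X < T->key) T->left = Insert(T->left, X);
    else if (X > T->key)   T->right = Insert(T->right, X);
    return T;
}

static TreeNode *FindMin(TreeNode *T) {
    while (T->left != NULL) T = T->left;
    return T;
}

TreeNode *Delete(TreeNode *T, KeyType X) {
    if (T == NULL) return NULL;              /* not found: do nothing */
    if (X < T->key) {
        T->left = Delete(T->left, X);
    } else if (X > T->key) {
        T->right = Delete(T->right, X);
    } else if (T->left != NULL && T->right != NULL) {
        /* two children: take the smallest key of the right subtree,
           then recursively delete that node */
        TreeNode *min = FindMin(T->right);
        T->key = min->key;
        T->right = Delete(T->right, min->key);
    } else {
        /* leaf or one child: splice the node out and free it */
        TreeNode *old = T;
        T = (T->left != NULL) ? T->left : T->right;
        free(old);
    }
    return T;
}
```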
3.4 AVL-Tree
As we calculated above, the expected depth of a binary search tree is $O(\log N)$. However, it is hard to build a tree and keep its depth close to $\log N$ without supervision. A good idea is to add a field recording the height of each node. Given this new variable, we can define a good binary tree as one in which, for every node, the heights of the left and right subtrees differ by at most 1.
When an insertion or deletion makes the tree unbalanced, the way to correct it is to rotate the tree. A single rotation suffices in the simple situations, whereas we need a double rotation in the complicated ones.
Nodes of AVL Tree
If an AVL tree has height $h$, the minimum number of nodes $S(h)$ is: $S(h) = S(h-1) + S(h-2) + 1$, with $S(0) = 1$ and $S(1) = 2$.
Situations
There are four situations in which an insertion makes the tree unbalanced. If node A is unbalanced, the insertion was one of:
- insert x into the left child of the left subtree of A
- insert x into the right child of the left subtree of A
- insert x into the right child of the right subtree of A
- insert x into the left child of the right subtree of A
Cases 1 and 3 are considered outside insertions, and cases 2 and 4 are considered inside insertions.
When an outside insertion makes the tree unbalanced, we need a single rotation, whereas an inside insertion needs a double rotation.
```c
// static: a static function is only visible in the file that declares it.
```
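A sketch of the single rotation for case 1 (the outside left-left case); the `height` field follows the convention that an empty tree has height -1, and a double rotation can be built from two single rotations:

```c
#include <stddef.h>

typedef struct AvlNode { int key, height; struct AvlNode *left, *right; } AvlNode;

static int Height(AvlNode *n) { return n == NULL ? -1 : n->height; }
static int Max(int a, int b)  { return a > b ? a : b; }

/* Rotate k2's left child k1 up; returns the new root of this subtree. */
static AvlNode *SingleRotateWithLeft(AvlNode *k2) {
    AvlNode *k1 = k2->left;
    k2->left = k1->right;      /* k1's right subtree moves under k2 */
    k1->right = k2;            /* k2 becomes k1's right child */
    k2->height = Max(Height(k2->left), Height(k2->right)) + 1;
    k1->height = Max(Height(k1->left), k2->height) + 1;
    return k1;
}
```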
3.5 Tree Traversals
- Inorder traversal: process the left subtree first, then perform processing at the current node, finally process the right subtree.
- Preorder traversal: the node is processed before its children.
- Postorder traversal: process both subtrees first, before processing the node itself.
3.6 B-Tree
A B-Tree of order m is a tree with the following structural properties:
- The root is either a leaf or has between 2 and m children.
- All non-leaf nodes (except the root) have between ⌈m/2⌉ and m children
- All leaves are at the same depth.
4 Hashing
We will discuss the hash table ADT, which supports only a subset of the operations allowed by binary search trees.
The implementation of hash tables is frequently called hashing. Hashing is a technique used for performing insertions, deletions, and finds in constant average time.
We must choose a hash function, decide what to do when two keys hash to the same value (a collision), and choose the size of the table.
```c
// index for table
```
4.1 Hash Function
It is usually a good idea to ensure that the table size is prime. If the input keys are integers, then simply returning `Key % TableSize` is a common strategy. If the input keys are strings, one option is to add up the ASCII values of the characters in the string.
Horner's rule: evaluate $\sum_{i=0}^{N-1} key[i] \cdot 32^{N-1-i}$ incrementally as $h = 32h + key[i]$:
```c
Index Hash(const char *key, int TableSize){ /* ... */ }
```
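A sketch of the string hash computed by Horner's rule, using the common multiplier 32 (a left shift by 5):

```c
typedef unsigned int Index;

/* Treat the characters as coefficients of a polynomial in 32 and
   evaluate it with Horner's rule, then reduce mod TableSize. */
Index Hash(const char *key, int TableSize) {
    unsigned int hashVal = 0;
    while (*key != '\0')
        hashVal = (hashVal << 5) + *key++;   /* hashVal * 32 + character */
    return hashVal % TableSize;
}
```

Unsigned overflow in C wraps around modulo $2^{32}$, so long keys are handled without any special case.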
4.2 Open Hashing (Separate Chaining)
In open hashing, each table element is a linked list. If memory is tight, we can omit the list headers; generally we add them. New elements are sometimes inserted at the front of the list, since this is convenient and also because it frequently happens that recently inserted elements are the most likely to be accessed in the near future.
4.3 Closed Hashing (Open Addressing)
Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists. In closed hashing, if a collision occurs, alternate cells are tried until an empty cell is found.
Formalization: cells $h_0(X), h_1(X), \dots$ are tried in succession, where $h_i(X) = (Hash(X) + f(i)) \bmod TableSize$, with $f(0) = 0$.
The hash function maps the key to a number; the function f is the collision-resolution strategy. Three examples of f:
- Linear probing: $f(i) = i$. If a collision occurs, we choose the next available spot.
- Quadratic probing: this is what we would expect — the collision function is quadratic. The popular choice is $f(i) = i^2$.
- Double hashing: $f(i) = i \cdot hash_2(X)$. The function $hash_2$ must never evaluate to zero, and it is also important to make sure all cells can be probed. A function such as $hash_2(X) = R - (X \bmod R)$, with R a prime smaller than the table size, will work well.
4.4 ReHashing
4.5 Extendible Hashing
5 Heap - Priority Queues
The heap must allow at least two operations: Insert and DeleteMin (return and remove the minimum element in the heap).
Heap ADT
InitHeap: build an empty binary heap. Because of the heap-order property, we store in data[0] a sentinel value smaller than any data. The first element is inserted at position 1.
```c
Heap *InitHeap( int MaxSize ){ /* ... */ }
```
Insert: to keep the smallest value at position 1 (the top), we must insert new data carefully. A way to insert properly is to find the right position first: make a new empty cell at the rear (`H->data[H->Size + 1]`), compare the inserted value X with the empty cell's parent (at `i / 2`), and if X is smaller than the parent, move the parent down into position i. Keep moving up until X is no smaller than the parent; the index i where this stops is where X is inserted.
```c
void Insert( ElementType X, Heap *H ){ /* ... */ }
```
DeleteMin: compared with insertion, deleting an element is a little more complicated. When we delete the minimum, an empty cell is left at the top of the binary heap, and we need to move elements to fill it. One strategy is to move the smaller child up into the empty cell; after the move there is a new empty cell below, so we repeat the move until the last element of the heap can be placed into the hole.
```c
ElementType DeleteMin(Heap *H){ /* ... */ }
```
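The Insert (percolate up) and DeleteMin (percolate down) just described can be sketched together; using `INT_MIN` as the sentinel and assuming a non-empty heap in `DeleteMin` are simplifications:

```c
#include <stdlib.h>
#include <limits.h>

typedef int ElementType;
typedef struct { int Capacity, Size; ElementType *data; } Heap;

Heap *InitHeap(int MaxSize) {
    Heap *H = malloc(sizeof(Heap));
    H->data = malloc((MaxSize + 1) * sizeof(ElementType));
    H->Capacity = MaxSize;
    H->Size = 0;
    H->data[0] = INT_MIN;       /* sentinel smaller than any data */
    return H;
}

/* percolate up: move parents down until X can sit below its parent */
void Insert(ElementType X, Heap *H) {
    int i;
    for (i = ++H->Size; H->data[i / 2] > X; i /= 2)
        H->data[i] = H->data[i / 2];
    H->data[i] = X;
}

/* percolate down: pull the smaller child up into the hole at the top */
ElementType DeleteMin(Heap *H) {
    ElementType min = H->data[1];
    ElementType last = H->data[H->Size--];
    int i, child;
    for (i = 1; i * 2 <= H->Size; i = child) {
        child = i * 2;
        if (child != H->Size && H->data[child + 1] < H->data[child])
            child++;                         /* pick the smaller child */
        if (last > H->data[child])
            H->data[i] = H->data[child];
        else
            break;
    }
    H->data[i] = last;                       /* the last element fills the hole */
    return min;
}
```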
6 Sorting
Sort an array of elements. To simplify matters, we will assume in our examples that the array contains only integers.
6.1 Insertion Sort – O(N^2)
Insertion sort consists of N-1 passes. For pass P = 2 through N, insertion sort ensures that the elements in positions 1 through P are in sorted order. This algorithm is the simplest but not very efficient.
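A sketch of these passes, using 0-based indices:

```c
/* Pass P ensures A[0..P] is sorted: slide larger elements one step
   right, then drop the saved element into the hole. */
void InsertionSort(int A[], int N) {
    for (int P = 1; P < N; P++) {
        int tmp = A[P];
        int j;
        for (j = P; j > 0 && A[j - 1] > tmp; j--)
            A[j] = A[j - 1];
        A[j] = tmp;
    }
}
```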
6.2 ShellSort
6.3 HeapSort
Every time, pop the minimum element and save it at the rear. After N-1 passes, we get the ordered array.
A `DeleteMin` costs $O(\log N)$, and there are N calls to `DeleteMin`, so the total cost is about $O(N \log N)$.
```c
void HeapSort(ElementType A[], int Size){ /* ... */ }
```
6.4 MergeSort
Mergesort runs in $O(N \log N)$ worst-case running time, and the number of comparisons used is nearly optimal. It is a fine example of a recursive algorithm.
The basic idea of this sort is to divide the array into two parts recursively, and merge the two divided parts into a temporary array in order. Obviously, every time we are about to merge two parts, each part is already in order.
The implementation can be divided into three parts:
```c
typedef int ElementType;
```
The details of these functions are shown below:
```c
void MergeSort(ElementType A[], int N){ /* ... */ }
```
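A sketch of the three parts (driver, recursive sort, merge), assuming a single temporary array allocated once by the driver:

```c
#include <stdlib.h>
#include <string.h>

typedef int ElementType;

/* merge two sorted runs A[l..m] and A[m+1..r] through Tmp */
static void Merge(ElementType A[], ElementType Tmp[], int l, int m, int r) {
    int i = l, j = m + 1, k = l;
    while (i <= m && j <= r)
        Tmp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
    while (i <= m) Tmp[k++] = A[i++];
    while (j <= r) Tmp[k++] = A[j++];
    memcpy(A + l, Tmp + l, (size_t)(r - l + 1) * sizeof(ElementType));
}

static void MSort(ElementType A[], ElementType Tmp[], int l, int r) {
    if (l >= r) return;
    int m = l + (r - l) / 2;
    MSort(A, Tmp, l, m);        /* divide: sort the two halves recursively */
    MSort(A, Tmp, m + 1, r);
    Merge(A, Tmp, l, m, r);     /* conquer: merge the two sorted halves */
}

void MergeSort(ElementType A[], int N) {
    ElementType *Tmp = malloc((size_t)N * sizeof(ElementType));
    if (Tmp != NULL) {
        MSort(A, Tmp, 0, N - 1);
        free(Tmp);
    }
}
```

Allocating the temporary array once in the driver, rather than in every `Merge`, avoids repeated malloc/free overhead.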
6.5 Quick Sort
Quicksort is like mergesort: we choose an element as the pivot and partition the array into two parts, then repeat these operations recursively. The key point is how to choose the pivot. One way is to choose the first element; although simple to code, this algorithm gets into trouble if the data is in reversed order, costing $O(N^2)$ in the worst case. Another way is called median-of-three: we compare the elements in the first, middle, and last positions, sort them in place, and keep the median as the pivot. This makes the worst case much less likely.
Recursive Implementation
The classic implementation is recursive, as shown below.
```c
typedef int ElementType;
```
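A sketch of quicksort with median-of-three pivot selection; handling ranges of fewer than three elements directly is an assumption (many texts instead switch to insertion sort below a cutoff):

```c
typedef int ElementType;

static void Swap(ElementType *a, ElementType *b) {
    ElementType t = *a; *a = *b; *b = t;
}

/* Sort A[left], A[center], A[right] in place, then hide the pivot
   (the median) at A[right - 1]. */
static ElementType Median3(ElementType A[], int left, int right) {
    int center = (left + right) / 2;
    if (A[center] < A[left])  Swap(&A[left],   &A[center]);
    if (A[right]  < A[left])  Swap(&A[left],   &A[right]);
    if (A[right]  < A[center]) Swap(&A[center], &A[right]);
    Swap(&A[center], &A[right - 1]);
    return A[right - 1];
}

static void QSort(ElementType A[], int left, int right) {
    if (right - left < 2) {             /* 1 or 2 elements: sort directly */
        if (right - left == 1 && A[right] < A[left]) Swap(&A[left], &A[right]);
        return;
    }
    ElementType pivot = Median3(A, left, right);
    int i = left, j = right - 1;
    for (;;) {                          /* partition around the pivot */
        while (A[++i] < pivot) {}
        while (A[--j] > pivot) {}
        if (i < j) Swap(&A[i], &A[j]);
        else break;
    }
    Swap(&A[i], &A[right - 1]);         /* restore the pivot */
    QSort(A, left, i - 1);
    QSort(A, i + 1, right);
}

void QuickSort(ElementType A[], int N) { QSort(A, 0, N - 1); }
```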
Non-Recursive Implementation
In theory, we can convert every recursive implementation into a non-recursive one by treating the recursive calls as pushes and pops on a stack. So it is with quicksort.
Consider a stack whose elements are the two ends (left, right) of the sub-array to be sorted. In the recursive case, we just call the QuickSort function and pass `left` and `right`; instead, we can push and pop these bounds on an explicit stack. (By the way, the code shown below doesn't include the implementation of the stack.)
```c
typedef struct Cell{ /* ... */ } Cell;
```
7 The Disjoint Set ADT
8 Graph Algorithm
In this chapter, we discuss several common problems in graph theory. Not only are these algorithms useful in practice, they are interesting because in many real-life applications they are too slow unless careful attention is paid to the choice of data structures.
8.1 Definitions
A graph $G = (V, E)$ consists of a set of vertices, V, and a set of edges, E. Each edge is a pair $(v, w)$, where $v, w \in V$. If the pair is ordered, then the graph is directed. Directed graphs are sometimes referred to as digraphs. Sometimes an edge has a third component, known as either a weight or a cost.
A path in a graph is a sequence of vertices $w_1, w_2, \dots, w_n$ such that $(w_i, w_{i+1}) \in E$ for $1 \le i < n$. The length of such a path is the number of edges on the path, which is equal to n-1. We allow a path from a vertex to itself; if this path contains no edges, then its length is 0. This is a convenient way to define an otherwise special case. If the graph contains an edge (v, v) from a vertex to itself, then the path v, v is sometimes referred to as a loop.
A cycle in a directed graph is a path of length at least 1 such that $w_1 = w_n$. This cycle is simple if the path is simple. For undirected graphs, we require that the edges be distinct. The logic of these requirements is that the path u, v, u in an undirected graph should not be considered a cycle, because (u, v) and (v, u) are the same edge. In a directed graph, these are different edges, so it makes sense to call this a cycle.
DAG: Directed Acyclic Graph
An undirected graph is connected if there is a path from every vertex to every vertex. A directed graph with this property is called strongly connected. If a directed graph is not strongly connected, but the underlying graph (without direction on the arcs) is connected, then the graph is said to be weakly connected. A complete graph is a graph in which there is an edge between every pair of vertices.
8.2 Representation of Graphs
We will consider directed graphs. One simple way to represent a graph is to use a two-dimensional array, known as an adjacency matrix representation. For each edge (u, v), we set `A[u][v] = 1`; otherwise the entry in the array is 0. If the edge has a weight associated with it, then we can set `A[u][v]` equal to the weight and use either a very large or a very small weight as a sentinel to indicate nonexistent edges.
When the graph is dense, in other words, when there are many edges in the graph, an adjacency matrix is an appropriate representation. However, if the graph is sparse, a better solution is an adjacency list representation. For each vertex, we keep a list of all adjacent vertices. The space requirement is then O(|E| + |V|).
In the adjacency list representation, the leftmost structure is merely an array of header cells. If the edges have weights, then this additional information is also stored in the list cells.
Adjacency lists are the standard way to represent graphs. Undirected graphs can be similarly represented; each edge (u, v) appears in two lists, so the space usage essentially doubles. A common requirement in graph algorithms is to find all vertices adjacent to some given vertex v, and this can be done, in time proportional to the number of such vertices found, by a simple scan down the appropriate adjacency list.
Also, in real-life applications, the vertices have names, which are unknown at compile time, instead of numbers. Since we cannot index an array by an unknown name, we must provide a mapping of names to numbers. And that is what we have done before: hashing! We can map each name into a hash table. With this transformation, all the graph algorithms use only the internal numbers.
8.3 Topological Sort
A topological sort is an ordering of vertices in a directed acyclic graph such that if there is a path from $v_i$ to $v_j$, then $v_j$ appears after $v_i$ in the ordering.
A simple algorithm to find a topological ordering is first to find any vertex with no incoming edges. We can then print this vertex and remove it, along with its edges, from the graph. Then we apply the same strategy to the rest of the graph.
Formalization:
We define the indegree of a vertex v as the number of edges (u, v). We compute the indegrees of all vertices in the graph, assuming that the indegree array is initialized and that the graph has been read into an adjacency list.
```c
void Topsort(Graph G){ /* ... */ }
```
Scanning down the array every time we call `FindNewVertexOfIndegreeZero()` is inefficient. We can keep all the vertices of indegree 0 in a special box to remove this inefficiency, which can be implemented with a stack or a queue.
```c
void TopSort(Graph G){ /* ... */ }
```
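A sketch of the queue-based version; the adjacency-matrix graph type and the fixed bound `MAX_V` are simplifying assumptions (the text uses adjacency lists):

```c
#define MAX_V 16

/* tiny graph type for illustration: adj[u][v] == 1 means an edge u -> v */
typedef struct { int n; int adj[MAX_V][MAX_V]; } Graph;

/* Queue-based topological sort. Fills order[] and returns 1 on success;
   returns 0 if the graph contains a cycle. */
int TopSort(const Graph *G, int order[]) {
    int indegree[MAX_V] = {0};
    int queue[MAX_V], head = 0, tail = 0, cnt = 0;
    for (int u = 0; u < G->n; u++)          /* compute all indegrees */
        for (int v = 0; v < G->n; v++)
            if (G->adj[u][v]) indegree[v]++;
    for (int v = 0; v < G->n; v++)          /* the "special box" of indegree 0 */
        if (indegree[v] == 0) queue[tail++] = v;
    while (head < tail) {
        int u = queue[head++];
        order[cnt++] = u;                   /* "print" u */
        for (int v = 0; v < G->n; v++)      /* remove u's outgoing edges */
            if (G->adj[u][v] && --indegree[v] == 0)
                queue[tail++] = v;
    }
    return cnt == G->n;                     /* cnt < n means a cycle exists */
}
```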
8.4 Shortest-Path Algorithm
The input is a weighted graph: associated with each edge $(v_i, v_j)$ is a cost $c_{i,j}$ to traverse the arc. The cost of a path $v_1 v_2 \dots v_n$ is $\sum_{i=1}^{n-1} c_{i,i+1}$; this is referred to as the weighted path length. The unweighted path length is merely the number of edges on the path, namely n-1.
Unweighted Shortest Paths
There is an undirected graph G. The input is a vertex s in G, and we want to find the shortest path from s to all other vertices. We are only interested in the number of edges contained on the path, so there are no weights on the edges. (We can just assign every edge a weight of 1.)
Breadth-First Search: It operates by processing vertices in layers; the vertices closest to the start are evaluated first, and the most distant vertices are evaluated last. This is much the same as a level-order traversal for trees.
Weighted Shortest Path
If the graph is weighted, the problem becomes harder, but we can still use the ideas from the unweighted case.
We keep all of the same information as before. Thus, each vertex is marked as either known or unknown. A tentative distance is kept for each vertex, as before.
The general method to solve the single-source shortest-path problem is known as Dijkstra's algorithm, which is a greedy algorithm. Greedy algorithms generally solve a problem in stages by doing what appears to be the best thing at each stage.
Dijkstra's algorithm: it proceeds in stages, just like the unweighted shortest-path algorithm. At each stage, Dijkstra's algorithm selects a vertex v which has the smallest $d_v$ among all the unknown vertices, and declares that the shortest path from s to v is known. The remainder of a stage consists of updating the values $d_w$ for the vertices w adjacent to v.
In the weighted case, we set $d_w = d_v + c_{v,w}$ if this new value would be an improvement. Put simply, the algorithm decides whether the cost to w is lower when vertex v is on the path.
Graph ADT:
To implement Dijkstra's algorithm, we need to implement the undirected weighted graph first.
Dijkstra Algorithm
The basic idea is to read the graph and save it as an adjacency list. Then traverse the adjacency list, update `Dist`, and set `Known` to 1; whenever `Dist` is updated, we set `Path` to record the previous vertex.
```c
static void ReadGraph(Graph *G, Table *T){ /* ... */ }
```
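A sketch of Dijkstra's algorithm with `Dist`, `Known`, and `Path` as described above; replacing the adjacency-list `ReadGraph` with a small cost matrix (0 meaning no edge) is an assumption made to keep the example self-contained:

```c
#include <limits.h>

#define NUM_V 5
#define INF INT_MAX

/* O(|V|^2) Dijkstra on an adjacency-matrix graph.
   Dist[v] is the tentative distance from s, Known marks finished vertices,
   Path[v] records the previous vertex on the shortest path. */
void Dijkstra(int cost[NUM_V][NUM_V], int s, int Dist[NUM_V], int Path[NUM_V]) {
    int Known[NUM_V] = {0};
    for (int v = 0; v < NUM_V; v++) { Dist[v] = INF; Path[v] = -1; }
    Dist[s] = 0;
    for (int stage = 0; stage < NUM_V; stage++) {
        /* select the unknown vertex v with the smallest Dist */
        int v = -1;
        for (int u = 0; u < NUM_V; u++)
            if (!Known[u] && Dist[u] != INF && (v == -1 || Dist[u] < Dist[v]))
                v = u;
        if (v == -1) break;         /* remaining vertices are unreachable */
        Known[v] = 1;
        /* update: Dist[w] = Dist[v] + cost(v, w) if it is an improvement */
        for (int w = 0; w < NUM_V; w++)
            if (!Known[w] && cost[v][w] > 0 && Dist[v] + cost[v][w] < Dist[w]) {
                Dist[w] = Dist[v] + cost[v][w];
                Path[w] = v;
            }
    }
}
```

The linear scan for the minimum is the greedy selection; a priority queue would bring the running time down to $O(|E| \log |V|)$.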
9 Algorithm
9.1 Greedy Algorithm
In each phase, a decision is made that appears to be good, without regard for future consequences. Generally, this means that some local optimum is chosen. This "take what you can get now" strategy is the source of the name for this class of algorithms. When the algorithm works correctly, we hope that the local optimum is equal to the global optimum.
A Simple Scheduling Problem
We are given jobs $j_1, j_2, \dots, j_N$, with known running times $t_1, t_2, \dots, t_N$, respectively, and a single processor. What is the best way to schedule these jobs in order to minimize the average completion time?
Nonpreemptive scheduling: the schedule that arranges the jobs shortest-job-first always yields an optimal schedule.
Huffman Codes - File Compression
Approximate Bin Packing
- On-line Algorithm
- Off-line Algorithm
9.2 Divide and Conquer
A common technique used to design algorithms is divide and conquer.
- Divide: smaller problems are solved recursively (except, of course, base cases).
- Conquer: the solution to the original problem is then formed from the solutions to the subproblems.
Closest-Points Problem
The Selection Problem