Wednesday, May 29, 2013

Heap Data Structure

Heap is an efficient data structure used for managing data during the execution of algorithms like heapsort and priority queue. It supports insert and extract min (or max) operations efficiently. Heap achieves this by maintaining a partial ordering on the data set. This partial ordering is weaker than the sorted order but stronger than random or no order. Heap is a binary tree in which key on each node dominates key of its children.

There are two variations of heap, min-heaps, and max-heaps(as shown below).


                                    10  <  45,75                                      2001 > 1045,1175
                                    45  <  70,55                                      1045 > 70,55

Property of Heap

  1. The tree is completely filled on all levels except the lowest level which gets filled from left up to a point. So, it can be viewed as the nearly complete binary tree.
  2. Both heaps(min and max) satisfy a heap property. In max-heap, the key of each node is larger than the key of its children nodes. A min-heap is organized the opposite way.
  3. Smallest element in min-heap is at the root. Largest element in max-heap is at the root.
  4. Heap is NOT a binary search tree (it is a binary tree though ). So binary search doesn't work on the heap.

Heap Implementation

  1. Binary Tree: The obvious implementation of heap is binary tree (evident from above diagram). Each key will get stored in node with two pointers to its children. Height of heap is the height of root node; and height of a node is number of edges on the longest downward path from node to the leaf. So height of above trees are 3 ( lg8 ).
  2. Array: Nearly complete binary tree in case of heap enables it to represent it without any pointers. Data can be stored as an array of keys, and use the position of the keys to implicitly satisfy the role of the pointers. 
         
Node(i)left = 2* Node(i)  
Node(i)Right = 2*Node(i)+1


       As shown above, root of the tree is stored at the first position of the array, and its left and the right children in the second and third positions, respectively. Assumed that array starts with index 1 to make calculations simpler. So left child of i sits in position and right child in 2i+1. This way you can easily move around without pointers.        

           int first_child(int i){
               return 2*i;
           }
           
           int parent(int i){
              if(i == 1) return -1
              else return (int)i/2;
           } 

 Array representation of heap saves memory but is less flexible than tree implementation. Updating nodes in array implementation is not efficient. It helps to implement heap as array as its fully packed till last level. But it will not be so efficient for normal binary trees. 

Heap Construction

The fundamental point which needs to be taken care of is; the heap property should be retained during each new insertion. Let's take the case of the above min-heap. Now assume that you want to add one more item to the heap with key 5.

Step 1: Add 5 to the next slot. 
The new element will get added to the next available slot in the array. This  ensures the desired balanced shape of the heap-labeled tree, but heap property might no longer be valid.



Step 2: Swap Node 5 and 70 to satisfy the min-heap property. 
After swap min-heap property test passes for node 5. But min-heap property fails for the parent node of 5.



Step 3: Swap node 45 with 5.
Now node with key 5 is satisfied as its children have key values > 5.



Step 4: Swap root node with its left child.
Bingo. Heap is perfect now.




Insertion Complexity: Swap process takes constant time at each level. So each insertion takes at most O(log n) time.

Creation Complexity: Each insertion takes O(log n) time; then creation (or insertion of n nodes will take O(n log n) time.


Extracting Minimum from min-heap

Minimum of a min-heap sits at first position; so finding first minimum value is easy. Now the tricky part is how to find next minimum element. 

To find the next minimum; let's remove the first element. After this to make sure that binary tree is maintained, let's copy the last element to the first position. This is required to ensure that the tree is filled at all levels except the lowest level. But after this, the heap property might go for a toss as now a bigger element may sit at the root. Now the heap should re-arrange itself so that the min-heap property satisfies. 

Heapify:  This is a process in which a dissatisfied element bubbles-down to appropriate position. To achieve this the dominant child should be swapped with the root to push down the problem to the next level. This should be continued further till the last level. 
The heapify operation for each element takes lgn steps (or height of tree) so complexity is O(log n) time. 

Java Implementation: link

Heapsort

Exchanging the minimum element with the last element and calling heapify repeatedly gives an O( n logn) sorting algorithm, named as Heapsort. Worst case complexity is O( n logn ) time, which is best that can be expected from any sorting algorithm. It is an in-place sorting i.e. uses no extra memory 




No comments:

Post a Comment