Wednesday, May 29, 2013

Heap Data Structure

Heap is an efficient data structure used for managing data during the execution of algorithms like heapsort and priority queue. It supports insert and extract min (or max) operations efficiently. Heap achieves this by maintaining a partial ordering on the data set. This partial ordering is weaker than the sorted order but stronger than random or no order. Heap is a binary tree in which key on each node dominates key of its children.

There are two variations of heap, min-heaps, and max-heaps(as shown below).


                                    10  <  45,75                                      2001 > 1045,1175
                                    45  <  70,55                                      1045 > 70,55

Property of Heap

  1. The tree is completely filled on all levels except the lowest level which gets filled from left up to a point. So, it can be viewed as the nearly complete binary tree.
  2. Both heaps(min and max) satisfy a heap property. In max-heap, the key of each node is larger than the key of its children nodes. A min-heap is organized the opposite way.
  3. Smallest element in min-heap is at the root. Largest element in max-heap is at the root.
  4. Heap is NOT a binary search tree (it is a binary tree though ). So binary search doesn't work on the heap.

Heap Implementation

  1. Binary Tree: The obvious implementation of heap is binary tree (evident from above diagram). Each key will get stored in node with two pointers to its children. Height of heap is the height of root node; and height of a node is number of edges on the longest downward path from node to the leaf. So height of above trees are 3 ( lg8 ).
  2. Array: Nearly complete binary tree in case of heap enables it to represent it without any pointers. Data can be stored as an array of keys, and use the position of the keys to implicitly satisfy the role of the pointers. 
         
Node(i)left = 2* Node(i)  
Node(i)Right = 2*Node(i)+1


       As shown above, root of the tree is stored at the first position of the array, and its left and the right children in the second and third positions, respectively. Assumed that array starts with index 1 to make calculations simpler. So left child of i sits in position and right child in 2i+1. This way you can easily move around without pointers.        

           int first_child(int i){
               return 2*i;
           }
           
           int parent(int i){
              if(i == 1) return -1
              else return (int)i/2;
           } 

 Array representation of heap saves memory but is less flexible than tree implementation. Updating nodes in array implementation is not efficient. It helps to implement heap as array as its fully packed till last level. But it will not be so efficient for normal binary trees. 

Heap Construction

The fundamental point which needs to be taken care of is; the heap property should be retained during each new insertion. Let's take the case of the above min-heap. Now assume that you want to add one more item to the heap with key 5.

Step 1: Add 5 to the next slot. 
The new element will get added to the next available slot in the array. This  ensures the desired balanced shape of the heap-labeled tree, but heap property might no longer be valid.



Step 2: Swap Node 5 and 70 to satisfy the min-heap property. 
After swap min-heap property test passes for node 5. But min-heap property fails for the parent node of 5.



Step 3: Swap node 45 with 5.
Now node with key 5 is satisfied as its children have key values > 5.



Step 4: Swap root node with its left child.
Bingo. Heap is perfect now.




Insertion Complexity: Swap process takes constant time at each level. So each insertion takes at most O(log n) time.

Creation Complexity: Each insertion takes O(log n) time; then creation (or insertion of n nodes will take O(n log n) time.


Extracting Minimum from min-heap

Minimum of a min-heap sits at first position; so finding first minimum value is easy. Now the tricky part is how to find next minimum element. 

To find the next minimum; let's remove the first element. After this to make sure that binary tree is maintained, let's copy the last element to the first position. This is required to ensure that the tree is filled at all levels except the lowest level. But after this, the heap property might go for a toss as now a bigger element may sit at the root. Now the heap should re-arrange itself so that the min-heap property satisfies. 

Heapify:  This is a process in which a dissatisfied element bubbles-down to appropriate position. To achieve this the dominant child should be swapped with the root to push down the problem to the next level. This should be continued further till the last level. 
The heapify operation for each element takes lgn steps (or height of tree) so complexity is O(log n) time. 

Java Implementation: link

Heapsort

Exchanging the minimum element with the last element and calling heapify repeatedly gives an O( n logn) sorting algorithm, named as Heapsort. Worst case complexity is O( n logn ) time, which is best that can be expected from any sorting algorithm. It is an in-place sorting i.e. uses no extra memory 




Saturday, May 4, 2013

Class Loading and Unloading in JVM

Class Loader is one of the major component (sub system) of the Java Virtual Machine(JVM). This posts talks about loading and unloading of a class (or interface) into virtual machine. JVM specification refers to classes and interfaces as type.

A Java program is composed of many individual class files, unlike C/C++ which has single executable file. Each of these individual class files correspond to a single Java class and it gets loaded the first time you create an object from the class or the first time you access a static component (block, field or method) of the class.

A class loader's basic objective is to service a request for a class. JVM needs a class (ofcourse for instantiating it), so it asks the class loader by passing full name of the class. And in return class loader gives back a Class object representing the class(more detail here). Below diagram shows sequence of steps:



Class Loading

In the JVM architecture tutorial (link), I have discussed class loader subsystem briefly. JVM has a flexible class loader architecture that enables a Java application to load classes in custom ways. Each JVM has at 
least two class loaders. Let's cover both of them:
  • Bootstrap class loader : Part of JVM implementation and it is used to load the Java API classes only. It "bootstraps" the JVM and is also known as primordial, system or default class loader.
  • User-defined class loaders : There could be multiple user-defined class loaders. Class loaders are like normal Java classes, so application can install user-defined class loaders to load classes in custom way. You can dynamically extend Java application at run time with the help of user-defined class loaders.
JVM keeps track of which class loader loaded a given class. Classes can only see other classes loaded by the same class loader (for security concerns). Java architecture achieves this by maintaining name-space inside a Java application. Each class loader in a running Java application has its own name-space and class inside one class loader can't access a class from another class loader unless the application explicitly permits this access. This name-space restriction means :
  • You can load only ONE class named as say Fruit in a given name-space. 
  • But you can create multiple name-space by creating multiple class loaders in the same application. So, you can load three Fruit classes in three different class loaders. And all these Fruit classes will be unaware of the presence of rest two.
Now, obvious question is, how these multiple class loaders in the same application load classes ?
Class loaders use Parent-delegation technique to load classes. Each class loader except bootstrap class loader has a parent. Class loaders asks its parent to load a particular class. This delegation continues all the way to the bootstrap class loader, which is the last class loader in the chain. If the parent class loader can load a type, the class loader returns that type. Otherwise current class loader attempts to load the class itself. This approach ensures that you can't load your own String class, because request to load a new String class will always lead to the System/bootstrap class loader which is responsible for loading Java API classes. Also classes from Java API get loaded only when there is a request for one. 

What if, I create my own class in Java.util package ?
Java gives special privileges to classes in the same package. So does it mean my class say Jerk inside Java.util package will get special access and security will get compromised ?  
Java only grants this special access to class in the same package which gets loaded by the same class loader. So your Jerk class which would have got loaded by an user-class loader will NOT get access to java.lang classes of Java API. So code found on class path by the class path class loader can't gain access to package-visible members of the Java API. 

Class Unloading

Lifetime of a class is similar to the lifetime of an object. As JVM may garbage collects the objects after they are no longer referenced by the program. Similarly virtual machine can optionally unload the classes after they are no longer referenced by the program. 

Java program can be dynamically extended at run time by loading new types through user-defined class loaders. All these loaded types occupy space in the method area. So just like normal heaps the memory footprints of method areas grows and hence types can be freed as well by unloading if they are no longer needed. If the application has no references to a given type, then the type can be unloaded or garbage collected (like heap memory).

Until Java SE 7; classes (and Constant pool) were stored in Permanent Generation (or PermGen).  Tuning PermGen size was complicated so Java SE 8 has completely removed PermGen and now classes (Java HotSpot VM representation of class) are moved into native or heap memory, link.

Types (Class/Interface) loaded through the bootstrap loader will always be reachable and will never be unloaded. Only types which are loaded by user-defined class loaders can become unreachable and hence can be unloaded by JVM. A class instance can be reachable in following case:
  • Class instance will be reachable if the application holds an explicit reference to the instance. 
  • Class instance will be reachable if there is a reachable object on the heap whose type data in the method area refers to the Class instance. 

References:
http://www.artima.com/insidejvm/ed2/jvm5.html 

Related Articles:
how-garbage-collection-works
jvm-architecture
life-cycle-of-object-in-java

---

calling it a post; do give your feedback!!!

Wednesday, May 1, 2013

Life Cycle of an Object in Java

In Java, objects play the pivotal role; so understanding its instantiation, and how and when it gets garbage collected is important. This post covers the life cycle of objects inside Java Virtual Machine. But keep in mind that, successful creation of the object is possible only if class is already loaded in JVM (by class loader).


Object Creation 

Class instantiation activity gives life to an object and thereafter you can call its methods to manipulate states/attributes. Object creation is done by Java Virtual Machine(after class has been loaded). When JVM creates a new instance of a class, it allocates memory on the heap to hold the object's instance. Memory is allocated for all variables/attributes declared in object's class and in all its superclasses (including hidden attributes). Once the virtual machine has allocated memory for the new object, the machine initializes the instance variable to default initial values. After this VM gives proper initial value depending on how the object gets created:

   1.  new keyword
        Person p = new Person();
        String s = new String("life");
            This is one of the most frequently used approach to create an instance of a class. 

       2.  newInstance() on a Class / Reflection
            Class class = Class.forName("fully.qualified.class.name");
            Object obj = class.newInstance();
            Person p = (Person)obj;
             

       3. Cloning         
           Person pClone = (Person)p.clone(); 

           Make sure that, Person class implements Cloneable interface and also provides the definition of the clone() method. 

       4. Deserialization      
           FileInputStream fis = new FileInputStream("person.ser");
           ObjectInputStream ois = new ObjectInputStream(fis);
           Person per =(Person)ois.readObject(); 
          
           Deserialization is a technique to convert the encoded byte stream into the corresponding object as 
           shown above. Refer to the detailed tutorial.

       5. Implicit Instantiation
           All above techniques explicitly creates an object of a class. Apart from above, there are multiple implicit ways of object creation as well :
              
           1. Class loading : For every type that a JVM loads, it implicitly creates a new Class object to                  represent that type.
           2. String Literals : When JVM loads a class which has String literals (i.e. String PI = "3.14"), 
               it might create a new String object. 
           3. Expression Evaluation : Process of evaluation an expression like String concatenation
               can also create new objects. 
                   
     So above explicit/implicit object instantiation techniques give proper initial value to the object. JVM uses one of the three techniques to do this task, depending on how the object is being created.

    If the object is being created because of a clone() invocation, the virtual machine copies the values of the instance variables of the object being cloned into the new object. If the object is deserialized via a readObject() invocation on an ObjectInputStream, the virtual machine initializes non-transient instance object variables from values read from the input stream. Otherwise, the virtual machine invokes an instance initialization method on the object. The instance initialization method initializes the object's instance variables to their proper initial value. 
    The Java compiler generates at least one instance initialization method for every class it compiles. In the  Java class file, the instance initialization method is called "<init>". For each constructor in the source code of a class, the Java compiler generates one <init>() method.  

    Object in action

    Once object is created successfully, its ready for the real action.
    obj.haveFun();

    Garbage Collection of Object / End of life

    Application can allocate memory for object via explicit and implicit ways described under object creation, but they can not explicitly free that memory. To signal to JVM that an object is ready for garbage collection, object needs to get unrefrenced in either of below ways :

    1. Explicit unreferencing
        person = null;  //person is an instance of Person   

    2. Object goes out of scope
        An Object created inside a method goes out of scope once method returns to the calling method.
        So in below case once method() is over, p implicitly becomes null. 

        public void method(){
           Person p = new Person();
           //method action
        } 

       After unrefrencing, VM can reclaim that memory. Virtual machine can decide when to garbage collect unrefrenced object. But specification doesn't guarantee any predictable behavior. It is up-to the VM to decide when to reclaim memory from unrefrenced object or it might NOT reclaim memory at all.
    If the class declares a finalizer (i.e. public void finalize() method ) then Garbage Collector will execute the finalize() method on the instance of the class before it frees the memory space occupied by that instance. So clearly, exact time of garbage collection is unpredictable.