Wednesday, June 5, 2013

HashMap implementation in Java

Map is an associative array data structure used for mapping or associating value(s) with a key. We can insert key-value pairs in the map with the expectation that later on value can be looked up (or searched) given the key as the input. Insert(put) and search(get) are two most important operations on the map. In a normal array, keys are indices of the array and value is the object stored at the given index; whereas, in an associative array, keys could be of any type but still, we can quickly locate the value.

One of the primary and most important map implementations in Java is HashMap. This post focuses on its internals instead of focusing on API details (Java doc).

Building Blocks

Let's cover some of the important aspects of HashMap:

Array: HashMap internally stores all key-value pairs in an array. This array can be re-sized if required but its length should always be a power of two. The resizing happens when array is filled up to threshold level. 

    Entry[] table;  //Array in which key/value pair gets stored
    table = new Entry[capacity];  // Array creation based on the default/user supplied capacity

Entry: As shown above, HashMap stores key,value pair in an array of Entry. Entry stores key as well as value. Apart from this, it also stores hash value of key so that it doesn't need to calculate on every invocation. If multiple keys have same hashcode; it results in collisions.  To handle this scenario next pointer is used to chain them and form a linked list. 

   static class Entry<K,V>{
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
   }

Capacity: Capacity denotes the length of the array or number of buckets in the hash map. Initial capacity is specified at time of creation (default value is 16). You can specify any capacity initially but it should be power of 2.


    Map<String,String> map = new HashMap<String,String>(capacity);

Load Factor: Load factor is a measure of, how full the hash map is allowed to get before its capacity is automatically increased. Default load factor is 0.75. If you create a default hashMap (capacity is 16 and load factor is 0.75) then re-sizing will happen if the 12 items have been added to the map. So REHASHING happens when number of entries in the map exceeds the product of load factor and the current capacity.

Threashold: The next value at which resize will happen.  ( (int)capacity*loadFactor )

Size : The number of key-value mappings contained in this map. The resize of the array depends on this number. It gets checked after every add method call.

  if( size >= threashold){  //if size after last add call >= threashold then resize table
         resize(2*table.length);
    }

Introspection

Above attributes of HashMap are not exposed for public use through API. This means you can't check how does HashMap evolves as you keep on adding key/value pairs.

But does it mean that:
You can't check these attributes at all?
You can't check which index in the array a new item gets added?
Certainly NOT.

Reflection comes to rescue here (Reflection in Java). I have written a utility class through which you can monitor the evolution of HashMap through Reflection. 

    Map<String,Integer> map = new HashMap<String,Integer>();

HashMapSimul.java uses default hashMap; which means, capacity will be 16 and loadFactor will be 0.75. Below example maps fruit names to the position on which it will get added in Map. Value is index on which item will get added in Map. I have used actual methods from the API to get the index.

 import java.lang.reflect.Field;  
 import java.lang.reflect.InvocationTargetException;  
 import java.lang.reflect.Method;  
 import java.util.HashMap;  
 import java.util.Map; 

 public class HashMapSimul {  
      static Map<String, Integer> map;  
      // Actual Method from HashMap library  
      static int indexFor(int h, int length) {  
           return h & (length - 1);  
      }  
      // Actual Method from HashMap library  
      static int hash(int h) {  
           h ^= (h >>> 20) ^ (h >>> 12);  
           return h ^ (h >>> 7) ^ (h >>> 4);  
      }  
      public static void interrospect() {  
           try {  
                Class<?> c = Class.forName("java.util.HashMap");  
                Field f = c.getDeclaredField("size");  
                f.setAccessible(true);  
                int size = f.getInt(map);  
                f = c.getDeclaredField("loadFactor");  
                f.setAccessible(true);  
                float loadFactor = f.getFloat(map);  
                f = c.getDeclaredField("threshold");  
                f.setAccessible(true);  
                int threshold = f.getInt(map);  
                f = c.getDeclaredField("modCount");  
                f.setAccessible(true);  
                int modCount = f.getInt(map);  
                Method m = c.getDeclaredMethod("capacity", null);  
                m.setAccessible(true);  
                int capacity = (Integer) m.invoke(map, null);  
                System.out.println("Size: " + size + "; LoadFactor: " +loadFactor  
                      + "; Threashold: " + threshold + "; ModCount: " + modCount  
                      + " ; capacity :" + capacity);  
           } catch (ClassNotFoundException e) {  
                e.printStackTrace();  
           } catch (SecurityException e) {  
                e.printStackTrace();  
           } catch (NoSuchFieldException e) {  
                e.printStackTrace();  
           } catch (IllegalArgumentException e) {  
                e.printStackTrace();  
           } catch (IllegalAccessException e) {  
                e.printStackTrace();  
           } catch (NoSuchMethodException e) {  
                e.printStackTrace();  
           } catch (InvocationTargetException e) {  
                e.printStackTrace();  
           }  
      }  
      public static void main(String[] args) {  
           map = new HashMap<String, Integer>();  
           map.put("guava", indexFor(hash("guava".hashCode()), 16));  
           map.put("mango", indexFor(hash("mango".hashCode()), 16));  
           map.put("pear", indexFor(hash("pear".hashCode()), 16));  
           map.put("orange", indexFor(hash("orange".hashCode()), 16));  
           System.out.println(" Map :" + map);  
           interrospect();  
           System.out.println("-------------------------------");  
           map.put("banana", indexFor(hash("banana".hashCode()), 16));  
           System.out.println(" Map :" + map);  
           interrospect();  
      }  
 }

Output
 Map :{orange=1, guava=5, mango=9, pear=15}
Size: 4; LoadFactor: 0.75; Threashold: 12; ModCount: 4 ; capacity :16
-------------------------------------
 Map :{banana=1, orange=1, guava=5, mango=9, pear=15}
Size: 5; LoadFactor: 0.75; Threashold: 12; ModCount: 5 ; capacity :16

Mapping above item to Array of Entry


2 comments:

  1. Hello!

    If you want to find out HashMap indepth then cover my tutorial
    Internal life of HashMap in java

    ReplyDelete
    Replies
    1. Shared link won't work, don't visit the shared URL link

      Delete