Tuesday, December 31, 2013

Java Bytecode or class file

This post, I will be focusing on the content of Java's class file, known as bytecodes.  Java Virtual Machine uses the stream of bytecode from the class file for executing the program. As a Java programmer one doesn't need to bother about internal structure and format of bytecodes at all, but it's worth knowing how it is organized under the hood.  

Reading Bytecodes

Before reading bytecodes, let's generate one first. I am taking a simple HelloWorld example for this. 

public class HelloWorld{
public static void main(String[] args){
System.out.println("Hello, World!");
}
}

Save above class in your editor to compile it (or compile manually through command prompt). Locate the generated class file and open the same in the text editor. Shown in the below screenshot :


Some of the text does look familiar but it doesn't make sense at all. In fact, it doesn't look like bytecode of the HelloWorld.java. It will make more sense if it had numbers( binary/hex). Even if you use the FileReader API of java to read the class file; the result will be the same. You will neither see the stream of bytes nor code mnemonics

Are we missing something ? Yes.
Class file consists of stream of bytecodes. So we need to read the file as an array of the byte as shown in below class.


import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import javax.xml.bind.DatatypeConverter;

/**
 * Utility to read the contents of class file as byte stream
 * 
 * @author Siddheshwar
 * 
 */
public class ReadClassAsByteStream {

 public static byte[] readFileAsByteArray(String fle) throws IOException {
  RandomAccessFile f = new RandomAccessFile(new File(fle), "r");

  try {
   int length = (int) f.length();
   byte[] data = new byte[length];
   f.readFully(data);
   return data;
  } finally {
   f.close();
  }
 }

 // test method
 public static void main(String[] args) throws IOException {
  String file = "D://workspace/JavaSample/src/HelloWorld.class";
  byte[] b = null;

  try {
   b = readFileAsByteArray(file);
   // convert byte array to Hex
   System.out.println(DatatypeConverter.printHexBinary(b));
  } catch (IOException e) {
   e.printStackTrace();
  }
 }
}

Output:
CAFEBABE00000033001D0A0006000F09001000110800120A001300140700150700160100063C696E69743E010003282956010004436F646501000F4C696E654E756D6265725461626C650100046D61696E010016285B4C6A6176612F6C616E672F537472696E673B295601000A536F7572636546696C6501000F48656C6C6F576F726C642E6A6176610C000700080700170C0018001901000C48656C6C6F20576F726C642107001A0C001B001C01000A48656C6C6F576F726C640100106A6176612F6C616E672F4F626A6563740100106A6176612F6C616E672F53797374656D0100036F75740100154C6A6176612F696F2F5072696E7453747265616D3B0100136A6176612F696F2F5072696E7453747265616D0100077072696E746C6E010015284C6A6176612F6C616E672F537472696E673B2956002100050006000000000002000100070008000100090000001D00010001000000052AB70001B100000001000A000000060001000000010009000B000C00010009000000250002000100000009B200021203B60004B100000001000A0000000A000200000003000800040001000D00000002000E

Bingo !!! Now, this looks like what we were looking for.
In the main method, I have used DatatypeConverter API to convert the byte array into hex format to print the content in more compact format. 

What is Bytecode

Bytecode is a series of instructions for the Java Virtual Machine and it gets stored in the method area (of JVM). Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the action to be taken by JVM. The number of opcodes is quite small  (<256) and hence one byte is enough to represent opcodes. This helps to keep the size of the class file compact.

Important Observations on Generated Bytecode

  1. Java class file is a binary stream of byte. These bytes are stored sequentially in class file, without any padding between adjacent items. 
  2. The absence of padding ensures that class file is compact and hence can be quickly transferred over the network. 
  3. Items which occupy more than one byte are split into multiple consecutive bytes in big-endian style (higher bytes first ).
  4. Notice that, the first four bytes are "CAFEBABE" -known as the magic number. The magic number makes the non-Java class file easier to identify. If the class file doesn't start with this magic number then it's definitely not a Java class file. 
  5. The second four bytes of the class file contain the minor and major version numbers. 

Mnemonics Representation of Bytecode

Bytecode of HelloWorld program can be even represented as mnemonics in a typical assembly language style. Java provides class file disassember utility named as javap for doing it. javap utility provides multiple options for printing the content of class file. You can also use ASM Eclipse plugin to see disassembled bytecode of a class in Eclipse.

D:\workspace\JavaSample\src>javap -c HelloWorld.class
Compiled from "HelloWorld.java"
public class HelloWorld {
  public HelloWorld();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3                  // String Hello World!
       5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}

No comments:

Post a Comment