Monday, May 01, 2006

Java Security Basics

Java Security Series Part 1

Java takes a holistic approach to security and handles it at two basic levels:

  1. Language construct level
  2. Runtime

Language Syntax Security Features

Java's design draws on the problems encountered with languages such as C and C++: many features of those languages were considered unsafe and were left out. Those features not only add to the complexity of a language but are also a common source of programmer errors. Some of the language features that make Java secure are -

  1. Single Implementation Inheritance
  2. Strong Type checking
  3. No support for pointers
  4. Array bounds checking
  5. Other miscellaneous features such as safety against use of uninitialized variables

Because of this, programmers have far fewer opportunities to make blunders. The compiler makes sure that the bytecode generated from the Java source is safe. However, since the bytecode format is standardized in the JVM specification, an informed attacker can fairly easily handcraft bytecode sequences that no compiler would emit. This situation is taken care of later, during the class loading process, when the loaded bytecode goes through Byte Code Verification in the JVM.
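To make two of the listed features concrete, here is a minimal sketch of array bounds checking and strong type checking at run time. The class and method names are made up for this illustration.

```java
public class LanguageSafetyDemo {

    /** Returns true if reading one element past the end of an array is rejected. */
    static boolean outOfBoundsIsRejected() {
        int[] values = new int[3];          // valid indexes are 0, 1 and 2
        try {
            int ignored = values[3];        // in C this would silently read adjacent memory
            return false;
        } catch (ArrayIndexOutOfBoundsException expected) {
            return true;
        }
    }

    /** Returns true if casting a String reference to Integer is rejected. */
    static boolean badCastIsRejected() {
        Object text = "not a number";
        try {
            Integer number = (Integer) text; // compiles, but is checked when executed
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("out-of-bounds read rejected: " + outOfBoundsIsRejected());
        System.out.println("bad cast rejected: " + badCastIsRejected());
    }
}
```

Both checks fail with an exception rather than corrupting memory, which is the point of the list above.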

Runtime Security Features

The runtime facilities kick in as soon as the JVM is started and asked to execute a Java application. Some of the key contributors to a secure environment are -

  1. ClassLoaders
  2. ByteCode Verification Process
  3. Security Framework

Class Loaders

When the JVM is instructed to start a Java application, one of the first things it does is create the "System Classloader". This system class loader is then responsible for loading the byte code of the application's Java class file so that it can be executed. There are typically three class loaders that get installed when the JVM starts up. These are -

  1. Bootstrap Class loader - Since class loaders are themselves classes, we run into the classic chicken-and-egg problem: who loads the class loader classes? This is solved by a class loader implemented in native code inside the JVM. It is also called the Null or Primordial class loader, and it is responsible for loading the Java system classes from rt.jar in the jre/lib directory.
  2. Extensions Class loader - This is the child of the Bootstrap class loader. It is basically a secure URL class loader responsible for loading classes from the jre/lib/ext directory.
  3. System Class loader - This is the class loader responsible for acting on the CLASSPATH environment variable. It is also a secure URL class loader and is the child of the Extensions class loader. Though it is called the System Classloader, it actually loads application classes.
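The chain of loaders can be inspected from a running program. The following is a small sketch (class and method names are illustrative) that walks upward from the system class loader. The bootstrap loader is represented by null in the API, so the sketch appends a placeholder name for it. The concrete loader class names printed depend on the JVM version; newer Java releases, for instance, have a platform class loader in the position of the extensions class loader.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChainDemo {

    /** Collects the class names of a loader and all of its ancestors, child first. */
    static List<String> chainFrom(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader loader = start; loader != null; loader = loader.getParent()) {
            names.add(loader.getClass().getName());
        }
        names.add("<bootstrap>"); // getParent() reports the bootstrap loader as null
        return names;
    }

    public static void main(String[] args) {
        for (String name : chainFrom(ClassLoader.getSystemClassLoader())) {
            System.out.println(name);
        }
        // Core classes are defined by the bootstrap loader, reported as null.
        System.out.println("String loader: " + String.class.getClassLoader());
    }
}
```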

Class loader instances have a parent-child relationship at runtime, with the bootstrap class loader as the root. Below it, the extensions class loader and the system class loader form a linear chain. By creating user class loaders and arranging them in hierarchies, a very structured partitioning system can be formed, with each class loader forming a node in the tree and the classes it loads forming the leaves. The same class loaded by two different class loaders yields two different types, hence the partition.

Another benefit of this hierarchy is that it allows delegation when loading classes. When a class needs to be loaded, by default the system class loader is asked. The system class loader first delegates to its parent, the extensions class loader, which in turn delegates to the bootstrap class loader. If the bootstrap class loader cannot load the class (because it is not a system class), the extensions class loader attempts to load it, and only if that also fails does the system class loader attempt to load it. If a user class loader is being used, the same logic applies. If the user class loader is constructed using the default constructor, its parent is the system class loader; otherwise, the constructor takes the parent class loader as an argument.
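Delegation can be observed with a user class loader that adds no lookup logic of its own. In this sketch (UserLoader is a hypothetical name), a request for java.lang.String travels up the chain and is answered by the bootstrap class loader, so the result is the very same Class object the rest of the program uses.

```java
public class DelegationDemo {

    /** A user class loader that relies entirely on the inherited delegation. */
    static class UserLoader extends ClassLoader {
        UserLoader() {
            super();            // default constructor: parent is the system class loader
        }

        UserLoader(ClassLoader parent) {
            super(parent);      // explicit parent
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        UserLoader loader = new UserLoader();
        System.out.println("parent is system loader: "
                + (loader.getParent() == ClassLoader.getSystemClassLoader()));

        Class<?> viaUserLoader = loader.loadClass("java.lang.String");
        System.out.println("same Class object: " + (viaUserLoader == String.class));
        // null here means the class was defined by the bootstrap loader
        System.out.println("defining loader: " + viaUserLoader.getClassLoader());
    }
}
```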

The loadClass API is actually in the java.lang.ClassLoader class, and typically it is not overridden. Its implementation does the following -

  • Check whether the class has already been loaded and is present in the class cache. If it is, resolve it if resolution was requested, and return it.
  • Otherwise, delegate loading to the parent class loader.
  • If none of the parent class loaders can load the class, call findClass. findClass is supposed to locate the actual class data, define the class, and return the corresponding Class object.
  • If resolution was requested, resolve the class and return it.
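The steps above can be sketched as an override of loadClass. This is only an illustration of the order of operations, not the JDK source: it assumes a non-null parent, whereas the real implementation also handles a null parent by asking the bootstrap loader directly. The class name LoadClassSketch is made up.

```java
import java.util.Objects;

public class LoadClassSketch extends ClassLoader {

    public LoadClassSketch(ClassLoader parent) {
        super(Objects.requireNonNull(parent, "this sketch assumes a non-null parent"));
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Step 1: look in this loader's cache of already loaded classes.
            Class<?> type = findLoadedClass(name);
            if (type == null) {
                try {
                    // Step 2: delegate to the parent, which delegates further up.
                    type = getParent().loadClass(name);
                } catch (ClassNotFoundException notFoundByParents) {
                    // Step 3: no ancestor could load it, so try to find it ourselves.
                    type = findClass(name);
                }
            }
            // Step 4: link the class only if the caller asked for it.
            if (resolve) {
                resolveClass(type);
            }
            return type;
        }
    }
}
```

Since findClass is not overridden here, the inherited version simply throws ClassNotFoundException, so this loader can only return classes that one of its ancestors can load.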

Typically, user class loaders need only override the findClass method. The overridden findClass method should -

  • "find" the raw bytes that make up the bytecode of the class
  • "define" the class - the process of creating a "Class" object for the class from the raw bytecode and associating a protection domain with it (the security domain of the class - discussed later)
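As a rough illustration of "find" and "define", here is a sketch of a user class loader that reads class files from a directory. DirectoryClassLoader and its root parameter are hypothetical names. The four-argument defineClass used here assigns a default protection domain; ClassLoader also offers an overload that accepts an explicit ProtectionDomain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DirectoryClassLoader extends ClassLoader {

    private final Path root; // directory containing compiled .class files

    public DirectoryClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // "find": map com.example.Foo to <root>/com/example/Foo.class and read it
        Path file = root.resolve(name.replace('.', '/') + ".class");
        byte[] bytes;
        try {
            bytes = Files.readAllBytes(file);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
        // "define": turn the raw bytes into a Class object owned by this loader
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

Because only findClass is overridden, the inherited loadClass still delegates to the parent first, and this method runs only for classes that no ancestor could load.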

A Class object encapsulates the entire RTTI (run-time type information). My understanding here is that it encapsulates both the Reflection information (the means to find the capabilities of a class) and the Introspection information (the means to find the type identity of the class, such as the type name and inheritance structure used by the instanceof operator). If the same class is loaded by different class loaders, the RTTI will not be the same, and hence these will be two different types. In Java, therefore, the type of an object at runtime is determined by both its class and the class loader that defined that class.
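The effect of the class loader on type identity can be demonstrated by defining the same bytes in two loaders. To stay self-contained, this sketch hand-assembles the class file of an empty class instead of reading one from disk; the class name demo.generated.Hello and all helper names are invented for the example, and the byte layout follows the class file structure of the JVM specification.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TypeIdentityDemo {

    static final String NAME = "demo.generated.Hello"; // hypothetical class name

    /** Builds the class file of an empty public class that extends Object. */
    static byte[] emptyClassBytes(String className) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                  // magic number
        out.writeShort(0);                         // minor version
        out.writeShort(49);                        // major version 49 (Java 5 format)
        out.writeShort(5);                         // constant pool count (4 entries + 1)
        out.writeByte(7); out.writeShort(2);       // #1 Class, name at #2
        out.writeByte(1); out.writeUTF(className.replace('.', '/')); // #2 Utf8
        out.writeByte(7); out.writeShort(4);       // #3 Class, name at #4
        out.writeByte(1); out.writeUTF("java/lang/Object");          // #4 Utf8
        out.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                         // this_class  = #1
        out.writeShort(3);                         // super_class = #3
        out.writeShort(0);                         // no interfaces
        out.writeShort(0);                         // no fields
        out.writeShort(0);                         // no methods
        out.writeShort(0);                         // no attributes
        return buffer.toByteArray();
    }

    /** Defines NAME itself; everything else is left to the parent loader. */
    static class IsolatedLoader extends ClassLoader {
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!NAME.equals(name)) {
                throw new ClassNotFoundException(name);
            }
            try {
                byte[] bytes = emptyClassBytes(name);
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> first = new IsolatedLoader().loadClass(NAME);
        Class<?> second = new IsolatedLoader().loadClass(NAME);
        System.out.println("same name: " + first.getName().equals(second.getName()));
        System.out.println("same type: " + (first == second));
        System.out.println("assignable: " + first.isAssignableFrom(second));
    }
}
```

The two Class objects share a name but are distinct, unrelated types because each was defined by a different loader instance.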

Now, before this class can be used and executed, it needs to be "resolved", that is, linked: the process of taking the Class and incorporating it into the runtime state of the JVM so that it can be executed. This process basically involves three steps -

  • Byte Code Verification - making sure that the byte code does not break any JLS guarantees.
  • Preparation - allocation of all the static storage, such as class variables, and v-table initialization. The static storage allocated is initialized to default values. However, please note that static initializers are not run, as we are still not in a position to execute anything.
  • Symbol resolution - checking the referenced symbols and loading the classes and interfaces they refer to. The JVM specification is not too strict about when this happens. The referenced symbols can be resolved statically right at this point, or more lazily at runtime. The strategy depends on the implementation.
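The default values set up during preparation can be observed from Java code. In this small sketch (all names are illustrative), the first static initializer reads a field whose own initializer has not run yet, so it sees the default value assigned at preparation time.

```java
public class PreparationDemo {

    static class Settings {
        // Initializers run in textual order, so this one runs first...
        static int snapshot = peek();
        // ...and this one runs second.
        static int limit = 42;

        // Reads 'limit' before its initializer has executed.
        static int peek() {
            return limit;
        }
    }

    public static void main(String[] args) {
        System.out.println("snapshot = " + Settings.snapshot); // prints "snapshot = 0"
        System.out.println("limit = " + Settings.limit);       // prints "limit = 42"
    }
}
```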

Finally, before any code can be executed, the class needs to be initialized. This step runs all the static field initializers and static initializer blocks. Any super classes need to be loaded, verified, prepared, resolved and initialized at this time. An interesting point here is that the interfaces implemented by the class, or the super interfaces of an interface, need not be initialized. This is because interface fields are implicitly public static final and are typically compile-time constants. In any case, a class or interface T must be initialized before the first occurrence of any of the following -

  • T is a class and an instance of T is created.
  • T is a class and a static method declared by T is invoked.
  • A static field declared by T is assigned.
  • A static field declared by T is used and the field is not a constant variable.
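These triggers can be watched in a small experiment. The sketch below (all names are hypothetical) records an event whenever a static initializer runs. It assumes the interface declares no default methods, since interfaces that declare them are initialized together with the implementing class in newer Java versions.

```java
import java.util.ArrayList;
import java.util.List;

public class InitTriggerDemo {

    /** Initialization events, in the order they happened. */
    static final List<String> EVENTS = new ArrayList<>();

    static int record(String event) {
        EVENTS.add(event);
        return EVENTS.size();
    }

    interface Marker {
        int STAMP = record("Marker initialized"); // not a compile-time constant
    }

    static class Widget implements Marker {
        static final int LIMIT = 7;   // constant variable, inlined by the compiler
        static int created = 0;       // ordinary static field
        static {
            record("Widget initialized");
        }
    }

    public static void main(String[] args) {
        int limit = Widget.LIMIT;     // constant read: does not initialize Widget
        Class<?> type = Widget.class; // class literal: does not initialize Widget
        System.out.println("after constant read: " + EVENTS);

        int created = Widget.created; // non-constant static read: initializes Widget
        System.out.println("after field read: " + EVENTS);

        int stamp = Marker.STAMP;     // only now is the interface initialized
        System.out.println("after interface field read: " + EVENTS);
    }
}
```

Reading the constant and using the class literal leave the event list empty; Widget is initialized only by the read of the non-constant field, and Marker only when its own field is read.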

After this point, the class is ready for execution.