Compilation phases
The compiler makes 3 passes for a full compilation:
- precompile (applies to both the compiler and interpreter)
- p1
- p2
The precompiler
The precompiler performs a number of semantically equivalent source code transforms to reduce the number of permutations the compiler has to accommodate. The list below is by no means exhaustive, but these are the ones I have found so far:
- expansion of multiple SETF pairs into separate SETF statements
- the same expansion for multiple SETQ pairs
- expansion of lambda expressions in the operator position into explicit FUNCALLs
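The three transforms above can be sketched as a recursive rewrite over s-expressions. This is a minimal illustration, not the precompiler's actual code. It assumes a hypothetical representation in which symbols are upper-case Python strings and lists are Python lists. The function name `precompile` is also hypothetical. Multiple SETF/SETQ pairs are wrapped in a PROGN because PROGN, like a multi-pair SETF, returns the value of the last assignment.

```python
from typing import Any

Form = Any  # hypothetical: a symbol (upper-case str), a literal, or a list of forms


def precompile(form: Form) -> Form:
    """Apply the SETF/SETQ pair expansion and the lambda-to-FUNCALL rewrite, recursively."""
    if not isinstance(form, list) or not form:
        return form
    op, *args = form
    if op == "QUOTE":
        return form  # quoted data is not code; leave it untouched
    # Rewrite subforms first so nested SETF/SETQ and lambda calls are handled too.
    args = [precompile(a) for a in args]
    if op in ("SETF", "SETQ") and len(args) > 2:
        if len(args) % 2:
            raise ValueError(f"odd number of arguments to {op}")
        # (setf a 1 b 2) -> (progn (setf a 1) (setf b 2))
        pairs = [[op, args[i], args[i + 1]] for i in range(0, len(args), 2)]
        return ["PROGN", *pairs]
    if isinstance(op, list) and op and op[0] == "LAMBDA":
        # ((lambda (x) ...) 1) -> (funcall (lambda (x) ...) 1)
        return ["FUNCALL", precompile(op), *args]
    return [precompile(op) if isinstance(op, list) else op, *args]
```

For example, `precompile(["SETF", "A", 1, "B", 2])` yields `["PROGN", ["SETF", "A", 1], ["SETF", "B", 2]]`. A two-argument SETQ is left as it is.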
Compilation phase 1: p1
This phase analyzes the code. It records which variables are created where and what their visibility is, which closures get created and what their arguments are, and whether any variables are special.
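The closure part of this analysis can be sketched as follows: for each LAMBDA, find the variables it references that are bound by an enclosing LET or LAMBDA. This is a simplified illustration, not p1 itself, and special variables are ignored. It assumes a hypothetical s-expression representation (upper-case strings for symbols, Python lists for forms, two-element LET bindings). It also assumes the precompiler has already rewritten lambdas in operator position into FUNCALLs. The names `free_vars` and `closures` are hypothetical.

```python
def free_vars(form, bound=frozenset()):
    """Return the variable symbols referenced in FORM that are not bound inside it."""
    if isinstance(form, str):
        return set() if form in bound else {form}
    if not isinstance(form, list) or not form or form[0] == "QUOTE":
        return set()
    op = form[0]
    if op == "LAMBDA":
        inner = bound | set(form[1])
        return set().union(*(free_vars(f, inner) for f in form[2:]))
    if op == "LET":
        # LET init forms are evaluated in the outer scope.
        result = set().union(*(free_vars(init, bound) for _, init in form[1]))
        inner = bound | {var for var, _ in form[1]}
        return result.union(*(free_vars(f, inner) for f in form[2:]))
    # Ordinary call: the operator names a function, so only the arguments are scanned.
    return set().union(*(free_vars(f, bound) for f in form[1:]))


def closures(form, env=frozenset(), found=None):
    """Collect (lambda-form, captured-variables) for every LAMBDA inside FORM."""
    if found is None:
        found = []
    if not isinstance(form, list) or not form or form[0] == "QUOTE":
        return found
    op = form[0]
    if op == "LAMBDA":
        # Captured = referenced inside the lambda AND bound by an enclosing scope.
        found.append((form, free_vars(form) & env))
        env = env | set(form[1])
        body = form[2:]
    elif op == "LET":
        for _, init in form[1]:
            closures(init, env, found)
        env = env | {var for var, _ in form[1]}
        body = form[2:]
    else:
        body = form[1:]
    for sub in body:
        closures(sub, env, found)
    return found
```

For `(let ((x 1) (y 2)) (lambda (z) (+ x z w)))`, the lambda captures only `X`. `Z` is its own parameter, and `W` is not lexically bound anywhere, so it is not captured.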
Compilation phase 2: p2
This phase creates byte code output.
It does this by:
- Using the outcomes from p1
- Breaking down forms into small pieces
- Creating JVM 'instructions' from the small pieces
The generated instructions differ from byte code insofar as they are not stored in arrays of octets, but in an array (or list) of 'instruction' structures. The instructions generated by this phase are somewhat ad hoc and unoptimized.
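To illustrate the difference between instruction structures and octets, here is a minimal sketch of a p2-style code generator for a tiny, hypothetical integer-only subset (binary `+`, `-`, `*` on local variables and constants). It breaks a form into pieces and emits one symbolic instruction per piece. The `Instruction` class, the `push-int` pseudo-opcode and `compile_form` are assumptions made for illustration. The real compiler works on Lisp objects, not raw JVM ints.

```python
from dataclasses import dataclass

# Hypothetical mapping from Lisp operators to int arithmetic opcodes.
OPS = {"+": "iadd", "-": "isub", "*": "imul"}


@dataclass
class Instruction:
    opcode: str        # symbolic mnemonic such as "iload"; not yet an octet
    args: tuple = ()   # operands still in symbolic form (slots, constants, labels)


def compile_form(form, slots, code=None):
    """Append one Instruction per small piece of FORM and return the list."""
    if code is None:
        code = []
    if isinstance(form, int):
        # Generic constant push: a later resolver picks the concrete opcode.
        code.append(Instruction("push-int", (form,)))
    elif isinstance(form, str):
        code.append(Instruction("iload", (slots[form],)))
    else:
        op, left, right = form
        compile_form(left, slots, code)
        compile_form(right, slots, code)
        code.append(Instruction(OPS[op]))
    return code
```

Compiling `(+ a 1)` with `a` in local slot 0 yields `iload 0`, `push-int 1`, `iadd`. This is still a list of structures, with no byte encoding chosen yet.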
Byte code munging phase
This phase does byte code calculations. Among other things, it generates the applicable byte codes, although it currently makes only a half-hearted attempt at choosing the optimal ones. It consists of the following steps:
- Analyzing the byte code, deleting unused branches (jvm.lisp::OPTIMIZE-CODE)
- Optimizing the byte code by walking the byte codes and replacing inefficient sequences with more efficient ones
- Translating the instructions from the instructions array to an array of byte codes (octets). Infrastructure for this step can be found in compiler-pass2.lisp in the form of resolvers.
- Writing out a file according to the format specified for .class files
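The first two steps can be sketched over a list of symbolic instructions. The sketch drops code that follows an unconditional jump or return up to the next label, then removes a `goto` that targets the label immediately after it. This is an illustration of the general technique under assumed rules. It is not a transcription of jvm.lisp::OPTIMIZE-CODE. The tuple representation and the function names are hypothetical.

```python
# Instructions after which control never falls through to the next one.
UNCONDITIONAL = {"goto", "return", "areturn", "ireturn", "athrow"}


def remove_unreachable(code):
    """Drop instructions following an unconditional transfer until the next label."""
    out, reachable = [], True
    for ins in code:
        if ins[0] == "label":
            reachable = True  # a label may be a branch target, so assume it is reachable
        if reachable:
            out.append(ins)
        if ins[0] in UNCONDITIONAL:
            reachable = False
    return out


def drop_jumps_to_next(code):
    """Peephole: 'goto L' directly followed by 'label L' jumps nowhere; remove it."""
    out = []
    for i, ins in enumerate(code):
        nxt = code[i + 1] if i + 1 < len(code) else None
        if ins[0] == "goto" and nxt == ("label", ins[1]):
            continue
        out.append(ins)
    return out
```

Running `remove_unreachable` first exposes more `goto`/`label` pairs for the peephole pass. A real optimizer would typically repeat such passes until nothing changes.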
Resolvers
The third step in byte code munging resolves byte codes to applicable byte codes. In this step, similar byte codes, such as bipush and sipush, are treated as one, and the resolver generates the actual applicable byte code.
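A resolver for a generic "push int constant" can be sketched as a function that picks the concrete opcode from the operand's range. bipush takes a signed byte, sipush takes a signed 16-bit operand, and anything larger has to be loaded from the constant pool with ldc or ldc_w. The opcode values are from the JVM specification. The function name and the list-based constant pool are hypothetical simplifications: a real pool holds typed entries, and long/double entries take two slots.

```python
import struct

BIPUSH, SIPUSH, LDC, LDC_W = 0x10, 0x11, 0x12, 0x13


def resolve_push_int(value, constant_pool):
    """Resolve a generic 'push int constant' into concrete JVM octets."""
    if -128 <= value <= 127:
        return bytes([BIPUSH]) + struct.pack(">b", value)   # signed byte operand
    if -32768 <= value <= 32767:
        return bytes([SIPUSH]) + struct.pack(">h", value)   # signed big-endian short
    # Outside the short range: the int must come from the (simplified) constant pool.
    if value not in constant_pool:
        constant_pool.append(value)
    index = constant_pool.index(value) + 1  # constant-pool indices start at 1
    if index <= 0xFF:
        return bytes([LDC, index])
    return bytes([LDC_W]) + struct.pack(">H", index)
```

For example, 5 resolves to `bipush 5` (`10 05`) and 1000 resolves to `sipush 1000` (`11 03 e8`). 100000 needs a constant-pool entry.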
The attempt at generating good byte code is half-hearted because short forms such as iconst_1 and aload_0, which encode their operand in the opcode itself, are not used, even where they would give more compact output.
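The following sketch shows what using the short forms could look like. It assumes resolvers that prefer the one-byte iconst_<n> and aload_<n> opcodes and fall back to the longer encodings only when needed. The opcode values are from the JVM specification (iconst_m1 through iconst_5 are 0x02 to 0x08, aload_0 through aload_3 are 0x2a to 0x2d, and wide is 0xc4). The function names are hypothetical, and this is not the resolver code in compiler-pass2.lisp.

```python
import struct

ICONST_0 = 0x03          # iconst_m1 is 0x02, iconst_5 is 0x08
BIPUSH, SIPUSH = 0x10, 0x11
ALOAD, ALOAD_0, WIDE = 0x19, 0x2A, 0xC4


def resolve_iconst(value):
    """Prefer one-byte iconst_<n> over two-byte bipush for -1..5."""
    if -1 <= value <= 5:
        return bytes([ICONST_0 + value])
    if -128 <= value <= 127:
        return bytes([BIPUSH]) + struct.pack(">b", value)
    if -32768 <= value <= 32767:
        return bytes([SIPUSH]) + struct.pack(">h", value)
    raise ValueError("constant outside short range needs ldc")


def resolve_aload(slot):
    """Prefer one-byte aload_<n> for slots 0..3, then aload, then wide aload."""
    if 0 <= slot <= 3:
        return bytes([ALOAD_0 + slot])
    if slot <= 0xFF:
        return bytes([ALOAD, slot])
    return bytes([WIDE, ALOAD]) + struct.pack(">H", slot)
```

With these short forms, pushing 1 costs one byte (`iconst_1`) instead of two (`bipush 1`), and loading local 0 costs one byte (`aload_0`) instead of two (`aload 0`).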