Threaded Code

Appeared in Volume 6/1, February 1993

Keywords: threads.

boortz@sics.se
Kent Boortz
23rd August 1992

Serge Le Huitouze writes:
I would like to know about "threaded code" implementations of Prolog.

This question is probably about how to write efficient emulators. The Prolog code in many implementations is compiled to "byte code". This is a sequence of data that is read at runtime by an emulator that performs the actions, i.e. executes the byte code. In an emulator written in C, like in SICStus, the byte codes are decoded in a switch statement like:

emulate:
  switch (*(pc++))
    {
    case GET_STRUCTURE:
      /* code to do a get_structure */
      goto emulate;
      .
      .
    case PROCEED:
      /* code to do a proceed */
      goto emulate;
    }
}

In pseudo assembler it could be compiled to:

r1 = jmparray;
*(r1 + 0) = get_structure;
 .
 .
*(r1 + n) = proceed;

emulate:
{
  r0 = *pc;
  if (r0 < GET_STRUCTURE) goto exit;
  if (r0 > PROCEED) goto exit;
  r0 <<= 2;
  r2 = *(r1 + r0);
  jmp(r2);
}

get_structure:
  /* code to do a get_structure */
  goto emulate;

proceed:
  /* code to do a proceed */
  goto emulate;

exit:

If the C compiler supports threaded code you can take the addresses of the labels and store them in an array. Indirect threaded code:

jmparray[GET_STRUCTURE] = get_structure;
 .
 .
jmparray[PROCEED] = proceed;

emulate:
  goto jmparray[*pc];
get_structure:
  /* code to do a get_structure */
  goto jmparray[*pc];

proceed:
  /* code to do a proceed */
  goto jmparray[*pc];

Note that the emulation code is placed at the bottom of the code for each byte instruction. This way the entry "emulate" is used only once and the jump to it is removed. In pseudocode:

r1 = jmparray;
*(r1 + 0) = &get_structure;
 .
 .
*(r1 + n) = &proceed;

emulate:
{
  r0 = *pc;
  r0 <<= 2;
  r2 = *(r1 + r0);
  jmp(r2);
}

get_structure:
  /* code to do a get_structure */
{
  r0 = *pc;
  r0 <<= 2;
  r2 = *(r1 + r0);
  jmp(r2);
}

proceed:
  /* code to do a proceed */
{
  r0 = *pc;
  r0 <<= 2;
  r2 = *(r1 + r0);
  jmp(r2);
}

Finally the predefined bytecodes could be replaced by the address to the labels at load time. This is "direct threaded code", in pseudocode:

emulate:
{
  r0 = *pc;
  jmp(r0);
}

get_structure:
  /* code to do a get_structure */
{
  r0 = *pc;
  jmp(r0);
}

proceed:
  /* code to do a proceed */
{
  r0 = *pc;
  jmp(r0);
}

A clever C compiler could compile the switch code and indirect threaded code to eqivalent code. The main problem is the range check of the argument to the switch. I think that if you use an enumerated type as the value to switch on, and all the values are covered in the cases the test should be removed. I don't think any C compilers do this. Note that each test is really at least two instructions, a compare and a jmp instruction.

The direct threaded code uses less code but instead each bytecode/jmpaddress requires a 32 bit field in the program area. The fetch could be slower and more memory consumed by the Prolog programs could lead to more frequent paging.

The only C compiler I know of that support threaded code is GCC 2.X.

No Prev