Wednesday, March 27, 2013

Function signatures and parameters

One of the things that is bothering me for quite a while is the fact that currently functions do not have type information attached to them. The abs function for example looks like this:

inline function abs(a) -> (((a < 0)?-a:a))

This function is of type inline, which means that its invocation will be replaced by the code it is declaring. The parameters are essentially just placeholders. But what happens when you invoke it with a bit type?

bit<16> a=0xFFFF;
bit<16> b=abs(a);

Well, this will cause an error on the generated code. So those kind of errors can only be discovered after the method has been inlined and the error message that can be produced is rather confusing because the user never sees the generated code. This is what type information is for. In Java and actually most programming languages you need to declare what type a parameter has.

void bla(int a, uint8_t b)

Additionally to be able to produce error messages much earlier, it gives the user a much better idea of what he can put into that method. So why does PSHDL not have this feature? Well, there is a rather strange problem that the width of a type introduces. For example the max method might have the following signature:

uint max(uint a, uint b)

This looks quite intuitive, but what happens when we want to use signed ints? Ok, let's add a second method (and implementation)

uint max(uint a, uint b)
int max(int a, int b)

Fair enough, but what about the variable bit-width ints? Lets make a signature that shows that you want to have any width number.

uint max(uint a, uint b)
int max(int a, int b)
int<> max(int<> a, int<> b)
uint<> max(uint<> a, uint<> b)

Now we have a total of 4 methods that all have the same implementation. One simplification would be to say that uint, which is 32 bit can be cast to uint<> implicitly. After all, array indices are cast from uint<> to uint implicitly as well. So that would reduce the number to 2 methods. Another optimization we could do is to create a "super type" num for primitives that can be compared and used in equations. This would reduce the number of methods to one again:

num<> max(num<> a, num<> b)

The downside however is that num is not of much use outside of method parameters. Introducing it can cause the impression that it could be used elsewhere. The same holds true for the notation of having the diamond width <>. But I think that is something the user might understand. But now that you have the <> width, what happens when no width is specified? Does that means it requires the 32 bit type? This is just calling for disaster when users are forgetting it. Actually you always would want to have the <>. But if it always has to be there, why not leave it away? That would be confusing because the the uint in parameters means something different than in other source parts. Better leave the <> there and throw a warning if it is absent.

Another problem remains though. The width of a and b could be different, it would be nice to be able to constrain that. One example:

uint<T> max(uint<T> a, uint<T> b)

This would constrain the width of the arguments to be the same. Determining whether two width are equal is non trivial. For example the width of one could be 2*P while the other is P*2. From an compiler point of view this is nontrivial, and in reality the expression for a width is likely to be more complex. So lets ignore this idea as well. While it is unfortunate that you may have to write the same function 3 times for all primitive types: uint, int and bit, it is much clearer than any of the other ideas.

Thursday, March 21, 2013

How much code do you actually write?

Just today I was wondering how much code I wrote for the PSHDL compiler. The numbers are pretty interesting. The core code of PSHDL is split like this:

I was curious on how much of that code I actually wrote on my own. And this is the result:

So it can easily be seen that a lot of my code is generated. The PSHDL AST code is generated from my DSL (Domain specific language) model. The ANTLR part is the code generated by the parser generator that I need for parsing input files. The XTend Output on the other hand is generated from the 4282 lines of input code. So to compare the input and generated output:

What can we learn from this?

A lot of code in the PSHDL compiler is generated from a sparse description. This has distinct advantages for maintenance. For example the code for the AST model contains lots of boilerplate code for things like hashing, equals, getter, modifier, constructors and other things. Writing this code is not just boring, writing it on your own is also very error prone. This is why PSHDL features generators that can generate this boilerplate code for you. One of our IPCores has 325 lines of PSHDL code. From this we generate the following code:

In a rough estimate the ratio of lines of PSHDL to lines of VHDL code is about 3 when the generator is not used. That means for 100 lines of PSHDL you get 300 lines of VHDL code. Another interesting thing to observe is the fact that very specialized generators can have crazy ratios from input to output, whereas the ratio gets down the more general a language gets. The ratio of 1:3 for PSHDL is possible because PSHDL is designed for (FPGA) Synthesis (and a little test-benching).

Another thing to consider is that the generated code can either be more compact or less compact than hand written output code. For XTend I found that the generated code is about twice as long as I would write it by hand. So you can say that 1 line of XTend is as powerful as 1 line of Java. But what is more important here is the readability. The XTend code is far more precise and easy to understand than the Java code. The generated VHDL code however is probably slightly smaller than what you would write by hand. This is caused by the fact that in for example a state machine typically 3 processes are used, while PSHDL will have just 2 in most cases.

Thursday, March 14, 2013

Four neat examples of PSHDL in action

If you still don't know why you should use PSHDL, here are four neat examples of what PSHDL can do for you! Generally the amount of generated VHDL code is up to 5 times of the original PSHDL code. For generators however this ratio is even worse (or better depending on how you see it). If you want to see the generated VHDL code, just paste it into the online compiler at: pshdl.org.

First example: Registers

module Calculus {
    param uint WIDTH=8;
    in uint<WIDTH> a,b;
    out register uint<WIDTH> sum=a+b;
}

If you want to describe a register in VHDL, you will have to go trough some effort to actually describe how the register is supposed to work. This includes creating a process with proper clk and reset signals. In PSHDL a register is created by placing the register keyword in front of a variable. That variable then will be realized with a proper register description.

But where does the clock come from? Well, in PSHDL there is a special reference called $clk that, if it is referenced, creates a 1 bit in port. The default register, without any further refinement, uses this as the clock. Of course you are free to declare any port to become the default $clk or specify another signal as clock. You can also change on what edge the register is working, wether the reset is active low or high, synchronous or asynchronous, the reset value etc..

Another thing that is different from VHDL is that you can place your ports anywhere in the file. This allows you to place code together that logically belongs together. I can hear you think: But I actually like the port declaration at entity level! There are 2 solutions for your concern, firstly, you can simply place all your ports at the top of the file if you want to, secondly you can annotate your module so that the compiler automatically creates an interface for you that will sit atop the module. But more about that in a later blogpost.

The most important thing to remember about PSHDL is that it reverses the synthesis results from VHDL, it is easy to create a register, while it is not possible to create a latch (unless you explicitly tell the compiler to do so). To a seasoned VHDL coder, this may seem like a triviality, but be clear:
Unexpected latches are the number one reason student designs don't work.

Second example: Interfaces

The idea of a component, the declaration of ports without providing the actual implementation, is called interfaces. An interface can be declared and implemented by a module. It can also be instantiated.

module de.tuhh.ict.BitAdder {
    in bit a,b,cin;
    out bit sum=a^b^cin; 
    out bit cout=(a&b)^(cin&(a^b));
}

interface de.tuhh.ict.IBitAdder {
    in bit a,b,cin;
    out bit sum, cout;
}

module de.tuhh.ict.rippleCarry{
    param uint WIDTH=16;
    in uint<WIDTH> a, b;
    uint<WIDTH> cTemp;
    out register uint<WIDTH> sum;
    //Instantiate the other module
    BitAdder adder[WIDTH];
    //Uncomment to instantiate the interface 
    //(you have to make sure that the synthesis tool 
    //can find the actual implementation)
    // IBitAdder adder[WIDTH];

    adder[0].a=a{0};
    adder[0].b=b{0};
    adder[0].cin=0;
    sum{0}=adder[0].sum;
    cTemp{0}=adder[0].cout;

    for (I={1:WIDTH-1}){
        adder[I].a=a{I};
        adder[I].b=b{I};
        adder[I].cin=cTemp{I-1};
        cTemp{I}=adder[I].cout;
        sum{I}=adder[I].sum;
    }
}

If you are wondering what the curly braces are doing: They can be used to access bits. What you can also see is that it is easy to use arrays. You can create arrays of instances and variables and access them with the rectangular braces, just like in other C based languages.

But what is more interesting here is that you don't need a port map. You simply say: Instantiate me a BitAdder or as in this case, a whole lot of BitAdders. Then you can access the ports of it with the dot notation.

If you change the BitAdder instance to IBitAdder, it is your task to provide an implementation of it with the same name. This is how you can instantiate any VHDL, Verilog or other black box. For VHDL however you get tool support to automatically create the interface for you.

Third example: Statemachines

module de.tuhh.ict.Statemachine{
    enum States={IDLE, WAITING, DOSOMETHING}
    //Do not assign a default state here unless you want to 
    //define a new state in each case statement
    register enum States state;
    in bit a;
    out register bit b;
    out bit c;
    switch (state) {
    case States.IDLE:
        if (a)
            state=States.WAITING;
    //You can let the Enum. away in switch cases where 
    //switch operates on an enum
    case WAITING: 
        if (!a)
            state=DOSOMETHING;
    case States.DOSOMETHING:
        b=1;
        c=1;
        state=States.IDLE;
    default:
        state=States.IDLE;
    }

    out bit x=0;
    if (state==IDLE)
        x=1;
}

In PSHDL there is no such thing as processes. As such the clock domains are only separated by the definition of the register. This makes it possible to describe a mealy state-machine (one where the outputs depend directly on the inputs) without the need for multiple processes. This increases readability as everything that is related to the state-machine can be found in one place.

Fourth example: Generators

In most FPGA designs these days, you will find some kind of processor. To extend this processor IPCores are used to accelerate certain functions. PSHDL makes it very easy to get a proper IPCore.

package de.tuhh.ict;
module BUSTest{
    include Bus bus=generate plb()<[
        row input{
            rw register uint<16> a;
            rw register uint<16> b;
        }
        row output {
            fill;
            r register uint<16> result;
        }
        column adder {
            input;
            output;
        }
        memory {
            adder[4];
        }
    ]>;
    for (i={0:3}){
        bus.result[i]=bus.a[i]+bus.b[i];
    }
}

This little example here, generates the infrastructure for a PLB based IPCore. In case you are wondering, AXI and ABP are supported as well. This code generates a total of 8 registers. Using a memory map, a description on how the peripheral can be accessed, is a pretty advanced feature, but It allows PSHDL to generate some very useful support files like this HTML representation of the memory layout:

Offset3116150Row
adder [0]
0 [0x00]a [0]b [0]input [0]
4 [0x04]unusedresult [0]output [0]
adder [1]
8 [0x08]a [1]b [1]input [1]
12 [0x0c]unusedresult [1]output [1]
adder [2]
16 [0x10]a [2]b [2]input [2]
20 [0x14]unusedresult [2]output [2]
adder [3]
24 [0x18]a [3]b [3]input [3]
28 [0x1c]unusedresult [3]output [3]

The generated Interface looks like this:

interface Bus{
   in register uint<16> result[4];
   inout register uint<16> b[4];
   inout register uint<16> a[4];
}

And it also generates the C code that you need to access the SW registers:

//Typedef
typedef struct input {
    bus_uint16_t    a;
    bus_uint16_t    b;
} input_t;
// Setter
int setInputDirect(uint32_t *base, int index, bus_uint16_t a, bus_uint16_t b);
int setInput(uint32_t *base, int index, input_t *newVal);
//Getter
int getInputDirect(uint32_t *base, int index, bus_uint16_t *a, bus_uint16_t *b);
int getInput(uint32_t *base, int index, input_t *result);
//Typedef
typedef struct output {
    bus_uint16_t    result;
} output_t;
//Getter
int getOutputDirect(uint32_t *base, int index, bus_uint16_t *result);
int getOutput(uint32_t *base, int index, output_t *result);
typedef struct adder {
    input_t input;
    output_t output;
} adder_t;

And for printing it nicely to STDOUT:

void printInput(input_t *data);
void printOutput(output_t *data);

Additionally it will also create all the files that you need to place it into your local Xilinx IP directory and instantiate it directly.

Monday, March 11, 2013

Thoughts about functions

In the last few days I thought about the following problem:

int a[5]={1,2,3,4,5};
int sum=a[0]+a[1]+a[2]+a[3]+a[4]+a[5];

It is quite clear what this code is supposed to do. But it scales very badly with the size of a. In a regular sequential programming language, you would solve the problem with a regular for loop like this:

int a[5]={1,2,3,4,5};
int sum=0;
for (I={0:4})
    sum+=a[I];

This is even correct PSHDL syntax. But due to its parallel nature, it does not do what the developer is expecting it to do here. In fact, this is just short for:

int a[5]={1,2,3,4,5};
int sum=0;
sum+=a[0];
sum+=a[1];
sum+=a[2];
sum+=a[3];
sum+=a[4];
sum+=a[5];

Well, this looks just as the same as the above. But you have to consider that the last assignment wins. A quick detour:

int a=0;
a=5;
a=7;

a is obviously 7, but it never has the value 5, not for a second, not for a pico second, just plain never! You can think about it like this: PSHDL walks through the code from the top to the bottom. When an assignment a=X is found, it remembers that upon the end of the code it will assign X to a. That is unless somewhere below it finds a=Z. In that case the X is assignment is totally forgotten and Z is assigned. The same holds true for our sum example. For PSHDL the slightly longer version is equivalent to:

int a[5]={1,2,3,4,5};
int sum=0;
sum+=a[5];

Sequential loops

One idea that comes to mind us to introduce a new keyword. For example sequential. When we now put that in front of for, the contents are interpreted sequentially. The example:

int a[5]={1,2,3,4,5};
int sum=0;
sequential for (I={0:4})
    sum+=a[I];

This would now do what the user expected it to do. But unfortunately we just gave the user a very powerful tool in his hands. There is nothing, well nothing that I want to code, that keeps him from writing:

int a[5]={1,2,3,4,5};
int sum=0;
sequential for (I={0:3}) {
    sum+=a[I];
    a[I+1]=0;
}

What is that supposed to do? The user now has the ability to write very dysfunctional code. It is very difficult to distinguish valid use cases from invalid ones, so better provide a clean concept for solving this problem.

Functions to the rescue?

However summing up an array is a very plausible thing to do and writing everything by hand is very error prune. Also it can not be done when the size of the array is parameterized. So my first idea for giving the user the ability to solve this problem properly was to allow to declare functions:

function uint doSum(uint a[]) {
    uint sum=0;
    for (I in a){
        sum+=I;
    }
    return sum;
}

int a[5]={1,2,3,4,5};
int sum=doSum(a);

This however has a few weaknesses:

  • The same syntax that is used for the parallel way of doing things is suddenly used for sequential stuff
  • The function can only sum uint arrays
  • It is very hard to create simulate-able code for this as the function would need to be unrolled into parallel code

While the second problem could be solved with type parameters, which would introduce other syntax problems, I really don't like the first problem. The user should be able to understand the model of PSHDL very quickly. Having to different kind of models does not help with that.

Maybe lambda functions?

Another thing I thought about: If using the same language to sometimes have parallel and other times have sequential meanings might confuse the user, why not use a different paradigm which can elegantly solve these kind of problems. In functional programming a solution might look like this:

int a[5]={1,2,3,4,5};
int sum=a.foldLeft{a,b|a+b};

This would have the benefit of being very short and precise. But it would demand from the user to have an idea how functional programming works. Nonetheless, the type problem would have been solved as well.

I have to admit that I like the lambda approach. But I am not sure whether many people will understand it. So to not cause utter confusion I decided that I will provide the user with sum, (left|right)Difference, xor, or, and and mul functions. Maybe those will map to a more generic version that takes an enum to select the operation like this:

int a[5]={1,2,3,4,5};
int sum=apply(a, PLUS);

The nice thing about the apply function is that you can also write it like this:

int sum=a.apply(PLUS);

Another nice aspect of it is that PLUS is an enum and as such you even provide the operation as parameter. This approach however has some disadvantages that arise from it's simplicity. How for example would we add every second item? Or how do something like a[0]*x + a[1]*x . Well those cases would require to decompose those expressions into temporary signals. On the other hand it would certainly be possible to have functions that can apply operations to an array. So that case would look like this:

int sum=a.map(MUL, x).apply(PLUS)