Monday, April 7, 2014

The future of PSHDL part 2 (modules and sequential behavior)

This is part 2 of the improvements that I plan for the next language release v0.2. The first part can be found here.

Thoughts about modules and sequential behavior

One of the questions I was asked during my presentation of PSHDL at the 30C3 was about creating a catalog of easily usable IP cores. After all, this is key to the success of Arduino: without its libraries it would only be a nice-looking IDE, not the success it is now. So this question is really a key point in making PSHDL the Arduino for FPGAs.

When you take a look at OpenCores you will find plenty of cores that are freely available, but using them is hard. They are hard to use for multiple reasons, the first being documentation. After spending such a long time developing an IP core that works, you really have to motivate yourself to write extensive documentation that allows others to make use of it. This usually includes lengthy documents about the usage scenario, the input data format, the output data format, the control signals, the expected flow of signals and many other things. Those are usually described in English, which, depending on the author, can result in ambiguous descriptions.

So the best way to package an IP core is to make as many of its parameters machine-parseable as possible. After all, the language that most developers speak is the language they have written the IP core in. While some parts of an IP core are very easy to formalize, such as the ports, others are harder, for example the timing that a module requires.

Everyone knows and uses state machines for describing sequential behavior in hardware. Those, however, can become very annoying when you have to interact with other state machines. Let's pretend we invented a very useful FPU that takes a varying amount of time, depending on the operation. We also want to re-use that one FPU because it is expensive. So for a simple math function like f(a,b,c)=(a*b+c)^2 we have to write a state machine like this:

enum OpTypes={ADD, SUB, MUL};
interface FPU {
    in bit<32> a;
    in bit<32> b;
    in bit start;
    in enum OpTypes op;
    out bit<32> res;
    out bit done;
}

module MulAddSqr {
    FPU fpu;
    enum FunctionState={IDLE, MUL_START, MUL_WAIT, ADD_START, ADD_WAIT, SQR_START, SQR_WAIT};
    register enum FunctionState state;
    in bit<32> a;
    in bit<32> b;
    in bit<32> c;
    out register bit<32> res;
    in bit start;
    out bit done=state==IDLE;
    switch (state) {
        case IDLE:
            if (start)
                state=MUL_START;
        case MUL_START:
            fpu.a=a;
            fpu.b=b;
            fpu.op=MUL;
            fpu.start=1;
            state=MUL_WAIT;
        case MUL_WAIT:
            fpu.start=0;
            if (fpu.done) {
                state=ADD_START;
            }
        case ADD_START:
            fpu.a=fpu.res;
            fpu.b=c;
            fpu.op=ADD;
            fpu.start=1;
            state=ADD_WAIT;
        case ADD_WAIT:
            fpu.start=0;
            if (fpu.done) {
                state=SQR_START;
            }
        case SQR_START:
            fpu.a=fpu.res;
            fpu.b=fpu.res;
            fpu.op=MUL;
            fpu.start=1;
            state=SQR_WAIT;
        case SQR_WAIT:
            fpu.start=0;
            if (fpu.done) {
                state=IDLE;
                res=fpu.res;
            }
        default:
    }
}
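
To make the handshaking in this module easier to follow, here is a rough cycle-level model of it in Python (not PSHDL). It is only a sketch under assumptions: FakeFPU, its latency table, and the use of plain integers instead of 32-bit floats are all made up for illustration, and start is treated as already asserted so the model begins in MUL_START.

```python
class FakeFPU:
    """Hypothetical stand-in for the FPU: each op takes an invented number of cycles."""
    LATENCY = {"ADD": 2, "SUB": 2, "MUL": 4}  # made-up values

    def __init__(self):
        self.res = 0
        self.done = False
        self._remaining = 0
        self._pending = 0

    def clock(self, a, b, op, start):
        """Advance one clock cycle; done is high for exactly one cycle per operation."""
        self.done = False
        if start:
            self._remaining = self.LATENCY[op]
            self._pending = {"ADD": a + b, "SUB": a - b, "MUL": a * b}[op]
        elif self._remaining > 0:
            self._remaining -= 1
            if self._remaining == 0:
                self.res = self._pending
                self.done = True


def mul_add_sqr(a, b, c):
    """Step through the same states as the MulAddSqr module, one per loop iteration."""
    fpu = FakeFPU()
    state, res, cycles = "MUL_START", None, 0
    while state != "IDLE":
        cycles += 1
        fa, fb, op, start = 0, 0, "ADD", False  # defaults when nothing is driven
        nxt = state
        if state == "MUL_START":
            fa, fb, op, start, nxt = a, b, "MUL", True, "MUL_WAIT"
        elif state == "MUL_WAIT":
            if fpu.done:
                nxt = "ADD_START"
        elif state == "ADD_START":
            fa, fb, op, start, nxt = fpu.res, c, "ADD", True, "ADD_WAIT"
        elif state == "ADD_WAIT":
            if fpu.done:
                nxt = "SQR_START"
        elif state == "SQR_START":
            fa, fb, op, start, nxt = fpu.res, fpu.res, "MUL", True, "SQR_WAIT"
        elif state == "SQR_WAIT":
            if fpu.done:
                res, nxt = fpu.res, "IDLE"
        fpu.clock(fa, fb, op, start)
        state = nxt
    return res, cycles


result, cycles = mul_add_sqr(2, 3, 4)  # result is (2*3+4)**2 == 100
```

Every operation costs one start state plus however many wait cycles the FPU needs, and the PSHDL module still has to spell out all seven states by hand.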

That is a lot of code for something rather trivial. The reason this code is longer than it has to be is the explicit description of the state machine. We are not really interested in which state the state machine is in, only in the fact that things happen sequentially. Wouldn't it be awesome to be able to write something like this? (This is not yet implemented and is subject to change; I may never implement it at all.)

//... FPU and OpTypes declarations remain the same

statemachine bit<32> do(interface<FPU> fpu, -bit<32> a, -bit<32> b, -enum<OpTypes> op) {
    {
        fpu.a=a;
        fpu.b=b;
        fpu.op=op;
        fpu.start=1;
        nextState();
    }
    {
        if (fpu.done)
            return fpu.res;
    }   
}

module MulAddSqr {
    FPU fpu;
    in bit<32> a;
    in bit<32> b;
    in bit<32> c;
    out register bit<32> res;
    in bit start;
    out bit done=0; 
    statemachine do fpu_ctrl;
    statemachine mulAddSqr{
        $idle: {
            done=1;
            if (start==1)
                nextState();
        }       
        fpu_ctrl.run(fpu, a, b, OpTypes.MUL);
        fpu_ctrl.run(fpu, fpu.res, c, OpTypes.ADD);
        res=fpu_ctrl.run(fpu, fpu.res, fpu.res, OpTypes.MUL);
    }
}

Here I combine a few things. Let's start with the statemachine keyword. Unlike a switch, a statemachine does not have case labels. Instead, every top-level statement or block becomes its own state with a unique, automatically generated label. If you want to jump between specific states, you can optionally declare a label and pass it to the nextState function. If you simply want to continue to the next state, you call nextState without an argument.
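
The stepping rules described here can be modelled in a few lines of Python. This is just my reading of the proposal, not PSHDL and not how a compiler would do it: each state is a function called once per clock cycle, returning True stands in for nextState(), returning a label string stands in for nextState(label), and returning None means the state repeats. The names idle, step_a, step_b and run are hypothetical.

```python
def idle(start):
    # $idle: stay here until start is asserted, then nextState()
    return True if start else None


def step_a(start):
    return True  # unconditional nextState()


def step_b(start):
    return "$idle"  # nextState with a declared label as its target


STATES = [idle, step_a, step_b]  # one state per top-level statement
LABELS = {"$idle": 0}            # only labelled states can be jump targets


def run(states, labels, start_inputs):
    """Call the active state once per clock cycle; return the visited state indices."""
    index, trace = 0, []
    for start in start_inputs:
        trace.append(index)
        outcome = states[index](start)
        if outcome is True:             # nextState(): move to the following state
            index = (index + 1) % len(states)
        elif isinstance(outcome, str):  # nextState(label): jump to the labelled state
            index = labels[outcome]
        # outcome is None: nextState was not called, so the state repeats
    return trace


trace = run(STATES, LABELS, [0, 0, 1, 0, 0, 0])
# trace == [0, 0, 0, 1, 2, 0]: wait in $idle, run both steps, return to $idle
```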

Internally, state machines will be turned into modules. The inline state machine mulAddSqr is replaced with the following equivalent code:

register enum mulAddSqr_states {$idle, 
    state_1_run, state_1_wait,
    state_2_run, state_2_wait, 
    state_3_run, state_3_wait } mulAddSqr_state;
enum mulAddSqr_states $nextState;
switch (mulAddSqr_state) {
    case $idle: {
        $nextState=state_1_run;
        done=1;
        if (start==1)
            mulAddSqr_state=$nextState;
    }
    // first call: fpu_ctrl.run(fpu, a, b, OpTypes.MUL)
    case state_1_run: {
        $nextState=state_1_wait;
        fpu_ctrl.fpu=fpu;
        fpu_ctrl.a=a;
        fpu_ctrl.b=b;
        fpu_ctrl.op=OpTypes.MUL;
        // drive the @smStart port of the do instance
        fpu_ctrl.start=1;
        mulAddSqr_state=$nextState;
    }
    case state_1_wait: {
        $nextState=state_2_run;
        if (fpu_ctrl.done)
            mulAddSqr_state=$nextState;
    }
    // ... state_2_* and state_3_* follow the same pattern for the remaining two calls
}
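
The naming scheme of the generated states can be sketched in Python as well. This is only an illustration of the pattern visible in the enum above, not an actual PSHDL compiler pass; the helper name lower_run_calls is hypothetical, and wrapping from the last wait state back to $idle is an assumption.

```python
def lower_run_calls(num_calls):
    """Return (state names, next-state table) for num_calls sequential run() calls."""
    names = ["$idle"]
    for i in range(1, num_calls + 1):
        # every run() call is split into a state that starts it and one that waits for done
        names += [f"state_{i}_run", f"state_{i}_wait"]
    # $nextState of each state is the following one; the last state wraps to $idle (assumed)
    next_state = {name: names[(k + 1) % len(names)] for k, name in enumerate(names)}
    return names, next_state


names, next_state = lower_run_calls(3)
```

For the three calls in mulAddSqr this yields the seven states listed in the enum, with $idle leading to state_1_run and each wait state leading to the next run state.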

The function-like state machine do is equivalent to the following module:

module do {
    @smStart
    in bit start;
    @smOp("a")
    in bit<32> a;
    @smOp("b")
    in bit<32> b;
    @smOp("op")
    in enum OpTypes op;
    @smDone
    out bit done;
    @smResult
    out bit<32> result;
    @smOp("fpu")
    import record FPU fpu;

    register enum states {$idle, state_1, state_2} state;
    enum states $nextState;
    switch (state) {
        case $idle: {
            $nextState=state_1;
            if (start)
                state=$nextState;
        } 
        case state_1: {
            $nextState=state_2;
            fpu.a=a;
            fpu.b=b;
            fpu.op=op;
            fpu.start=1;
            state=$nextState;
        }
        case state_2: {
            $nextState=$idle;
            if (fpu.done){
                result=fpu.res;
                done=1;
                state=$idle;
            }
        }
    }
}

There are still some issues left to investigate, but I have people working on that. I think the most important aspect of all this is that you can write re-usable state machines and create sequential behavior much more easily.
