Wednesday, June 18, 2014

Should I remove inout?

When I started to design PSHDL, I obviously needed ports with the directions in and out. As VHDL and Verilog have inout as well, I thought, well, what's the harm? If you want to implement I2C, you definitely need it, right?

Well, it turns out that inout is a troublemaker. It works well on the top module, when it is assigned directly to a pin. But when you write a module that decodes the I2C protocol, that module will not necessarily end up being the top module. You would want to wire the I2C sub-module up to the actual top module. But how do you do this?

The honest answer is, in PSHDL you currently can't. You can write:

interface I2C {
    inout bit scl, sda;
}

module top {
    I2C i2c;
    inout bit scl, sda;

    i2c.scl=scl;
    scl=i2c.scl;

    i2c.sda=sda;
    sda=i2c.sda;
}

This code is bad for multiple reasons. The first is that every port mapping takes two lines, which is rather annoying. The second, and more important, reason is that it doesn't work in VHDL. The generated code looks like this:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity top is
    port (
        scl : inout std_logic;
        sda : inout std_logic
    );
end;
architecture pshdlGenerated of top is
    signal \$map_i2c_scl\ : std_logic;
    signal \$map_i2c_sda\ : std_logic;
begin
    i2c : entity work.I2C
        port map (
            scl => \$map_i2c_scl\,
            sda => \$map_i2c_sda\
        );
    process(\$map_i2c_scl\, \$map_i2c_sda\, scl, sda)
    begin
        \$map_i2c_scl\ <= scl;
        scl <= \$map_i2c_scl\;
        \$map_i2c_sda\ <= sda;
        sda <= \$map_i2c_sda\;
    end process;
end;

Unfortunately, this creates a multiple driver issue, as \$map_i2c_scl\ is driven by the port map as well as by the process later on. The proper VHDL solution is to map the signal directly:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity top is
    port (
        scl : inout std_logic;
        sda : inout std_logic
    );
end;
architecture pshdlGenerated of top is
begin
    i2c : entity work.I2C
        port map (
            scl => scl,
            sda => sda
        );
end;

One of the new features that I plan for v0.2 of the language is exports, which could accomplish exactly that. The idea behind exports is that you can write:

module top {
    I2C i2c;
    export i2c.scl;
    export i2c.sda;

    //or if you want to export a whole interface:
    export i2c;
}

This would then generate the desired VHDL code. While this might look like a valid solution for the problem of inouts, there are even more problems with them. For example, what happens when you read or write an exported signal? I tend to say that this should be allowed, but for inouts it would not be possible, and I hate rules that apply only some of the time.

Every signal that is not an input is initialized with 0 by default. This little trick prevents the creation of latches, as every signal always has a value assigned to it. In the case of inout, however, this feature is rather annoying, as you sometimes don't want to drive the signal. The solution for that is the @VHDLLatchable annotation, which suppresses the default 0 initialization. But this again feels more like a hack.

Overall, it can be observed that people are tempted to create things that don't work well within FPGAs, such as shared buses with high-Z states. These don't really work within FPGAs, as there are no internal tri-state lines; such buses are mapped to two unidirectional signals with multiplexers.
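To illustrate the last point, here is a minimal C model of what a shared "tri-state" bus turns into inside the FPGA: a multiplexer that forwards the value of the one enabled driver. The Driver type, resolve_bus_as_mux and the idle value are hypothetical and only stand in for the logic that synthesis generates; real tools may handle the "nobody drives" case differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One would-be tri-state driver: when enable is false it "outputs Z". */
typedef struct {
    bool enable;
    unsigned value;
} Driver;

/* Hypothetical model of an internal shared bus after synthesis. There is no
   internal Z, so the bus becomes a multiplexer selecting the enabled driver.
   If no driver is enabled, idle_value is returned (what a pull-up would
   provide on a real pin). */
unsigned resolve_bus_as_mux(const Driver *drivers, size_t n, unsigned idle_value) {
    unsigned result = idle_value;
    size_t active = 0;
    for (size_t i = 0; i < n; i++) {
        if (drivers[i].enable) {
            result = drivers[i].value;
            active++;
        }
    }
    /* Two enabled drivers would be a bus conflict, which a mux cannot express. */
    assert(active <= 1);
    return result;
}
```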

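Another problem is combinatorial loops, which are especially easy to create with inouts. The two-line port mapping above, which assigns scl to i2c.scl and i2c.scl back to scl, already forms one.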

Overall, inouts seem to invite many unwanted effects while offering very little in return (they can only be used on top-level modules). Maybe the best idea is to simply remove them and provide tooling to map a pair of in/out signals to one pin. A little annotation could designate the signals that belong together, like this:

interface I2C {
    @iopin("scl") in bit scl_i;
    @iopin("scl") out bit scl_o;
    @iopin("sda") in bit sda_i;
    @iopin("sda") out bit sda_o;
}

Or maybe even a generic record (which will be part of v0.2) that consists of an in and an out signal could be the solution:

interface ioPin {
    in bit sig_i;
    out bit sig_o;
}

interface I2C {
    record ioPin sda;
    record ioPin scl;
}

Please note that every signal can be assigned highZ(), which turns into a VHDL 'Z'. This can be used to tri-state any output if necessary.
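For I2C, the two signals of such a pair would typically be used in open-drain fashion: scl_o either drives 0 or is assigned highZ(), and scl_i reads back the level on the pin. The following C model sketches how such a line resolves with an external pull-up; PinValue and resolve_open_drain are hypothetical names, and timing is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Output value of one device on the pin: driven low, driven high, or
   released (highZ(), i.e. 'Z' in VHDL). */
typedef enum { PIN_0, PIN_1, PIN_Z } PinValue;

/* Hypothetical model of an open-drain line such as SCL or SDA with a
   pull-up. Returns the level that every device reads on its *_i input. */
int resolve_open_drain(const PinValue *outs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Open-drain devices must never actively drive a 1. */
        assert(outs[i] != PIN_1);
        if (outs[i] == PIN_0)
            return 0; /* any device pulling low wins (wired-AND) */
    }
    return 1; /* nobody drives: the pull-up resistor pulls the line high */
}
```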

What do you think?

Monday, April 7, 2014

The future of PSHDL part 2 (modules and sequential behavior)

This is part 2 of the improvements that I plan for the next language release v0.2. The first part can be found here.

Thoughts about modules and sequential behavior

One of the questions I was asked during my presentation of PSHDL at the 30C3 was about creating a catalog of easily usable IP cores. After all, its library is key to Arduino's success: without it, Arduino would only be a nice-looking IDE, not the success it is now. So this question is really a key point in making PSHDL the Arduino for FPGAs.

When you take a look at OpenCores you will find plenty of cores that are freely available, but using them is hard, for multiple reasons. The first is documentation. After spending a long time developing an IP core that works, you really have to motivate yourself to write extensive documentation that allows others to make use of it. This usually includes lengthy documents about the usage scenario, the input data format, the output data format, the control signals, the expected flow of signals and much more. These are usually written in English, which, depending on the author, can result in ambiguous descriptions.

So the best way to package an IP core is to make as many of its parameters machine-parseable as possible. After all, the language that developers speak best is the language they wrote the IP core in. While some parts of an IP core are very easy to formalize, such as the ports, others are harder, for example the timing that a module requires.

Everyone knows and uses state machines for describing sequential behavior in hardware. They can, however, become very annoying when you have to interact with other state machines. Let's pretend we invented a very useful FPU that takes a varying amount of time, depending on the operation. We also want to re-use that one FPU because it is expensive. So for a simple math function like f(a,b,c)=(a*b+c)^2 we have to write a state machine like this:

enum OpTypes={ADD, SUB, MUL};
interface FPU {
    in bit<32> a;
    in bit<32> b;
    in bit start;
    in enum OpTypes op;
    out bit<32> res;
    out bit done;
}

module MulAddSqr {
    FPU fpu;
    enum FunctionState={IDLE, MUL_START, MUL_WAIT, ADD_START, ADD_WAIT, SQR_START, SQR_WAIT};
    register enum FunctionState state;
    in bit<32> a;
    in bit<32> b;
    in bit<32> c;
    out register bit<32> res;
    in bit start;
    out bit done=state==IDLE;
    switch (state) {
        case IDLE:
            if (start)
                state=MUL_START;
        case MUL_START:
            fpu.a=a;
            fpu.b=b;
            fpu.op=MUL;
            fpu.start=1;
            state=MUL_WAIT;
        case MUL_WAIT:
            fpu.start=0;
            if (fpu.done) {
                state=ADD_START;
            }
        case ADD_START:
            fpu.a=fpu.res;
            fpu.b=c;
            fpu.op=ADD;
            fpu.start=1;
            state=ADD_WAIT;
        case ADD_WAIT:
            fpu.start=0;
            if (fpu.done) {
                state=SQR_START;
            }
        case SQR_START:
            fpu.a=fpu.res;
            fpu.b=fpu.res;
            fpu.op=MUL;
            fpu.start=1;
            state=SQR_WAIT;
        case SQR_WAIT:
            fpu.start=0;
            if (fpu.done) {
                state=IDLE;
                res=fpu.res;
            }
        default:
    }
}
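To see the sequencing at work, the state machine above can be modelled cycle by cycle in C. This is a sketch under stated assumptions: integer arithmetic replaces the 32 bit floating point FPU, the latencies (3 cycles for MUL, 1 for everything else) are made up, and the names Fpu, fpu_clock and mul_add_sqr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ADD, SUB, MUL } OpType;

/* Hypothetical shared FPU: integer math stands in for floating point. */
typedef struct {
    int32_t res;     /* result of the last finished operation */
    int done;        /* high for exactly one cycle when res becomes valid */
    int busy;        /* remaining cycles of the running operation */
    int32_t pending; /* result that becomes visible when busy reaches 0 */
} Fpu;

/* One rising clock edge of the FPU. */
static void fpu_clock(Fpu *f, int start, int32_t a, int32_t b, OpType op) {
    f->done = 0;
    if (f->busy > 0 && --f->busy == 0) {
        f->res = f->pending;
        f->done = 1;
    }
    if (start) {
        f->pending = op == ADD ? a + b : op == SUB ? a - b : a * b;
        f->busy = op == MUL ? 3 : 1; /* made-up, operation dependent latency */
    }
}

typedef enum { IDLE, MUL_START, MUL_WAIT, ADD_START, ADD_WAIT, SQR_START, SQR_WAIT } State;

/* Runs MulAddSqr from the cycle after start was seen in IDLE until it returns
   to IDLE; stores the number of clock cycles in *cycles. */
int32_t mul_add_sqr(int32_t a, int32_t b, int32_t c, int *cycles) {
    assert(cycles != NULL);
    Fpu fpu = {0};
    State state = MUL_START;
    int32_t res = 0;
    *cycles = 0;
    do {
        /* Outputs towards the FPU default to 0 every cycle, like in PSHDL. */
        int start = 0;
        int32_t x = 0, y = 0;
        OpType op = ADD;
        State next = state;
        switch (state) {
        case MUL_START: x = a; y = b; op = MUL; start = 1; next = MUL_WAIT; break;
        case MUL_WAIT:  if (fpu.done) next = ADD_START; break;
        case ADD_START: x = fpu.res; y = c; op = ADD; start = 1; next = ADD_WAIT; break;
        case ADD_WAIT:  if (fpu.done) next = SQR_START; break;
        case SQR_START: x = fpu.res; y = fpu.res; op = MUL; start = 1; next = SQR_WAIT; break;
        case SQR_WAIT:  if (fpu.done) { res = fpu.res; next = IDLE; } break;
        default: break;
        }
        fpu_clock(&fpu, start, x, y, op); /* clock edge for the FPU ... */
        state = next;                     /* ... and for the state register */
        (*cycles)++;
    } while (state != IDLE);
    return res;
}
```

For a=2, b=3 and c=4 the model returns (2*3+4)^2 = 100. Note how every *_WAIT state exists only to poll done.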

That is a lot of PSHDL code for something rather trivial. The code is longer than it has to be because of the description of the state machine. We are not really interested in which state the state machine is in, only in the fact that things happen sequentially. Wouldn't it be awesome to be able to write something like this? (This is not yet implemented and subject to change, or maybe I will never implement it at all.)

//... FPU and OpTypes declarations remain the same

statemachine bit<32> do(interface<FPU> fpu, -bit<32> a, -bit<32> b, -enum<OpTypes> op) {
    {
        fpu.a=a;
        fpu.b=b;
        fpu.op=op;
        fpu.start=1;
        nextState();
    }
    {
        if (fpu.done)
            return fpu.res;
    }   
}

module MulAddSqr {
    FPU fpu;
    in bit<32> a;
    in bit<32> b;
    in bit<32> c;
    out register bit<32> res;
    in bit start;
    out bit done=0; 
    statemachine do fpu_ctrl;
    statemachine mulAddSqr{
        $idle: {
            done=1;
            if (start==1)
                nextState();
        }       
        fpu_ctrl.run(fpu, a, b, OpTypes.MUL);
        fpu_ctrl.run(fpu, fpu.res, c, OpTypes.ADD);
        res=fpu_ctrl.run(fpu, fpu.res, fpu.res, OpTypes.MUL);
    }
}

Here I combine a few things. Let's start with the statemachine keyword. Unlike a switch, a statemachine does not have case labels. Instead, every statement becomes a state with a unique, automatically generated label. If you want to jump to a specific state, you can optionally declare a label and pass it to the nextState function. If you simply want to continue to the next state, you call nextState without an argument.
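The semantics described above can be sketched in C as a table of steps, where a step's index serves as its automatically generated label. Everything here is a hypothetical illustration (StateFn, step, STAY, NEXT and the example states are invented names), and wrapping around to the first state after the last one is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Return values of a step besides an explicit label (array index). */
enum { STAY = -1, NEXT = -2 };

typedef struct {
    int start;   /* input */
    int done;    /* output, reset to 0 every cycle like a PSHDL default */
    int counter; /* a register the states work on */
} Ctx;

/* One "statement" of the state machine: runs for one clock cycle and returns
   STAY, NEXT (like nextState()) or a label (like nextState(label)). */
typedef int (*StateFn)(Ctx *ctx);

enum { LABEL_IDLE = 0, LABEL_INC = 1 }; /* explicitly declared labels */

static int st_idle(Ctx *c)  { c->done = 1; return c->start ? NEXT : STAY; }
static int st_inc(Ctx *c)   { c->counter++; return NEXT; }
static int st_check(Ctx *c) { return c->counter < 3 ? LABEL_INC : LABEL_IDLE; }

/* Executes the current state for one cycle and returns the next state. */
static int step(const StateFn *states, size_t n, int current, Ctx *ctx) {
    ctx->done = 0;
    int r = states[current](ctx);
    if (r == STAY)
        return current;
    if (r == NEXT)
        return (int)(((size_t)current + 1) % n); /* assumption: wrap around */
    assert(r >= 0 && (size_t)r < n);
    return r;
}
```

With start held high, the table {st_idle, st_inc, st_check} counts to 3 and then returns to the idle state after seven cycles.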

Internally state machines will be turned into modules. The inline state-machine mulAddSqr is replaced with the following equivalent code:

register enum mulAddSqr_states {$idle, 
    state_1_run, state_1_wait,
    state_2_run, state_2_wait, 
    state_3_run, state_3_wait } mulAddSqr_state;
enum mulAddSqr_states $nextState;
switch (mulAddSqr_state) {
    case $idle: {
        $nextState=state_1_run;
        done=1;
        if (start==1)
            mulAddSqr_state=$nextState;
    }
    case state_1_run: {
        $nextState=state_1_wait;
        fpu_ctrl.fpu=fpu;
        fpu_ctrl.a=a;
        fpu_ctrl.b=b;
        fpu_ctrl.op=OpTypes.MUL;
        fpu_ctrl.start=1;
        mulAddSqr_state=$nextState;
    }
    case state_1_wait: {
        $nextState=state_2_run;
        if (fpu_ctrl.done)
            mulAddSqr_state=$nextState;
    }
    // state_2_* and state_3_* follow the same pattern
}

The function-like state machine do is equivalent to the following module:

module do {
    @smStart
    in bit start;
    @smOp("a")
    in bit<32> a;
    @smOp("b")
    in bit<32> b;
    @smOp("op")
    in enum OpTypes op;
    @smDone
    out bit done;
    @smResult
    out bit<32> result;
    @smOp("fpu")
    import record FPU fpu;

    register enum states {$idle, state_1, state_2} state;
    enum states $nextState;
    switch (state) {
        case $idle: {
            $nextState=state_1;
            if (start)
                state=$nextState;
        } 
        case state_1: {
            $nextState=state_2;
            fpu.a=a;
            fpu.b=b;
            fpu.op=op;
            fpu.start=1;
            state=$nextState;
        }
        case state_2: {
            $nextState=$idle;
            if (fpu.done){
                result=fpu.res;
                done=1;
                state=$idle;
            }
        }
    }
}

There are still some issues left to investigate, but I have people working on that. I think the most important aspect of all this is that you can write re-usable state machines and create sequential behavior much more easily.

Tuesday, April 1, 2014

The future of PSHDL (part 1)

With the PSHDL board campaign running, I think it is important to take a look at the future of PSHDL. In a series of posts I will show what I am working on right now and what can be expected to be realized within the next few months.

PSHDL Language features for V0.2

While I am busy with fixing the bugs that are being reported, I am also thinking about the next language features that I want to implement. In this blog entry I want to give a little preview of what I have in mind. Everything mentioned here is work in progress and subject to change, but I would be interested in what you think.

Any width types

One of the things that I see rather frequently are code snippets like this:

in bit<16> addr;
out uint<16> bla;
bla=(uint)addr;

This code does not do what the author intended it to do. It transforms the 16 bit value into a 32 bit integer, and then back into a 16 bit integer. Fortunately, synthesis is smart enough to optimize this away, but when you replace 16 with 64, things can get ugly, because the upper 32 bits are lost on the way through the 32 bit integer. So a new type will be introduced, the any width type, which allows you to write something like this:

in bit<16> addr;
out uint<16> bla;
int<> temp=bla;
bla=(uint<>)addr;

The new type takes the width of the right-hand side and simply changes the value interpretation. It can also be used to create temporary new signals, but those signals may only be written exactly once, in the declaration.
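The difference can be modelled in C by treating each signal as a bit vector of a given width. through_width, cast_plain_uint and cast_any_width are hypothetical helpers, and the 32 bit width of a plain uint is taken from the explanation above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keeps only the low `width` bits of value, which is
   what assigning it to a signal of that width does. */
uint64_t through_width(uint64_t value, unsigned width) {
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return value; /* shifting a uint64_t by 64 would be undefined */
    return value & ((UINT64_C(1) << width) - 1);
}

/* bla=(uint)addr with 64 bit signals: the value passes through the default
   32 bit uint, so the upper half is lost. */
uint64_t cast_plain_uint(uint64_t addr) {
    return through_width(addr, 32);
}

/* bla=(uint<>)addr: the width of addr is kept, only the interpretation
   changes, so all 64 bits survive. */
uint64_t cast_any_width(uint64_t addr) {
    return through_width(addr, 64);
}
```

For 16 bit signals the detour through 32 bits is harmless, which is why synthesis can optimize it away; for 64 bit signals, cast_plain_uint(0x123456789ABCDEF0) yields only 0x9ABCDEF0.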

Records or structs

Sometimes it makes sense to keep things together that belong together. For example an SPI Bus can have an interface like this:

interface SPI {
    in bit miso;
    out bit mosi;
    out bit sclk;
    out bit ss_n;
}

Now, if you want to connect some SPI buses internally, you would need to write something like this:

testbench SPITest {
    SPIMaster dut;
    SPISlave dummy;
    dut.miso=dummy.miso;
    dummy.mosi=dut.mosi;
    dummy.sclk=dut.sclk;
    dummy.ss_n=dut.ss_n;
}

With a record you could do something like this:

testbench SPITest {
    SPIMaster dut;
    SPISlave dummy;
    record SPI bus;
    bus.connectTo(dut);
    dummy.connectTo(bus);
}

Only signals with the same type, width and name are connected. The direction of all signals has to be the same or the opposite of the record. With that rule one might actually write:

testbench SPITest {
    SPIMaster dut;
    SPISlave dummy;
    dut.connectTo(dummy);
}
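The matching rule for connectTo can be sketched in C. Port, Orientation and check_connect are hypothetical names; the sketch only decides whether a connection is legal and leaves out generating the actual assignments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DIR_IN, DIR_OUT } Dir;

typedef struct {
    const char *name;
    const char *type; /* e.g. "bit", "uint" */
    unsigned width;
    Dir dir;
} Port;

typedef enum { CONNECT_NONE, CONNECT_SAME, CONNECT_OPPOSITE, CONNECT_MIXED } Orientation;

/* Hypothetical checker for the connectTo rule: only ports with the same name,
   type and width are paired, and all paired ports must either have the same
   direction as the record or all have the opposite one. */
Orientation check_connect(const Port *rec, size_t nrec, const Port *inst, size_t ninst) {
    assert((rec != NULL || nrec == 0) && (inst != NULL || ninst == 0));
    size_t same = 0, opposite = 0;
    for (size_t i = 0; i < nrec; i++) {
        for (size_t j = 0; j < ninst; j++) {
            if (strcmp(rec[i].name, inst[j].name) != 0 ||
                strcmp(rec[i].type, inst[j].type) != 0 ||
                rec[i].width != inst[j].width)
                continue; /* not the same signal: left unconnected */
            if (rec[i].dir == inst[j].dir)
                same++;
            else
                opposite++;
        }
    }
    if (same == 0 && opposite == 0)
        return CONNECT_NONE;
    if (opposite == 0)
        return CONNECT_SAME;
    if (same == 0)
        return CONNECT_OPPOSITE;
    return CONNECT_MIXED; /* illegal: would be reported as an error */
}
```

Under this rule, an SPIMaster matches the SPI record with CONNECT_SAME and an SPISlave with CONNECT_OPPOSITE, which is why dut.connectTo(dummy) can work without an explicit record in between.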

With the records another new feature can be implemented...

Conditional instances

When you design a library, for example a clock divider IP core, chances are that you will have to use a vendor specific IP core.

interface IClockDivider {
    in bit clk;
    out bit scaledClk;
}

module ClockDivider {
    export record IClockDivider div;
    switch (vendor) {
        case Xilinx:
            import xilinx.*;
            PLL pll;
            pll.clock=div.clk;
            div.scaledClk=pll.clkX;
        default:
            assert("Only supporting Xilinx");
    }
}

The export keyword would make the record appear as regular signals on the module. vendor is an enum defined in the pshdl.* namespace; its value is passed to the compiler via the synthesis settings.

Combined declaration and instantiation

Another simplification is that an enum, or an interface, can be declared and instantiated at the same time. This simplifies the common case where you want to use your enum for a state machine right away.

register enum X {A,B} inst;
interface VHDL.work.Blink {
    in bit clock;
    in bit reset;
    out bit led;
} blink;

To the future and beyond!

Another very important feature that is being worked on is re-usable modules. This deserves its own chapter and will be posted in the future.

Friday, March 28, 2014

PSHDL Board IndieGoGo Campaign

Last year during the 30C3 I announced the PSHDL board, and promised to create a crowd-funding campaign for it. Many people followed my call to give me an estimate of what they were willing to pay for this board, and from that data we concluded that it would be realistic to create such a campaign.

Unfortunately, this took us longer than anticipated. The biggest risk involved in our campaign is that it is a physical product that people can do stupid things with. In order to be on the safe side, and to avoid ending up with a million-dollar debt after a US lawsuit, we had to arrange for insurance. This turned out to be a rather slow process (mostly because the insurance companies did not react to our inquiries).

But now all the legal problems have been addressed and we are very proud to announce that our campaign is now online at Indiegogo.

At this point we have already collected more than a sixth of the required funding in about two days. This makes us very optimistic that we can get through with this campaign. This doesn't mean that we don't need help. We can use all the help we can get, and even if you decide not to back our campaign, it would be very nice if you could help us spread the word.

Board details

As some people had difficulty understanding how all of the board IOs are connected, we also created a new diagram that hopefully helps you understand it better.

High Level view

We also sent the latest prototype to manufacturing and expect it to arrive in about two weeks.

Monday, March 17, 2014

Web updates

When I started to develop PSHDL, I thought that having a web UI might be a good idea. It allowed users to see what PSHDL can do without installing a thing. The first version I created was written in GWT and it worked well. But when I realized that I could simulate PSHDL, and even do so in the browser, I thought it was time for an upgrade.

In the middle of last year I created the beta web interface, which was written in Dart with Web UI. It turned out to be a quite popular and comfortable development environment and many people started using it. It is also powered by a generic REST API that can be used from other services. After such a long time of testing I am confident that it is time to promote the beta to the front page.

But that doesn't mean there won't be a beta anymore. In the last 2 months I completely rewrote the beta editor, again in Dart, but this time with the more modern Polymer framework and a much cleaner separation of concerns. While it already works quite well, I am not perfectly happy with it. It has some performance issues that need to be addressed before I promote it to the front page.

New features

The new beta has some unique new features that I quickly want to present here:

  • The page now scales much better with lower resolutions.
  • Simulation
    • The simulation is now faster in most cases, as the update rate is capped at 10 FPS.
    • The LEDs are now dimmed with the duty cycle.
  • Cloud synthesis
    • You can now create board definition files with a dialog.
    • It is now possible to create the synthesis definition files with a dialog.
      • Ports can be visually located on the PCB
  • Workspace
    • The Ace editor recycles the session, which stores the undo/redo history and cursor position
    • Dirty marking is now based on a SHA checksum
    • Editor content is refreshed when a local-helper is updating a file

For the upcoming Indiegogo Campaign, I created a video that demonstrates how the new web UI can be used to get a blinking LED within a few minutes.

Old workspace bookmarks still mostly work, but they should be updated to the new location.

Indiegogo campaign

During the 30C3 I announced that we plan to finance the initial PSHDL board with a crowd-funding campaign. While from a technical point of view everything is ready to launch the campaign tomorrow, there are some legal issues that we have to take care of. Tom needs to take out insurance for his company; unfortunately, the insurance company is taking an absurd amount of time to process our request. When this step is resolved, we will launch immediately.

The board underwent some design improvements. It now has a PMOD-compatible connector wired to the Atmel, and 4 blue LEDs on the top side. This makes the board more usable even without LED arms. The Atmel also has some ADC/DAC pins on a connector now.

Friday, January 3, 2014

New features: PSHDL board and cloud synthesis

In the last few weeks, I have been very busy with all kinds of things. The two most important ones were developing my PSHDL board and implementing cloud synthesis. In this blog post I want to give some background on why I developed them.

Btw, both were introduced at the 30C3 congress, which I enjoyed very much. A recording of that presentation is available on YouTube, and my FPGA 101 is also available online. I was very surprised by how much positive feedback I received. Thanks a lot for this! It is very encouraging to continue this work.

PSHDL Board

When you start developing for FPGAs, you may ask yourself: What should I do with them?! Unfortunately, there is no single good answer. A lot of things can be done by either a µController or a full-blown CPU. Not many people want to perform hardware glitching, sniff high-speed buses, or do the other things that FPGAs are good at, especially since FPGAs are now out of the bitcoin mining league, where only ASICs may still provide a viable business case. So my idea is to create a little toy that gives an FPGA newcomer something fun to play with for some time. This is why I created the PSHDL board.

The primary thing to play with are the LEDs, of course. Each LED arm contains 4 RGB LEDs that can be controlled individually. With 4 arms, you can already do some pretty sweet-looking things. The idea here is that you first learn how to control the LEDs and make them blink, or light up in different colors. The next step would be to create animations, for which you need state machines. After that you can start playing with connectivity or create simple games. You can either attempt to receive data from your PC, or interconnect multiple boards. For interconnection, you can stack boards vertically at a 45° angle, or connect them horizontally via the 4-pin headers at the end of the arms. If all of that gets too boring, you can also implement a small CPU and have some fun with that. There is a gallery with pictures available online. I will hopefully implement some cool things soon.

The board itself contains an Atmel XMega, which will be either the XMega32A4u or the XMega128A4u. The latter would leave some room for implementing interesting things, while the former is the bare minimum to fulfil its purpose (interfacing with the PC and programming the FPGA). It is also possible to use the ADCs for reading in analog data and passing it to the FPGA, to connect something to the I2C pins, or to do something entirely different with it. This chip also allows the PSHDL board to be programmed from any operating system.

I created a website where I am asking people what they are willing to pay for the board. The answers look very promising. I think I can realistically build and sell the boards for that price when I am able to sell more than 250, which also appears to be a rather realistic number. If you have not participated you can give your vote here.

Cloud synthesis

One of the major annoyances that keeps me from enjoying programming FPGAs is the fact that you have to install a vendor toolchain. Those are generally available for Linux and Windows. While I don't mind running either of those, I either have to fire up a VM, or use my Windows Desktop machine that is the most silent when it is switched off.

While the best solution would be to simply incorporate an open source synthesis flow into PSHDL, this is unfortunately not realistic. There are academic tool flows, such as VTR and RapidSmith, that can go down to place and route, but the one step they are missing is generating the FPGA configuration. The reason why they can't do this is the stubbornness of FPGA vendors. They think that disclosing the exact specification of the FPGA configuration would harm their business. I think that is a rather stupid position. A lot of companies are benefiting from GCC, for example, and the EDA industry would do well to step up and give developers the spec to implement this last crucial step. It is not like I would care about the state machine for programming the security stuff. I am perfectly fine if only unencrypted configurations were possible.

There are people who are reverse engineering the configuration files of some FPGAs, but I think that is the wrong way to go. Such tools will always play catch-up with the latest technology and are prone to lawsuits. Plus, with incomplete knowledge of the FPGA architecture it is possible to short-circuit logic within the FPGA and ultimately damage it.

So how does cloud synthesis work? Well, it simply hides the invocation of the vendor toolchain by moving it onto a different machine. In order to enable cloud synthesis, you have to attach a so-called local-helper to your workspace. This local-helper creates a two-way sync between your workspace and a directory. The web client then sends a message to the local-helper, which invokes the synthesis tools on the files it just downloaded. For exactly this purpose there is a PubSub channel on my REST API, so the machine the synthesis runs on does not need any open ports or the like. It just uses a single SSE connection to the workspace.

Unfortunately this workflow is rather cumbersome right now. I am working on documenting it and increasing the usability. So please give me some time here :)

Sunday, October 13, 2013

Casting problems (part 2)

In the previous post I discussed some problems with arithmetic operations. This is the continuation of this blog series.

When you look at a cast operation, you might think: What's the deal? That can't be hard to implement correctly right? At least that was my first thought when I started to implement it.

Let's start with some simple C code:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t  a=0xFF;
    uint16_t b=(uint16_t)a;
    int16_t  c=(uint16_t)a;
    uint16_t d=(int16_t)a;
    int16_t  e=(int16_t)a;
    uint16_t f=(int8_t)a;
    int16_t  g=(int8_t)a;
    printf("a=%4x b=%4x c=%4x d=%4x e=%4x f=%4x g=%4x\n",a,b,c,d,e,f,g);
    return 0;
}

The output of this program is:

a=  ff b=  ff c=  ff d=  ff e=  ff f=ffff g=ffffffff

The outputs of b through e are rather unsurprising: because a is unsigned, nothing really happens when it is cast to a larger size. With f, however, the type is first changed to a signed type (turning 0xff into -1), and the value is then resized to a uint16_t, which yields 0xffff. Upon printf'ing, f is converted like every unsigned value, so no further sign extension happens. g, on the other hand, is correctly sign extended from int8 to int16, and then to 32 bit when it is promoted to int for printf.

The rule I extract from this for PSHDL is the following:

  • Upon a cast, the value is first resized, using sign extension if the operand of the cast is a signed type. It doesn't matter what the cast itself is doing; the signedness of the operand determines whether sign extension is used.
  • The type is changed after the resize operation. You can actually see that in the generated VHDL.

PSHDL Example:

int<8> a=-5;
uint<16> b=a;

results in the following VHDL code:

b <= intToUint(resizeInt(a, 16));

The value is first resized as a signed int to 16 bit, and then the type is converted from signed int to unsigned uint.
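The two rules can be captured in a few lines of C that operate on raw bit patterns. pshdl_cast is a hypothetical helper, not part of the PSHDL compiler. Note that the signedness of the target does not appear as a parameter, because it only changes how the resulting bits are interpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `width` bits set. */
static uint64_t mask(unsigned width) {
    return width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* Resize first, sign-extending only if the *operand* is signed; the type
   change afterwards does not alter any bits. */
uint64_t pshdl_cast(uint64_t bits, unsigned from_width, bool from_signed, unsigned to_width) {
    assert(from_width >= 1 && from_width <= 64 && to_width >= 1 && to_width <= 64);
    bits &= mask(from_width);
    bool negative = from_signed && ((bits >> (from_width - 1)) & 1);
    if (negative && to_width > from_width)
        bits |= mask(to_width) & ~mask(from_width); /* fill new upper bits with the MSB */
    return bits & mask(to_width); /* shrinking simply clips */
}
```

For the example above, pshdl_cast(0xFB, 8, true, 16) returns 0xFFFB, the 16 bit pattern of -5, which the uint<16> b then interprets as 65531.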

Sign extension

When a signed value is resized to a larger size, the additional bits have to be filled with something. For a proper sign extension, the MSB is used: a 4 bit number 1011 is sign extended to 8 bits by filling the upper 4 bits with its MSB, which results in 11111011. When the MSB is zero, as in 0011, the result is 00000011. But what about a reduction of size?

When the size is reduced, there are two possible approaches. The first one is to simply clip the value, which is what C does. The reasoning is that when you reduce the width, you know what you're doing, and it is your task to ensure that the result still makes sense. A simple example to demonstrate the problems that might arise:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t a=0xFFFF;
    uint8_t  b=a;
    int8_t   c=a;

    int8_t   d=-1;
    uint16_t e=(uint8_t)d;
    uint16_t f=d;
    printf("a=%4d b=%4d c=%4d d=%4d e=%4d f=%4d\n",a,b,c,d,e,f);
    return 0;
}

The output of this is:

a=65535 b= 255 c=  -1 d=  -1 e= 255 f=65535

As you can see, a positive unsigned value can become a negative signed value during the width reduction (c). This also works vice versa if the type is changed first and the value is resized afterwards (e). To avoid that, the principle of sign extension can be used even for reducing the width, which is what the resize function of ieee.numeric_std does in VHDL: for signed values it keeps the sign bit. For PSHDL, however, I chose to implement the C way of resizing, mostly because it doesn't alter the bits in unexpected ways. The downside is that a change in signedness may happen. If a downsize can affect the data bits of your value, you're probably doing something wrong, or you really don't care.
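The two ways of reducing the width can be compared with a small C sketch on bit patterns. shrink_clip models the C (and PSHDL) behavior; shrink_keep_sign models what ieee.numeric_std's resize does for signed values, namely keeping the sign bit together with the lowest remaining bits. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low `w` bits of v (w may be 0). */
static uint64_t low_bits(uint64_t v, unsigned w) {
    return w == 64 ? v : v & ((UINT64_C(1) << w) - 1);
}

/* C / PSHDL style: simply drop the upper bits. */
uint64_t shrink_clip(uint64_t bits, unsigned to) {
    assert(to >= 1 && to <= 64);
    return low_bits(bits, to);
}

/* numeric_std style for signed values: keep the sign bit of the `from` bit
   value and put it on top of the lowest to-1 bits. */
uint64_t shrink_keep_sign(uint64_t bits, unsigned from, unsigned to) {
    assert(to >= 1 && to < from && from <= 64);
    uint64_t sign = (bits >> (from - 1)) & 1;
    return (sign << (to - 1)) | low_bits(bits, to - 1);
}
```

For the 16 bit value 0x0180 (384), clipping to 8 bits yields 0x80, which reads as -128 when interpreted as signed, while keeping the sign yields 0x00: the sign survives, but the data bits are altered. For 0xFFFF (-1) both approaches return 0xFF.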

Implementing sign extension

In a programming language where you know the size of your variables, there is a very simple yet effective way of implementing sign extension. Let's assume we want to cast an int<8> to an int<7>:

uint64_t data=0xFF; //Bits from an int<8>
uint64_t shift=64-(8<7 ? 8 : 7); //min(8,7), C has no standard min()
int64_t seData=((int64_t)(data<<shift))>>shift; //shift left unsigned, then arithmetic shift right

The minimum of the target and the current size is taken to ensure that this works in either direction (from int<7> to int<8> and vice versa). The MSB of the smaller width is then shifted to the MSB of the variable, which in this case has 64 bits. The left shift is done on the unsigned value, because shifting a signed value into the sign bit is undefined behavior in C. The arithmetic right shift of the signed value then performs a sign-correct extension.
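The same trick generalizes to any width up to 64 bits. The following C sketch (sign_extend and resize_int are hypothetical helper names) reproduces the 1011 to 11111011 example from above and the int<8> to int<7> cast. It relies on the conversion to int64_t and the right shift of negative values behaving in two's complement fashion, which standard C leaves implementation-defined but which holds on mainstream compilers such as GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extends the low `width` bits of data to a full int64_t. */
int64_t sign_extend(uint64_t data, unsigned width) {
    assert(width >= 1 && width <= 64);
    unsigned shift = 64 - width;
    return ((int64_t)(data << shift)) >> shift;
}

/* Cast from an int<from> to an int<to>, returning the bits of the result. */
uint64_t resize_int(uint64_t data, unsigned from, unsigned to) {
    assert(to >= 1 && to <= 64);
    unsigned w = from < to ? from : to; /* min(from, to) */
    uint64_t v = (uint64_t)sign_extend(data, w);
    return to == 64 ? v : v & ((UINT64_C(1) << to) - 1);
}
```

resize_int(0xB, 4, 8) returns 0xFB (11111011), and resize_int(0xFF, 8, 7) returns 0x7F, the int<7> representation of -1.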

This was easy, but what about an implementation in true arbitrary-precision arithmetic? That will have to wait for the next blog post.