



## Codesign Embedded System

Ce document est la propriété de Secapem. Il ne peut être utilisé, reproduit ou divulgué, même partiellement, sans son autorisation écrite préalable.

This document is the property of Secapem. It may not be disclosed, used or reproduced, in whole or in part, without the prior written consent of Secapem.







### 1. TABLE OF CONTENTS

| 2. | SOF   | TWARE/HARDWARE CODESIGN          | 3  |
|----|-------|----------------------------------|----|
| 3. | OVE   | ERVIEW                           | 6  |
| 4. | COE   | DESYS HARDWARE ARCHITECTURE      | 6  |
| 4  | .1    | CodESys processor                | 8  |
| 4  | .2    | Clock system:                    | 8  |
| 4  | .3    | Reset system:                    | 8  |
| 4  | .4    | GLED signal:                     | 9  |
| 4  | .5    | UART Bus :                       | 9  |
| 4  | .6    | Slave Bus                        | 9  |
|    | 4.6.1 | SLAVE Write Access               | 9  |
|    | 4.6.2 | SLAVE Read Access                | 10 |
| 4  | .7    | Master Bus                       | 10 |
|    | 4.7.1 | Master Write Cycle               | 11 |
|    | 4.7.2 | Master Read Cycle                | 12 |
|    | 4.7.3 | MASTER READ/WRITE cycles         | 13 |
| 4  | .8    | FPU bus                          | 13 |
| 4  | .9    | CodESys Interrupt                | 14 |
| 5. | COE   | DESYS SOFTWARE API               | 15 |
| 5  | i.1   | SLAVE API                        | 15 |
|    | 5.1.1 | SLAVE write method               | 15 |
|    | 5.1.2 | SLAVE Read method                | 15 |
| 5  | 5.2   | MASTER DMA API                   | 15 |
| 6. | COE   | DESYS PLATFORM: ARKEON           | 18 |
| 7  | COI   | DESVS: SOFTWARE DEVELOPMENT TOOL | 22 |



Page : 3/22



# CodESys

## Codesign Embedded System

#### 2. SOFTWARE/HARDWARE CODESIGN

Software systems based on Turing/Von Neumann machines such as processor or microcontrollers. They are based on a CPU (Central Processing Unit) which executes sequentially a simple instruction. Their main benefits are to be easy to program, to provide a high flexibility of reprograming and to be able to execute any complex operation. The first disadvantage of these systems is to execute only one task at time. The second disadvantage is to segment an operation into a set of simple



instructions which slows down the execution of the operation. Also, if the system





uses an operating system, it is difficult to predict how much time the operation will take (furthermore if the system has several levels of memory cache).

In hardware systems the operation to execute is directly wired in hardware component such as an ASIC or a FPGA. The operation is executed very quickly because it is directly implemented with electronic component. Also, if the operator is wired several times in the component, several operations can be executed at same time. Because the implementation doesn't depend on instruction set or operating system it is much easier to predict the execution time of an operation. But describing hardware architecture in VHDL or Verilog description language is a little bit more complicated than using a programming language for software systems. The signal propagation time, the clock synchronisation, the critical path (etc...) require some experience. Also, when hardware component is configured to execute an operation it is harder to reconfigure it than reprogramming a software system. Without entering into details, the debug process of a hardware system requires also some equipment (scopes, logic analyser, etc.) and experience.

These two kinds of systems have complementary benefits and disadvantages. That's why the idea to design systems which merge soft and hard operators has been born (developed). This kind of mixed design is called « hardware-software codesign »: software to keep advantage of programming languages (flexibility, reprogramming, etc..) and hardware to take benefit from high speed execution and time predictability.

The conception of a codesign system is complex and brings a lot of problems which often eliminate this kind of solution. The most important problems are : the communication between the software and the hardware parts, the tests

A simple example could be a digital signal processing algorithm which often needs the convolution operator. This operator is very time consuming for a processor. In a





codesign system you can create your convolution operator in hardware and the rest of the algorithm in software. First, the both parts could work in parallel and second the execution of the convolution will be much faster than if it was programmed in software.

Another example: you want to execute a specific task when some analog input signals sampled at 1MHz reach a specified combination (for example: signal S1=1V s2=2.3V, s3<4V, etc...). In this case the microcontroller can only execute this task (get signals and compare with the specified combination) without doing anything else. In a codesign system, a specific hardware unit can be designed to get signals







and compare with a specified combination and interrupt the processor when the combination is reached. In this case the processor is freed from the « watching » task and is able to work on the application.

#### Some systems made with CodESys:



Video processing and overlaying application



#### Neural processor:







#### 3. OVERVIEW

We have created CoDESys in order to simplify the design of hardware/software codesign embedded systems. To reduce size and complexity of electronic system all units (hardware and software) are gathered in a single FPGA. CoDESys provides:

- a programming language (Java)
- a programming environment (Eclipse)
- a processor (software system)
- a shared access to the IEEE-754 FPU
- a shared access to the SDRAM (MASTER bus)
- a processor access to the hardware units (SLAVE bus)
- An API to communicate with hardware unit(s).



The SLAVE BUS is used to configure the hardware unit registers. In this case the hardware unit acts as a slave and only respond to the processor requests.

In order to be totally autonomous and able to work in parallel with the processor, hardware units have a direct access to memory (DMA: Direct Memory Access). They can work on big amount of data without interrupting the processor which have its own memory caches. In this case the hardware unit acts as a master and is able to request some reads or writes to the SDRAM memory. This bus is called the MASTER BUS.

#### 4. CODESYS HARDWARE ARCHITECTURE







#### The CodESys IP includes:

- A SDRAM controller: allow the processor and the DMA master unit to load/store from/into SDRAM memory
- A SPI Flash controller to communicate with the non-volatile Flash memory
- A reset manager (explained further)
- A Schedule Timer used by the Java Virtual Machine to schedule Java Threads
- An UART: used to update software/firmware or download/upload data
- A clock manager (explained further)
- A bootloader: the CodESys can boot on UART or on Flash memory. It boots on UART when the application or firmware must be updated (update mode). It boots on Flash to run the application (normal use mode).
- Memory cache: fast internal memory cache to speed up the execution
- watchdog: hardware device which resets de CodESys when application is stuck somewhere.
- JAP: Java Virtual Machine including software part and hardware acceleration.
   It includes an IEEE 754 floating Point Unit.
- Master DMA Bus: bus used to connect a master hardware unit to the SDRAM (explained further)
- Slave DMA Bus: bus used to connect a master/slave hardware unit
- FPU bus: bus used to access the IEEE 754 floating point unit. In this case the FPU is shared with the processor.





- Sleep\_clock: This clock still works even if the CodESys IP is in sleep mode. It
  must be used by hardware unit which are not sleeping in sleep mode (usually
  the hardware unit which are able to wake up the system with WAKE signal).
- System\_clock: this clock must be used to communicate with SLAVE and MASTER buses.
- Wake signal: this signal is used to wake up the system.
- Int ext: this signal is used to send an interrupt to CodESys.

#### 4.1 CodESys processor

The CodEsys processor has two instruction sets: Java and Risc. This processor has been designed with in collaboration with the CEA-DAM. It is working in military/industrial applications for twenty years. The Java has been chosen for its sustainability, security and "all inclusive" implementation. Java includes all mechanisms of an operating system, such as memory allocation/free, threads, thread synchronisation, object oriented, etc.

#### 4.2 Clock system:

Clock is generated from an external quartz of 40MHz. This external clock must be connected to clock\_in signal of CodESys. This clock is synchronized into PLLs and provided as clock\_out signal. The frequency of clock\_out signal is 40MHz. The clock out signal must be used with all provided buses (master, slave and FPU).

All signals are synchronized on the rising edge of clock out.

A sleep\_clock is also provided by CodESys IP. The CodESys IP can be set (by software) in sleep mode. When the CodESys IP is in sleep mode, the clock\_out signal is no longer generated in order to freeze all hardware units which doesn't need to work when sleeping. But some units can be used as wake system and need a clock signal. These units must use sleep\_clock because the sleep\_clock still works even if CodESys is in sleep mode. The frequency of sleep\_clock is 40MHz.

#### 4.3 Reset system:

CodESys has two external input reset signal: reset fla and reset com.

Reset\_fla resets CodESys and sets it in Flash boot mode (normal use). So reset\_fla signal is the "normal" reset.

Reset\_com resets CodEsys and sets it in COM(UART) boot mode (update). Reset\_com must be used when an application or firmware update is needed.

Reset\_out is generated either when reset\_fla or reset\_com occurs but it takes also into account the synchronisation of the PLLs with the external clock (clock\_in). reset\_out must be used to reset all hardware units.

All reset signals are active low.





#### 4.4 GLED signal:

This signal is used to see if the system is working correctly. In application mode (normal use) this signal toggle approximatively every second.

When updating the system, the led toggles when a data is received.

When this signal stays unchanged, it means the system is stuck somewhere.

#### 4.5 UART Bus:

The CodESys uses the UART bus for two main goals. First in update mode, it is used

to update application and firmware. Second in running mode (application) it can be used to communicate with an UART device (PC, module, etc).

It is a standard TTL UART. The default settings are:

115200 bauds, 1 stop bit, no parity



#### 4.6 Slave Bus



SLAVE bus is used by the processor to read or write data into hardware units.

#### 4.6.1 SLAVE Write Access

When the processor wants to write a data into a specific register, it resets the slave\_RW signal, sets the slave\_cs, sets the







Page: 10/22

slave\_address bus and sets the slave\_DataOut bus. The slave\_cs signal means the processor wants select a hardware unit register. The slave\_rw low means a write cycle. The slave\_address bus specifies the address of the requested register. The slave\_DataOut bus specifies the value of the data the processor wants to write in the register. The slave unit must update the specified register at the rising edge of the clock\_out signal.

#### 4.6.2 SLAVE Read Access



When the processor wants read a data from a specific register, it sets the slave cs, sets the slave rw signal and sets the slave address bus. The slave cs signal means the processor wants to select a hardware unit register. The slave rw low means a write cycle. The slave address

bus specifies the address of the requested register. Before the next rising edge of clock\_out the selected hardware unit must set the slave\_DataIn bus with the value of the selected register. The data is transferred to the processor on the rising edge of clock\_out signal.

#### 4.7 Master Bus

The master bus is used by hardware units to read and write into memory. This bus is also called a DMA (Direct Memory Access) bus. This kind of bus makes the hardware units totally autonomous and they can work on a lot of data without the processor help. Thus processor and hardware units are able to work in parallel.









The processor and the hardware units share the same SDRAM memory. The hardware units have a higher priority to the processor. When the processor and the hardware unit want to access the SDRAM at the same time, the SDRAM controller freezes the processor and give access to the hardware unit. The processor doesn't take a lot of SRAM bandwidth because it has its own memory caches: One for instruction and one for data (Harvard architecture).

#### 4.7.1 Master Write Cycle

When a hardware unit wants to write a data into SDRAM, it sets master\_req signal, master\_rw, master\_address bus, master\_bsd and master\_DataIn. master\_req sends a request to the SDRAM controller. master\_rw low means a write cycle. master\_address specifies the SDRAM address to write. master\_address is a 32 bits (4 bytes) address. master\_bsd specifies the byte(s) into the 4 bytes word. For example if master\_address=A and bsd(0)=1, bsd(1)=0, bsd(2)=0 and bsd(3)=0, only the byte at SDRAM address A\*4 will be written. If master\_address=A and bsd(0)=1, bsd(1)=0, bsd(2)=1 and bsd(3)=0, only the bytes at SDRAM address A\*4 and (A\*4)+2 will be written. If we want to write a 4 bytes word at SDRAM address 8 we have to set master\_address=2 and bsd(0)=1, bsd(1)=1, bsd(2)=1, bsd(3)=1. If we want to write only one byte at SDRAM address 17 we must set master\_address=4 and bsd(0)=0, bsd(1)=1, bsd(2)=0 and bsd(3)=0. The data to write must be in the 32 bits master\_DataIn Bus.



| Master_address | bsd      | SDRAM byte address |  |
|----------------|----------|--------------------|--|
| Α              | bsd(0)=1 | A*4                |  |
|                | bsd(1)=1 | (A*4)+1            |  |
|                | bsd(2)=1 | (A*4)+2            |  |
|                | bsd(3)=1 | (A*4)+3            |  |





Page: 12/22

The data is written into memory when done is set to one. Done signal can be set to one from 1 to n cycles.

Several write cycle can be chained. In this case one data write can be done each clock cycle (burst mode). Warning: the done signal can be lowered during some cycles (for example during refresh memory cycles).

#### 4.7.2 Master Read Cycle

When a hardware unit wants to read a data from SDRAM, it sets master\_req signal, sets master\_rw and master\_address bus. No need to set master\_bsd because the master bus always read a word of 32 bits. master\_req sends a request to the SDRAM controller. master\_rw high means a read cycle. master\_address specifies the SDRAM address to read. master\_address is a 32 bits (4 bytes) address. The done signal is set by CodESys to inform that request has been taken into account but the data will be available at least 2 clock cycles after. The data is ready on master\_DataOut bus two clock cycles after the done signal. As write cycle, read cycle can be executed in burst mode (data is always delayed by two clock cycles).





Page : 13/22







A read cycle can be followed by a write cycle without lowering the master\_req signal. When a write cycles follows a read cycle it is been delayed at least by two clock cycles.

#### 4.8 FPU bus

The processor includes an IEEE 754 Floating Point Unit. This FPU can be shared between the processor and with the hardware units in order to allow floating point processing.

In order to request the FPU for an operation the hardware unit must set fpu\_opc, fpu\_src1, fpu\_src2 and fpu\_req. fpu\_opc specifies the operation to perform.

IMPORTANT: In order to set fpu\_req, you must ensure that the "wait" signal is low. If wait signal is high, it means the processor is using it.

The operation is taken into account when wait signal is set to 1.

The result is ready on fpu dest when wait signal is low.





Page: 14/22





#### 4.9 CodESys Interrupt

Hardware units can send interrupt to the processor by setting *int\_ext* to one during one clock cycle. When CodESys interrupt manager receives the external interrupt which is the most important priority, the processor stops its current tasks and executes the CodESys.ITHandler() method. When the method is finished the processor resumes the current task.





Page: 15/22

#### CODESYS SOFTWARE API

#### 5.1 SLAVE API

#### 5.1.1 SLAVE write method

public static void slaveWrite(int ad, int value)

This method makes a write access on slave bus. It puts *ad* on the slave\_address bus, *value* on data\_out bus, resets slave\_rw and sets slave\_cs.

#### 5.1.2 SLAVE Read method

public static int slaveRead(int ad)

This method makes a read access on slave bus. It puts *ad* on the slave\_address bus, set slave\_rw and slave\_cs and returns the value read on the data\_in bus.

#### 5.2 MASTER DMA API



The MASTER bus only communicates with the SDRAM memory and never with the processor. The master hardware unit must have :

- a register which contains a read and/or a write buffer address.
- a register which contains the number of available bytes in the buffer

This part shows how to design a master unit with a simple example. We will consider we want to create an UART communication unit.





Page: 16/22

When there are some bytes to be sent on TX, the UART will use the DMA bus to read the memory at *readBufferAddress* and will send byte by byte a given number of bytes specified by the *readCounter* register. Each time a byte is sent, the *readBufferAddress* register is incremented and the *readCounter* register is decremented. When the *readCounter* register reaches 0 the UART stops to read. The reading will be restarted when the *readCounter* register will be set with a new value (strictly greater than 0). When the application needs to send new data, it must ensure the readCounter is equal to 0, set a new buffer (for example readbuffer2) in *readBufferAddress* and set the number of bytes to send in *readCounter*.

When a byte is received on RX, the UART uses the DMA bus to write the received byte in the SDRAM at the writeBufferAddress.

The

writeBufferAddress

and the writeCounter will be incremented. When the application needs to read received data, it must read the writeCounter, check if it is greater than 0 and set a new empty buffer to replace the current one by



Java Equivalent Code:

Val=CodESys.readSlave(WRITECOUNTER); if(val>0) return 0; // 0 means "no byte received" CodeESys.writeSlave(WRITEBUFFERADDRESS, newBuf); nbByteReceived=CodeESys.readSlave(WRITECOUNTERFIN AL);

setting the *writeBufferAddress*. When writing in *writeBufferAddress* register, the UART must also copy *writeCounter* in *writeCounterFinal* and reset *writeCounter* to 0. Next, the application must read *writeCounterFinal* to have the correct number of byte available in the buffer.

This mechanism of exchanging buffer is based on mechanism of *front* and *back* buffers. Front buffers are those who are used by the UART which are directly connected on RX and TX and the application works on back buffers.

The application uses the SLAVE API to read/write in the UART registers.

To simplify and accelerate the "exchange" (or swap) operation two methods have been created:

void swapReadBufMaster(int adRegBuf, byte newBuf[], int offsetBuf, int adRegCount, int availableByteInBuffer);





Page: 17/22

#### swapReadBufMaster will execute the following steps:

- Write (address of newBuf+offsetBuf) in the register at address adRegBuf.
- Write availableByteInBuffer in the register at address adRegCount

This method can be used like this:

Val=CodESys.readSlave(READCOUNTER); if(val>0) return -1; // for example -1 means "still busy" CodeESys.swapReadBufMaster(READBUFFERADDRESS, newBuf, 0, READCOUNTER, availableByteInNewBuf); return Val;

int swapWriteBufMaster(int adRegBuf, byte newBuf[], int offsetBuf, int adRegCountFinal)

swapWriteBufMaster will execute the following steps:

- Write (address of newBuf + offsetBuf) int the register at address adRegBuf
- Read register adRegCountFinal
- Return value of register adRegCountFinal

This method can be used like this:

Val=CodESys.readSlave(WRITECOUNTER); if(val==0) return 0; // for example -1 means "no available byte" Val=CodeESys.swapWriteBufMaster(WRITEBUFFERADDRESS, newBuf, 0, WRITECOUNTERFINAL); return Val:



Page: 18/22

#### 6. CODESYS PLATFORM: ARKEON

CodESys is integrated in several SECAPEM products such as target acquisition systems or Secapem application specific display systems.





All systems are based on the same hardware platform, only the hardware unit and the application software are different. The hardware platform has been called ARKEON. It is a daughter board and it includes :

- 1 FPGA: Lattice LFXP2-30E (256 pins BGA package)
- 2 SDRAM of 32MB (total : 64 MB)
- 2 SPI Flash memory of 32 MB (total : 64 MB)
- 1 40MHz oscillator
- 2 80 pins connectors







| Lattice XP2 -30 |                      |                     |                            |  |  |  |
|-----------------|----------------------|---------------------|----------------------------|--|--|--|
| LUTs            | EBR SRAM<br>Blocks   | EBR SRAM<br>(Kbits) | Distributed<br>RAM (Kbits) |  |  |  |
| 29732           | 21                   | 387                 | 56                         |  |  |  |
| registers       | 18x18<br>multipliers | PLL+DLL             | Configuration memory       |  |  |  |
| 22527           | 56                   | 4+2                 | Internal Flash             |  |  |  |



Each connector pin is connected to the FPGA pins. The FPGA is configured following the mother board specification.



Page : 20/22





Mother board (Here a programming board)

Arkeon plugged on Mother board



The CodESys IP (including processor, FPU, SDRAM controller) uses 63% of the available LUTs in Lattice XP2 -30. The figure below shows the occupation of the CodESys ip in the Lattice XP2 -30 FPGA.



If more space is needed, SECAPEM can design boards with some bigger FPGAs (Lattice, Xilinx, Altera, Microsemi (Microchip)).





Page: 21/22

Diamond is the tool to develop hardware IP for Lattice FPGAs (either in VHDL or Verilog).

#### Downloadable at:

http://www.latticesemi.com/en/Products/DesignSoftwareAndIP/FPGAandLDS/Lattice Diamond









#### 7. CODESYS: SOFTWARE DEVELOPMENT TOOL

To develop application CodESys uses the Java language. A plugin for Eclipse has been developed for CodESys.

