Abstraction makes it possible to build truly complex systems: gates are assembled into modules, modules into a microarchitecture, and so on. In this context, the architecture serves as the foundation on which the software abstraction stack is built. Assemblers are built on top of architectures, high-level languages on top of assemblers, and frameworks and meta-frameworks on top of those, each layer providing more convenient tools for developing new programs. Let us dive a little deeper into this stack.

Objective

In accordance with the individual assignment, write a program in the high-level programming language C, compile it to machine code, and run it on the previously developed RISC-V processor.

Workflow

Study the theory
Prepare the cross-compilation toolchain
Study the compilation procedure and the relevant commands
Write and compile your own program
Verify that your program executes correctly on your processor in the FPGA

Theory

In this lab you will write a complete program that will run on your processor. During compilation, you will need the files linker_script.ld and startup.S located in this directory.

— But why do I need these files? We have already done programming assignments in previous labs and did not need any additional files.

The thing is, previously you were writing small programs in assembly. However, the RISC-V assembly language, like that of any other RISC architecture, is not programmer-friendly, because it was originally designed with the expectation that compilers would be created and programs would be written in more human-friendly high-level languages. Before, you wrote simple programs that could be implemented in assembly; now you will be asked to write a complete program in C.

— But doesn't compiling C source code produce a program written in assembly? It would be the same code we could write ourselves.

The thing is, the assembly code you wrote before differs from the assembly code generated by a compiler. Your code had, shall we say, finer micro-control over the program flow. When you wrote a program, you knew the size of your memory, where instructions were stored, and where data was stored (in practice your programs barely used data memory, and when they did, you simply picked arbitrary addresses and it worked). You used all registers in the register file as you pleased, without any restrictions. However, imagine for a moment that you are writing an assembly project together with a colleague: you write some functions and they write others. How would you use the registers in that case? If you both use the same registers, calling one function could corrupt data in another. Split the register file in half and each use your own half? But what if another colleague joins the project: would you have to split the register file into three parts? Soon there would be nothing left. To resolve situations like this, a calling convention was developed.

Thus, when generating assembly code, the compiler cannot use all resources without restrictions the way you did — it must follow the constraints imposed by the calling convention, as well as constraints arising from the fact that it knows nothing about the memory layout of the target device, so it cannot work with memory arbitrarily. When working with memory, the compiler follows certain rules that allow the linker to assemble the program for your device using a special script.

Calling Convention

The calling convention defines the procedure for calling functions: where arguments are placed when calling functions, where the stack pointer and return address are located, etc.

In addition, the convention divides the register file into two groups: callee-saved and caller-saved registers.

When working with callee-saved registers, a function must guarantee that upon return, these registers hold the same values they had at the time the function was called. That is, if a function intends to write something to a callee-saved register, it must first save its value onto the stack, and then restore that value from the stack before returning.

A simple analogy: two people take turns using a single desk in a small apartment. Each may use the whole desk, but when finished they must leave the neighbour's half (the callee-saved registers) exactly as they found it, while doing whatever they want with their own half (the caller-saved registers). If they need the neighbour's half, they first move the neighbour's belongings to a nearby pile (the stack, which lives in main memory) and put them back afterwards, so nothing is lost.

A function may use caller-saved registers however it wishes — the called function makes no guarantees about preserving them. At the same time, if a function calls another function, it similarly has no guarantee that the called function will leave the caller-saved registers unchanged, so if values needed after the called function returns are stored there, those values must be saved to the stack.

Table 1 shows the division of registers into callee-saved (the Saver column contains Callee, meaning the called function is responsible for preserving them) and caller-saved (Caller, meaning the calling function is responsible for preserving them). In addition, there are three registers for which the Saver column is not applicable: the zero register (since it cannot be changed), the global pointer, and the thread pointer. Per the calling convention, these registers must not be used for computations inside functions; they are only changed in pre-defined situations.

The ABI name column contains the register's alias, associated with its functional purpose (see register description). Assemblers often accept both forms of register names interchangeably.

Register	ABI Name	Description	Saver
x0	zero	Hard-wired zero	—
x1	ra	Return address	Caller
x2	sp	Stack pointer	Callee
x3	gp	Global pointer	—
x4	tp	Thread pointer	—
x5	t0	Temporary/alternate link register	Caller
x6–7	t1–2	Temporaries	Caller
x8	s0/fp	Saved register/frame pointer	Callee
x9	s1	Saved register	Callee
x10–11	a0–1	Function arguments/return values	Caller
x12–17	a2–7	Function arguments	Caller
x18–27	s2–11	Saved registers	Callee
x28–31	t3–6	Temporaries	Caller

Table 1. Assembly mnemonics for RISC-V integer registers and their roles in the calling convention [1, p. 6].

Although the stack pointer is marked as a Callee-saved register, this does not mean that the called function can write anything it wants to it after saving its value onto the stack. How would you restore the stack pointer value from the stack if the stack pointer register itself contains something wrong?

The Callee designation means that by the time the called function returns, the values of callee-saved registers must be exactly the same as they were at the moment the function was called. For registers s0–s11 this is accomplished by saving their values to the stack. Before the values are saved, the stack pointer is decremented to make room for them. Then, before returning from the function, all values saved to the stack are restored and the stack pointer is incremented by the same amount. Thus, even though the stack pointer changed during the execution of the called function, it holds its original value upon exit.
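To connect the convention with C code, here is a minimal sketch. The function names twice and sum_of_doubles are hypothetical and chosen only for illustration. The comments describe what a compiler typically does; the exact register allocation depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Hypothetical leaf function: the argument arrives in a0 and the result
 * is returned in a0. It calls nothing, so it can do all of its work in
 * caller-saved registers without touching the stack. */
int32_t twice(int32_t x)
{
    return x + x;
}

/* Hypothetical non-leaf function. `sum` and `i` must survive every call
 * to twice(), so a compiler typically keeps them in callee-saved
 * registers (s0-s11) or on the stack. It also has to save ra, because
 * the call instruction overwrites the return address. */
int32_t sum_of_doubles(int32_t n)
{
    int32_t sum = 0;
    for (int32_t i = 1; i <= n; i++) {
        sum += twice(i);
    }
    return sum;
}
```

Compiling such a file and looking at the disassembly of sum_of_doubles is a quick way to see the saves and restores described above.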

Linker Script (linker_script.ld)

The linker script describes how data will be stored in your memory. You may have heard that an executable file contains sections .text and .data — for instructions and data respectively. The linker knows nothing about your memory structure: whether your architecture is Princeton or Harvard, at which addresses instructions should be stored and at which addresses data should be stored, or what byte order (endianness) is used. You may have multiple memory types for special sections — and all of this can be communicated to the linker via the linker script.

In its simplest form, a linker script consists of a single command, SECTIONS, in which you describe which parts of the program go where and in what order.

To facilitate this description, there is a helper variable: the location counter. This counter indicates where in memory the next section will be placed (unless an explicit address is specified when placing a section). At the start of the script execution, this counter is zero. When placing a section, the counter is incremented by the size of that section. Suppose we have two files startup.o and main.o, each containing .text and .data sections. We want to place them in memory as follows: first place the .text sections from both files, then the .data sections.

As a result, starting from address zero, the .text section of startup.o will be placed. It is placed there because the location counter is zero at the start of the script, and the next section is placed at the address currently pointed to by the location counter. After that, the counter is incremented by the size of this section and the .text section of main.o is placed immediately after. After that, the location counter is incremented again. The same will happen for the remaining sections.

Additionally, you can change the value of the location counter at any time. For example, suppose the memory address space is divided into two parts: 512 bytes for instructions and 1024 bytes for data. The address range allocated for instructions is then [0:511], and for data [512:1535]. Suppose also that the total size of all .text sections is 416 bytes. In this case, you can first place the .text sections as described above, then set the location counter to 512 and describe the placement of the data sections. This creates a 96-byte gap (512 - 416) between the sections, and the data ends up in its allocated address range.
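The behaviour of the location counter can be modelled in a few lines of C. This is only an illustrative sketch, not how the linker is implemented: the type section_t and the function place_sections are hypothetical, and the individual section sizes used with it are assumptions picked so that the .text sections add up to the 416 bytes from the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one input section: only its size matters here. */
typedef struct {
    const char *name;
    uint32_t    size;     /* size in bytes */
    uint32_t    address;  /* filled in by place_sections() */
} section_t;

/* Place `count` sections back to back starting at `start`, the way the
 * location counter advances by the size of each placed section.
 * Returns the value of the location counter after the last section. */
uint32_t place_sections(section_t *sections, size_t count, uint32_t start)
{
    uint32_t location_counter = start;
    for (size_t i = 0; i < count; i++) {
        sections[i].address = location_counter;
        location_counter += sections[i].size;
    }
    return location_counter;
}
```

Placing two .text sections of 116 and 300 bytes from address 0 leaves the counter at 416. Calling the function again with start set to 512 models assigning a new value to the location counter before the data sections, which produces the 96-byte gap.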

Our processor system uses a Harvard architecture. This means that the instruction memory and data memory are independent from each other. They are physically separate devices with different buses and different address spaces. However, both memories share the same lower address values: the lowest address is zero, the next is one, and so on. This causes an overlap of the address spaces of instruction memory and data memory. This makes things difficult for the linker: "how can I place a data section at this address when an instruction section is already there?"

There are two mechanisms to resolve this. The first is to link instruction and data sections separately. In this case, there would be two separate linker scripts. However, linking instruction sections depends on linking data sections (specifically, on the addresses at which the stack and .bss section will be placed, as well as the global data pointer), because concrete addresses need to be embedded into some instructions. This would require intermediate steps such as exporting global symbols to a separate object file, which feels overly complicated.

Instead, another approach will be used: the mechanism of Virtual Memory Addresses (VMA) and Load Memory Addresses (LMA).

VMA — the address at which the section will be accessible during program execution. The processor will use this address to access the section.
LMA — the address at which the section will be loaded into memory at program startup.

Usually LMA coincides with VMA. However, in some cases they may differ (for example, the data section is initially written to ROM and then copied from ROM to RAM before program execution). In this case, LMA is the section's address in ROM and VMA is the section's address in RAM.

Thus, we can use shared VMAs (the processor will use overlapping address spaces when accessing instruction and data sections), and resolve the linker's section placement conflict by assigning an arbitrarily large LMA to the data section. We will then simply ignore this address by initializing data memory starting from zero.

In addition, the linker script must specify the byte order, the location of the stack, and the value of the global memory pointer.

All of this is described with detailed comments in the linker_script.ld file.

OUTPUT_FORMAT("elf32-littleriscv")      /* Specify the byte order */

ENTRY(_start)                           /* We tell the linker that the first
                                           instruction executed by the processor
                                           is at the label "_start"
                                        */

/*
  This section specifies the memory structure:
  First comes the "instr_mem" region, which is executable memory
  (indicated by the 'x' attribute). This region starts
  at address 0x00000000 and occupies 1024 bytes.
  Next comes the "data_mem" region, starting at address 0x00000000 and
  occupying 2048 bytes. This region is non-executable memory
  (in the sense that it does not contain executable code).
*/
MEMORY
{
  instr_mem (x) : ORIGIN = 0x00000000, LENGTH = 1K
  data_mem (!x) : ORIGIN = 0x00000000, LENGTH = 2K
}

_trap_stack_size = 640;                /* Size of the trap handler stack.
                                          This size allows up to 8 nested
                                          calls during trap handling.
                                        */

_stack_size = 640;                     /* Size of the program stack.
                                          This size allows up to 8 nested
                                          calls.
                                       */

/*
  This section describes the placement of the program in memory.
  The program is divided into various sections:
  - sections of the executable code;
  - sections of static variables and arrays whose values must be
    embedded in the program;
  etc.
*/

SECTIONS
{

  /*
    In linker scripts there is an internal variable written as '.'
    This variable is called the "location counter". It stores the current
    address in memory.
    At the beginning of the file it is initialized to zero. As new sections
    are added, this variable is incremented by the size of each new section.
    If no address is specified when placing sections, they will be placed
    at the current value of the location counter.
    This variable can be assigned values; after that, it will increment
    from that value.
    More details:
      https://home.cs.colorado.edu/~main/cs1300/doc/gnu/ld_3.html#IDX338
  */

  /*
    The following command specifies that starting from the address currently
    held by the location counter (at this point, starting from zero), the
    .text section of the output file will be located, consisting of the .boot
    sections as well as all sections starting with .text from all binary files
    passed to the linker.
    We additionally specify that this section must be placed in the
    "instr_mem" region.
  */
  .text : {
    PROVIDE(_start = .);
    *(.boot)
    *(.text*)
  } > instr_mem


  /*
  The data section is placed similarly to the instruction section, except
  for the Load Memory Address (LMA). Since instruction and data memories are
  physically separate, they have overlapping address spaces that we would
  like to use (which is why in the MEMORY section we set the start addresses
  of both memories to zero). However, the linker does not like this, since
  how can it place two different sections in the same location? So we tell it,
  using the "AT" operator, that the data section should actually be loaded
  at a different address — one that is guaranteed to be larger than the size
  of the instruction memory — while the processor will use addresses starting
  from zero. The linker accepts this arrangement and builds the executable
  without errors. Our task is then to load the final data section at address
  zero in data memory.
  */
  .data : AT (0x00800000) {
    /*
    It is conventional to assign GP a value equal to the start of the data
    section offset forward by 2048 bytes.
    With 12-bit relative addressing with offset, it is possible to address
    the beginning of the data section as well as the entire address space up
    to 4096 bytes from the start of the data section, which reduces the number
    of addressing instructions needed (LUI operations are rarely used since GP
    already holds the base address and only an offset is needed).
    More details:
      https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/60IdaZj27dY/m/s1eJMlrUAQAJ
    */
    _gbl_ptr = . + 2048;
    *(.*data*)
    *(.sdata*)
  } > data_mem


  /*
    Since we do not know the total size of all data sections used,
    before placing other sections, we must align the location counter
    to a 4-byte boundary.
  */
  . = ALIGN(4);


  /*
    BSS (block started by symbol, unofficially expanded as
    "better save space") is a segment where uninitialized static
    variables are placed. The C standard states that such variables
    are initialized to zero (or NULL for pointers). When you create
    a static array, it must be placed in the executable file.
    Without a bss section, the array would occupy as much space in the
    executable as its own size. A 1000-byte array would take 1000 bytes
    in the .data section.
    Thanks to the bss section, the initial values of the array are not stored;
    instead, only the variable names and their addresses are recorded.
    However, during executable loading, the memory region occupied by the
    bss section must be explicitly zeroed out, since static variables must
    be initialized to zero.
    Thus, the bss section significantly reduces the size of the executable
    (when using uninitialized static arrays) at the cost of increased
    loading time.
    To zero out the bss section, two variables are defined in the script
    that point to the beginning and end of the bss section via the
    location counter.
    More details:
      https://en.wikipedia.org/wiki/.bss

    We additionally specify that this section must be placed in the
    "data_mem" region.
  */
  _bss_start = .;
  .bss : {
    *(.bss*)
    *(.sbss*)
  } > data_mem
  _bss_end = .;


  /*=================================
      The allocated data section is complete; the remaining free memory is
      reserved for the program stack, the trap stack, and (possibly) the heap.
      The RISC-V calling convention states that the stack grows downward
      (from higher to lower addresses), so our goal is to place it at the
      highest addresses in memory.
      Since we have two stacks, the trap stack occupies the highest
      addresses (the very end of data memory) and the program stack is
      placed directly below it. We must also ensure that the program
      stack is protected from being overwritten by the trap stack.
      Before doing so, however, we must verify that there is enough room
      for both stacks.
    =================================
  */

  /* We want to guarantee that there is room left for the stack */
  ASSERT(. < (LENGTH(data_mem) - _trap_stack_size - _stack_size),
            "Program size is too big")

  /*  Move the location counter to the lower boundary of the trap stack
      region (so that we can use it in the ALIGN call later) */
  . = LENGTH(data_mem) - _trap_stack_size;

  /*
      Place the program stack pointer as close to the trap stack boundary
      as possible, subject to the requirement that the stack address be
      aligned to 16 bytes.
      More details:
        https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf
  */
  _stack_ptr = ALIGN(16) <= LENGTH(data_mem) - _trap_stack_size?
                ALIGN(16) : ALIGN(16) - 16;
  ASSERT(_stack_ptr <= LENGTH(data_mem) - _trap_stack_size,
            "SP exceed memory size")

  /*  Move the location counter to the end of memory (so that we can
      use it in the ALIGN call later) */
  . = LENGTH(data_mem);

  /*
      Memory size is usually a multiple of 16, but in case it is not, we
      perform a check: we either stay at the very end of memory (if the
      end is a multiple of 16), or move 16 bytes up from the memory edge
      rounded up to the nearest multiple of 16.
  */
  _trap_stack_ptr = ALIGN(16) <= LENGTH(data_mem) ? ALIGN(16) : ALIGN(16) - 16;
  ASSERT(_trap_stack_ptr <= LENGTH(data_mem), "ISP exceed memory size")
}

Listing 1. Example of a linker script with comments.
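To relate the .data and .bss sections from the script to C source code, consider the following sketch. All names in it are hypothetical, and the section each object ends up in is the typical choice of a compiler, not a guarantee.

```c
#include <stdint.h>

/* Initialized global: its initial values must be stored in the program
 * image, so a compiler typically places it in .data (or .sdata). */
int32_t lookup_table[4] = {3, 5, 16, 0};

/* Static storage without an initializer: C guarantees it starts as
 * zero, so it typically goes to .bss (or .sbss) and relies on the
 * startup code to zero that region. */
static int32_t history[16];

/* Hypothetical helper. `total` is a local variable, so it lives in a
 * register or on the program stack, not in .data or .bss. */
int32_t record_sum(void)
{
    int32_t total = 0;
    for (int i = 0; i < 4; i++) {
        total += lookup_table[i];
    }
    history[0] = total;
    return history[0] + history[15];
}
```

If the .bss region were not zeroed before main runs, history[15] would hold whatever happened to be in memory and the result of record_sum would be unpredictable.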

Important

Note the specified sizes of instruction and data memory. They differ from the sizes previously used in the memory_pkg package. The reason is that while the system and the programs it runs were simple, there was no need for large memory, and smaller sizes significantly reduced synthesis time. However, at this point, to provide the program with enough room for instructions, the program stack, and the trap stack, it is necessary to increase the sizes of both instruction memory and data memory. To do this, update the values of the INSTR_MEM_SIZE_BYTES and DATA_MEM_SIZE_BYTES parameters to 32'd1024 and 32'd2048 respectively. Depending on the complexity of your project, you may need to change the memory size again in the future. Remember: all changes to memory_pkg must also be reflected in the linker script for your system.
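When you change the memory sizes, it helps to recompute by hand what the stack-placement expressions at the end of Listing 1 evaluate to. The sketch below mirrors that arithmetic in C. The function names are hypothetical, and the model assumes that ALIGN(16) rounds the location counter up to the next multiple of 16.

```c
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment` (a power of two),
 * modelling what ALIGN(16) does to the location counter. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1u) & ~(alignment - 1u);
}

/* Mirror of the expression used for _stack_ptr and _trap_stack_ptr:
 * take the aligned address if it does not exceed `limit`, otherwise
 * step back by one 16-byte unit. */
uint32_t stack_top(uint32_t limit)
{
    uint32_t aligned = align_up(limit, 16u);
    return (aligned <= limit) ? aligned : aligned - 16u;
}

/* Bound used by the first ASSERT in the script: the location counter
 * after .bss must be strictly less than this value. */
uint32_t data_limit(uint32_t mem_size, uint32_t trap_stack, uint32_t stack)
{
    return mem_size - trap_stack - stack;
}
```

For a 2048-byte data memory and two 640-byte stacks this gives a program stack pointer of 1408, a trap stack pointer of 2048, and a limit of 768 bytes for .data and .bss together.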

Startup File (startup.S)

The startup file contains instructions that must be executed before any program begins running. These include initializing the stack pointer and global data pointer registers, the interrupt system control registers, zeroing out the .bss section, etc.

Upon completing initialization, the startup file transfers control to the entry point of the program being launched.

  .section    .boot

 .global _start
_start:
  la    gp, _gbl_ptr     # Initialize the global pointer
  la    sp, _stack_ptr   # Initialize the stack pointer

# Initialize (zero out) the bss segment
  la    t0, _bss_start
  la    t1, _bss_end
_bss_init_loop:
  bgeu  t0, t1, _irq_config   # Done once t0 reaches _bss_end
  sw    zero, 0(t0)
  addi  t0, t0, 4
  j     _bss_init_loop

# Configure the interrupt vector (mtvec), interrupt mask (mie),
# and trap stack pointer (mscratch).
_irq_config:
  la    t0, _int_handler
  li    t1, -1 # -1 (all bits set to 1) means all interrupts are enabled
  la    t2, _trap_stack_ptr
  csrw  mtvec, t0
  csrw  mscratch, t2
  csrw  mie, t1

# Call the main function
_main_call:
  li    a0, 0 # Pass argc and argv arguments to main. Formally, argc should
  li    a1, 0 # be greater than zero, and argv should point to an array of
              # strings whose zeroth element is the executable name.
              # For simplicity of implementation, both arguments are simply
              # set to zero. This is done for deterministic program behavior
              # in case the programmer tries to use these arguments.

  # Call main.
  # For the program to link successfully, a function with exactly
  # this name must be defined somewhere.
  call  main
# Infinite loop after main returns
_endless_loop:
  j     _endless_loop

# The low-level interrupt handler is responsible for:
#   * Saving and restoring context;
#   * Calling the high-level handler with the interrupt source id
#     as an argument.
# The code is based on the handler from the urv-core repository:
# https://github.com/twlostow/urv-core/blob/master/sw/common/irq.S
# Saves of unimplemented CSRs have been removed. Additionally,
# only caller-saved registers are saved here, because the high-level
# interrupt handler is required to preserve callee-saved registers
# in accordance with the calling convention.
_int_handler:
  # This operation swaps the sp and mscratch registers.
  # As a result, the trap stack pointer ends up in sp, and the top
  # of the program stack ends up in mscratch.
  csrrw sp, mscratch,sp

  # Move up the trap stack and save all registers.
  addi  sp, sp, -80  # The stack pointer must be aligned to 16 bytes,
                     # so we move up by 80, not 76.
  sw    ra, 4(sp)
  # We want to ensure that a subsequent interrupt does not cause the trap
  # stack to overwrite the program stack, so we load the bottom of the
  # program stack into the freed register and verify that the raised trap
  # stack pointer has not encroached on the program stack.
  # If this has happened (trap stack overflow), we want to halt the
  # processor to avoid losing data that could help us debug the situation.
  la    ra, _stack_ptr
  blt   sp, ra, _endless_loop

  sw    t0, 12(sp) # We skipped offset 8 because that is where the sp
                   # register saved into mscratch earlier should go.
                   # We will write it to the stack a little later.
  sw    t1, 16(sp)
  sw    t2, 20(sp)
  sw    a0, 24(sp)
  sw    a1, 28(sp)
  sw    a2, 32(sp)
  sw    a3, 36(sp)
  sw    a4, 40(sp)
  sw    a5, 44(sp)
  sw    a6, 48(sp)
  sw    a7, 52(sp)
  sw    t3, 56(sp)
  sw    t4, 60(sp)
  sw    t5, 64(sp)
  sw    t6, 68(sp)

  # We also save the interrupt register state in case another
  # interrupt occurs.
  csrr  t0, mscratch
  csrr  t1, mepc
  csrr  a0, mcause
  sw    t0, 8(sp)
  sw    t1, 72(sp)
  sw    a0, 76(sp)

  # Call the high-level interrupt handler.
  # For the program to link successfully, a function with exactly
  # this name must be defined somewhere.
  call  int_handler

  # Restore context. First, we want to restore the CSRs in case
  # a nested interrupt occurred. To do this, we must restore the original
  # value of the trap stack pointer. However, its current value is still
  # needed for context restoration, so we copy it to register a0 and
  # restore the remaining registers through a0.
  mv    a0,sp

  lw    t1, 72(a0)
  lw    t2, 76(a0)
  addi  sp, sp, 80
  csrw  mscratch, sp
  csrw  mepc, t1
  csrw  mcause, t2
  lw    ra, 4(a0)
  lw    sp, 8(a0)
  lw    t0, 12(a0)
  lw    t1, 16(a0)
  lw    t2, 20(a0)
  lw    a1, 28(a0)   # We skipped a0 because it is currently used as the
                     # pointer to the saved context; it is restored
                     # last, after all other registers.
  lw    a2, 32(a0)
  lw    a3, 36(a0)
  lw    a4, 40(a0)
  lw    a5, 44(a0)
  lw    a6, 48(a0)
  lw    a7, 52(a0)
  lw    t3, 56(a0)
  lw    t4, 60(a0)
  lw    t5, 64(a0)
  lw    t6, 68(a0)
  lw    a0, 24(a0)

  # Return from the interrupt handler
  mret

Listing 2. Example contents of the startup file with explanatory comments.
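For readers more comfortable with C than with assembly, the .bss initialization step of the startup file could be expressed roughly as follows. This is an illustrative sketch only: zero_words is a hypothetical name, its two parameters stand in for the linker symbols _bss_start and _bss_end, and the half-open range is a choice made for this sketch.

```c
#include <stdint.h>

/* Zero every 32-bit word in the half-open range [start, end). */
void zero_words(uint32_t *start, uint32_t *end)
{
    for (uint32_t *p = start; p < end; p++) {
        *p = 0u;
    }
}
```

This step cannot actually be written in C inside the startup file, because at that point the stack pointer may not be initialized yet. That is one reason the startup code is written in assembly.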

Important

Note the lines call main and call int_handler. Linking the object file produced by compiling startup.S will only succeed if functions with exactly these names are defined in other files being linked.
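A C source file that satisfies this requirement might be organized as in the sketch below. Everything in it is an assumption made for illustration: the array contents, the irq_count counter, and in particular the signature of int_handler, which is inferred from the fact that the low-level handler leaves mcause in a0 before the call. Check your assignment for the signature it expects.

```c
#include <stdint.h>

/* Hypothetical data to process. */
int32_t array_to_sort[10] = {3, 5, 16, 1, 9, 7, 2, 8, 4, 6};

/* Counter updated by the handler. It is volatile because it changes
 * outside the normal program flow. */
volatile uint32_t irq_count;

/* Called from _int_handler in startup.S. The low-level handler leaves
 * mcause in a0, so it arrives here as the first argument (assumed
 * signature). */
void int_handler(uint32_t cause)
{
    (void)cause;
    irq_count = irq_count + 1u;
}

/* Sort `length` elements of `data` in ascending order. */
void bubble_sort(int32_t *data, uint32_t length)
{
    for (uint32_t pass = 0; pass + 1 < length; pass++) {
        for (uint32_t i = 0; i + 1 < length - pass; i++) {
            if (data[i] > data[i + 1]) {
                int32_t tmp = data[i];
                data[i] = data[i + 1];
                data[i + 1] = tmp;
            }
        }
    }
}

/* On the target, the same file would also define the entry point that
 * startup.S calls, for example:
 *
 *   int main(void) { bubble_sort(array_to_sort, 10); return 0; }
 */
```

Note that the sketch avoids multiplication and division: the rv32i base set has no such instructions, so using them would pull in helper routines from the compiler's support library and increase the program size.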

Practice

To simulate the execution of a program on your processor, you must first compile the program and convert it into a text file that the EDA tool can use to initialize the processor's memory. To compile the program, you will need a special compiler called a cross-compiler. It allows you to compile source code for a computer architecture different from the one on which the compilation is performed. In our case, you will be building code for the RISC-V architecture on a computer with the x86_64 architecture.

The compiler suitable for this task should be installed in the lab. If not, you can download it here (note that the archive size is approximately 550 MB; attempting to download it from the lab may use up your monthly internet traffic quota).

Compiling Object Files

First, compile the source files into object files. This can be done with the following command:

<compiler executable> -c <compilation flags> <input source file> -o <output object file>

You will need the following compilation flags:

-march=rv32i_zicsr: specifies the bit width and extension set of the target architecture (our processor is rv32i, extended with the Zicsr instruction set for interacting with control and status registers).
-mabi=ilp32: specifies the application binary interface. This states that the int, long, and pointer types are 32-bit.

There is a very helpful video explaining the composition of toolchains, the naming of compiler executables, and how architecture and ABI flags are formed.

Given the name of the compiler executable you downloaded (assuming you renamed the extracted directory to riscv_cc and copied it to the root of drive C:, and are running the command from a git bash shell), the command to compile startup.S might look like:

/c/riscv_cc/bin/riscv-none-elf-gcc -c -march=rv32i_zicsr -mabi=ilp32 startup.S -o startup.o

Linking Object Files into an Executable

Next, link the object files. This can be done with the following command format:

<compiler executable> <linking flags> <input object files> -o <output executable file>

The compiler executable is the same; the linking flags are as follows:

-march=rv32i_zicsr -mabi=ilp32: the same flags as during compilation (the architecture must still be specified, otherwise the linker might link the object files with standard libraries built for a different architecture).
-Wl,--gc-sections: instructs the linker to remove unused sections (reduces the size of the output file). Note that there must be no space after the comma; this is important!
-nostartfiles: instructs the linker not to use the startup files from the standard libraries (reduces file size and eliminates linking errors caused by conflicts with our own startup file).
-T linker_script.ld: passes the linker script to the linker.

Example linking command:

/c/riscv_cc/bin/riscv-none-elf-gcc -march=rv32i_zicsr -mabi=ilp32 -Wl,--gc-sections -nostartfiles -T linker_script.ld startup.o main.o -o result.elf

Exporting Sections for Memory Initialization

As a result of linking, you will obtain an executable file in ELF (Executable and Linkable Format). This is a binary file, but it is not simply a stream of binary instructions and data to be loaded into the processor's memory. It also contains headers and auxiliary information that help a loader place the program in computer memory. Since the role of the loader will be played by you and the EDA tool used for simulation, this information is not needed, so you must export only the binary instructions and data from this file and discard everything else. The resulting file can then be used with the $readmemh system task.

To export, use the command:

/c/riscv_cc/bin/riscv-none-elf-objcopy -O verilog result.elf init.mem

The -O verilog flag specifies that the file should be saved in a format that the $readmemh command can process.

Since instruction and data memory are separate, you can export individual sections into separate files:

/c/riscv_cc/bin/riscv-none-elf-objcopy -O verilog -j .text result.elf init_instr.mem
/c/riscv_cc/bin/riscv-none-elf-objcopy -O verilog -j .data -j .bss -j .sdata result.elf init_data.mem

Note the contents of the resulting file:

@00000000
97 11 00 00 93 81 01 AD 13 01 00 76 93 02 00 2D
13 03 00 2D 63 88 62 00 23 A0 02 00 93 82 42 00
6F F0 5F FF 93 02 40 04 13 03 F0 FF 73 90 52 30
73 10 43 30 13 05 00 00 93 05 00 00 EF 00 C0 1F
6F 00 00 00 73 11 01 34 13 01 01 FB 23 22 11 00
...

The first line indicates that memory should be initialized starting from address zero and is not of interest to us right now. What matters is that the file was exported byte by byte. This format does not suit our memory, since each of our memory cells consists of 4 bytes.

For the output file to be compatible with a memory with 32-bit cells, the export command must be supplemented with the --verilog-data-width=4 option, which specifies the cell size in bytes of the memory being initialized. The file will then look like:

@00000000
00001197 AD018193 76000113 2D000293
2D000313 00628863 0002A023 00428293
FF5FF06F 04400293 FFF00313 30529073
30431073 00000513 00000593 1FC000EF
0000006F 34011173 FB010113 00112223
...

Note that the bytes were not simply concatenated into groups of four — the byte order also changed. This is important, because the data must be laid out in memory exactly in this (updated) byte order (see the first line of the linker script). At one point objcopy had a bug where the byte order was not changed. In some versions of the toolchain (other than the one used in this lab) you may still encounter this behavior.
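The relation between the two exports can be checked with a short C sketch. The function name word_from_le_bytes is hypothetical; the code simply shows how four consecutive bytes form one word under little-endian byte order.

```c
#include <stdint.h>

/* Combine four consecutive bytes, lowest address first, into the 32-bit
 * word a little-endian processor reads from that address. */
uint32_t word_from_le_bytes(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Applied to the first four bytes of the byte-wise export (97 11 00 00), this yields 0x00001197, which is the first word of the 32-bit export.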

Let us return to the first line: @00000000. As mentioned, a number starting with @ tells the EDA tool to initialize memory starting from the cell whose number matches this value. When you export the data sections, the first line will not be zero: with the linker script from Listing 1 it is expected to be @00200000. This happens because the linker script sets the load address (LMA) of the data section to 0x00800000, and objcopy divides this address by 4 to obtain the 32-bit cell number (at one point objcopy had another bug where this division by 4 was not performed). The large load address was chosen to prevent the address spaces of instruction memory and data memory from overlapping (see the linker script section). For the system to work correctly, this line must be deleted, so that data memory is initialized starting from cell zero.
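Deleting the address line by hand in a text editor is enough. If you prefer to automate it, the idea can be sketched in C as below. The function strip_address_lines is a hypothetical helper, not part of the toolchain, and it assumes the whole file has already been read into a string.

```c
#include <stddef.h>

/* Copy `src` to `dst`, skipping every line that starts with '@'
 * (an address directive in a $readmemh file). `dst` must be at least
 * as large as `src`, including the terminating '\0'.
 * Returns the number of characters written, not counting the '\0'. */
size_t strip_address_lines(const char *src, char *dst)
{
    size_t out = 0;
    int at_line_start = 1;
    int skipping = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (at_line_start) {
            skipping = (c == '@');
        }
        if (!skipping) {
            dst[out++] = c;
        }
        at_line_start = (c == '\n');
    }
    dst[out] = '\0';
    return out;
}
```

Without an address line, memory initialization starts from cell zero, which is what the data memory needs here.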

Disassembly

During lab debugging you will frequently need to look at the program counter and the current instruction. It is quite difficult to manually decode an instruction to understand what is currently being executed. To simplify this, you can disassemble the compiled file. The resulting assembly file will store instruction addresses as well as their binary and assembly representations.

Example of a disassembled file:

Disassembly of section .text:

00000000 <_start>:
   0: 00001197           auipc gp,0x1
   4: adc18193           addi gp,gp,-1316 # adc <_gbl_ptr>
   8: 76000113           li sp,1888
   c: 2dc00293           li t0,732
  10: 2dc00313           li t1,732

00000014 <_bss_init_loop>:
  14: 00628863           beq t0,t1,24 <_irq_config>
  18: 0002a023           sw zero,0(t0)
  1c: 00428293           addi t0,t0,4
...

00000164 <bubble_sort>:
 164: fd010113           addi sp,sp,-48
 168: 02112623           sw ra,44(sp)
 16c: 02812423           sw s0,40(sp)
 170: 03010413           addi s0,sp,48
 174: fca42e23           sw a0,-36(s0)
 178: fcb42c23           sw a1,-40(s0)
 17c: fe042623           sw zero,-20(s0)
 180: 09c0006f           j 21c <bubble_sort+0xb8>
...

00000244 <main>:
 244: ff010113           addi sp,sp,-16
 248: 00112623           sw ra,12(sp)
 24c: 00812423           sw s0,8(sp)
 250: 01010413           addi s0,sp,16
 254: 00a00593           li a1,10
 258: 2b400513           li a0,692
 25c: f09ff0ef           jal ra,164 <bubble_sort>
 260: 2b400793           li a5,692
...

Disassembly of section .data:

000002b4 <array_to_sort>:
 2b4: 00000003           lb zero,0(zero) # 0 <_start>
 2b8: 0005                 c.nop 1
 2ba: 0000                 unimp
 2bc: 0010                 0x10
 2be: 0000                 unimp
...

Listing 3. Example of a disassembled file.

The numbers in the leftmost column, incrementing by 4, are memory addresses. When debugging a program using the waveform, you can use these numbers as reference values for PC.

The string in hexadecimal following the address is the instruction (or data) placed at that address. Using this column, you can verify that the instruction read from the waveform (the instr signal) is correct.

The right column contains the assembly (human-readable) representation of the instruction from the previous column. For example, instruction 00001197 is the operation auipc gp,0x1, where gp is the alias (ABI name) of register x3 (see the Calling Convention section).
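If you want to double-check the disassembler against the waveform, individual fields can be extracted by hand. The sketch below relies on the standard RV32I field positions (opcode in bits 6:0, rd in bits 11:7, U-type immediate in bits 31:12, I-type immediate in bits 31:20). The function names are hypothetical and are not part of any lab file.

```c
#include <stdint.h>

/* Opcode: bits 6:0 of the instruction word. */
uint32_t rv_opcode(uint32_t instr) { return instr & 0x7Fu; }

/* Destination register number: bits 11:7. */
uint32_t rv_rd(uint32_t instr) { return (instr >> 7) & 0x1Fu; }

/* U-type immediate (auipc, lui): bits 31:12, returned without shifting. */
uint32_t rv_u_imm(uint32_t instr) { return instr >> 12; }

/* I-type immediate (addi, lw, ...): bits 31:20, sign-extended from 12 bits. */
int32_t rv_i_imm(uint32_t instr)
{
  int32_t imm = (int32_t)(instr >> 20);
  if (imm & 0x800) {
    imm -= 0x1000;
  }
  return imm;
}
```

Applied to 00001197 from Listing 3, this gives opcode 0x17 (auipc), rd = 3 (gp) and immediate 0x1. Applied to adc18193, the I-type immediate is -1316, matching addi gp,gp,-1316.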

Pay attention to the last part of the listing: the disassembly of the .data section. In this section, addresses may increment by any amount, hexadecimal data can be of any size, and the assembly instructions in the right column should be completely ignored.

The reason is that the disassembler attempts to decode all binary data it sees, making no distinction between instructions and data. If it can decode bytes from the data section in any way (which can be completely arbitrary), it will. Moreover, the resulting "instructions" may belong to extensions not supported by the current file: compressed (two bytes instead of four), floating-point, atomic, etc.

This does not mean the data section in the disassembly is useless. From the listing above you can determine that the first elements of the array_to_sort array are the numbers 3, 5, and 0x10 (16 in decimal), as well as the addresses at which they reside (0x2b4, 0x2b8, 0x2bc; if it is unclear why the first number occupies a single 4-byte row while the other two are split into two 2-byte rows, try rereading the previous paragraph). Simply be mindful of which section you are currently reading when examining a disassembled file.
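When a 4-byte array element is split into two 2-byte rows, the original value can be restored by joining the halves. A minimal sketch, assuming little-endian layout (the row at the lower address is the low half); the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: joins two 16-bit rows of the .data disassembly into
   the 32-bit value they represent. low_half is the row at the lower
   address, high_half is the row at the address that is 2 bytes higher. */
uint32_t join_halfwords(uint16_t low_half, uint16_t high_half)
{
  return (uint32_t)low_half | ((uint32_t)high_half << 16);
}
```

For the rows at 0x2b8 and 0x2ba (0005 and 0000) this gives 0x00000005, and for the rows at 0x2bc and 0x2be (0010 and 0000) it gives 0x00000010.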

To perform disassembly, execute the following command:

[disassembler executable] -D (or -d) [input executable file] > [output assembly file]

For our example, the command is:

/c/riscv_cc/bin/riscv-none-elf-objdump -D result.elf > disasmed_result.S

The -D flag disassembles all sections. The -d flag disassembles only executable sections (sections containing instructions). Thus, with the -d flag you avoid the misleading bogus instructions produced by decoding the bytes of the .data section, but you can no longer check the addresses and values stored in the data sections.

Assignment

Write a program for your individual assignment from Lab 4 in C or C++ (depending on the chosen language, use the corresponding compiler: gcc for C, g++ for C++).

For your program to link successfully, you must define two functions: main and int_handler. Arguments and return values may be anything, but they will not be used. The main function will be called at the start of program execution (after the .boot section of the startup file has run); the int_handler function will be called automatically every time your input device controller generates an interrupt request (once the processor has finished handling the previous request).

Thus, the minimum required algorithm is to read the input data (referred to as sw_i in the individual assignment) from the input device upon an interrupt, perform the processing defined in your variant, and write the result to the output device. Keep the following in mind:

When entering data from a keyboard, a key scancode is sent, not the digit value of the pressed key (nor its ASCII code). Moreover, when a key is released, scancode F0 is generated, followed by the scancode of that key being sent again.
When working with UART via Putty, you send the ASCII code of the entered character.

For these two input devices, you need to design a protocol for entering numbers into your program. In the simplest case, data can be processed "as-is". That is, for the keyboard, pressing key 1 in the top row with scancode 0x16 can be interpreted as the number 0x16. For UART, sending character 1 with ASCII code 0x31 can be interpreted as 0x31. However, Putty displays output as the characters corresponding to the received ASCII codes, so there is a high risk of producing non-printable characters.
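A slightly less naive protocol converts the received code into the digit it stands for. The sketch below is one possible approach, not a required one. All names are hypothetical, and the scancode table assumes the common PS/2 scancode set 2 values for the top-row digit keys (only 0x16 for key 1 is confirmed by the text above, so check the rest against your keyboard controller).

```c
#include <stdint.h>

/* Returns 0..9 for the ASCII codes of '0'..'9', or -1 for anything else. */
int ascii_digit_value(uint8_t ascii_code)
{
  if (ascii_code >= 0x30u && ascii_code <= 0x39u) {
    return (int)(ascii_code - 0x30u);
  }
  return -1;
}

/* ASSUMPTION: PS/2 set 2 scancodes of the top-row keys 0..9. */
static const uint8_t digit_scancodes[10] = {
  0x45, 0x16, 0x1E, 0x26, 0x25, 0x2E, 0x36, 0x3D, 0x3E, 0x46
};

/* Set after scancode F0 so that the repeated scancode of the released
   key is ignored. */
static int skip_next_scancode;

/* Returns 0..9 when a digit key is pressed, or -1 for key releases and
   for keys that are not digits. */
int scancode_digit_value(uint8_t scancode)
{
  if (scancode == 0xF0u) {
    skip_next_scancode = 1;
    return -1;
  }
  if (skip_next_scancode) {
    skip_next_scancode = 0;
    return -1;
  }
  for (int digit = 0; digit < 10; ++digit) {
    if (digit_scancodes[digit] == scancode) {
      return digit;
    }
  }
  return -1;
}
```

Calling one of these functions from the interrupt handler for every received byte yields digits that can then be accumulated into a multi-digit number.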

The main function may be empty, contain only a return statement, or contain an infinite loop — program flow will not break in any case, since the startup file already includes an infinite loop after main returns. Nevertheless, you may place some logic here that receives data from the interrupt handler via global variables.
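Passing data from the interrupt handler to main through global variables could look roughly as follows. This is a sketch under assumptions: fake_input_reg stands in for the real input register (which in your program is reached through platform.h), and poll_new_value is a hypothetical helper that main would call inside its loop.

```c
#include <stdint.h>

/* Stand-in for the input device register; hypothetical, for illustration. */
static volatile uint32_t fake_input_reg;

/* Globals shared between the handler and main. They are volatile so that
   the compiler rereads them on every access instead of caching them. */
static volatile uint32_t latest_value;
static volatile int      has_new_value;

/* Called on every interrupt: stores the input and raises the flag. */
void int_handler(void)
{
  latest_value  = fake_input_reg;
  has_new_value = 1;
}

/* Returns 1 and stores the value in *out if the handler has delivered new
   data since the previous call, otherwise returns 0. */
int poll_new_value(uint32_t *out)
{
  if (!has_new_value) {
    return 0;
  }
  has_new_value = 0;
  *out = latest_value;
  return 1;
}
```

Inside main, an infinite loop would call poll_new_value and run the processing of your variant whenever it returns 1.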

Access to the peripheral controller registers is done through memory access. In the simplest case, this is accomplished via pointer dereferencing, with pointers initialized to the register addresses from the memory map of Lab 13.
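In C, such an access might be sketched as shown below. The helper names are hypothetical, and the register address is deliberately left as a parameter: take the actual value from the memory map of Lab 13.

```c
#include <stdint.h>

/* Writes a 32-bit value to the register located at reg_addr. The volatile
   qualifier keeps the compiler from removing or reordering the access. */
void reg_write(uintptr_t reg_addr, uint32_t value)
{
  *(volatile uint32_t *)reg_addr = value;
}

/* Reads a 32-bit value from the register located at reg_addr. */
uint32_t reg_read(uintptr_t reg_addr)
{
  return *(volatile uint32_t *)reg_addr;
}
```

On the processor, reg_addr would be the address of a controller register. On a desktop machine the same functions can be tried out by passing the address of an ordinary variable.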

When writing the program, remember that C++ does not allow an integer to be converted to a pointer implicitly, so when assigning an integer address value to a pointer, you must use the reinterpret_cast operator.

To reduce your interaction with the dark magic of pointers, you are provided with the file platform.h, which declares pointers to structures that map fields to the physical addresses of peripheral devices. You only need to use the pointer corresponding to your peripheral device.

If your output device is a VGA controller, you must use a structure instance rather than a pointer to it. Inside this structure, pointers to byte arrays are declared: char_map, color_map, tiff_map. As you know, a pointer can be used as an array name, so you can access the desired byte in the corresponding memory region of the VGA controller as an array element. For example, to write a character to position 6 of row 2 (both counted from zero), you would access char_map[2*80+6] (2*80 is the index of the start of row 2, since each row holds 80 characters).
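The index calculation can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, the row width of 80 is taken from the 2*80 example above, and char_map is passed in as a parameter instead of being taken from the structure in platform.h.

```c
#include <stdint.h>

#define VGA_COLUMNS 80u  /* characters per row, as in the 2*80 example */

/* Index of the cell at the given row and column, both counted from zero. */
uint32_t vga_cell_index(uint32_t row, uint32_t column)
{
  return row * VGA_COLUMNS + column;
}

/* Writes one character code into the character map, using the pointer as
   an array name. */
void vga_put_char(volatile uint8_t *char_map,
                  uint32_t row, uint32_t column, uint8_t character)
{
  char_map[vga_cell_index(row, column)] = character;
}
```

Here vga_cell_index(2, 6) evaluates to 166, the same element as char_map[2*80+6].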

Example of interacting with a FICTIONAL peripheral device via a structure. This program is only an example illustrating interaction with peripherals through the provided structure pointers. You need to understand how the fictional device works, and then write your own program that implements the logic of your individual assignment and interacts with your real device.

/*
 Do not copy this code and use it as the basis for your program.
 It is not suitable for that purpose. Your processor system has no
 colliders, DEADLY_SERIOUS events, or emergency switches.
 Just make sure you understand the "->" and "." operators and the use of
 pointers as array names, then write your own program.
*/
#include "platform.h"

/*
  In the header file "platform.h", collider_ptr — a pointer to the
  SUPER_COLLIDER_HANDLE structure — and collider_obj — an instance of the
  same structure — are declared.
  Fields of this structure can be accessed through the pointer using the
  "->" operator. Fields can be accessed through the instance using the
  "." operator.
  Among other fields, the structure contains a pointer mem, which
  points to some memory of this peripheral device. This pointer can be
  used as an array name.
*/

int main(int argc, char** argv)
{
  while(1){                             // In an infinite loop
    while (!(collider_ptr->ready));     // Continuously poll the ready register
                                        // until it becomes 1.

                                        // Then start the collider by
    collider_ptr->start = 1;            // writing 1 to the start control register
    collider_obj.mem[0] = 300;          // Example of interacting with memory
                                        // using the pointer declared in the
                                        // structure as an array name.
  }
}

#define DEADLY_SERIOUS_EVENT 0xDEADDAD1

// extern "C" is only needed in C++. It ensures that in the object file
// the function is named exactly int_handler, as the linker expects when
// combining code with startup.S.
// Without extern "C", when compiling C++ code, the function name in the
// object file will be slightly different (something like _Z11int_handlerv),
// causing linking errors.
extern "C" void int_handler()
{
  // If an interrupt arrives from the collider, immediately check the status
  // register, and if its code equals DEADLY_SERIOUS_EVENT, emergency-stop
  // the collider
  if(DEADLY_SERIOUS_EVENT == collider_ptr->status)
  {
    collider_ptr->emergency_switch = 1;
  }
}

Listing 4. Example of C++ code interacting with a fictional peripheral device via structure and array pointers declared in platform.h.

When writing a program in a high-level language, you may be tempted to use all the benefits of high-level programming: scanf/cin for console input, printf/cout for console output, or dynamic arrays and STL containers. You should avoid such impulses, because "out of the box" none of this functionality will be available for the following reasons:

Console I/O. For printf to output a message to your output device, you need to "tell" it how to do so. This requires overriding the write function used by printf.
Dynamic memory. The use of dynamic arrays and containers that implicitly rely on dynamic memory is limited by the complex mechanics of memory management. In general-purpose systems, the operating system handles this, but in our embedded system there is no OS. To use dynamic memory, you would need to write an allocator — a special function or class that handles memory allocation and deallocation.
Standard library size. Even if you implement all the required functionality, any of the standard C/C++ library functions listed above will pull in a large amount of code that will simply not fit in your instruction memory.
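To give an idea of what the allocator mentioned above involves, here is about the simplest possible variant: a bump allocator that hands out memory from a static pool and never frees it. This is a sketch under assumptions, not a recommendation. The pool size and all names are hypothetical, and a real allocator would also have to support deallocation.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE_BYTES 256u  /* hypothetical pool size */

/* The pool is declared as an array of words so that it is 4-byte aligned. */
static uint32_t heap_words[HEAP_SIZE_BYTES / 4u];
static size_t   heap_used_bytes;

/* Returns a 4-byte aligned block of at least size_bytes bytes, or NULL if
   the pool is exhausted. Blocks are never returned to the pool. */
void *bump_alloc(size_t size_bytes)
{
  size_t rounded = (size_bytes + 3u) & ~(size_t)3u;
  if (rounded < size_bytes || rounded > HEAP_SIZE_BYTES - heap_used_bytes) {
    return NULL;
  }
  void *block = (uint8_t *)heap_words + heap_used_bytes;
  heap_used_bytes += rounded;
  return block;
}
```

Even this minimal version takes up data memory for the pool, which is one more reason to prefer static arrays in this lab.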

However, if you still want to write truly high-level code, you can use third-party libraries written specifically for embedded systems. For example, for printf you can use the following repository: https://github.com/mpaland/printf. To use this library, you only need to implement one function with the following prototype:

void _putchar(char character);

In this function, you need to describe how to output a single ASCII character to your output device.
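An implementation for a UART-like output device might be sketched as follows. The register layout (a data register and a busy flag) and the names uart_tx_regs, fake_uart and uart_tx are assumptions made for illustration. In your program, use the structure pointer for your device from platform.h and its actual fields.

```c
#include <stdint.h>

/* Hypothetical register layout of a transmitter: not taken from platform.h. */
typedef struct {
  volatile uint32_t data;  /* writing a byte here starts a transmission */
  volatile uint32_t busy;  /* reads as nonzero while a byte is being sent */
} uart_tx_regs;

/* Stand-in instance so that the sketch can be tried on a desktop machine. */
static uart_tx_regs fake_uart;
static uart_tx_regs *const uart_tx = &fake_uart;

/* Called by the printf library for every character it wants to print. */
void _putchar(char character)
{
  while (uart_tx->busy) {
    /* wait until the previous character has been sent */
  }
  uart_tx->data = (uint8_t)character;
}
```

With such a function in place, the library's printf formats the string and calls _putchar once per output character.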

You should still avoid dynamic arrays, replacing them with statically allocated arrays of a sufficiently large size. However, it is not always possible to solve a problem with just a static array. For example, you may want to use a dictionary (associative array). In that case, you can use the following repository: https://github.com/mpaland/avl_array. It does not use dynamic memory; the underlying container is also a static array whose size you must set manually.
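Replacing a dynamic array with a static one usually comes down to keeping an element counter next to a fixed-size array. A minimal sketch with hypothetical names and a hypothetical capacity:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 16u  /* hypothetical capacity, chosen with some margin */

static uint32_t samples[MAX_SAMPLES];  /* storage allocated at link time */
static size_t   sample_count;          /* number of elements in use */

/* Appends a value to the array. Returns 1 on success, or 0 if the array
   is already full and the value was dropped. */
int samples_push(uint32_t value)
{
  if (sample_count >= MAX_SAMPLES) {
    return 0;
  }
  samples[sample_count++] = value;
  return 1;
}
```

The capacity has to be chosen so that it fits in data memory together with the stack and the other global variables.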

Steps

Carefully study the theory and practice sections.
Understand the principle of interacting with the control and status registers of a peripheral device using Listing 4 as an example.
Update the values of the INSTR_MEM_SIZE_BYTES and DATA_MEM_SIZE_BYTES parameters in the memory_pkg package to 32'd1024 and 32'd2048 respectively. Since packages are not modules, you will not see them in the Hierarchy tab of the sources window; instead, you can find them in the Libraries and Compile order tabs.
Write a program for your individual assignment and peripheral device set in C or C++. If writing C++ code, remember to add extern "C" before the definition of the int_handler function.
1. The program must include both main and int_handler functions, since the startup file contains calls to these functions. If needed, you may define additional helper functions — there is no restriction that exactly two functions must be present.
2. The main function may be empty — upon its return, the startup file provides an infinite loop that the processor can only exit via an interrupt.
3. In the int_handler function, you must read the input data received from the input device.
4. You must decide on your own how to structure the program's execution flow: whether your individual assignment will be computed only once inside main using data passed from int_handler via global variables, or whether it will be recomputed on every call to int_handler.
5. Access to control and status registers must be done through pointers to structures declared in the file platform.h. For the VGA controller, access to memory regions is done through a structure instance (not a pointer to it) containing the array names char_map, color_map, and tiff_map.
Compile the program and the startup file into object files.
Link the object files into an executable, passing the corresponding script to the linker.
Export the .text and .data sections from the linked executable file into the text files init_instr.mem and init_data.mem. If you did not create any initialized static arrays or global variables, the init_data.mem file may be empty.
1. If the init_data.mem file is not empty, initialize the memory in the data_mem module using the $readmemh system function, as was done for instruction memory.
2. Before doing so, remove the first line (of the form @20000000) from the init_data.mem file, which specifies the starting initialization address.
Add the resulting text files to the Vivado project.
Run the simulation of program execution on your processor using the testbench from Lab 13.
1. peripheral_pkg contains helper tasks that allow you to simulate keyboard or UART input (no helper tasks are needed for switches). You can see an example of input simulation in the testbench. Update the testbench code so that the data required for your program to run is fed into your system.
2. For debugging during simulation, it is convenient to use the disassembled file as a reference, guided by the address and data signals on the instruction bus.
Verify that the program executes correctly on the processor in the FPGA.


Lab 14. High-Level Programming