APF27 FPGA-IMX interface description
This article describes the interface between the i.MX and the Spartan-3A FPGA on the APF27.
Documentation of the i.MX interface can be found in the i.MX reference manual, chapter 17,
«Wireless External Interface Module (WEIM)».
Hardware
The detailed electronic schematics of the APF27 FPGA interface can be found on page 11 of this document. A simplified schematic is shown below in figure 1.
Signals used in the design are:
- CLKO: Clock generated by i.MX. Used as general clock by the FPGA.
- DATA[16]: 16-bit data bus.
- ADDR[13]: 13-bit address bus. The least significant bit (ADDR[0]) is not used because only 16-bit word accesses are performed, so 12 address bits (ADDR[12:1]) reach the FPGA.
- CS4N_DTACK: Chip Select 4 or Data Transfer ACKnowledge.
- CS5,CS1: Chip Select 5 and 1.
- EB0N and EB1N: Enable Byte signals, used as write strobes for the lower and upper bytes of the data bus.
- OEN: Output Enable, used as the read strobe.
- DMA_GRANT# and DMA_REQ#: Signals to use DMA on i.MX.
Each chip select has its own configuration (timing, address range, ...), so different chip selects can be used for different slaves in the FPGA.
CLKO is now configured at 100 MHz by default to simplify FPGA IP design.
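The address signal description above implies a simple mapping between the byte offset used by the CPU and the value seen on the FPGA address pins. The following is a minimal sketch of that mapping; the function and constant names are hypothetical and only illustrate the dropped ADDR[0] bit.

```python
ADDR_BUS_BITS = 13                    # ADDR[12:0] on the i.MX side
FPGA_ADDR_BITS = ADDR_BUS_BITS - 1    # ADDR[0] is not routed, 12 bits reach the FPGA


def byte_offset_to_fpga_index(offset: int) -> int:
    """Return the value seen on ADDR[12:1] for a CPU byte offset."""
    if not 0 <= offset < (1 << ADDR_BUS_BITS):
        raise ValueError("offset outside the 13-bit address window")
    if offset % 2:
        raise ValueError("only 16-bit aligned accesses are supported")
    return offset >> 1


def fpga_index_to_byte_offset(index: int) -> int:
    """Return the CPU byte offset that selects a given 16-bit word in the FPGA."""
    if not 0 <= index < (1 << FPGA_ADDR_BITS):
        raise ValueError("index outside the 12-bit FPGA address range")
    return index << 1
```

For example, the second 16-bit word of the FPGA (index 1) is reached at byte offset 2 from the start of the chip select window.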
Chip Select Timings configuration
Old configuration (133MHz)
The old configuration uses CS5 to access the FPGA. The CS5 control registers configure all timings for this chip select. The old configuration was:
- CS5U (upper register, see page 521 of the reference manual): mw D8002050 00000600
This adds 6 wait states to each access so that values are read correctly (WSC: Wait State Control).
- CS5L (lower register, see page 525 of the reference manual): mw D8002054 00000D01
This enables the chip select (CSEN), asserts EB[] only on write accesses (EBC) and sets the data port size to 16 bits (DSZ: Data port SiZe).
- CS5A (additional register, page 528): mw D8002058 0
- WCR (WEIM Configuration Register): mw D8002060 00002000
This selects unshifted addresses for CS5 (AUS5).
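The register values above can be cross-checked by extracting the fields they are said to set. The sketch below does this in Python; the bit positions (WSC at bits 13:8, CSEN at bit 0, DSZ at bits 10:8, EBC at bit 11, AUS5 at bit 13) are assumptions about the WEIM register layout and should be verified against the reference manual pages cited above.

```python
def decode_cs_upper(value: int) -> dict:
    """Decode the wait-state field of a chip select upper register value."""
    return {"WSC": (value >> 8) & 0x3F}       # assumed position: bits 13:8


def decode_cs_lower(value: int) -> dict:
    """Decode the fields of a chip select lower register value used in this article."""
    return {
        "CSEN": value & 0x1,                  # assumed position: bit 0
        "DSZ": (value >> 8) & 0x7,            # assumed position: bits 10:8
        "EBC": (value >> 11) & 0x1,           # assumed position: bit 11
    }


def decode_wcr(value: int) -> dict:
    """Decode the CS5 address-unshifted flag of a WCR value."""
    return {"AUS5": (value >> 13) & 0x1}      # assumed position: bit 13
```

Under these assumptions, 00000600 decodes to WSC = 6, 00000D01 to CSEN = 1, DSZ = 5 and EBC = 1, and 00002000 to AUS5 = 1, which agrees with the descriptions given in the list.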
With this configuration, the access time (read/write) to the FPGA was 44 ns.
This configuration was interesting because all timings are under control. The drawback was that, to be perfectly synchronous, the FPGA had to be clocked at 133 MHz like the WEIM, and some IP designs do not work at this frequency.
Alternative configuration (with DTACK)
To solve this problem, another solution is to use the DTACK signal (asynchronous protocol). The slave asserts DTACK to tell the master that the write/read is done. With this solution, the access time is variable and the timing is no longer static.
i.MX registers configuration
To configure CS5N access using DTACK, the GPIO PF21 must be configured as an input with a_out:
md 10015500 1 # read direction register PTF_DDIR
md 10015510 1 # read register PTF_ICONFA2
By default the configuration is correct; we just have to select GPIO in use:
mw 10015520 FFBF1E80 # PTF_GIUS
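The PTF_GIUS value written above can be checked for the PF21 bit with a few lines of Python. This is only a sketch: it assumes that pin PFn is controlled by bit n of each port F register, and the helper names are hypothetical.

```python
PF21_BIT = 21  # assumption: pin PFn uses bit n of every port F register


def is_gpio_in_use(gius_value: int, pin: int) -> bool:
    """Return True if the 'GPIO in use' bit of the given pin is set."""
    return bool((gius_value >> pin) & 1)


def set_gpio_in_use(gius_value: int, pin: int) -> int:
    """Return the register value with the 'GPIO in use' bit of the pin set."""
    return (gius_value | (1 << pin)) & 0xFFFFFFFF
```

With these helpers, is_gpio_in_use(0xFFBF1E80, PF21_BIT) returns True, so the value written to PTF_GIUS does select PF21 as a GPIO.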
FPGA design
In the design, the Wishbone ack signal is returned on the DTACK pin. Here is sample code used to test DTACK:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
-- ----------------------------------------------------------------------------
entity imx27_wb16_wrapper is
-- ----------------------------------------------------------------------------
    port
    (
        -- i.MX Signals
        imx_address : in    std_logic_vector(11 downto 0);
        imx_data    : inout std_logic_vector(15 downto 0);
        imx_cs_n    : in    std_logic;
        imx_oe_n    : in    std_logic;
        imx_eb3_n   : in    std_logic;
        imx_dtack   : out   std_logic;
        -- Copies of the bus signals, routed to test pins
        data0_out   : out   std_logic;
        addr1_out   : out   std_logic;
        cs_n_out    : out   std_logic;
        oe_n_out    : out   std_logic;
        eb3_n_out   : out   std_logic;
        dtack_out   : out   std_logic;
        -- Global Signals
        gls_reset   : in    std_logic;
        gls_clk     : in    std_logic
    );
end entity;
-- ----------------------------------------------------------------------------
architecture RTL of imx27_wb16_wrapper is
-- ----------------------------------------------------------------------------
    constant DELAY : natural := 2;
    signal write         : std_logic;
    signal read          : std_logic;
    signal strobe        : std_logic;
    signal writedata     : std_logic_vector(15 downto 0);
    signal address       : std_logic_vector(12 downto 0);
    signal reg1          : std_logic_vector(15 downto 0);
    signal reg2          : std_logic_vector(15 downto 0);
    signal write_ack     : std_logic;
    signal read_ack      : std_logic;
    signal wbm_address   : std_logic_vector(12 downto 0);
    signal wbm_writedata : std_logic_vector(15 downto 0);
    signal wbm_readdata  : std_logic_vector(15 downto 0);
    signal wbm_strobe    : std_logic;
    signal wbm_write     : std_logic;
    signal wbm_cycle     : std_logic;
    signal dtack_s       : std_logic; -- dtack
    signal dtack_d       : std_logic; -- dtack delayed
    signal dtack_reg     : std_logic_vector(DELAY-1 downto 0);
    -- signal dtack_old  : std_logic_vector(DELAY-1 downto 0);
    signal dtack_old     : std_logic;
begin
    dtack_s <= write_ack or read_ack;
    -- imx_dtack <= dtack_d;
    -- dtack_out <= dtack_d;
    -- Copy the bus signals to the test pins
    data0_out <= imx_data(0);
    addr1_out <= imx_address(1);
    cs_n_out  <= imx_cs_n;
    oe_n_out  <= imx_oe_n;
    eb3_n_out <= imx_eb3_n;
    -- ----------------------------------------------------------------------------
    -- External signals synchronization process
    -- ----------------------------------------------------------------------------
    process(gls_clk, gls_reset)
    begin
        if (gls_reset = '1') then
            write     <= '0';
            read      <= '0';
            strobe    <= '0';
            writedata <= (others => '0');
            address   <= (others => '0');
        elsif (rising_edge(gls_clk)) then
            strobe    <= not (imx_cs_n) and not (imx_oe_n and imx_eb3_n);
            write     <= not (imx_cs_n or imx_eb3_n);
            read      <= not (imx_cs_n or imx_oe_n);
            -- ADDR[0] is not connected: rebuild a byte address with bit 0 at '0'
            address   <= imx_address & '0';
            writedata <= imx_data;
        end if;
    end process;
    wbm_address   <= address when (strobe = '1') else (others => '0');
    wbm_writedata <= writedata when (write = '1') else (others => '0');
    wbm_strobe    <= strobe;
    wbm_write     <= write;
    wbm_cycle     <= strobe;
    sync_p : process (gls_clk, gls_reset)
        variable ack : std_logic;
    begin
        if gls_reset = '1' then
            imx_data  <= (others => 'Z');
            imx_dtack <= '0';
            dtack_out <= '0';
            dtack_old <= '0';
        elsif rising_edge(gls_clk) then
            if read = '1' then
                -- Drive the data bus and return the acknowledge, delayed by one clock cycle
                imx_data  <= wbm_readdata;
                dtack_old <= (read_ack or write_ack);
                imx_dtack <= dtack_old;
                dtack_out <= dtack_old;
            else
                -- Release the data bus when no read is in progress
                imx_data  <= (others => 'Z');
                dtack_old <= '0';
                imx_dtack <= '0';
                dtack_out <= '0';
            end if;
        end if;
    end process sync_p;
    register_write : process (gls_clk, gls_reset)
    begin
        if gls_reset = '1' then
            write_ack <= '0';
            reg1      <= x"caca";
            reg2      <= x"5599";
        elsif rising_edge(gls_clk) then
            if (wbm_strobe = '1') and (wbm_cycle = '1') and (wbm_write = '1') then
                if wbm_address = "0000000000000" then
                    write_ack <= '1';
                    reg1      <= wbm_writedata;
                elsif wbm_address = "0000000000010" then
                    write_ack <= '1';
                    reg2      <= wbm_writedata;
                end if;
            else
                write_ack <= '0';
            end if;
        end if;
    end process register_write;
    register_read : process (gls_clk, gls_reset)
    begin
        if gls_reset = '1' then
            read_ack     <= '0';
            wbm_readdata <= (others => '0');
        elsif rising_edge(gls_clk) then
            if (wbm_strobe = '1') and (wbm_cycle = '1') and (wbm_write = '0') then
                if wbm_address = "0000000000000" then
                    read_ack     <= '1';
                    wbm_readdata <= reg1;
                elsif wbm_address = "0000000000010" then
                    read_ack     <= '1';
                    wbm_readdata <= reg2;
                end if;
            else
                read_ack <= '0';
            end if;
        end if;
    end process register_read;
end architecture RTL;
With the following UCF constraint file:
# Constraint file
#
NET "gls_clk" TNM_NET = "gls_clk";
TIMESPEC "TS_rstgen_syscon00_ext_clk" = PERIOD "gls_clk" 7.5188 ns HIGH 50 %;
NET "gls_clk" LOC="N9" | IOSTANDARD=LVCMOS18;# CLKO
NET "imx_cs_n" LOC="P10" | IOSTANDARD=LVCMOS18;# CS5N
NET "imx_eb3_n" LOC="P9" | IOSTANDARD=LVCMOS18;# EB0N
NET "imx_oe_n" LOC="R9" | IOSTANDARD=LVCMOS18;# OEN
NET "imx_dtack" LOC="R3" | IOSTANDARD=LVCMOS18 | DRIVE=8;# CS4N_DTACK
NET "imx_address<0>" LOC="N5" | IOSTANDARD=LVCMOS18;# ADDR1
NET "imx_address<1>" LOC="L7" | IOSTANDARD=LVCMOS18;# ADDR2
NET "imx_address<2>" LOC="M7" | IOSTANDARD=LVCMOS18;# ADDR3
NET "imx_address<3>" LOC="M8" | IOSTANDARD=LVCMOS18;# ADDR4
NET "imx_address<4>" LOC="L8" | IOSTANDARD=LVCMOS18;# ADDR5
NET "imx_address<5>" LOC="L9" | IOSTANDARD=LVCMOS18;# ADDR6
NET "imx_address<6>" LOC="L10" | IOSTANDARD=LVCMOS18;# ADDR7
NET "imx_address<7>" LOC="M11" | IOSTANDARD=LVCMOS18;# ADDR8
NET "imx_address<8>" LOC="P11" | IOSTANDARD=LVCMOS18;# ADDR9
NET "imx_address<9>" LOC="N11" | IOSTANDARD=LVCMOS18;# ADDR10
NET "imx_address<10>" LOC="N12" | IOSTANDARD=LVCMOS18;# ADDR11
NET "imx_address<11>" LOC="P13" | IOSTANDARD=LVCMOS18;# ADDR12
NET "imx_data<0>" LOC="T5" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA0
NET "imx_data<1>" LOC="T6" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA1
NET "imx_data<2>" LOC="P7" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA2
NET "imx_data<3>" LOC="N8" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA3
NET "imx_data<4>" LOC="P12" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA4
NET "imx_data<5>" LOC="T13" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA5
NET "imx_data<6>" LOC="R13" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA6
NET "imx_data<7>" LOC="T14" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA7
NET "imx_data<8>" LOC="P5" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA8
NET "imx_data<9>" LOC="N6" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA9
NET "imx_data<10>" LOC="T3" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA10
NET "imx_data<11>" LOC="T11" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA11
NET "imx_data<12>" LOC="T4" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA12
NET "imx_data<13>" LOC="R5" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA13
NET "imx_data<14>" LOC="M10" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA14
NET "imx_data<15>" LOC="T10" | IOSTANDARD=LVCMOS18 | DRIVE=8;# DATA15
NET "data0_out" LOC="D16" | IOSTANDARD=LVCMOS18 | DRIVE=12;#IO_L22P_1
NET "addr1_out" LOC="D15" | IOSTANDARD=LVCMOS18 | DRIVE=12;#IO_L22N_1
NET "cs_n_out" LOC="N3" | IOSTANDARD=LVCMOS18 | DRIVE=12;#IO_L24P_3
NET "oe_n_out" LOC="R1" | IOSTANDARD=LVCMOS18 | DRIVE=12;#IO_L23P_3
NET "eb3_n_out" LOC="N2" | IOSTANDARD=LVCMOS18 | DRIVE=12;#IO_L22P_3
NET "dtack_out" LOC="M1" | IOSTANDARD=LVCMOS18 | DRIVE=8;#IO_L20P_3
#end
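The PERIOD value in the TIMESPEC constraint above corresponds to the 133 MHz clock of the old configuration. As a quick check, the period for a given clock frequency can be computed as follows; the function name is hypothetical.

```python
def period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / freq_mhz
```

period_ns(133) gives about 7.5188 ns, the value used in the constraint, while a design clocked at 100 MHz would need a 10 ns period instead.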
Timings
The main problem with this solution is that the i.MX waits too long (~42 ns) after DTACK rises before de-asserting its chip select.
Synchronous access with FPGA at 100MHz (default configuration on APF27)
The main problem with the FPGA clocked at 100 MHz is that the Wishbone bus will not be synchronous with the WEIM interface (clocked at 133 MHz).
Simulation
To ensure that the interface works well, we will simulate it.
Registers configuration
Changing CLKO to 100 MHz:
To change CLKO to 100 MHz, we will use the HCLK Source (400 MHz) divided by 4.
Selecting the HCLK Source (CCSR):
mw 10027028 00008305 # HCLK Source (MPLL 2x clock output / 3) := 400MHz
Divide by 4 (PCDR0):
mw 10027018 12C41083 # divide by 4
We also have to add one more clock cycle for the chip select (CSCR5U):
mw.l D8002050 00000600
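The divider used for CLKO above follows directly from the source and target frequencies. A minimal sketch of that arithmetic, with a hypothetical helper name and no knowledge of the actual PCDR0 field encoding, could be:

```python
def clko_divider(source_mhz: int, target_mhz: int) -> int:
    """Return the integer divider that turns source_mhz into target_mhz."""
    if source_mhz <= 0 or target_mhz <= 0:
        raise ValueError("frequencies must be positive")
    divider, remainder = divmod(source_mhz, target_mhz)
    if remainder:
        raise ValueError("target frequency needs a non-integer divider")
    return divider
```

clko_divider(400, 100) returns 4, the division applied above. Asking for 133 MHz from the same 400 MHz source raises an error, since no integer divider produces it.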
Timing register configuration
All register configurations for external memory are done in U-Boot. The configuration file can be found in buildroot/project_build_armv5te/<project_name>/u-boot-1.3.4/include/configs/apf27.h and is saved in the armadeus tree at buildroot/target/device/armadeus/apf27/apf27.h.
Linux testing program
A program for testing i.MX-FPGA communication speed is available in the armadeus tree in target/linux/debug/imx-fpga-test.
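The sources of imx-fpga-test are not reproduced here. As an illustration of the kind of 16-bit register access such a test performs, the sketch below reads and writes the two registers of the sample VHDL design (reg1 at byte offset 0, reg2 at byte offset 2) through any writable buffer. The helper names are hypothetical, and a bytearray stands in for the FPGA window; on real hardware the buffer would be a memory mapping of the chip select window, which is not shown, and Python does not guarantee that each call results in a single 16-bit bus access.

```python
import struct

REG1_OFFSET = 0x0   # reg1 in the sample VHDL design
REG2_OFFSET = 0x2   # reg2 in the sample VHDL design


def read16(window, offset: int) -> int:
    """Read a little-endian 16-bit word from the window at a byte offset."""
    if offset % 2:
        raise ValueError("only 16-bit aligned accesses are supported")
    return struct.unpack_from("<H", window, offset)[0]


def write16(window, offset: int, value: int) -> None:
    """Write a little-endian 16-bit word to the window at a byte offset."""
    if offset % 2:
        raise ValueError("only 16-bit aligned accesses are supported")
    struct.pack_into("<H", window, offset, value & 0xFFFF)


def make_fake_window() -> bytearray:
    """Return a stand-in for the FPGA window holding the VHDL reset values."""
    window = bytearray(4)
    write16(window, REG1_OFFSET, 0xCACA)
    write16(window, REG2_OFFSET, 0x5599)
    return window
```

Reading REG1_OFFSET from the fake window returns 0xCACA, the reset value of reg1 in the VHDL sample; writing a value to REG2_OFFSET and reading it back returns the same value.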