HWPE Interface Modules: Data Movement & Marshaling

Basic modules (HWPE-Stream)

Basic HWPE-Stream management modules are used to select among multiple streams, merge multiple streams into one, split a stream into multiple ones, synchronize their handshakes, and provide similar basic “morphing” functionality, as well as to delay and enqueue streams. Modules performing these functions can be found within the rtl/basic and rtl/fifo subfolders of the hwpe-stream repository.

hwpe_stream_merge

_images/hwpe_stream_merge.sv.png

The hwpe_stream_merge module is used to merge NB_IN_STREAMS input streams into a single, wider stream. The data and strb channels of the input streams are concatenated in order, and the output valid is generated as the AND of the valid signals of all input streams. The ready is broadcast from the output stream to all input streams.

A typical use of this module is to take NB_IN_STREAMS 32-bit streams coming from a TCDM load interface and merge them into a single, wider stream.
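The merge rule can be illustrated with a small behavioral model in Python. This is only a sketch, not the RTL: the function name `merge_streams` and the dictionary layout are hypothetical, and placing input stream 0 in the least-significant bits of the output is an assumption about the ordering.

```python
def merge_streams(inputs, out_ready, data_width_in=32):
    """Behavioral model of one cycle of a stream merge (illustrative only).

    inputs: list of dicts with integer 'data', 'strb' and boolean 'valid'.
    Returns (out, in_ready): the merged output stream and the per-input ready.
    """
    strb_width = data_width_in // 8
    data = 0
    strb = 0
    for i, s in enumerate(inputs):
        # Assumption: input stream i occupies slice i of the output, LSB first.
        data |= (s["data"] & ((1 << data_width_in) - 1)) << (i * data_width_in)
        strb |= (s["strb"] & ((1 << strb_width) - 1)) << (i * strb_width)
    out = {
        "data": data,
        "strb": strb,
        "valid": all(s["valid"] for s in inputs),  # AND of all input valids
    }
    in_ready = [out_ready] * len(inputs)           # ready is broadcast to all inputs
    return out, in_ready
```

With two 32-bit inputs, the model produces a 64-bit `data` word and an 8-bit `strb`; the output is valid only in cycles where both inputs are valid.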

The following shows an example of the hwpe_stream_merge operation:

_images/wavedrom-eef2a770-89f3-4d7b-b8ff-adbb03cde060.svg

Fig. 9 Example of hwpe_stream_merge operation.

Table 4 hwpe_stream_merge design-time parameters.
Name Default Description
NB_IN_STREAMS 2 Number of input HWPE-Stream streams.
DATA_WIDTH_IN 32 Width of the input HWPE-Stream streams.

hwpe_stream_split

_images/hwpe_stream_split.sv.png

The hwpe_stream_split module is used to split a single stream into NB_OUT_STREAMS 32-bit output streams. The data and strb channels of the input stream are split, in order, across the output streams, and the valid is broadcast to all output streams. The input ready is generated as the AND of the ready signals of all output streams.

A typical use of this module is to take a multiple-of-32-bit stream coming from within the HWPE and split it into multiple 32-bit streams that feed a TCDM store interface.
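The split rule can be sketched with a small behavioral model in Python. This is only an illustration, not the RTL: the function name `split_stream` is hypothetical, and mapping the least-significant slice of the input to output stream 0 is an assumption about the ordering.

```python
def split_stream(data, strb, valid, out_ready, nb_out_streams=2, data_width_in=128):
    """Behavioral model of one cycle of a stream split (illustrative only).

    out_ready: list with one boolean per output stream.
    Returns (outputs, in_ready).
    """
    width_out = data_width_in // nb_out_streams
    strb_out = width_out // 8
    outputs = []
    for i in range(nb_out_streams):
        outputs.append({
            # Assumption: output stream i receives slice i of the input, LSB first.
            "data": (data >> (i * width_out)) & ((1 << width_out) - 1),
            "strb": (strb >> (i * strb_out)) & ((1 << strb_out) - 1),
            "valid": valid,                      # valid is broadcast
        })
    in_ready = all(out_ready)                    # AND of all output readys
    return outputs, in_ready
```

A single stalled output (`ready` = 0) is enough to stall the input stream in this model, which is the consequence of the AND on the ready signals.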

The following shows an example of the hwpe_stream_split operation:

_images/wavedrom-76a645e7-e811-4b0e-9015-1faefd601aef.svg

Fig. 10 Example of hwpe_stream_split operation.

Table 5 hwpe_stream_split design-time parameters.
Name Default Description
NB_OUT_STREAMS 2 Number of output HWPE-Stream streams.
DATA_WIDTH_IN 128 Width of the input HWPE-Stream stream.

hwpe_stream_fence

_images/hwpe_stream_fence.sv.png

The hwpe_stream_fence module is used to synchronize the handshake between NB_STREAMS streams. This is necessary, for example, when multiple 32-bit streams are produced from separate TCDM accesses and have to be joined into a single, wider stream.

_images/wavedrom-9ee6a7ab-1aad-4557-adda-e372ed8e9170.svg

Fig. 11 Example of hwpe_stream_fence operation.

Table 6 hwpe_stream_fence design-time parameters.
Name Default Description
NB_STREAMS 2 Number of input/output HWPE-Stream streams.
DATA_WIDTH 32 Width of the HWPE-Stream streams.

hwpe_stream_mux_static

_images/hwpe_stream_mux_static.sv.png

The hwpe_stream_mux_static module is used to statically propagate one of 2 input streams of size DATA_SIZE into a single output stream. The multiplexer is static as the selection bit sel_i cannot be changed when there are transactions in flight; if the selection bit is changed when transactions are in flight, the result is undefined.
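A minimal behavioral sketch of the static multiplexer in Python is given here for illustration. The function name `mux_static` is hypothetical, and driving ready = 0 towards the non-selected input is an assumption that the text above does not state.

```python
def mux_static(in0, in1, sel, out_ready):
    """One cycle of a static 2-to-1 stream multiplexer (illustrative only).

    in0, in1: dicts with 'data', 'strb', 'valid'. sel: 0 or 1.
    Returns (out, in0_ready, in1_ready).
    """
    out = dict(in1 if sel else in0)      # the selected stream is propagated as-is
    # Assumption: only the selected input sees the output ready.
    in0_ready = bool(out_ready and not sel)
    in1_ready = bool(out_ready and sel)
    return out, in0_ready, in1_ready
```

The model is purely combinational, which is why `sel` has to stay constant while transactions are in flight: nothing in it remembers which input a pending transaction came from.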

The following shows an example of the hwpe_stream_mux_static operation:

_images/wavedrom-5d31b6d0-59e6-4d4d-a9b9-a5adc1ebfd00.svg

Fig. 12 Example of hwpe_stream_mux_static operation.

hwpe_stream_demux_static

_images/hwpe_stream_demux_static.sv.png

The hwpe_stream_demux_static module is used to propagate a single input stream of size DATA_SIZE into one of NB_OUT_STREAMS output streams. The non-selected output streams are all invalid. The demultiplexer is static as the selection bit sel_i cannot be changed when there are transactions in flight; if the selection bit is changed when transactions are in flight, the result is undefined.
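The demultiplexer can be sketched in the same behavioral style in Python. The function name `demux_static` is hypothetical; taking the input ready from the selected output only, and driving data on every output regardless of selection, are assumptions made for the illustration.

```python
def demux_static(inp, sel, out_ready, nb_out_streams=2):
    """One cycle of a static 1-to-N stream demultiplexer (illustrative only).

    inp: dict with 'data', 'strb', 'valid'. sel: index of the selected output.
    out_ready: list with one boolean per output stream.
    Returns (outputs, in_ready).
    """
    outputs = []
    for i in range(nb_out_streams):
        outputs.append({
            "data": inp["data"],                     # don't-care when not selected
            "strb": inp["strb"],
            "valid": bool(inp["valid"] and i == sel),  # non-selected outputs are invalid
        })
    in_ready = bool(out_ready[sel])                  # assumption: ready of selected output
    return outputs, in_ready
```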

The following shows an example of the hwpe_stream_demux_static operation:

_images/wavedrom-5c86295a-c29a-4c9c-8415-a82c91728942.svg

Fig. 13 Example of hwpe_stream_demux_static operation.

Table 7 hwpe_stream_demux_static design-time parameters.
Name Default Description
NB_OUT_STREAMS 2 Number of output HWPE-Stream streams.

hwpe_stream_fifo

_images/hwpe_stream_fifo.sv.png

The hwpe_stream_fifo module implements a hardware FIFO queue for HWPE-Stream streams, used to withstand data scarcity (valid = 0) or backpressure (ready = 0), decoupling two architectural domains. This FIFO is single-clock and therefore cannot be used to cross two distinct clock domains. The FIFO will lower its ready signal on the input stream push_i interface when it is completely full, and will lower its valid signal on the output stream pop_o interface when it is completely empty.
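The full/empty behavior described above can be sketched as a cycle-level model in Python. The class `StreamFifo` and its method names are hypothetical; the model assumes that ready and valid depend only on the occupancy at the start of the cycle.

```python
from collections import deque

class StreamFifo:
    """Single-clock FIFO model with ready/valid handshakes (illustrative only)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.items = deque()

    @property
    def push_ready(self):
        # ready on the push side is lowered only when the FIFO is completely full
        return len(self.items) < self.depth

    @property
    def pop_valid(self):
        # valid on the pop side is lowered only when the FIFO is completely empty
        return len(self.items) > 0

    def cycle(self, push_valid, push_data, pop_ready):
        """Advance one clock cycle; return the popped item, or None."""
        do_pop = self.pop_valid and pop_ready
        do_push = push_valid and self.push_ready
        popped = self.items.popleft() if do_pop else None
        if do_push:
            self.items.append(push_data)
        return popped
```

A producer that keeps `push_valid` high while the consumer stalls fills the queue up to `depth` elements; after that, further pushes are not accepted until an element is popped.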

Table 8 hwpe_stream_fifo design-time parameters.
Name Default Description
DATA_WIDTH 32 Width of the HWPE-Streams (typically a multiple of 32, but the module does not require it).
FIFO_DEPTH 8 Depth of the FIFO queue (multiple of 2).
LATCH_FIFO 0 If 1, use latches instead of flip-flops (requires special constraints in synthesis).
LATCH_FIFO_TEST_WRAP 0 If 1 and LATCH_FIFO is 1, wrap latches with BIST wrappers.
Table 9 hwpe_stream_fifo output flags.
Name Type Description
empty logic 1 if the FIFO is currently empty.
full logic 1 if the FIFO is currently full.
push_pointer logic[7:0] Unused.
pop_pointer logic[7:0] Unused.

hwpe_stream_fifo_earlystall

_images/hwpe_stream_fifo_earlystall.sv.png

The hwpe_stream_fifo_earlystall module implements a hardware FIFO queue for HWPE-Stream streams, used to withstand data scarcity (valid = 0) or backpressure (ready = 0), decoupling two architectural domains. This FIFO is single-clock and therefore cannot be used to cross two distinct clock domains. The only difference with respect to hwpe_stream_fifo is that this version of the FIFO lowers its ready signal one cycle earlier, i.e. when it holds FIFO_DEPTH-1 elements. It will lower its valid signal on the output stream pop_o interface when it is completely empty.
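The difference between the two FIFO variants comes down to the occupancy threshold at which ready is lowered. A minimal Python sketch, using a hypothetical helper `fifo_push_ready`, makes the comparison explicit:

```python
def fifo_push_ready(occupancy, fifo_depth=8, early_stall=False):
    """Return the push-side ready for a given number of stored elements.

    early_stall=False models the standard FIFO (stall when completely full);
    early_stall=True models the early-stall variant (stall at FIFO_DEPTH-1).
    """
    threshold = fifo_depth - 1 if early_stall else fifo_depth
    return occupancy < threshold
```

With the default depth of 8, the standard FIFO still accepts data when it holds 7 elements, whereas the early-stall variant has already lowered its ready.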

Table 10 hwpe_stream_fifo_earlystall design-time parameters.
Name Default Description
DATA_WIDTH 32 Width of the HWPE-Streams (multiple of 32).
FIFO_DEPTH 8 Depth of the FIFO queue (multiple of 2).
LATCH_FIFO 0 If 1, use latches instead of flip-flops (requires special constraints in synthesis).
Table 11 hwpe_stream_fifo_earlystall output flags.
Name Type Description
empty logic 1 if the FIFO is currently empty.
full logic 1 if the FIFO is currently full.
push_pointer logic[7:0] Unused.
pop_pointer logic[7:0] Unused.

hwpe_stream_fifo_ctrl

_images/hwpe_stream_fifo_ctrl.sv.png

The hwpe_stream_fifo_ctrl module implements a hardware FIFO queue similar to that implemented by hwpe_stream_fifo, but without any actual interface handshake forced on HWPE-Streams. Instead, it will push its “virtual” handshake on the push_valid_i/push_ready_o and pop_valid_o/pop_ready_i signals. It can be used to operate multiple big FIFO queues (e.g. with latches) in a synchronized fashion without breaking the HWPE-Stream protocol.

Table 12 hwpe_stream_fifo_ctrl design-time parameters.
Name Default Description
FIFO_DEPTH 8 Depth of the FIFO queue (multiple of 2).

Basic modules (HWPE-Mem / HWPE-MemDecoupled)

Basic HWPE-Mem management modules are used to delay/enqueue HWPE-MemDecoupled interfaces, multiplex multiple HWPE-Mem, or reorder them before hooking the accelerator to a Tightly-Coupled Data Memory (TCDM). Modules performing these functions can be found within the rtl/tcdm subfolder of the hwpe-stream repository.

hwpe_stream_tcdm_fifo_store

_images/hwpe_stream_tcdm_fifo_store.sv.png

The hwpe_stream_tcdm_fifo_store module implements a hardware FIFO queue for HWPE-MemDecoupled store streams, used to withstand data scarcity (req = 0) or backpressure (gnt = 0), decoupling two architectural domains. This FIFO is single-clock and therefore cannot be used to cross two distinct clock domains. The FIFO treats a HWPE-MemDecoupled store stream as a wide HWPE-Stream where, on both sides, the data field contains the addr, data and be of the input tcdm_slave; the req and gnt of the HWPE-MemDecoupled interfaces are mapped on valid and ready respectively. The FIFO will lower its gnt signal on the slave interface tcdm_slave when it is completely full, and will lower its req signal on the master interface tcdm_master when it is completely empty. Fig. 14 shows this mapping.

_images/hwpe_stream_tcdm_fifo_store.png

Fig. 14 Mapping of HWPE-MemDecoupled and HWPE-Stream signals inside the store FIFO.
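The idea of carrying a store request as one wide stream word can be sketched in Python. The helpers `pack_store` and `unpack_store` are hypothetical, and the field order (addr in the upper bits, then data, then be) and the 32/32/4-bit widths are assumptions chosen only for the illustration.

```python
ADDR_BITS, DATA_BITS, BE_BITS = 32, 32, 4   # assumed field widths

def pack_store(addr, data, be):
    """Pack a store request into one wide 'data' word (field order assumed)."""
    word = addr & ((1 << ADDR_BITS) - 1)
    word = (word << DATA_BITS) | (data & ((1 << DATA_BITS) - 1))
    word = (word << BE_BITS) | (be & ((1 << BE_BITS) - 1))
    return word

def unpack_store(word):
    """Recover (addr, data, be) from a word produced by pack_store."""
    be = word & ((1 << BE_BITS) - 1)
    data = (word >> BE_BITS) & ((1 << DATA_BITS) - 1)
    addr = (word >> (BE_BITS + DATA_BITS)) & ((1 << ADDR_BITS) - 1)
    return addr, data, be
```

Under these assumptions the FIFO only needs to store 68-bit words and does not have to interpret their content.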

Table 13 hwpe_stream_tcdm_fifo_store design-time parameters.
Name Default Description
FIFO_DEPTH 8 Depth of the FIFO queue (multiple of 2).
LATCH_FIFO 0 If 1, use latches instead of flip-flops (requires special constraints in synthesis).
Table 14 hwpe_stream_tcdm_fifo_store output flags.
Name Type Description
empty logic 1 if the FIFO is currently empty.
full logic 1 if the FIFO is currently full.
push_pointer logic[7:0] Unused.
pop_pointer logic[7:0] Unused.

hwpe_stream_tcdm_fifo_load

_images/hwpe_stream_tcdm_fifo_load.sv.png

The hwpe_stream_tcdm_fifo_load module implements a hardware FIFO queue for HWPE-MemDecoupled load streams, used to withstand data scarcity (req = 0) or backpressure (gnt = 0), decoupling two architectural domains. This FIFO is single-clock and therefore cannot be used to cross two distinct clock domains. The FIFO treats a HWPE-MemDecoupled load stream as a combination of two 32-bit HWPE-Streams, one going from the tcdm_master to the tcdm_slave interface carrying the addr (outgoing stream); the other from the tcdm_slave to the tcdm_master interface, carrying the r_data (incoming stream).

On the slave side, the req and gnt of the HWPE-MemDecoupled interfaces are mapped on valid and ready respectively in the outgoing stream. Backpressure on the incoming stream (slave side) cannot be enforced by means of the HWPE-MemDecoupled slave interface and thus is carried by a specific input ready_i that must be generated outside of the TCDM FIFO, typically by a hwpe_stream_source module (output tcdm_fifo_ready_o). On the master side, req is mapped to the AND of the incoming stream ready signal and the outgoing stream valid signal. gnt is hooked to the outgoing stream ready signal. The r_valid is mapped on valid in the incoming stream. Fig. 15 shows this mapping.

_images/hwpe_stream_tcdm_fifo_load.png

Fig. 15 Mapping of HWPE-MemDecoupled and HWPE-Stream signals inside the load FIFO.
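The master-side mapping described above reduces to three combinational relations, sketched here in Python. The function and argument names are hypothetical; only the relations themselves are taken from the text.

```python
def load_fifo_master_signals(outgoing_valid, incoming_ready, tcdm_gnt, tcdm_r_valid):
    """Master-side handshake mapping of the load FIFO (illustrative only)."""
    return {
        # req is issued only if an address is available AND the response can be absorbed
        "req": bool(outgoing_valid and incoming_ready),
        # gnt from the memory acts as ready for the outgoing (address) stream
        "outgoing_ready": bool(tcdm_gnt),
        # r_valid from the memory acts as valid for the incoming (r_data) stream
        "incoming_valid": bool(tcdm_r_valid),
    }
```

Gating req with the incoming stream ready means that no new load is requested while the response path has no room for its data.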

Table 15 hwpe_stream_tcdm_fifo_load design-time parameters.
Name Default Description
FIFO_DEPTH 8 Depth of the FIFO queue (multiple of 2).
LATCH_FIFO 0 If 1, use latches instead of flip-flops (requires special constraints in synthesis).
Table 16 hwpe_stream_tcdm_fifo_load output flags.
Name Type Description
empty logic 1 if the FIFO is currently empty.
full logic 1 if the FIFO is currently full.
push_pointer logic[7:0] Unused.
pop_pointer logic[7:0] Unused.

hwpe_stream_tcdm_mux

_images/hwpe_stream_tcdm_mux.sv.png

The TCDM multiplexer can be used to funnel a larger number of input “virtual” TCDM channels (in) into a smaller set of master ports (out). It uses a round-robin counter to avoid starvation, and differs from the modules used within the logarithmic interconnect in that arbitration depends on the round-robin counter and not on the slave port; in other words, its task is to fill all out ports with requests from the in ports, not to route in requests to a specific out port.

Notice that the multiplexer is not “optimal” in the sense that there is no reorder buffer, so transactions cannot be swapped in-flight to optimally fill the downstream available bandwidth. However, in real accelerators many systematic issues with bandwidth sharing can be solved by upstream TCDM FIFOs and by clever reordering of channels, since the dataflow schedule is known.

Table 17 hwpe_stream_tcdm_mux design-time parameters.
Name Default Description
NB_IN_CHAN 2 Number of input HWPE-Mem channels.
NB_OUT_CHAN 1 Number of output HWPE-Mem channels.

hwpe_stream_tcdm_mux_static

_images/hwpe_stream_tcdm_mux_static.sv.png

The hwpe_stream_tcdm_mux_static module is used to statically share a set of out master ports using the HWPE-Mem protocol between two sets of slave ports in0 and in1. It works similarly to the hwpe_stream_mux_static and similarly requires a strictly static selector sel_i.

Table 18 hwpe_stream_tcdm_mux_static design-time parameters.
Name Default Description
NB_CHAN 2 Number of output HWPE-Mem channels.

hwpe_stream_tcdm_reorder

_images/hwpe_stream_tcdm_reorder.sv.png

The hwpe_stream_tcdm_reorder block can be used to rotate the order of a set of HWPE-Mem channels depending on an order_i input, which can be changed dynamically (e.g. a counter). This is used to “equalize” channels with different probabilities of issuing a request so that the downstream HWPE-Mem channels are used with the same average probability, minimizing the chances for memory starvation.
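The rotation can be illustrated with a short Python sketch. The function name `reorder_channels` is hypothetical and the direction of the rotation is an assumption; the point is only that a changing `order` moves every input channel across all output positions.

```python
def reorder_channels(in_channels, order):
    """Rotate a list of channels by 'order' positions (illustrative only).

    Assumption: output position i is fed by input channel (i + order) mod N.
    """
    n = len(in_channels)
    return [in_channels[(i + order) % n] for i in range(n)]
```

If `order` is driven by a free-running counter, a channel that issues requests frequently is not permanently tied to the same downstream port: over N consecutive values of `order` it visits each of the N output positions exactly once.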

Table 19 hwpe_stream_tcdm_reorder design-time parameters.
Name Default Description
NB_CHAN 2 Number of HWPE-Mem channels.

Streamer modules

Streamer modules constitute the heart of the IPs used to interface HWPEs with a PULP system. They include all the modules that are used to generate HWPE-Streams from address patterns on the TCDM, including the address generation itself, data realignment to enable access to data located at non-word-aligned addresses, strobe generation to selectively disable parts of a stream, and the main streamer source and sink modules used to put these functions together. Modules performing these functions can be found within the rtl/streamer subfolder of the hwpe-stream repository.

The two main streamer modules (hwpe_stream_source and hwpe_stream_sink) are composed of several other IPs, including the address generation and strobe generation blocks described in this section, as well as basic HWPE-Stream management blocks.

hwpe_stream_source

_images/hwpe_stream_source.sv.png

The hwpe_stream_source module is the high-level source streamer performing a series of loads on a HWPE-Mem or HWPE-MemDecoupled interface and producing a HWPE-Stream data stream to feed a HWPE engine/datapath. The source streamer is a composite module that makes use of many other fundamental IPs. Its architecture is shown in Fig. 16.

_images/hwpe_stream_source_archi.png

Fig. 16 Architecture of the source streamer.

Fundamentally, a source streamer acts as a specialized DMA engine that follows a predefined address pattern generated by an hwpe_stream_addressgen to perform a burst of loads via a HWPE-Mem interface, producing a HWPE-Stream data stream from the HWPE-Mem r_data field.

Depending on the DECOUPLED parameter, the streamer supports delayed accesses using a HWPE-MemDecoupled interface. The source streamer does not itself include a TCDM FIFO; rather, it provides a dedicated tcdm_fifo_ready_o output signal that can be hooked to an external hwpe_stream_tcdm_fifo_load. tcdm_fifo_ready_o provides a backpressure mechanism from the source streamer to the TCDM FIFO (this is unnecessary in the case of TCDM FIFOs for store).

Table 20 hwpe_stream_source design-time parameters.
Name Default Description
DECOUPLED 0 If 1, the module expects a HWPE-MemDecoupled interface instead of HWPE-Mem.
DATA_WIDTH 32 Width of input/output streams (multiple of 32).
LATCH_FIFO 0 If 1, use latches instead of flip-flops (requires special constraints in synthesis).
TRANS_CNT 16 Number of bits supported in the transaction counter of the address generator, which will overflow at 2^TRANS_CNT.
Table 21 hwpe_stream_source input control signals.
Name Type Description
req_start logic When 1, the source streamer operation is started if it is ready.
addressgen_ctrl ctrl_addressgen_t Configuration of the address generator (see hwpe_stream_addressgen).
Table 22 hwpe_stream_source output flags.
Name Type Description
ready_start logic 1 when the source streamer is ready to start operation.
done logic 1 for one cycle when the streamer ends operation.
addressgen_flags flags_addressgen_t Address generator flags (see hwpe_stream_addressgen).
ready_fifo logic Unused.

hwpe_stream_sink

_images/hwpe_stream_sink.sv.png

The hwpe_stream_sink module is the high-level sink streamer performing a series of stores on a HWPE-Mem or HWPE-MemDecoupled interface from an incoming HWPE-Stream data stream from a HWPE engine/datapath. The sink streamer is a composite module that makes use of many other fundamental IPs. Its architecture is shown in Fig. 17.

_images/hwpe_stream_sink_archi.png

Fig. 17 Architecture of the sink streamer.

Fundamentally, a sink streamer acts as a specialized DMA engine that follows a predefined address pattern generated by an hwpe_stream_addressgen to perform a burst of stores via a HWPE-Mem interface, consuming a HWPE-Stream data stream into the HWPE-Mem data field.

The sink streamer supports both standard HWPE-Mem and delayed HWPE-MemDecoupled accesses without distinction. This is due to the nature of store streams, which are unidirectional (i.e. addr and data move in the same direction) and hence insensitive to latency.

Table 23 hwpe_stream_sink design-time parameters.
Name Default Description
USE_TCDM_FIFOS 0 If 1, the module produces a HWPE-MemDecoupled interface and includes a TCDM FIFO directly inside.
DATA_WIDTH 32 Width of input/output streams.
LATCH_FIFO 0 If 1, use latches instead of flip-flops (requires special constraints in synthesis).
TRANS_CNT 16 Number of bits supported in the transaction counter of the address generator, which will overflow at 2^TRANS_CNT.
Table 24 hwpe_stream_sink input control signals.
Name Type Description
req_start logic When 1, the sink streamer operation is started if it is ready.
addressgen_ctrl ctrl_addressgen_t Configuration of the address generator (see hwpe_stream_addressgen).
Table 25 hwpe_stream_sink output flags.
Name Type Description
ready_start logic 1 when the sink streamer is ready to start operation.
done logic 1 for one cycle when the streamer ends operation.
addressgen_flags flags_addressgen_t Address generator flags (see hwpe_stream_addressgen).
ready_fifo logic Unused.

hwpe_stream_addressgen

_images/hwpe_stream_addressgen.sv.png

The hwpe_stream_addressgen module is used to generate addresses to load or store HWPE-Stream streams, as well as the related byte enable strobes (gen_addr_o and gen_strb_o respectively). The address generator can be used to generate addresses from a three-dimensional space of “words”, “lines” and “features”. Lines and features can be separated by a certain stride, and a roll parameter can be used to reuse the same offsets multiple times.

The multiple-loop functionality partially overlaps with that provided by the microcode processor hwce_ctrl_ucode that can be embedded in HWPEs. The latter is much more flexible and smaller, but slower. When using a single loop in the address generator, the HWPE designer should statically set line_stride = 0, feat_length = 1, feat_stride = 0.

The address generation loop considers three-dimensional vectors, where the three dimensions are called words, lines and features from the innermost to the outermost. One iteration is performed in each cycle in which enable_i is 1. Feature loops can behave in two different fashions, modeled after the behavior of input/output features in CNNs. The following piece of code summarizes the basic functionality provided by the address generator, disregarding more complex situations where the address is misaligned (resulting in one more transaction, introduced automatically).

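// Simplified model: cycles in which enable_i is 0 are skipped, so one
// address is produced per loop iteration until trans_size is reached.
int line_addr = 0, feat_addr = 0;
int trans_idx = 0; // addresses generated so far
int feat_idx  = 0; // position within the current group of feat_roll features
while (trans_idx < trans_size) {
  line_addr = 0; // each feature restarts from its first line
  for (int line_idx = 0; line_idx < feat_length && trans_idx < trans_size; line_idx++) { // line loop
    for (int word_idx = 0; word_idx < line_length && trans_idx < trans_size; word_idx++) { // word loop
      gen_addr = base_addr + feat_addr + line_addr + word_idx * STEP;
      trans_idx++;
    }
    line_addr += line_stride;
  }
  // feature loop update
  if (feat_idx < feat_roll - 1) {
    feat_idx++;
    if (!loop_outer)
      feat_addr += feat_stride; // inner loop: move to the next feature
  }
  else {
    feat_idx = 0;
    if (loop_outer)
      feat_addr += feat_stride; // outer loop: advance after feat_roll repetitions
    else
      feat_addr = 0;            // inner loop: roll back to the first feature
  }
}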
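To make the effect of loop_outer and feat_roll concrete, the pseudocode above can be turned into a small runnable Python model. This is a sketch under stated assumptions rather than a description of the RTL: the function name `generate_addresses` is hypothetical, enable_i is taken to be always 1, trans_size is taken to count generated word addresses, the line offset is assumed to restart at each feature, and misaligned accesses are not modeled.

```python
def generate_addresses(base_addr, trans_size, line_stride, line_length,
                       feat_stride, feat_length, feat_roll, loop_outer, step=4):
    """Return the list of addresses produced for one transfer (illustrative only)."""
    if line_length < 1 or feat_length < 1:
        raise ValueError("line_length and feat_length must be at least 1")
    addrs = []
    feat_addr = 0
    feat_idx = 0                      # position within the current group of feat_roll features
    while len(addrs) < trans_size:
        line_addr = 0                 # assumption: each feature restarts from its first line
        for _ in range(feat_length):              # line loop
            for word_idx in range(line_length):   # word loop
                if len(addrs) == trans_size:
                    return addrs
                addrs.append(base_addr + feat_addr + line_addr + word_idx * step)
            line_addr += line_stride
        if feat_idx < feat_roll - 1:
            feat_idx += 1
            if not loop_outer:
                feat_addr += feat_stride          # inner loop: next feature
        else:
            feat_idx = 0
            if loop_outer:
                feat_addr += feat_stride          # outer loop: advance after feat_roll repetitions
            else:
                feat_addr = 0                     # inner loop: roll back to the first feature
    return addrs
```

For example, with line_length = 2, feat_length = 2, line_stride = 16, feat_stride = 64 and feat_roll = 2, the model with loop_outer = 0 visits offsets 0, 4, 16, 20 and then moves to the next feature at offset 64, whereas with loop_outer = 1 it repeats offsets 0, 4, 16, 20 twice before moving to offset 64.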
Table 26 hwpe_stream_addressgen design-time parameters.
Name Default Description
REALIGN_TYPE HWPE_STREAM_REALIGN_SOURCE Type of realignment, can be set to HWPE_STREAM_REALIGN_{SOURCE,SINK}.
STEP 4 Step of address generation (untested with != 4).
TRANS_CNT 16 Number of bits supported in the transaction counter, which will overflow at 2^TRANS_CNT.
CNT 10 Number of bits supported in non-transaction counters, which will overflow at 2^CNT.
DELAY_FLAGS 0 If 1, delay the production of flags by one cycle.
Table 27 hwpe_stream_addressgen input control signals.
Name Type Description
base_addr logic[31:0] Byte-aligned base address of the stream in the HWPE-accessible memory.
trans_size logic[31:0] Total size of the transaction; only the TRANS_CNT LSBs are actually used.
line_stride logic[15:0] Distance between two adjacent lines in bytes.
line_length logic[15:0] Length of a line in words, rounded up to include an incomplete final word.
feat_stride logic[15:0] Distance between two adjacent features in bytes.
feat_length logic[15:0] Length of a feature in number of lines.
loop_outer logic Whether this corresponds to an outer or inner feature loop.
feat_roll logic[15:0] After this number of features, depending on loop_outer, feature index will be rolled back or incremented.
realign_type logic Unused.
line_length_remainder logic[7:0] Unused.
Table 28 hwpe_stream_addressgen output flags.
Name Type Description
realign_flags ctrl_realign_t Control signals to be used for realignment by hwpe_stream_{source,sink}_realign modules.
word_update logic 1 when the word loop has been updated.
line_update logic 1 when the line loop has been updated.
feat_update logic 1 when the feature loop has been updated.
in_progress logic 1 when the address generation has progressed.

hwpe_stream_strbgen

_images/hwpe_stream_strbgen.sv.png

The hwpe_stream_strbgen module is used to generate strobes for load or store HWPE-Stream streams, in case of incomplete transfers. It uses information passed through the same configuration struct used for the address generator.

Table 29 hwpe_stream_strbgen design-time parameters.
Name Default Description
DATA_WIDTH 32 Width of input/output streams.
Table 30 hwpe_stream_strbgen input control signals.
Name Type Description
base_addr logic[31:0] Unused.
trans_size logic[31:0] Unused.
line_stride logic[15:0] Unused.
line_length logic[15:0] Length of a line in words, rounded up to include an incomplete final word.
feat_stride logic[15:0] Unused.
feat_length logic[15:0] Unused.
loop_outer logic Unused.
feat_roll logic[15:0] Unused.
realign_type logic Unused.
line_length_remainder logic[7:0] Number of valid bytes in the final word in a line; if 0, the final word is considered fully valid.
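The interplay of line_length and line_length_remainder can be sketched in Python for a 32-bit stream. The function name `line_strobes` is hypothetical, and the choice that the valid bytes of the final word are the least-significant ones is an assumption not stated in the table above.

```python
def line_strobes(line_length, line_length_remainder, data_width=32):
    """Return one byte-strobe mask per word of a line (illustrative only)."""
    nbytes = data_width // 8
    full = (1 << nbytes) - 1
    strobes = [full] * line_length
    if line_length > 0 and line_length_remainder != 0:
        # Assumption: the valid bytes of the final word are the low-order ones.
        strobes[-1] = (1 << line_length_remainder) - 1
    return strobes
```

A line of 10 bytes would thus be described by line_length = 3 and line_length_remainder = 2: two fully valid words followed by a final word with only two valid bytes.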

hwpe_stream_sink_realign

_images/hwpe_stream_sink_realign.sv.png

The hwpe_stream_sink_realign module realigns HWPE-Streams to prepare them for storage in memory. Specifically, it rotates strb signals according to its control interface, produced along with addresses in the address generator.

Table 31 hwpe_stream_sink_realign design-time parameters.
Name Default Description
DATA_WIDTH 32 Width of input/output streams.
Table 32 hwpe_stream_sink_realign input control signals.
Name Type Description
enable logic Unused.
strb_valid logic Unused.
realign logic If 1, the realigner is actively used to generate strobed HWPE-Streams. If 0, it is bypassed.
first logic Strobe at 1 for the first packet in a line.
last logic Strobe at 1 for the last packet in a line.
last_packet logic Strobe at 1 for the last packet of the transfer.
line_length logic[15:0] Unused.

hwpe_stream_source_realign

_images/hwpe_stream_source_realign.sv.png

The hwpe_stream_source_realign module realigns HWPE-Streams loaded in a misaligned fashion from memory. Specifically, it rotates strb signals according to its control interface, produced along with addresses in the address generator.

Table 33 hwpe_stream_source_realign design-time parameters.
Name Default Description
DECOUPLED 0 If 1, the module expects a HWPE-MemDecoupled interface instead of HWPE-Mem.
DATA_WIDTH 32 Width of input/output streams.
STRB_FIFO_DEPTH 4 Depth of the FIFO queue used for strobes; when full, the realigner will lower its ready signal at the input interface.
Table 34 hwpe_stream_source_realign input control signals.
Name Type Description
enable logic If 0, the realigner is fully clock-gated.
strb_valid logic If 1, the strobe at the strb_i interface is considered valid.
realign logic If 1, the realigner is actively used to generate strobed HWPE-Streams. If 0, it is bypassed.
first logic Strobe at 1 for the first packet in a line.
last logic Strobe at 1 for the last packet in a line.
last_packet logic Strobe at 1 for the last packet of the transfer.
line_length logic[15:0] Length of a line in words, rounded up to include an incomplete final word.
Table 35 hwpe_stream_source_realign output flags.
Name Type Description
decoupled_stall logic Do not use.

Control interface modules (HWPE-Periph)

The control interface of HWPEs exposes a HWPE-Periph interface that is used to program a memory-mapped register file. Several IPs can be used to compose the control interface, delivering a standard accelerator control interface that is described below. Modules performing these functions can be found within the rtl/ subfolder of the hwpe-ctrl repository.