Diffstat (limited to 'qemu/docs/specs/rocker.txt')
-rw-r--r-- | qemu/docs/specs/rocker.txt | 1014 |
1 files changed, 0 insertions, 1014 deletions
diff --git a/qemu/docs/specs/rocker.txt b/qemu/docs/specs/rocker.txt
deleted file mode 100644
index d2a82624f..000000000
--- a/qemu/docs/specs/rocker.txt
+++ /dev/null
@@ -1,1014 +0,0 @@
-Rocker Network Switch Register Programming Guide
-Copyright (c) Scott Feldman <sfeldma@gmail.com>
-Copyright (c) Neil Horman <nhorman@tuxdriver.com>
-Version 0.11, 12/29/2014
-
-LICENSE
-=======
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-SECTION 1: Introduction
-=======================
-
-Overview
---------
-
-This document describes the hardware/software interface for the Rocker switch
-device.  The intended audience is authors of OS drivers and device emulation
-software.
-
-Notations and Conventions
--------------------------
-
-o In register descriptions, [n:m] indicates a range from bit n to bit m,
-  inclusive.
-o Use of leading 0x indicates a hexadecimal number.
-o Use of leading 0b indicates a binary number.
-o The use of RSVD or Reserved indicates that a bit or field is reserved for
-  future use.
-o Field width is in bytes, unless otherwise noted.
-o Registers are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear
-  on read.
-o TLV values in network-byte-order are designated with (N).
-
-
-SECTION 2: PCI Configuration Registers
-======================================
-
-PCI Configuration Space
------------------------
-
-Each switch instance registers as a PCI device with PCI configuration space:
-
-  offset  width  description          value
-  ---------------------------------------------
-  0x0     2      Vendor ID            0x1b36
-  0x2     2      Device ID            0x0006
-  0x4     4      Command/Status
-  0x8     1      Revision ID          0x01
-  0x9     3      Class code           0x2800
-  0xC     1      Cache line size
-  0xD     1      Latency timer
-  0xE     1      Header type
-  0xF     1      Built-in self test
-  0x10    4      Base address low
-  0x14    4      Base address high
-  0x18-28        Reserved
-  0x2C    2      Subsystem vendor ID  *
-  0x2E    2      Subsystem ID         *
-  0x30-38        Reserved
-  0x3C    1      Interrupt line
-  0x3D    1      Interrupt pin        0x00
-  0x3E    1      Min grant            0x00
-  0x3F    1      Max latency          0x00
-  0x40    1      TRDY timeout
-  0x41    1      Retry count
-  0x42    2      Reserved
-
-
-* Assigned by sub-system implementation
-
-SECTION 3: Memory-Mapped Register Space
-=======================================
-
-There are two memory-mapped BARs.  BAR0 maps device register space and is
-0x2000 in size.  BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in
-size, allowing for 256 MSI-X vectors.
-
-All registers are 4 or 8 bytes long.  It is assumed host software will access 4
-byte registers with one 4-byte access, and 8 byte registers with either two
-4-byte accesses or a single 8-byte access.  In the case of two 4-byte accesses,
-access must be lower and then upper 4-bytes, in that order.
-
-BAR0 device register space is organized as follows:
-
-  offset         description
-  ------------------------------------------------------
-  0x0000-0x000f  Bogus registers to catch misbehaving
-                 drivers.  Writes do nothing.  Reads
-                 back as 0xDEADBABE.
-  0x0010-0x00ff  Test registers
-  0x0300-0x03ff  General purpose registers
-  0x1000-0x1fff  Descriptor control
-
-Holes in register space are reserved.  Writes to reserved registers do nothing.
-Reads to reserved registers read back as 0.
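As an illustration of the access rule in Section 3, the following is a minimal Python sketch of a driver-side helper that splits an 8-byte register write into two 4-byte accesses, lower half first, and classifies an offset against the BAR0 layout. `FakeBar0`, `write64_split`, and `bar0_region` are hypothetical names invented for this sketch, not part of the Rocker spec or of any driver. `FakeBar0` is plain little-endian memory and does not model device behavior (for example, it does not emulate the test registers).

```python
import struct

# BAR0 layout from Section 3: (start, end inclusive, description).
BAR0_REGIONS = [
    (0x0000, 0x000F, "bogus"),
    (0x0010, 0x00FF, "test"),
    (0x0300, 0x03FF, "general purpose"),
    (0x1000, 0x1FFF, "descriptor control"),
]


def bar0_region(offset):
    """Return the name of the BAR0 region containing offset, or 'reserved'."""
    for start, end, name in BAR0_REGIONS:
        if start <= offset <= end:
            return name
    return "reserved"


class FakeBar0:
    """Hypothetical stand-in for a mapped BAR0; logs 4-byte writes in order."""

    def __init__(self):
        self.mem = bytearray(0x2000)   # BAR0 is 0x2000 bytes
        self.log = []

    def write32(self, offset, value):
        value &= 0xFFFFFFFF
        struct.pack_into("<I", self.mem, offset, value)  # registers are LE
        self.log.append((offset, value))

    def read64(self, offset):
        return struct.unpack_from("<Q", self.mem, offset)[0]


def write64_split(bar, offset, value):
    """Write an 8-byte register as two 4-byte accesses: lower, then upper."""
    bar.write32(offset, value & 0xFFFFFFFF)
    bar.write32(offset + 4, value >> 32)
```

The write log makes the ordering requirement visible: the lower 4 bytes land at `offset` before the upper 4 bytes land at `offset + 4`.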
- -No fancy stuff like write-combining is enabled on any of the registers. - -BAR1 MSI-X register space is organized as follows: - - offset description - ------------------------------------------------------ - 0x0000-0x0fff MSI-X vector table (256 vectors total) - 0x1000-0x1fff MSI-X PBA table - - -SECTION 4: Interrupts, DMA, and Endianness -========================================== - -PCI Interrupts --------------- - -The device supports only MSI-X interrupts. BAR1 memory-mapped region contains -the MSI-X vector and PBA tables, with support for up to 256 MSI-X vectors. - -The vector assignment is: - - vector description - ----------------------------------------------------- - 0 Command descriptor ring completion - 1 Event descriptor ring completion - 2 Test operation completion - 3 RSVD - 4-255 Tx and Rx descriptor ring completion - Tx vector is even - Rx vector is odd - -A MSI-X vector table entry is 16 bytes: - - field offset width description - ------------------------------------------------------------- - lower_addr 0x0 4 [31:2] message address[31:2] - [1:0] Rsvd (4 byte alignment - required) - upper_addr 0x4 4 [31:19] Rsvd - [14:0] message address[46:32] - data 0x8 4 message data[31:0] - control 0xc 4 [31:1] Rsvd - [0] mask (0 = enable, - 1 = masked) - -Software should install the Interrupt Service Routine (ISR) before any ports -are enabled or any commands are issued on the command ring. - -DMA Operations --------------- - -DMA operations are used for packet DMA to/from the CPU, command and event -processing. Command processing includes statistical counters and table dumps, -table insertion/deletion, and more. Event processing provides an async -notification method for device-originating events. Each DMA operation has a -set of control registers to manage a descriptor ring. The descriptor rings are -allocated from contiguous host DMA-able memory and registers specify the rings -base address, size and current head and tail indices. 
Software always writes -the head, and hardware always writes the tail. - -The higher-order bit of DMA_DESC_COMP_ERR is used to mark hardware completion -of a descriptor. Software will clear this bit when posting a descriptor to the -ring, and hardware will set this bit when the descriptor is complete. - -Descriptor ring sizes must be a power of 2 and range from 2 to 64K entries. -Descriptor rings' base address must be 8-byte aligned. Descriptors must be -packed within ring. Each descriptor in each ring must also be aligned on an 8 -byte boundary. Each descriptor ring will have these registers: - - DMA_DESC_xxx_BASE_ADDR, offset 0x1000 + (x * 32), 64-bit, (R/W) - DMA_DESC_xxx_SIZE, offset 0x1008 + (x * 32), 32-bit, (R/W) - DMA_DESC_xxx_HEAD, offset 0x100c + (x * 32), 32-bit, (R/W) - DMA_DESC_xxx_TAIL, offset 0x1010 + (x * 32), 32-bit, (R) - DMA_DESC_xxx_CTRL, offset 0x1014 + (x * 32), 32-bit, (W) - DMA_DESC_xxx_CREDITS, offset 0x1018 + (x * 32), 32-bit, (R/W) - DMA_DESC_xxx_RSVD1, offset 0x101c + (x * 32), 32-bit, (R/W) - -Where x is descriptor ring index: - - index ring - -------------------- - 0 CMD - 1 EVENT - 2 TX (port 0) - 3 RX (port 0) - 4 TX (port 1) - 5 RX (port 1) - . - . - . - 124 TX (port 61) - 125 RX (port 61) - 126 Resv - 127 Resv - -Writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero. HEAD cannot be -written past TAIL. To do so would wrap the ring. An empty ring is when HEAD -== TAIL. A full ring is when HEAD is one position behind TAIL. Both HEAD and -TAIL increment and modulo wrap at the ring size. 
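The register offsets and the head/tail rules above can be sketched in a few lines of Python. This is an illustrative model only: the function and constant names are hypothetical, and ports are numbered from 0 as in the ring index table above.

```python
RING_REG_BASE = 0x1000    # first descriptor ring register block in BAR0
RING_REG_STRIDE = 32      # bytes of register space per ring

# Per-ring register offsets, relative to the start of the ring's block.
RING_REGS = {
    "BASE_ADDR": 0x00,
    "SIZE": 0x08,
    "HEAD": 0x0C,
    "TAIL": 0x10,
    "CTRL": 0x14,
    "CREDITS": 0x18,
}


def ring_reg_offset(ring_index, reg):
    """BAR0 offset of register `reg` for descriptor ring `ring_index`."""
    if not 0 <= ring_index <= 127:
        raise ValueError("ring index out of range")
    return RING_REG_BASE + ring_index * RING_REG_STRIDE + RING_REGS[reg]


def tx_ring_index(port):
    """Ring index of the TX ring for a port (0-based, per the index table)."""
    return 2 + 2 * port


def rx_ring_index(port):
    """Ring index of the RX ring for a port (0-based, per the index table)."""
    return 3 + 2 * port


def valid_ring_size(size):
    """Ring sizes must be a power of 2 between 2 and 64K entries."""
    return 2 <= size <= 65536 and (size & (size - 1)) == 0


def ring_empty(head, tail):
    """An empty ring is when HEAD == TAIL."""
    return head == tail


def ring_full(head, tail, size):
    """A full ring is when HEAD is one position behind TAIL."""
    return (head + 1) % size == tail


def ring_advance(index, size):
    """HEAD and TAIL increment and wrap modulo the ring size."""
    return (index + 1) % size
```

For example, `ring_reg_offset(tx_ring_index(0), "HEAD")` gives the HEAD register of the port 0 TX ring. Because one slot is kept free to tell a full ring from an empty one, a ring of N entries holds at most N - 1 outstanding descriptors under these rules.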
- -CTRL register bits: - - bit name description - ------------------------------------------------------------------------ - [0] CTRL_RESET Reset the descriptor ring - [1:31] Reserved - -All descriptor types share some common fields: - - field width description - ------------------------------------------------------------------- - DMA_DESC_BUF_ADDR 8 Phys addr of desc payload, 8-byte - aligned - DMA_DESC_COOKIE 8 Desc cookie for completion matching, - upper-most bit is reserved - DMA_DESC_BUF_SIZE 2 Desc payload size in bytes - DMA_DESC_TLV_SIZE 2 Desc payload total size in bytes - used for TLVs. Must be <= - DMA_DESC_BUF_SIZE. - DMA_DESC_COMP_ERR 2 Completion status of associated - desc payload. High order bit is - clear on new descs, toggled by - hw for completed items. - -To support forward- and backward-compatibility, descriptor and completion -payloads are specified in TLV format. Fields are packed with Type=field name, -Length=field length, and Value=field value. Software will ignore unknown fields -filled in by the switch. Likewise, the switch will ignore unknown fields -filled in by software. - -Descriptor payload buffer is 8-byte aligned and TLVs are 8-byte aligned. The -value within a TLV is also 8-byte aligned. The (packed, 8 byte) TLV header is: - - field width description - ----------------------------- - type 4 TLV type - len 2 TLV value length - pad 2 Reserved - -The alignment requirements for descriptors and TLVs are to avoid unaligned -access exceptions in software. Note that the payload for each TLV is also -8 byte aligned. - -Figure 1 shows an example descriptor buffer with two TLVs. 
- - <------- 8 bytes -------> - - 8-byte +––––+ +–––––––––––+–––––+–––––+ +–+ - align | type | len | pad | TLV#1 hdr | - +–––––––––––+–––––+–––––+ (len=22) | - | | | - | value | TVL#1 value | - | | (padded to 8-byte | - | +–––––+ alignment) | - | |/////| | - 8-byte +––––+ +–––––––––––+–––––––––––+ | - align | type | len | pad | TLV#2 hdr DESC_BUF_SIZE - +–––––+–––––+–––––+–––––+ (len=2) | - |value|/////////////////| TLV#2 value | - +–––––+/////////////////| | - |///////////////////////| | - |///////////////////////| | - |///////////////////////| | - |////////unused/////////| | - |////////space//////////| | - |///////////////////////| | - |///////////////////////| | - |///////////////////////| | - +–––––––––––––––––––––––+ +–+ - - fig. 1 - -TLVs can be nested within the NEST TLV type. - -Interrupt credits -^^^^^^^^^^^^^^^^^ - -MSI-X vectors used for descriptor ring completions use a credit mechanism for -efficient device, PCIe bus, OS and driver operations. Each descriptor ring has -a credit count which represents the number of outstanding descriptors to be -processed by the driver. As the device marks descriptors complete, the credit -count is incremented. As the driver processes those outstanding descriptors, -it returns credits back to the device. This way, the device knows the driver's -progress and can make decisions about when to fire the next interrupt or not. -When the credit count is zero, and the first descriptors are posted for the -driver, a single interrupt is fired. Once the interrupt is fired, the -interrupt is disabled (auto-masked*). In response to the interrupt, the driver -will process descriptors and PIO write a returned credit value for that -descriptor ring. If the driver returns all credits (the driver caught up with -the device and there is no outstanding work), then the interrupt is unmasked, -but not fired. 
If only partial credits are returned, the interrupt remains
-masked but the device generates an interrupt, signaling the driver that more
-outstanding work is available.
-
-(* this masking is unrelated to the MSI-X interrupt mask register)
-
-Endianness
-----------
-
-Device registers are hard-coded to little-endian (LE).  The driver should
-convert to/from host endianness to LE for device register accesses.
-
-Descriptors are LE.  Descriptor buffer TLVs will have LE type and length
-fields, but the value field can either be LE or network-byte-order, depending
-on context.  TLV values containing network packet data will be in network-byte
-order.  A TLV value containing a field or mask used to compare against network
-packet data is network-byte order.  For example, flow match fields (and masks)
-are network-byte-order since they're matched directly, byte-by-byte, against
-network packet data.  All non-network-packet TLV multi-byte values will be LE.
-
-TLV values in network-byte-order are designated with (N).
-
-
-SECTION 5: Test Registers
-=========================
-
-Rocker has several test registers to support troubleshooting register access,
-interrupt generation, and DMA operations:
-
-  TEST_REG, offset 0x0010, 32-bit (R/W)
-  TEST_REG64, offset 0x0018, 64-bit (R/W)
-  TEST_IRQ, offset 0x0020, 32-bit (R/W)
-  TEST_DMA_ADDR, offset 0x0028, 64-bit (R/W)
-  TEST_DMA_SIZE, offset 0x0030, 32-bit (R/W)
-  TEST_DMA_CTRL, offset 0x0034, 32-bit (R/W)
-
-Reads to TEST_REG and TEST_REG64 will read a value equal to twice the last
-value written to the register.  The 32-bit and 64-bit versions are for testing
-32-bit and 64-bit host accesses.
-
-A vector can be written to TEST_IRQ and the device will generate an interrupt
-for that vector.
-
-To test basic DMA operations, allocate a DMA-able host buffer and put the
-buffer address into TEST_DMA_ADDR and size into TEST_DMA_SIZE.  Then, write to
-TEST_DMA_CTRL to manipulate the buffer contents.
TEST_DMA_CTRL operations are:
-
-  operation             value  description
-  -----------------------------------------------------------
-  TEST_DMA_CTRL_CLEAR   1      clear buffer
-  TEST_DMA_CTRL_FILL    2      fill buffer bytes with 0x96
-  TEST_DMA_CTRL_INVERT  4      invert bytes in buffer
-
-Various buffer addresses and sizes should be tested to verify that no address
-boundary issue exists.  In particular, buffers that start on an odd 8-byte
-boundary and/or span multiple pages should be tested.
-
-
-SECTION 6: Ports
-================
-
-Physical and Logical Ports
---------------------------
-
-The switch supports up to 62 physical (front-panel) ports.  Register
-PORT_PHYS_COUNT returns the actual number of physical ports available:
-
-  PORT_PHYS_COUNT, offset 0x0304, 32-bit, (R)
-
-In addition to front-panel ports, the switch supports logical ports for
-tunnels.
-
-Front-panel ports and logical tunnel ports are mapped into a single 32-bit port
-space.  A special CPU port is assigned port 0.  The front-panel ports are
-mapped to ports 1-62.  A special loopback port is assigned port 63.  Logical
-tunnel ports are assigned ports 0x00010000-0x0001ffff.
-To summarize the port assignments:
-
-  port                   mapping
-  -------------------------------------------------------
-  0                      CPU port (for packets to/from host CPU)
-  1-62                   front-panel physical ports
-  63                     loopback port
-  64-0x0000ffff          RSVD
-  0x00010000-0x0001ffff  logical tunnel ports
-  0x00020000-0xffffffff  RSVD
-
-Physical Port Mode
-------------------
-
-Switch front-panel ports operate in a mode.  Currently, the only mode is
-OF-DPA.  OF-DPA[1] mode is based on OpenFlow Data Plane Abstraction (OF-DPA)
-Abstract Switch Specification, Version 1.0, from Broadcom Corporation.  To
-set/get the mode for front-panel ports, see port settings, below.
-
-Port Settings
--------------
-
-Link status for all front-panel ports is available via PORT_PHYS_LINK_STATUS:
-
-  PORT_PHYS_LINK_STATUS, offset 0x0310, 64-bit, (R)
-
-  Value is port bitmap.  Bits 0 and 63 always read 0.
Bits 1-62 - read 1 for link UP and 0 for link DOWN for respective front-panel ports. - -Other properties for front-panel ports are available via DMA CMD descriptors: - - Get PORT_SETTINGS descriptor: - - field width description - ---------------------------------------------- - PORT_SETTINGS 2 CMD_GET - PPORT 4 Physical port # - - Get PORT_SETTINGS completion: - - field width description - ---------------------------------------------- - PPORT 4 Physical port # - SPEED 4 Current port interface speed, in Mbps - DUPLEX 1 1 = Full, 0 = Half - AUTONEG 1 1 = enabled, 0 = disabled - MACADDR 6 Port MAC address - MODE 1 0 = OF-DPA - LEARNING 1 MAC address learning on port - 1 = enabled - 0 = disabled - PHYS_NAME <var> Physical port name (string) - - Set PORT_SETTINGS descriptor: - - field width description - ---------------------------------------------- - PORT_SETTINGS 2 CMD_SET - PPORT 4 Physical port # - SPEED 4 Port interface speed, in Mbps - DUPLEX 1 1 = Full, 0 = Half - AUTONEG 1 1 = enabled, 0 = disabled - MACADDR 6 Port MAC address - MODE 1 0 = OF-DPA - -Port Enable ------------ - -Front-panel ports are initially disabled, which means port ingress and egress -packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE: - - PORT_PHYS_ENABLE: offset 0x0318, 64-bit, (R/W) - - Value is bitmap of first 64 ports. Bits 0 and 63 are ignored - and always read as 0. Write 1 to enable port; write 0 to disable it. - Default is 0. - - -SECTION 7: Switch Control -========================= - -This section covers switch-wide register settings. - -Control -------- - -This register is used for low level control of the switch. 
- - CONTROL: offset 0x0300, 32-bit, (W) - - bit name description - ------------------------------------------------------------------------ - [0] CONTROL_RESET If set, device will perform reset - [1:31] Reserved - -Switch ID ---------- - -The switch has a SWITCH_ID to be used by software to uniquely identify the -switch: - - SWITCH_ID: offset 0x0320, 64-bit, (R) - - Value is opaque to switch software and no special encoding is implied. - - -SECTION 8: Events -================= - -Non-I/O asynchronous events from the device are notified to the host using the -event ring. The TLV structure for events is: - - field width description - --------------------------------------------------- - TYPE 4 Event type, one of: - 1: LINK_CHANGED - 2: MAC_VLAN_SEEN - INFO <nest> Event info (details below) - -Link Changed Event ------------------- - -When link status changes on a physical port, this event is generated. - - field width description - --------------------------------------------------- - INFO <nest> - PPORT 4 Physical port - LINKUP 1 Link status: - 0: down - 1: up - -MAC VLAN Seen Event -------------------- - -When a packet ingresses on a port and the source MAC/VLAN isn't known to the -device, the device will generate this event. In response to the event, the -driver should install to the device the MAC/VLAN on the port into the bridge -table. Once installed, the MAC/VLAN is known on the port and this event will -no longer be generated. - - field width description - --------------------------------------------------- - INFO <nest> - PPORT 4 Physical port - MAC 6 MAC address - VLAN 2 VLAN ID - - -SECTION 9: CPU Packet Processing -================================ - -Ingress packets directed to the host CPU for further processing are delivered -in the DMA RX ring. Likewise, host CPU originating packets destined to egress -on switch ports are scheduled by software using the DMA TX ring. 
- -Tx Packet Processing --------------------- - -Software schedules packets for egress on switch ports using the DMA TX ring. A -TX descriptor buffer describes the packet location and size in host DMA-able -memory, the destination port, and any hardware-offload functions (such as L3 -payload checksum offload). Software then bumps the descriptor head to signal -hardware of new Tx work. In response, hardware will DMA read Tx descriptors up -to head, DMA read descriptor buffer and packet data, perform offloading -functions, and finally frame packet on wire (network). Once packet processing -is complete, hardware will writeback status to descriptor(s) to signal to -software that Tx is complete and software resources (e.g. skb) backing packet -can be released. - -Figure 2 shows an example 3-fragment packet queued with one Tx descriptor. A -TLV is used for each packet fragment. - - pkt frag 1 - +–––––––+ +–+ - +–––+ | | - desc buf | | | | - +––––––––+ | | | | - Tx ring +–––+ +–––––+ | | | - +–––––––––+ | | TLVs | +–––––––+ | - | +–––+ +––––––––+ pkt frag 2 | - | desc 0 | | +–––––+ +–––––––+ | - +–––––––––+ | TLVs | +–––+ | | - head+–+ | +––––––––+ | | | - | desc 1 | | +–––––+ +–––––––+ |pkt - +–––––––––+ | TLVs | | | - | | +––––––––+ | pkt frag 3 | - | | | +–––––––+ | - +–––––––––+ +–––+ | | - | | | | | - | | | | | - +–––––––––+ | | | - | | | | | - | | | | | - +–––––––––+ | | | - | | +–––––––+ +–+ - | | - +–––––––––+ - - fig 2. - -The TLVs for Tx descriptor buffer are: - - field width description - --------------------------------------------------------------------- - PPORT 4 Destination physical port # - TX_OFFLOAD 1 Hardware offload modes: - 0: no offload - 1: insert IP csum (ipv4 only) - 2: insert TCP/UDP csum - 3: L3 csum calc and insert - into csum offset (TX_L3_CSUM_OFF) - 16-bit 1's complement csum value. - IPv4 pseudo-header and IP - already calculated by OS - and inserted. 
-                               4: TSO (TCP Segmentation Offload)
-  TX_L3_CSUM_OFF    2        For L3 csum offload mode, the offset,
-                             from the beginning of the packet,
-                             of the csum field in the L3 header
-  TX_TSO_MSS        2        For TSO offload mode, the
-                             Maximum Segment Size in bytes
-  TX_TSO_HDR_LEN    2        For TSO offload mode, the
-                             length of ethernet, IP, and
-                             TCP/UDP headers, including IP
-                             and TCP options.
-  TX_FRAGS          <array>  Packet fragments
-    TX_FRAG         <nest>   Packet fragment
-      TX_FRAG_ADDR  8        DMA address of packet fragment
-      TX_FRAG_LEN   2        Packet fragment length
-
-Possible status return codes in descriptor on completion are:
-
-  DESC_COMP_ERR    reason
-  --------------------------------------------------------------------
-  0                OK
-  -ROCKER_ENXIO    address or data read err on desc buf or packet
-                   fragment
-  -ROCKER_EINVAL   bad pport or TSO or csum offloading error
-  -ROCKER_ENOMEM   no memory for internal staging tx fragment
-
-Rx Packet Processing
---------------------
-
-Packets ingressing on switch ports that are not forwarded by the switch but
-rather directed to the host CPU for further processing are delivered in the DMA
-RX ring.  Rx descriptor buffers are allocated by software and placed on the
-ring.  Hardware will fill Rx descriptor buffers with packet data, write the
-completion, and signal to software that a new packet is ready.  Since Rx packet
-size is not known a priori, the Rx descriptor buffer must be allocated for
-worst-case packet size.  A single Rx descriptor will contain the entire Rx
-packet data in one RX_FRAG.  Other Rx TLVs describe any hardware offloads
-performed on the packet, such as checksum validation.
-
-The TLVs for Rx descriptor buffer are:
-
-  field             width  description
-  ---------------------------------------------------
-  PPORT             4      Source physical port #
-  RX_FLAGS          2      Packet parsing flags:
-                             (1 << 0): IPv4 packet
-                             (1 << 1): IPv6 packet
-                             (1 << 2): csum calculated
-                             (1 << 3): IPv4 csum good
-                             (1 << 4): IP fragment
-                             (1 << 5): TCP packet
-                             (1 << 6): UDP packet
-                             (1 << 7): TCP/UDP csum good
-                             (1 << 8): Offload forward
-  RX_CSUM           2      IP calculated checksum:
-                             IPv4: IP payload csum
-                             IPv6: header and payload csum
-                           (Only valid if RX_FLAGS: csum calculated is set)
-  RX_FRAG_ADDR      8      DMA address of packet fragment
-  RX_FRAG_MAX_LEN   2      Packet maximum fragment length
-  RX_FRAG_LEN       2      Actual packet fragment length after receive
-
-Offload forward RX_FLAG indicates the device has already forwarded the packet
-so the host CPU should not also forward the packet.
-
-Possible status return codes in descriptor on completion are:
-
-  DESC_COMP_ERR     reason
-  --------------------------------------------------------------------
-  0                 OK
-  -ROCKER_ENXIO     address or data read err on desc buf
-  -ROCKER_ENOMEM    no memory for internal staging desc buf
-  -ROCKER_EMSGSIZE  Rx descriptor buffer wasn't big enough to contain
-                    packet data TLV and other TLVs.
-
-
-SECTION 10: OF-DPA Mode
-=======================
-
-OF-DPA mode allows the switch to offload flow packet processing functions to
-hardware.  An OpenFlow controller would communicate with an OpenFlow agent
-installed on the switch.  The OpenFlow agent would (directly or indirectly)
-communicate with the Rocker switch driver, which in turn would program switch
-hardware with flow functionality, as defined in OF-DPA.
The block diagram is:
-
-                +----------------------+
-                |          OF          |
-                |  Remote Controller   |
-                +--------+-------------+
-                         |
-                         |
-                +--------+---------+
-                |        OF        |
-                |   Local Agent    |
-                +------------------+
-                |                  |
-                |  Rocker Driver   |
-                +------------------+
-                    <this spec>
-                +------------------+
-                |                  |
-                |  Rocker Switch   |
-                +------------------+
-
-To participate in flow functions, ports must be configured for OF-DPA mode
-during switch initialization.
-
-OF-DPA Flow Table Interface
----------------------------
-
-There are commands to add, modify, delete, and get stats of flow table entries.
-The commands are issued using the DMA CMD descriptor ring.  The following
-commands are defined:
-
-  CMD_ADD:         add an entry to flow table
-  CMD_MOD:         modify an entry in flow table
-  CMD_DEL:         delete an entry from flow table
-  CMD_GET_STATS:   get stats for flow entry
-
-TLVs for add and modify commands are:
-
-  field                    width   description
-  ----------------------------------------------------
-  OF_DPA_CMD               2       CMD_[ADD|MOD]
-  OF_DPA_TBL               2       Flow table ID
-                                     0: ingress port
-                                     10: vlan
-                                     20: termination mac
-                                     30: unicast routing
-                                     40: multicast routing
-                                     50: bridging
-                                     60: ACL policy
-  OF_DPA_PRIORITY          4       Flow priority
-  OF_DPA_HARDTIME          4       Hard timeout for flow
-  OF_DPA_IDLETIME          4       Idle timeout for flow
-  OF_DPA_COOKIE            8       Cookie
-
-Additional TLVs based on flow table ID:
-
-Table ID 0: ingress port
-
-  field                    width   description
-  ----------------------------------------------------
-  OF_DPA_IN_PPORT          4       ingress physical port number
-  OF_DPA_GOTO_TBL          2       goto table ID; zero to drop
-
-Table ID 10: vlan
-
-  field                    width   description
-  ----------------------------------------------------
-  OF_DPA_IN_PPORT          4       ingress physical port number
-  OF_DPA_VLAN_ID           2 (N)   vlan ID
-  OF_DPA_VLAN_ID_MASK      2 (N)   vlan ID mask
-  OF_DPA_GOTO_TBL          2       goto table ID; zero to drop
-  OF_DPA_NEW_VLAN_ID       2 (N)   new vlan ID
-
-Table ID 20: termination mac
-
-  field                    width   description
-  ----------------------------------------------------
-  OF_DPA_IN_PPORT          4
ingress physical port number - OF_DPA_IN_PPORT_MASK 4 ingress physical port number mask - OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd - OF_DPA_DST_MAC 6 (N) destination MAC - OF_DPA_DST_MAC_MASK 6 (N) destination MAC mask - OF_DPA_VLAN_ID 2 (N) vlan ID - OF_DPA_VLAN_ID_MASK 2 (N) vlan ID mask - OF_DPA_GOTO_TBL 2 only acceptable values are - unicast or multicast routing - table IDs - OF_DPA_OUT_PPORT 2 if specified, must be - controller, set zero otherwise - -Table ID 30: unicast routing - - field width description - ---------------------------------------------------- - OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd - OF_DPA_DST_IP 4 (N) destination IPv4 address. - Must be unicast address - OF_DPA_DST_IP_MASK 4 (N) IP mask. Must be prefix mask - OF_DPA_DST_IPV6 16 (N) destination IPv6 address. - Must be unicast address - OF_DPA_DST_IPV6_MASK 16 (N) IPv6 mask. Must be prefix mask - OF_DPA_GOTO_TBL 2 goto table ID; zero to drop - OF_DPA_GROUP_ID 4 data for GROUP action must - be an L3 Unicast group entry - -Table ID 40: multicast routing - - field width description - ---------------------------------------------------- - OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd - OF_DPA_VLAN_ID 2 (N) vlan ID - OF_DPA_SRC_IP 4 (N) source IPv4. Optional, - can contain IPv4 address, - must be completely masked - if not used - OF_DPA_SRC_IP_MASK 4 (N) IP Mask - OF_DPA_DST_IP 4 (N) destination IPv4 address. - Must be multicast address - OF_DPA_SRC_IPV6 16 (N) source IPv6 Address. Optional. - Can contain IPv6 address, - must be completely masked - if not used - OF_DPA_SRC_IPV6_MASK 16 (N) IPv6 mask. - OF_DPA_DST_IPV6 16 (N) destination IPv6 Address. 
Must
-                                   be multicast address
-  OF_DPA_GOTO_TBL          2       goto table ID; zero to drop
-  OF_DPA_GROUP_ID          4       data for GROUP action must
-                                   be an L3 multicast group entry
-
-Table ID 50: bridging
-
-  field                    width   description
-  ----------------------------------------------------
-  OF_DPA_VLAN_ID           2 (N)   vlan ID
-  OF_DPA_TUNNEL_ID         4       tunnel ID
-  OF_DPA_DST_MAC           6 (N)   destination MAC
-  OF_DPA_DST_MAC_MASK      6 (N)   destination MAC mask
-  OF_DPA_GOTO_TBL          2       goto table ID; zero to drop
-  OF_DPA_GROUP_ID          4       data for GROUP action must
-                                   be a L2 Interface, L2
-                                   Multicast, L2 Flood,
-                                   or L2 Overlay group entry
-                                   as appropriate
-  OF_DPA_TUNNEL_LPORT      4       unicast Tenant Bridging
-                                   flows specify a tunnel
-                                   logical port ID
-  OF_DPA_OUT_PPORT         2       data for OUTPUT action,
-                                   restricted to CONTROLLER,
-                                   set to 0 otherwise
-
-Table ID 60: acl policy
-
-  field                    width   description
-  ----------------------------------------------------
-  OF_DPA_IN_PPORT          4       ingress physical port number
-  OF_DPA_IN_PPORT_MASK     4       ingress physical port number mask
-  OF_DPA_ETHERTYPE         2 (N)   ethertype
-  OF_DPA_VLAN_ID           2 (N)   vlan ID
-  OF_DPA_VLAN_ID_MASK      2 (N)   vlan ID mask
-  OF_DPA_VLAN_PCP          2 (N)   vlan Priority Code Point
-  OF_DPA_VLAN_PCP_MASK     2 (N)   vlan Priority Code Point mask
-  OF_DPA_SRC_MAC           6 (N)   source MAC
-  OF_DPA_SRC_MAC_MASK      6 (N)   source MAC mask
-  OF_DPA_DST_MAC           6 (N)   destination MAC
-  OF_DPA_DST_MAC_MASK      6 (N)   destination MAC mask
-  OF_DPA_TUNNEL_ID         4       tunnel ID
-  OF_DPA_SRC_IP            4 (N)   source IPv4. Optional,
-                                   can contain IPv4 address,
-                                   must be completely masked
-                                   if not used
-  OF_DPA_SRC_IP_MASK       4 (N)   IP Mask
-  OF_DPA_DST_IP            4 (N)   destination IPv4 address.
-                                   Must be multicast address
-  OF_DPA_DST_IP_MASK       4 (N)   IP Mask
-  OF_DPA_SRC_IPV6          16 (N)  source IPv6 Address. Optional.
-                                   Can contain IPv6 address,
-                                   must be completely masked
-                                   if not used
-  OF_DPA_SRC_IPV6_MASK     16 (N)  IPv6 mask
-  OF_DPA_DST_IPV6          16 (N)  destination IPv6 Address. Must
-                                   be multicast address.
-  OF_DPA_DST_IPV6_MASK     16 (N)  IPv6 mask
-  OF_DPA_SRC_ARP_IP        4 (N)   source IPv4 address in the ARP
-                                   payload.  Only used if ethertype
-                                   == 0x0806.
-  OF_DPA_SRC_ARP_IP_MASK   4 (N)   IP Mask
-  OF_DPA_IP_PROTO          1       IP protocol
-  OF_DPA_IP_PROTO_MASK     1       IP protocol mask
-  OF_DPA_IP_DSCP           1       DSCP
-  OF_DPA_IP_DSCP_MASK      1       DSCP mask
-  OF_DPA_IP_ECN            1       ECN
-  OF_DPA_IP_ECN_MASK       1       ECN mask
-  OF_DPA_L4_SRC_PORT       2 (N)   L4 source port, only for
-                                   TCP, UDP, or SCTP
-  OF_DPA_L4_SRC_PORT_MASK  2 (N)   L4 source port mask
-  OF_DPA_L4_DST_PORT       2 (N)   L4 destination port, only for
-                                   TCP, UDP, or SCTP
-  OF_DPA_L4_DST_PORT_MASK  2 (N)   L4 destination port mask
-  OF_DPA_ICMP_TYPE         1       ICMP type, only if IP
-                                   protocol is 1
-  OF_DPA_ICMP_TYPE_MASK    1       ICMP type mask
-  OF_DPA_ICMP_CODE         1       ICMP code
-  OF_DPA_ICMP_CODE_MASK    1       ICMP code mask
-  OF_DPA_IPV6_LABEL        4 (N)   IPv6 flow label
-  OF_DPA_IPV6_LABEL_MASK   4 (N)   IPv6 flow label mask
-  OF_DPA_GROUP_ID          4       data for GROUP action
-  OF_DPA_QUEUE_ID_ACTION   1       write the queue ID
-  OF_DPA_NEW_QUEUE_ID      1       queue ID
-  OF_DPA_VLAN_PCP_ACTION   1       write the VLAN priority
-  OF_DPA_NEW_VLAN_PCP      1       VLAN priority
-  OF_DPA_IP_DSCP_ACTION    1       write the DSCP
-  OF_DPA_NEW_IP_DSCP       1       new DSCP
-  OF_DPA_TUNNEL_LPORT      4       restrict to valid tunnel
-                                   logical port, set to 0
-                                   otherwise.
- OF_DPA_OUT_PPORT 2 data for OUTPUT action, - restricted to CONTROLLER, - set to 0 otherwise - OF_DPA_CLEAR_ACTIONS 4 if 1 packets matching flow are - dropped (all other instructions - ignored) - -TLVs for flow delete and get stats command are: - - field width description - --------------------------------------------------- - OF_DPA_CMD 2 CMD_[DEL|GET_STATS] - OF_DPA_COOKIE 8 Cookie - -On completion of get stats command, the descriptor buffer is written back with -the following TLVs: - - field width description - --------------------------------------------------- - OF_DPA_STAT_DURATION 4 Flow duration - OF_DPA_STAT_RX_PKTS 8 Received packets - OF_DPA_STAT_TX_PKTS 8 Transmit packets - -Possible status return codes in descriptor on completion are: - - DESC_COMP_ERR command reason - -------------------------------------------------------------------- - 0 all OK - -ROCKER_EFAULT all head or tail index outside - of ring - -ROCKER_ENXIO all address or data read err on - desc buf - -ROCKER_EMSGSIZE GET_STATS cmd descriptor buffer wasn't - big enough to contain write-back - TLVs - -ROCKER_EINVAL all invalid parameters passed in - -ROCKER_EEXIST ADD entry already exists - -ROCKER_ENOSPC ADD no space left in flow table - -ROCKER_ENOENT MOD|DEL|GET_STATS cookie invalid - -Group Table Interface ---------------------- - -There are commands to add, modify, delete, and get stats of group table -entries. The commands are issued using the DMA CMD descriptor ring. 
The -following commands are defined: - - CMD_ADD: add an entry to group table - CMD_MOD: modify an entry in group table - CMD_DEL: delete an entry from group table - CMD_GET_STATS: get stats for group entry - -TLVs for add and modify commands are: - - field width description - ----------------------------------------------------------- - FLOW_GROUP_CMD 2 CMD_[ADD|MOD] - FLOW_GROUP_ID 2 Flow group ID - FLOW_GROUP_TYPE 1 Group type: - 0: L2 interface - 1: L2 rewrite - 2: L3 unicast - 3: L2 multicast - 4: L2 flood - 5: L3 interface - 6: L3 multicast - 7: L3 ECMP - 8: L2 overlay - FLOW_VLAN_ID 2 Vlan ID (types 0, 3, 4, 6) - FLOW_L2_PORT 2 Port (types 0) - FLOW_INDEX 4 Index (all types but 0) - FLOW_OVERLAY_TYPE 1 Overlay sub-type (type 8): - 0: Flood unicast tunnel - 1: Flood multicast tunnel - 2: Multicast unicast tunnel - 3: Multicast multicast tunnel - FLOW_GROUP_ACTION nest - FLOW_GROUP_ID 2 next group ID in chain (all - types except 0) - FLOW_OUT_PORT 4 egress port (types 0, 8) - FLOW_POP_VLAN_TAG 1 strip outer VLAN tag (type 1 - only) - FLOW_VLAN_ID 2 (types 1, 5) - FLOW_SRC_MAC 6 (types 1, 2, 5) - FLOW_DST_MAC 6 (types 1, 2) - -TLVs for flow delete and get stats command are: - - field width description - ----------------------------------------------------------- - FLOW_GROUP_CMD 2 CMD_[DEL|GET_STATS] - FLOW_GROUP_ID 2 Flow group ID - -On completion of get stats command, the descriptor buffer is written back with -the following TLVs: - - field width description - --------------------------------------------------- - FLOW_GROUP_ID 2 Flow group ID - FLOW_STAT_DURATION 4 Flow duration - FLOW_STAT_REF_COUNT 4 Flow reference count - FLOW_STAT_BUCKET_COUNT 4 Flow bucket count - -Possible status return codes in descriptor on completion are: - - DESC_COMP_ERR command reason - -------------------------------------------------------------------- - 0 all OK - -ROCKER_EFAULT all head or tail index outside - of ring - -ROCKER_ENXIO all address or data read err on - desc buf 
-  -ROCKER_ENOSPC    GET_STATS          cmd descriptor buffer wasn't
-                                       big enough to contain write-back
-                                       TLVs
-  -ROCKER_EINVAL    ADD|MOD            invalid parameters passed in
-  -ROCKER_EEXIST    ADD                entry already exists
-  -ROCKER_ENOSPC    ADD                no space left in flow table
-  -ROCKER_ENOENT    MOD|DEL|GET_STATS  group ID invalid
-  -ROCKER_EBUSY     DEL                group reference count non-zero
-  -ROCKER_ENODEV    ADD                next group ID doesn't exist
-
-
-
-References
-==========
-
-[1] OpenFlow Data Plane Abstraction (OF-DPA) Abstract Switch Specification,
-Version 1.0, from Broadcom Corporation, February 21, 2014.
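The TLV layout from Section 4 (an 8-byte little-endian header of type, len, and pad, followed by a value padded to 8-byte alignment) recurs in every descriptor buffer in this spec, so a small worked example may help. The following Python sketch packs and parses such TLVs. The function names and the numeric TLV type values are hypothetical, since the spec text does not assign type numbers. The caller is assumed to pass only the first DMA_DESC_TLV_SIZE bytes of the descriptor buffer to the parser.

```python
import struct

# TLV header per Section 4: type (4 bytes), len (2 bytes), pad (2 bytes), LE.
TLV_HDR = struct.Struct("<IHH")


def align8(n):
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def tlv_pack(tlv_type, value):
    """Pack one TLV: header, value, then zero padding to 8-byte alignment."""
    header = TLV_HDR.pack(tlv_type, len(value), 0)
    padding = b"\x00" * (align8(len(value)) - len(value))
    return header + value + padding


def tlv_unpack_all(buf):
    """Parse a buffer of packed TLVs into a list of (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset + TLV_HDR.size <= len(buf):
        tlv_type, length, _pad = TLV_HDR.unpack_from(buf, offset)
        offset += TLV_HDR.size
        tlvs.append((tlv_type, buf[offset:offset + length]))
        offset += align8(length)   # the next header starts 8-byte aligned
    return tlvs
```

With the two TLVs from Figure 1 (len=22 and len=2), the packed size is 8 + 24 + 8 + 8 = 48 bytes: each value is padded to 24 and 8 bytes respectively, and each header takes 8 bytes.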