
Our ‘recv1000.c’ driver Implementing a ‘packet-receive’ capability with the Intel 82573L network interface controller



Similarities

• There are quite a few similarities between implementing the ‘transmit-capability’ and the ‘receive-capability’ in a device-driver for Intel’s 82573L Ethernet controller:
  – Identical device-discovery and ioremap steps
  – Same steps for ‘global reset’ of the hardware
  – Comparable data-structure initializations
  – Parallel setups for the TX and RX registers

• But there are also a few fundamental differences (such as the driver’s ‘active’ role when transmitting versus its ‘passive’ role when receiving)


‘push’ versus ‘pull’

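[Diagram: host memory holds a transmit packet buffer and a receive packet buffer; the Ethernet controller holds a transmit-FIFO and a receive-FIFO connected to/from the LAN. Outgoing data follows the ‘push’ arrow into the transmit-FIFO; incoming data follows the ‘pull’ arrow from the receive-FIFO into the receive packet buffer.]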

The ‘write()’ routine in our ‘xmit1000.c’ driver could transfer data at any time, but the ‘read()’ routine in our ‘recv1000.c’ driver has to wait for data to arrive.

So to avoid any wasteful busy-waiting, our ‘recv1000.c’ driver can use the Linux kernel’s sleep/wakeup mechanism, provided it enables the NIC’s interrupts!


Sleep/wakeup

• We will need to employ a wait-queue, we will need to enable device-interrupts, and we will need to write and install the code for an interrupt service routine (ISR)

• So our ‘recv1000.c’ driver will have a few additional code and data components that were absent in our ‘xmit1000.c’ driver


Driver’s components

• my_fops: this ‘struct’ holds one function-pointer (‘read’), which points to my_read()

• my_read(): This function will program the actual data-transfer

• my_get_info(): This function will allow us to inspect the receive-descriptors

• my_isr(): This function will awaken any sleeping reader-task (via the wait_queue_head)

• module_init(): This function will detect and configure the hardware, define page-mappings, allocate and initialize the descriptors, install our ISR and enable interrupts, start the ‘receive’ engine, create the pseudo-file and register ‘my_fops’

• module_exit(): This function will do needed ‘cleanup’ when it’s time to unload our driver: turn off the ‘receive’ engine, disable interrupts and remove our ISR, free memory, delete page-table entries, and remove the pseudo-file and ‘my_fops’


How NIC’s interrupts work

• There are four interrupt-related registers which are essential for us to understand

  Register   Offset    Name
  ICR        0x00C0    Interrupt Cause Read
  ICS        0x00C8    Interrupt Cause Set
  IMS        0x00D0    Interrupt Mask Set/Read
  IMC        0x00D8    Interrupt Mask Clear


Interrupt event-types

Bit assignments in the interrupt registers of the 82573L (all other bits are reserved):

  31: INT_ASSERTED (1=yes, 0=no)
  17: ACK (Rx-ACK Frame detected)
  16: SRPD (Small Rx-Packet detected)
  15: TXD_LOW (Tx-Descr Low Thresh hit)
   9: MDAC (MDI/O Access Completed)
   7: RXT0 (Receiver Timer expired)
   6: RXO (Receiver Overrun)
   4: RXDMT0 (Rx-Desc Min Thresh hit)
   2: LSC (Link Status Change)
   1: TXQE (Transmit Queue Empty)
   0: TXDW (Transmit Descriptor Written Back)
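The bit assignments listed on this slide can be written down as C constants, which makes it easy to ‘log’ the causes found in an interrupt-cause value. The following is a minimal user-space sketch, not code from ‘recv1000.c’: the macro names, the table, and the helper ‘describe_causes()’ are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit positions copied from the slide; the macro names are our own. */
#define INTR_TXDW          (1u << 0)
#define INTR_TXQE          (1u << 1)
#define INTR_LSC           (1u << 2)
#define INTR_RXDMT0        (1u << 4)
#define INTR_RXO           (1u << 6)
#define INTR_RXT0          (1u << 7)
#define INTR_MDAC          (1u << 9)
#define INTR_TXD_LOW       (1u << 15)
#define INTR_SRPD          (1u << 16)
#define INTR_ACK           (1u << 17)
#define INTR_INT_ASSERTED  (1u << 31)

struct cause_name { uint32_t bit; const char *name; };

static const struct cause_name cause_names[] = {
    { INTR_TXDW, "TXDW" },       { INTR_TXQE, "TXQE" },
    { INTR_LSC, "LSC" },         { INTR_RXDMT0, "RXDMT0" },
    { INTR_RXO, "RXO" },         { INTR_RXT0, "RXT0" },
    { INTR_MDAC, "MDAC" },       { INTR_TXD_LOW, "TXD_LOW" },
    { INTR_SRPD, "SRPD" },       { INTR_ACK, "ACK" },
    { INTR_INT_ASSERTED, "INT_ASSERTED" },
};

/* Writes the names of all set bits (lowest bit first, space-separated)
 * into 'out' and returns how many names were written in full.
 * Stops early if 'out' is too small. */
int describe_causes(uint32_t icr, char *out, size_t outlen)
{
    size_t used = 0;
    int n = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cause_names / sizeof cause_names[0]; i++) {
        if (!(icr & cause_names[i].bit))
            continue;
        int w = snprintf(out + used, outlen - used, "%s%s",
                         n ? " " : "", cause_names[i].name);
        if (w < 0 || (size_t)w >= outlen - used)
            break;                      /* buffer full: stop here */
        used += (size_t)w;
        n++;
    }
    return n;
}
```

For example, passing the value 0x80000080 (bits 31 and 7) would report the receiver-timer cause together with the ‘interrupt asserted’ flag.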


Interrupt Mask Set/Read

• This register is used to enable a selection of the device’s interrupts which the driver will be prepared to recognize and handle

• A particular interrupt becomes ‘enabled’ if software writes a ‘1’ to the corresponding bit of this Interrupt Mask Set register

• Writing ‘0’ to any register-bit has no effect, so interrupts can be enabled one-at-a-time


Interrupt Mask Clear

• Your driver can discover which interrupts have been enabled by reading IMS – but your driver cannot ‘disable’ any interrupts by writing to that register

• Instead a specific interrupt can be disabled by writing a ‘1’ to the corresponding bit in the Interrupt Mask Clear register

• Writing ‘0’ to a register-bit has no effect on the interrupt controller’s Interrupt Mask
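The ‘write-1-to-set’ and ‘write-1-to-clear’ behavior of IMS and IMC can be mimicked by a tiny software model. This is only a sketch of the semantics described on these two slides; the type ‘nic_model’ and the three functions are hypothetical, and no real hardware is touched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller's internal Interrupt Mask. */
struct nic_model {
    uint32_t mask;
};

/* Writing IMS: each '1' bit enables that interrupt; '0' bits do nothing. */
void write_ims(struct nic_model *nic, uint32_t value)
{
    assert(nic != 0);
    nic->mask |= value;
}

/* Writing IMC: each '1' bit disables that interrupt; '0' bits do nothing. */
void write_imc(struct nic_model *nic, uint32_t value)
{
    assert(nic != 0);
    nic->mask &= ~value;
}

/* Reading IMS shows which interrupts are currently enabled. */
uint32_t read_ims(const struct nic_model *nic)
{
    assert(nic != 0);
    return nic->mask;
}
```

Because ‘0’ bits are ignored by both registers, a driver never needs a read-modify-write sequence to change a single interrupt’s enable state.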


Interrupt Cause Read

• Whenever interrupts occur, your driver’s interrupt service routine can discover the specific conditions that triggered them if it reads the Interrupt Cause Read register

• In this case your driver can clear any selection of these bits (except bit #31) by writing ‘1’s to them (writing ‘0’s to this register will have no effect)

• In case no interrupt has occurred, reading this register may have the side-effect of clearing it


Interrupt Cause Set

• For testing your driver’s interrupt-handler, you can artificially trigger any particular combination of interrupts by writing ‘1’s into the corresponding register-bits of this Interrupt Cause Set register (assuming your combination of bits corresponds to interrupts that are ‘enabled’ by ‘1’s being present for them in the Interrupt Mask)


Our interrupt-handler

• We decided to enable all possible causes (and we ‘log’ them via ‘printk()’ messages we’ve omitted in the code-fragment here):

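    irqreturn_t my_isr( int irq, void *dev_id )
    {
        // find out which conditions triggered this interrupt
        int intr_cause = ioread32( io + E1000_ICR );
        if ( intr_cause == 0 ) return IRQ_NONE;  // it wasn’t our device

        // awaken any sleeping reader, then clear the causes we saw
        wake_up_interruptible( &wq_rd );
        iowrite32( intr_cause, io + E1000_ICR );

        return IRQ_HANDLED;
    }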


We ‘tweak’ our packet-format

• Our ‘xmit1000.c’ driver elected to have the NIC append ‘padding’ to any short packets

• But this prevents a receiver from knowing how many bytes represent actual data

• To solve this problem, we added our own ‘count’ field to each packet’s payload

Packet layout (byte offsets):

   0: destination MAC-address
   6: source MAC-address
  12: Type/Len
  14: count
  16: actual bytes of user-data
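The effect of the added ‘count’ field can be tried out in user space. The sketch below builds a frame with the layout shown on this slide and then recovers the payload from it; it is an illustration under stated assumptions, not the code of either driver. The function names are hypothetical, zero-padding to 60 bytes stands in for the padding the NIC would append, and ‘count’ is stored in host byte-order because that is how the ‘read()’ method fetches it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    COUNT_OFFSET = 14,   /* our 'count' field follows the 14-byte header */
    DATA_OFFSET  = 16,   /* user-data begins right after 'count'         */
    MIN_FRAME    = 60    /* minimum Ethernet frame size, excluding CRC   */
};

/* Builds a frame in 'frame' (caller supplies at least
 * max(MIN_FRAME, DATA_OFFSET + count) bytes); returns the frame length. */
size_t build_packet(uint8_t *frame, const uint8_t dst[6],
                    const uint8_t src[6], uint16_t type,
                    const void *data, uint16_t count)
{
    size_t length = (size_t)DATA_OFFSET + count;

    assert(frame != NULL && data != NULL);
    memset(frame, 0, MIN_FRAME);            /* models the NIC's padding */
    memcpy(frame + 0, dst, 6);
    memcpy(frame + 6, src, 6);
    frame[12] = (uint8_t)(type >> 8);       /* Type/Len, network order  */
    frame[13] = (uint8_t)(type & 0xFF);
    memcpy(frame + COUNT_OFFSET, &count, sizeof count);
    memcpy(frame + DATA_OFFSET, data, count);
    return length < MIN_FRAME ? MIN_FRAME : length;
}

/* Copies at most 'len' payload bytes into 'buf'; returns bytes copied. */
size_t extract_payload(const uint8_t *frame, void *buf, size_t len)
{
    uint16_t count;

    assert(frame != NULL && buf != NULL);
    memcpy(&count, frame + COUNT_OFFSET, sizeof count);
    if (count > len)
        count = (uint16_t)len;              /* excess bytes are dropped */
    memcpy(buf, frame + DATA_OFFSET, count);
    return count;
}
```

A five-byte message still travels as a 60-byte frame, yet the receiver can tell that only five of those bytes are user-data.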


Our ‘read()’ method

    ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
    {
        static int rxhead = 0;  // to remember where we left off
        unsigned char *from = phys_to_virt( rxdesc[ rxhead ].base_addr );
        unsigned int count;

        // go to sleep if no new data-packets have been received yet
        if ( ioread32( io + E1000_RDH ) == rxhead )
            if ( wait_event_interruptible( wq_rd,
                    ioread32( io + E1000_RDH ) != rxhead ) ) return -EINTR;

        // get the number of actual data-bytes in the new (possibly padded) data-packet
        count = *(unsigned short*)(from + 14);  // data-count as stored by ‘xmit1000.c’
        if ( count > len ) count = len;  // can’t transfer more bytes than buffer can hold
        if ( copy_to_user( buf, from+16, count ) ) return -EFAULT;

        // advance our static array-index variable to the next receive-descriptor
        rxhead = (1 + rxhead) % 8;  // this index wraps-around after 8 descriptors
        return count;  // tell kernel how many bytes were transferred
    }


Hardware’s initialization

• We allocate and initialize a minimum-size Receive Descriptor Queue (8 descriptors)

• We perform a ‘global reset’ via the RST-bit in the NIC’s Device Control register (with a side-effect of zeroing both RDH and RDT)

• We configure the ‘receive’ engine (RCTL) plus a few additional registers that affect the network-controller’s reception-options (namely: RXCSUM, RFCTL, PSRCTL)


Receive Control (0x0100)

Bit-fields of the Receive Control register on the 82573L (bits 0, 14, 21, 24 and 31 are reserved, R=0):

  Bits    Field    Meaning
  30-27   FLXBUF   Flexible Buffer size
  26      SECRC    Strip Ethernet CRC
  25      BSEX     Buffer Size Extension
  23      PMCF     Pass MAC Control Frames
  22      DPF      Discard Pause Frames
  20      CFI      Canonical Form Indicator bit-value
  19      CFIEN    Canonical Form Indicator Enable
  18      VFE      VLAN Filter Enable
  17-16   BSIZE    Receive Buffer Size
  15      BAM      Broadcast Accept Mode
  13-12   MO       Multicast Offset
  11-10   DTYP     Descriptor Type
   9-8    RDMTS    Rx-Descriptor Minimum Threshold Size
   7-6    LBM      Loopback Mode
   5      LPE      Long Packet reception Enable
   4      MPE      Multicast Promiscuous Enable
   3      UPE      Unicast Promiscuous Enable
   2      SBP      Store Bad Packets
   1      EN       Receive Enable

Our driver initially will program this register with the value 0x0400801C. Then later, when everything is ready, it will turn on bit #1 to ‘start the receive engine’.
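It can help to see which options the value 0x0400801C actually selects. The sketch below rebuilds that value from named bits; the macro names and the two helper functions are hypothetical, and the bit positions are our reading of Intel’s register layout, so they should be checked against the 82573L manual before being relied upon.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions within the Receive Control register (RCTL). */
#define RCTL_EN     (1u << 1)    /* Receive Enable               */
#define RCTL_SBP    (1u << 2)    /* Store Bad Packets            */
#define RCTL_UPE    (1u << 3)    /* Unicast Promiscuous Enable   */
#define RCTL_MPE    (1u << 4)    /* Multicast Promiscuous Enable */
#define RCTL_BAM    (1u << 15)   /* Broadcast Accept Mode        */
#define RCTL_SECRC  (1u << 26)   /* Strip Ethernet CRC           */

/* The value programmed while the receive engine is still stopped. */
uint32_t rctl_initial(void)
{
    return RCTL_SECRC | RCTL_BAM | RCTL_MPE | RCTL_UPE | RCTL_SBP;
}

/* The same settings with bit #1 turned on to start the receive engine. */
uint32_t rctl_started(uint32_t rctl)
{
    assert((rctl & RCTL_EN) == 0);   /* expect a not-yet-started value */
    return rctl | RCTL_EN;
}
```

Under these assumptions the initial value accepts every kind of frame (unicast, multicast, broadcast, even bad packets) and strips the CRC, while leaving BSIZE, BSEX, DTYP and LPE all zero.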


Packet-Split Rx Control (0x2170)

  Bits    Field
  31-30   0
  29-24   BSIZE3 (in KB)
  23-22   0
  21-16   BSIZE2 (in KB)
  15-14   0
  13-8    BSIZE1 (in KB)
   7      0
   6-0    BSIZE0 (in 1/8 KB)

If the controller is configured to use the packet-split feature (RCTL.DTYP=1), then this register controls the sizes of the four receive-buffers, so there are certain requirements that nonzero values appear in several of these fields.

But our ‘recv1000.c’ driver will use the ‘legacy’ receive-descriptor format (i.e., RCTL.DTYP=0), so the NIC will disregard this register, and we are therefore allowed to program it with the value 0x00000000.


Receive Filter Control (0x5008)

[Register diagram: bits 31-16 are reserved; bits 15 down to 0 hold, in descending order, EXSTEN (bit 15), IPFRSP_DIS, ACKD_DIS, ACKDIS, IPv6XSUM_DIS, IPv6_DIS, NFS_VER, NFSR_DIS, NFSW_DIS, iSCSI_DWC, and iSCSI_DIS (bit 0)]

Our driver writes 0x00000000 to this register, which among other effects will cause the Ethernet controller NOT to write Extended Status information into our device-driver’s legacy-format Receive Descriptors (bit 15: EXSTEN=0)


RX Checksum Control (0x5000)

  Bits    Field
  31-10   reserved
   9      TCP/UDP Checksum Off-load enabled (1=yes, 0=no)
   8      IP Checksum Off-load enabled (1=yes, 0=no)
   7-0    packet checksum start: this field controls the starting byte for the Packet Checksum calculation

Our driver programs this register with the value 0x00000000, which disables Checksum Off-loading for TCP/UDP packets (which we won’t be receiving) and for IP packets (which likewise won’t be sent by our ‘xmit1000.c’ driver); all Packet-Checksums will be calculated starting from the very first byte.


Rx-Descriptor Control (0x2828)

  Bits    Field
  24      GRAN
  21-16   WTHRESH (Writeback Threshold)
  13-8    HTHRESH (Host Threshold)
   5-0    PTHRESH (Prefetch Threshold)

All other bits are 0.

Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)

“This register controls the fetching and write back of receive descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1, all descriptors are written back (even if not requested).” --Intel manual
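The recommended value 0x01010000 can be composed from its individual fields, which shows where GRAN=1 and WTHRESH=1 end up. This is a sketch only: the function name ‘rxdctl_value()’ is hypothetical, and the field positions and 6-bit widths (PTHRESH in bits 5-0, HTHRESH in bits 13-8, WTHRESH in bits 21-16, GRAN in bit 24) are assumptions taken from our reading of the register diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the three thresholds and the granularity bit into one
 * 32-bit value for the Rx-Descriptor Control register. */
uint32_t rxdctl_value(unsigned pthresh, unsigned hthresh,
                      unsigned wthresh, unsigned gran)
{
    /* each threshold is assumed to be a 6-bit field, GRAN a single bit */
    assert(pthresh < 64 && hthresh < 64 && wthresh < 64 && gran < 2);

    return ((uint32_t)pthresh << 0)
         | ((uint32_t)hthresh << 8)
         | ((uint32_t)wthresh << 16)
         | ((uint32_t)gran    << 24);
}
```

With GRAN=1 the thresholds count descriptors rather than cache lines, so WTHRESH=1 asks for a write-back after each single descriptor.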


Maximum-size buffers

• We use a minimal number of maximum-size receive-buffers (eight of 1536-bytes)

[Diagram: a ring of eight rx-descriptors in kernel memory, each one pointing to its own receive-buffer, buffer0 through buffer7]


NIC “owns” our rx-descriptors

[Diagram: the descriptor ring located by RDBAH/RDBAL, with RDLEN=0x80, holds descriptor 0 through descriptor 7; the position labeled ‘descriptor 8’ lies just past the last one]

• RDT: This register gets initialized to 8, then never gets changed

• RDH: This register gets initialized to 0, then gets changed by the controller as new packets are received

• rxhead: Our ‘static’ variable
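The numbers on this slide follow from simple ring arithmetic, which the user-space sketch below works through. The function names are hypothetical; the only facts taken from the slides are the eight descriptors, the 16-byte descriptor size, and the way ‘rxhead’ wraps around.

```c
#include <assert.h>
#include <stdint.h>

enum {
    N_RX_DESC = 8,    /* our ring holds eight receive-descriptors */
    DESC_SIZE = 16    /* each descriptor occupies 16 bytes        */
};

/* Value for RDLEN: the ring's total length in bytes. */
uint32_t rdlen_bytes(void)
{
    return N_RX_DESC * DESC_SIZE;
}

/* Advances a ring index by one, wrapping after the last descriptor. */
unsigned next_index(unsigned i)
{
    assert(i < N_RX_DESC);
    return (i + 1) % N_RX_DESC;
}

/* Number of received packets the driver has not yet handed to a reader,
 * given the controller's head (RDH) and the driver's own 'rxhead'. */
unsigned packets_pending(unsigned rdh, unsigned rxhead)
{
    assert(rdh < N_RX_DESC && rxhead < N_RX_DESC);
    return (rdh + N_RX_DESC - rxhead) % N_RX_DESC;
}
```

Note that this count cannot distinguish ‘no packets waiting’ from ‘the controller has gone all the way around the ring’, since both give RDH equal to rxhead.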


Driver ‘defects’

• If an application tries to ‘read’ from our device-file ‘/dev/nic’, but the controller received a packet that contains more bytes of data than the user requested, the excess bytes get “lost” (i.e., discarded)

• If an application delays reading packets while the controller continues receiving, then an earlier packet gets “overwritten”


In-class exercise #1

• Discuss with your nearest class-member your ideas for how these driver ‘defects’ might be overcome, so that packet-data being received will be protected against getting “lost” and/or being “overwritten”


In-class exercise #2

• Login to a pair of machines on the ‘anchor’ cluster and install our ‘xmit1000.ko’ and our ‘recv1000.ko’ modules (one on each)

• Try transferring a textfile from one of the machines to the other, by using ‘cat’:

anchor01$ cat textfile > /dev/nic

anchor02$ cat /dev/nic > recv1000.out

• How large a textfile can you successfully transfer using our simple driver-modules?