Winsock Kernel Best Practices
Osman N. Ertugay
Software Design Engineer
Windows Network Developer Platform
Microsoft Corporation
Session Outline
Brief Winsock Kernel (WSK) refresher
  Familiarity with the WSK documentation and the WSK sample in the WDK ensures the most benefit from this session
WSK programming guidelines and best practices
  WSK registration and deregistration
  I/O Request Packet (IRP) handling
  Buffer ownership and manipulation
  Using socket callbacks versus socket functions
  Memory/throughput tradeoff in stream data transfer
  Transport address security
  Dual-family sockets
WSK Refresher
Kernel-mode Network Programming Interface
  WSK replaces the Transport Driver Interface (TDI) for "consumers" of TDI (i.e., TDI clients)
  WSK is not a "provider" interface for new transport development
WSK goals/benefits
  Easier to use, consistent API
  Higher performance, better scalability
  Better fit for the Next Generation TCP/IP Stack
  Similar to Winsock2, but not the same
  Easy to port to for existing TDI clients
WSK Refresher
[Figure: WSK architecture diagram. Labels: User-mode, Kernel-mode, WSK Client Driver, WSK Registration Library, I/O Manager, Network Module Registrar (NMR), WSK Subsystem, WSK_CLIENT (Client Functions, Client Callbacks), WSK_SOCKET (Socket Functions, Socket Callbacks), TCP (IPv6/IPv4), UDP (IPv6/IPv4), Raw (IPv6/IPv4), ...]
WSK Programming Guidelines And Best Practices
WSK Registration And Deregistration
Use the new WSK registration library: WskRegister, WskDeregister, WskCaptureProviderNPI, WskReleaseProviderNPI

    const WSK_CLIENT_DISPATCH WskSampleClientDispatch = {
        MAKE_WSK_VERSION(1, 0), // WSK version 1.0
        0,                      // Reserved
        NULL                    // No WskClientEvent callback in WSK version 1.0
    };

    WSK_REGISTRATION WskSampleRegistration;

    NTSTATUS
    DriverEntry(. . .)
    {
        NTSTATUS status;
        WSK_CLIENT_NPI wskClientNpi;
        . . .
        wskClientNpi.ClientContext = NULL;
        wskClientNpi.Dispatch = &WskSampleClientDispatch;
        status = WskRegister(&wskClientNpi, &WskSampleRegistration);
        . . .
    }

Network Module Registrar APIs are still available
WSK Registration And Deregistration
Capture the WSK_PROVIDER_NPI, use it, release it
  Do NOT use the WSK_PROVIDER_NPI after releasing it
WaitTimeOut usage in WskCaptureProviderNPI
  WSK_NO_WAIT
  WSK_INFINITE_WAIT: do NOT use if calling from DriverEntry!

    NTSTATUS
    SomeWorkerRoutine(. . .)
    {
        NTSTATUS status;
        WSK_PROVIDER_NPI wskProviderNpi;
        . . .
        status = WskCaptureProviderNPI(&WskSampleRegistration,
                                       WSK_INFINITE_WAIT,
                                       &wskProviderNpi);
        if(NT_SUCCESS(status)) {
            status = wskProviderNpi.Dispatch->WskSocket(
                         wskProviderNpi.Client, AF_INET6, . . .);
            // The provider NPI must not be used after this call.
            WskReleaseProviderNPI(&WskSampleRegistration);
        }
        . . .
    }
WSK Registration And Deregistration
WskDeregister
  Must be called exactly once for each successful WskRegister when the WSK client stops using WSK
  Will block until
    All captured provider NPI instances are returned
    All outstanding calls to provider NPI functions have completed
    All sockets are closed
  The client must close all sockets and release all captured provider NPI instances for WskDeregister to return
  Will cause WskCaptureProviderNPI calls waiting in other threads (with WSK_INFINITE_WAIT or some timeout) to return

    VOID
    DriverUnload(. . .)
    {
        . . .
        WskDeregister(&WskSampleRegistration);
        . . .
    }
IRP Handling
[Sequence diagram between WSK Client, I/O Manager, and WSK Subsystem. Calls shown, in order:]
  IoAllocateIrp(1, …)
  IoSetCompletionRoutine(Irp, CompletionRoutine, Context, TRUE, TRUE, TRUE)
  WskSend(Socket, …, Irp)
  IoCompleteRequest(Irp, …)
  CompletionRoutine(…, Irp, Context), which returns STATUS_MORE_PROCESSING_REQUIRED
  IoFreeIrp(Irp) or IoReuseIrp(Irp, …)
IRP Handling
Simple example that waits for IRP completion synchronously
(Also demonstrates how to distinguish and optimize for "inline" IRP completion)

    NTSTATUS
    SyncIrpCompRtn(PDEVICE_OBJECT Reserved, PIRP Irp, PVOID Context)
    {
        PKEVENT compEvent = (PKEVENT)Context;

        // Signal the event only if the IRP was pended. On inline completion
        // the caller never waits, so setting the event is unnecessary.
        if(Irp->PendingReturned)
            KeSetEvent(compEvent, 2, FALSE);

        // Stop further completion processing; the caller frees the IRP.
        return STATUS_MORE_PROCESSING_REQUIRED;
    }

    NTSTATUS
    SetSocketOption(PWSK_SOCKET Socket, . . .)
    {
        NTSTATUS status;
        CONST WSK_PROVIDER_BASIC_DISPATCH *dispatch = Socket->Dispatch;
        KEVENT compEvent;
        PIRP irp;

        KeInitializeEvent(&compEvent, SynchronizationEvent, FALSE);

        irp = IoAllocateIrp(1, FALSE);
        if(irp == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        IoSetCompletionRoutine(irp, SyncIrpCompRtn, &compEvent, TRUE, TRUE, TRUE);

        status = dispatch->WskControlSocket(Socket, . . ., irp);

        // Wait only when the request was actually pended.
        if(status == STATUS_PENDING)
            KeWaitForSingleObject(&compEvent, Executive, KernelMode, FALSE, NULL);

        status = irp->IoStatus.Status;
        IoFreeIrp(irp);
        return status;
    }
Buffer Ownership And Manipulation
Setting up a WSK_BUF
  WSK_BUF.Mdl
    IoAllocateMdl(BufferAddress, BufferLength, . . .)
    MmProbeAndLockPages vs MmBuildMdlForNonPagedPool
  WSK_BUF.Length
    Must be <= (BufferLength - WSK_BUF.Offset)
  WSK_BUF.Offset
    Must lie within the first MDL if WSK_BUF.Mdl points to a chain of MDLs
[Figure: buffer layout showing BufferAddress, BufferLength, WSK_BUF.Mdl, WSK_BUF.Offset, WSK_BUF.Length, Page Boundary, and MDL ByteOffset]
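The two WSK_BUF constraints above can be modeled in user-mode C as a quick sanity check. This is only a sketch: `ModelMdl`, `ModelWskBuf`, and `model_wsk_buf_is_valid` are hypothetical stand-ins for illustration, not the real MDL or WSK_BUF types, and it assumes that "within the first MDL" means an offset strictly smaller than the first MDL's byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode stand-ins; not the real MDL or WSK_BUF types. */
typedef struct ModelMdl {
    struct ModelMdl *next;   /* next MDL in the chain, or NULL */
    size_t byte_count;       /* bytes described by this MDL */
} ModelMdl;

typedef struct ModelWskBuf {
    const ModelMdl *mdl;     /* first MDL of the chain */
    size_t offset;           /* offset into the first MDL */
    size_t length;           /* number of bytes described by the buffer */
} ModelWskBuf;

/* Returns true when offset lies within the first MDL and
 * length <= (total chain bytes - offset). */
bool model_wsk_buf_is_valid(const ModelWskBuf *buf)
{
    size_t total = 0;
    const ModelMdl *m;

    if (buf == NULL || buf->mdl == NULL)
        return false;

    /* Assumption: "within" means strictly less than the byte count. */
    if (buf->offset >= buf->mdl->byte_count)
        return false;

    for (m = buf->mdl; m != NULL; m = m->next)
        total += m->byte_count;

    return buf->length <= total - buf->offset;
}
```

With a two-MDL chain of 50 and 100 bytes and an offset of 10, the longest valid length in this model is 140 bytes.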
Buffer Ownership And Manipulation
Example: Copy data from a WSK_DATA_INDICATION list to a buffer

    NTSTATUS
    CopyDataIndicationListToBuffer(__in PWSK_DATA_INDICATION DataIndication,
                                   __in SIZE_T BufSize,
                                   __out_bcount(BufSize) PUCHAR Buf)
    {
        SIZE_T bytesCopied = 0;

        // Walk every data indication in the list.
        while(DataIndication != NULL) {
            PMDL mdl = DataIndication->Buffer.Mdl;
            ULONG offset = DataIndication->Buffer.Offset;
            SIZE_T length = DataIndication->Buffer.Length;

            // Walk the MDL chain of this indication.
            while(length > 0 && mdl != NULL) {
                SIZE_T copyLength = min(length, MmGetMdlByteCount(mdl) - offset);
                PUCHAR sysAddr = (PUCHAR)MmGetSystemAddressForMdlSafe(mdl, LowPagePriority);

                if(sysAddr == NULL)
                    return STATUS_INSUFFICIENT_RESOURCES;
                else if((BufSize - bytesCopied) < copyLength)
                    return STATUS_BUFFER_TOO_SMALL;

                RtlCopyMemory(Buf + bytesCopied, sysAddr + offset, copyLength);

                offset = 0; // WSK_BUF.Offset applies only to the first MDL
                bytesCopied += copyLength;
                length -= copyLength;
                mdl = mdl->Next;
            }

            DataIndication = DataIndication->Next;
        }

        return STATUS_SUCCESS;
    }
Buffer Ownership And Manipulation
May "retain" (take temporary ownership of) a WSK data indication by returning STATUS_PENDING from the WskReceiveEvent or WskReceiveFromEvent callbacks
  Any status other than STATUS_PENDING means the data indication was NOT retained, hence there is no need to call WskRelease
Must release retained data indications via WskRelease
Do not retain data indications with the WSK_FLAG_RELEASE_ASAP flag if possible. If you do have to retain such indications, release them within a short, bounded amount of time (on the order of a few seconds)
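The retain-or-not decision can be sketched as a small user-mode model. Everything here is hypothetical and for illustration only: the `MODEL_` constants are placeholder values rather than the real NTSTATUS codes or WSK flag definitions, and `model_receive_event` merely mirrors the rule that only a pending return status creates an obligation to release the indication later.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values; NOT the real NTSTATUS codes or WSK flag bits. */
enum { MODEL_STATUS_SUCCESS = 0, MODEL_STATUS_PENDING = 1 };
enum { MODEL_FLAG_RELEASE_ASAP = 0x1 };

typedef struct ModelReceiveDecision {
    int  status;            /* what the callback would return */
    bool must_release;      /* true when a later release call is owed */
    bool release_urgently;  /* true when the retained data had RELEASE_ASAP set */
} ModelReceiveDecision;

/* Models a receive callback: consume the data inline when possible,
 * otherwise retain it and remember that it must be released. */
ModelReceiveDecision model_receive_event(unsigned flags, bool can_copy_inline)
{
    ModelReceiveDecision d;

    if (can_copy_inline) {
        /* Data consumed inside the callback: nothing retained. */
        d.status = MODEL_STATUS_SUCCESS;
        d.must_release = false;
        d.release_urgently = false;
    } else {
        /* Indication retained: a release is owed, and soon if flagged. */
        d.status = MODEL_STATUS_PENDING;
        d.must_release = true;
        d.release_urgently = (flags & MODEL_FLAG_RELEASE_ASAP) != 0;
    }
    return d;
}
```

A real callback would track `must_release` per indication so that every retained indication is released exactly once.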
Socket Callbacks Versus Functions
Accepting incoming connections
  WskAccept
    Client keeps one or more accept IRPs pended in WSK
    Connections are rejected by WSK when no pending IRP exists
  WskAcceptEvent
    WSK hands over "sockets" to the client for arriving connections
    Client accepts or rejects
Guidance
  Use WskAcceptEvent to accept as many connections as the system can handle at any given time
  Use WskAccept to accept only a small, fixed number of connections at any given time
  WSK does not have an equivalent of the listen backlog in Winsock2
Socket Callbacks Versus Functions
Receiving datagrams
  WskReceiveFrom
    Data buffer owned by client, must allocate before data arrives
    Client keeps one or more receive IRPs pended in WSK
    Datagrams are dropped by WSK when no pending IRP exists
  WskReceiveFromEvent
    Data buffer owned by WSK, allocated when data arrives
    Each arriving datagram is handed over to the client by WSK
Guidance
  Always use WskReceiveFromEvent as long as you do not retain datagram indications too long
  Use WskReceiveFrom only if you must always copy datagrams into your own buffers anyway
  WSK does not buffer datagrams
Socket Callbacks Versus Functions
Receiving stream data
  WskReceive
    Data buffer owned by client, must allocate before data arrives
    0-copy into client buffer possible
    Data buffered by transport if no pending receive IRP exists
  WskReceiveEvent
    Data buffer owned by WSK
    0-copy into client buffer not possible
    Data handed over to client until client rejects an indication
    Client needs to use WskReceive to retrieve rejected data
Guidance
  Use WskReceive for large block transfers
  Combined usage: get the initial data via WskReceiveEvent, then get the rest of the data via WskReceive
  WskReceiveEvent: the amount of retained data and the time it is retained must be bounded and small
Socket Callbacks Versus Functions
Both socket callbacks and the IRP completions for socket functions mostly occur in Deferred Procedure Call (DPC) context
Must limit the amount of processing in callback and IRP completion routines
Consider using
  System worker threads for tasks that won't last too long
  A dedicated system thread for long-lasting tasks
Memory/Throughput Tradeoff
Stream sockets are subject to transport flow control
  Send requests may remain pended until acknowledged by the peer
  Too much pended send data: poor memory usage
  Too little pended send data: suboptimal throughput
So, how much data to keep pended (ideal send backlog: "ISB")?
  As much as the network can sustain
  As much as the receiver can sustain
Use the SIO_WSK_QUERY_IDEAL_SEND_BACKLOG IOCTL and the WskSendBacklogEvent callback
  Initial ISB to use: SIO_WSK_QUERY_IDEAL_SEND_BACKLOG
  ISB change notifications: WskSendBacklogEvent
  Always have two or more WskSend requests pended with ISB worth of data in total. Example: ISB = 64 K means 2 WskSend requests, each with 32 K of data
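The sizing rule in the example can be written as a small helper. This is a user-mode sketch under stated assumptions: `model_bytes_per_send` is a hypothetical name, the ISB value is taken as a plain byte count already obtained by the driver, and rounding up is a choice made here so that the pended total never falls below the ISB.

```c
#include <assert.h>
#include <stddef.h>

/* Splits an ideal send backlog (in bytes) across the pended send requests.
 * Fewer than two requests is treated as two, per the guidance to keep
 * two or more sends pended. Rounds up so the total covers the ISB. */
size_t model_bytes_per_send(size_t isb_bytes, size_t pended_requests)
{
    if (pended_requests < 2)
        pended_requests = 2;

    return (isb_bytes + pended_requests - 1) / pended_requests;
}
```

For an ISB of 64 K and two requests this yields 32 K per request, matching the example above. When the ISB changes, the same helper can be applied to the new value for subsequent sends.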
Transport Address Security
Secure by default: creating a socket with a NULL SecurityDescriptor and binding it to an address results in SO_EXCLUSIVEADDRUSE behavior
Refrain from designing applications based on address sharing
If you must allow address sharing
  May set SO_REUSEADDR to TRUE: anybody else can reuse the address (not good from a security perspective)
  May use a SecurityDescriptor: sharing is allowed or denied based on an access check performed by the system
    WSK (transport) uses the SecurityDescriptor specified by the first socket and the SECURITY_SUBJECT_CONTEXT captured from the OwningProcess and OwningThread specified by the second socket to perform the access check
Dual Family Sockets
Use a single IPv6 socket to handle both IPv6 and IPv4 traffic
  Set the IPV6_V6ONLY option to FALSE (default is TRUE)
  Bind to the wildcard address

    // Example dual family listening socket
    ULONG optVal = 0;
    . . .
    status = dispatch->WskControlSocket(IPv6ListeningSocket,
                                        WskSetOption,
                                        IPV6_V6ONLY,
                                        IPPROTO_IPV6,
                                        sizeof(optVal), &optVal,
                                        0, NULL, NULL, irp);
    . . .
    status = dispatch->WskBind(IPv6ListeningSocket,
                               (PSOCKADDR)Ipv6WildcardAddress, 0, irp);
    . . .

IPv4 addresses are represented in V4MAPPED IPv6 address format
  Can use the INETADDR_ISV4MAPPED macro from mstcpip.h to check if a given SOCKADDR represents a V4MAPPED address
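To make the V4MAPPED format concrete, the following portable sketch checks the address layout by hand on a raw 16-byte IPv6 address. It is not the mstcpip.h macro: `model_is_v4mapped` and `model_extract_v4` are hypothetical helpers that rely only on the standard `::ffff:a.b.c.d` layout (80 zero bits, 16 one bits, then the IPv4 address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when the 16-byte IPv6 address has the ::ffff:a.b.c.d form. */
bool model_is_v4mapped(const uint8_t addr[16])
{
    int i;

    for (i = 0; i < 10; i++) {
        if (addr[i] != 0)
            return false;       /* first 80 bits must be zero */
    }
    return addr[10] == 0xFF && addr[11] == 0xFF;
}

/* Copies the embedded IPv4 address (the last four bytes) into out. */
void model_extract_v4(const uint8_t addr[16], uint8_t out[4])
{
    memcpy(out, addr + 12, 4);
}
```

A driver using a dual-family socket would apply the real macro to the SOCKADDR it receives; the model only shows what that check looks for.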
Call To Action
Port your existing kernel-mode TDI applications to WSK and use WSK for new development
Move from using TDI filter drivers to WFP for network traffic interception
Follow the practices outlined in this session to achieve optimal performance and stability from WSK
Additional Resources
Web Resources
  Windows Network Developer Platform (WNDP) Team Blog: http://blogs.msdn.com/wndp
  WNDP Team Connect Site: join the 'WNDP' program at http://connect.microsoft.com
Related Sessions
  How to Use the Windows Filtering Platform to Integrate with Windows Networking
  Using NDIS 6.0, TCP Chimney Offload, and RSS to Achieve High Performance Networking
E-mail: wskapi @ microsoft.com
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.