Network Programming for Microsoft Windows

Overview

This updated edition provides the latest information about how to write applications that take advantage of the advanced networking protocols and technologies that Microsoft Windows XP supports: Internet Protocol (IP) versions 4 and 6, Pragmatic General Multicasting (PGM) protocol, Internet Group Management Protocol 3 (IGMPv3), IPv6 multicasting, the Network Location Awareness (NLA) namespace provider, the Winsock Provider Interface, 64-bit Winsock APIs, and .NET Sockets. The book includes code samples in the Microsoft Visual Basic, Microsoft Visual C++, and Microsoft Visual C# development systems.

Product Details

  • ISBN-13: 9780735615793
  • Publisher: Microsoft Press
  • Publication date: 2/13/2002
  • Series: PRO-Developer Series
  • Edition description: 2nd Edition
  • Edition number: 2
  • Pages: 580
  • Product dimensions: 7.70 (w) x 8.62 (h) x 1.58 (d)

Meet the Author


Anthony Jones and Jim Ohlund are Support Engineers with the NetAPI Developer Support Team at Microsoft. Anthony wrote a feature story for the May 1998 Microsoft Systems Journal on network programming with Windows CE.

Read an Excerpt


Chapter 3: Mailslots

Microsoft Windows NT, Windows 2000, Windows 95, and Windows 98 (but not Windows CE) include a simple one-way interprocess communication (IPC) mechanism known as mailslots. In simplest terms, mailslots allow a client process to transmit or broadcast messages to one or more server processes. Mailslots can assist transmission of messages among processes on the same computer or among processes on different computers across a network. Developing applications using mailslots is simple, requiring no formal knowledge of underlying network transport protocols such as TCP/IP or IPX. Because mailslots are designed around a broadcast architecture, you can't expect reliable data transmissions via mailslots. They can be useful, nevertheless, in certain types of network programming situations in which delivery of data isn't mission-critical.

One possible scenario for using mailslots is developing a messaging system that includes everyone in your office. Imagine that your office environment has a large number of workstations. Your office is suffering from a soda shortage, and every workstation user in your office is interested in knowing every few minutes how many Cokes are available in the soda machine. Mailslots lend themselves well to this type of situation. You can easily implement a mailslot client application that monitors the soda count and broadcasts to every interested workstation user the total number of available Cokes at five-minute intervals. Because mailslots don't guarantee delivery of a broadcast message, some workstation users might not receive all updates. A few failures of transmission won't be a problem in this case because messages sent at five-minute intervals with occasional misses are still frequent enough to keep the workstation users well informed.

The major limitation of mailslots is that they permit only unreliable one-way data communication from a client to a server. The biggest advantage of mailslots is that they allow a client application to easily send broadcast messages to one or more server applications.

This chapter explains how to develop a mailslot client/server application. We'll describe mailslot naming conventions before we address the message sizing considerations that control the overall behavior of mailslots. Next we'll show you the details of developing a basic client/server application. At the end of this chapter, we'll tell you about known problems and limitations of mailslots and offer workaround solutions.

MAILSLOT IMPLEMENTATION DETAILS

Mailslots are designed around the Windows file system interface. Client and server applications use standard Win32 file system I/O functions, such as ReadFile and WriteFile, to send and receive data on a mailslot and take advantage of Win32 file system naming conventions. Mailslots rely on the Windows redirector to create and identify mailslots using a file system named the Mailslot File System (MSFS). Chapter 2 described the Windows redirector in greater detail.

Mailslot Names

Mailslots use the following naming convention for identification:

\\server\Mailslot\[path]name

The string above is divided into three portions: \\server, \Mailslot, and \[path]name. The first string portion, \\server, represents the name of the server on which a mailslot is created and on which a server application is running. The second portion, \Mailslot, is a hardcoded mandatory string for notifying the system that this filename belongs to MSFS. The third portion, \[path]name, allows applications to uniquely define and identify a mailslot name; the path portion might specify multiple levels of directories. For example, the following types of names are legal for identifying a mailslot:

\\Oreo\Mailslot\Mymailslot

\\Testserver\Mailslot\Cooldirectory\Funtest\Anothermailslot

\\.\Mailslot\Easymailslot

\\*\Mailslot\Myslot

The server string portion can be represented as a dot (.), an asterisk (*), a domain name, or a server name. A domain is simply a group of workstations and servers that share a common group name. We'll examine mailslot names in greater detail later in this chapter, when we cover implementation details of a simple client.
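As a rough illustration of the naming convention above, the following sketch assembles a mailslot name from its parts. The helper build_mailslot_name is hypothetical (it is not a Win32 function) and uses only the standard C library; note that each backslash must be doubled inside a C string literal.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Win32): formats a mailslot name of the
 * form \\server\Mailslot\[path]name into out.
 * server may be a machine name, a domain name, "." or "*".
 * Returns 0 on success, -1 if an argument is empty or the result
 * does not fit into outlen bytes. */
int build_mailslot_name(char *out, size_t outlen,
                        const char *server, const char *path_and_name)
{
    int n;

    if (out == NULL || outlen == 0 || server == NULL || path_and_name == NULL)
        return -1;
    if (server[0] == '\0' || path_and_name[0] == '\0')
        return -1;

    /* "\\\\" in source code is two backslashes in the resulting string */
    n = snprintf(out, outlen, "\\\\%s\\Mailslot\\%s", server, path_and_name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;   /* formatting error or truncated output */
    return 0;
}
```

Calling build_mailslot_name(name, sizeof(name), ".", "Easymailslot") would produce the third legal name listed above; the resulting string is what an application would then hand to the file system functions.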

Because mailslots rely on the Windows file system services for creating mailslots and transferring data over a network, the interface is protocol-independent. When creating your application, you don't have to worry about the details of underlying network transport protocols to form communications among processes across a network. When mailslots communicate remotely to computers across a network, the Windows file system services rely on the Windows redirector to send data from a client to a server using the Server Message Block (SMB) protocol. Messages are typically sent via connectionless transfers, but you can force the Windows redirector to use connection-oriented transfers on Windows NT and Windows 2000, depending on the size of your message.

Message Sizing

Mailslots normally use datagrams to transmit messages over a network. Datagrams are small packets of data that are transmitted over a network in a connectionless manner. Connectionless transmission means that each data packet is sent to a recipient without packet acknowledgment. This is unreliable data transmission, which is bad in that you cannot guarantee message delivery. However, connectionless transmission does give you the capability to broadcast a message from one client to many servers. The exception to this occurs on Windows NT and Windows 2000 when messages exceed 424 bytes.

On Windows NT and Windows 2000, messages larger than 426 bytes are transferred using a connection-oriented protocol over an SMB session instead of using datagrams. This allows large messages to be transferred reliably and efficiently. However, you lose the ability to broadcast a message from a client to many servers. Connection-oriented transfers are limited to one-to-one communication: one client to one server. Connection-oriented transfers normally provide reliable guaranteed delivery of data between processes, but the mailslot interface on Windows NT and Windows 2000 does not guarantee that a message will actually be written to a mailslot. For example, if you send a large message from a client to a server that does not exist on a network, the mailslot interface does not tell your client application that it failed to submit data to the server. Since Windows NT and Windows 2000 change their transmission method based on message size, an interoperability problem occurs when you send large messages between a machine running Windows NT or Windows 2000 and a machine running Windows 95 or Windows 98.

Windows 95 and Windows 98 deliver messages via datagrams only, regardless of message size. If a Windows 95 or Windows 98 client attempts to send a message larger than 424 bytes to a Windows NT or Windows 2000 server, Windows NT and Windows 2000 accept the first 424 bytes and truncate the remaining data. Windows NT and Windows 2000 expect larger messages to be sent over a connection-oriented SMB session. A similar problem exists in transferring messages from a Windows NT or Windows 2000 client to a Windows 95 or Windows 98 server. Remember that Windows 95 and Windows 98 receive data via datagrams only. Because Windows NT and Windows 2000 transfer data via datagrams for messages 426 bytes or smaller...


Table of Contents

Acknowledgments
Introduction
1 Introduction to Winsock
2 Winsock Design
3 Internet Protocol
4 Other Supported Protocols
5 Winsock I/O Methods
6 Scalable Winsock Applications
7 Socket Options and Ioctls
8 Registration and Name Resolution
9 Multicasting
10 Generic Quality of Service
11 Raw Sockets
12 The Winsock Service Provider Interface
13 .NET Sockets Programming Using C#
14 The Microsoft Visual Basic Winsock Control
15 Remote Access Service
16 IP Helper Functions
Index

First Chapter

  • APIs and Scalability
    • AcceptEx
    • GetAcceptExSockaddrs
    • TransmitFile
    • TransmitPackets
    • ConnectEx
    • DisconnectEx
    • WSARecvMsg
  • Scalable Server Architecture
    • Accepting Connections
    • Data Transfers
    • TransmitFile and TransmitPackets
  • Resource Management
  • Server Strategies
    • High Throughput
    • Maximizing Connections
    • Performance Numbers
  • Winsock Direct and Sockets Direct Protocol
  • Conclusion

6 Scalable Winsock Applications

Developing Winsock applications has always been considered to be cryptic and difficult to learn. In reality, there are only a few basic principles, such as socket creation, connecting a socket, accepting connections, and sending and receiving data. The real difficulty lies in developing a scalable Winsock application that can handle a single connection or thousands of connections. This chapter describes how to write scalable Winsock applications for Windows NT. The main focus is the server side of the client-server model; however, some of the topics apply equally to both.

This discussion of writing scalable applications applies to server applications and therefore only applies to Windows NT 4.0 and later versions. We don't include earlier versions of Windows NT because many of the features we will cover require Winsock 2, which is available only on Windows NT 4.0 and later versions. Finally, the focus of our discussion will be on the TCP/IP protocol. However, all of the topics we cover can easily apply to other connection-oriented, stream-based protocols. Some of the topics apply to UDP/IP as well (such as resource management) but connectionless, message-based protocols themselves will not be covered.

This chapter will first discuss the different Winsock API functions designed for use in scalable, high-performance applications such as AcceptEx, TransmitFile, and ConnectEx. Typically, these are Microsoft-specific extensions added with different versions of the operating system because the original Winsock specification leaves out several key asynchronous functions. We'll then cover the necessary steps for implementing a scalable server and discuss how to handle low resource conditions that occur when the number of connections becomes very large.

APIs and Scalability

The only I/O model that provides true scalability on Windows NT platforms is overlapped I/O using completion ports for notification. In Chapter 5, we covered the various methods of socket I/O and explained that for a large number of connections, completion ports offer the greatest flexibility and ease of implementation. Mechanisms like WSAAsyncSelect and select are provided for easier porting from Windows 3.1 and UNIX, respectively, but are not designed to scale. The event-based models are not scalable because of the operating system limit of simultaneous wait events.

The other major advantages of overlapped I/O are the several Microsoft-specific extensions that can only be called in an overlapped manner. When you use overlapped I/O there are several options for how the notifications can be received. Event-based notification is not scalable because the operating system limit of waiting on 64 objects necessitates using many threads. This is not only inefficient but requires a lot of housekeeping overhead to assign events to available worker threads. Overlapped I/O with callbacks is not an option for several reasons. First, many of the Microsoft-specific extensions do not allow Asynchronous Procedure Calls (APCs) for completion notification. Second, due to the nature of how APCs are handled on Windows, it is possible for an application thread to starve. Once a thread goes into an alertable wait, all pending APCs are handled on a first in first out (FIFO) basis. Now consider the situation in which a server has a connection established and posts an overlapped WSARecv with a completion function. When there is data to receive, the completion routine fires and posts another overlapped WSARecv. Depending on timing conditions and how much work is performed within the APC, another completion function is queued (because there is more data to be read). This can cause the server's thread to starve as long as there is pending data on that socket.
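To put the 64-object wait limit mentioned above into numbers, here is a small sketch of the arithmetic. It assumes the simplest event-based design, in which every connection owns one event object and each waiting thread handles at most 64 of them; the names WAIT_LIMIT and threads_needed are made up for this illustration.

```c
#include <stddef.h>

/* Assumption: 64 mirrors the wait limit quoted in the text (on Windows the
 * corresponding constant is MAXIMUM_WAIT_OBJECTS). It is defined locally so
 * that the sketch stays portable. */
#define WAIT_LIMIT 64

/* Hypothetical helper: number of waiting threads needed if every connection
 * owns one event object and each thread waits on at most WAIT_LIMIT events. */
size_t threads_needed(size_t connections)
{
    return (connections + WAIT_LIMIT - 1) / WAIT_LIMIT;  /* ceiling division */
}
```

Under these assumptions a server with 10,000 connections would need 157 waiting threads, which is the thread count and bookkeeping burden the paragraph above warns about.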

Before delving deeper into the architecture of scalable Winsock applications, let's discuss the Microsoft-specific extensions that will aid us in developing scalable servers. These APIs are TransmitFile, AcceptEx, ConnectEx, TransmitPackets, DisconnectEx, and WSARecvMsg. There is a related extension function, GetAcceptExSockaddrs, which is used in conjunction with AcceptEx.

Before describing each of the extension API functions, it is important to point out that these functions are defined in MSWSOCK.H. Also, only three of the functions (TransmitFile, AcceptEx, and GetAcceptExSockaddrs) are actually exported from MSWSOCK.DLL. However, applications should avoid using those. Instead, applications should dynamically load the extension function, which is required for all the remaining extension APIs. Not all providers have to support these APIs, so it is best to explicitly load these APIs from the provider you are using. See Chapter 7 and the SIO_GET_EXTENSION_FUNCTION_POINTER for an example of how to load the extension APIs.

AcceptEx

Perhaps the most useful extension API for scalable TCP/IP servers is AcceptEx. This function allows the server to post an asynchronous call that will accept the next incoming client connection. This function is defined as

BOOL
PASCAL FAR
AcceptEx (
IN SOCKET sListenSocket,
IN SOCKET sAcceptSocket,
IN PVOID lpOutputBuffer,
IN DWORD dwReceiveDataLength,
IN DWORD dwLocalAddressLength,
IN DWORD dwRemoteAddressLength,
OUT LPDWORD lpdwBytesReceived,
IN LPOVERLAPPED lpOverlapped
);

The first parameter is the listening socket, and sAcceptSocket is a valid, unbound socket handle that will be assigned to the next client connection. So the socket handle for the client needs to be created before posting the AcceptEx call. This is necessary because socket creation is expensive, and if a server is interested in handling client connections as fast as possible, it needs to have a pool of sockets already created on which new connections will be assigned.

The four parameters that follow sAcceptSocket are related. The lpOutputBuffer is required and is filled in with the local and remote addresses for the client connection as well as an optional buffer to receive the first data chunk received from the client. The dwReceiveDataLength indicates how many bytes of the supplied buffer should be used to receive data sent by the client. An application may choose not to receive data and may specify zero. The dwLocalAddressLength specifies the size of the socket address structure corresponding to the address family of the client socket plus 16 bytes. The local address of the client socket connection is placed in the lpOutputBuffer following the receive data if specified. The dwRemoteAddressLength is the same. The remote address of the client connection will be written to the lpOutputBuffer following the receive data (if specified) and the local address. Note that dwReceiveDataLength may be zero but dwLocalAddressLength and dwRemoteAddressLength cannot be.
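The way a single output buffer is divided among receive data, local address, and remote address can be sketched with plain arithmetic. The example below is an illustration only: the type acceptex_layout and the function plan_acceptex_buffer are hypothetical, and the 16-byte size of an IPv4 SOCKADDR_IN is hard-coded as an assumption so that the code compiles without the Winsock headers.

```c
#include <stddef.h>

/* Assumption: an IPv4 SOCKADDR_IN occupies 16 bytes, as it does on Windows. */
#define SOCKADDR_IN_SIZE  16
/* Each address length is the socket address structure size plus 16 bytes. */
#define ACCEPTEX_ADDR_LEN (SOCKADDR_IN_SIZE + 16)

/* Hypothetical description of how one AcceptEx output buffer is divided. */
struct acceptex_layout {
    size_t receive_len;    /* value to pass as dwReceiveDataLength */
    size_t local_offset;   /* where the local address block starts */
    size_t remote_offset;  /* where the remote address block starts */
};

/* Splits a buffer of buflen bytes: receive data first, then the local and
 * remote address blocks. Returns 0 on success, or -1 if the buffer is too
 * small to hold both address blocks. */
int plan_acceptex_buffer(size_t buflen, struct acceptex_layout *out)
{
    if (out == NULL || buflen < 2 * ACCEPTEX_ADDR_LEN)
        return -1;
    out->receive_len   = buflen - 2 * ACCEPTEX_ADDR_LEN;
    out->local_offset  = out->receive_len;
    out->remote_offset = out->receive_len + ACCEPTEX_ADDR_LEN;
    return 0;
}
```

For a 1024-byte buffer this yields 960 bytes of receive data, with the local address block at offset 960 and the remote address block at offset 992. A 64-byte buffer leaves no room for data, which corresponds to passing zero for dwReceiveDataLength.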

The lpdwBytesReceived indicates the number of bytes received on the newly-established client connection if the operation succeeds immediately. Finally, lpOverlapped is the WSAOVERLAPPED structure for this overlapped operation. This parameter is required—if you want to perform a blocking accept call, just use accept or WSAAccept.

Before going any farther, let's take a quick look at an example using the AcceptEx function. The following code creates an IPv4 listening socket and posts a single AcceptEx.

SOCKET s, sclient;
HANDLE hCompPort;
LPFN_ACCEPTEX lpfnAcceptEx=NULL;
GUID GuidAcceptEx=WSAID_ACCEPTEX;
// The WSAOVERLAPPEDPLUS type will be described in detail in
// Chapter 12 and includes a WSAOVERLAPPED structure as well as
// context information for the overlapped operation.
WSAOVERLAPPEDPLUS ol;
SOCKADDR_IN salocal;
DWORD dwBytes;
char buf[1024];
int buflen=1024;

// Create the completion port
hCompPort = CreateIoCompletionPort(INVALID_HANDLE_VALUE,
    NULL,
    (ULONG_PTR)0,
    0
    );
// Create the listening socket
s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// Associate listening socket to completion port
CreateIoCompletionPort((HANDLE)s,
    hCompPort,
    (ULONG_PTR)0,
    0
    );
// Bind the socket to the local port
salocal.sin_family = AF_INET;
salocal.sin_port = htons(5150);
salocal.sin_addr.s_addr = htonl(INADDR_ANY);
bind(s, (SOCKADDR *)&salocal, sizeof(salocal));
// Set the socket to listening
listen(s, 200);
// Load the AcceptEx function
WSAIoctl(s,
    SIO_GET_EXTENSION_FUNCTION_POINTER,
    &GuidAcceptEx,
    sizeof(GuidAcceptEx),
    &lpfnAcceptEx,
    sizeof(lpfnAcceptEx),
    &dwBytes,
    NULL,
    NULL
    );
// Create the client socket for the accepted connection
sclient = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// Initialize our "extended" overlapped structure
memset(&ol, 0, sizeof(ol));
ol.operation = OP_ACCEPTEX;
ol.client = sclient;
lpfnAcceptEx(s,
    sclient,
    buf,
    buflen - ((sizeof(SOCKADDR_IN) + 16) * 2),
    sizeof(SOCKADDR_IN) + 16,
    sizeof(SOCKADDR_IN) + 16,
    &dwBytes,
    &ol.overlapped
    );
// Call GetQueuedCompletionStatus within the completion function
// After the AcceptEx operation completes associate the accepted client
// socket with the completion port

This sample is a bit simplified but it shows the necessary steps. It shows how to set up the listening socket, which you've seen before. Then it shows how to load the AcceptEx function. Applications should always load the extension functions themselves to avoid the performance penalty of the exported extension functions from MSWSOCK.DLL, because for each call they simply end up loading the same function. Next, the application-specific overlapped structure is established, which contains necessary information concerning the asynchronous operation so that when it completes the server can figure out what happened. The actual declaration of this type is not included for the sake of simplicity. See Chapter 5 for more information about this. Finally, once the AcceptEx operation completes, the newly-accepted client socket should be associated with the completion port.

Also be aware that because of the high performance nature of AcceptEx, the listening socket's socket attributes are not automatically inherited by the client socket. To do this, the server must call setsockopt with SO_UPDATE_ACCEPT_CONTEXT with the client socket handle. See Chapter 7 for more information.

Another point to be aware of, which we mentioned in Chapter 5, is that if a receive buffer is specified to AcceptEx (for example, dwReceiveDataLength is greater than zero), then the overlapped operation will not complete until at least one byte of data has been received on the connection. So a malicious client could post many connections but never send any data. Chapter 5 discusses methods to prevent this by using the SO_CONNECT_TIME socket option. The AcceptEx function is available on Windows NT 4.0 and later versions.

GetAcceptExSockaddrs

This is really a companion function to AcceptEx because it is required to decode the local and remote addresses contained within the buffer passed to the accept call. As you remember, a single buffer will contain any data received on the connection as well as the local and remote addresses for that connection. Any data indicated to be received will always be placed at the start of this buffer followed by the addresses. However, these addresses are in a packed form and the GetAcceptExSockaddrs function will decode them into the appropriate SOCKADDR structure for the address family. This function is defined as

VOID
PASCAL FAR
GetAcceptExSockaddrs (
IN PVOID lpOutputBuffer,
IN DWORD dwReceiveDataLength,
IN DWORD dwLocalAddressLength,
IN DWORD dwRemoteAddressLength,
OUT struct sockaddr **LocalSockaddr,
OUT LPINT LocalSockaddrLength,
OUT struct sockaddr **RemoteSockaddr,
OUT LPINT RemoteSockaddrLength
);

The first four parameters are the same as those in the AcceptEx call and they must match the values passed to AcceptEx. That is, if 1024 was specified as dwReceiveDataLength in AcceptEx then the same value must be passed to GetAcceptExSockaddrs. The remaining four parameters are SOCKADDR pointers and their lengths for the local and remote addresses. These parameters are all output parameters. The following code illustrates how you would call GetAcceptExSockaddrs after the AcceptEx call in our previous example completes:

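// buf and buflen were defined previously, as were s and dwBytes
SOCKADDR *lpLocalSockaddr=NULL,
         *lpRemoteSockaddr=NULL;
int LocalSockaddrLen=0,
    RemoteSockaddrLen=0;
LPFN_GETACCEPTEXSOCKADDRS lpfnGetAcceptExSockaddrs=NULL;
GUID GuidGetAcceptExSockaddrs=WSAID_GETACCEPTEXSOCKADDRS;

// Load the GetAcceptExSockaddrs function
WSAIoctl(s,
    SIO_GET_EXTENSION_FUNCTION_POINTER,
    &GuidGetAcceptExSockaddrs,
    sizeof(GuidGetAcceptExSockaddrs),
    &lpfnGetAcceptExSockaddrs,
    sizeof(lpfnGetAcceptExSockaddrs),
    &dwBytes,
    NULL,
    NULL
    );
// Decode the addresses; the lengths must match those passed to AcceptEx
lpfnGetAcceptExSockaddrs(
    buf,
    buflen - ((sizeof(SOCKADDR_IN) + 16) * 2),
    sizeof(SOCKADDR_IN) + 16,
    sizeof(SOCKADDR_IN) + 16,
    &lpLocalSockaddr,
    &LocalSockaddrLen,
    &lpRemoteSockaddr,
    &RemoteSockaddrLen
    );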

After the function completes, the lpLocalSockaddr and lpRemoteSockaddr point to within the specified buffer where the socket addresses have been unpacked into the correct socket address structure.

TransmitFile

TransmitFile is an extension API that allows an open file to be sent on a socket connection. This frees the application from having to manually open the file and repeatedly perform a read from the file, followed by writing that chunk of data on the socket. Instead, an open file handle is given along with the socket connection and the file data is read and sent on the socket all within kernel mode. This prevents the multiple kernel transitions required when you perform the file read yourself. This API is defined as

BOOL
PASCAL FAR
TransmitFile (
IN SOCKET hSocket,
IN HANDLE hFile,
IN DWORD nNumberOfBytesToWrite,
IN DWORD nNumberOfBytesPerSend,
IN LPOVERLAPPED lpOverlapped,
IN LPTRANSMIT_FILE_BUFFERS lpTransmitBuffers,
IN DWORD dwReserved
);

The first parameter is the connection socket. The hFile parameter is a handle to an open file. This parameter can be NULL in which case the lpTransmitBuffers are transmitted. Of course it doesn't make much sense to use TransmitFile to send memory-only buffers. nNumberOfBytesToWrite is the number of bytes to send from the file. A value of zero indicates send the entire file. The nNumberOfBytesPerSend indicates the size of each block of data sent in each send operation. If zero is specified, the system uses the default send size. The default send size on Windows NT Workstation is 4k and on Windows Server it is 64k. The lpOverlapped structure is optional. Note that if the OVERLAPPED structure is omitted, then the file transfer begins at the current file pointer position. Otherwise, the offset values in the OVERLAPPED structure can indicate where the operation starts. The lpTransmitBuffers is a TRANSMIT_FILE_BUFFERS structure that contains memory buffers to transmit before and after the file is transmitted. This parameter is optional. The last parameter is optional flags, which affect the behavior of the file operation. Table 6-1 contains the possible flags and their meaning. Multiple flags may be specified.

Table 6-1 TransmitFile Flags

Flag                     Meaning
TF_DISCONNECT            Start a transport-level disconnect after the TransmitFile operation has been queued.
TF_REUSE_SOCKET          Prepare the socket handle to be reused. After the TransmitFile completes, the socket handle may be used as the client socket in AcceptEx. This flag is valid only if TF_DISCONNECT is also specified.
TF_USE_DEFAULT_WORKER    Directs the file transfer to use the system's default thread. This is useful for large file sends.
TF_USE_SYSTEM_THREAD     Also directs the TransmitFile operation to use system threads for processing.
TF_USE_KERNEL_APC        Indicates that kernel asynchronous procedure calls should be used instead of worker threads to process the TransmitFile request. Note that kernel APCs can only be scheduled to run when the application is in a wait state (not necessarily an alertable wait state though).
TF_WRITE_BEHIND          Indicates that the TransmitFile request should return immediately even though the data may not have been acknowledged by the remote end. This flag should not be used with TF_DISCONNECT or TF_REUSE_SOCKET.

The TransmitFile function is useful for file-based I/O such as Web servers. In addition, one beneficial feature of TransmitFile is the capability of specifying the flags TF_DISCONNECT and TF_REUSE_SOCKET. When both of these flags are specified, the file and/or memory buffers are transmitted and the socket is disconnected once the send operation has completed. Also, the socket handle passed to the API can then be used as the client socket in AcceptEx or the connecting socket in ConnectEx. This is extremely beneficial because socket creation is very expensive. A server can use AcceptEx to handle client connections, then use TransmitFile to send data (specifying these flags), and afterward the socket handle may be used in a subsequent call to AcceptEx.

Note that you can call TransmitFile with a NULL file handle and NULL lpTransmitBuffers but still specify TF_DISCONNECT and TF_REUSE_SOCKET. This call will not send any data but allows the socket to be reused in AcceptEx. This is a good workaround for platforms that do not support the DisconnectEx API discussed later in this chapter. Finally, the TransmitFile function is available on Windows NT 4.0 and later versions. Also, because TransmitFile is geared toward server applications, it is fully functional only on server versions of Windows. On home and professional versions, there may be only two outstanding TransmitFile (or TransmitPackets) calls at any given time. If there are more, then they are queued and not processed until the executing calls are finished.

TransmitPackets

The TransmitPackets extension is similar to TransmitFile because it too is used to send data. The difference between them is that TransmitPackets can send both files and memory buffers in any number and order. This function is defined as

BOOL
(PASCAL FAR * LPFN_TRANSMITPACKETS) (
SOCKET hSocket,
LPTRANSMIT_PACKETS_ELEMENT lpPacketArray,
DWORD nElementCount,
DWORD nSendSize,
LPOVERLAPPED lpOverlapped,
DWORD dwFlags
);

The first parameter is the connected socket on which to send the data. Unlike TransmitFile, TransmitPackets works over both datagram and stream-oriented protocols (such as UDP/IP and TCP/IP). The lpPacketArray is an array of one or more TRANSMIT_PACKETS_ELEMENT structures, which we'll define shortly. nElementCount simply indicates the number of members in the TRANSMIT_PACKETS_ELEMENT array. nSendSize is the same as the nNumberOfBytesPerSend parameter of TransmitFile. lpOverlapped is the overlapped structure, which is optional. dwFlags are the same as those for TransmitFile. See Table 6-1 for the options. The only exception is that the flag names begin with TP instead of TF, but their meanings are the same. Because TP_DISCONNECT and TP_REUSE_SOCKET have no meaning for datagrams, specifying them on a datagram socket will result in an error.

The TRANSMIT_PACKETS_ELEMENT structure is defined as

typedef struct _TRANSMIT_PACKETS_ELEMENT {
ULONG dwElFlags;
#define TP_ELEMENT_MEMORY 1
#define TP_ELEMENT_FILE 2
#define TP_ELEMENT_EOP 4
ULONG cLength;
union {
struct {
LARGE_INTEGER nFileOffset;
HANDLE hFile;
};
PVOID pBuffer;
};
} TRANSMIT_PACKETS_ELEMENT, *PTRANSMIT_PACKETS_ELEMENT,
FAR *LPTRANSMIT_PACKETS_ELEMENT;

The first field indicates the type of buffer contained in this element, either memory or file as given by TP_ELEMENT_MEMORY and TP_ELEMENT_FILE, respectively. The TP_ELEMENT_EOP flag can be bitwise OR'ed in with one of the other two flags. It indicates that this element should not be combined with the following element in a single send operation. This allows the application to shape how the traffic is placed on the wire. The cLength field indicates how many bytes to transfer from the file or memory buffer. If the element contains a file handle, then a cLength of zero indicates that the entire file should be transmitted. The union contains either a pointer to a buffer in memory or a handle to an open file as well as an offset value into that file. It is possible to reference the same file handle in multiple elements of the TRANSMIT_PACKETS_ELEMENT array. In this case, the offset can specify where to begin the transfer. Alternatively, a value of -1 indicates that transmission begins at the current file pointer position in that file.
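The role of TP_ELEMENT_EOP as a boundary marker can be illustrated with a short, portable sketch. The struct packet_element below is a simplified, hypothetical stand-in for TRANSMIT_PACKETS_ELEMENT (plain C types, memory buffers only), and count_send_groups is an invented helper. It only counts the boundaries implied by the description above; how the system actually combines elements on the wire may also depend on other factors, such as nSendSize.

```c
#include <stddef.h>

/* Flag values as given in the structure definition above. */
#define TP_ELEMENT_MEMORY 1
#define TP_ELEMENT_FILE   2
#define TP_ELEMENT_EOP    4

/* Hypothetical, portable stand-in for TRANSMIT_PACKETS_ELEMENT: only the
 * fields this sketch needs, with plain C types instead of Windows types. */
struct packet_element {
    unsigned long flags;   /* corresponds to dwElFlags */
    unsigned long length;  /* corresponds to cLength */
    const void *buffer;    /* corresponds to pBuffer */
};

/* Counts the groups of elements that may be combined with one another.
 * A group ends at an element flagged TP_ELEMENT_EOP or at the end of
 * the array. */
size_t count_send_groups(const struct packet_element *elems, size_t count)
{
    size_t groups = 0;
    int open_group = 0;   /* nonzero while a group has not been closed */
    size_t i;

    for (i = 0; i < count; i++) {
        open_group = 1;
        if (elems[i].flags & TP_ELEMENT_EOP) {
            groups++;         /* this element closes the current group */
            open_group = 0;
        }
    }
    return groups + (open_group ? 1 : 0);
}
```

For an array holding a header buffer, a body buffer flagged with TP_ELEMENT_EOP, and a trailing buffer, the function reports two groups: the header and body may travel together, while the trailer starts a new send.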

A word of caution about using TransmitPackets with datagram sockets: the system is able to process and queue the send requests extremely fast, and it is possible that too many datagrams will pile up in the protocol driver. At this point, for unreliable protocols it is perfectly acceptable for the system to drop packets before they are even sent on the wire!

The TransmitPackets extension API is available on Windows XP and later versions and is subject to the same type of limitation as TransmitFile. On a non-server version of Windows NT, there can be only two outstanding TransmitPackets (or TransmitFile) calls at any given time.

ConnectEx

The ConnectEx extension function is a much-needed API available with Windows XP and later versions. This function allows for overlapped connect calls. Previously, the only way to issue multiple connect calls without using one thread for each connect was to use multiple non-blocking connects, which can be cumbersome to manage. This function is defined as

typedef
BOOL
(PASCAL FAR *LPFN_CONNECTEX) (
IN SOCKET s,
IN const struct sockaddr FAR *name,
IN int namelen,
IN PVOID lpSendBuffer,
IN DWORD dwSendDataLength,
OUT LPDWORD lpdwBytesSent,
IN LPOVERLAPPED lpOverlapped
);

The first parameter is a previously bound socket. The name parameter indicates the remote address to connect to and namelen is the length of that socket address structure. The lpSendBuffer is an optional pointer to a block of memory to send after the connection has been established, and dwSendDataLength indicates the number of bytes to send. lpdwBytesSent is updated to indicate the number of bytes sent successfully after the connection was established, if the operation completed immediately. lpOverlapped is the OVERLAPPED structure associated with this operation. This extension function can be called only in an overlapped manner.

As with the AcceptEx function, because ConnectEx is designed for performance, any previously set socket options or attributes are not automatically copied to the connected socket. To apply them, the application must call setsockopt with the SO_UPDATE_CONNECT_CONTEXT option on the socket after the connection is established. In addition, as with AcceptEx, socket handles that have been "disconnected and re-used," either by TransmitFile, TransmitPackets, or DisconnectEx, may be used as the socket parameter to ConnectEx.

There isn't anything difficult about the ConnectEx API. The only requirement is that the socket passed into ConnectEx must have been previously bound with a call to bind. There are no special flags; it is simply an overlapped version of connect with the optional bonus of sending a block of data after the connection is established.

DisconnectEx

This extension API is simple. It takes a socket handle and performs a transport level disconnect and prepares the socket handle for re-use in a subsequent AcceptEx call. Both the TransmitFile and TransmitPackets APIs allow the socket to be disconnected and re-used after the send operation completes, but this standalone API was introduced for those applications that don't use either of those two APIs before shutting down. This extension API is available with Windows XP or later versions. However, for Windows 2000 or Windows NT 4.0 it is possible to call TransmitFile with a null file handle and buffers but specify the disconnect and re-use flags, which will achieve the same results. This API is defined as

typedef
BOOL
(PASCAL FAR * LPFN_DISCONNECTEX) (
IN SOCKET s,
IN LPOVERLAPPED lpOverlapped,
IN DWORD dwFlags,
IN DWORD dwReserved
);

The first two parameters are self-explanatory. The dwFlags parameter can specify zero or TF_REUSE_SOCKET. If the flags are zero, then this function simply disconnects the connection. To be able to re-use the socket in AcceptEx, the TF_REUSE_SOCKET flag must be specified. The last parameter must be zero; otherwise, WSAEINVAL will be returned. If this function is invoked with an overlapped structure and if there are still pending operations on the socket, the DisconnectEx call will return FALSE with the error WSA_IO_PENDING. The operation will complete once all pending operations are finished and the transport level disconnect has been issued. Otherwise, if it is called in a blocking manner, the function will not return until pending I/O is completed and the disconnect has been issued. Note that the DisconnectEx function works only on connection-oriented sockets.

WSARecvMsg

This last extension function is not too interesting in the discussion of high-performance, scalable I/O, but it is new to Windows XP (and later versions) and we chose to be consistent and cover it with the rest of the extension APIs. WSARecvMsg is nothing more than a more complicated WSARecv, with the exception that it returns information about which interface the packet was received on. This is useful for datagram sockets that are bound to the local wildcard address on a multihomed machine and need to know which interface a packet arrived on. This function is defined as

typedef
INT
(PASCAL FAR * LPFN_WSARECVMSG) (
IN SOCKET s,
IN OUT LPWSAMSG lpMsg,
OUT LPDWORD lpdwNumberOfBytesRecvd,
IN LPWSAOVERLAPPED lpOverlapped,
IN LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine
);

Most of the parameters are self-explanatory. Unlike the other extension functions, which cannot be called with an overlapped completion routine, this one can. The parameter that requires explaining is lpMsg. This is a WSAMSG structure that contains the buffers for receiving data as well as the informational buffers that will contain information about the data received. This structure is defined as

typedef struct _WSAMSG {
LPSOCKADDR name; /* Remote address */
INT namelen; /* Remote address length */
LPWSABUF lpBuffers; /* Data buffer array */
DWORD dwBufferCount; /* Number of elements in the array */
WSABUF Control; /* Control buffer */
DWORD dwFlags; /* Flags */
} WSAMSG, *PWSAMSG, * FAR LPWSAMSG;

The first field is a buffer that will contain the address of the remote system and namelen specifies how large the address buffer is. lpBuffers and dwBufferCount are the same as in WSARecv. The Control field specifies a buffer that will contain the optional control data. Lastly, dwFlags is also the same as in WSARecv and WSARecvFrom. However, there are additional flags that can be returned that provide information about the packet received. These new flags are described in Table 6-2.

Table 6-2 Flags Returned from WSARecvMsg

Flag Description
MSG_BCAST Datagram was received as a link-layer broadcast or with a destination address that was a broadcast address.
MSG_TRUNC Datagram was truncated. There was more data than could be copied to the supplied receive buffer.
MSG_CTRUNC Control data was truncated. The buffer supplied in the WSAMSG Control field was too small to receive the control data.

By default, no control information is returned when WSARecvMsg is called. To enable control information, one or more socket options must be set on the socket, indicating the type of information to be returned. Currently, only one option is supported, which is IP_PKTINFO for IPv4 and IPV6_PKTINFO for IPv6. These options return information about which local interface the packet was received on. See Chapter 7 for more information about setting these options.

Once the appropriate socket option is set and the WSARecvMsg completes, the control information requested is returned via the Control buffer specified in the WSAMSG parameter. Each type of information requested is preceded by a WSACMSGHDR structure that indicates the type of information following as well as its size. This header structure is defined as

typedef struct _WSACMSGHDR {
SIZE_T cmsg_len;
INT cmsg_level;
INT cmsg_type;
/* followed by UCHAR cmsg_data[] */
} WSACMSGHDR, *PWSACMSGHDR, FAR *LPWSACMSGHDR;

Within MSWSOCK.H, several useful macros are defined that extract the message headers and their data.

Scalable Server Architecture

Now that we've introduced the Microsoft-specific extensions, we'll get into the details of implementing a scalable server. Because this chapter focuses on connection-oriented protocols such as TCP/IP, we will first discuss accepting connections followed by managing data transfers. The last section will discuss resource management in more detail.

Accepting Connections

The most common action a server performs is accepting connections. The Microsoft extension AcceptEx is the only Winsock function capable of accepting a client connection via overlapped I/O. As we mentioned previously, the AcceptEx function requires that the client socket be created beforehand by calling socket. The socket must be unbound and unconnected, although it is possible to re-use socket handles after calling TransmitFile, TransmitPackets, or DisconnectEx.

A responsive server must always have enough AcceptEx calls outstanding so that incoming client connections may be handled immediately. However, there is no magic number of outstanding AcceptEx calls that will guarantee that the server will be able to accept the connection immediately. Remember that the TCP/IP stack will automatically accept connections on behalf of the listening application, up to the backlog limit. For Windows NT Server, the maximum backlog value is currently 200. If a server posts 15 AcceptEx calls and then a burst of 50 clients connect to the server, none of the clients' connections will be rejected. The server's accept calls will satisfy the first 15 connections and the system will accept the remaining connections silently—this dips into the backlog amount so that the server will be able to accept 165 additional connections. Then when the server posts additional AcceptEx calls, they will succeed immediately because one of the system queued connections will be returned.
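The arithmetic in this example can be written down as a small helper. This is only a sketch of the bookkeeping described above, not something Winsock exposes; the function name remaining_backlog and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: given the backlog limit, the number of AcceptEx
 * calls already posted, and the size of a burst of incoming connections,
 * returns how many more connections the system could still accept
 * silently on the server's behalf.
 */
int remaining_backlog(int backlog_limit, int posted_accepts, int burst)
{
    assert(backlog_limit >= 0 && posted_accepts >= 0 && burst >= 0);

    /* Connections not satisfied by posted AcceptEx calls go to the backlog. */
    int queued = burst - posted_accepts;
    if (queued < 0)
        queued = 0;

    int remaining = backlog_limit - queued;
    return remaining < 0 ? 0 : remaining;
}
```

With a backlog of 200, 15 posted accepts, and a burst of 50 clients, 35 connections are queued by the system and 165 backlog slots remain, which matches the figure in the text.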

The nature of the server plays an important role in determining how many AcceptEx operations to post. For example, a server that is expected to handle many short-lived connections from a great number of clients may want to post more concurrent AcceptEx operations than a server that handles fewer connections with longer lifetimes. A good strategy is to allow the number of AcceptEx calls to vary between a low and a high watermark. An application can keep track of the number of AcceptEx operations that are pending. Then, when one or more of those complete and the outstanding count drops below the low watermark, additional AcceptEx calls may be posted. Of course, if at some point an AcceptEx completes and the number of outstanding accepts is greater than or equal to the high watermark, then no additional calls should be posted in the handling of the current AcceptEx.

On Windows 2000 and later versions, Winsock provides a mechanism for determining if an application is running behind in posting adequate AcceptEx calls. When creating the listening socket, associate it with an event by using the WSAEventSelect API call and registering for FD_ACCEPT notification. If there are no pending AcceptEx operations but there are incoming client connections (accepted by the system according to the backlog value), then the event will be signaled. This can even be used as an indication to post additional AcceptEx operations.

One significant benefit of using AcceptEx is the capability to receive data in addition to accepting the client connection. For servers whose clients send an initial request this is ideal. However, as we mentioned in Chapter 5, the AcceptEx operation will not complete until at least one byte of data has been received. To prevent malicious attacks or stale connections, a server should cycle through all client socket handles in outstanding AcceptEx operations and call getsockopt with SO_CONNECT_TIME, which will return regardless of whether the socket is actually connected. If the socket is connected, the value returned is the number of seconds it has been connected. A value of -1 (0xFFFFFFFF) indicates it is not connected. If the WSAEventSelect suggestion is implemented, then when the event is signaled it is a good time to check whether the client socket handles in outstanding accept calls are connected. Once an AcceptEx call accepts an incoming connection, it will then wait to receive data, and at this point there is one less outstanding accept call. Once there are no remaining accepts, the event will be signaled on the next incoming client connection. As a word of warning, applications should not under any circumstances close a client socket handle used in an AcceptEx call that has not been accepted because it can lead to memory leaks. For performance reasons, the kernel-mode structures associated with an AcceptEx call will not be cleaned up when the unconnected client handle is closed until a new client connection is established or until the listening socket is closed.
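The decision made for each client socket after querying SO_CONNECT_TIME can be sketched as a pure function. This is a minimal illustration that assumes the option value has already been read into a 32-bit unsigned integer; the enum, the function name classify_accept, and the idle threshold are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SO_CONNECT_TIME value of -1, read as an unsigned 32-bit quantity. */
#define NOT_CONNECTED UINT32_C(0xFFFFFFFF)

enum accept_state {
    ACCEPT_PENDING,        /* no client yet: the handle must NOT be closed */
    ACCEPT_CONNECTED_OK,   /* client connected recently, still waiting for data */
    ACCEPT_STALE           /* connected too long without sending: candidate to close */
};

/*
 * Hypothetical policy: a socket that has been connected for longer than
 * max_idle_seconds without completing its AcceptEx is treated as stale.
 */
enum accept_state classify_accept(uint32_t connect_seconds,
                                  uint32_t max_idle_seconds)
{
    assert(max_idle_seconds != NOT_CONNECTED);

    if (connect_seconds == NOT_CONNECTED)
        return ACCEPT_PENDING;
    if (connect_seconds > max_idle_seconds)
        return ACCEPT_STALE;
    return ACCEPT_CONNECTED_OK;
}
```

Only sockets classified as stale are connected, so closing them does not run into the memory leak described above for unaccepted handles.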

Although it may seem logical and simpler to post AcceptEx requests in one of the worker threads handling notification from the completion port, you should avoid this because the socket creation process is expensive. In addition, any complex computations should be avoided within the worker threads so the server may process the completion notifications as fast as possible. One reason socket creation is expensive is the layered architecture of Winsock 2.0. When the server creates a socket, the request may be routed through multiple providers, each performing its own tasks, before the socket is created and returned to the application. Chapter 12 discusses layered providers in detail. Instead, a server should create client sockets and post AcceptEx operations from a separate thread. When an overlapped AcceptEx completes in the worker thread, an event can be used to signal the accept-issuing thread.

Data Transfers

Once clients are connected, the server will need to transfer data. This process is fairly straightforward, and once again, all data sent or received should be performed with overlapped I/O. By default, each socket has an associated send and receive buffer that is used to buffer outgoing and incoming data, respectively. In most cases these buffers should be left alone, but it is possible to change them or set them to zero by calling setsockopt with the SO_SNDBUF or SO_RCVBUF options.

Let's look at how the system handles a typical send call when the send buffer size is non-zero. When an application makes a send call, if there is sufficient buffer space, the data is copied into the socket's send buffers, the call completes immediately with success, and the completion is posted. On the other hand, if the socket's send buffer is full, then the application's send buffer is locked and the send call fails with WSA_IO_PENDING. After the data in the send buffer is processed (for example, handed down to TCP for processing), then Winsock will process the locked buffer directly. That is, the data is handed directly to TCP from the application's buffer and the socket's send buffer is completely bypassed.

The opposite is true for receiving data. When an overlapped receive call is performed, if data has already been received on the connection, it will be buffered in the socket's receive buffer. This data will be copied directly into the application's buffer (as much as will fit), the receive call returns success, and a completion is posted. However, if the socket's receive buffer is empty, when the overlapped receive call is made, the application's buffer is locked and the call fails with WSA_IO_PENDING. Once data arrives on the connection, it will be copied directly into the application's buffer, bypassing the socket's receive buffer altogether.

Setting the per-socket buffers to zero generally will not increase performance because the extra memory copy can be avoided as long as there are always enough overlapped send and receive operations posted. Disabling the socket's send buffer has less of a performance impact than disabling the receive buffer because the application's send buffer will always be locked until it can be passed down to TCP for processing. However, if the receive buffer is set to zero and there are no outstanding overlapped receive calls, any incoming data can be buffered only at the TCP level. The TCP driver will buffer only up to the receive window size, which is 17 KB. TCP will increase these buffers as needed up to this limit; normally the buffers are much smaller. These TCP buffers (one per connection) are allocated out of non-paged pool, which means that if the server has 1000 connections and no receives posted at all, 17 MB of the non-paged pool will be consumed! The non-paged pool is a limited resource, and unless the server can guarantee there are always receives posted for a connection, the per-socket receive buffer should be left intact.

Only in a few specific cases will leaving the receive buffer intact lead to decreased performance. Consider the situation in which a server handles many thousands of connections and cannot have a receive posted on each connection (this can become very expensive, as you'll see in the next section). In addition, the clients send data sporadically. Incoming data will be buffered in the per-socket receive buffer, and when the server does issue an overlapped receive, it is performing unnecessary work. The overlapped operation issues an I/O request packet (IRP) that completes immediately, after which a notification is sent to the completion port. In this case, the server cannot keep enough receives posted, so it is better off performing simple non-blocking receive calls.

TransmitFile and TransmitPackets

For sending data, servers should consider using the TransmitFile and TransmitPackets API functions where applicable. The benefit of these functions is that a great deal of data can be queued for sending on a connection while incurring just a single user-to-kernel mode transition. For example, if the server is sending file data to a client, it simply needs to open a handle to that file and issue a single TransmitFile instead of calling ReadFile followed by a WSASend, which would invoke many user-to-kernel mode transitions. Likewise, if a server needs to send several memory buffers, it also can build an array of TRANSMIT_PACKETS_ELEMENT structures and use the TransmitPackets API. As we mentioned, these APIs allow you to disconnect and re-use the socket handles in subsequent AcceptEx calls.

Resource Management

On a machine with sufficient resources, a Winsock server should have no problem handling thousands of concurrent connections. However, as the server handles increasingly more concurrent connections, a resource limitation will eventually be encountered. The two limits most likely to be encountered are the number of locked pages and non-paged pool usage. The locked pages limitation is less serious and more easily avoided than running out of the non-paged pool.

With every overlapped send or receive operation, it is probable that the data buffers submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that may be locked. When this limit is reached, overlapped operations will fail with the WSAENOBUFS error. If a server posts many overlapped receives on each connection, this limit will be reached as the number of connections grows. If a server anticipates handling a very high number of concurrent clients, the server can post a single zero-byte receive on each connection. Because there is no buffer associated with the receive operation, no memory needs to be locked. With this approach, the per-socket receive buffer should be left intact because once the zero-byte receive operation completes, the server can simply perform a non-blocking receive to retrieve all the data buffered in the socket's receive buffer. There is no more data pending when the non-blocking receive fails with WSAEWOULDBLOCK. This design is for servers that require the maximum possible number of concurrent connections at the expense of data throughput on each connection.
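The drain step that follows a completed zero-byte receive can be sketched independently of Winsock. In the portable sketch below, the receive call is abstracted behind a hypothetical function pointer that returns RECV_WOULD_BLOCK where a real non-blocking recv would fail with WSAEWOULDBLOCK. The fake_socket type exists only so the loop can be exercised without a network.

```c
#include <assert.h>
#include <stddef.h>

#define RECV_WOULD_BLOCK (-1L)   /* stands in for a failure with WSAEWOULDBLOCK */

/* Hypothetical non-blocking receive: returns the number of bytes copied,
 * 0 if the peer closed the connection, or RECV_WOULD_BLOCK if nothing is
 * buffered. */
typedef long (*nonblocking_recv_fn)(void *sock, char *buf, size_t len);

/* Call after the zero-byte overlapped receive completes. Reads until the
 * socket's receive buffer is empty and returns the total bytes read. */
size_t drain_buffered_data(nonblocking_recv_fn recv_fn, void *sock,
                           char *scratch, size_t scratch_len)
{
    size_t total = 0;

    assert(recv_fn != NULL && scratch != NULL && scratch_len > 0);

    for (;;) {
        long n = recv_fn(sock, scratch, scratch_len);
        if (n <= 0)
            break;               /* would block, or peer closed */
        total += (size_t)n;      /* a real server would process scratch here */
    }
    return total;
}

/* In-memory stand-in for a socket's receive buffer (demonstration only). */
struct fake_socket {
    size_t buffered;             /* bytes waiting in the receive buffer */
};

long fake_recv(void *sock, char *buf, size_t len)
{
    struct fake_socket *s = sock;

    if (s->buffered == 0)
        return RECV_WOULD_BLOCK;

    size_t n = s->buffered < len ? s->buffered : len;
    for (size_t i = 0; i < n; i++)
        buf[i] = 'x';            /* placeholder payload */
    s->buffered -= n;
    return (long)n;
}
```

After the drain returns, the server would post another zero-byte overlapped receive on the connection. A production loop would also need to tell a closed connection apart from the would-block case, which this sketch folds together.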

Of course, the more you are aware of how the clients will be interacting with the server, the better. In the previous example, a non-blocking receive is performed once the zero-byte receive completes to retrieve the buffered data. If the server knows that clients send data in bursts, then once the zero-byte receive completes, it may post one or more overlapped receives in case the client sends a substantial amount of data (greater than the per-socket receive buffer, which is 8 KB by default).

Another important consideration is the page size on the architecture the server is running on. When the system locks memory passed into overlapped operations, it does so on page boundaries. On the x86 architecture, pages are locked in multiples of 4 KB. If an operation posts a 1 KB buffer, then the system is actually locking a 4 KB chunk of memory. To avoid this waste, overlapped send and receive buffers should be a multiple of the page size. The Windows API GetSystemInfo can be called to obtain the page size for the current architecture.
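The page-granularity rule is easy to express in code. The helper below is a sketch with a hypothetical name; it assumes the buffer starts on a page boundary (an unaligned buffer can straddle one additional page) and takes the page size as a parameter, which on Windows would come from GetSystemInfo.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns how many bytes are actually locked for a
 * page-aligned buffer of buffer_len bytes when locking is done in whole
 * pages of page_size bytes.
 */
size_t locked_bytes_for_buffer(size_t buffer_len, size_t page_size)
{
    assert(page_size > 0);

    if (buffer_len == 0)
        return 0;                 /* a zero-byte receive locks nothing */

    size_t pages = (buffer_len + page_size - 1) / page_size;  /* round up */
    return pages * page_size;
}
```

With a 4 KB page, a 1 KB buffer locks 4096 bytes, a 4096-byte buffer locks exactly one page, and a 4097-byte buffer locks two.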

Hitting the non-paged pool limit is a much more serious error and is difficult to recover from. Non-paged pool is the portion of memory that is always resident in physical memory and can never be paged out. Kernel-mode operating system components, such as drivers, typically use the non-paged pool; these include Winsock and the protocol drivers such as tcpip.sys. Each socket created consumes a small portion of non-paged pool that is used to maintain socket state information. When the socket is bound to an address, the TCP/IP stack allocates additional non-paged pool for the local address information. When a socket is then connected, a remote address structure is also allocated by the TCP/IP stack. In all, a connected socket consumes about 2 KB of non-paged pool and a socket returned from accept or AcceptEx uses about 1.5 KB of non-paged pool (because an accepted socket needs only to store the remote address). In addition, each overlapped operation issued on a socket requires an I/O request packet to be allocated, which uses approximately 500 bytes of non-paged pool.

As you can see, the amount of non-paged pool each connection uses is not great; however, as the number of clients connecting increases, the amount of non-paged pool the server uses can be significant. For example, consider a server running Windows 2000 (or later) with 1 GB of physical memory. For this amount of memory there will be 256 MB set aside for the non-paged pool. In general, the amount of non-paged pool allocated is one quarter the amount of physical memory, with a 256 MB limit on Windows 2000 and later versions and a limit of 128 MB on Windows NT 4.0. With 256 MB of non-paged pool, it is possible to handle 50,000 or more connections, but care must be taken to limit the number of overlapped operations queued for accepting new connections as well as sending and receiving on existing connections. In this example, the connected sockets alone consume 75 MB of non-paged pool (assuming each socket uses 1.5 KB of non-paged pool as mentioned). In addition, if the zero-byte overlapped receive strategy is used, then a single IRP is allocated for each connection, which uses another 25 MB of non-paged pool.
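The estimate above can be reproduced with two small functions. They are a back-of-the-envelope sketch using the approximate per-socket and per-IRP figures quoted in the text, not a way of measuring real pool usage; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MB (UINT64_C(1024) * 1024)

/* One quarter of physical memory, capped (256 MB on Windows 2000 and
 * later, 128 MB on Windows NT 4.0, according to the text). */
uint64_t nonpaged_pool_limit(uint64_t physical_bytes, uint64_t cap_bytes)
{
    assert(cap_bytes > 0);

    uint64_t quarter = physical_bytes / 4;
    return quarter < cap_bytes ? quarter : cap_bytes;
}

/* Rough non-paged pool consumption for a set of accepted connections,
 * each with a fixed number of overlapped operations outstanding. */
uint64_t nonpaged_pool_estimate(uint64_t connections,
                                uint64_t bytes_per_socket,
                                uint64_t ops_per_connection,
                                uint64_t bytes_per_irp)
{
    return connections * (bytes_per_socket + ops_per_connection * bytes_per_irp);
}
```

For 50,000 accepted sockets at 1536 bytes each, the sockets alone come to 76,800,000 bytes (roughly the 75 MB quoted). Adding one 500-byte IRP per connection brings the total to 101,800,000 bytes, well under the 256 MB limit but a large share of it.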

If the system does run out of non-paged pool, there are two possibilities. In the best-case scenario, Winsock calls will fail with WSAENOBUFS. In the worst-case scenario, the system crashes with a terminal error. This typically occurs when a kernel-mode component (such as a third-party driver) doesn't handle a failed memory allocation correctly. As such, there is no guaranteed way to recover from exhausting the non-paged pool, and furthermore, there is no reliable way of monitoring the available amount of non-paged pool because any kernel-mode component can chew up non-paged pool. The main point of this discussion is that there is no magical or programmatic method of determining how many concurrent connections and overlapped operations are acceptable. Also, it is virtually impossible to determine whether the system has run out of non-paged pool or exceeded the locked page count because both will result in Winsock calls failing with WSAENOBUFS. Because of these factors, the developer must test the server's performance with varying numbers of concurrent connections and overlapped operations in order to find a happy medium. If programmatic limits are imposed to prevent the server from exhausting non-paged pool, you will know that any WSAENOBUFS failures are generally the result of exceeding the locked page limit, and that can be handled gracefully in code, for example by further restricting the number of outstanding operations or closing some of the connections.

Server Strategies

In this section, we'll take a look at several strategies for handling resources depending on the nature of the server. The more control you have over the design of both the client and the server, the better you can design them to avoid the limitations and bottlenecks discussed previously. Again, there is no foolproof method that will work 100 percent in all situations. Servers can be divided roughly into two categories: high throughput and high connections. A high throughput server is more concerned with pushing data on a small number of connections. Of course, the meaning of the phrase "small number of connections" is relative to the amount of resources available on the server. A high connection server is more concerned with handling a large number of connections and is not attempting to push large amounts of data.

In the next two sections, we'll discuss both high throughput and high connection server strategies. After that, we'll look at performance numbers gathered from the server samples provided on the companion CD.

High Throughput

An FTP server is an example of a high throughput server. It is concerned with delivering bulk content. In this case, the server is concerned with processing each connection to minimize the amount of time required to transfer the data. To do so, the server must limit the number of concurrent connections because the greater the simultaneous connections, the lower the throughput will be on each connection. An example would be an FTP server that refuses a connection because it is too busy.

The goal for this strategy is I/O. The server should keep enough receives or sends posted to maximize throughput. Because each overlapped I/O requires memory to be locked as well as a small portion of non-paged pool for each IRP associated with the operation, it is important to limit I/O to a small set of connections. It is possible for the server to continually accept connections and have a relatively high number of established connections, but I/O must be limited to a smaller set.

In this case, the server may post a number of sends or receives on a subset of the established clients. For example, the server could handle client connections in a first-in, first-out manner and post a number of overlapped sends and/or receives on the first 100 connections. After those clients are handled, the server can move on to the next set of clients in the queue. In this model, the number of outstanding send and receive operations is limited to a smaller set of connections. This prevents the server from blindly posting I/O operations on every connection, which could quickly exhaust the server's resources.

The server should take care to monitor the number of operations outstanding on each connection so it may prevent malicious clients from attacking it. For example, a server designed to receive data from a client, process it, and send some sort of response should keep track of how many sends are outstanding. If the client is simply flooding the server with data but not posting any receives, the server may end up posting dozens of overlapped sends that will never complete. In this case, once the server finds that there are too many outstanding operations, it can close the connection.

Maximizing Connections

Maximizing the number of concurrent client connections is the more difficult of the two strategies. Handling the I/O on each connection becomes difficult. A server cannot simply post one or more sends or receives on each connection because the amount of memory required (both in terms of locked pages and non-paged pool) is too great. In this scenario, the server is interested in handling many connections at the expense of throughput on each connection. An example of this would be an instant messenger server. The server would handle many thousands of connections but would need to send or receive only a small number of bytes at a time.

For this strategy, the server does not necessarily want to post an overlapped receive on each connection because this would involve locking many pages for each of the receive buffers. Instead, the server can post an overlapped zero-byte receive. Once the receive completes, the server would perform non-blocking receives until WSAEWOULDBLOCK is returned. This allows the server to immediately retrieve all data buffered on that connection. Because this model is geared toward clients that send data intermittently, it minimizes the number of locked pages but still allows processing of data on each connection.

Performance Numbers

This section covers performance numbers from the different servers provided in Chapters 5 and 6. The various servers tested are those using blocking sockets, non-blocking with select, WSAAsyncSelect, WSAEventSelect, overlapped I/O with events, and overlapped I/O with completion ports. Table 6-3 summarizes the results of these tests. For each I/O model, there are a couple of entries. The first entry is where 7000 connections were attempted from three clients. For all of these tests, the server is an echo server. That is, for each connection that is accepted, data is received and sent back to the client. The first entry for each I/O model represents a high-throughput server where the client sends data as fast as possible to the server. None of the sample servers limits the number of concurrent connections. The remaining entries represent the connections when the clients limit the rate at which they send data so as not to overrun the bandwidth available on the network. The second entry for each I/O model represents 12,000 connections from the client, which is rate limiting the data sent. If the server was able to handle the majority of the 12,000 connections, then the third entry is the maximum number of clients the server was able to handle.

As we mentioned, the servers used are those provided in Chapter 5, except for the I/O completion port server, which is a slightly modified version of the Chapter 5 completion port server that limits the number of outstanding operations. This completion port server limits the number of outstanding send operations to 200 and posts just a single receive on each client connection. The client used in this test is the I/O completion port client from Chapter 5. Connections were established in blocks of 1000 clients by specifying the '-c 1000' option on the client. The two x86-based clients initiated a maximum of 12,000 connections and the Itanium system was used to establish the remaining clients in blocks of 4000. In the tests that were rate limited, each client block was limited to 200,000 bytes per second (using the '-r 200000' switch). So the average send throughput for that entire block of clients was limited to 200,000 bytes per second (not that each client was limited to this amount).

Table 6-3 I/O Method Performance Comparison

I/O Model                     Attempted/Connected  Memory Used (KB)  Non-Paged Pool  CPU Usage  Threads  Throughput (Send/Receive Bytes Per Second)
Blocking                      7000/1008            25,632            36,121          10–60%     2016     2,198,148/2,198,148
                              12,000/1008          25,408            36,352          5–40%      2016     404,227/402,227
Non-blocking                  7000/4011            4208              135,123         95–100%*   1        0/0
                              12,000/5779          5224              156,260         95–100%*   1        0/0
WSAAsyncSelect                7000/1956            3640              38,246          75–85%     3        1,610,204/1,637,819
                              12,000/4077          4884              42,992          90–100%    3        652,902/652,902
WSAEventSelect                7000/6999            10,502            36,402          65–85%     113      4,921,350/5,186,297
                              12,000/11,080        19,214            39,040          50–60%     192      3,217,493/3,217,493
                              46,000/45,933        37,392            121,624         80–90%     791      3,851,059/3,851,059
Overlapped (events)           7000/5558            21,844            34,944          65–85%     66       5,024,723/4,095,644
                              12,000/12,000        60,576            48,060          35–45%     195      1,803,878/1,803,878
                              49,000/48,997        241,208           155,480         85–95%     792      3,865,152/3,834,511
Overlapped (completion port)  7000/7000            36,160            31,128          40–50%     2        6,282,473/3,893,507
                              12,000/12,000        59,256            38,862          40–50%     2        5,027,914/5,027,095
                              50,000/49,997        242,272           148,192         55–65%     2        4,326,946/4,326,496

The server was a Pentium 4 1.7 GHz Xeon with 768 MB memory. Clients were established from three machines: a Pentium II 233 MHz with 128 MB memory, a Pentium II 350 MHz with 128 MB memory, and an Itanium 733 MHz with 1 GB memory. The test network was an isolated 100 Mbps hub. All of the machines tested had Windows XP installed.

The blocking model is the poorest performing of all the models. The blocking server spawns two threads for each client connection: one for sending data and one for receiving it. In both test cases, the server was able to handle only a fraction of the connections because it hit a system resource limit on creating threads: the CreateThread call failed with ERROR_NOT_ENOUGH_MEMORY. At two threads per connection, the 1008 accepted connections account for the 2016 threads shown in Table 6-3. The remaining client connections failed with WSAECONNREFUSED.

The non-blocking model fared only somewhat better. It was able to accept more connections but ran into a CPU limitation. The non-blocking server puts all the connected sockets into an fd_set, which is passed into select. When select completes, the server uses the FD_ISSET macro on each socket to determine whether it is signaled. This becomes inefficient as the number of connections increases: just to determine whether a single socket is signaled, a linear search through the array is required! To partially alleviate this problem, the server can be redesigned so that it iteratively steps through the fd_sets returned from select. The only issue is that the server then needs to be able to quickly find the SOCKET_INFO structure associated with each socket handle. In this case, the server can provide a more sophisticated cataloging mechanism, such as a hash tree, which allows quicker lookups. Also note that the non-paged pool usage is extremely high. This is because both AFD and TCP are buffering data on the client connections: the server is unable to read the data fast enough, as the zero-byte throughput and the high CPU usage indicate.
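The lookup problem described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the book's sample code: it uses a simple chained hash table rather than a hash tree, the names socket_handle, socket_info, socket_table, and the table_* functions are hypothetical, and socket_handle stands in for Winsock's SOCKET type so that the sketch compiles without the Windows headers. The idea is that, after select returns, the server would walk the sockets left in the returned set and call table_find on each handle instead of testing every known connection with FD_ISSET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for Winsock's SOCKET type (assumption, for portability). */
typedef uintptr_t socket_handle;

/* Hypothetical per-connection record; a real SOCKET_INFO holds more state. */
typedef struct socket_info {
    socket_handle s;
    unsigned long bytes_received;
    struct socket_info *next;      /* next record in the same bucket */
} socket_info;

#define BUCKET_COUNT 4096u         /* power of two so masking selects a bucket */

typedef struct {
    socket_info *buckets[BUCKET_COUNT];
} socket_table;

static size_t bucket_for(socket_handle s)
{
    /* Assumes handle values are spaced a few apart, so the low two bits
       are dropped before masking to spread handles across buckets. */
    return (size_t)((s >> 2) & (BUCKET_COUNT - 1u));
}

/* Adds a zeroed record for s; returns NULL if allocation fails. */
socket_info *table_insert(socket_table *t, socket_handle s)
{
    assert(t != NULL);
    socket_info *info = calloc(1, sizeof *info);
    if (info == NULL)
        return NULL;
    size_t b = bucket_for(s);
    info->s = s;
    info->next = t->buckets[b];
    t->buckets[b] = info;
    return info;
}

/* Returns the record for s, or NULL if s is not in the table. */
socket_info *table_find(const socket_table *t, socket_handle s)
{
    assert(t != NULL);
    for (socket_info *p = t->buckets[bucket_for(s)]; p != NULL; p = p->next)
        if (p->s == s)
            return p;
    return NULL;
}

/* Unlinks and frees the record for s; returns 1 if it was present, else 0. */
int table_remove(socket_table *t, socket_handle s)
{
    assert(t != NULL);
    socket_info **link = &t->buckets[bucket_for(s)];
    for (; *link != NULL; link = &(*link)->next) {
        if ((*link)->s == s) {
            socket_info *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

With a table like this, the cost of servicing one select return depends on the number of signaled sockets and the length of a bucket chain rather than on the total number of connections. The bucket count is an arbitrary choice for the sketch and would need tuning for a real server.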

The WSAAsyncSelect model is acceptable for a small number of clients but does not scale well, because the overhead of the message loop quickly bogs down its ability to process messages fast enough. In both tests, the server is able to handle only about a third of the connections made. The clients receive many WSAECONNREFUSED errors, indicating that the server cannot handle the FD_ACCEPT messages quickly enough to keep the listen backlog from being exhausted. Even for the connections that are accepted, the average throughput is rather low (even in the case of the rate-limited clients).

Surprisingly, the WSAEventSelect model performed very well. In all the tests, the server was, for the most part, able to handle all the incoming connections while obtaining very good data throughput. The drawback to this model is the overhead required to manage the thread pool for new connections. Because each thread can wait on only 64 events, when new connections are established new threads have to be created to handle them. Also, in the last test case in which more than 45,000 connections were established, the machine became very sluggish. This was most likely due to the great number of threads created to service the many connections. The overhead for switching between the 791 threads becomes significant. The server reached a point at which it was unable to accept any more connections due to numerous WSAENOBUFS errors. In addition, the client application reached its limitation and was unable to sustain the already established connections (we'll discuss this in detail later).
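The 64-event limit puts a floor under the number of waiting threads. The sketch below is a rough arithmetic illustration only; the function name min_wait_threads is hypothetical, and the constant mirrors Winsock's WSA_MAXIMUM_WAIT_EVENTS (64), redefined here so the code builds without the Windows headers. It assumes one event per connection and ignores any events a thread might reserve for its own signaling.

```c
#include <assert.h>

/* Mirrors Winsock's WSA_MAXIMUM_WAIT_EVENTS; redefined for portability. */
#define MAX_WAIT_EVENTS 64u

/* Smallest number of waiting threads that can cover `connections` sockets
   when each thread waits on at most `events_per_thread` events. */
unsigned min_wait_threads(unsigned connections, unsigned events_per_thread)
{
    assert(events_per_thread > 0 && events_per_thread <= MAX_WAIT_EVENTS);
    /* Integer ceiling of connections / events_per_thread. */
    return (connections + events_per_thread - 1u) / events_per_thread;
}
```

For the three WSAEventSelect rows in Table 6-3 (6999, 11,080, and 45,933 connections), this lower bound gives 110, 174, and 718 threads, against the 113, 192, and 791 threads reported, so the measured counts sit somewhat above the minimum.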

The overlapped I/O with events model is similar to the WSAEventSelect model in terms of scalability. Both models rely on thread pools for event notification, and both reach a limit at which the thread-switching overhead becomes a factor in how well they handle client communication. The performance numbers for this model almost exactly mirror those of WSAEventSelect. It does surprisingly well until the number of threads increases.

The last entry is for overlapped I/O with completion ports, which is the best performing of all the I/O models. The memory usage (both user and non-paged pool) and the number of accepted clients are similar to those of the overlapped I/O with events and WSAEventSelect models. However, the real difference is in CPU usage. The completion port model used only around 60 percent of the CPU, whereas the other two models required substantially more horsepower to maintain the same number of connections. Another significant difference is that the completion port model also allowed for slightly better throughput.

While carrying out these tests, it became apparent that there was a limitation introduced by the nature of the data interaction between client and server. The server is designed to be an echo server, so all data received from the client is sent back. Also, each client continually sends data (even if it's at a lower rate) to the server. This results in data always pending on the server's socket (either in the TCP buffers or in AFD's per-socket buffers, which are all non-paged pool). For the three well-performing models, only a single receive is performed at a time; however, this means that for the majority of the time, there is still data pending. It is possible to modify the server to perform a non-blocking receive once data is indicated on the connection. This would drain the data buffered on the machine. The drawback to this approach in this instance is that the client is constantly sending, and it is possible that the non-blocking receive could return a great deal of data, which would lead to starvation of other connections (as the thread or completion thread would not be able to handle other events or completion notices). Typically, calling a non-blocking receive until it fails with WSAEWOULDBLOCK works well on connections where data is transmitted at intervals rather than continuously.
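One way to keep a drain loop from starving other connections is to cap how much it reads in one pass. The following is a minimal sketch of that idea, not something the tests above used: every name in it (read_fn, drain_connection, fake_conn, fake_read, and the status codes) is hypothetical, and the read callback is an abstraction that, on Winsock, would presumably wrap recv on a non-blocking socket and map WSAEWOULDBLOCK to READ_WOULD_BLOCK. A simulated connection is included so the loop can be exercised without a network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return codes for the hypothetical read callback. */
enum { READ_ERROR = -2, READ_WOULD_BLOCK = -1, READ_CLOSED = 0 };

/* Copies up to len bytes into buf and returns the count, or one of the
   codes above when nothing is pending, the peer closed, or the read failed. */
typedef int (*read_fn)(void *conn, char *buf, int len);

typedef enum { DRAINED, BUDGET_SPENT, PEER_CLOSED, FAILED } drain_status;

/* Reads until nothing is pending or `budget` bytes have been consumed,
   whichever comes first. The data is discarded here; an echo server
   would queue it for sending instead. */
drain_status drain_connection(void *conn, read_fn read_some,
                              size_t budget, size_t *total)
{
    char buf[4096];
    assert(total != NULL);
    *total = 0;
    while (*total < budget) {
        size_t want = budget - *total;
        if (want > sizeof buf)
            want = sizeof buf;
        int n = read_some(conn, buf, (int)want);
        if (n == READ_WOULD_BLOCK)
            return DRAINED;        /* nothing left; safe to wait again */
        if (n == READ_CLOSED)
            return PEER_CLOSED;
        if (n < 0)
            return FAILED;
        *total += (size_t)n;
    }
    return BUDGET_SPENT;           /* yield so other connections get service */
}

/* Simulated connection: `pending` bytes are waiting to be read. */
typedef struct { size_t pending; } fake_conn;

int fake_read(void *conn, char *buf, int len)
{
    fake_conn *c = conn;
    if (c->pending == 0)
        return READ_WOULD_BLOCK;
    size_t n = c->pending < (size_t)len ? c->pending : (size_t)len;
    memset(buf, 'x', n);
    c->pending -= n;
    return (int)n;
}
```

When drain_connection returns BUDGET_SPENT, the caller would need to revisit the connection later, because data may still be buffered. The budget value is a tuning choice and is not taken from the tests described here.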

From these performance numbers it is easily deduced that WSAEventSelect and overlapped I/O offer the best performance. For the two event based models, setting up a thread pool for handling event notification is cumbersome but still allows for excellent performance for a moderately stressed server. Once the connections increase and the number of threads increases, then scalability becomes an issue as more CPU is consumed for context switching between threads. The completion port model still offers the ultimate scalability because CPU usage is less of a factor as the number of clients increases.

Winsock Direct and Sockets Direct Protocol

Winsock Direct is a high-speed interconnect introduced on Windows 2000 Datacenter Server. It is a protocol that runs over special hardware available from several vendors, such as Giganet, Compaq, and others. What is so special about Winsock Direct is that it completely bypasses the TCP stack and goes directly to the network interface card, which allows for extremely high-speed data communications. The advantage of Winsock Direct is that it is completely transparent to a TCP Winsock application. That is, if a TCP application is run on a machine with a Winsock Direct capable card, it transparently goes over the Winsock Direct route (given that it is the appropriate route) instead of over a regular Ethernet connection.

The Sockets Direct Protocol is the next evolution of the Winsock Direct protocol. It is designed to run over InfiniBand-compatible hardware available in future releases of the Windows operating system. The on-the-wire protocol differs slightly from that of Winsock Direct, but it is still transparent to applications.

Because Winsock Direct is designed to be transparent, the same issues encountered with "regular" Winsock applications still apply when running over Winsock Direct. Applications still have to manage the number of outstanding overlapped operations so as to not exceed the locked pages or non-paged pool limits.

Conclusion

This chapter focused on writing high-performance, scalable Winsock servers for Windows NT–based operating systems. We discussed several of the Microsoft-specific Winsock extensions that greatly aid programmers in developing these servers. In addition, we covered several approaches to accepting connections that minimize the chance a client will receive a connection-refused error, as well as how throughput can be maximized. Afterward we covered resource management, which is the core concept required for writing high-performance servers. Finally, we compared the performance of the various I/O models introduced in Chapter 5 to see how well they scale when many client connections are attempted.


Customer Reviews

Average Rating 5
( 2 )
Rating Distribution: 5 Star (2), 4 Star (0), 3 Star (0), 2 Star (0), 1 Star (0)

Showing all of 2 Customer Reviews
  • Anonymous

    Posted January 28, 2002

    Comprehensive, accessible expert insight

    Finally, the highly anticipated update to the industry standard network application programming bible, Network Programming for Microsoft Windows Second Edition is here! Just as the system developer found the First Edition to be indiscerptible from his programming library, the Second Edition whisks you away to sites before unseen! IPv6 and the various multicast models supported on Windows XP and .Net Server Family, get heavy treatment here. Performance - performance - performance and reliability! Seamlessly integrate support for multiple network configurations into your application, freeing your mobile users to roam from the wired LAN to wireless or even dial-up environments at will. Second Edition - the wait is finally over!

  • Anonymous

    Posted May 15, 2001

    The Windows Networking Standard

    Authors tend to obfuscate topics they don't entirely grasp, and very few master the intricacies of difficult subjects, especially those such as network programming. Jones is to Windows programming what Stevens was to Unix. Here you will find concise language, solid experience, and real-world industrial-strength examples. Especially notable is the extensive coverage of the various I/O models and their applications. If you have only one book on network programming in your library... Make it this one!
