Introduction To Socket Programming
Prof. NB Venkateswarlu, B.Tech (SVU), M.Tech (IIT-K), Ph.D (BITS, Pilani), PDF (U of Leeds, UK)
ISTE Visiting Fellow 2010-11, AITAM, Tekkali
Many Thanks to Organizers and Thanks to Participants
Today's Themes
• A Small Dose of Questions To Know You
• Little Briefing about Unix Internals
• Recapitulation of What is Internet
• Variety of Addresses involved
• Socket Concepts
• Related System Calls
• Simple TCP Client and Server in action
• Simple UDP Client and Server in action
• What is DNS
• What is the difference between Data Communications and Computer Networks?
• What is firmware?
• Why do we need to split a message?
• Why do we require so many levels of control? (Is the network system reliable?)
• What are physical and logical addresses?
• What is the conceptual difference between DLL and NLL?
• What is the difference between Internet and internet?
Just Probing Your Unix and OS Knowledge
• What is fork()?
• What is a signal?
• What are a process and a thread?
• What is a device driver?
• What is a daemon?
• What is exec()?
• What are locks?
Internetwork
• A collection of interconnected networks
• Networks: different depts, labs, etc.
• Router: node that connects distinct networks
• Host: network endpoints (computer, PDA, light switch, …)
• Together, an independently administered entity
◦ Enterprise, ISP, etc.
[Figure: an internet[work] connecting the EE, ME, and CS networks]
Internetwork Challenges
• Many differences between networks
◦ Address formats
◦ Performance: bandwidth/latency
◦ Packet size
◦ Loss rate/pattern/handling
◦ Routing
• How to translate and inter-operate?
[Figure: an internet[work] joining 802.3, Frame relay, and ATM networks]
“The Internet”
Internet vs. internet
• The Internet: the interconnected set of networks of the Internet Service Providers (ISPs) and end-networks, providing data communications services.
◦ Network of internetworks, and more
◦ About 17,000 different ISP networks make up the Internet
◦ Many other “end” networks
◦ 100,000,000s of hosts
DLL: Links
• Links can be
◦ Wired or wireless
[Figure: Node, Link, Node]
NLL: Source To Destination
Routing
• Routers send packets towards the destination
[Figure: hosts (H) connected through a mesh of routers (R)]
Why Do We Need to Divide Messages?
• Because of the noise conditions of the channels
• Noise is rated as: 1 in 10^5
Why do we need to split a message? No Monopolization?
• Packets
• Better Link Utilization
Why do we need to split a message? What if the Network is Overloaded?
Problem: Network Overload
Solution: Buffering and Congestion Control
• Short bursts: buffer
• Buffer sizes vary from network to network, so fragmentation takes place
• What if the buffer overflows?
◦ Packets dropped
◦ Sender adjusts rate until load = resources: “congestion control”
Why do we need to split a message? What if the Data Doesn’t Fit?
Problem: Packet size
Solution: Fragment data across packets
• On Ethernet, max packet is 1.5KB
• Typical web page is 10KB
[Figure: a GET index.html exchange carried in several packets]
What do You Understand about Protocol?
• Implements an agreement between parties on how communication should take place
[Figure: a conversation as a protocol: friendly greeting, muttered reply, “Destination?”, “Madison”, “Thank you”]
1. Protocols Offer Interfaces
• Each protocol offers interfaces
◦ One to higher-level protocols on the same end hosts
  ▪ Expects one from the layers on which it builds
  ▪ Interface characteristics, e.g. IP service model
◦ A “peer interface” to a counterpart on destinations
  ▪ Syntax and semantics of communications
  ▪ (Assumptions about) data formats
• Protocols build upon each other
◦ Adds value, improves functionality overall
  ▪ E.g., a reliable protocol running on top of IP
◦ Reuse, avoid re-writing
  ▪ E.g., OS provides TCP, so apps don’t have to rewrite
2. Protocols Necessary for Interoperability
• Protocols are the key to interoperability.
◦ Networks are very heterogeneous:
◦ The hardware/software of communicating parties are often not built by the same vendor
◦ Yet they can communicate because they use the same protocol
  ▪ Actual implementations could be different
  ▪ But they must adhere to the same specification
• Protocols exist at many levels.
◦ Application level protocols
◦ Protocols at the hardware level
Layer | Examples
Hardware/link | Ethernet: 3Com, etc.
Network | Routers: Cisco, Juniper, etc.
Application | App: Email, AIM, IE, etc.
How do protocols/layers work?
• One or more protocols implement the functionality in a layer
◦ Only horizontal (among peers) and vertical (in a host) communication
• Protocols/layers can be implemented and modified in isolation
• Each layer offers a service to the higher layer, using the services of the lower layer.
• “Peer” layers on different systems communicate via a protocol.
◦ higher level protocols (e.g. TCP/IP, Appletalk) can run on multiple lower layers
◦ multiple higher level protocols can share a single physical network
TCP/IP vs OSI
TCP/IP layers (top to bottom): Application (plus libraries), TCP/UDP, IP, Data link, Physical
OSI layers (top to bottom): Application, Presentation, Session, Transport, Network, Data link, Physical
The Reality: TCP/IP Model
• App protocols: FTP, HTTP, TFTP, NV
• Two transport protocols (TCP, UDP): provide logical channels to apps
• IP: interconnection of network technologies into a single logical network
• NET1, NET2, …, NETn: network protocols implemented by a combination of hardware and software
• Note: No strict layering. App writers can define apps that run on any lower level protocols.
The Thin Waist: The Hourglass Model
[Figure: hourglass with Applications (FTP, HTTP, TFTP, NV) at the top, TCP and UDP below them, IP at the waist, and NET1, NET2, …, NETn (Data Link, Physical) at the bottom]
• The waist: minimal, carefully chosen functions. Facilitates interoperability and rapid evolution.
TCP/IP Layering
[Figure: a path Host, Bridge/Switch, Router/Gateway, Host, drawn against the layers Application, Transport, Network, Link, Physical]
Layers & Encapsulation
[Figure: User A sends “Get index.html” to User B; each layer adds its own header: Connection ID, Source/Destination, Link Address]
Protocol Demultiplexing
• Multiple choices at each layer
• How to know which one to pick?
[Figure: FTP, HTTP, TFTP, and NV over TCP/UDP, over IP, over many networks (NET1, NET2, …, NETn)]
Multiplexing & Demultiplexing
• Multiple implementations of each layer
◦ How does the receiver know what version/module of a layer to use?
• Packet header includes a demultiplexing field
◦ Used to identify the right module for next layer
◦ Filled in by the sender
◦ Used by the receiver
• Multiplexing occurs at multiple layers. E.g., IP, TCP, …
[Figure: IP header layout: V/HL, TOS, Length | ID, Flags/Offset | TTL, Prot., H. Checksum | Source IP address | Destination IP address | Options..]
Transmission Control Protocol (TCP)
TCP
• Reliable: guaranteed delivery
• Byte stream: in-order delivery
• Checksum for validity
• Setup connection followed by data transfer
Telephone Call
• Guaranteed delivery
• In-order delivery
• Setup connection followed by conversation
Example TCP applications: Web, Email, Telnet
User Datagram Protocol (UDP)
UDP
• No guarantee of delivery
• Not necessarily in-order delivery
• No validity guaranteed
• Must address each independent packet
Postal Mail
• Unreliable
• Not necessarily in-order delivery
• Must address each reply
Example UDP applications: Multimedia, voice over IP
Transport Service Requirements of Common Applications
Application | Data loss | Bandwidth | Time Sensitive
file transfer | no loss | elastic | no
e-mail | no loss | elastic | no
web documents | no loss | elastic | no
real-time audio/video | loss-tolerant | audio: 5Kb-1Mb, video: 10Kb-5Mb | yes, 100’s msec
stored audio/video | loss-tolerant | same as above | yes, few secs
interactive games | loss-tolerant | few Kbps | yes, 100’s msec
financial apps | no loss | elastic | yes and no
What do you know about big-endian and little-endian machines?
Byte Order
Different computers may have different internal representations of 16/32-bit integers (called host byte order). Examples:
• Big-endian byte order (e.g., used by Motorola 68000): the most significant byte is stored first
• Little-endian byte order (e.g., used by Intel 80x86): the least significant byte is stored first
◦ TCP/IP specifies a network byte order, which is the big-endian byte order.
◦ For some WinSock functions, the arguments (i.e., the parameters passed to these functions) must be stored in network byte order.
◦ WinSock provides functions to convert between host byte order and network byte order:
Prototypes of Conversion Functions
A Peep Into Unix Internals
What is a Process?
Processes
• A process has
◦ text: machine instructions (may be shared by other processes)
◦ data
◦ stack
• A process may execute either in user mode or in kernel mode.
• Process information is stored in two places:
◦ Process table
◦ User table
User mode and Kernel mode
• At any given instant a computer running the Unix system is either executing a process or the kernel itself is running.
• The computer is in user mode when it is executing instructions in a user process, and it is in kernel mode when it is executing instructions in the kernel.
• Executing a system call ==> user mode to kernel mode
◦ perform I/O operations
◦ system clock interrupt
Process Table
• Process table: an entry in the process table has the following information:
◦ process state:
A. running in user mode or kernel mode
B. ready in memory or ready but swapped
C. asleep in memory or asleep and swapped
◦ PID: process id
◦ UID: user id
◦ scheduling information
◦ signals that have been sent to the process but not yet handled
◦ a pointer to the per-process region table
• There is a single process table for the entire system.
User Table (u area)
• Each process has only one private user table.
• The user table contains information that must be accessible while the process is in execution:
◦ a pointer to the process table slot
◦ parameters of the current system call, return values, error codes
◦ file descriptors for all open files
◦ current directory and current root
◦ process and file size limits
• The user table is an extension of the process table.
[Figure: an active process: in the kernel address space, the resident process table entry and the swappable u area with its per-process region table; the region table refers to the text, data, and stack regions in the user address space]
Shared Program Text and Software Libraries
• Many programs, such as the shell, are often executed by several users simultaneously.
• The text (program) part can be shared.
• In order to be shared, a program must be compiled using a special option that arranges the process image so that the variable part (data and stack) and the fixed part (text) are cleanly separated.
• An extension to the idea of sharing text is sharing libraries.
• Without shared libraries, all the executing programs contain their own copies.
[Figure: two active processes, each with its own data and stack regions, share one text region through the region table (reference count = 2)]
System Call
• A process accesses system resources through system calls.
• System calls for process control:
◦ fork: create a new process
◦ wait: allow a parent process to synchronize its execution with the exit of a child process
◦ exec: invoke a new program
◦ exit: terminate process execution
• System calls for the file system:
◦ File: open, read, write, lseek, close
◦ inode: chdir, chown, chmod, stat, fstat
◦ others: pipe, dup, mount, umount, link, unlink
System call: fork()
• fork: the only way for a user to create a process in the Unix operating system.
• The process that invokes fork is called the parent process and the newly created process is called the child process.
• The syntax of the fork system call:
newpid = fork();
• On return from the fork system call, the two processes have identical copies of their user-level context except for the return value pid.
• In the parent process, newpid = child process id
• In the child process, newpid = 0
/* forkEx1.c */
#include <stdio.h>
#include <unistd.h>   /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d\n", fpid);    /* child: fork returned 0 */
    } else {
        printf("Parent Process fpid=%d\n", fpid);   /* parent: child's PID */
    }
    printf("After forking fpid=%d\n", fpid);        /* executed by both */
    return 0;
}

$ cc forkEx1.c -o forkEx1
$ forkEx1
Before forking ...
Child Process fpid=0
After forking fpid=0
Parent Process fpid=14707
After forking fpid=14707
$
/* forkEx2.c */
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    system("ps");                 /* one forkEx2 process listed */
    fpid = fork();
    system("ps");                 /* run by both parent and child */
    printf("After forking fpid=%d\n", fpid);
    return 0;
}

$ forkEx2
Before forking ...
  PID TTY     TIME CMD
14759 pts/9   0:00 tcsh
14778 pts/9   0:00 sh
14777 pts/9   0:00 forkEx2
  PID TTY     TIME CMD
14781 pts/9   0:00 sh
14759 pts/9   0:00 tcsh
14782 pts/9   0:00 sh
14780 pts/9   0:00 forkEx2
14777 pts/9   0:00 forkEx2
After forking fpid=14780
$   PID TTY     TIME CMD
14781 pts/9   0:00 sh
14759 pts/9   0:00 tcsh
14780 pts/9   0:00 forkEx2
After forking fpid=0

$ ps
  PID TTY     TIME CMD
14759 pts/9   0:00 tcsh
$
System Call: getpid() getppid()
• Each process has a unique process id (PID).
• PID is an integer, typically in the range 0 through 65535.
• The kernel assigns the PID when a new process is created.
• Processes can obtain their PID by calling getpid().
• Each process has a parent process and a corresponding parent process ID.
• Processes can obtain their parent’s PID by calling getppid().
/* pid.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), getppid() */

int main(void)
{
    printf("pid=%d ppid=%d\n", (int)getpid(), (int)getppid());
    return 0;
}

$ cc pid.c -o pid
$ pid
pid=14935 ppid=14759
$
/* forkEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    if ((fpid = fork()) == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    }
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc forkEx3.c -o forkEx3
$ forkEx3
Before forking ...
Parent Process fpid=14942 pid=14941 ppid=14759
After forking fpid=14942 pid=14941 ppid=14759
$ Child Process fpid=0 pid=14942 ppid=1
After forking fpid=0 pid=14942 ppid=1

$ ps
  PID TTY     TIME CMD
14759 pts/9   0:00 tcsh

(The child reports ppid=1 because its parent had already exited, so the child was adopted by init.)
System Call: wait()
• The wait system call allows a parent process to wait for the demise of a child process.
• See forkEx4.c

/* forkEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>

int main(void)
{
    int fpid, status;

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    }
    /* Parent: blocks until the child exits.
       Child: has no children, so wait() returns -1 at once. */
    wait(&status);
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc forkEx4.c -o forkEx4
$ forkEx4
Before forking ...
Parent Process fpid=14980 pid=14979 ppid=14759
Child Process fpid=0 pid=14980 ppid=14979
After forking fpid=0 pid=14980 ppid=14979
After forking fpid=14980 pid=14979 ppid=14759
$
System Call: exec()
• The exec() system call invokes another program by replacing the current process image.
• No new process table entry is created for the exec()ed program. Thus, the total number of processes in the system isn’t changed.
• Six different exec functions: execlp, execvp, execl, execv, execle, execve (see the man page for more detail).
• The exec system call allows a process to choose its successor.

int execl(file_name, arg0 [, arg1, ..., argn], NULL)
    char *file_name, *arg0, *arg1, ..., *argn;
int execv(file_name, argv)
    char *file_name, *argv[];
int execle(file_name, arg0 [, arg1, ..., argn], NULL, envp)
    char *file_name, *arg0, *arg1, ..., *argn, *envp[];
int execve(file_name, argv, envp)
    char *file_name, *argv[], *envp[];
int execlp(file_name, arg0 [, arg1, ..., argn], NULL)
    char *file_name, *arg0, *arg1, ..., *argn;
int execvp(file_name, argv)
    char *file_name, *argv[];
/* execEx1.c */
#include <stdio.h>
#include <unistd.h>   /* execl() */

int main(void)
{
    printf("Before execing ...\n");
    execl("/bin/date", "date", (char *)NULL);
    printf("After exec\n");   /* reached only if execl() fails */
    return 1;
}

$ execEx1
Before execing ...
Sun May 9 16:39:17 CST 1999
$
/* execEx2.c */
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fpid;

    printf("Before execing ...\n");
    fpid = fork();
    if (fpid == 0) {
        /* child: replaced by /bin/date */
        execl("/bin/date", "date", (char *)NULL);
    }
    printf("After exec and fpid=%d\n", fpid);   /* parent only, unless execl() fails */
    return 0;
}

$ execEx2
Before execing ...
After exec and fpid=14903
$ Sun May 9 16:47:08 CST 1999
$
Handling Signal
• A signal is a message from one process to another.
• Signals are sometimes called “software interrupts”.
• Signals usually occur asynchronously.
• Signals can be sent
A. by one process to another (or to itself)
B. by the kernel to a process.
• Unix signals are content-free. That is, the only thing that can be said about a signal is “it has arrived or not”.
Handling Signal
• Most signals have predefined meanings:
A. SIGHUP (hangup): when a terminal is closed, the hangup signal is sent to every process for which it is the controlling terminal.
B. SIGINT (interrupt): politely asks a process to terminate.
C. SIGQUIT (quit): asks a process to terminate and produce a core dump.
D. SIGKILL (kill): forces a process to terminate.
• See sigEx1.c
/* sigEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>

int main(void)
{
    int fpid, status;   /* status must be an int, not an uninitialized pointer */

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
        for (;;)
            ;           /* loop forever */
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    }
    wait(&status);      /* wait for child process */
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc sigEx1.c -o sigEx1
$ sigEx1 &
Before forking ...
Parent Process fpid=14989 pid=14988 ppid=14759
Child Process fpid=0 pid=14989 ppid=14988
$ ps
  PID TTY     TIME CMD
14988 pts/9   0:00 sigEx1
14759 pts/9   0:01 tcsh
14989 pts/9   0:09 sigEx1
$ kill -9 14989
$ ps
...
Scheduling Processes
• On a time sharing system, the kernel allocates the CPU to a process for a period of time (time slice or time quantum), preempts the process and schedules another one when the time slice expires, and reschedules the process to continue execution at a later time.
• The scheduler uses round-robin with a multilevel feedback algorithm to choose which process to execute:
A. The kernel allocates the CPU to a process for a time slice.
B. It preempts a process that exceeds its time slice.
C. It feeds the process back into one of several priority queues.
Process Priority
Priority levels, highest first:
Kernel mode: swapper, wait for disk I/O, wait for buffer, wait for inode, ..., wait for child exit
User mode: user level 0, user level 1, ..., user level n
Process Scheduling (Unix System V)
• There are 3 processes A, B, C under the following assumptions:
A. They are created simultaneously with initial priority 60.
B. The clock interrupts the system 60 times per second.
C. These processes make no system calls.
D. No other processes are ready to run.
E. CPU usage calculation: CPU = decay(CPU) = CPU/2
F. Process priority calculation: priority = CPU/2 + 60
G. The rescheduling calculation is done once per second.
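Time (s) | Process A: priority, CPU count | Process B: priority, CPU count | Process C: priority, CPU count
0 | 60, 0 (runs: 0 … 60) | 60, 0 | 60, 0
1 | 75, 30 | 60, 0 (runs: 0 … 60) | 60, 0
2 | 67, 15 | 75, 30 | 60, 0 (runs: 0 … 60)
3 | 63, 7 (runs: 7 … 67) | 67, 15 | 75, 30
4 | 76, 33 | 63, 7 (runs) | 67, 15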
Booting
• When the computer is powered on or rebooted, a short built-in program (possibly stored in ROM) reads the first block or two of the disk into memory. These blocks contain a loader program, which was placed on the disk when the disk was formatted.
• The loader is started. The loader searches the root directory for /unix or /root/unix and loads the file into memory.
• The kernel starts to execute.
The first processes
• The kernel initializes its internal data structures: it constructs linked lists of free inodes, regions, and page tables.
• The kernel creates the u area and initializes slot 0 of the process table.
• Process 0 is created.
• Process 0 forks, invoking the fork algorithm directly from the kernel. Process 1 is created.
• In kernel mode, Process 1 creates its user-level context (regions) and copies code (/etc/init) to the new region.
• Process 1 calls exec (executes init).
init process
• The init process is a process dispatcher: it spawns processes and allows users to log in.
• Init reads /etc/inittab and spawns getty.
• When a user logs in successfully, getty goes through a login procedure and execs a login shell.
• Init executes the wait system call, monitoring the death of its child processes and of processes orphaned by an exiting parent.

Login cycle:
1. Init fork/execs a getty program to manage the line.
2. Getty prints the “login:” message and waits for someone to log in.
3. The login process prints the password message, reads the password, then checks the password.
4. The shell runs programs for the user until the user logs off.
5. When the shell dies, init wakes up and fork/execs a getty for the line.
File Subsystem
• A file system is a collection of files and directories on a disk or tape in standard UNIX file system format.
• Each UNIX file system contains four major parts:
A. boot block
B. superblock
C. i-node table
D. data blocks: file storage
File System Layout
Block 0: bootstrap
Block 1: superblock
Block 2 - n: i-nodes
Block n+1 - last: files
Boot Block
• A boot block may contain several physical blocks.
• Note that a physical block contains 512 bytes (or 1KB or 2KB).
• A boot block contains a short loader program for booting.
• It is blank on other file systems.
Superblock
• The superblock contains key information about a file system.
• Superblock information:
A. Size of the file system and status:
◦ label: name of this file system
◦ size: the number of logical blocks
◦ date: the last modification date of the superblock
B. Information about i-nodes:
◦ the number of i-nodes
◦ the number of free i-nodes
C. Information about data blocks: free data blocks.
• The information of a superblock is loaded into memory.
I-nodes
• i-node: index node (information node)
• i-list: the list of i-nodes
• i-number: the index into the i-list
• The size of an i-node: 64 bytes
• i-node 0 is reserved.
• i-node 1 is the root directory.
• i-node structure: next page
I-node structure
[Figure: i-node fields: mode, owner, timestamp, size, block count, reference count, direct blocks 0-9, single indirect, double indirect, triple indirect. Direct pointers lead to data blocks; indirect pointers lead to indirect blocks, which in turn point to data blocks or to further indirect blocks.]
I-node structure
• mode:
A. type: file, directory, pipe, symbolic link
B. access: read/write/execute (owner, group, others)
• owner: who owns this i-node (file, directory, ...)
• timestamp: creation, modification, access time
• size: the number of bytes
• block count: the number of data blocks
• direct blocks: pointers to the data blocks
• single indirect: pointer to a block that holds pointers to data blocks (128 data blocks)
• double indirect: (128*128 = 16384 data blocks)
• triple indirect: (128*128*128 = 2097152 data blocks)
Data Block
• A data block has 512 bytes.
A. Some file systems have 1K or 2K bytes per block.
B. See the effect of block size (next page).
• A data block may contain the data of a file or the data of a directory.
• File: a stream of bytes.
• Directory format:
i-# | Next | size | File name | pad
i-# | Next | 10 | Report.txt | pad
i-# | Next | 3 | bin | pad
i-# | Next | 5 | notes | pad
0 | Next
[Figure: example directory tree with the entries Report.txt, home, john, bin, find, alex, jenny, notes, grep]
[Figure: file system on disk (boot block, superblock, i-nodes, data blocks); the data block of the current directory lists Report.txt, source, and notes, and each entry refers to an i-node]
[Figure: the current directory inode in the u area refers to an entry among the in-core inodes; below them sit the device driver and hardware control]
In-core inode table
• The UNIX system keeps regular files and directories on block devices such as disk or tape.
• Such disk space is called the physical device address space.
• The kernel deals with the file system on a logical level (logical device address space) rather than with disks.
• The disk driver translates logical addresses into physical device addresses.
• The in-core (memory resident) inode table stores the inode information in kernel space.
In-core inode table
• An in-core inode contains
A. all the information of the inode on disk
B. status of the in-core inode:
◦ inode is locked
◦ inode data changed
◦ file data changed
C. the logical device number of the file system
D. inode number
E. reference count
File table
• The kernel has a global data structure, called the file table, to store information about file access.
• Each entry in the file table contains:
A. a pointer to the in-core inode table
B. the offset of the next read or write in the file
C. access rights (r/w) allowed to the opening process
D. reference count
User File Descriptor table
• Each process has a user file descriptor table to identify all opened files.
• An entry in the user file descriptor table points to an entry of the kernel’s global file table.
• Entry 0: standard input
• Entry 1: standard output
• Entry 2: error output
System Call: open
• open: A process may open an existing file to read or write.
• syntax: fd = open(pathname, mode);
A. pathname is the name of the file to be opened
B. mode: read/write
• Example

/* openEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>   /* open(), O_RDONLY, O_WRONLY */

int main(void)
{
    int fd1, fd2, fd3;

    printf("Before open ...\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./openEx1.c", O_WRONLY);
    fd3 = open("/etc/passwd", O_RDONLY);   /* same file, new descriptor */
    printf("fd1=%d fd2=%d fd3=%d \n", fd1, fd2, fd3);
    return 0;
}

$ cc openEx1.c -o openEx1
$ openEx1
Before open ...
fd1=3 fd2=4 fd3=5
$
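[Figure: after the three open calls, slots 3, 4, and 5 of the user file descriptor table (reached from the u area) point to three file table entries (CNT=1 R, CNT=1 W, CNT=1 R); the two read entries share the in-core inode of /etc/passwd (CNT=2), and the write entry points to the in-core inode of the second file (CNT=1)]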
System Call: read
• read: A process may read an opened file.
• syntax: n = read(fd, buffer, count);
A. fd: file descriptor
B. buffer: where the data is to be stored
C. count: the number of bytes to read
D. n: the number of bytes actually read
• Example

/* openEx2.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read() */

int main(void)
{
    int fd1;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("=======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd1, buf2, 19);   /* continues at the saved file offset */
    printf("fd1=%d buf2=%s \n", fd1, buf2);
    printf("=======\n");
    return 0;
}

$ cc openEx2.c -o openEx2
$ openEx2
=======
fd1=3 buf1=root:x:0:1:Super-Us
fd1=3 buf2=er:/:/sbin/sh
daemo
=======
$

/* openEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("/etc/passwd", O_RDONLY);   /* separate file table entry and offset */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx3.c -o openEx3
$ openEx3
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=root:x:0:1:Super-Us
======
$
[Figure: openEx3: slots 3 and 4 of the user file descriptor table point to two separate file table entries (each CNT=1 R), which share the in-core inode of /etc/passwd (CNT=2)]
System Call: dup
• dup: copy a file descriptor into the first free slot of the user file descriptor table.
• syntax: newfd = dup(fd);
A. fd: file descriptor
• Example

/* openEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* dup(), read() */

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = dup(fd1);        /* shares fd1's file table entry and offset */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx4.c -o openEx4
$ openEx4
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=er:/:/sbin/sh
daemo
======
$
[Figure: openEx4: slots 3 and 4 of the user file descriptor table point to the same file table entry (CNT=2 R), which points to the in-core inode of /etc/passwd (CNT=1)]
System Call: creat
• creat: A process may create a new file with the creat system call.
• syntax: fd = creat(pathname, mode);
A. pathname: file name
B. mode: access permission bits of the new file (e.g. 0644)
• Example: see creatEx1.c
System Call: close
• close: A process may close a file with the close system call.
• syntax: close(fd);
A. fd: file descriptor
• Example: see creatEx1.c
System Call: write
• write: A process may write data to an opened file.
• syntax: n = write(fd, buffer, count);
A. fd: file descriptor
B. buffer: the data to be written
C. count: the number of bytes to write
D. n: the number of bytes actually written
• Example
/* creatEx1.c */
#include <stdio.h>
#include <string.h>     /* strlen() */
#include <sys/types.h>
#include <sys/stat.h>   /* chmod() */
#include <fcntl.h>      /* creat() */
#include <unistd.h>     /* write(), close() */

int main(void)
{
    int fd1;
    char *buf1 = "I am a string\n";
    char *buf2 = "second line\n";

    printf("======\n");
    fd1 = creat("./testCreat.txt", 0644);   /* second argument is a permission mode */
    write(fd1, buf1, strlen(buf1));         /* write exactly the string, no more */
    write(fd1, buf2, strlen(buf2));
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    close(fd1);
    chmod("./testCreat.txt", 0666);
    printf("======\n");
    return 0;
}

$ cc creatEx1.c -o creatEx1
$ creatEx1
======
fd1=3 buf1=I am a string
 
======
$ ls -l testCreat.txt
-rw-rw-rw- 1 cheng staff 26 May 10 20:37 testCreat.txt
$ more testCreat.txt
...
System Call: stat/fstat
• stat/fstat: A process may query the status of a file (locked): file type, file owner, access permission, file size, number of links, inode number, access time.
• syntax: stat(pathname, statbuffer); fstat(fd, statbuffer);
A. pathname: file name
B. statbuffer: where the status data is stored
C. fd: file descriptor
• Example

/* statEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>   /* fstat(), struct stat */
#include <fcntl.h>

int main(void)
{
    int fd1, fd2;
    struct stat bufStat1, bufStat2;

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./statEx1", O_RDONLY);
    fstat(fd1, &bufStat1);
    fstat(fd2, &bufStat2);
    /* the struct stat members are not plain ints, so cast before printing */
    printf("fd1=%d inode no=%ld block size=%ld blocks=%ld\n", fd1,
           (long)bufStat1.st_ino, (long)bufStat1.st_blksize, (long)bufStat1.st_blocks);
    printf("fd2=%d inode no=%ld block size=%ld blocks=%ld\n", fd2,
           (long)bufStat2.st_ino, (long)bufStat2.st_blksize, (long)bufStat2.st_blocks);
    printf("======\n");
    return 0;
}

$ cc statEx1.c -o statEx1
$ statEx1
======
fd1=3 inode no=21954 block size=8192 blocks=6
fd2=4 inode no=190611 block size=8192 blocks=======...
System Call: link/unlink
• link: hard-link a file to another name.
• syntax: link(sourceFile, targetFile); unlink(file);
A. sourceFile, targetFile, file: file names
• Example: Lab exercise: write a C program which uses the link/unlink system calls. Use ls -l to see the reference count.
System Call: chdir
• chdir: A process may change its current directory.
• syntax: chdir(pathname);
A. pathname: directory name
• Example

#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* chdir() */

int main(void)
{
    chdir("/usr/bin");
    system("ls -l");   /* lists /usr/bin, the new current directory */
    return 0;
}

$ ls -l /usr/bin
$
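Pipes and Named pipes
• Unnamed pipes: pipe(int a[])
• Pipes to or from a command: FILE* popen(char *command, char *mode), pclose(FILE*)
• Named pipes from a program: mknod(char *, S_IFIFO|0644, 0)
• Named pipes from the shell: mknod filename p, mkfifo filename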
Signals and signal handling
Signal Description
SIGABRT Process abort signal.
SIGALRM Alarm clock.
SIGFPE Erroneous arithmetic operation.
SIGHUP Hangup.
SIGILL Illegal instruction.
SIGINT Terminal interrupt signal.
SIGKILL Kill (cannot be caught or ignored).
SIGPIPE Write on a pipe with no one to read it.
SIGQUIT Terminal quit signal.
SIGSEGV Invalid memory reference.
SIGTERM Termination signal.
SIGUSR1 User-defined signal 1.
SIGUSR2 User-defined signal 2.
SIGCHLD Child process terminated or stopped.
SIGCONT Continue executing, if stopped.
SIGSTOP Stop executing (cannot be caught or ignored).
SIGTSTP Terminal stop signal.
SIGTTIN Background process attempting read.
SIGTTOU Background process attempting write.
SIGBUS Bus error.
SIGPOLL Pollable event.
SIGPROF Profiling timer expired.
SIGSYS Bad system call.
SIGTRAP Trace/breakpoint trap.
SIGURG High bandwidth data is available at a socket.
SIGVTALRM Virtual timer expired.
SIGXCPU CPU time limit exceeded.
SIGXFSZ File size limit exceeded.
Signal Handling Related Functions
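void (*signal(int signo, void (*f)(int)))(int);
• signo: signal number
• f: handler (a function, SIG_DFL, or SIG_IGN)
• return value: the previous handler, or SIG_ERR on error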
#include <stdio.h>      /* standard I/O functions */
#include <unistd.h>     /* standard unix functions, like getpid() and pause() */
#include <sys/types.h>  /* various type definitions, like pid_t */
#include <signal.h>     /* signal name macros, and the signal() prototype */

/* first, here is the signal handler */
void catch_int(int sig_num)
{
    /* re-set the signal handler again to catch_int, for next time */
    signal(SIGINT, catch_int);
    /* and print the message */
    printf("Don't do that\n");
    fflush(stdout);
}

int main(int argc, char* argv[])
{
    /* set the INT (Ctrl-C) signal handler to 'catch_int' */
    signal(SIGINT, catch_int);
    /* now, lets get into an infinite loop of doing nothing. */
    for ( ;; )
        pause();
}
Signal sets
Signal sets are data types (structures) that represent multiple signals. The following functions are used to manipulate them.
• int sigemptyset(sigset_t *set);
  Initializes the signal set pointed to by set so that it contains no signals.
• int sigfillset(sigset_t *set);
  Fills the signal set pointed to by set so that it contains all signals.
• int sigaddset(sigset_t *set, int signo);
  Adds a signal (with signal number signo) to the signal set pointed to by set.
• int sigdelset(sigset_t *set, int signo);
  Removes a signal (with signal number signo) from the signal set pointed to by set.
• int sigismember(const sigset_t *set, int signo);
  Checks whether a signal (with signal number signo) is in the signal set pointed to by set.
• int sigpending(sigset_t *set);
  Stores in the signal set pointed to by set the signals that are blocked from delivery and currently pending.
• int sigsuspend(const sigset_t *set);
  Sets the signal mask of the process to the signal set pointed to by set. The process is then suspended until a signal is caught or until a signal occurs that terminates the process.
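int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
Values for how:
SIG_BLOCK: add the signals in set to the current signal mask
SIG_UNBLOCK: remove the signals in set from the current signal mask
SIG_SETMASK: replace the current signal mask with set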
int sigaction(int signo, const struct sigaction *act, struct sigaction *oact);
struct sigaction {
    void (*sa_handler)(int); /* pointer to function, or SIG_DFL or SIG_IGN */
    sigset_t sa_mask;        /* additional signals to be blocked during execution of handler */
    int sa_flags;            /* special flags and options */
};
Message Queues

#include <stdio.h>
#include <stdlib.h>    /* exit() */
#include <string.h>    /* strcpy(), strlen() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* message layout expected by msgsnd()/msgrcv(): a long type, then the data */
struct text_msg {
    long mtype;
    char mtext[64];
};

int main(int argc, char* argv[])
{
    struct text_msg msg;
    struct text_msg recv_msg;
    int rc;

    /* create a private message queue, with access only to the owner. */
    int queue_id = msgget(IPC_PRIVATE, 0600);
    if (queue_id == -1) {
        perror("main: msgget");
        exit(1);
    }
    printf("message queue created, queue id '%d'.\n", queue_id);

    msg.mtype = 1;
    strcpy(msg.mtext, "hello world");
    rc = msgsnd(queue_id, &msg, strlen(msg.mtext)+1, 0);
    if (rc == -1) {
        perror("main: msgsnd");
        exit(1);
    }
    printf("message placed on the queue successfully.\n");

    /* message type 0 means: take the first message on the queue */
    rc = msgrcv(queue_id, &recv_msg, sizeof(recv_msg.mtext), 0, 0);
    if (rc == -1) {
        perror("main: msgrcv");
        exit(1);
    }
    printf("msgrcv: received message: mtype '%ld'; mtext '%s'\n", recv_msg.mtype, recv_msg.mtext);

    /* remove the queue so that it does not outlive the program */
    msgctl(queue_id, IPC_RMID, NULL);
    return 0;
}
Let Us Return to TCP/IP Programming
115
Address + Port (TCP/IP)
[Figure: hosts 192.168.19.1, 192.168.19.2 and 192.168.19.3 on network 192.168.19.0, and an Internet host 198.163.197.4. Services are identified by port: FTP [21], HTTP [80], SMTP [25], Telnet [23]. A service is addressed as IP address plus port, e.g. 192.168.19.2 [21] or 198.163.197.4 [x].]
116
Network Addressing Analogy
412-268-8000 ext.123
Central Number
Applications/Servers
WebPort 80
MailPort 25
Exchange
Area Code
412-268-8000 ext.654
IP Address
Network No.
Host Number
Telephone No
15-441 Students Clients
Professors at CMU
Network ProgrammingTelephone Call
Port No.Extension
117
Concept of Port Numbers
◦ Port numbers are used to identify “entities” on a host
◦ Port numbers can be
  Well-known (ports 0-1023)
  Dynamic or private (ports 1024-65535)
◦ Servers/daemons usually use well-known ports
  Any client can identify the server/service
  HTTP = 80, FTP = 21, Telnet = 23, ...
  /etc/services defines well-known ports
◦ Clients usually use dynamic ports
  Assigned by the kernel at run time
TCP/UDP
IP
Ethernet Adapter
NTPdaemon
Web server
port 123 port 80
What are Ports
Consider Railway Station Counter 0: Platform TicketsCounter 1: EnquiriesCounter 2: Reservations-------Counter 8: Current ReservationsCounter 9: Cancellations
Each host machine has an IP address When a packet arrives at a host
119
medellin.cs.columbia.edu
(128.59.21.14)
cluster.cs.columbia.edu
(128.59.21.14, 128.59.16.7, 128.59.16.5, 128.59.16.4)
newworld.cs.umass.edu
(128.119.245.93)
Transfer file to/from remote host
Client/server model◦ Client: side that initiates transfer (either to/from remote)◦ Server: remote host
ftp: RFC 959
ftp server: port 21
An ExampleFTP: The File Transfer Protocol
file transfer FTPserver
FTPuser
interface
FTPclient
local filesystem
remote filesystem
user at host
Ftp client contacts ftp server at port 21, specifying TCP as transport protocol
Two parallel TCP connections opened:◦ Control: exchange
commands, responses between client, server.
“out of band control”◦ Data: file data to/from
server
FTPclient
FTPserver
TCP control connection
port 21
TCP data connectionport 20
The interface that the OS provides to its networking subsystem
application layer
transport layer (TCP/UDP)network layer (IP)
link layer (e.g. ethernet)physical layer
application layer
transport layer (TCP/UDP)network layer (IP)
link layer (e.g. ethernet)physical layer
OS networkstack
Sockets as means for inter-process communication (IPC)
[Figure: a client process and a server process, each with a socket on top of its OS network stack, connected through the Internet.]
Address the machine on the network◦ By IP address
Address the process◦ By the “port”-number
The pair of IP-address + port – makes up a “socket-address”
Internet Connections (TCP/IP)
Connection socket pair(128.2.194.242:3479, 208.216.181.15:80)
Server(port 80)
Client
Client socket address128.2.194.242:3479
Server socket address
208.216.181.15:80
Client host address128.2.194.242
Server host address208.216.181.15
Note: 3479 is anephemeral port allocated
by the kernel
Note: 80 is a well-known portassociated with Web servers
Examples of client programs◦ Web browsers, ftp, telnet, ssh
How does a client find the server?◦ The IP address in the server socket address identifies the
host◦ The (well-known) port in the server socket address
identifies the service, and thus implicitly identifies the server process that performs that service.
◦ Examples of well known ports Port 7: Echo server Port 23: Telnet server Port 25: Mail server Port 80: Web server
Clients
Using Ports to Identify Services
Web server(port 80)
Client host
Server host 128.2.194.242
Echo server(port 7)
Service request for128.2.194.242:80
(i.e., the Web server)
Web server(port 80)
Echo server(port 7)
Service request for128.2.194.242:7
(i.e., the echo server)
Kernel
Kernel
Client
Client
Servers are long-running processes (daemons).◦ Created at boot-time (typically) by the init process
(process 1)◦ Run continuously until the machine is turned off.
Each server waits for requests to arrive on a well-known port associated with a particular service.◦ Port 7: echo server◦ Port 23: telnet server◦ Port 25: mail server◦ Port 80: HTTP server
Other applications should choose between 1024 and 65535
Servers
See /etc/services for a comprehensive list of the services available on a Linux machine.
What is a socket?◦ To the kernel, a socket is an endpoint of communication.◦ To an application, a socket is a file descriptor that lets the
application read/write from/to the network. Remember: All Unix I/O devices, including networks, are
modeled as files.
Clients and servers communicate with each other by reading from and writing to socket descriptors.
The main distinction between regular file I/O and socket I/O is how the application “opens” the socket descriptors.
Sockets
128
Endpoint Address◦ Generic Endpoint Address
The socket abstraction accommodates many protocol families. It supports many address families. It defines the following generic endpoint address: ( address family, endpoint address in that family )
Data type for generic endpoint address:
◦ TCP/IP Endpoint Address For TCP/IP, an endpoint address is composed of the
following items: Address family is AF_INET (Address Family for
InterNET). Endpoint address in that family is composed of an IP
address and a port number.
129
The IP address identifies a particular computer, while the port number identifies a particular application running on that computer.
The TCP/IP endpoint address is a special instance of the generic one:
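On Unix systems the generic endpoint address type is struct sockaddr and the TCP/IP instance is struct sockaddr_in (family, port, IP address). The following is a minimal sketch of filling one in; make_endpoint is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill 'ep' with the TCP/IP endpoint (ip, port).
 * Returns 0 on success, -1 if 'ip' is not a valid dotted-decimal
 * IPv4 address. */
int make_endpoint(const char *ip, uint16_t port, struct sockaddr_in *ep)
{
    memset(ep, 0, sizeof *ep);
    ep->sin_family = AF_INET;      /* address family: Internet */
    ep->sin_port = htons(port);    /* port, in network byte order */

    /* inet_pton() stores the address in network byte order */
    if (inet_pton(AF_INET, ip, &ep->sin_addr) != 1)
        return -1;
    return 0;
}
```

Socket calls such as bind() and connect() take the generic type, so a filled-in struct sockaddr_in is passed as (struct sockaddr *)&ep together with sizeof ep.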
Port Number A port number identifies an application running on a
computer. When a client program is executed, WinSock randomly
chooses an unused port number for it. Each server program must have a pre-specified port
number, so that the client can contact the server.
130
The port number is composed of 16 bits, and its possible values are used in the following manner:
  0 - 1023: For well-known server applications.
  1024 - 49151: For user-defined server applications (typical range to be used is 1024 - 5000).
  49152 - 65535: For client programs.
Port numbers for some well-known server applications:
  WWW server using TCP: 80
  Telnet server using TCP: 23
  SMTP (email) server using TCP: 25
  SNMP server using UDP: 161
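The three ranges can be expressed as a small classification function. This is only a sketch; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum port_class {
    PORT_WELL_KNOWN,    /* 0 - 1023: well-known server applications */
    PORT_USER_SERVER,   /* 1024 - 49151: user-defined server applications */
    PORT_CLIENT         /* 49152 - 65535: client programs */
};

/* Classify a 16-bit port number into one of the three ranges. */
enum port_class classify_port(uint16_t port)
{
    if (port <= 1023)
        return PORT_WELL_KNOWN;
    if (port <= 49151)
        return PORT_USER_SERVER;
    return PORT_CLIENT;
}
```

For example, classify_port(80) yields PORT_WELL_KNOWN and classify_port(49152) yields PORT_CLIENT.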
131
Unix File Descriptor TableDescriptor Table
0
1
2
3
4Data structure for file 0
Data structure for file 1
Data structure for file 2
Standard input
Standard output
Standard error
132
Socket Descriptor Data StructureDescriptor Table
0
1
2
3
4
Family: PF_INETService: SOCK_STREAMLocal IP: 111.22.3.4Remote IP: 123.45.6.78Local Port: 2249Remote Port: 3726
Family: PF_INETService: SOCK_STREAMLocal IP: 111.22.3.4Remote IP: 123.45.6.78Local Port: 2249Remote Port: 3726
133
Hierarchical vs. flat◦ Wisconsin / Madison / UW-Campus / Aditya
vs. Aditya:123-45-6789
◦ Ethernet addresses are flat
What information would routers need to route to Ethernet addresses?◦ Hierarchical structure crucial for designing scalable binding from interface
name to route◦ Route to a general area, then to a specific location
What type of Hierarchy?◦ How many levels?◦ Same hierarchy depth for everyone?
Address broken in segments of increasing specificity◦ Uniform for everybody: needs centralized management◦ Non-uniform: more flexible, needs careful decentralized management
Addressing in IP: Considerations
134
IP Addresses Fixed length: 32 bits
Total IP address size: 4 billion
Initial class-ful structure (1981)◦ Class A: 128 networks, 16M hosts◦ Class B: 16K networks, 64K hosts◦ Class C: 2M networks, 256 hosts
135
IP Address Classes(Some are Obsolete)
Network ID Host ID
Network ID Host ID
8 16
Class A32
0
Class B 10
Class C 110
Multicast AddressesClass D 1110
Reserved for experimentsClass E 1111
24
136
Address would specify prefix for forwarding table◦ Simple lookup
www.cmu.edu address 128.2.11.43◦ Class B address – class + network is 128.2◦ Lookup 128.2 in forwarding table◦ Prefix – part of address that really matters for routing
Forwarding table contains◦ List of class+network entries◦ A few fixed prefix lengths (8/16/24)
Large tables◦ 2 Million class C networks
Original IP Route Lookup
137
Subnet Addressing: RFC917 (1984)
Original goal: network part would uniquely identify a single physical network
Inefficient address space usage◦ Class A & B networks too big
Also, very few LANs have close to 64K hosts Easy for networks to (claim to) outgrow class-C
◦ Each physical network must have one network number
Routing table size is too high
Need simple way to reduce the number of network numbers assigned
◦ Subnetting: Split up single network address ranges
◦ Fixes routing table size problem, partially
138
Add another “floating” layer to hierarchy
Variable length subnet masks◦ Could subnet a class B into several chunks
Subnetting
Network Host
Network HostSubnet
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0SubnetMask
139
Assume an organization was assigned address 150.100 (class B)
Assume < 100 hosts per subnet (department)
How many host bits do we need?◦ Seven
What is the network mask?◦ 11111111 11111111 11111111 10000000◦ 255.255.255.128
Subnetting Example
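The arithmetic of this example can be checked with a short sketch. It assumes the all-zeros and all-ones host values are reserved, so h host bits give 2^h - 2 usable hosts; the function names are hypothetical.

```c
#include <stdint.h>

/* Smallest number of host bits h such that 2^h - 2 >= hosts
 * (the all-zeros and all-ones host values are assumed reserved). */
unsigned host_bits_needed(uint32_t hosts)
{
    unsigned h = 2;

    while (h < 32 && ((UINT32_C(1) << h) - 2) < hosts)
        h++;
    return h;
}

/* Network mask (host byte order) that leaves 'host_bits' low bits
 * for the host part. */
uint32_t mask_for_host_bits(unsigned host_bits)
{
    if (host_bits >= 32)
        return 0;
    return ~((UINT32_C(1) << host_bits) - 1);
}
```

For 100 hosts, host_bits_needed() returns 7 and mask_for_host_bits(7) is 0xFFFFFF80, i.e. 255.255.255.128. ANDing the address 150.100.10.140 with that mask gives subnet number 150.100.10.128.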
140
Forwarding Example
• Host configured with IP address and subnet mask
• Subnet number = IP (AND) Mask
• Forwarding table entry: (Subnet number, subnet mask) → Outgoing I/F

D = destination IP address
For each forwarding table entry (SN, SM → OI)
    D1 = SM & D
    if (D1 == SN)
        Deliver on OI
Else
    Forward to default router
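The forwarding loop above translates almost line for line into C. In this sketch addresses are 32-bit values in host byte order, and the struct, the DEFAULT_ROUTER constant and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One forwarding table entry: (SN, SM -> OI). */
struct fwd_entry {
    uint32_t subnet;   /* SN: subnet number */
    uint32_t mask;     /* SM: subnet mask */
    int out_if;        /* OI: outgoing interface index */
};

#define DEFAULT_ROUTER (-1)   /* no entry matched: use default router */

/* Return the outgoing interface for destination 'dest', or
 * DEFAULT_ROUTER if no table entry matches. */
int forward(const struct fwd_entry *table, size_t n, uint32_t dest)
{
    for (size_t i = 0; i < n; i++) {
        /* D1 = SM & D; deliver on OI if D1 == SN */
        if ((dest & table[i].mask) == table[i].subnet)
            return table[i].out_if;
    }
    return DEFAULT_ROUTER;
}
```

With the two subnets 150.100.10.0 and 150.100.10.128 (mask 255.255.255.128), destination 150.100.10.140 is delivered on the interface of the second entry.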
141
Address space depletion◦ In danger of running out of classes A and B◦ Why?
Class C too small for most domains Very few class A – very careful about giving them out Class B poses greatest problem
◦ Class B sparsely populated But people refuse to give it back
Inefficient Address Usage
142
Allows arbitrary split between network & host part of address ◦ Do not use classes to determine network ID◦ Use common part of address as network number◦ Allows handing out arbitrary sized chunks of address
space◦ E.g., addresses 192.4.16 - 192.4.31 have the first 20 bits
in common. Thus, we use these 20 bits as the network number 192.4.16/20
Enables more efficient usage of address space (and router tables)◦ Use single entry for range in forwarding tables◦ Combine forwarding entries when possible
Classless Inter-Domain Routing(CIDR) – RFC1338
143
Network is allocated 8 contiguous chunks of 256-host addresses 201.10.0.0 to 201.10.7.255
◦ Allocation uses 3 bits of class C space
◦ Remaining 21 bits are network number, written as 201.10.0.0/21
Replaces 8 class C routing entries with 1 combined entry◦ Routing protocols carry prefix with destination
network address
CIDR Example
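Whether an address belongs to a CIDR block is a mask-and-compare operation. A minimal sketch, with addresses as 32-bit values in host byte order and hypothetical function names:

```c
#include <stdint.h>

/* Mask with the top 'len' bits set (0 <= len <= 32). */
uint32_t prefix_mask(unsigned len)
{
    return len == 0 ? 0 : ~UINT32_C(0) << (32 - len);
}

/* Return 1 if 'addr' lies inside prefix/len, else 0. */
int in_prefix(uint32_t addr, uint32_t prefix, unsigned len)
{
    uint32_t m = prefix_mask(len);

    return (addr & m) == (prefix & m);
}
```

For the /21 block written as 201.10.0.0/21, prefix_mask(21) is 255.255.248.0: the eight 256-address chunks with third octet 0 to 7 all match, and the chunk with third octet 8 does not.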
144
Network (network portion): Get allocated portion of ISP’s address
space:
ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20
Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23
Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23
Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23
... ….. …. ….
Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23
IP Addresses: How to Get One?
145
How does an ISP get block of addresses?◦ From Regional Internet Registries (RIRs)
ARIN (North America, Southern Africa), APNIC (Asia-Pacific), RIPE (Europe, Northern Africa), LACNIC (South America)
How about a single host?◦ Hard-coded by system admin in a file◦ DHCP: Dynamic Host Configuration Protocol:
dynamically get address: “plug-and-play” Host broadcasts “DHCP discover” msg DHCP server responds with “DHCP offer” msg Host requests IP address: “DHCP request” msg DHCP server sends address: “DHCP ack” msg
IP Addresses: How to Get One?
146
Back to CIDR
Provider is given 201.10.0.0/21
201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23
Provider
CIDR implications:
Longest prefix match
Route aggregation
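Longest prefix match means that when several table entries cover a destination, the most specific one wins. The following is a sketch using a linear scan (real routers use faster data structures); the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t prefix;   /* network prefix, host byte order */
    unsigned len;      /* prefix length in bits, 0..32 */
    int next_hop;      /* identifier of the next hop */
};

/* Return the next hop of the longest matching prefix for 'dest',
 * or -1 if no route matches. */
int longest_prefix_match(const struct route *rt, size_t n, uint32_t dest)
{
    int best = -1;
    unsigned best_len = 0;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = rt[i].len == 0 ? 0
                                       : ~UINT32_C(0) << (32 - rt[i].len);

        if ((dest & mask) == (rt[i].prefix & mask) &&
            (!found || rt[i].len > best_len)) {
            found = 1;
            best_len = rt[i].len;
            best = rt[i].next_hop;
        }
    }
    return best;
}
```

With routes for 201.10.0.0/21, 201.10.4.0/24 and 201.10.6.0/23, destination 201.10.4.7 matches both the /21 and the /24 and is sent to the /24's next hop.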
147
Global Address Example
Receiver
Packet
R
Sender2
34
1
2
34
1
2
34
1
R2
R3
R1
R
RR 3
R 4
R 3
R
148
Source Routing Example
Receiver
Packet
R1, R2, R3, R
Sender2
34
1
2
34
1
2
34
1
R2
R3
R1
R2, R3, R
R3, R
R
149
Virtual Circuits Example
Receiver
Packet
1,5 3,7
Sender2
34
1 1,7 4,2
2
34
1
2
34
1
2,2 3,6
R2
R3
R1
5 7
2
6
• Network picks a path• Assigns VC numbers for flow on each link• Populates forwarding table
5 7
2
6
150
Routing Gets Packet to Correct Local Network◦ Based on IP address◦ Router sees that destination address is of local machine
Still Need to Get Packet to Host◦ Using link-layer protocol◦ Need to know hardware address
Same Issue for Any Local Communication◦ Find local machine, given its IP address
Finding a Local Machine
host host host
LAN 1
...
routerWAN
128.2.198.222
128.2.254.36
Destination = 128.2.198.222
151
◦ Diagrammed for Ethernet (6-byte MAC addresses) Low-Level Protocol
◦ Operates only within local network◦ Determines mapping from IP address to hardware (MAC)
address◦ Mapping determined dynamically
No need to statically configure tables Only requirement is that each host know its own IP address
Address Resolution Protocol (ARP)
op
Sender MAC address
Sender IP Address
Target MAC address
Target IP Address
• op: Operation– 1: request– 2: reply
• Sender– Host sending ARP
message• Target
– Intended receiver of message
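For concreteness, here is a sketch of the ARP packet for IPv4 over Ethernet. The slide shows only the op field and the four addresses; the full format also carries hardware type, protocol type and the two address lengths, 28 bytes in total. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ARP_OP_REQUEST 1
#define ARP_OP_REPLY   2

/* ARP packet for IPv4 over Ethernet. All fields are byte arrays, so
 * the struct has no padding and stays in network byte order. */
struct arp_packet {
    uint8_t htype[2];       /* hardware type: 1 = Ethernet */
    uint8_t ptype[2];       /* protocol type: 0x0800 = IPv4 */
    uint8_t hlen;           /* hardware address length: 6 */
    uint8_t plen;           /* protocol address length: 4 */
    uint8_t op[2];          /* 1 = request, 2 = reply */
    uint8_t sender_mac[6];
    uint8_t sender_ip[4];
    uint8_t target_mac[6];
    uint8_t target_ip[4];
};

static void arp_fill_header(struct arp_packet *p, uint8_t op)
{
    p->htype[0] = 0;    p->htype[1] = 1;
    p->ptype[0] = 0x08; p->ptype[1] = 0x00;
    p->hlen = 6;
    p->plen = 4;
    p->op[0] = 0;       p->op[1] = op;
}

/* Requestor fills in its own addresses as sender and the wanted IP
 * as target; the target MAC is unknown and left as zeros. */
void arp_make_request(struct arp_packet *req, const uint8_t my_mac[6],
                      const uint8_t my_ip[4], const uint8_t wanted_ip[4])
{
    arp_fill_header(req, ARP_OP_REQUEST);
    memcpy(req->sender_mac, my_mac, 6);
    memcpy(req->sender_ip, my_ip, 4);
    memset(req->target_mac, 0, 6);
    memcpy(req->target_ip, wanted_ip, 4);
}

/* Responder becomes the sender; the original requestor is the target. */
void arp_make_reply(struct arp_packet *rep, const struct arp_packet *req,
                    const uint8_t my_mac[6])
{
    arp_fill_header(rep, ARP_OP_REPLY);
    memcpy(rep->sender_mac, my_mac, 6);
    memcpy(rep->sender_ip, req->target_ip, 4);
    memcpy(rep->target_mac, req->sender_mac, 6);
    memcpy(rep->target_ip, req->sender_ip, 4);
}
```

The sketch only builds the ARP payload; sending it requires wrapping it in an Ethernet frame (broadcast for the request, unicast for the reply).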
152
Requestor◦ Fills in own IP and MAC address as “sender”
Why include its MAC address? Mapping
◦ Fills desired host IP address in target IP address Sending
◦ Send to MAC address ff:ff:ff:ff:ff:ff Ethernet broadcast
ARP Requestop
Sender MAC address
Sender IP Address
Target MAC address
Target IP Address
• op: Operation– 1: request
• Sender– Host that wants to
determine MAC address of another machine
• Target– Other machine
153
Responder becomes “sender”◦ Fill in own IP and MAC address◦ Set requestor as target◦ Send to requestor’s MAC address
ARP Reply
op
Sender MAC address
Sender IP Address
Target MAC address
Target IP Address
• op: Operation– 2: reply
• Sender– Host with desired IP
address• Target
– Original requestor
154
Host 128.2.209.100 when plugged into CS ethernet Dest 128.2.209.100 routing to same machine Dest 128.2.0.0 other hosts on same ethernet Dest 127.0.0.0 special loopback address Dest 0.0.0.0 default route to rest of Internet
◦ Main CS router: gigrouter.net.cs.cmu.edu (128.2.254.36)
Host Routing Table Example
Destination     Gateway        Genmask          Iface
128.2.209.100   0.0.0.0        255.255.255.255  eth0
128.2.0.0       0.0.0.0        255.255.0.0      eth0
127.0.0.0       0.0.0.0        255.0.0.0        lo
0.0.0.0         128.2.254.36   0.0.0.0          eth0
Auto-configuration IP address, netmask, gateway, hostname, etc., etc.
◦ Type by hand!!!
IPv4 option 1: RARP (Reverse ARP)◦ Data-link protocol
Uses ARP format. New opcodes: “Request reverse”, “reply reverse”
◦ Send query: Request-reverse [ether addr], server responds with IP Used primarily by diskless nodes, when they first initialize, to
find their Internet address
IPv4 option 2: DHCP
◦ Dynamic Host Configuration Protocol
◦ RARP is fine for assigning an IP, but is very limited
◦ DHCP can provide all the info necessary
DHCP
DHCPOFFER◦ IP addressing information◦ Boot file/server information (for network booting)◦ DNS name servers◦ Lots of other stuff - protocol is extensible; half of the options reserved for
local site definition and use.
DHCPDISCOVER - broadcast
DHCPOFFER
DHCPREQUEST
DHCPACK
DHCP Features Lease-based assignment
◦ Clients can renew: Servers really should preserve this information across client & server reboots.
Provide host configuration information◦ Not just IP address stuff.◦ NTP servers, IP config, link layer config,…
Use:◦ Generic config for desktops/dial-in/etc.
Assign IP address/etc., from pool◦ Specific config for particular machines
Central configuration management
Network Layer4-
159
DHCP: Dynamic Host Configuration Protocol
Goal: allow host to dynamically obtain its IP address from network server when it joins network
Can renew its lease on address in use
Allows reuse of addresses (only hold address while connected and “on”)
Support for mobile users who want to join network (more shortly)
DHCP overview:◦ host broadcasts “DHCP discover” msg [optional]◦ DHCP server responds with “DHCP offer” msg [optional]◦ host requests IP address: “DHCP request” msg◦ DHCP server sends address: “DHCP ack” msg
DHCP client-server scenario
223.1.1.1
223.1.1.2
223.1.1.3
223.1.1.4 223.1.2.9
223.1.2.2
223.1.2.1
223.1.3.2223.1.3.1
223.1.3.27
A
BE
DHCP server
arriving DHCP client needsaddress in thisnetwork
DHCP client-server scenarioDHCP server: 223.1.2.5 arriving
client
time
DHCP discover
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 0.0.0.0
  transaction ID: 654
DHCP offer
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 654
  Lifetime: 3600 secs
DHCP request
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 223.1.2.4
  transaction ID: 655
  Lifetime: 3600 secs
DHCP ACK
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 655
  Lifetime: 3600 secs
DHCP: more than IP address
DHCP can return more than just allocated IP address on subnet:
  address of first-hop router for client
  name and IP address of DNS server
  network mask (indicating network versus host portion of address)
DHCP: example
connecting laptop needs its IP address, addr of first-hop router, addr of DNS server: use DHCP
[Figure: the DHCP message travels down the laptop's DHCP/UDP/IP/Eth/Phy stack and up the same stack at the router, which runs the DHCP server.]
DHCP request encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet
Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN, received at router running DHCP server
Ethernet demux’ed to IP, IP demux’ed to UDP, UDP demux’ed to DHCP
168.1.1.1
DHCP server formulates DHCP ACK containing client’s IP address, IP address of first-hop router for client, name & IP address of DNS server
[Figure: the DHCP ACK travels down the DHCP/UDP/IP/Eth/Phy stack at the router (runs DHCP) and up the same stack at the client.]
encapsulation at DHCP server, frame forwarded to client, demux’ing up to DHCP at client
client now knows its IP address, name and IP address of DNS server, IP address of its first-hop router
DHCP: example
DHCP: wireshark output (home LAN)
Message type: Boot Reply (2)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 192.168.1.101 (192.168.1.101)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Option: (t=54,l=4) Server Identifier = 192.168.1.1
Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Option: (t=3,l=4) Router = 192.168.1.1
Option: (6) Domain Name Server
    Length: 12; Value: 445747E2445749F244574092;
    IP Address: 68.87.71.226;
    IP Address: 68.87.73.242;
    IP Address: 68.87.64.146
Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."
reply
Message type: Boot Request (1)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 0.0.0.0 (0.0.0.0)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP Request
Option: (61) Client identifier
    Length: 7; Value: 010016D323688A;
    Hardware type: Ethernet
    Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Option: (t=50,l=4) Requested IP Address = 192.168.1.101
Option: (t=12,l=5) Host Name = "nomad"
Option: (55) Parameter Request List
    Length: 11; Value: 010F03062C2E2F1F21F92B
    1 = Subnet Mask; 15 = Domain Name
    3 = Router; 6 = Domain Name Server
    44 = NetBIOS over TCP/IP Name Server
    ……
request
IPv6 Auto-configuration Serverless (“Stateless”). No manual config at all.
◦ Only configures addressing items, NOT other host things Use DHCP for such things
Link-local address◦ 1111 1110 10 :: 64 bit interface ID (usually from
Ethernet addr) (fe80::/64 prefix)
◦ Uniqueness test (“anyone using this address?”)◦ Router contact (solicit, or wait for announcement)
Contains globally unique prefix Usually: Concatenate this prefix with local ID -> globally
unique IPv6 ID
167
DNS Design
DNS Today
Domain Name System
168
Need naming to identify resources
Once identified, resource must be located
How to name resource?◦ Naming hierarchy
How do we efficiently locate resources?◦ DNS: name location (IP address)
Challenge: How do we scale these to the wide area?
Naming
169
Lookup a Central DNS? Single point of failure Traffic volume Distant centralized database Single point of update Doesn’t scale!
Obvious Solutions (1)
170
Why not use /etc/hosts? Original Name to Address Mapping
◦ Flat namespace◦ Lookup mapping in /etc/hosts ◦ Downloaded regularly
Count of hosts was increasing: machine per domain machine per user◦ Many more downloads◦ Many more updates
Obvious Solutions (2)
171
Basically a wide-area distributed database of name to IP mappings
Goals:◦ Scalability◦ Decentralized maintenance◦ Robustness◦ Global scope
Names mean the same thing everywhere
◦ Don’t need Atomicity Strong consistency
Domain Name System Goals
172
Conceptually, programmers can view the DNS database as a collection of millions of host entry structures:
◦ in_addr is a struct consisting of 4-byte IP address
Functions for retrieving host entries from DNS:
◦ gethostbyname: query key is a DNS host name.
◦ gethostbyaddr: query key is an IP address.
Programmer’s View of DNS
/* DNS host entry structure */
struct hostent {
    char *h_name;       /* official domain name of host */
    char **h_aliases;   /* null-terminated array of domain names */
    int h_addrtype;     /* host address type (AF_INET) */
    int h_length;       /* length of an address, in bytes */
    char **h_addr_list; /* null-terminated array of pointers to in_addr structs */
};
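A host entry is typically consumed by walking h_addr_list. The sketch below extracts the first IPv4 address as a dotted-decimal string; first_ipv4 is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* Write the first IPv4 address of host entry 'he' into 'buf' in
 * dotted-decimal form. Returns 'buf' on success, NULL if 'he' is
 * NULL, is not an AF_INET entry, or holds no address. */
const char *first_ipv4(const struct hostent *he, char *buf, size_t len)
{
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_addr_list == NULL || he->h_addr_list[0] == NULL)
        return NULL;

    /* each h_addr_list element points to a struct in_addr */
    return inet_ntop(AF_INET, he->h_addr_list[0], buf, (socklen_t)len);
}
```

A typical call would be first_ipv4(gethostbyname("www.cmu.edu"), buf, sizeof buf) with a buffer of INET_ADDRSTRLEN bytes. Note that gethostbyname() is considered obsolete in current POSIX; getaddrinfo() is the usual replacement in new code.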
173
DNS Message Format
Header (12 bytes):
  Identification | Flags
  No. of Questions | No. of Answer RRs
  No. of Authority RRs | No. of Additional RRs
Questions (variable number of questions): name, type fields for a query
Answers (variable number of resource records): RRs in response to query
Authority (variable number of resource records): records for authoritative servers
Additional Info (variable number of resource records): additional “helpful” info that may be used
174
Identification◦ Used to match up request/response
Flags◦ 1-bit to mark query or response◦ 1-bit to mark authoritative or not◦ 1-bit to request recursive resolution◦ 1-bit to indicate support for recursive resolution
DNS Header Fields
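The 12-byte header consists of six 16-bit fields sent in network byte order. The following sketch packs one; the flag bit positions follow RFC 1035, while the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Flag bits within the 16-bit flags field (RFC 1035) */
#define DNS_FLAG_QR 0x8000u   /* 0 = query, 1 = response */
#define DNS_FLAG_AA 0x0400u   /* authoritative answer */
#define DNS_FLAG_RD 0x0100u   /* recursion desired */
#define DNS_FLAG_RA 0x0080u   /* recursion available */

struct dns_header {
    uint16_t id;        /* identification: matches request/response */
    uint16_t flags;
    uint16_t qdcount;   /* no. of questions */
    uint16_t ancount;   /* no. of answer RRs */
    uint16_t nscount;   /* no. of authority RRs */
    uint16_t arcount;   /* no. of additional RRs */
};

/* Store a 16-bit value in network byte order (big-endian). */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Serialize 'h' into the 12-byte wire format. */
void dns_header_pack(const struct dns_header *h, uint8_t out[12])
{
    put16(out + 0, h->id);
    put16(out + 2, h->flags);
    put16(out + 4, h->qdcount);
    put16(out + 6, h->ancount);
    put16(out + 8, h->nscount);
    put16(out + 10, h->arcount);
}
```

A recursive query with one question would set flags to DNS_FLAG_RD and qdcount to 1; the question section then follows the header.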
175
FOR IN class:
Type=A◦ name is hostname◦ value is IP address
Type=NS◦ name is domain (e.g. foo.com)◦ value is name of authoritative
name server for this domain
DNS Records
RR format: (class, name, value, type, ttl)
• DB contains tuples called resource records (RRs)– Classes = Internet (IN), Chaosnet (CH), etc.– Each class defines value associated with type
• Type=CNAME– name is an alias name
for some “canonical” (the real) name
– value is canonical name• Type=MX
– value is hostname of mailserver associated with name
176
Different kinds of mappings are possible:◦ Simple case: 1-1 mapping between domain name and IP
addr: kittyhawk.cmcl.cs.cmu.edu maps to 128.2.194.242
◦ Multiple domain names maps to the same IP address: eecs.mit.edu and cs.mit.edu both map to 18.62.1.6
◦ Single domain name maps to multiple IP addresses: aol.com and www.aol.com map to multiple IP addrs.
◦ Some valid domain names don’t map to any IP address: for example: cs.wisc.edu
Properties of DNS Host Entries
177
DNS Design: Hierarchy Definitions
root (.)
edunetorg
ukcom
gwu ucb wisc cmumit
cs ee
wail
• Each node in hierarchy stores a list of names that end with same suffix
• Suffix = path up tree
• E.g., given this tree, where would following be stored:
• Fred.com• Fred.edu• Fred.wisc.edu• Fred.cs.wisc.edu• Fred.cs.cmu.edu
178
DNS Design: Zone Definitions
root
edunetorg
ukcomca
gwu ucb cmu bu mit
cs ece
cmcl Single node
Subtree
Complete Tree
• Zone = contiguous section of name space
• E.g., Complete tree, single node or subtree
• A zone has an associated set of name servers
• Must store list of names and tree links
179
Zones are created by convincing owner node to create/delegate a subzone◦ Records within zone store multiple redundant
name servers◦ Primary/master name server updated manually◦ Secondary/redundant servers updated by zone
transfer of name space Zone transfer is a bulk transfer of the “configuration” of a
DNS server – uses TCP to ensure reliability
Example:◦ CS.WISC.EDU created by WISC.EDU administrators◦ Who creates WISC.EDU or .EDU?
DNS Design: Cont.
180
Responsible for “root” zone
Approx. 13 root name servers worldwide◦ Currently {a-
m}.root-servers.net
Local name servers contact root servers when they cannot resolve a name◦ Configured with
well-known root servers
DNS: Root Name Servers
181
Each host has a resolver
◦ Typically a library that applications can link to
◦ Resolver contacts name server
◦ Local name servers hand-configured (e.g. /etc/resolv.conf)
Name servers◦ Either responsible for some zone or…◦ Local servers
Do lookup of distant host names for local hosts Typically answer queries about local zone
Servers/Resolvers
182
Steps for resolving www.wisc.edu◦ Application calls gethostbyname() (RESOLVER)◦ Resolver contacts local name server (S1)
◦ S1 queries root server (S2) for (www.wisc.edu)
◦ S2 returns NS record for wisc.edu (S3)
◦ What about A record for S3? This is what the additional information section is for
(PREFETCHING)
◦ S1 queries S3 for www.wisc.edu
◦ S3 returns A record for www.wisc.edu
Can return multiple A records what does this mean?
Typical Resolution
183
Recursive query: Server goes out and
searches for more info (recursive)
Only returns final answer or “not found”
Iterative query: Server responds with as
much as it knows (iterative)
“I don’t know this name, but ask this server”
Workload impact on choice?
Local server typically does recursive
Root/distant server does iterative
Lookup Methods
requesting hostsurf.eurecom.fr
gaia.cs.umass.edu
root name server
local name serverdns.eurecom.fr
1
2
34
5 6 authoritative name server
dns.cs.umass.edu
intermediate name serverdns.umass.edu
7
8
iterated query
184
Are all servers/names likely to be equally popular?◦ Why might this be a problem? How can we solve this problem?
DNS responses are cached ◦ Quick response for repeated translations◦ Other queries may reuse some parts of lookup
NS records for domains
DNS negative queries are cached◦ Don’t have to repeat past mistakes◦ E.g. misspellings, search strings in resolv.conf
Cached data periodically times out◦ Lifetime (TTL) of data controlled by owner of data◦ TTL passed with every record
Workload and Caching
185
Typical Resolution
Clientresolver
Local DNS server
root & edu DNS server
ns1.wisc.edu DNS server
www.cs.wisc.edu
NS ns1.wisc.eduwww.cs.wisc.edu
NS ns1.cs.wisc.edu
A www=IPaddr
ns1.cs.wisc.eduDNS
server
186
Subsequent Lookup Example
ClientLocal
DNS server
root & edu DNS server
wisc.edu DNS server
cs.wisc.eduDNS
server
ftp.cs.wisc.edu
ftp=IPaddr
ftp.cs.wisc.edu
187
DNS servers are replicated◦ Name service available if ≥ one replica is up◦ Queries can be load balanced between replicas
UDP used for queries◦ Need reliability must implement this on top of UDP!◦ Why not just use TCP?
Try alternate servers on timeout◦ Exponential backoff when retrying same server
Same identifier for all queries◦ Don’t care which server responds
Reliability
188
Task◦ Given IP address, find its name◦ When is this needed?
Method◦ Maintain separate hierarchy
based on IP names◦ Write 128.2.194.242 as
242.194.2.128.in-addr.arpa Why is the address reversed?
Managing◦ Authority manages IP addresses
assigned to it◦ E.g., CMU manages name space
2.128.in-addr.arpa
Reverse DNS
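Building the reverse-lookup name is a matter of reversing the four octets and appending in-addr.arpa. A minimal sketch for IPv4, with the hypothetical helper name reverse_name:

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdio.h>

/* Write the in-addr.arpa name for dotted-decimal IPv4 address 'ip'
 * into 'out'. Returns 0 on success, -1 on a bad address or if 'out'
 * is too small. */
int reverse_name(const char *ip, char *out, size_t len)
{
    struct in_addr a;
    const unsigned char *b;
    int n;

    if (inet_pton(AF_INET, ip, &a) != 1)
        return -1;

    /* s_addr is in network byte order: b[0] is the first octet */
    b = (const unsigned char *)&a.s_addr;
    n = snprintf(out, len, "%u.%u.%u.%u.in-addr.arpa",
                 (unsigned)b[3], (unsigned)b[2],
                 (unsigned)b[1], (unsigned)b[0]);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The octets are reversed because DNS names are most specific on the left while IP addresses are most specific on the right; reversing lets the owner of 128.2.x.x manage the zone 2.128.in-addr.arpa.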
edu
cmu
cs
kittyhawk128.2.194.242
cmcl
unnamed root
arpa
in-addr
128
2
194
242
189
Name servers can add additional data to response
Typically used for prefetching◦ CNAME/MX/NS typically point to another host
name◦ Responses include address of host referred to in
“additional section”
Prefetching
190
Generic Top Level Domains (gTLD) = .com, .net, .org, etc…
Country Code Top Level Domain (ccTLD) = .us, .ca, .fi, .uk, etc…
Root server ({a-m}.root-servers.net) also used to cover gTLD domains◦ Load on root servers was growing quickly!◦ Moving .com, .net, .org off root servers was
clearly necessary to reduce load done Aug 2000
DNS Today: Root Zone
191
.info general info
.biz businesses
.aero air-transport industry
.coop business cooperatives
.name individuals
.pro accountants, lawyers, and physicians
.museum museums
Only new ones active so far = .info, .biz, .name
New gTLDs
192
No centralized caching per site◦ Each machine runs own caching local server◦ Why is this a problem?◦ How many hosts do we need to share cache? recent studies
suggest 10-20 hosts
Hit rate for DNS = 80%, where hit rate = 1 - (#DNS/#connections)
◦ Is this good or bad?
Most Internet traffic is Web
◦ What does a typical page look like? average of 4-5 embedded objects, needs 4-5 transfers
◦ This alone accounts for 80% hit rate!
Lower TTLs for A records does not affect performance
DNS performance really relies more on NS-record caching
DNS Performance
Programmers Perspective
194
Socket API
  introduced in BSD4.1 UNIX, 1981
  explicitly created, used, released by apps
  client/server paradigm
  two types of transport service via socket API:
  ◦ unreliable datagram
  ◦ reliable, byte stream-oriented
Socket programming
socket: a host-local, application-created, OS-controlled interface (a “door”) into which application process can both send and receive messages to/from another application process
Goal: learn how to build client/server applications that communicate using sockets
195
Server and Client
TCP/UDP
IP
Ethernet Adapter
Server
TCP/UDP
IP
Ethernet Adapter
Clients
Server and Client exchange messages over the network through a common Socket API
Socket API
hardware
kernel space
user spaceports
196
Socket: a door between application process and end-to-end transport protocol (UDP or TCP)
TCP service: reliable transfer of bytes from one process to another
Socket-programming using TCP
process
TCP withbuffers,
variables
socket
controlled byapplicationdeveloper
controlled byoperating
system
host orserver
process
TCP withbuffers,
variables
socket
controlled byapplicationdeveloper
controlled byoperatingsystem
host orserver
internet
197
Client must contact server
  server process must first be running
  server must have created socket (door) that welcomes client’s contact
Client contacts server by:
  creating client-local TCP socket
  specifying IP address, port number of server process
  When client creates socket: client TCP establishes connection to server TCP
When contacted by client, server TCP creates new socket for server process to communicate with client
◦ allows server to talk with multiple clients
◦ source port numbers used to distinguish clients (more in Chap 3)
Socket programming with TCP
application viewpoint: TCP provides reliable, in-order transfer of bytes (“pipe”) between client and server
198
A stream is a sequence of characters that flow into or out of a process.
An input stream is attached to some input source for the process, e.g., keyboard or socket.
An output stream is attached to an output destination, e.g., monitor or socket.
Stream jargon
Example client-server app:
1) client reads line from standard input (inFromUser stream) , sends to server via socket (outToServer stream)
2) server reads line from socket
3) server converts line to uppercase, sends back to client
4) client reads, prints modified line from socket (inFromServer stream)
Socket programming with TCP
[Figure: client process. The inFromUser input stream carries keyboard input into the process; the outToServer output stream (to network) and the inFromServer input stream (from network) are attached to clientSocket, the client TCP socket; results go to the monitor.]
Client/server socket interaction: TCP
Server (running on hostid)
◦ create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
◦ wait for incoming connection request: connectionSocket = welcomeSocket.accept()
◦ read request from connectionSocket
◦ write reply to connectionSocket
◦ close connectionSocket
Client
◦ create socket, connect to hostid, port=x: clientSocket = Socket() (TCP connection setup)
◦ send request using clientSocket
◦ read reply from clientSocket
◦ close clientSocket
Socket programming with UDP
UDP: no “connection” between client and server
◦ no handshaking
◦ sender explicitly attaches IP address and port of destination to each packet
◦ server must extract IP address, port of sender from received packet
UDP: transmitted data may be received out of order, or lost
application viewpoint: UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server
Client/server socket interaction: UDP
Server (running on hostid)
◦ create socket, port=x, for incoming request: serverSocket = DatagramSocket()
◦ read request from serverSocket
◦ write reply to serverSocket, specifying client host address, port number
Client
◦ create socket: clientSocket = DatagramSocket()
◦ create, address (hostid, port=x), send datagram request using clientSocket
◦ read reply from clientSocket
◦ close clientSocket
Example: Java client (UDP)
[Figure: client process. The inFromUser input stream carries keyboard input into the process; sendPacket (a UDP packet, to network) and receivePacket (a UDP packet, from network) pass through clientSocket, the client UDP socket; results go to the monitor.]
Output: sends packet (TCP sent “byte stream”)
Input: receives packet (TCP received “byte stream”)
Socket address structure
This contains the protocol-specific addressing information that is passed from the user process to the kernel and vice versa
Each protocol supported by a socket implementation has its own socket address structure, sockaddr_suffix, where suffix represents the protocol family
Ex: sockaddr_in – Internet/IPv4 socket address structure
    sockaddr_ipx – IPX socket address structure
The generic socket address structure
sockaddr {
    address family
    protocol-specific data
};
The Internet/IPv4 socket address structure
sockaddr_in {
    sin_family          Internet address family
    sin_port            transport-layer port number
    in_addr sin_addr    IP address
    sin_zero[8]         padding
};
int8_t signed 8-bit integer - <sys/types.h>
uint8_t unsigned 8-bit integer - <sys/types.h>
int16_t signed 16-bit integer - <sys/types.h>
uint16_t unsigned 16-bit integer - <sys/types.h>
int32_t signed 32-bit integer - <sys/types.h>
uint32_t unsigned 32-bit integer - <sys/types.h>
sa_family_t address family of socket address structure - <sys/socket.h>
socklen_t length of socket address structure - <sys/socket.h>
in_addr_t IPv4 address, normally uint32_t - <netinet/in.h>
in_port_t TCP/UDP port, normally uint16_t - <netinet/in.h>
Datatypes required by POSIX
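Byte ordering
◦ Network byte order (big-endian)
◦ Host byte order (depends on the CPU)
◦ htons(), htonl(): host to network, short (16-bit) and long (32-bit)
◦ ntohs(), ntohl(): network to host, short and long
Memory content initialization
◦ memset(buffer, value, buffersize)
Data copying and comparison
◦ memcpy(dest, src, num_of_bytes)
◦ memcmp(buffer1, buffer2, num_of_bytes)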
Data manipulation functions
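The byte-ordering and copying calls listed above can be combined in a small sketch. It shows what a 16-bit port looks like "on the wire" after htons(), independent of the host CPU. The helper names port_to_wire and port_from_wire are hypothetical, chosen only for this illustration.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>      /* memcpy(), NULL */

/* Copy the network-byte-order form of a 16-bit port into out[0..1].
 * port_to_wire is a hypothetical helper name used for this sketch. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);      /* host order -> network (big-endian) */
    assert(out != NULL);
    memcpy(out, &net, sizeof net);   /* bytes exactly as they would be sent */
}

/* Inverse: rebuild the host-order port from two wire bytes. */
uint16_t port_from_wire(const unsigned char in[2])
{
    uint16_t net;
    assert(in != NULL);
    memcpy(&net, in, sizeof net);
    return ntohs(net);               /* network order -> host order */
}
```

For port 80 the two wire bytes are 0x00 followed by 0x50 on every machine, which is why sin_port is always filled in through htons().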
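IP address notation conversion
◦ Integer notation
◦ Dotted decimal notation
status inet_aton(ddstring_pointer, address_pointer)
◦ Returns 1 on success, 0 on error
ddstring_pointer inet_ntoa(address)
◦ Takes the struct in_addr by value; returns a pointer to a static buffer that the next call overwrites
address inet_addr(ddstring_pointer) *deprecated
◦ Returns the address as an in_addr_t; its error value INADDR_NONE cannot be told apart from 255.255.255.255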
Continued..
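Since inet_addr is marked deprecated above, here is a minimal sketch of the same two conversions using the POSIX calls inet_pton() and inet_ntop(), which report errors unambiguously. The wrapper names parse_ipv4 and format_ipv4 are hypothetical.

```c
#include <arpa/inet.h>    /* inet_pton(), inet_ntop(), ntohl() */
#include <assert.h>
#include <netinet/in.h>   /* struct in_addr, INET_ADDRSTRLEN */
#include <stddef.h>       /* size_t, NULL */
#include <sys/socket.h>   /* AF_INET, socklen_t */

/* Parse dotted-decimal text into a struct in_addr (network byte order).
 * Returns 1 on success, 0 if the text is not a valid IPv4 address. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    assert(text != NULL && out != NULL);
    return inet_pton(AF_INET, text, out) == 1;
}

/* Format a struct in_addr back into dotted-decimal text.
 * buf should hold at least INET_ADDRSTRLEN bytes.
 * Returns buf on success, NULL on error. */
const char *format_ipv4(const struct in_addr *addr, char *buf, size_t len)
{
    assert(addr != NULL && buf != NULL);
    return inet_ntop(AF_INET, addr, buf, (socklen_t)len);
}
```

Unlike inet_ntoa, inet_ntop writes into a caller-supplied buffer, so the result is not overwritten by a later call.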
sockfd socket(domain, type, protocol)
◦ domain is the protocol/address family: AF_INET, AF_IPX, ...
◦ type is the type of service: SOCK_DGRAM, SOCK_STREAM, ...
◦ protocol is the specific protocol within the family given as the first parameter (0 selects the default for the type)
◦ Returns a fresh socket descriptor on success, –1 on error
status close(sockfd)
◦ Releases the descriptor; for a TCP socket, data already queued is normally still sent before the connection is terminated
◦ Returns 0 on success, –1 on error
Initialisation and Shutdown
status bind(sockfd, ptr_to_sockaddr, sockaddr_size)
◦ Associates the sockaddr with sockfd
◦ The rules for successful binding depend on the protocol family of the socket (specified during the call to socket)
◦ Necessary for receiving connections on a STREAM socket
status listen(sockfd, backlog)
◦ Notifies the willingness to accept connections
◦ backlog: maximum number of established connections not yet handed to the user process by calls to accept
◦ On an unbound socket an implicit bind is done with INADDR_ANY as the address and a random port
* Above calls return –1 on error
Connection Management
struct sockaddr_in {
    unsigned short sin_family;   /* address family (always AF_INET) */
    unsigned short sin_port;     /* port num in network byte order */
    struct in_addr sin_addr;     /* IP addr in network byte order */
    unsigned char sin_zero[8];   /* pad to sizeof(struct sockaddr) */
};
connfd accept(sockfd, ptr_to_sockaddr, ptr_to_sockaddr_size)
◦ Blocks until a connection is established on sockfd and returns a new file descriptor on which I/O can be performed with the remote entity
◦ Fills the sockaddr and size parameters with the address information (and its size, respectively) of the connecting entity
◦ bind and listen are assumed to have been called on sockfd prior to calling accept
status connect(sockfd, ptr_to_sockaddr, sockaddr_size)
◦ Initiates a new connection with the entity addressed by sockaddr in case of a STREAM socket
◦ Sets the default remote address for I/O in case of a DGRAM socket
* Above calls return –1 on error
Continued…
SEND: ssize_t send(int sockfd, const void *msg, size_t len, int flags);
◦ msg: message you want to send
◦ len: length of the message
◦ flags := 0
◦ returned: the number of bytes actually sent (may be fewer than len), –1 on error
RECEIVE: ssize_t recv(int sockfd, void *buf, size_t len, int flags);
◦ buf: buffer to receive the message
◦ len: length of the buffer (“don’t give me more!”)
◦ flags := 0
◦ returned: the number of bytes received, –1 on error
SEND (DGRAM-style): ssize_t sendto(int sockfd, const void *msg, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
◦ msg: message you want to send
◦ len: length of the message
◦ flags := 0
◦ to: socket address of the remote process
◦ tolen := sizeof(struct sockaddr)
◦ returned: the number of bytes actually sent, –1 on error
RECEIVE (DGRAM-style): ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
◦ buf: buffer to receive the message
◦ len: length of the buffer (“don’t give me more!”)
◦ flags := 0
◦ from: socket address of the process that sent the data
◦ fromlen := sizeof(struct sockaddr) on input; set to the actual address length on return
◦ returned: the number of bytes received, –1 on error
CLOSE: close(sockfd);
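Because send() returns "the number of bytes actually sent", a caller on a stream socket usually loops until the whole message has gone out. The following is a minimal sketch of such a loop; send_all is a hypothetical helper name, not part of the sockets API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>       /* size_t, NULL */
#include <sys/socket.h>   /* send() */
#include <sys/types.h>    /* ssize_t */

/* Keep calling send() until all len bytes are written or an error occurs.
 * Returns 0 on success, -1 on error (errno is set by send()).
 * send_all is a hypothetical helper name. */
int send_all(int sockfd, const void *msg, size_t len)
{
    const char *p = msg;
    assert(msg != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(sockfd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* skip the part the kernel accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

The same pattern applies on the receiving side, where recv() may return fewer bytes than the buffer can hold.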
Concurrent server
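Client+server: connection-oriented
[Figure: call sequence for a connection-oriented client and server: SOCKET, BIND, LISTEN, and ACCEPT on the server; CONNECT on the client, matched with ACCEPT by the TCP three-way handshake; then SEND and RECEIVE in both directions, and CLOSE.]
Client+server: connectionless
[Figure: call sequence for a connectionless client and server: CREATE and BIND, then SEND and RECEIVE in both directions, and CLOSE.]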
Step by Step Explanation of having Connection Oriented Server and Client
For example: web server
What does a web server need to do so that a web client can connect to it?
[Figure: TCP server. A web server on port 80, above TCP, IP, and the Ethernet adapter.]
Socket I/O: socket()
Since web traffic uses TCP, the web server must create a socket of type SOCK_STREAM

int fd;    /* socket descriptor */

if((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    exit(1);
}

• socket returns an integer (socket descriptor)
• fd < 0 indicates that an error occurred
• AF_INET associates a socket with the Internet protocol family
• SOCK_STREAM selects the TCP protocol
Socket I/O: bind()
A socket can be bound to a port

int fd;                    /* socket descriptor */
struct sockaddr_in srv;    /* used by bind() */

/* create the socket */

srv.sin_family = AF_INET;    /* use the Internet addr family */

srv.sin_port = htons(80);    /* bind socket 'fd' to port 80 */

/* bind: a client may connect to any of my addresses */
srv.sin_addr.s_addr = htonl(INADDR_ANY);

if(bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}

• Still not quite ready to communicate with a client...
Socket I/O: listen()
listen indicates that the server will accept a connection

int fd;                    /* socket descriptor */
struct sockaddr_in srv;    /* used by bind() */

/* 1) create the socket */
/* 2) bind the socket to a port */

if(listen(fd, 5) < 0) {
    perror("listen");
    exit(1);
}

• Still not quite ready to communicate with a client...
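The three server-side steps so far (socket, bind, listen) can be gathered into one function. This is a minimal sketch under stated assumptions: make_listener is a hypothetical helper name, and passing port 0 (so the kernel picks a free port, read back with getsockname()) is a convenience added here that the slides do not use.

```c
#include <arpa/inet.h>    /* htons(), htonl(), ntohs() */
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <string.h>       /* memset() */
#include <sys/socket.h>   /* socket(), bind(), listen(), getsockname() */
#include <unistd.h>       /* close() */

/* Create a TCP socket, bind it to `port` on all local addresses, and listen.
 * With port 0 the kernel chooses a free port. The port actually bound is
 * stored in *bound_port. Returns the listening descriptor, or -1 on error.
 * make_listener is a hypothetical helper name. */
int make_listener(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in srv;
    socklen_t len = sizeof srv;
    int fd;

    assert(bound_port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    srv.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&srv, sizeof srv) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&srv, &len) < 0) {
        close(fd);                 /* do not leak the descriptor on failure */
        return -1;
    }
    *bound_port = ntohs(srv.sin_port);
    return fd;
}
```

The descriptor returned here is the one later passed to accept().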
Socket I/O: accept()
accept blocks waiting for a connection

int fd;                            /* socket descriptor */
struct sockaddr_in srv;            /* used by bind() */
struct sockaddr_in cli;            /* used by accept() */
int newfd;                         /* returned by accept() */
socklen_t cli_len = sizeof(cli);   /* used by accept() */

/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */

newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}

• accept returns a new socket (newfd) with the same properties as the original socket (fd)
• newfd < 0 indicates that an error occurred
Socket I/O: accept() continued...

struct sockaddr_in cli;            /* used by accept() */
int newfd;                         /* returned by accept() */
socklen_t cli_len = sizeof(cli);   /* used by accept() */

newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}

• How does the server know which client it is?
  • cli.sin_addr.s_addr contains the client’s IP address
  • cli.sin_port contains the client’s port number
• Now the server can exchange data with the client by using read and write on the descriptor newfd.
• Why does accept need to return a new descriptor?
Socket I/O: read()
read can be used with a socket
read blocks waiting for data from the client but does not guarantee that sizeof(buf) is read

int fd;           /* socket descriptor */
int newfd;        /* returned by accept() */
char buf[512];    /* used by read() */
int nbytes;       /* used by read() */

/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */
/* 4) accept the incoming connection */

if((nbytes = read(newfd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
For example: web client
How does a web client connect to a web server?
[Figure: TCP client. Two web clients, above TCP, IP, and the Ethernet adapter.]
Dealing with IP Addresses
IP addresses are commonly written as strings ("128.2.35.50"), but programs deal with IP addresses as integers.

Converting strings to numerical address:

struct sockaddr_in srv;

srv.sin_addr.s_addr = inet_addr("128.2.35.50");
if(srv.sin_addr.s_addr == (in_addr_t) -1) {
    fprintf(stderr, "inet_addr failed!\n");
    exit(1);
}

Converting a numerical address to a string:

struct sockaddr_in srv;
char *t = inet_ntoa(srv.sin_addr);
if(t == 0) {
    fprintf(stderr, "inet_ntoa failed!\n");
    exit(1);
}
Translating Names to Addresses
gethostbyname provides an interface to DNS
Additional useful calls
◦ gethostbyaddr – returns a hostent given an address
◦ getservbyname – used to get a service description (typically the port number); returns a servent based on name
#include <netdb.h>

struct hostent *hp;              /* ptr to host info for remote */
struct sockaddr_in peeraddr;
char *name = "www.cs.cmu.edu";

peeraddr.sin_family = AF_INET;
hp = gethostbyname(name);
if(hp == NULL) {                 /* lookup failed; h_errno holds the reason */
    fprintf(stderr, "gethostbyname failed!\n");
    exit(1);
}
peeraddr.sin_addr.s_addr = ((struct in_addr*)(hp->h_addr))->s_addr;
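The slides use gethostbyname; POSIX also provides getaddrinfo(), which handles host and service lookup in one call and is the usual choice in newer code. The sketch below swaps in that call to show the same name-to-address step. resolve_ipv4 is a hypothetical helper name, and restricting the lookup to IPv4/TCP is an assumption made so the result fits a sockaddr_in.

```c
#include <arpa/inet.h>    /* ntohs(), ntohl() for inspecting the result */
#include <assert.h>
#include <netdb.h>        /* getaddrinfo(), freeaddrinfo() */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <string.h>       /* memset(), memcpy() */
#include <sys/socket.h>   /* AF_INET, SOCK_STREAM */

/* Resolve host/service to the first IPv4 TCP address and copy it into *out.
 * Returns 0 on success or a non-zero getaddrinfo() error code
 * (gai_strerror() turns it into text).
 * resolve_ipv4 is a hypothetical helper name. */
int resolve_ipv4(const char *host, const char *service, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    assert(host != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;         /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0)
        return rc;

    memcpy(out, res->ai_addr, sizeof *out);   /* family, port, and address */
    freeaddrinfo(res);
    return 0;
}
```

The filled-in structure can be passed straight to connect(), since the port and address are already in network byte order.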
Socket I/O: connect()
connect allows a client to connect to a server...

int fd;                    /* socket descriptor */
struct sockaddr_in srv;    /* used by connect() */

/* create the socket */

/* connect: use the Internet address family */
srv.sin_family = AF_INET;

/* connect: socket 'fd' to port 80 */
srv.sin_port = htons(80);

/* connect: connect to IP Address "128.2.35.50" */
srv.sin_addr.s_addr = inet_addr("128.2.35.50");

if(connect(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("connect");
    exit(1);
}
Socket I/O: write()
write can be used with a socket

int fd;                    /* socket descriptor */
struct sockaddr_in srv;    /* used by connect() */
char buf[512];             /* used by write() */
int nbytes;                /* used by write() */

/* 1) create the socket */
/* 2) connect() to the server */

/* Example: A client could "write" a request to a server */
if((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}
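Review: TCP Client-Server Interaction
TCP Server: socket(), bind(), listen(), accept(), read(), write(), read(), close()
TCP Client: socket(), connect(), write(), read(), close()
◦ connection establishment: client connect() is matched by server accept()
◦ data request: client write() to server read()
◦ data reply: server write() to client read()
◦ end-of-file notification: client close() ends the server's final read()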
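Example: C client (TCP)
/* client.c */
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold an IP address */
    int clientSocket;         /* socket descriptor */
    struct hostent *ptrh;     /* pointer to a host table entry */
    char *host;               /* server host name, from argv[1] */
    int port, n;              /* server port, from argv[2]; byte counts */
    char Sentence[128];
    char modifiedSentence[128];

    if (argc < 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); exit(1); }
    host = argv[1];
    port = atoi(argv[2]);

    /* Create client socket, connect to server */
    clientSocket = socket(PF_INET, SOCK_STREAM, 0);
    if (clientSocket < 0) { perror("socket"); exit(1); }
    memset((char *)&sad, 0, sizeof(sad));   /* clear sockaddr structure */
    sad.sin_family = AF_INET;               /* set family to Internet */
    sad.sin_port = htons((unsigned short)port);
    ptrh = gethostbyname(host);             /* convert host name to IP address */
    if (ptrh == NULL) { fprintf(stderr, "unknown host %s\n", host); exit(1); }
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);
    if (connect(clientSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("connect"); exit(1); }

Example: C client (TCP), cont.
    /* Get input line from user (fgets instead of the unsafe gets) */
    if (fgets(Sentence, sizeof(Sentence), stdin) == NULL) exit(1);

    /* Send line to server */
    n = write(clientSocket, Sentence, strlen(Sentence) + 1);
    if (n < 0) { perror("write"); exit(1); }

    /* Read line from server */
    n = read(clientSocket, modifiedSentence, sizeof(modifiedSentence) - 1);
    if (n < 0) { perror("read"); exit(1); }
    modifiedSentence[n] = '\0';
    printf("FROM SERVER: %s\n", modifiedSentence);

    /* Close connection */
    close(clientSocket);
    return 0;
}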
Example: C server (TCP)
/* server.c */
#include <ctype.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's address */
    struct sockaddr_in cad;   /* structure to hold the client's address */
    int welcomeSocket, connectionSocket;   /* socket descriptors */
    socklen_t alen;           /* length of the client address */
    int port, n, i;
    char clientSentence[128];
    char capitalizedSentence[128];

    if (argc < 2) { fprintf(stderr, "usage: %s port\n", argv[0]); exit(1); }
    port = atoi(argv[1]);

    /* Create welcoming socket at port & bind a local address */
    welcomeSocket = socket(PF_INET, SOCK_STREAM, 0);
    if (welcomeSocket < 0) { perror("socket"); exit(1); }
    memset((char *)&sad, 0, sizeof(sad));         /* clear sockaddr structure */
    sad.sin_family = AF_INET;                     /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);      /* set the local IP address */
    sad.sin_port = htons((unsigned short)port);   /* set the port number */
    if (bind(welcomeSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("bind"); exit(1); }

Example: C server (TCP), cont.
    /* Specify the maximum number of clients that can be queued */
    if (listen(welcomeSocket, 10) < 0) { perror("listen"); exit(1); }

    while (1) {
        /* Wait, on welcoming socket, for contact by a client */
        alen = sizeof(cad);
        connectionSocket = accept(welcomeSocket, (struct sockaddr *)&cad, &alen);
        if (connectionSocket < 0) { perror("accept"); continue; }

        n = read(connectionSocket, clientSentence, sizeof(clientSentence));

        /* capitalize Sentence and store the result in capitalizedSentence */
        for (i = 0; i < n; i++)
            capitalizedSentence[i] = (char)toupper((unsigned char)clientSentence[i]);

        /* Write out the result to socket */
        if (n > 0)
            n = write(connectionSocket, capitalizedSentence, n);

        close(connectionSocket);
    }   /* End of while loop: loop back and wait for another client connection */
}
Outline for typical concurrent server
[Figure: status transition of the client/server sockets at three points: after return from accept, after fork() returns, and after socket close().]
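The three stages named above (accept returns, fork returns, sockets are closed) can be sketched as one function that hands an accepted connection to a child process. This is an illustration under assumptions: spawn_handler and reap_handler are hypothetical names, and the child's work is reduced to echoing a single message.

```c
#include <assert.h>
#include <sys/socket.h>  /* connfd is a connected socket, e.g. from accept() */
#include <sys/types.h>   /* pid_t, ssize_t */
#include <sys/wait.h>    /* waitpid() */
#include <unistd.h>      /* fork(), read(), write(), close(), _exit() */

/* Hand one accepted connection to a child process.
 * After fork() both processes hold both descriptors, so each closes the one
 * it does not need: the child closes the listening socket, the parent closes
 * the connected socket and can go straight back to accept().
 * Pass listenfd = -1 if there is no listening socket to close.
 * Returns the child's pid, or -1 if fork() failed (connfd is then left open).
 * spawn_handler is a hypothetical helper name. */
pid_t spawn_handler(int listenfd, int connfd)
{
    pid_t pid;
    assert(connfd >= 0);
    pid = fork();
    if (pid == 0) {                       /* child */
        char buf[512];
        ssize_t n;
        if (listenfd >= 0)
            close(listenfd);              /* the child never accepts */
        n = read(connfd, buf, sizeof buf);
        if (n > 0)
            n = write(connfd, buf, (size_t)n);   /* echo it back */
        close(connfd);
        _exit(n < 0 ? 1 : 0);
    }
    if (pid > 0)
        close(connfd);                    /* parent: the child owns the connection */
    return pid;
}

/* Wait for one child so that it does not remain a zombie.
 * Returns 0 if it exited normally with status 0, -1 otherwise. */
int reap_handler(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A long-running server would typically reap children from a SIGCHLD handler or with non-blocking waitpid() calls instead of waiting for each one in turn.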
Socket programming with UDP
UDP: no “connection” between client and server
• no handshaking
• sender explicitly attaches IP address and port of destination to each packet
• server must extract IP address, port of sender from received packet
UDP: transmitted data may be received out of order, or lost
application viewpoint: UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server
UDP Server Example
For example: NTP daemon
What does a UDP server need to do so that a UDP client can communicate with it?
[Figure: UDP server. An NTP daemon on port 123, above UDP, IP, and the Ethernet adapter.]
Socket I/O: socket()
The UDP server must create a datagram socket…

int fd;    /* socket descriptor */

if((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
}

• socket returns an integer (socket descriptor)
• fd < 0 indicates that an error occurred
• AF_INET: associates a socket with the Internet protocol family
• SOCK_DGRAM: selects the UDP protocol
Socket I/O: bind()
A socket can be bound to a port

int fd;                    /* socket descriptor */
struct sockaddr_in srv;    /* used by bind() */

/* create the socket */

/* bind: use the Internet address family */
srv.sin_family = AF_INET;

/* bind: socket 'fd' to port 80 */
srv.sin_port = htons(80);

/* bind: a client may connect to any of my addresses */
srv.sin_addr.s_addr = htonl(INADDR_ANY);

if(bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}

• Now the UDP server is ready to accept packets…
Socket I/O: recvfrom()
read does not provide the client’s address to the UDP server

int fd;                            /* socket descriptor */
struct sockaddr_in srv;            /* used by bind() */
struct sockaddr_in cli;            /* used by recvfrom() */
char buf[512];                     /* used by recvfrom() */
socklen_t cli_len = sizeof(cli);   /* used by recvfrom() */
int nbytes;                        /* used by recvfrom() */

/* 1) create the socket */
/* 2) bind to the socket */

nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */,
                  (struct sockaddr*) &cli, &cli_len);
if(nbytes < 0) {
    perror("recvfrom");
    exit(1);
}
Socket I/O: recvfrom() continued...

nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */,
                  (struct sockaddr*) &cli, &cli_len);

• The actions performed by recvfrom
  • returns the number of bytes read (nbytes)
  • copies nbytes of data into buf
  • returns the address of the client (cli)
  • returns the length of cli (cli_len)
  • don’t worry about flags
UDP Client Example
How does a UDP client communicate with a UDP server?
[Figure: UDP client. Two UDP clients with their ports, above UDP, IP, and the Ethernet adapter.]
Socket I/O: sendto()
write is not allowed on an unconnected UDP socket, because it carries no destination address
Notice that the UDP client does not bind a port number
◦ a port number is dynamically assigned when the first sendto is called

int fd;                    /* socket descriptor */
struct sockaddr_in srv;    /* used by sendto() */
char buf[512];             /* used by sendto() */
int nbytes;                /* used by sendto() */

/* 1) create the socket */

/* sendto: send data to IP Address "128.2.35.50" port 80 */
srv.sin_family = AF_INET;
srv.sin_port = htons(80);
srv.sin_addr.s_addr = inet_addr("128.2.35.50");

nbytes = sendto(fd, buf, sizeof(buf), 0 /* flags */,
                (struct sockaddr*) &srv, sizeof(srv));
if(nbytes < 0) {
    perror("sendto");
    exit(1);
}
Review: UDP Client-Server Interaction
UDP Server: socket(), bind(), recvfrom() (blocks until datagram received from a client), sendto()
UDP Client: socket(), sendto(), recvfrom(), close()
◦ data request: client sendto() to server recvfrom()
◦ data reply: server sendto() to client recvfrom()
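The request/reply exchange above can be run end to end inside a single process, because a datagram sent over the loopback interface is queued on the receiving socket until recvfrom() is called. The sketch below does exactly that; udp_echo_once is a hypothetical name, and the use of 127.0.0.1 with a kernel-chosen port is an assumption made to keep the example self-contained.

```c
#include <arpa/inet.h>    /* htonl(), htons() */
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>       /* memset(), strlen() */
#include <sys/socket.h>   /* socket(), bind(), sendto(), recvfrom() */
#include <sys/types.h>    /* ssize_t */
#include <unistd.h>       /* close() */

/* One UDP request/reply over loopback: the "client" sends msg, the "server"
 * receives it and echoes it to the address reported by recvfrom().
 * Returns the number of bytes placed in reply, or -1 on error.
 * udp_echo_once is a hypothetical helper name. */
ssize_t udp_echo_once(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in srv, cli;
    socklen_t len = sizeof srv, clilen = sizeof cli;
    char buf[512];
    ssize_t n = -1;
    int s, c;

    assert(msg != NULL && reply != NULL);
    s = socket(AF_INET, SOCK_DGRAM, 0);   /* server socket */
    c = socket(AF_INET, SOCK_DGRAM, 0);   /* client socket, never bound explicitly */
    if (s < 0 || c < 0)
        goto done;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);              /* let the kernel choose a port */
    if (bind(s, (struct sockaddr *)&srv, sizeof srv) < 0 ||
        getsockname(s, (struct sockaddr *)&srv, &len) < 0)
        goto done;

    /* client: data request (this first sendto also assigns the client port) */
    if (sendto(c, msg, strlen(msg), 0, (struct sockaddr *)&srv, sizeof srv) < 0)
        goto done;

    /* server: learn the client's address from recvfrom, then reply to it */
    n = recvfrom(s, buf, sizeof buf, 0, (struct sockaddr *)&cli, &clilen);
    if (n < 0 || sendto(s, buf, (size_t)n, 0, (struct sockaddr *)&cli, clilen) < 0) {
        n = -1;
        goto done;
    }

    /* client: data reply */
    n = recvfrom(c, reply, replylen, 0, NULL, NULL);
done:
    if (s >= 0) close(s);
    if (c >= 0) close(c);
    return n;
}
```

In a real deployment the two halves run in separate processes, and the client should be prepared for the reply never arriving, since UDP gives no delivery guarantee.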
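Example: C client (UDP)
/* client.c */
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold an IP address */
    int clientSocket;         /* socket descriptor */
    struct hostent *ptrh;     /* pointer to a host table entry */
    char *host;               /* server host name, from argv[1] */
    int port, n;              /* server port, from argv[2]; byte counts */
    socklen_t addr_len;       /* length of the server address */
    char Sentence[128];
    char modifiedSentence[128];

    if (argc < 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); exit(1); }
    host = argv[1];
    port = atoi(argv[2]);

    /* Create client socket, NO connection to server */
    clientSocket = socket(PF_INET, SOCK_DGRAM, 0);
    if (clientSocket < 0) { perror("socket"); exit(1); }

    /* determine the server's address */
    memset((char *)&sad, 0, sizeof(sad));   /* clear sockaddr structure */
    sad.sin_family = AF_INET;               /* set family to Internet */
    sad.sin_port = htons((unsigned short)port);
    ptrh = gethostbyname(host);             /* convert host name to IP address */
    if (ptrh == NULL) { fprintf(stderr, "unknown host %s\n", host); exit(1); }
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);

Example: C client (UDP), cont.
    /* Get input line from user (fgets instead of the unsafe gets) */
    if (fgets(Sentence, sizeof(Sentence), stdin) == NULL) exit(1);

    /* Send line to server */
    addr_len = sizeof(sad);
    n = sendto(clientSocket, Sentence, strlen(Sentence) + 1, 0,
               (struct sockaddr *)&sad, addr_len);
    if (n < 0) { perror("sendto"); exit(1); }

    /* Read line from server */
    n = recvfrom(clientSocket, modifiedSentence, sizeof(modifiedSentence) - 1, 0,
                 (struct sockaddr *)&sad, &addr_len);
    if (n < 0) { perror("recvfrom"); exit(1); }
    modifiedSentence[n] = '\0';
    printf("FROM SERVER: %s\n", modifiedSentence);

    /* Close the socket */
    close(clientSocket);
    return 0;
}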
Example: C server (UDP)
/* server.c */
int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's address */
    struct sockaddr_in cad;   /* structure to hold the client's address */
    int serverSocket;         /* socket descriptor */
    int port;                 /* port number, from argv[1] */
    char clientSentence[128];
    char capitalizedSentence[128];

    port = atoi(argv[1]);

    /* Create server socket at port & bind a local address */
    serverSocket = socket(PF_INET, SOCK_DGRAM, 0);
    memset((char *)&sad, 0, sizeof(sad));         /* clear sockaddr structure */
    sad.sin_family = AF_INET;                     /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);      /* set the local IP address */
    sad.sin_port = htons((unsigned short)port);   /* set the port number */
    bind(serverSocket, (struct sockaddr *)&sad, sizeof(sad));
The UDP Server
How can the UDP server service multiple ports simultaneously?
[Figure: a UDP server with sockets on port 2000 and port 3000, above UDP, IP, and the Ethernet adapter.]
UDP Server: Servicing Two Ports
What problems does this code have?

int s1;    /* socket descriptor 1 */
int s2;    /* socket descriptor 2 */

/* 1) create socket s1 */
/* 2) create socket s2 */
/* 3) bind s1 to port 2000 */
/* 4) bind s2 to port 3000 */

while(1) {
    recvfrom(s1, buf, sizeof(buf), ...);
    /* process buf */

    recvfrom(s2, buf, sizeof(buf), ...);
    /* process buf */
}
Server Flaw
[Figure: timeline for client 1, server, and client 2. Client 1 calls connect, the server calls accept, and both return. The server calls read and blocks waiting for data from client 1. Client 1 calls fgets and blocks waiting for the user to type in data, but the user goes out to lunch. Client 2 calls connect and blocks waiting to complete its connection request until after lunch!]
Concurrent Servers
[Figure: timeline for client 1, server, and client 2. The server accepts client 1, whose user goes out to lunch while client 1 blocks in fgets waiting for the user to type in data. The server calls accept again, so client 2's connect returns; client 2 calls fgets and writes, and the server's read for client 2 completes and the connection is closed, while the read for client 1 does not block the server.]
Multithreaded Server
while (1) {
    newsock = (int *)malloc(sizeof (int));
    fromlen = sizeof(from);   /* accept() overwrites it, so reset it each time */
    *newsock = accept(sock, (struct sockaddr *)&from, &fromlen);
    if (*newsock < 0) error("Accepting");
    printf("A connection has been accepted from %s\n",
           inet_ntoa((struct in_addr)from.sin_addr));
    retval = pthread_create(&tid, NULL, ConnectionThread, (void *)newsock);
    if (retval != 0) {
        error("Error, could not create thread");
    }
}

/****** ConnectionThread **********/
void *ConnectionThread(void *arg)
{
    int sock, n, len;
    char buffer[BUFSIZE];
    char *msg = "Got your message";

    sock = *(int *)arg;
    free(arg);   /* allocated by the accept loop; the thread owns it */
    len = strlen(msg);
    n = read(sock, buffer, BUFSIZE-1);
    while (n > 0) {
        buffer[n] = '\0';
        printf("Message is %s\n", buffer);
        n = write(sock, msg, len);
        if (n < len) error("Error writing");
        n = read(sock, buffer, BUFSIZE-1);
        if (n < 0) error("Error reading");
    }
    if (close(sock) < 0) error("closing");
    pthread_exit(NULL);
    return NULL;
}
Concurrency
• Threading
  – Easier to understand
  – Race conditions increase complexity
• Select()
  – Explicit control flows, no race conditions
  – Explicit control more complicated
• There is no clear winner, but you MUST use select()…
What is select()?
• Monitor multiple descriptors
• How does it work?
  – Set up sets of sockets to monitor
  – select(): blocks until something happens
  – “Something” could be
    • Incoming connection: accept()
    • Clients sending data: read()
    • Pending data to send: write()
    • Timeout
Concurrency – Step 1
• Allowing address reuse
• Then we set the sockets to be non-blocking

int sock, opts=1;

sock = socket(...);    // To give you an idea of where the new code goes

setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opts, sizeof(opts));

if((opts = fcntl(sock, F_GETFL)) < 0) {    // Get current options
    printf("Error...\n");
    ...
}
opts = (opts | O_NONBLOCK);    // Don't clobber your old settings
if(fcntl(sock, F_SETFL, opts) < 0) {
    printf("Error...\n");
    ...
}

bind(...);    // To again give you an idea where the new code goes
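The non-blocking step above can be packaged as a small helper, together with a read wrapper that shows what changes for the caller once O_NONBLOCK is set. This is a sketch; the names set_nonblocking and try_read, and the use of -2 as a "no data yet" return value, are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>       /* fcntl(), F_GETFL, F_SETFL, O_NONBLOCK */
#include <stddef.h>      /* size_t, NULL */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read() */

/* Add O_NONBLOCK to a descriptor's flags without disturbing the others.
 * Returns 0 on success, -1 on error (errno set by fcntl()). */
int set_nonblocking(int fd)
{
    int flags;
    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);            /* get current flags */
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return 0;
}

/* Read from a non-blocking descriptor.
 * Returns the number of bytes read, 0 at end of file,
 * -2 if no data is available yet, or -1 on any other error. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n;
    assert(buf != NULL);
    n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;                         /* would have blocked */
    return n;
}
```

With select() in front of every read, the "no data yet" case should be rare, but a non-blocking descriptor guarantees the server never stalls on one client.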
Concurrency – Step 2
• Monitor sockets with select()
  – int select(int maxfd, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
• maxfd
  – max file descriptor + 1
• fd_set: bit vector with FD_SETSIZE bits
  – readfds: bit vector of read descriptors to monitor
  – writefds: bit vector of write descriptors to monitor
  – exceptfds: set to NULL
• timeout
  – how long to wait without activity before returning
What about bit vectors?
• void FD_ZERO(fd_set *fdset);
  – clear out all bits
• void FD_SET(int fd, fd_set *fdset);
  – set one bit
• void FD_CLR(int fd, fd_set *fdset);
  – clear one bit
• int FD_ISSET(int fd, fd_set *fdset);
  – test whether fd bit is set
The Server
// socket() call and non-blocking code is above this point

if(bind(sockfd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) {    // bind!
    printf("Error binding\n");
    ...
}

if(listen(sockfd, 5) < 0) {    // listen for incoming connections
    printf("Error listening\n");
    ...
}

clen = sizeof(caddr);

// Setup pool.read_set with an FD_ZERO() and FD_SET() for
// your server socket file descriptor. (whatever socket() returned)

while(1) {
    pool.ready_set = pool.read_set;    // Save the current state
    pool.nready = select(pool.maxfd+1, &pool.ready_set, &pool.write_set, NULL, NULL);

    if(FD_ISSET(sockfd, &pool.ready_set)) {    // Check if there is an incoming conn
        isock = accept(sockfd, (struct sockaddr *) &caddr, &clen);    // accept it
        add_client(isock, &pool);    // add the client by the incoming socket fd
    }

    check_clients(&pool);    // check if any data needs to be sent/received from clients
}

...

close(sockfd);
What is pool?

typedef struct {    /* represents a pool of connected descriptors */
    int maxfd;                      /* largest descriptor in read_set */
    fd_set read_set;                /* set of all active read descriptors */
    fd_set write_set;               /* set of all active write descriptors */
    fd_set ready_set;               /* subset of descriptors ready for reading */
    int nready;                     /* number of ready descriptors from select */
    int maxi;                       /* highwater index into client array */
    int clientfd[FD_SETSIZE];       /* set of active descriptors */
    rio_t clientrio[FD_SETSIZE];    /* set of active read buffers */
    ...    // ADD WHAT WOULD BE HELPFUL FOR PROJECT1
} pool;
What about checking clients?
• The main loop only tests for incoming connections
  – There are other reasons the server wakes up
  – Clients are sending data, pending data to write to buffer, clients closing connections, etc.
• Store all client file descriptors
  – in pool
• Keep the while(1) loop thin
  – Delegate to functions
• Come up with your own design
Socket I/O: select()

int select(int maxfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

FD_CLR(int fd, fd_set *fds);      /* clear the bit for fd in fds */
FD_ISSET(int fd, fd_set *fds);    /* is the bit for fd in fds? */
FD_SET(int fd, fd_set *fds);      /* turn on the bit for fd in fds */
FD_ZERO(fd_set *fds);             /* clear all bits in fds */

maxfds: number of descriptors to be tested
◦ descriptors (0, 1, ... maxfds-1) will be tested
readfds: a set of fds we want to check if data is available
◦ returns a set of fds ready to read
◦ if input argument is NULL, not interested in that condition
writefds: returns a set of fds ready to write
exceptfds: returns a set of fds with exception conditions
Socket I/O: select()

int select(int maxfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

timeout
◦ if NULL, wait forever and return only when one of the descriptors is ready for I/O
◦ otherwise, wait up to a fixed amount of time specified by timeout
◦ if we don’t want to wait at all, create a timeout structure with timer value equal to 0
Refer to the man page for more information

struct timeval {
    long tv_sec;     /* seconds */
    long tv_usec;    /* microseconds */
};
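The "don't want to wait at all" case can be sketched as a one-shot poll: both timeval fields are zero, so select() reports the current state and returns immediately. The function name readable_now is hypothetical.

```c
#include <assert.h>
#include <stddef.h>       /* NULL */
#include <sys/select.h>   /* select(), fd_set, FD_ZERO, FD_SET, FD_SETSIZE */
#include <sys/time.h>     /* struct timeval */
#include <unistd.h>       /* the descriptor may come from socket(), pipe(), ... */

/* Poll fd once without waiting.
 * Returns 1 if fd is readable right now, 0 if not, -1 on error. */
int readable_now(int fd)
{
    fd_set readfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = 0;      /* both fields zero: return immediately */
    tv.tv_usec = 0;
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

Passing NULL instead of &tv would turn the same call into an indefinite wait.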
Socket I/O: select()

int s1, s2;        /* socket descriptors */
fd_set readfds;    /* used by select() */

/* create and bind s1 and s2 */
while(1) {
    FD_ZERO(&readfds);       /* initialize the fd set */

    FD_SET(s1, &readfds);    /* add s1 to the fd set */
    FD_SET(s2, &readfds);    /* add s2 to the fd set */

    /* first argument is the highest descriptor + 1; s2+1 assumes s2 > s1 */
    if(select(s2+1, &readfds, 0, 0, 0) < 0) {
        perror("select");
        exit(1);
    }
    if(FD_ISSET(s1, &readfds)) {
        recvfrom(s1, buf, sizeof(buf), ...);
        /* process buf */
    }
    /* do the same for s2 */
}

• select allows synchronous I/O multiplexing
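The two-socket loop above can be reduced to a reusable wait step. The sketch below computes the first argument from whichever descriptor is larger and adds a timeout; wait_either is a hypothetical helper name, and the millisecond timeout parameter is an addition for illustration.

```c
#include <assert.h>
#include <stddef.h>       /* NULL */
#include <sys/select.h>   /* select(), fd_set, FD_* macros, FD_SETSIZE */
#include <sys/time.h>     /* struct timeval */
#include <unistd.h>       /* the descriptors may come from socket(), pipe(), ... */

/* Wait until s1 or s2 is readable, or until timeout_ms elapses.
 * Returns 1 if s1 is ready, 2 if only s2 is ready, 0 on timeout, -1 on error.
 * wait_either is a hypothetical helper name. */
int wait_either(int s1, int s2, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int maxfd = (s1 > s2) ? s1 : s2;   /* do not assume s2 is the larger */
    int n;

    assert(s1 >= 0 && s2 >= 0 && s1 < FD_SETSIZE && s2 < FD_SETSIZE);
    FD_ZERO(&readfds);                 /* select() overwrites the set, so it */
    FD_SET(s1, &readfds);              /* must be rebuilt before every call  */
    FD_SET(s2, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;                      /* 0 = timeout, -1 = error */
    return FD_ISSET(s1, &readfds) ? 1 : 2;
}
```

A server loop would call recvfrom() on whichever descriptor is reported ready, so neither port can starve the other.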
More Details About a Web Server
How can a web server manage multiple connections simultaneously?
[Figure: a web server with sockets on port 80 and port 8001, above TCP, IP, and the Ethernet adapter.]
Socket I/O: select()
Now the web server can support multiple connections...

int fd, next=0;    /* original socket */
int newfd[10];     /* new socket descriptors */
while(1) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Now use FD_SET to initialize other newfd's
       that have already been returned by accept() */

    select(maxfd+1, &readfds, 0, 0, 0);
    if(FD_ISSET(fd, &readfds)) {
        newfd[next++] = accept(fd, ...);
    }
    /* do the following for each descriptor newfd[n] */
    if(FD_ISSET(newfd[n], &readfds)) {
        read(newfd[n], buf, sizeof(buf));
        /* process data */
    }
}
A Few Programming Notes: Representing Packets

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Type                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Address                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 4-byte integer
Length: 2-byte integer
Checksum: 2-byte integer
Address: 4-byte IP address
A Few Programming Notes: Building a Packet in a Buffer

struct packet {
    u_int32_t type;
    u_int16_t length;
    u_int16_t checksum;
    u_int32_t address;
};

/* ================================================== */
char buf[1024];
struct packet *pkt;

pkt = (struct packet*) buf;
pkt->type = htonl(1);
pkt->length = htons(2);
pkt->checksum = htons(3);
pkt->address = htonl(4);
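Casting a char buffer to a struct pointer, as above, relies on the buffer being suitably aligned and on the compiler adding no padding. A more portable variant, sketched here as an alternative and not as the slide author's method, copies each field into the buffer with memcpy() after converting it to network byte order. The names packet_pack, packet_unpack, and PACKET_WIRE_SIZE are hypothetical.

```c
#include <arpa/inet.h>   /* htonl(), htons(), ntohl(), ntohs() */
#include <assert.h>
#include <stddef.h>      /* size_t, NULL */
#include <stdint.h>
#include <string.h>      /* memcpy() */

#define PACKET_WIRE_SIZE 12   /* 4 (type) + 2 (length) + 2 (checksum) + 4 (address) */

struct packet {               /* all fields held in host byte order */
    uint32_t type;
    uint16_t length;
    uint16_t checksum;
    uint32_t address;
};

/* Serialize *pkt into buf in network byte order.
 * buf must hold at least PACKET_WIRE_SIZE bytes. Returns bytes written. */
size_t packet_pack(const struct packet *pkt, unsigned char *buf)
{
    uint32_t u32;
    uint16_t u16;
    assert(pkt != NULL && buf != NULL);
    u32 = htonl(pkt->type);     memcpy(buf + 0, &u32, 4);
    u16 = htons(pkt->length);   memcpy(buf + 4, &u16, 2);
    u16 = htons(pkt->checksum); memcpy(buf + 6, &u16, 2);
    u32 = htonl(pkt->address);  memcpy(buf + 8, &u32, 4);
    return PACKET_WIRE_SIZE;
}

/* Inverse of packet_pack: read PACKET_WIRE_SIZE bytes from buf into *pkt. */
void packet_unpack(const unsigned char *buf, struct packet *pkt)
{
    uint32_t u32;
    uint16_t u16;
    assert(pkt != NULL && buf != NULL);
    memcpy(&u32, buf + 0, 4); pkt->type = ntohl(u32);
    memcpy(&u16, buf + 4, 2); pkt->length = ntohs(u16);
    memcpy(&u16, buf + 6, 2); pkt->checksum = ntohs(u16);
    memcpy(&u32, buf + 8, 4); pkt->address = ntohl(u32);
}
```

The resulting 12 bytes match the packet layout with Type, Length, Checksum, and Address fields, and can be handed directly to send() or sendto().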
EchoClient.c – #include’s

#include <stdio.h>        /* for printf() and fprintf() */
#include <sys/socket.h>   /* for socket(), connect(), sendto(), and recvfrom() */
#include <arpa/inet.h>    /* for sockaddr_in and inet_addr() */
#include <stdlib.h>       /* for atoi() and exit() */
#include <string.h>       /* for memset() */
#include <unistd.h>       /* for close() */

#define ECHOMAX 255       /* Longest string to echo */
EchoClient.c - variable declarations

int main(int argc, char *argv[])
{
    int sock;                                  /* Socket descriptor */
    struct sockaddr_in echoServAddr;           /* Echo server address */
    struct sockaddr_in fromAddr;               /* Source address of echo */
    unsigned short echoServPort = 7;           /* Echo server port */
    unsigned int fromSize;                     /* address size for recvfrom() */
    char *servIP = "172.24.23.4";              /* IP address of server */
    char *echoString = "I hope this works";    /* String to send to echo server */
    char echoBuffer[ECHOMAX+1];                /* Buffer for receiving echoed string */
    int echoStringLen;                         /* Length of string to echo */
    int respStringLen;                         /* Length of received response */
EchoClient.c - creating the socket and sending

    /* Create a datagram/UDP socket */
    sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Construct the server address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));      /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                   /* Internet addr family */
    echoServAddr.sin_addr.s_addr = inet_addr(servIP);    /* Server IP address: dotted-decimal string to network byte order */
    echoServAddr.sin_port = htons(echoServPort);         /* Server port */

    /* Send the string to the server */
    echoStringLen = strlen(echoString);
    sendto(sock, echoString, echoStringLen, 0,
           (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));

    /* Recv a response */
EchoClient.c – receiving and printing

    fromSize = sizeof(fromAddr);
    respStringLen = recvfrom(sock, echoBuffer, ECHOMAX, 0,
                             (struct sockaddr *) &fromAddr, &fromSize);
    /* Error checks like packet is received from the same server */
    if (respStringLen < 0) {
        perror("recvfrom");
        exit(1);
    }

    /* null-terminate the received data */
    echoBuffer[respStringLen] = '\0';
    printf("Received: %s\n", echoBuffer);    /* Print the echoed arg */
    close(sock);
    exit(0);
} /* end of main () */
EchoServer.c

int main(int argc, char *argv[])
{
    int sock;                           /* Socket */
    struct sockaddr_in echoServAddr;    /* Local address */
    struct sockaddr_in echoClntAddr;    /* Client address */
    unsigned int cliAddrLen;            /* Length of client address */
    char echoBuffer[ECHOMAX];           /* Buffer for echo string */
    unsigned short echoServPort = 7;    /* Server port */
    int recvMsgSize;                    /* Size of received message */

    /* Create socket for sending/receiving datagrams */
    sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Construct local address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));              /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                           /* Internet address family */
    echoServAddr.sin_addr.s_addr = inet_addr("172.24.23.4");     /* Local IP address in network byte order */
    echoServAddr.sin_port = htons(echoServPort);                 /* Local port */

    /* Bind to the local address */
    bind(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));
    for (;;) /* Run forever */
    {
        cliAddrLen = sizeof(echoClntAddr);

        /* Block until a message is received from a client */
        recvMsgSize = recvfrom(sock, echoBuffer, ECHOMAX, 0,
                               (struct sockaddr *) &echoClntAddr, &cliAddrLen);

        printf("Handling client %s\n", inet_ntoa(echoClntAddr.sin_addr));

        /* Send received datagram back to the client */
        sendto(sock, echoBuffer, recvMsgSize, 0,
               (struct sockaddr *) &echoClntAddr, sizeof(echoClntAddr));
    }
} /* end of main() */
Error handling is a must
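The echo client and server above ignore the return values of socket(), bind(), sendto() and recvfrom() to keep the slides short. A minimal sketch of what checking could look like for the bind step follows; report() and bind_udp_any() are hypothetical helper names, and binding to INADDR_ANY rather than a fixed address is an assumption made for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: if rc signals failure (negative), print what failed
 * together with the errno text, then hand rc back to the caller. */
static int report(int rc, const char *what)
{
    assert(what != NULL);
    if (rc < 0) {
        int saved = errno;      /* perror() is allowed to modify errno */
        perror(what);
        errno = saved;
    }
    return rc;
}

/* Hypothetical helper: bind sock to the given port on all local interfaces.
 * Returns 0 on success and -1 on failure (after printing the reason). */
int bind_udp_any(int sock, unsigned short port)
{
    struct sockaddr_in local;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(port);

    if (report(bind(sock, (struct sockaddr *) &local, sizeof(local)),
               "bind() failed") < 0)
        return -1;
    return 0;
}
```

The same pattern applies to every other socket call: test the return value, report the failure while errno is still meaningful, and stop or recover instead of continuing with a bad descriptor.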
The setsockopt() function manipulates options associated with a socket. Options can exist at multiple protocol levels. However, the options are always present at the uppermost socket level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.
The setsockopt function
#include <sys/socket.h>
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);
The level argument specifies the protocol level at which the option resides. To set options at the socket level, specify the level argument as SOL_SOCKET. To set options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is interpreted by TCP (Transmission Control Protocol), set level to the protocol number of TCP (IPPROTO_TCP).
The following options are supported for setsockopt():
SO_DEBUG Provides the ability to turn on recording of debugging information. This option takes an int value in the optval argument. This is a BOOL option.
SO_BROADCAST Permits sending of broadcast messages, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.
SO_REUSEADDR Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.
Level Argument
SO_KEEPALIVE Keeps connections active by enabling periodic transmission of messages, if this is supported by the protocol.
If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option takes an int value in the optval argument. This is a BOOL option.
SO_LINGER Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure in the optval argument.
SO_OOBINLINE Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option takes an int value in the optval argument. This is a BOOL option.
SO_SNDBUF Sets send buffer size information. This option takes an int value in the optval argument.
SO_RCVBUF Sets receive buffer size information. This option takes an int value in the optval argument.
SO_DONTROUTE Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option takes an int value in the optval argument. This is a BOOL option.
TCP_NODELAY Specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option takes an int value in the optval argument. This is a BOOL option.
For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
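The options above differ in two ways: the level they live at and the type that optval points to. The sketch below shows one boolean option at the TCP level and the one option that takes a structure. The function names set_tcp_nodelay() and set_linger() are hypothetical, and the header <netinet/tcp.h> for TCP_NODELAY is what Linux and the BSDs use.

```c
#include <assert.h>
#include <netinet/in.h>     /* IPPROTO_TCP */
#include <netinet/tcp.h>    /* TCP_NODELAY */
#include <sys/socket.h>     /* setsockopt(), struct linger */

/* Boolean option at the TCP level: optval points to an int,
 * zero disables the option and non-zero enables it. */
int set_tcp_nodelay(int sock, int enabled)
{
    int flag = enabled ? 1 : 0;

    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* SO_LINGER at the socket level: optval points to a struct linger.
 * l_onoff switches lingering on, l_linger is the interval in seconds. */
int set_linger(int sock, int seconds)
{
    struct linger lg;

    assert(seconds >= 0);
    lg.l_onoff = 1;
    lg.l_linger = seconds;
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Both functions return whatever setsockopt() returns, so the caller still has to test for -1 and inspect errno.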
RETURN VALUES
If successful, setsockopt() returns a zero. If a failure occurs, it returns a value of -1 and sets errno to one of the following values:
EBADF s is not a valid descriptor
ENOTSOCK s is not a socket descriptor
ENOPROTOOPT optname is unknown at indicated level
EFAULT optval is an invalid pointer
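Two of these errno values are easy to provoke without any network traffic, which makes them handy for trying out error-handling code. The sketch below is an illustration only: try_reuseaddr(), reuseaddr_on_pipe() and check_sockopt_errors() are hypothetical names, and the pipe is used simply because it is a valid descriptor that is not a socket.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to enable SO_REUSEADDR on fd.
 * Returns 0 on success, otherwise the errno value left by setsockopt(). */
int try_reuseaddr(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == 0)
        return 0;
    return errno;
}

/* Call try_reuseaddr() on the read end of a pipe.
 * Returns the resulting errno value, or -1 if no pipe could be created. */
int reuseaddr_on_pipe(void)
{
    int fds[2];
    int err;

    if (pipe(fds) < 0)
        return -1;
    err = try_reuseaddr(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Self-check of two failure modes from the list above. */
void check_sockopt_errors(void)
{
    assert(try_reuseaddr(-1) == EBADF);          /* not a valid descriptor */
    assert(reuseaddr_on_pipe() == ENOTSOCK);     /* valid, but not a socket */
}
```

In real code the errno value would normally be passed to perror() or strerror() rather than compared against constants.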
Sample Usage:
int skt, sndsize, err;
err = setsockopt(skt, SOL_SOCKET, SO_SNDBUF, (char *)&sndsize, (int)sizeof(sndsize));
or:
int skt, rcvsize, err;
err = setsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&rcvsize, (int)sizeof(rcvsize));
int optval;
socklen_t optlen;
char *optval2;

// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);

// bind a socket to a device name (might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);

// see if the SO_BROADCAST flag is set (optlen must hold the buffer size first):
optlen = sizeof optval;
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}
Example
DESCRIPTION
The getsockopt() function retrieves the current value for a socket option associated with a socket of any type, in any state, and stores the result in optval. Options may exist at multiple protocol levels, but they are always present at the uppermost socket level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.
The level argument specifies the protocol level at which the option resides. To retrieve options at the socket level, specify the level argument as SOL_SOCKET. To retrieve options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is to be interpreted by TCP (Transmission Control Protocol), set level to the protocol number of TCP (IPPROTO_TCP).
The value associated with the selected option is returned in the buffer optval. The integer pointed to by optlen should originally contain the size of this buffer; on return, it is set to the size of the value returned. For SO_LINGER, this is the size of a struct linger; for most other options it is the size of an integer.
The application is responsible for allocating any memory space pointed to directly or indirectly by any of the parameters it specified.
If an option has not been set with setsockopt(), getsockopt() returns the default value for the option.
The getsockopt function
#include <sys/socket.h>
int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);
SO_DEBUG Reports whether debugging information is being recorded. This option stores an int value in the optval argument. This is a BOOL option.
SO_ACCEPTCONN Reports whether socket listening is enabled. This option stores an int value in the optval argument. This is a BOOL option.
SO_BROADCAST Reports whether transmission of broadcast messages is supported, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.
SO_REUSEADDR Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.
SO_KEEPALIVE Reports whether connections are kept active with periodic transmission of messages, if this is supported by the protocol.
If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option stores an int value in the optval argument. This is a BOOL option.
SO_LINGER Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option stores a linger structure in the optval argument.
SO_OOBINLINE Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option stores an int value in the optval argument. This is a BOOL option.
SO_SNDBUF Reports send buffer size information. This option stores an int value in the optval argument.
SO_RCVBUF Reports receive buffer size information. This option stores an int value in the optval argument.
SO_ERROR Reports information about error status and clears it. This option stores an int value in the optval argument.
SO_TYPE Reports the socket type. This option stores an int value in the optval argument.
SO_DONTROUTE Reports whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option stores an int value in the optval argument. This is a BOOL option.
SO_MAX_MSG_SIZE Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no meaning for stream-oriented sockets. This option stores an int value in the optval argument.
TCP_NODELAY Reports whether the Nagle algorithm used by TCP for send coalescing is disabled. This option stores an int value in the optval argument. This is a BOOL option.
For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
RETURN VALUES
If successful, getsockopt() returns a zero. If a failure occurs, it returns a value of -1 and sets errno to one of the following values:
EBADF The parameter s is not a valid descriptor.
ENOPROTOOPT The option is unknown at the level indicated.
ENOTSOCK The parameter s is a file, not a socket.
int sockbufsize = 0;
socklen_t size = sizeof(sockbufsize);   /* getsockopt() expects a socklen_t * */
err = getsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&sockbufsize, &size);
Thank You