Introduction to Socket Programming-NBV

Introduction To Socket Programming

Prof. NB Venkateswarlu, B.Tech (SVU), M.Tech (IIT-K), Ph.D (BITS, Pilani), PDF (U of Leeds, UK)

ISTE Visiting Fellow 2010-11, AITAM, Tekkali

Many Thanks to Organizers and Thanks to Participants

Today's Themes

• A Small Dose of Questions To Know You
• Little Briefing about Unix Internals
• Recapitulation of What is Internet
• Variety of Addresses involved
• Socket Concepts
• Related System Calls
• Simple TCP Client and Server in action
• Simple UDP Client and Server in action
• What is DNS

• What is the difference between Data Communications and Computer Networks?
• What is firmware?
• Why do we need to split a message?
• Why do we require so many levels of control? (Is the network system reliable?)
• What are physical and logical addresses?
• What is the conceptual difference between DLL and NLL?
• What is the difference between Internet and internet?

Just Probing Your Unix and OS Knowledge

• What is fork()?
• What is a signal?
• What is a Process and a Thread?
• What is a device driver?
• What is a daemon?
• What is exec()?
• What are locks?

Internetwork

• A collection of interconnected networks
• Networks: different depts, labs, etc.
• Router: node that connects distinct networks
• Host: network endpoint (computer, PDA, light switch, …)
• Together, an independently administered entity
◦ Enterprise, ISP, etc.

[Figure: an internet[work] connecting the EE, ME, and CS networks]

Internetwork Challenges

• Many differences between networks
◦ Address formats
◦ Performance: bandwidth/latency
◦ Packet size
◦ Loss rate/pattern/handling
◦ Routing
• How to translate and inter-operate?

[Figure: an internet[work] joining 802.3, Frame relay, and ATM networks]

“The Internet”

• Internet vs. internet
• The Internet: the interconnected set of networks of the Internet Service Providers (ISPs) and end-networks, providing data communications services.
◦ Network of internetworks, and more
◦ About 17,000 different ISP networks make up the Internet
◦ Many other “end” networks
◦ 100,000,000s of hosts

DLL: Links

• Links can be
◦ Wired or wireless

[Figure: Node, Link, Node]

NLL: Source To Destination

Routing

• Routers send packets towards the destination

[Figure: hosts (H) connected through a mesh of routers (R)]

Why do we Need to Divide Messages?

• Because of the noise conditions of the channels
• Noise is rated as: 1 in 10^5

Why do we need to split a message? No Monopolization?

• Packets
• Better link utilization

Why do we need to split a message? What if the Network is Overloaded?

Problem: Network Overload
Solution: Buffering and Congestion Control

• Short bursts: buffer
• Buffer sizes vary from network to network, so fragmentation takes place
• What if the buffer overflows?
◦ Packets are dropped
◦ Sender adjusts its rate until load = resources (“congestion control”)

Why do we need to split a message? What if the Data Doesn’t Fit?

Problem: Packet size
Solution: Fragment data across packets

• On Ethernet, max packet is 1.5KB
• Typical web page is 10KB

[Figure: a "GET index.html" exchange whose data is split across packets]


What do You Understand about Protocol?

• Implements an agreement between parties on how communication should take place

[Figure: an example exchange: friendly greeting, muttered reply, "Destination?", "Madison", "Thank you"]

1. Protocols Offer Interfaces

• Each protocol offers interfaces
◦ One to higher-level protocols on the same end hosts
   Expects one from the layers on which it builds
   Interface characteristics, e.g. IP service model
◦ A “peer interface” to a counterpart on destinations
   Syntax and semantics of communications
   (Assumptions about) data formats
• Protocols build upon each other
◦ Adds value, improves functionality overall
   E.g., a reliable protocol running on top of IP
◦ Reuse, avoid re-writing
   E.g., OS provides TCP, so apps don’t have to rewrite

2. Protocols Necessary for Interoperability

• Protocols are the key to interoperability.
◦ Networks are very heterogeneous:
   Hardware/link   Ethernet: 3com, etc.
   Network         Routers: cisco, juniper etc.
   Application     App: Email, AIM, IE etc.
◦ The hardware/software of communicating parties are often not built by the same vendor
◦ Yet they can communicate because they use the same protocol
   Actually implementations could be different
   But must adhere to the same specification
• Protocols exist at many levels.
◦ Application level protocols
◦ Protocols at the hardware level

How do protocols/layers work?

• One or more protocols implement the functionality in a layer
◦ Only horizontal (among peers) and vertical (in a host) communication
• Protocols/layers can be implemented and modified in isolation
• Each layer offers a service to the higher layer, using the services of the lower layer.
• “Peer” layers on different systems communicate via a protocol.
◦ higher level protocols (e.g. TCP/IP, Appletalk) can run on multiple lower layers
◦ multiple higher level protocols can share a single physical network

TCP/IP vs OSI

TCP/IP                          OSI
Application (plus libraries)    Application, Presentation, Session
TCP/UDP                         Transport
IP                              Network
Data link                       Data link
Physical                        Physical

The Reality: TCP/IP Model

FTP  HTTP  TFTP  NV     App protocols
TCP  UDP                Two transport protocols: provide logical channels to apps
IP                      Interconnection of n/w technologies into a single logical n/w
NET1  NET2  …  NETn     Network protocols implemented by a comb of hw and sw.

Note: No strict layering. App writers can define apps that run on any lower level protocols.

The Thin Waist: The Hourglass Model

• The waist: minimal, carefully chosen functions. Facilitates interoperability and rapid evolution

[Figure: hourglass. Applications (FTP, HTTP, TFTP, NV) at the top, then TCP and UDP, IP at the waist, and Data Link and Physical (NET1, NET2, …, NETn) at the bottom]

TCP/IP Layering

[Figure: the path Host, Bridge/Switch, Router/Gateway, Host, drawn against the layers Application, Transport, Network, Link, Physical]

Layers & Encapsulation

[Figure: User A sends "Get index.html" to User B; successive layers add headers carrying the Connection ID, Source/Destination, and Link Address]

Protocol Demultiplexing

• Multiple choices at each layer
• How to know which one to pick?

[Figure: FTP, HTTP, TFTP, NV over TCP/UDP, over IP, over many networks (NET1, NET2, …, NETn)]

Multiplexing & Demultiplexing

• Multiple implementations of each layer
◦ How does the receiver know what version/module of a layer to use?
• Packet header includes a demultiplexing field
◦ Used to identify the right module for the next layer
◦ Filled in by the sender
◦ Used by the receiver
• Multiplexing occurs at multiple layers. E.g., IP, TCP, …

[Figure: TCP carried over IP; IP header layout]
V/HL | TOS | Length
ID | Flags/Offset
TTL | Prot. | H. Checksum
Source IP address
Destination IP address
Options..


Transmission Control Protocol (TCP)

TCP
• Reliable: guaranteed delivery
• Byte stream: in-order delivery
• Checksum for validity
• Setup connection followed by data transfer

Telephone Call
• Guaranteed delivery
• In-order delivery
• Setup connection followed by conversation

Example TCP applications: Web, Email, Telnet

User Datagram Protocol (UDP)

UDP
• No guarantee of delivery
• Not necessarily in-order delivery
• No validity guaranteed
• Must address each independent packet

Postal Mail
• Unreliable
• Not necessarily in-order delivery
• Must address each reply

Example UDP applications: Multimedia, voice over IP

Transport Service Requirements of Common Applications

Application              Data loss       Bandwidth                          Time Sensitive
file transfer            no loss         elastic                            no
e-mail                   no loss         elastic                            no
web documents            no loss         elastic                            no
real-time audio/video    loss-tolerant   audio: 5Kb-1Mb, video: 10Kb-5Mb    yes, 100’s msec
stored audio/video       loss-tolerant   same as above                      yes, few secs
interactive games        loss-tolerant   few Kbps                           yes, 100’s msec
financial apps           no loss         elastic                            yes and no

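What do you know about big-endian and little-endian machines?

Byte Order
• Different computers may have different internal representations of 16/32-bit integers (called host byte order).
• Examples
◦ Big-endian byte order (e.g., used by Motorola 68000): the most significant byte is stored at the lowest address
◦ Little-endian byte order (e.g., used by Intel 80x86): the least significant byte is stored at the lowest address
• TCP/IP specifies a network byte order, which is the big-endian byte order.
• For some WinSock functions, the arguments (i.e., the parameters passed to these functions) must be stored in network byte order.
• WinSock provides functions to convert between host byte order and network byte order:

Prototypes of Conversion Functions

u_short htons(u_short hostshort);   /* host to network, 16-bit */
u_long  htonl(u_long hostlong);     /* host to network, 32-bit */
u_short ntohs(u_short netshort);    /* network to host, 16-bit */
u_long  ntohl(u_long netlong);      /* network to host, 32-bit */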

A Peep Into Unix Internals

What is a Process?

Processes
• A process has
◦ text: machine instructions (may be shared by other processes)
◦ data
◦ stack
• A process may execute either in user mode or in kernel mode.
• Process information is stored in two places:
◦ Process table
◦ User table

User mode and Kernel mode

• At any given instant a computer running the Unix system is either executing a process or the kernel itself is running.
• The computer is in user mode when it is executing instructions in a user process and in kernel mode when it is executing instructions in the kernel.
• Events that switch from user mode to kernel mode:
◦ executing a system call
◦ performing I/O operations
◦ system clock interrupt

Process Table
• An entry in the process table has the following information:
◦ process state:
   A. running in user mode or kernel mode
   B. ready in memory, or ready but swapped
   C. sleeping in memory, or sleeping and swapped
◦ PID: process id
◦ UID: user id
◦ scheduling information
◦ signals that have been sent to the process but not yet handled
◦ a pointer to the per-process region table
• There is a single process table for the entire system.

User Table (u area)
• Each process has only one private user table.
• The user table contains information that must be accessible while the process is in execution:
◦ a pointer to the process table slot
◦ parameters of the current system call, return values, error codes
◦ file descriptors for all open files
◦ current directory and current root
◦ process and file size limits
• The user table is an extension of the process table.

[Figure: an active process. Labels: u area, process table, per-process region table, region table, text/data/stack regions, kernel address space vs. user address space, resident vs. swappable]

Shared Program Text and Software Libraries

• Many programs, such as the shell, are often executed by several users simultaneously.
• The text (program) part can be shared.
• To be shared, a program must be compiled with a special option that arranges the process image so that the variable part (data and stack) and the fixed part (text) are cleanly separated.
• An extension of the idea of sharing text is sharing libraries.
• Without shared libraries, every executing program contains its own copy of the library code.

[Figure: two active processes, each with its own data and stack regions, sharing one text region (reference count = 2) through the process table, per-process region tables, and region table]

System Call
• A process accesses system resources through system calls.
• System calls for
◦ Process control:
   fork: create a new process
   wait: allow a parent process to synchronize its execution with the exit of a child process
   exec: invoke a new program
   exit: terminate process execution
◦ File system:
   File: open, read, write, lseek, close
   inode: chdir, chown, chmod, stat, fstat
   others: pipe, dup, mount, umount, link, unlink

System call: fork()

• fork: the only way for a user to create a process in the Unix operating system.
• The process that invokes fork is called the parent process and the newly created process is called the child process.
• The syntax of the fork system call:
   newpid = fork();
• On return from the fork system call, the two processes have identical copies of their user-level context except for the return value.
• In the parent process, newpid = child process id
• In the child process, newpid = 0

/* forkEx1.c */
#include <stdio.h>
#include <unistd.h>   /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    fpid = fork();   /* 0 in the child, the child's PID in the parent */
    if (fpid == 0) {
        printf("Child Process fpid=%d\n", fpid);
    } else {
        printf("Parent Process fpid=%d\n", fpid);
    }
    printf("After forking fpid=%d\n", fpid);   /* run by both processes */
    return 0;
}

$ cc forkEx1.c -o forkEx1
$ forkEx1
Before forking ...
Child Process fpid=0
After forking fpid=0
Parent Process fpid=14707
After forking fpid=14707
$

/* forkEx2.c */
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    system("ps");
    fpid = fork();
    system("ps");   /* run by both parent and child */
    printf("After forking fpid=%d\n", fpid);
    return 0;
}

$ forkEx2
Before forking ...
   PID TTY      TIME CMD
 14759 pts/9    0:00 tcsh
 14778 pts/9    0:00 sh
 14777 pts/9    0:00 forkEx2
   PID TTY      TIME CMD
 14781 pts/9    0:00 sh
 14759 pts/9    0:00 tcsh
 14782 pts/9    0:00 sh
 14780 pts/9    0:00 forkEx2
 14777 pts/9    0:00 forkEx2
After forking fpid=14780
$    PID TTY      TIME CMD
 14781 pts/9    0:00 sh
 14759 pts/9    0:00 tcsh
 14780 pts/9    0:00 forkEx2
After forking fpid=0

$ ps
   PID TTY      TIME CMD
 14759 pts/9    0:00 tcsh
$

System Call: getpid() getppid()

• Each process has a unique process id (PID).
• PID is an integer, typically in the range 0 through 65535.
• The kernel assigns the PID when a new process is created.
• Processes can obtain their PID by calling getpid().
• Each process has a parent process and a corresponding parent process ID.
• Processes can obtain their parent’s PID by calling getppid().

/* pid.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("pid=%d ppid=%d\n", (int)getpid(), (int)getppid());
    return 0;
}

$ cc pid.c -o pid
$ pid
pid=14935 ppid=14759
$

/* forkEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    if ((fpid = fork()) == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    }
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc forkEx3.c -o forkEx3
$ forkEx3
Before forking ...
Parent Process fpid=14942 pid=14941 ppid=14759
After forking fpid=14942 pid=14941 ppid=14759
$ Child Process fpid=0 pid=14942 ppid=1
After forking fpid=0 pid=14942 ppid=1

$ ps
   PID TTY      TIME CMD
 14759 pts/9    0:00 tcsh

System Call: wait()

• The wait system call allows a parent process to wait for the demise of a child process.
• See forkEx4.c

/* forkEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>

int main(void)
{
    int fpid, status;

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    }
    /* the parent blocks here until the child exits;
       in the child, wait() fails at once because it has no children */
    wait(&status);
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc forkEx4.c -o forkEx4
$ forkEx4
Before forking ...
Parent Process fpid=14980 pid=14979 ppid=14759
Child Process fpid=0 pid=14980 ppid=14979
After forking fpid=0 pid=14980 ppid=14979
After forking fpid=14980 pid=14979 ppid=14759
$

System Call: exec()

• The exec() system call invokes another program by replacing the current process image.
• No new process table entry is created for the exec()ed program. Thus, the total number of processes in the system isn’t changed.
• Six different exec functions: execlp, execvp, execl, execv, execle, execve (see the man page for more detail).
• The exec system call allows a process to choose its successor.

int execl(const char *path, const char *arg0, ... /*, (char *)NULL */);
int execv(const char *path, char *const argv[]);
int execle(const char *path, const char *arg0, ... /*, (char *)NULL, char *const envp[] */);
int execve(const char *path, char *const argv[], char *const envp[]);
int execlp(const char *file, const char *arg0, ... /*, (char *)NULL */);
int execvp(const char *file, char *const argv[]);

/* execEx1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("Before execing ...\n");
    execl("/bin/date", "date", (char *)NULL);
    printf("After exec\n");   /* reached only if execl() fails */
    return 1;
}

$ execEx1
Before execing ...
Sun May 9 16:39:17 CST 1999
$

/* execEx2.c */
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fpid;

    printf("Before execing ...\n");
    fpid = fork();
    if (fpid == 0) {
        execl("/bin/date", "date", (char *)NULL);   /* child becomes date */
    }
    printf("After exec and fpid=%d\n", fpid);       /* parent only, unless execl() fails */
    return 0;
}

$ execEx2
Before execing ...
After exec and fpid=14903
$ Sun May 9 16:47:08 CST 1999
$

Handling Signal

• A signal is a message from one process to another.
• Signals are sometimes called “software interrupts”.
• Signals usually occur asynchronously.
• Signals can be sent
   A. by one process to another (or to itself)
   B. by the kernel to a process
• Unix signals are content-free. That is, the only thing that can be said about a signal is “it has arrived or not”.
• Most signals have predefined meanings:
   A. SIGHUP (hangup): when a terminal is closed, the hangup signal is sent to every process attached to that controlling terminal.
   B. SIGINT (interrupt): politely asks a process to terminate.
   C. SIGQUIT (quit): asks a process to terminate and produce a core dump.
   D. SIGKILL (kill): forces a process to terminate.
• See sigEx1.c

/* sigEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fpid, status;   /* status must be a real int, not an uninitialized pointer */

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
        for (;;)
            ;   /* loop forever */
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               fpid, (int)getpid(), (int)getppid());
    }
    wait(&status);   /* wait for child process */
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc sigEx1.c -o sigEx1
$ sigEx1 &
Before forking ...
Parent Process fpid=14989 pid=14988 ppid=14759
Child Process fpid=0 pid=14989 ppid=14988
$ ps
   PID TTY      TIME CMD
 14988 pts/9    0:00 sigEx1
 14759 pts/9    0:01 tcsh
 14989 pts/9    0:09 sigEx1
$ kill -9 14989
$ ps
...

Scheduling Processes

• On a time-sharing system, the kernel allocates the CPU to a process for a period of time (time slice or time quantum), preempts the process and schedules another one when the time slice expires, and reschedules the process to continue execution at a later time.
• The scheduler uses a round-robin with multilevel feedback algorithm to choose which process to execute:
   A. The kernel allocates the CPU to a process for a time slice.
   B. It preempts a process that exceeds its time slice.
   C. It feeds the process back into one of several priority queues.

Process Priority

Priority levels, from highest to lowest, and the processes at each level:

Kernel mode
   swapper
   wait for disk I/O
   wait for buffer
   wait for inode
   ...
   wait for child exit
User mode
   user level 0
   user level 1
   ...
   user level n

Process Scheduling (Unix System V)

• There are 3 processes A, B, C under the following assumptions:
   A. They are created simultaneously with initial priority 60.
   B. The clock interrupts the system 60 times per second.
   C. These processes make no system calls.
   D. No other processes are ready to run.
   E. CPU usage calculation: CPU = decay(CPU) = CPU/2
   F. Process priority calculation: priority = CPU/2 + 60
   G. The rescheduling calculation is done once per second.

Time (s)   Process A              Process B              Process C
           Priority  CPU count    Priority  CPU count    Priority  CPU count
0          60        0 … 60       60        0            60        0
1          75        30           60        0 … 60       60        0
2          67        15           75        30           60        0 … 60
3          63        7 … 67       67        15           75        30
4          76        33           63        7 …          67        15

Booting

• When the computer is powered on or rebooted, a short built-in program (possibly stored in ROM) reads the first block or two of the disk into memory. These blocks contain a loader program, which was placed on the disk when the disk was formatted.
• The loader is started. It searches the root directory for /unix or /root/unix and loads the file into memory.
• The kernel starts to execute.

The first processes

• The kernel initializes its internal data structures: it constructs linked lists of free inodes, regions, and page tables.
• The kernel creates the u area and initializes slot 0 of the process table.
• Process 0 is created.
• Process 0 forks, invoking the fork algorithm directly from the kernel. Process 1 is created.
• In kernel mode, Process 1 creates its user-level context (regions) and copies code (/etc/init) to the new region.
• Process 1 calls exec (executes init).

init process

• The init process is a process dispatcher: it spawns processes and allows users to log in.
• init reads /etc/inittab and spawns getty.
• When a user logs in successfully, getty goes through a login procedure and execs a login shell.
• init executes the wait system call, monitoring the death of its child processes and of processes orphaned by an exiting parent.

[Figure: the login cycle]
1. init fork/execs a getty program to manage the line.
2. getty prints the “login:” message and waits for someone to log in.
3. The login process prints the password message, reads the password, then checks the password.
4. The shell runs programs for the user until the user logs off.
5. When the shell dies, init wakes up and fork/execs a getty for the line.

File Subsystem

• A file system is a collection of files and directories on a disk or tape in standard UNIX file system format.
• Each UNIX file system contains four major parts:
   A. boot block
   B. superblock
   C. i-node table
   D. data blocks: file storage

File System Layout

Block 0: bootstrap
Block 1: superblock
Blocks 2 to n: i-nodes
Blocks n+1 to last: files

Boot Block

• A boot block may contain several physical blocks.
• Note that a physical block contains 512 bytes (or 1KB or 2KB).
• A boot block contains a short loader program for booting.
• It is blank on other file systems.

Superblock

• The superblock contains key information about a file system.
• Superblock information:
   A. Size of the file system and status:
      label: name of this file system
      size: the number of logical blocks
      date: the last modification date of the superblock
   B. Information on i-nodes:
      the number of i-nodes
      the number of free i-nodes
   C. Information on data blocks: free data blocks
• The information in the superblock is loaded into memory.

I-nodes

• i-node: index node (information node)
• i-list: the list of i-nodes
• i-number: the index into the i-list
• The size of an i-node: 64 bytes
• i-node 0 is reserved.
• i-node 1 is the root directory.
• i-node structure: next page

[Figure: i-node structure. Fields: mode, owner, timestamp, size, block count, reference count, direct blocks 0-9, single indirect, double indirect, triple indirect. Direct pointers lead to data blocks; indirect pointers lead to indirect blocks, which in turn lead to data blocks]

I-node structure

• mode:
   A. type: file, directory, pipe, symbolic link
   B. access: read/write/execute (owner, group, others)
• owner: who owns this i-node (file, directory, ...)
• timestamp: creation, modification, access time
• size: the number of bytes
• block count: the number of data blocks
• direct blocks: pointers to the data blocks
• single indirect: pointer to a block that holds pointers to the data blocks (128 data blocks)
• double indirect: (128*128 = 16384 data blocks)
• triple indirect: (128*128*128 data blocks)

Data Block

• A data block has 512 bytes.
   A. Some file systems have 1K or 2K bytes per block.
   B. See block size effect (next page)
• A data block may contain data of files or data of a directory.
• File: a stream of bytes.
• Directory format:
   i-# | Next | size | File name | pad

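[Figure: a directory tree containing home, john, alex, jenny, bin, find, grep, notes, and Report.txt]

Example directory entries:
   i-# | Next | 10 | Report.txt | pad
   i-# | Next | 3  | bin        | pad
   i-# | Next | 5  | notes      | pad
   0   | Next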

[Figure: disk layout (boot block, superblock, i-nodes, data blocks). The data block of the current directory lists Report.txt, source, and notes, each entry referring to an i-node. Shown beside a directory tree with home, kc, alex, source, notes, Report.txt, find, and grep]

[Figure: in-core inodes. Labels: u area, current directory inode, in-core i-nodes, device driver & hardware control]

In-core inode table

• The UNIX system keeps regular files and directories on block devices such as disk or tape.
• Such disk space is called the physical device address space.
• The kernel deals with the file system on a logical level (logical device address space) rather than with disks.
• The disk driver translates logical addresses into physical device addresses.
• The in-core (memory-resident) inode table stores inode information in kernel space.

In-core inode table

• An in-core inode contains
   A. all the information of the inode on disk
   B. status of the in-core inode: inode is locked, inode data changed, file data changed
   C. the logical device number of the file system
   D. inode number
   E. reference count

File table

• The kernel has a global data structure, called the file table, to store information about file access.
• Each entry in the file table contains:
   A. a pointer to the in-core inode table
   B. the offset of the next read or write in the file
   C. access rights (r/w) allowed to the opening process
   D. reference count

User File Descriptor table

• Each process has a user file descriptor table to identify all opened files.
• An entry in the user file descriptor table points to an entry of the kernel’s global file table.
• Entry 0: standard input
• Entry 1: standard output
• Entry 2: standard error output

System Call: open

• open: A process may open an existing file to read or write
• syntax: fd = open(pathname, mode);
   A. pathname: the name of the file to be opened
   B. mode: read/write
• Example

/* openEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>

int main(void)
{
    int fd1, fd2, fd3;

    printf("Before open ...\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./openEx1.c", O_WRONLY);
    fd3 = open("/etc/passwd", O_RDONLY);   /* same file again: a new descriptor */
    printf("fd1=%d fd2=%d fd3=%d \n", fd1, fd2, fd3);
    return 0;
}

$ cc openEx1.c -o openEx1
$ openEx1
Before open ...
fd1=3 fd2=4 fd3=5
$

[Figure: state after the three open() calls. In the u area, user file descriptor table entries 3, 4, and 5 point to three separate file table entries (CNT=1 R, CNT=1 W, CNT=1 R), which point to the in-core inodes for /etc/passwd (CNT=2) and ./openEx2.c (CNT=1)]

System Call: read

• read: A process may read an opened file
• syntax: n = read(fd, buffer, count);
   A. fd: file descriptor
   B. buffer: where the data read is stored
   C. count: the number of bytes to read
   D. n: the number of bytes actually read
• Example


/* openEx2.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read() */

int main(void)
{
    int fd1;
    char buf1[20] = "", buf2[20] = "";   /* zero-filled, so always terminated */

    printf("=======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd1, buf2, 19);                 /* continues where the first read stopped */
    printf("fd1=%d buf2=%s \n", fd1, buf2);
    printf("=======\n");
    return 0;
}

$ cc openEx2.c -o openEx2
$ openEx2
=======
fd1=3 buf1=root:x:0:1:Super-Us
fd1=3 buf2=er:/:/sbin/sh
daemo
=======
$

/* openEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read() */

int main(void)
{
    int fd1, fd2;
    char buf1[20] = "", buf2[20] = "";

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("/etc/passwd", O_RDONLY);   /* second open: its own file offset */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx3.c -o openEx3
$ openEx3
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=root:x:0:1:Super-Us
======
$

[Figure: two independent opens of /etc/passwd. Descriptor table entries 3 and 4 (in the u area) point to separate file table entries (each CNT=1 R), both of which point to the same in-core inode (CNT=2)]

System Call: dup

• dup: copy a file descriptor into the first free slot of the user file descriptor table.
• syntax: newfd = dup(fd);
   A. fd: file descriptor
• Example

/* openEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* dup(), read() */

int main(void)
{
    int fd1, fd2;
    char buf1[20] = "", buf2[20] = "";

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = dup(fd1);                      /* shares the file table entry and offset */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);                 /* continues after the first read */
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx4.c -o openEx4
$ openEx4
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=er:/:/sbin/sh
daemo
======
$

[Figure: state after dup(). Descriptor table entries 3 and 4 (in the u area) point to the same file table entry (CNT=2 R), which points to the in-core inode for /etc/passwd (CNT=1)]

System Call: creat

• creat: A process may create a new file with the creat system call
• syntax: fd = creat(pathname, mode);
   A. pathname: file name
   B. mode: access permissions of the new file
• Example: see creatEx1.c

System Call: close

• close: A process may close a file with the close system call
• syntax: close(fd);
   A. fd: file descriptor
• Example: see creatEx1.c

System Call: write

• write: A process may write data to an opened file
• syntax: n = write(fd, buffer, count);
   A. fd: file descriptor
   B. buffer: the data to be written
   C. count: the number of bytes to write
   D. n: the number of bytes actually written
• Example

/* creatEx1.c */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>   /* chmod() */
#include <fcntl.h>      /* creat() */
#include <unistd.h>     /* write(), close() */

int main(void)
{
    int fd1;
    char *buf1 = "I am a string\n";
    char *buf2 = "second line\n";

    printf("======\n");
    fd1 = creat("./testCreat.txt", 0644);   /* second argument is the permission mode */
    write(fd1, buf1, strlen(buf1));         /* write only the bytes in each string */
    write(fd1, buf2, strlen(buf2));
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    close(fd1);
    chmod("./testCreat.txt", 0666);
    printf("======\n");
    return 0;
}

$ cc creatEx1.c -o creatEx1
$ creatEx1
======
fd1=3 buf1=I am a string

======
$ ls -l testCreat.txt
-rw-rw-rw- 1 cheng staff 26 May 10 20:37 testCreat.txt
$ more testCreat.txt
...

System Call: stat/fstat

• stat/fstat: A process may query the status of a file (locked): file type, file owner, access permission, file size, number of links, inode number, access time.
• syntax: stat(pathname, statbuffer); fstat(fd, statbuffer);
   A. pathname: file name
   B. statbuffer: buffer that receives the status data
   C. fd: file descriptor
• Example

/* statEx1.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    int fd1, fd2;
    struct stat bufStat1, bufStat2;

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./statEx1", O_RDONLY);
    fstat(fd1, &bufStat1);
    fstat(fd2, &bufStat2);
    /* the st_* fields are not plain ints, so cast them for printf */
    printf("fd1=%d inode no=%ld block size=%ld blocks=%ld\n", fd1,
           (long)bufStat1.st_ino, (long)bufStat1.st_blksize, (long)bufStat1.st_blocks);
    printf("fd2=%d inode no=%ld block size=%ld blocks=%ld\n", fd2,
           (long)bufStat2.st_ino, (long)bufStat2.st_blksize, (long)bufStat2.st_blocks);
    printf("======\n");
    return 0;
}

$ cc statEx1.c -o statEx1
$ statEx1
======
fd1=3 inode no=21954 block size=8192 blocks=6
fd2=4 inode no=190611 block size=8192 blocks=
======
...

System Call: link/unlink

• link: hard-link a file to another
• syntax: link(sourceFile, targetFile); unlink(file);
   A. sourceFile, targetFile, file: file names
• Example: Lab exercise: write a C program that uses the link/unlink system calls. Use ls -l to see the reference count.

System Call: chdir

• chdir: A process may change its current directory
• syntax: chdir(pathname);
   A. pathname: directory name
• Example


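#include <stdlib.h>   /* system() */
#include <unistd.h>   /* chdir() */

int main(void)
{
    chdir("/usr/bin");
    system("ls -l");   /* lists /usr/bin, the new current directory */
    return 0;
}

$ ls -l /usr/bin
$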

Pipes and Named pipes

• pipe(int a[])
• FILE* popen(char *command, char *mode)
• pclose(FILE*)
• mknod(char *, S_IFIFO|0644, 0)
• mknod filename p (shell command)
• mkfifo filename (shell command)

Signals and signal handling

Signal Description

SIGABRT Process abort signal.

SIGALRM Alarm clock.

SIGFPE Erroneous arithmetic operation.

SIGHUP Hangup.

SIGILL Illegal instruction.

SIGINT Terminal interrupt signal.

SIGKILL Kill (cannot be caught or ignored).

SIGPIPE Write on a pipe with no one to read it.

SIGQUIT Terminal quit signal.

SIGSEGV Invalid memory reference.

SIGTERM Termination signal.

SIGUSR1 User-defined signal 1.

SIGUSR2 User-defined signal 2.

SIGCHLD Child process terminated or stopped.

SIGCONT Continue executing, if stopped.

SIGSTOP Stop executing (cannot be caught or ignored).

SIGTSTP Terminal stop signal.

SIGTTIN Background process attempting read.

SIGTTOU Background process attempting write.

SIGBUS Bus error.

SIGPOLL Pollable event.

SIGPROF Profiling timer expired.

SIGSYS Bad system call.

SIGTRAP Trace/breakpoint trap.

SIGURG High bandwidth data is available at a socket.

SIGVTALRM Virtual timer expired.

SIGXCPU CPU time limit exceeded.

SIGXFSZ File size limit exceeded.

Signal Handling Related Functions

void (*signal(int signo, void (*f)(int)))(int);

   signo: signal number
   f: handler
   return value: the previous handler

#include <stdio.h>     /* standard I/O functions */
#include <unistd.h>    /* standard unix functions, like getpid() */
#include <sys/types.h> /* various type definitions, like pid_t */
#include <signal.h>    /* signal name macros, and the signal() prototype */

/* first, here is the signal handler */
void catch_int(int sig_num)
{
    /* re-set the signal handler again to catch_int, for next time */
    signal(SIGINT, catch_int);
    /* and print the message */
    printf("Don't do that\n");
    fflush(stdout);
}

int main(int argc, char* argv[])
{
    /* set the INT (Ctrl-C) signal handler to 'catch_int' */
    signal(SIGINT, catch_int);
    /* now, lets get into an infinite loop of doing nothing. */
    for ( ;; )
        pause();
}

Signal sets

Signal sets are data types (structures) that represent multiple signals. The following functions are used to manipulate them.

int sigemptyset(sigset_t *set);
Initializes the signal set pointed to by set so that it contains no signals.

int sigfillset(sigset_t *set);
Fills the signal set pointed to by set so that it contains all signals.

int sigaddset(sigset_t *set, int signo);
Adds the signal with number signo to the signal set pointed to by set.

int sigdelset(sigset_t *set, int signo);
Removes the signal with number signo from the signal set pointed to by set.

int sigismember(const sigset_t *set, int signo);
Checks whether the signal with number signo is in the signal set pointed to by set.

int sigpending(sigset_t *set);
Stores in the signal set pointed to by set the signals that are blocked from delivery and currently pending.

int sigsuspend(const sigset_t *set);
Temporarily replaces the signal mask of the process with the signal set pointed to by set, and suspends the process until a signal is caught or until a signal occurs that terminates the process.

int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);

Values of how: SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK

int sigaction(int signo, const struct sigaction *act, struct sigaction *oact);

struct sigaction {
    void (*sa_handler)(int); /* pointer to function, or SIG_DFL or SIG_IGN */
    sigset_t sa_mask;        /* additional signals to be blocked during execution of the handler */
    int sa_flags;            /* special flags and options */
};

Message Queues

#define _GNU_SOURCE    /* glibc declares struct msgbuf only when this is defined */
#include <stdio.h>
#include <stdlib.h>    /* malloc(), free(), exit() */
#include <string.h>    /* strlen(), strcpy() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int main(int argc, char* argv[])
{
    struct msgbuf* msg;
    struct msgbuf* recv_msg;
    int rc;

    /* create a private message queue, with access only to the owner. */
    int queue_id = msgget(IPC_PRIVATE, 0600);
    if (queue_id == -1) {
        perror("main: msgget");
        exit(1);
    }
    printf("message queue created, queue id '%d'.\n", queue_id);

    /* build a message of type 1 and place it on the queue */
    msg = (struct msgbuf*)malloc(sizeof(struct msgbuf)+strlen("hello world"));
    msg->mtype = 1;
    strcpy(msg->mtext, "hello world");
    rc = msgsnd(queue_id, msg, strlen(msg->mtext)+1, 0);
    if (rc == -1) {
        perror("main: msgsnd");
        exit(1);
    }
    free(msg);
    printf("message placed on the queue successfully.\n");

    /* read the first message on the queue, whatever its type */
    recv_msg = (struct msgbuf*)malloc(sizeof(struct msgbuf)+strlen("hello world"));
    rc = msgrcv(queue_id, recv_msg, strlen("hello world")+1, 0, 0);
    if (rc == -1) {
        perror("main: msgrcv");
        exit(1);
    }
    /* mtype is a long, so it is printed with %ld */
    printf("msgrcv: received message: mtype '%ld'; mtext '%s'\n", recv_msg->mtype, recv_msg->mtext);
    free(recv_msg);

    /* remove the queue so that it does not outlive the program */
    msgctl(queue_id, IPC_RMID, NULL);
    return 0;
}

Let Us Return to TCP/IP Programming

Address + Port (TCP/IP)

[Figure: hosts 192.168.19.1, 192.168.19.2 and 192.168.19.3 on network 192.168.19.0. Services are addressed as IP address plus port: FTP [21], HTTP [80], SMTP [25], Telnet [23], e.g. 192.168.19.2 [21]. A remote host 198.163.197.4 connects across the Internet from 198.163.197.4 [x].]

Network Addressing Analogy

Telephone Call                               Network Programming
Telephone No (412-268-8000 ext. 123)         IP Address + Port No.
Area Code                                    Network No.
Exchange                                     Host Number
Extension (ext. 123, ext. 654)               Port No. (Web: Port 80, Mail: Port 25)
Professors at CMU (behind Central Number)    Applications/Servers
15-441 Students                              Clients

Concept of Port Numbers

◦ Port numbers are used to identify “entities” on a host
◦ Port numbers can be
   Well-known (ports 0-1023)
   Dynamic or private (ports 1024-65535)
◦ Servers/daemons usually use well-known ports
   Any client can identify the server/service
   HTTP = 80, FTP = 21, Telnet = 23, ...
   /etc/services defines well-known ports
◦ Clients usually use dynamic ports
   Assigned by the kernel at run time

[Figure: an NTP daemon (port 123) and a Web server (port 80) sit on top of TCP/UDP, IP and the Ethernet adapter]

What are Ports

Consider a railway station:
   Counter 0: Platform Tickets
   Counter 1: Enquiries
   Counter 2: Reservations
   -------
   Counter 8: Current Reservations
   Counter 9: Cancellations

Each host machine has an IP address

When a packet arrives at a host

medellin.cs.columbia.edu

(128.59.21.14)

cluster.cs.columbia.edu

(128.59.21.14, 128.59.16.7, 128.59.16.5, 128.59.16.4)

newworld.cs.umass.edu

(128.119.245.93)

Transfer file to/from remote host

Client/server model
◦ Client: side that initiates transfer (either to/from remote)
◦ Server: remote host

ftp: RFC 959

ftp server: port 21

An Example: FTP, the File Transfer Protocol

[Figure: the user at a host works through the FTP user interface and FTP client (local file system); files are transferred between the FTP client and the FTP server (remote file system)]

FTP client contacts FTP server at port 21, specifying TCP as transport protocol

Two parallel TCP connections opened:
◦ Control: exchange commands, responses between client, server (“out of band control”)
◦ Data: file data to/from server

[Figure: FTP client and FTP server joined by a TCP control connection (port 21) and a TCP data connection (port 20)]

The interface that the OS provides to its networking subsystem

[Figure: two hosts, each with the layers application layer, transport layer (TCP/UDP), network layer (IP), link layer (e.g. Ethernet), physical layer; everything below the application layer forms the OS network stack]

Sockets as means for inter-process communication (IPC)

[Figure: a client process and a server process, each holding a socket on top of its OS network stack, connected across the Internet]

Address the machine on the network◦ By IP address

Address the process◦ By the “port”-number

The pair of IP-address + port – makes up a “socket-address”

Internet Connections (TCP/IP)

Connection socket pair: (128.2.194.242:3479, 208.216.181.15:80)

Client socket address: 128.2.194.242:3479 (client host address 128.2.194.242)

Server socket address: 208.216.181.15:80 (server host address 208.216.181.15, server on port 80)

Note: 3479 is an ephemeral port allocated by the kernel

Note: 80 is a well-known port associated with Web servers

Examples of client programs◦ Web browsers, ftp, telnet, ssh

How does a client find the server?
◦ The IP address in the server socket address identifies the host
◦ The (well-known) port in the server socket address identifies the service, and thus implicitly identifies the server process that performs that service.
◦ Examples of well known ports
   Port 7: Echo server
   Port 23: Telnet server
   Port 25: Mail server
   Port 80: Web server

Clients

Using Ports to Identify Services

[Figure: a client sends a service request for 128.2.194.242:80 and the kernel on server host 128.2.194.242 hands it to the Web server (port 80); a service request for 128.2.194.242:7 is handed to the echo server (port 7)]

Servers are long-running processes (daemons).
◦ Created at boot-time (typically) by the init process (process 1)
◦ Run continuously until the machine is turned off.

Each server waits for requests to arrive on a well-known port associated with a particular service.
◦ Port 7: echo server
◦ Port 23: telnet server
◦ Port 25: mail server
◦ Port 80: HTTP server

Other applications should choose between 1024 and 65535

Servers

See /etc/services for a comprehensive list of the services available on a Linux machine.

What is a socket?
◦ To the kernel, a socket is an endpoint of communication.
◦ To an application, a socket is a file descriptor that lets the application read/write from/to the network.
   Remember: All Unix I/O devices, including networks, are modeled as files.

Clients and servers communicate with each other by reading from and writing to socket descriptors.

The main distinction between regular file I/O and socket I/O is how the application “opens” the socket descriptors.

Sockets

Endpoint Address
◦ Generic Endpoint Address
   The socket abstraction accommodates many protocol families.
   It supports many address families.
   It defines the following generic endpoint address: (address family, endpoint address in that family)
   Data type for generic endpoint address: struct sockaddr
◦ TCP/IP Endpoint Address
   For TCP/IP, an endpoint address is composed of the following items:
   Address family is AF_INET (Address Family for InterNET).
   Endpoint address in that family is composed of an IP address and a port number.

The IP address identifies a particular computer, while the port number identifies a particular application running on that computer.

The TCP/IP endpoint address is a special instance of the generic one: struct sockaddr_in

Port Number
   A port number identifies an application running on a computer.
   When a client program is executed, the networking stack (WinSock on Windows, the kernel on Unix) chooses an unused port number for it.
   Each server program must have a pre-specified port number, so that the client can contact the server.
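To make the endpoint address concrete, the sketch below fills a struct sockaddr_in from a dotted-decimal IP address and a port number. The helper name make_endpoint is an assumption for illustration; inet_pton() and htons() are the standard calls for converting the address and putting the port into network byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills *out with the TCP/IP endpoint (ip, port).
 * Returns 0 on success, -1 if ip is not a valid dotted-decimal IPv4 address. */
int make_endpoint(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;    /* address family */
    out->sin_port = htons(port);  /* port number, network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

When such an address is passed to calls like bind() or connect(), it is cast to the generic type: (struct sockaddr *)&addr, together with sizeof addr.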

The port number is composed of 16 bits, and its possible values are used in the following manner:
   0 - 1023: For well-known server applications.
   1024 - 49151: For user-defined server applications (typical range to be used is 1024 - 5000).
   49152 - 65535: For client programs.

Port numbers for some well-known server applications:
   WWW server using TCP: 80
   Telnet server using TCP: 23
   SMTP (email) server using TCP: 25
   SNMP server using UDP: 161
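The three ranges can be written down as a small classification function. This is only a sketch of the split described above; the enum and function names are made up for the example.

```c
/* Illustrative names for the three port ranges listed above. */
enum port_class {
    PORT_WELL_KNOWN,   /* 0 - 1023 */
    PORT_USER_SERVER,  /* 1024 - 49151 */
    PORT_CLIENT        /* 49152 - 65535 */
};

/* A 16-bit port number always falls into exactly one range. */
enum port_class classify_port(unsigned short port)
{
    if (port <= 1023)
        return PORT_WELL_KNOWN;
    if (port <= 49151)
        return PORT_USER_SERVER;
    return PORT_CLIENT;
}
```

For instance, classify_port(80) falls in the well-known range, while the ephemeral port 3479 from the earlier connection example falls in the user-defined server range on this split, which shows that actual kernels do not always follow it strictly.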

Unix File Descriptor Table

[Figure: descriptor table with entries 0 to 4; entries 0, 1 and 2 point to the data structures for file 0 (standard input), file 1 (standard output) and file 2 (standard error)]

Socket Descriptor Data Structure

[Figure: descriptor table with entries 0 to 4; a socket descriptor entry points to a data structure with these fields]

Family: PF_INET
Service: SOCK_STREAM
Local IP: 111.22.3.4
Remote IP: 123.45.6.78
Local Port: 2249
Remote Port: 3726

Hierarchical vs. flat
◦ Wisconsin / Madison / UW-Campus / Aditya vs. Aditya:123-45-6789
◦ Ethernet addresses are flat

What information would routers need to route to Ethernet addresses?
◦ Hierarchical structure crucial for designing scalable binding from interface name to route
◦ Route to a general area, then to a specific location

What type of Hierarchy?
◦ How many levels?
◦ Same hierarchy depth for everyone?

Address broken in segments of increasing specificity
◦ Uniform for everybody: needs centralized management
◦ Non-uniform: more flexible, needs careful decentralized management

Addressing in IP: Considerations

IP Addresses
   Fixed length: 32 bits
   Total address space: 2^32, about 4 billion addresses

Initial class-ful structure (1981)
◦ Class A: 128 networks, 16M hosts
◦ Class B: 16K networks, 64K hosts
◦ Class C: 2M networks, 256 hosts

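IP Address Classes (Some are Obsolete)

Class A: leading bit 0, 8-bit network ID, 24-bit host ID
Class B: leading bits 10, 16-bit network ID, 16-bit host ID
Class C: leading bits 110, 24-bit network ID, 8-bit host ID
Class D: leading bits 1110, multicast addresses
Class E: leading bits 1111, reserved for experiments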

Address would specify prefix for forwarding table
◦ Simple lookup

www.cmu.edu address 128.2.11.43
◦ Class B address – class + network is 128.2
◦ Lookup 128.2 in forwarding table
◦ Prefix – part of address that really matters for routing

Forwarding table contains
◦ List of class+network entries
◦ A few fixed prefix lengths (8/16/24)

Large tables
◦ 2 Million class C networks

Original IP Route Lookup

Subnet Addressing: RFC917 (1984)

Original goal: network part would uniquely identify a single physical network

Inefficient address space usage
◦ Class A & B networks too big
   Also, very few LANs have close to 64K hosts
   Easy for networks to (claim to) outgrow class-C
◦ Each physical network must have one network number

Routing table size is too high

Need simple way to reduce the number of network numbers assigned
◦ Subnetting: Split up single network address ranges
◦ Fixes routing table size problem, partially

Subnetting

Add another “floating” layer to hierarchy

Variable length subnet masks
◦ Could subnet a class B into several chunks

Without subnetting: Network | Host
With subnetting:    Network | Subnet | Host
Subnet mask:        11111111 11111111 11111111 00000000

Assume an organization was assigned address 150.100 (class B)

Assume < 100 hosts per subnet (department)

How many host bits do we need?
◦ Seven

What is the network mask?
◦ 11111111 11111111 11111111 10000000
◦ 255.255.255.128

Subnetting Example
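The arithmetic behind this example: seven host bits give 2^7 = 128 addresses per subnet, which covers fewer than 100 hosts, whereas six bits would give only 64. The mask is then all ones except for the lowest seven bits. A small sketch of that computation follows; the function name subnet_mask is an assumption for illustration.

```c
#include <stdint.h>

/* Returns the 32-bit mask (host byte order) that leaves host_bits
 * low-order bits for the host part, e.g. 7 -> 0xFFFFFF80. */
uint32_t subnet_mask(unsigned host_bits)
{
    if (host_bits >= 32)
        return 0;  /* no network bits left at all */
    return ~((UINT32_C(1) << host_bits) - 1);
}
```

With host_bits = 7 the result is 0xFFFFFF80, i.e. 255.255.255.128, matching the mask above.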

Forwarding Example
• Host configured with IP address and subnet mask
• Subnet number = IP (AND) Mask
• Forwarding table entries: (Subnet number, subnet mask) -> Outgoing I/F

D = destination IP address
For each forwarding table entry (SN, SM, OI)
    D1 = SM & D
    if (D1 == SN)
        Deliver on OI
Else
    Forward to default router
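The forwarding loop above translates almost line by line into C. In this sketch the struct and function names are assumptions, addresses are plain 32-bit integers in host byte order, and the "default router" is represented simply by a default interface number.

```c
#include <stddef.h>
#include <stdint.h>

/* One forwarding table entry: (SN, SM, OI). */
struct fwd_entry {
    uint32_t subnet;  /* SN: subnet number */
    uint32_t mask;    /* SM: subnet mask */
    int out_if;       /* OI: outgoing interface */
};

/* Returns the outgoing interface for dest, or default_if when no
 * entry matches (i.e. forward to the default router). */
int forward(const struct fwd_entry *table, size_t n, uint32_t dest, int default_if)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t d1 = dest & table[i].mask;  /* D1 = SM & D */
        if (d1 == table[i].subnet)
            return table[i].out_if;          /* deliver on OI */
    }
    return default_if;
}
```

For example, with the entry (150.100.12.128, 255.255.255.128, interface 1), destination 150.100.12.154 masks to 150.100.12.128 and is delivered on interface 1.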

Inefficient Address Usage

Address space depletion
◦ In danger of running out of classes A and B
◦ Why?
   Class C too small for most domains
   Very few class A – very careful about giving them out
   Class B poses greatest problem
◦ Class B sparsely populated
   But people refuse to give it back

Allows arbitrary split between network & host part of address
◦ Do not use classes to determine network ID
◦ Use common part of address as network number
◦ Allows handing out arbitrary sized chunks of address space
◦ E.g., addresses 192.4.16 - 192.4.31 have the first 20 bits in common. Thus, we use these 20 bits as the network number: 192.4.16/20

Enables more efficient usage of address space (and router tables)
◦ Use single entry for range in forwarding tables
◦ Combine forwarding entries when possible

Classless Inter-Domain Routing(CIDR) – RFC1338

Network is allocated 8 contiguous chunks of 256-host addresses: 201.10.0.0 to 201.10.7.255
◦ Allocation uses 3 bits of class C space
◦ Remaining 21 bits are network number, written as 201.10.0.0/21

Replaces 8 class C routing entries with 1 combined entry
◦ Routing protocols carry prefix with destination network address

CIDR Example

Network (network portion): get allocated portion of ISP’s address space:

ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20

Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23

IP Addresses: How to Get One?

How does an ISP get block of addresses?
◦ From Regional Internet Registries (RIRs)
   ARIN (North America, Southern Africa), APNIC (Asia-Pacific), RIPE (Europe, Northern Africa), LACNIC (South America)

How about a single host?
◦ Hard-coded by system admin in a file
◦ DHCP: Dynamic Host Configuration Protocol: dynamically get address: “plug-and-play”
   Host broadcasts “DHCP discover” msg
   DHCP server responds with “DHCP offer” msg
   Host requests IP address: “DHCP request” msg
   DHCP server sends address: “DHCP ack” msg

IP Addresses: How to Get One?


Back to CIDR

Provider is given 201.10.0.0/21

201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23

Provider

CIDR implications:

Longest prefix match

Route aggregation
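Longest prefix match can be sketched with a linear scan over a small routing table: every entry whose prefix covers the destination is a candidate, and the one with the most prefix bits wins. The struct, the function names and the next-hop numbers below are assumptions for illustration; real routers use specialised data structures instead of a linear scan.

```c
#include <stddef.h>
#include <stdint.h>

/* One CIDR route: prefix/len -> next hop (all names illustrative). */
struct route {
    uint32_t prefix;  /* network number, host byte order */
    unsigned len;     /* prefix length in bits, 0..32 */
    int next_hop;
};

/* Mask with the top len bits set. */
static uint32_t prefix_mask(unsigned len)
{
    return len == 0 ? 0 : ~UINT32_C(0) << (32 - len);
}

/* Returns the next hop of the longest matching prefix, or -1 if none. */
int longest_prefix_match(const struct route *r, size_t n, uint32_t dest)
{
    int best = -1;
    unsigned best_len = 0;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if ((dest & prefix_mask(r[i].len)) != r[i].prefix)
            continue;                       /* prefix does not cover dest */
        if (!found || r[i].len > best_len) {
            best = r[i].next_hop;
            best_len = r[i].len;
            found = 1;
        }
    }
    return best;
}
```

With the provider's aggregate 201.10.0.0/21 and the more specific 201.10.5.0/24 both in the table, a packet for 201.10.5.7 matches both and follows the /24 entry.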

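Global Address Example

[Figure: a packet carrying the receiver's global address R travels from Sender through routers R1, R2 and R3 (each with ports 1 to 4) to Receiver; each router's table maps R to an output port]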

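Source Routing Example

[Figure: the packet leaves Sender carrying the route R1, R2, R3, R; as it passes routers R1, R2 and R3 (each with ports 1 to 4) the list shrinks to R2, R3, R, then R3, R, then R, and the packet reaches Receiver]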

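Virtual Circuits Example

[Figure: Sender, routers R1, R2 and R3 (each with ports 1 to 4), Receiver. Router tables hold the entries 1,5 -> 3,7; 1,7 -> 4,2; 2,2 -> 3,6, and the packet carries VC numbers 5, 7, 2 and 6 on successive links]

• Network picks a path
• Assigns VC numbers for flow on each link
• Populates forwarding table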

Finding a Local Machine

Routing Gets Packet to Correct Local Network
◦ Based on IP address
◦ Router sees that destination address is of local machine

Still Need to Get Packet to Host
◦ Using link-layer protocol
◦ Need to know hardware address

Same Issue for Any Local Communication
◦ Find local machine, given its IP address

[Figure: several hosts on LAN 1 attached to a router (128.2.254.36) that connects to the WAN; a packet arrives with Destination = 128.2.198.222]

Address Resolution Protocol (ARP)

Low-Level Protocol
◦ Operates only within local network
◦ Determines mapping from IP address to hardware (MAC) address
◦ Mapping determined dynamically
   No need to statically configure tables
   Only requirement is that each host know its own IP address

Message fields (diagrammed for Ethernet, 6-byte MAC addresses): op, Sender MAC address, Sender IP Address, Target MAC address, Target IP Address

• op: Operation
   – 1: request
   – 2: reply
• Sender
   – Host sending ARP message
• Target
   – Intended receiver of message

ARP Request

Requestor
◦ Fills in own IP and MAC address as “sender”
   Why include its MAC address?
Mapping
◦ Fills desired host IP address in target IP address
Sending
◦ Send to MAC address ff:ff:ff:ff:ff:ff
   Ethernet broadcast

Message fields: op, Sender MAC address, Sender IP Address, Target MAC address, Target IP Address

• op: Operation
   – 1: request
• Sender
   – Host that wants to determine MAC address of another machine
• Target
   – Other machine

ARP Reply

Responder becomes “sender”
◦ Fill in own IP and MAC address
◦ Set requestor as target
◦ Send to requestor’s MAC address

Message fields: op, Sender MAC address, Sender IP Address, Target MAC address, Target IP Address

• op: Operation
   – 2: reply
• Sender
   – Host with desired IP address
• Target
   – Original requestor
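The request/reply roles can be made concrete with a struct for the ARP message and two helpers. This is a sketch under assumptions: the struct and function names are invented here, and the hardware type, protocol type and length fields (which the slides omit) are taken from the standard Ethernet/IPv4 ARP layout.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* ARP message for Ethernet + IPv4 (28 bytes on the wire). */
struct arp_ipv4 {
    uint16_t htype;         /* hardware type: 1 = Ethernet */
    uint16_t ptype;         /* protocol type: 0x0800 = IPv4 */
    uint8_t  hlen;          /* MAC address length: 6 */
    uint8_t  plen;          /* IP address length: 4 */
    uint16_t op;            /* 1 = request, 2 = reply */
    uint8_t  sender_mac[6];
    uint8_t  sender_ip[4];
    uint8_t  target_mac[6];
    uint8_t  target_ip[4];
};

/* Requestor fills in itself as sender and the wanted IP as target. */
void arp_make_request(struct arp_ipv4 *p, const uint8_t my_mac[6],
                      const uint8_t my_ip[4], const uint8_t wanted_ip[4])
{
    p->htype = htons(1);
    p->ptype = htons(0x0800);
    p->hlen = 6;
    p->plen = 4;
    p->op = htons(1);
    memcpy(p->sender_mac, my_mac, 6);
    memcpy(p->sender_ip, my_ip, 4);
    memset(p->target_mac, 0, 6);      /* unknown: this is what we ask for */
    memcpy(p->target_ip, wanted_ip, 4);
}

/* Responder becomes sender; the original requestor becomes target. */
void arp_make_reply(struct arp_ipv4 *reply, const struct arp_ipv4 *req,
                    const uint8_t my_mac[6])
{
    *reply = *req;
    reply->op = htons(2);
    memcpy(reply->sender_mac, my_mac, 6);
    memcpy(reply->sender_ip, req->target_ip, 4);
    memcpy(reply->target_mac, req->sender_mac, 6);
    memcpy(reply->target_ip, req->sender_ip, 4);
}
```

The request would be sent in an Ethernet frame addressed to ff:ff:ff:ff:ff:ff, the reply in a frame addressed to the requestor's MAC address; sending the frames is outside this sketch.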

Host Routing Table Example

Host 128.2.209.100 when plugged into CS ethernet
   Dest 128.2.209.100: routing to same machine
   Dest 128.2.0.0: other hosts on same ethernet
   Dest 127.0.0.0: special loopback address
   Dest 0.0.0.0: default route to rest of Internet
◦ Main CS router: gigrouter.net.cs.cmu.edu (128.2.254.36)

Destination     Gateway        Genmask           Iface
128.2.209.100   0.0.0.0        255.255.255.255   eth0
128.2.0.0       0.0.0.0        255.255.0.0       eth0
127.0.0.0       0.0.0.0        255.0.0.0         lo
0.0.0.0         128.2.254.36   0.0.0.0           eth0

Auto-configuration

IP address, netmask, gateway, hostname, etc., etc.
◦ Type by hand!!!

IPv4 option 1: RARP (Reverse ARP)
◦ Data-link protocol
   Uses ARP format. New opcodes: “Request reverse”, “reply reverse”
◦ Send query: Request-reverse [ether addr], server responds with IP
   Used primarily by diskless nodes, when they first initialize, to find their Internet address

IPv4 option 2: DHCP
◦ Dynamic Host Configuration Protocol
◦ RARP is fine for assigning an IP, but is very limited
◦ DHCP can provide all the info necessary

DHCP

DHCPOFFER
◦ IP addressing information
◦ Boot file/server information (for network booting)
◦ DNS name servers
◦ Lots of other stuff - protocol is extensible; half of the options reserved for local site definition and use.

DHCPDISCOVER - broadcast

DHCPOFFER

DHCPREQUEST

DHCPACK

DHCP Features

Lease-based assignment
◦ Clients can renew: Servers really should preserve this information across client & server reboots.

Provide host configuration information
◦ Not just IP address stuff.
◦ NTP servers, IP config, link layer config,…

Use:
◦ Generic config for desktops/dial-in/etc.
   Assign IP address/etc., from pool
◦ Specific config for particular machines
   Central configuration management


DHCP: Dynamic Host Configuration Protocol

Goal: allow host to dynamically obtain its IP address from network server when it joins network
   Can renew its lease on address in use
   Allows reuse of addresses (only hold address while connected and “on”)
   Support for mobile users who want to join network (more shortly)

DHCP overview:
◦ host broadcasts “DHCP discover” msg [optional]
◦ DHCP server responds with “DHCP offer” msg [optional]
◦ host requests IP address: “DHCP request” msg
◦ DHCP server sends address: “DHCP ack” msg


DHCP client-server scenario

[Figure: three subnets with interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2 and 223.1.2.9, and 223.1.3.1, 223.1.3.2 and 223.1.3.27; hosts labelled A, B and E; a DHCP server on one subnet; an arriving DHCP client needs an address in this network]

DHCP client-server scenario (DHCP server: 223.1.2.5, arriving client; messages in time order)

DHCP discover
   src: 0.0.0.0, 68
   dest: 255.255.255.255, 67
   yiaddr: 0.0.0.0
   transaction ID: 654

DHCP offer
   src: 223.1.2.5, 67
   dest: 255.255.255.255, 68
   yiaddr: 223.1.2.4
   transaction ID: 654
   Lifetime: 3600 secs

DHCP request
   src: 0.0.0.0, 68
   dest: 255.255.255.255, 67
   yiaddr: 223.1.2.4
   transaction ID: 655
   Lifetime: 3600 secs

DHCP ACK
   src: 223.1.2.5, 67
   dest: 255.255.255.255, 68
   yiaddr: 223.1.2.4
   transaction ID: 655
   Lifetime: 3600 secs

DHCP: more than IP address

DHCP can return more than just allocated IP address on subnet:
   address of first-hop router for client
   name and IP address of DNS server
   network mask (indicating network versus host portion of address)

DHCP: example

connecting laptop needs its IP address, addr of first-hop router, addr of DNS server: use DHCP

[Figure: the client and the router (runs DHCP, 168.1.1.1) each have the protocol stack DHCP / UDP / IP / Eth / Phy; the DHCP message passes down the client's stack and up the router's stack]

DHCP request encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet

Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN, received at router running DHCP server

Ethernet demux’ed to IP, IP demux’ed to UDP, UDP demux’ed to DHCP

DHCP server formulates DHCP ACK containing client’s IP address, IP address of first-hop router for client, name & IP address of DNS server

[Figure: the same client and router (runs DHCP) stacks, DHCP / UDP / IP / Eth / Phy, with the DHCP ACK passing down the router's stack and up the client's stack]

encapsulation at DHCP server, frame forwarded to client, demux’ing up to DHCP at client

client now knows its IP address, name and IP address of DNS server, IP address of its first-hop router

DHCP: example

DHCP: wireshark output (home LAN)

reply

Message type: Boot Reply (2)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 192.168.1.101 (192.168.1.101)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Option: (t=54,l=4) Server Identifier = 192.168.1.1
Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Option: (t=3,l=4) Router = 192.168.1.1
Option: (6) Domain Name Server
   Length: 12; Value: 445747E2445749F244574092
   IP Address: 68.87.71.226
   IP Address: 68.87.73.242
   IP Address: 68.87.64.146
Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."

request

Message type: Boot Request (1)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 0.0.0.0 (0.0.0.0)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP Request
Option: (61) Client identifier
   Length: 7; Value: 010016D323688A
   Hardware type: Ethernet
   Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Option: (t=50,l=4) Requested IP Address = 192.168.1.101
Option: (t=12,l=5) Host Name = "nomad"
Option: (55) Parameter Request List
   Length: 11; Value: 010F03062C2E2F1F21F92B
   1 = Subnet Mask; 15 = Domain Name
   3 = Router; 6 = Domain Name Server
   44 = NetBIOS over TCP/IP Name Server
   ……

IPv6 Auto-configuration

Serverless (“Stateless”). No manual config at all.
◦ Only configures addressing items, NOT other host things
   Use DHCP for such things

Link-local address
◦ 1111 1110 10 :: 64 bit interface ID (usually from Ethernet addr)
   (fe80::/64 prefix)
◦ Uniqueness test (“anyone using this address?”)
◦ Router contact (solicit, or wait for announcement)
   Contains globally unique prefix
   Usually: Concatenate this prefix with local ID -> globally unique IPv6 ID

DNS Design

DNS Today

Domain Name System

Naming

Need naming to identify resources

Once identified, resource must be located

How to name resource?
◦ Naming hierarchy

How do we efficiently locate resources?
◦ DNS: name -> location (IP address)

Challenge: How do we scale these to the wide area?

Lookup a Central DNS?
   Single point of failure
   Traffic volume
   Distant centralized database
   Single point of update
   Doesn’t scale!

Obvious Solutions (1)

Why not use /etc/hosts?

Original Name to Address Mapping
◦ Flat namespace
◦ Lookup mapping in /etc/hosts
◦ Downloaded regularly

Count of hosts was increasing: machine per domain -> machine per user
◦ Many more downloads
◦ Many more updates

Obvious Solutions (2)

Basically a wide-area distributed database of name to IP mappings

Goals:
◦ Scalability
◦ Decentralized maintenance
◦ Robustness
◦ Global scope
   Names mean the same thing everywhere
◦ Don’t need
   Atomicity
   Strong consistency

Domain Name System Goals

Programmer’s View of DNS

Conceptually, programmers can view the DNS database as a collection of millions of host entry structures:

/* DNS host entry structure */
struct hostent {
    char *h_name;        /* official domain name of host */
    char **h_aliases;    /* null-terminated array of domain names */
    int h_addrtype;      /* host address type (AF_INET) */
    int h_length;        /* length of an address, in bytes */
    char **h_addr_list;  /* null-terminated array of in_addr structs */
};

◦ in_addr is a struct consisting of a 4-byte IP address

Functions for retrieving host entries from DNS:
◦ gethostbyname: query key is a DNS host name.
◦ gethostbyaddr: query key is an IP address.
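A short sketch of how a program might read a host entry: look the name up with gethostbyname() and print the first address in h_addr_list. The helper name first_ipv4 is an assumption for this example. Note that gethostbyname() is an older interface; getaddrinfo() is the usual choice in new code, but the older call matches the struct hostent view shown here.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: writes the first IPv4 address of name into buf
 * in dotted-decimal form. Returns 0 on success, -1 on failure. */
int first_ipv4(const char *name, char *buf, size_t buflen)
{
    struct hostent *h = gethostbyname(name);
    struct in_addr addr;

    if (h == NULL || h->h_addrtype != AF_INET || h->h_addr_list[0] == NULL)
        return -1;

    /* each h_addr_list entry points to h_length raw address bytes */
    memcpy(&addr, h->h_addr_list[0], sizeof addr);
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)buflen) != NULL ? 0 : -1;
}
```

A caller would pass a host name and a buffer of at least INET_ADDRSTRLEN bytes; the remaining entries of h_addr_list can be walked the same way until the terminating NULL.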

DNS Message Format

Header (12 bytes):
   Identification          | Flags
   No. of Questions        | No. of Answer RRs
   No. of Authority RRs    | No. of Additional RRs

Followed by:
   Questions (variable number of questions): name, type fields for a query
   Answers (variable number of resource records): RRs in response to query
   Authority (variable number of resource records): records for authoritative servers
   Additional Info (variable number of resource records): additional “helpful” info that may be used

Identification
◦ Used to match up request/response

Flags
◦ 1-bit to mark query or response
◦ 1-bit to mark authoritative or not
◦ 1-bit to request recursive resolution
◦ 1-bit to indicate support for recursive resolution

DNS Header Fields
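As an illustration of the 12-byte header, the sketch below decodes the identification, the four flag bits listed above and the four counts from raw bytes. The struct and function names are assumptions; the bit positions used (0x8000 for query/response, 0x0400 for authoritative, 0x0100 for recursion desired, 0x0080 for recursion available) follow the standard DNS header layout, and all fields are big-endian on the wire.

```c
#include <stdint.h>

/* Decoded view of the 12-byte DNS header (names are illustrative). */
struct dns_header {
    uint16_t id;
    int is_response;          /* 0 = query, 1 = response */
    int authoritative;
    int recursion_desired;
    int recursion_available;
    uint16_t questions;
    uint16_t answer_rrs;
    uint16_t authority_rrs;
    uint16_t additional_rrs;
};

/* Reads a 16-bit big-endian (network byte order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

void dns_parse_header(const uint8_t wire[12], struct dns_header *h)
{
    uint16_t flags = be16(wire + 2);

    h->id = be16(wire);
    h->is_response = (flags & 0x8000) != 0;
    h->authoritative = (flags & 0x0400) != 0;
    h->recursion_desired = (flags & 0x0100) != 0;
    h->recursion_available = (flags & 0x0080) != 0;
    h->questions = be16(wire + 4);
    h->answer_rrs = be16(wire + 6);
    h->authority_rrs = be16(wire + 8);
    h->additional_rrs = be16(wire + 10);
}
```

A resolver would compare the id of each response with the id it placed in the query, which is how requests and responses are matched up.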

DNS Records

RR format: (class, name, value, type, ttl)

• DB contains tuples called resource records (RRs)
   – Classes = Internet (IN), Chaosnet (CH), etc.
   – Each class defines value associated with type

For IN class:
• Type=A
   – name is hostname
   – value is IP address
• Type=NS
   – name is domain (e.g. foo.com)
   – value is name of authoritative name server for this domain
• Type=CNAME
   – name is an alias name for some “canonical” (the real) name
   – value is canonical name
• Type=MX
   – value is hostname of mailserver associated with name

Different kinds of mappings are possible:
◦ Simple case: 1-1 mapping between domain name and IP addr:
   kittyhawk.cmcl.cs.cmu.edu maps to 128.2.194.242
◦ Multiple domain names map to the same IP address:
   eecs.mit.edu and cs.mit.edu both map to 18.62.1.6
◦ Single domain name maps to multiple IP addresses:
   aol.com and www.aol.com map to multiple IP addrs.
◦ Some valid domain names don’t map to any IP address:
   for example: cs.wisc.edu

Properties of DNS Host Entries

DNS Design: Hierarchy Definitions

[Figure: name tree with root (.) at the top; top-level domains edu, net, org, uk, com; below edu: gwu, ucb, wisc, cmu, mit; lower levels: cs, ee and wail]

• Each node in hierarchy stores a list of names that end with same suffix
• Suffix = path up tree
• E.g., given this tree, where would following be stored:
   • Fred.com
   • Fred.edu
   • Fred.wisc.edu
   • Fred.cs.wisc.edu
   • Fred.cs.cmu.edu

DNS Design: Zone Definitions

[Figure: name tree with root at the top; top-level domains edu, net, org, uk, com, ca; below edu: gwu, ucb, cmu, bu, mit; lower levels: cs, ece and cmcl; three example zones are marked: a single node, a subtree and the complete tree]

• Zone = contiguous section of name space
• E.g., Complete tree, single node or subtree
• A zone has an associated set of name servers
• Must store list of names and tree links

Zones are created by convincing owner node to create/delegate a subzone
◦ Records within zone store multiple redundant name servers
◦ Primary/master name server updated manually
◦ Secondary/redundant servers updated by zone transfer of name space
   Zone transfer is a bulk transfer of the “configuration” of a DNS server – uses TCP to ensure reliability

Example:
◦ CS.WISC.EDU created by WISC.EDU administrators
◦ Who creates WISC.EDU or .EDU?

DNS Design: Cont.

Responsible for “root” zone

Approx. 13 root name servers worldwide
◦ Currently {a-m}.root-servers.net

Local name servers contact root servers when they cannot resolve a name
◦ Configured with well-known root servers

DNS: Root Name Servers

Each host has a resolver
◦ Typically a library that applications can link to
◦ Resolver contacts name server
◦ Local name servers hand-configured (e.g. /etc/resolv.conf)

Name servers
◦ Either responsible for some zone or…
◦ Local servers
   Do lookup of distant host names for local hosts
   Typically answer queries about local zone

Servers/Resolvers

Typical Resolution

Steps for resolving www.wisc.edu
◦ Application calls gethostbyname() (RESOLVER)
◦ Resolver contacts local name server (S1)
◦ S1 queries root server (S2) for (www.wisc.edu)
◦ S2 returns NS record for wisc.edu (S3)
◦ What about A record for S3?
   This is what the additional information section is for (PREFETCHING)
◦ S1 queries S3 for www.wisc.edu
◦ S3 returns A record for www.wisc.edu

Can return multiple A records: what does this mean?

Lookup Methods

Recursive query:
   Server goes out and searches for more info (recursive)
   Only returns final answer or “not found”

Iterative query:
   Server responds with as much as it knows (iterative)
   “I don’t know this name, but ask this server”

Workload impact on choice?
   Local server typically does recursive
   Root/distant server does iterative

[Figure: requesting host surf.eurecom.fr looks up gaia.cs.umass.edu; numbered messages 1 to 8 pass between the local name server dns.eurecom.fr, the root name server, the intermediate name server dns.umass.edu and the authoritative name server dns.cs.umass.edu; one exchange is labelled “iterated query”]

Are all servers/names likely to be equally popular?
◦ Why might this be a problem? How can we solve this problem?

DNS responses are cached
◦ Quick response for repeated translations
◦ Other queries may reuse some parts of lookup
   NS records for domains

DNS negative queries are cached
◦ Don’t have to repeat past mistakes
◦ E.g. misspellings, search strings in resolv.conf

Cached data periodically times out
◦ Lifetime (TTL) of data controlled by owner of data
◦ TTL passed with every record

Workload and Caching

Typical Resolution

[Figure: the client's resolver asks the local DNS server for www.cs.wisc.edu. The local server queries the root & edu DNS server (answer: NS ns1.wisc.edu), then the ns1.wisc.edu DNS server (answer: NS ns1.cs.wisc.edu), then the ns1.cs.wisc.edu DNS server (answer: A www=IPaddr)]

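Subsequent Lookup Example

[Figure: the client asks the local DNS server for ftp.cs.wisc.edu. The root & edu DNS server and the wisc.edu DNS server are shown but not contacted; the local server sends ftp.cs.wisc.edu to the cs.wisc.edu DNS server and receives ftp=IPaddr]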

DNS servers are replicated
◦ Name service available if ≥ one replica is up
◦ Queries can be load balanced between replicas

UDP used for queries
◦ Need reliability -> must implement this on top of UDP!
◦ Why not just use TCP?

Try alternate servers on timeout
◦ Exponential backoff when retrying same server

Same identifier for all queries
◦ Don’t care which server responds

Reliability

Reverse DNS

Task
◦ Given IP address, find its name
◦ When is this needed?

Method
◦ Maintain separate hierarchy based on IP names
◦ Write 128.2.194.242 as 242.194.2.128.in-addr.arpa
   Why is the address reversed?

Managing
◦ Authority manages IP addresses assigned to it
◦ E.g., CMU manages name space 2.128.in-addr.arpa

[Figure: two branches under the unnamed root: edu -> cmu -> cs -> cmcl -> kittyhawk (128.2.194.242), and arpa -> in-addr -> 128 -> 2 -> 194 -> 242]
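Building the reverse-lookup name is a matter of writing the four octets in reverse order and appending in-addr.arpa. A minimal sketch follows; the function name reverse_dns_name is an assumption for illustration.

```c
#include <stdio.h>

/* Writes the in-addr.arpa name for a dotted-decimal IPv4 address into
 * out. Returns 0 on success, -1 on a malformed address or short buffer. */
int reverse_dns_name(const char *ipv4, char *out, size_t outlen)
{
    unsigned a, b, c, d;
    char extra;
    int n;

    /* exactly four numbers, nothing after them, each in 0..255 */
    if (sscanf(ipv4, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    /* most specific octet first, so the name reads leaf-to-root */
    n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa", d, c, b, a);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The reversal answers the question on the slide: DNS names are written most-specific label first, while IP addresses are written most-significant octet first, so the octets must be flipped for delegation (e.g. 2.128.in-addr.arpa for CMU) to follow the address hierarchy.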

Name servers can add additional data to response

Typically used for prefetching
◦ CNAME/MX/NS typically point to another host name
◦ Responses include address of host referred to in “additional section”

Prefetching

Generic Top Level Domains (gTLD) = .com, .net, .org, etc…

Country Code Top Level Domain (ccTLD) = .us, .ca, .fi, .uk, etc…

Root server ({a-m}.root-servers.net) also used to cover gTLD domains
◦ Load on root servers was growing quickly!
◦ Moving .com, .net, .org off root servers was clearly necessary to reduce load -> done Aug 2000

DNS Today: Root Zone

   .info: general info
   .biz: businesses
   .aero: air-transport industry
   .coop: business cooperatives
   .name: individuals
   .pro: accountants, lawyers, and physicians
   .museum: museums

Only new ones active so far: .info, .biz, .name

New gTLDs

No centralized caching per site
◦ Each machine runs own caching local server
◦ Why is this a problem?
◦ How many hosts do we need to share cache? -> recent studies suggest 10-20 hosts

Hit rate for DNS = 80% -> 1 - (#DNS/#connections)
◦ Is this good or bad?

Most Internet traffic is Web
◦ What does a typical page look like? -> average of 4-5 embedded objects -> needs 4-5 transfers
◦ This alone accounts for 80% hit rate!

Lower TTLs for A records does not affect performance

DNS performance really relies more on NS-record caching

DNS Performance
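As a worked instance of the hit-rate formula, assume (for illustration only) a page whose 5 objects all come from one host, so the browser opens 5 connections but needs just 1 DNS lookup:

```latex
\text{hit rate} = 1 - \frac{\#\text{DNS}}{\#\text{connections}} = 1 - \frac{1}{5} = 0.8 = 80\%
```

Under that assumption the 80% figure follows from page structure alone, before any cache shared between hosts contributes.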

Programmers Perspective

Socket programming

Goal: learn how to build client/server applications that communicate using sockets

Socket API
• introduced in BSD UNIX in the early 1980s (first widely released with 4.2BSD, 1983)
• explicitly created, used, and released by applications
• client/server paradigm
• two types of transport service via the socket API:
  ◦ unreliable datagram
  ◦ reliable, byte stream-oriented

socket: a host-local, application-created, OS-controlled interface (a “door”) into which an application process can both send and receive messages to/from another application process

Server and Client

Server and Client exchange messages over the network through a common Socket API

[Figure: a Server and several Clients. Each is drawn as a stack: user space (application and ports), the Socket API, kernel space (TCP/UDP, IP), and hardware (Ethernet Adapter).]

Socket-programming using TCP

Socket: a door between an application process and the end-to-end transport protocol (UDP or TCP)

TCP service: reliable transfer of bytes from one process to another

[Figure: two hosts (or servers) connected through the internet. On each host the process is controlled by the application developer; its socket leads to TCP with buffers and variables, which is controlled by the operating system.]

Socket programming with TCP

Client must contact server
• server process must first be running
• server must have created a socket (door) that welcomes the client's contact

Client contacts server by:
• creating a client-local TCP socket
• specifying the IP address and port number of the server process
• When the client creates the socket, client TCP establishes a connection to server TCP (in the C API this happens in connect())

When contacted by a client, server TCP creates a new socket for the server process to communicate with that client
  ◦ allows the server to talk with multiple clients
  ◦ source port numbers are used to distinguish clients (more in Chap 3)

Application viewpoint: TCP provides reliable, in-order transfer of bytes (a “pipe”) between client and server

Stream jargon

• A stream is a sequence of characters that flows into or out of a process.
• An input stream is attached to some input source for the process, e.g., keyboard or socket.
• An output stream is attached to an output destination, e.g., monitor or socket.

Socket programming with TCP

Example client-server app:

1) client reads a line from standard input (inFromUser stream) and sends it to the server via the socket (outToServer stream)
2) server reads the line from the socket
3) server converts the line to uppercase and sends it back to the client
4) client reads the modified line from the socket (inFromServer stream) and prints it

[Figure: client process. The inFromUser input stream comes from the keyboard and output goes to the monitor. Through clientSocket (the client TCP socket), the outToServer output stream goes to the network and the inFromServer input stream comes from the network.]

Client/server socket interaction: TCP

Server (running on hostid):
1. create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
2. wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3. read request from connectionSocket
4. write reply to connectionSocket
5. close connectionSocket

Client:
1. create socket, connect to hostid, port=x: clientSocket = Socket() (TCP connection setup with the server's accept)
2. send request using clientSocket
3. read reply from clientSocket
4. close clientSocket

Socket programming with UDP

UDP: no “connection” between client and server
• no handshaking
• sender explicitly attaches the IP address and port of the destination to each packet
• server must extract the IP address and port of the sender from the received packet

UDP: transmitted data may be received out of order, or lost

UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server

Client/server socket interaction: UDP

Server (running on hostid):
1. create socket, port=x, for incoming request: serverSocket = DatagramSocket()
2. read request from serverSocket
3. write reply to serverSocket, specifying the client host address and port number

Client:
1. create socket: clientSocket = DatagramSocket()
2. create a datagram addressed to (hostid, port=x) and send the request using clientSocket
3. read reply from clientSocket
4. close clientSocket

Example: Java client (UDP)

[Figure: client process. The inFromUser input stream comes from the keyboard and output goes to the monitor. Through clientSocket (the client UDP socket), sendPacket (a UDP packet) goes to the network and receivePacket (a UDP packet) comes from the network.]

Output: sends a packet (TCP sent a “byte stream”)

Input: receives a packet (TCP received a “byte stream”)

Socket address structure

• Contains the protocol-specific addressing information that is passed from the user process to the kernel and vice versa
• Each protocol supported by a socket implementation has its own socket address structure, named sockaddr_suffix, where the suffix represents the protocol family. Ex:
  ◦ sockaddr_in – Internet/IPv4 socket address structure
  ◦ sockaddr_ipx – IPX socket address structure

The generic socket address structure:
sockaddr {
    address family
    protocol specific data
};

The Internet/IPv4 socket address structure:
sockaddr_in {
    sin_family          Internet address family
    sin_port            Transport layer port number
    in_addr sin_addr    IP address
    sin_zero[8]         Padding
};

Datatypes required by POSIX

Datatype      Description                                   Header
int8_t        signed 8-bit integer                          <sys/types.h>
uint8_t       unsigned 8-bit integer                        <sys/types.h>
sa_family_t   address family of socket address structure    <sys/socket.h>
socklen_t     length of socket address structure            <sys/socket.h>
in_addr_t     IPv4 address, normally uint32_t               <netinet/in.h>
in_port_t     TCP/UDP port, normally uint16_t               <netinet/in.h>

Data manipulation functions

• Byte ordering
  ◦ Network byte order (big-endian)
  ◦ Host byte order
  ◦ htons(), htonl(), ntohs(), ntohl()
• Memory content initialization
  ◦ memset(buffer, value, buffersize)
• Data copying and comparison
  ◦ memcpy(dest, src, num_of_bytes)
  ◦ memcmp(buffer1, buffer2, num_of_bytes)
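A small sketch of what the byte-ordering calls do. The helper names `port_wire_bytes` and `roundtrip32` are invented for this example. Whatever the host's native order, the bytes produced by htons() sit in memory most significant byte first.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of htons(port) into out[0..1].
 * (Hypothetical helper for illustration.) */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);   /* host order -> network order */
    memcpy(out, &net, sizeof net);
}

/* Round-trip a 32-bit value through network byte order. */
uint32_t roundtrip32(uint32_t host_value)
{
    return ntohl(htonl(host_value));
}
```

For port 80 the two bytes are 0x00 then 0x50 on any machine, which is why sin_port is always filled in with htons().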

Data manipulation functions, continued

• IP address notation conversion
  ◦ Integer notation
  ◦ Dotted decimal notation
• status inet_aton(ddstring_pointer, address_pointer)
  ◦ Returns 1 on success, 0 on error
• ddstring_pointer inet_ntoa(address)
  ◦ Takes the struct in_addr by value and returns a pointer to a static buffer
• address inet_addr(ddstring_pointer) *deprecated
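The same two conversions can be sketched with inet_pton() and inet_ntop(), the POSIX successors of inet_aton() and inet_ntoa(). This swaps in different calls from the ones listed on the slide; the wrapper names `parse_ipv4` and `format_ipv4` are made up for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse dotted decimal into addr (network byte order).
 * Returns 1 on success and 0 for an invalid string, like inet_aton(). */
int parse_ipv4(const char *text, struct in_addr *addr)
{
    return inet_pton(AF_INET, text, addr) == 1;
}

/* Format addr as dotted decimal into the caller's buffer.
 * Returns buf, or NULL on error. */
const char *format_ipv4(const struct in_addr *addr, char *buf, size_t buflen)
{
    return inet_ntop(AF_INET, addr, buf, (socklen_t)buflen);
}
```

Unlike inet_ntoa(), inet_ntop() writes into a caller-supplied buffer (INET_ADDRSTRLEN bytes are enough for IPv4), so the result is not overwritten by a later call.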

Initialisation and Shutdown

• sockfd socket(domain, type, protocol)
  ◦ domain is the protocol/address family: AF_INET, AF_IPX, ...
  ◦ type is the type of service: SOCK_DGRAM, SOCK_STREAM, ...
  ◦ protocol is the specific protocol supported by the protocol family given as the first parameter (0 selects the default)
  ◦ Returns a fresh socket descriptor on success, –1 on error
• status close(sockfd)
  ◦ Releases the descriptor; for a TCP socket, data already queued for sending is normally still delivered before the connection is closed
  ◦ Returns –1 on error

Connection Management

• status bind(sockfd, ptr_to_sockaddr, sockaddr_size)
  ◦ Associates the sockaddr with sockfd
  ◦ The rules for successful binding depend on the protocol family of the socket (specified in the call to socket)
  ◦ Necessary for receiving connections on a STREAM socket
• status listen(sockfd, backlog)
  ◦ Announces willingness to accept connections
  ◦ backlog: maximum number of established connections not yet handed to the user process (by calls to accept)
  ◦ On an unbound socket an implicit bind is done, with INADDR_ANY as the address and a random port

* The above calls return –1 on error
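The socket(), bind(), listen() sequence can be sketched as one helper. The name `make_listener` and the choice of the loopback address are assumptions made for this example; passing port 0 asks the kernel to pick a free port, which getsockname() then reports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1:port and start listening.
 * On success returns the descriptor and stores the bound port (host
 * byte order) in *bound_port; returns -1 on error.
 * (Hypothetical helper for illustration.) */
int make_listener(unsigned short port, int backlog, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);                 /* do not leak the descriptor */
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A server that should be reachable from other hosts would bind to INADDR_ANY instead of the loopback address.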

struct sockaddr_in {
    unsigned short sin_family;   /* address family (always AF_INET) */
    unsigned short sin_port;     /* port num in network byte order */
    struct in_addr sin_addr;     /* IP addr in network byte order */
    unsigned char  sin_zero[8];  /* pad to sizeof(struct sockaddr) */
};

Connection Management, continued

• connfd accept(sockfd, ptr_to_sockaddr, ptr_to_sockaddr_size)
  ◦ Blocks until a connection is established on sockfd and returns a new file descriptor on which I/O can be performed with the remote entity
  ◦ Fills the sockaddr and size parameters with the address of the connecting entity and its size, respectively
  ◦ bind and listen are assumed to have been called on sockfd before accept
• status connect(sockfd, ptr_to_sockaddr, sockaddr_size)
  ◦ Initiates a new connection with the entity addressed by sockaddr in the case of a STREAM socket
  ◦ Sets the default remote address for I/O in the case of a DGRAM socket

* The above calls return –1 on error

SEND: int send(int sockfd, const void *msg, int len, int flags);
  ◦ msg: message you want to send
  ◦ len: length of the message
  ◦ flags := 0
  ◦ returned: the number of bytes actually sent

RECEIVE: int recv(int sockfd, void *buf, int len, unsigned int flags);
  ◦ buf: buffer to receive the message
  ◦ len: length of the buffer (“don't give me more!”)
  ◦ flags := 0
  ◦ returned: the number of bytes received

SEND (DGRAM-style): int sendto(int sockfd, const void *msg, int len, int flags, const struct sockaddr *to, int tolen);
  ◦ msg: message you want to send
  ◦ len: length of the message
  ◦ flags := 0
  ◦ to: socket address of the remote process
  ◦ tolen: = sizeof(struct sockaddr)
  ◦ returned: the number of bytes actually sent

RECEIVE (DGRAM-style): int recvfrom(int sockfd, void *buf, int len, unsigned int flags, struct sockaddr *from, int *fromlen);
  ◦ buf: buffer to receive the message
  ◦ len: length of the buffer (“don't give me more!”)
  ◦ flags := 0
  ◦ from: socket address of the process that sent the data
  ◦ fromlen: on input, *fromlen = sizeof(struct sockaddr); on return, the actual size of the address
  ◦ returned: the number of bytes received

CLOSE: close(sockfd);

Note: current POSIX headers declare the lengths as size_t or socklen_t and the return type as ssize_t; the int versions above are the traditional simplified form.

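Concurrent server

Client+server: connection-oriented

[Figure: call sequence. Server: SOCKET, BIND, LISTEN, ACCEPT, RECEIVE, SEND, CLOSE. Client: SOCKET, CONNECT, SEND, RECEIVE, CLOSE. CONNECT and ACCEPT meet in the TCP three-way handshake.]

Client+server: connectionless

[Figure: call sequence built from CREATE, BIND, SEND, RECEIVE, and CLOSE; there is no LISTEN, CONNECT, or ACCEPT step.]

Step-by-Step Explanation of a Connection-Oriented Server and Client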

TCP Server

For example: web server

What does a web server need to do so that a web client can connect to it?

[Figure: Web Server on Port 80, above TCP / IP / Ethernet Adapter.]

Socket I/O: socket()

Since web traffic uses TCP, the web server must create a socket of type SOCK_STREAM

int fd; /* socket descriptor */

if((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    exit(1);
}

• socket returns an integer (socket descriptor)
• fd < 0 indicates that an error occurred
• AF_INET associates a socket with the Internet protocol family
• SOCK_STREAM selects the TCP protocol

Socket I/O: bind()

A socket can be bound to a port

int fd;                  /* socket descriptor */
struct sockaddr_in srv;  /* used by bind() */

/* create the socket */

memset(&srv, 0, sizeof(srv));  /* clear the structure, including sin_zero */
srv.sin_family = AF_INET;      /* use the Internet addr family */
srv.sin_port = htons(80);      /* bind socket 'fd' to port 80 */

/* bind: a client may connect to any of my addresses */
srv.sin_addr.s_addr = htonl(INADDR_ANY);

if(bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}

• Still not quite ready to communicate with a client...

Socket I/O: listen()

listen indicates that the server will accept a connection

/* 1) create the socket */
/* 2) bind the socket to a port */

if(listen(fd, 5) < 0) {
    perror("listen");
    exit(1);
}

• Still not quite ready to communicate with a client...

Socket I/O: accept()

accept blocks waiting for a connection

int fd;                           /* socket descriptor */
struct sockaddr_in srv;           /* used by bind() */
struct sockaddr_in cli;           /* used by accept() */
int newfd;                        /* returned by accept() */
socklen_t cli_len = sizeof(cli);  /* used by accept() */

/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */

newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}

• accept returns a new socket (newfd) with the same properties as the original socket (fd)
• newfd < 0 indicates that an error occurred

Socket I/O: accept() continued...

struct sockaddr_in cli;           /* used by accept() */
int newfd;                        /* returned by accept() */
socklen_t cli_len = sizeof(cli);  /* used by accept() */

newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}

• How does the server know which client it is?
  ◦ cli.sin_addr.s_addr contains the client's IP address (network byte order)
  ◦ cli.sin_port contains the client's port number (network byte order)
• Now the server can exchange data with the client by using read and write on the descriptor newfd.
• Why does accept need to return a new descriptor?

Socket I/O: read()

• read can be used with a socket
• read blocks waiting for data from the client but does not guarantee that sizeof(buf) bytes are read

int newfd;      /* connected socket returned by accept() */
char buf[512];  /* used by read() */
int nbytes;     /* used by read() */

/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */
/* 4) accept the incoming connection */

if((nbytes = read(newfd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
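Because a single read() may return fewer bytes than requested, code that needs an exact count usually loops. A minimal sketch, with the helper name `read_full` chosen for this example:

```c
#include <errno.h>
#include <unistd.h>

/* Read exactly n bytes unless EOF or an error intervenes.
 * Returns the number of bytes read (less than n only at EOF),
 * or -1 on error. (Hypothetical helper for illustration.) */
ssize_t read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;                 /* peer closed the connection: EOF */
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

This suits fixed-size records; for line-oriented protocols the loop would instead stop at the delimiter.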

TCP Client

For example: web client

How does a web client connect to a web server?

[Figure: 2 Web Clients, above TCP / IP / Ethernet Adapter.]

Dealing with IP Addresses

IP addresses are commonly written as strings ("128.2.35.50"), but programs deal with IP addresses as integers.

Converting strings to numerical address:

struct sockaddr_in srv;

srv.sin_addr.s_addr = inet_addr("128.2.35.50");
if(srv.sin_addr.s_addr == (in_addr_t) -1) {
    fprintf(stderr, "inet_addr failed!\n");
    exit(1);
}

Converting a numerical address to a string:

struct sockaddr_in srv;
char *t = inet_ntoa(srv.sin_addr);
if(t == 0) {
    fprintf(stderr, "inet_ntoa failed!\n");
    exit(1);
}

Translating Names to Addresses

• gethostbyname provides an interface to DNS
• Additional useful calls
  ◦ gethostbyaddr – returns a hostent given an address (e.g., the in_addr inside a sockaddr_in)
  ◦ getservbyname
    – Used to get a service description (typically the port number)
    – Returns a servent based on the name

#include <netdb.h>

struct hostent *hp; /* ptr to host info for remote */
struct sockaddr_in peeraddr;
char *name = "www.cs.cmu.edu";

peeraddr.sin_family = AF_INET;
hp = gethostbyname(name);
if(hp == NULL) {
    fprintf(stderr, "gethostbyname failed!\n");
    exit(1);
}
peeraddr.sin_addr.s_addr = ((struct in_addr*)(hp->h_addr))->s_addr;
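On current systems the same translation is often done with getaddrinfo(), which handles both the host name and the service in one call. The sketch below uses that call in place of gethostbyname(); the wrapper name `resolve_ipv4` and the restriction to IPv4/TCP are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host (a name or dotted-decimal string) and service (a port
 * number or service name) to the first IPv4 TCP address found.
 * Returns 0 on success, otherwise a getaddrinfo() error code that
 * can be passed to gai_strerror(). (Hypothetical helper.) */
int resolve_ipv4(const char *host, const char *service, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;         /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0)
        return rc;

    memcpy(out, res->ai_addr, sizeof *out);   /* keep the first result */
    freeaddrinfo(res);
    return 0;
}
```

The filled-in structure already has sin_family, sin_port, and sin_addr set in network byte order, so it can be handed directly to connect().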

Socket I/O: connect()

connect allows a client to connect to a server...

int fd;                  /* socket descriptor */
struct sockaddr_in srv;  /* used by connect() */

/* create the socket */

/* connect: use the Internet address family */
srv.sin_family = AF_INET;

/* connect: socket 'fd' to port 80 */
srv.sin_port = htons(80);

/* connect: connect to IP Address "128.2.35.50" */
srv.sin_addr.s_addr = inet_addr("128.2.35.50");

if(connect(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("connect");
    exit(1);
}

Socket I/O: write()

write can be used with a socket

int fd;                  /* socket descriptor */
struct sockaddr_in srv;  /* used by connect() */
char buf[512];           /* used by write() */
int nbytes;              /* used by write() */

/* 1) create the socket */
/* 2) connect() to the server */

/* Example: A client could "write" a request to a server */
if((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}

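Review: TCP Client-Server Interaction

TCP Server: socket(), bind(), listen(), accept(), read(), write(), read(), close()
TCP Client: socket(), connect(), write(), read(), close()

• connection establishment: the client's connect() meets the server's accept()
• data request: the client's write() is received by the server's read()
• data reply: the server's write() is received by the client's read()
• end-of-file notification: the client's close() ends the server's final read()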
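The whole call sequence can be exercised inside one process over the loopback interface, which makes the ordering easy to follow. This is only a sketch: the function name `tcp_loopback_echo` is invented, the "server" simply echoes the request, and it assumes a short message that arrives in a single read().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Client sends msg, server side echoes it back. Returns the number of
 * bytes the client received, or -1 on any error. (Hypothetical helper.) */
ssize_t tcp_loopback_echo(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char buf[256];
    ssize_t n = -1;
    int lfd = -1, cfd = -1, sfd = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                           /* kernel picks a port */

    /* Server: socket(), bind(), listen(). */
    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(lfd, 1) < 0) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) goto done;

    /* Client: socket(), connect(). The kernel completes the handshake
     * and queues the connection until the server calls accept(). */
    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    /* Server: accept() returns a new, connected descriptor. */
    if ((sfd = accept(lfd, NULL, NULL)) < 0) goto done;

    /* Data request: client write(), server read(). */
    if (write(cfd, msg, strlen(msg)) < 0) goto done;
    if ((n = read(sfd, buf, sizeof buf)) <= 0) { n = -1; goto done; }

    /* Data reply: server write(), client read(). */
    if (write(sfd, buf, (size_t)n) < 0) { n = -1; goto done; }
    n = read(cfd, reply, replylen);

done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```

A real client and server run as separate processes, usually on different hosts; only the order of the calls carries over.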

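Example: C client (TCP)

/* client.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold an IP address */
    int clientSocket;         /* socket descriptor */
    struct hostent *ptrh;     /* pointer to a host table entry */
    char *host;
    int port, n;
    char Sentence[128];
    char modifiedSentence[128];

    if (argc != 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); exit(1); }
    host = argv[1];
    port = atoi(argv[2]);

    /* Create client socket, connect to server */
    clientSocket = socket(PF_INET, SOCK_STREAM, 0);
    if (clientSocket < 0) { perror("socket"); exit(1); }
    memset(&sad, 0, sizeof(sad));                 /* clear sockaddr structure */
    sad.sin_family = AF_INET;                     /* set family to Internet */
    sad.sin_port = htons((unsigned short)port);
    ptrh = gethostbyname(host);                   /* convert host name to IP address */
    if (ptrh == NULL) { fprintf(stderr, "unknown host %s\n", host); exit(1); }
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);
    if (connect(clientSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("connect"); exit(1); }

    /* Get input line from user (fgets replaces the unsafe gets) */
    if (fgets(Sentence, sizeof(Sentence), stdin) == NULL) exit(1);

    /* Send line to server */
    n = write(clientSocket, Sentence, strlen(Sentence) + 1);
    if (n < 0) { perror("write"); exit(1); }

    /* Read line from server */
    n = read(clientSocket, modifiedSentence, sizeof(modifiedSentence) - 1);
    if (n < 0) { perror("read"); exit(1); }
    modifiedSentence[n] = '\0';
    printf("FROM SERVER: %s\n", modifiedSentence);

    /* Close connection */
    close(clientSocket);
    return 0;
}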

Example: C server (TCP)

/* server.c */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's address */
    struct sockaddr_in cad;   /* structure to hold the client's address */
    int welcomeSocket, connectionSocket;   /* socket descriptors */
    socklen_t alen;
    int port, n, i;
    char clientSentence[128];
    char capitalizedSentence[128];

    if (argc != 2) { fprintf(stderr, "usage: %s port\n", argv[0]); exit(1); }
    port = atoi(argv[1]);

    /* Create welcoming socket at port & bind a local address */
    welcomeSocket = socket(PF_INET, SOCK_STREAM, 0);
    if (welcomeSocket < 0) { perror("socket"); exit(1); }
    memset(&sad, 0, sizeof(sad));                 /* clear sockaddr structure */
    sad.sin_family = AF_INET;                     /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);      /* set the local IP address */
    sad.sin_port = htons((unsigned short)port);   /* set the port number */
    if (bind(welcomeSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("bind"); exit(1); }

    /* Specify the maximum number of clients that can be queued */
    if (listen(welcomeSocket, 10) < 0) { perror("listen"); exit(1); }

    while (1) {
        /* Wait on welcoming socket for contact by a client */
        alen = sizeof(cad);
        connectionSocket = accept(welcomeSocket, (struct sockaddr *)&cad, &alen);
        if (connectionSocket < 0) { perror("accept"); continue; }

        n = read(connectionSocket, clientSentence, sizeof(clientSentence));

        /* Capitalize the sentence and store the result in capitalizedSentence */
        for (i = 0; i < n; i++)
            capitalizedSentence[i] = toupper((unsigned char)clientSentence[i]);

        /* Write out the result to socket */
        if (n > 0 && write(connectionSocket, capitalizedSentence, n) < 0)
            perror("write");

        close(connectionSocket);
    }   /* End of while loop: loop back and wait for another client connection */
}

Outline for typical concurrent server

[Figure: status transition of the listening and connected sockets in the parent and child processes: after return from accept, after fork() returns, and after the sockets are closed.]
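A fork-based outline along these lines is sketched below. After accept() the parent holds both the listening and the connected descriptor; after fork() parent and child each hold both; after the close() calls the parent keeps only the listening socket and the child only the connection. The names `handle_client` and `serve_forever`, and the uppercase reply, are assumptions made for this example.

```c
#include <ctype.h>
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Per-connection work: read one message and reply in upper case. */
void handle_client(int connfd)
{
    char buf[128];
    ssize_t n = read(connfd, buf, sizeof buf);

    for (ssize_t i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    if (n > 0 && write(connfd, buf, (size_t)n) < 0)
        perror("write");
}

/* Accept loop: one child process per connection. Does not return. */
void serve_forever(int listenfd)
{
    signal(SIGCHLD, SIG_IGN);        /* let the kernel reap finished children */

    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0)
            continue;                /* e.g. interrupted: try again */

        pid_t pid = fork();
        if (pid == 0) {              /* child */
            close(listenfd);         /* the child never accepts */
            handle_client(connfd);
            close(connfd);
            _exit(0);
        }
        close(connfd);               /* parent: the child owns the connection */
    }
}
```

The parent must close its copy of connfd; otherwise the connection stays open after the child exits and the parent eventually runs out of descriptors.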

Socket programming with UDP

UDP: no “connection” between client and server
• no handshaking
• sender explicitly attaches the IP address and port of the destination to each packet
• server must extract the IP address and port of the sender from the received packet

UDP: transmitted data may be received out of order, or lost

UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server

UDP Server Example

For example: NTP daemon

What does a UDP server need to do so that a UDP client can connect to it?

[Figure: NTP daemon on Port 123, above UDP / IP / Ethernet Adapter.]

Socket I/O: socket()

The UDP server must create a datagram socket…

int fd; /* socket descriptor */

if((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
}

• socket returns an integer (socket descriptor)
• fd < 0 indicates that an error occurred
• AF_INET: associates a socket with the Internet protocol family
• SOCK_DGRAM: selects the UDP protocol

Socket I/O: bind()

A socket can be bound to a port

int fd;                  /* socket descriptor */
struct sockaddr_in srv;  /* used by bind() */

/* create the socket */

/* bind: use the Internet address family */
srv.sin_family = AF_INET;

/* bind: socket 'fd' to port 80 */
srv.sin_port = htons(80);

/* bind: a client may connect to any of my addresses */
srv.sin_addr.s_addr = htonl(INADDR_ANY);

if(bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}

• Now the UDP server is ready to accept packets…

Socket I/O: recvfrom()

read does not provide the client's address to the UDP server

int fd;                           /* socket descriptor */
struct sockaddr_in srv;           /* used by bind() */
struct sockaddr_in cli;           /* used by recvfrom() */
char buf[512];                    /* used by recvfrom() */
socklen_t cli_len = sizeof(cli);  /* used by recvfrom() */
int nbytes;                       /* used by recvfrom() */

/* 1) create the socket */
/* 2) bind to the socket */

nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */,
                  (struct sockaddr*) &cli, &cli_len);
if(nbytes < 0) {
    perror("recvfrom");
    exit(1);
}

Socket I/O: recvfrom() continued...

nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */,
                  (struct sockaddr*) &cli, &cli_len);

• The actions performed by recvfrom
  ◦ returns the number of bytes read (nbytes)
  ◦ copies nbytes of data into buf
  ◦ returns the address of the client (cli)
  ◦ returns the length of cli (cli_len)
  ◦ don't worry about flags

UDP Client Example

How does a UDP client communicate with a UDP server?

[Figure: 2 UDP Clients, with their ports, above UDP / IP / Ethernet Adapter.]

Socket I/O: sendto()

• write is not allowed on an unconnected UDP socket
• Notice that the UDP client does not bind a port number
  ◦ a port number is dynamically assigned when the first sendto is called

int fd;                  /* socket descriptor */
struct sockaddr_in srv;  /* used by sendto() */
char buf[512];           /* used by sendto() */
int nbytes;              /* used by sendto() */

/* 1) create the socket */

/* sendto: send data to IP Address "128.2.35.50" port 80 */
srv.sin_family = AF_INET;
srv.sin_port = htons(80);
srv.sin_addr.s_addr = inet_addr("128.2.35.50");

nbytes = sendto(fd, buf, sizeof(buf), 0 /* flags */,
                (struct sockaddr*) &srv, sizeof(srv));
if(nbytes < 0) {
    perror("sendto");
    exit(1);
}

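Review: UDP Client-Server Interaction

UDP Server: socket(), bind(), recvfrom(), sendto()
UDP Client: socket(), sendto(), recvfrom(), close()

• the server's recvfrom() blocks until a datagram is received from a client
• data request: the client's sendto() is received by the server's recvfrom()
• data reply: the server's sendto() is received by the client's recvfrom()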
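The UDP sequence can also be run inside one process over loopback as a sketch. The function name `udp_loopback_echo` is invented for this example, and it relies on loopback datagrams not being lost; a real client would add a timeout around its recvfrom().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One request/reply exchange: the client sends msg and the server side
 * echoes it to the address reported by recvfrom(). Returns the number of
 * bytes the client received, or -1 on error. (Hypothetical helper.) */
ssize_t udp_loopback_echo(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in srv, cli;
    socklen_t len = sizeof srv, cli_len = sizeof cli;
    char buf[256];
    ssize_t n = -1;
    int sfd = -1, cfd = -1;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = 0;                            /* kernel picks a port */

    /* Server: socket(), bind(). No listen() or accept() for UDP. */
    if ((sfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto done;
    if (bind(sfd, (struct sockaddr *)&srv, sizeof srv) < 0) goto done;
    if (getsockname(sfd, (struct sockaddr *)&srv, &len) < 0) goto done;

    /* Client: socket() only; the first sendto() assigns its port. */
    if ((cfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto done;
    if (sendto(cfd, msg, strlen(msg), 0,
               (struct sockaddr *)&srv, sizeof srv) < 0) goto done;

    /* Server: recvfrom() reports the sender, which is the reply target. */
    n = recvfrom(sfd, buf, sizeof buf, 0, (struct sockaddr *)&cli, &cli_len);
    if (n < 0) goto done;
    if (sendto(sfd, buf, (size_t)n, 0,
               (struct sockaddr *)&cli, cli_len) < 0) { n = -1; goto done; }

    /* Client: read the reply. */
    n = recvfrom(cfd, reply, replylen, 0, NULL, NULL);

done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    return n;
}
```

Compared with the TCP sequence there is no connection setup at all: every datagram carries its destination, and the server learns where to reply from each received datagram.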

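Example: C client (UDP)

/* client.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold an IP address */
    int clientSocket;         /* socket descriptor */
    struct hostent *ptrh;     /* pointer to a host table entry */
    socklen_t addr_len;
    char *host;
    int port, n;
    char Sentence[128];
    char modifiedSentence[128];

    if (argc != 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); exit(1); }
    host = argv[1];
    port = atoi(argv[2]);

    /* Create client socket, NO connection to server */
    clientSocket = socket(PF_INET, SOCK_DGRAM, 0);
    if (clientSocket < 0) { perror("socket"); exit(1); }

    /* determine the server's address */
    memset(&sad, 0, sizeof(sad));                 /* clear sockaddr structure */
    sad.sin_family = AF_INET;                     /* set family to Internet */
    sad.sin_port = htons((unsigned short)port);
    ptrh = gethostbyname(host);                   /* convert host name to IP address */
    if (ptrh == NULL) { fprintf(stderr, "unknown host %s\n", host); exit(1); }
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);

    /* Get input line from user (fgets replaces the unsafe gets) */
    if (fgets(Sentence, sizeof(Sentence), stdin) == NULL) exit(1);

    /* Send line to server */
    addr_len = sizeof(sad);
    n = sendto(clientSocket, Sentence, strlen(Sentence) + 1, 0,
               (struct sockaddr *)&sad, addr_len);
    if (n < 0) { perror("sendto"); exit(1); }

    /* Read line from server */
    n = recvfrom(clientSocket, modifiedSentence, sizeof(modifiedSentence) - 1, 0,
                 (struct sockaddr *)&sad, &addr_len);
    if (n < 0) { perror("recvfrom"); exit(1); }
    modifiedSentence[n] = '\0';
    printf("FROM SERVER: %s\n", modifiedSentence);

    /* Close the socket */
    close(clientSocket);
    return 0;
}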

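Example: C server (UDP)

/* server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's address */
    struct sockaddr_in cad;   /* structure to hold the client's address */
    int serverSocket;         /* socket descriptor */
    int port;
    char clientSentence[128];
    char capitalizedSentence[128];

    if (argc != 2) { fprintf(stderr, "usage: %s port\n", argv[0]); exit(1); }
    port = atoi(argv[1]);

    /* Create server socket at port & bind a local address */
    serverSocket = socket(PF_INET, SOCK_DGRAM, 0);
    if (serverSocket < 0) { perror("socket"); exit(1); }
    memset(&sad, 0, sizeof(sad));                 /* clear sockaddr structure */
    sad.sin_family = AF_INET;                     /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);      /* set the local IP address */
    sad.sin_port = htons((unsigned short)port);   /* set the port number */
    if (bind(serverSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("bind"); exit(1); }

    /* ... the recvfrom()/sendto() loop using cad, clientSentence and
       capitalizedSentence goes here ... */

    close(serverSocket);
    return 0;
}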

The UDP Server

How can the UDP server service multiple ports simultaneously?

[Figure: UDP Server with Port 2000 and Port 3000, above UDP / IP / Ethernet Adapter.]

UDP Server: Servicing Two Ports

int s1; /* socket descriptor 1 */
int s2; /* socket descriptor 2 */

/* 1) create socket s1 */
/* 2) create socket s2 */
/* 3) bind s1 to port 2000 */
/* 4) bind s2 to port 3000 */

while(1) {
    recvfrom(s1, buf, sizeof(buf), ...);
    /* process buf */

    recvfrom(s2, buf, sizeof(buf), ...);
    /* process buf */
}

What problems does this code have?

Server Flaw

[Figure: timeline for client 1, an iterative server, and client 2.]
1. Client 1 calls connect; the server's accept returns and the server calls read.
2. Client 1 calls fgets and blocks waiting for the user to type in data. The user goes out to lunch.
3. The server blocks waiting for data from client 1.
4. Client 2 calls connect and blocks waiting to complete its connection request until after lunch!

Concurrent Servers

[Figure: timeline for client 1, a concurrent server, and client 2. Client 1 connects and then blocks in fgets waiting for the user to type in data (the user goes out to lunch). The read on behalf of client 1 does not block the server, which calls accept again. Client 2's connect returns; client 2 calls fgets, writes, reads the reply, and closes while client 1 is still idle.]

Multithreaded Server

while (1) {
    newsock = (int *)malloc(sizeof (int));
    fromlen = sizeof(from);
    *newsock = accept(sock, (struct sockaddr *)&from, &fromlen);
    if (*newsock < 0) error("Accepting");
    printf("A connection has been accepted from %s\n", inet_ntoa(from.sin_addr));
    retval = pthread_create(&tid, NULL, ConnectionThread, (void *)newsock);
    if (retval != 0) {
        error("Error, could not create thread");
    }
    pthread_detach(tid);   /* the thread is never joined */
}

/****** ConnectionThread **********/
void *ConnectionThread(void *arg)
{
    int sock, n, len;
    char buffer[BUFSIZE];
    char *msg = "Got your message";

    sock = *(int *)arg;
    free(arg);             /* allocated by the accept loop */
    len = strlen(msg);
    n = read(sock, buffer, BUFSIZE-1);
    while (n > 0) {
        buffer[n] = '\0';
        printf("Message is %s\n", buffer);
        n = write(sock, msg, len);
        if (n < len) error("Error writing");
        n = read(sock, buffer, BUFSIZE-1);
        if (n < 0) error("Error reading");
    }
    if (close(sock) < 0) error("closing");
    return NULL;
}

Concurrency

• Threading
  – Easier to understand
  – Race conditions increase complexity
• select()
  – Explicit control flow, no race conditions
  – Explicit control is more complicated
• There is no clear winner, but you MUST use select()…

What is select()?

• Monitor multiple descriptors
• How does it work?
  – Set up sets of sockets to monitor
  – select(): blocks until something happens
  – “Something” could be
    • Incoming connection: accept()
    • Clients sending data: read()
    • Pending data to send: write()
    • Timeout

Concurrency – Step 1

• Allowing address reuse
• Then we set the sockets to be non-blocking

int sock, opts=1;

sock = socket(...); // To give you an idea of where the new code goes

setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opts, sizeof(opts));

if((opts = fcntl(sock, F_GETFL)) < 0) { // Get current options
    printf("Error...\n");
    ...
}
opts = (opts | O_NONBLOCK); // Don't clobber your old settings
if(fcntl(sock, F_SETFL, opts) < 0) {
    printf("Error...\n");
    ...
}

bind(...); // To again give you an idea where the new code goes
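The non-blocking step can be packaged as a small helper. The name `set_nonblocking` is chosen for this sketch. Once the flag is set, a read() with no data available returns -1 with errno set to EAGAIN or EWOULDBLOCK instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the descriptor's flags, keeping the existing ones.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

The same call works for any descriptor, including the listening socket and each descriptor returned by accept().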

Concurrency – Step 2

• Monitor sockets with select()
  – int select(int maxfd, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
• maxfd
  – max file descriptor + 1
• fd_set: bit vector with FD_SETSIZE bits
  – readfds: bit vector of read descriptors to monitor
  – writefds: bit vector of write descriptors to monitor
  – exceptfds: set to NULL
• timeout
  – how long to wait without activity before returning

What about bit vectors?

• void FD_ZERO(fd_set *fdset);
  – clear out all bits
• void FD_SET(int fd, fd_set *fdset);
  – set one bit
• void FD_CLR(int fd, fd_set *fdset);
  – clear one bit
• int FD_ISSET(int fd, fd_set *fdset);
  – test whether fd bit is set
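A short sketch of how these macros are typically combined before a select() call. The helper name `build_fd_set` is made up for this example; it fills a set and returns the value select() expects as its first argument.

```c
#include <sys/select.h>

/* Put the given descriptors into set and return the largest
 * descriptor + 1, or 0 if count is 0. (Hypothetical helper.) */
int build_fd_set(fd_set *set, const int *fds, int count)
{
    int maxfd = -1;

    FD_ZERO(set);                    /* start from an empty set */
    for (int i = 0; i < count; i++) {
        FD_SET(fds[i], set);         /* turn on the bit for this fd */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

Because select() overwrites the sets it is given, the set has to be rebuilt (or copied from a saved master set) before every call.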

The Server

// socket() call and non-blocking code is above this point

if(bind(sockfd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) { // bind!
    printf("Error binding\n");
    ...
}

if(listen(sockfd, 5) < 0) { // listen for incoming connections
    printf("Error listening\n");
    ...
}

clen = sizeof(caddr);

// Setup pool.read_set with an FD_ZERO() and FD_SET() for
// your server socket file descriptor. (whatever socket() returned)

while(1) {
    pool.ready_set = pool.read_set; // Save the current state
    pool.nready = select(pool.maxfd+1, &pool.ready_set, &pool.write_set, NULL, NULL);

    if(FD_ISSET(sockfd, &pool.ready_set)) { // Check if there is an incoming conn
        isock = accept(sockfd, (struct sockaddr *) &caddr, &clen); // accept it
        add_client(isock, &pool); // add the client by the incoming socket fd
    }

    check_clients(&pool); // check if any data needs to be sent/received from clients
}

...

close(sockfd);

What is pool?

typedef struct { /* represents a pool of connected descriptors */
    int maxfd;                    /* largest descriptor in read_set */
    fd_set read_set;              /* set of all active read descriptors */
    fd_set write_set;             /* set of all active write descriptors */
    fd_set ready_set;             /* subset of descriptors ready for reading */
    int nready;                   /* number of ready descriptors from select */
    int maxi;                     /* highwater index into client array */
    int clientfd[FD_SETSIZE];     /* set of active descriptors */
    rio_t clientrio[FD_SETSIZE];  /* set of active read buffers */
    ... // ADD WHAT WOULD BE HELPFUL FOR PROJECT1
} pool;

What about checking clients?

• The main loop only tests for incoming connections
  – There are other reasons the server wakes up
  – Clients are sending data, pending data to write to buffer, clients closing connections, etc.
• Store all client file descriptors
  – in pool
• Keep the while(1) loop thin
  – Delegate to functions
• Come up with your own design

Socket I/O: select()

int select(int maxfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

FD_CLR(int fd, fd_set *fds);    /* clear the bit for fd in fds */
FD_ISSET(int fd, fd_set *fds);  /* is the bit for fd in fds? */
FD_SET(int fd, fd_set *fds);    /* turn on the bit for fd in fds */
FD_ZERO(fd_set *fds);           /* clear all bits in fds */

• maxfds: number of descriptors to be tested
  ◦ descriptors (0, 1, ... maxfds-1) will be tested
• readfds: a set of fds we want to check for available data
  ◦ on return, holds the set of fds ready to read
  ◦ if the input argument is NULL, we are not interested in that condition
• writefds: returns a set of fds ready to write
• exceptfds: returns a set of fds with exception conditions

Socket I/O: select()

int select(int maxfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

struct timeval {
    long tv_sec;   /* seconds */
    long tv_usec;  /* microseconds */
};

• timeout
  ◦ if NULL, wait forever and return only when one of the descriptors is ready for I/O
  ◦ otherwise, wait up to a fixed amount of time specified by timeout
  ◦ if we don't want to wait at all, create a timeout structure with timer value equal to 0
• Refer to the man page for more information
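The timeout argument can be sketched with a helper that waits a bounded time for one descriptor. The name `wait_readable` and the millisecond parameter are assumptions made for this example.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * (Hypothetical helper for illustration.) */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;   /* set is empty after a timeout */
}
```

Passing 0 for `timeout_ms` gives the "don't wait at all" case: select() polls the descriptor and returns immediately.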

Socket I/O: select()

int s1, s2;       /* socket descriptors */
fd_set readfds;   /* used by select() */
char buf[512];    /* used by recvfrom() */
int maxfd;        /* larger of s1 and s2 */

/* create and bind s1 and s2 */
maxfd = (s1 > s2) ? s1 : s2;
while(1) {
    FD_ZERO(&readfds);      /* initialize the fd set */
    FD_SET(s1, &readfds);   /* add s1 to the fd set */
    FD_SET(s2, &readfds);   /* add s2 to the fd set */

    if(select(maxfd+1, &readfds, 0, 0, 0) < 0) {
        perror("select");
        exit(1);
    }
    if(FD_ISSET(s1, &readfds)) {
        recvfrom(s1, buf, sizeof(buf), ...);
        /* process buf */
    }
    /* do the same for s2 */
}

• select allows synchronous I/O multiplexing

More Details About a Web Server

How can a web server manage multiple connections simultaneously?

[Figure: Web Server with Port 80 and Port 8001, above TCP / IP / Ethernet Adapter.]

Socket I/O: select()

Now the web server can support multiple connections...

int fd, next=0;   /* original socket */
int newfd[10];    /* new socket descriptors */
while(1) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Now use FD_SET to initialize other newfd's
       that have already been returned by accept() */

    select(maxfd+1, &readfds, 0, 0, 0);   /* maxfd: largest descriptor in readfds */
    if(FD_ISSET(fd, &readfds)) {
        newfd[next++] = accept(fd, ...);
    }
    /* do the following for each descriptor newfd[n] */
    if(FD_ISSET(newfd[n], &readfds)) {
        read(newfd[n], buf, sizeof(buf));
        /* process data */
    }
}

A Few Programming Notes: Representing Packets

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Type                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Address                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 4-byte integer
Length: 2-byte integer
Checksum: 2-byte integer
Address: 4-byte IP address

A Few Programming Notes: Building a Packet in a Buffer

struct packet {
    u_int32_t type;
    u_int16_t length;
    u_int16_t checksum;
    u_int32_t address;
};

/* ================================================== */
char buf[1024];
struct packet *pkt;

pkt = (struct packet*) buf;
pkt->type = htonl(1);
pkt->length = htons(2);
pkt->checksum = htons(3);
pkt->address = htonl(4);
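Casting a char buffer to a struct pointer depends on the buffer being suitably aligned and on the compiler adding no padding. A more defensive variant copies each field with memcpy(). This is a sketch of that alternative, not the slide's method; `packet_encode`, `packet_decode`, and `PACKET_WIRE_SIZE` are names invented for the example, and the fixed-width types come from <stdint.h>.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define PACKET_WIRE_SIZE 12   /* 4 + 2 + 2 + 4 bytes on the wire */

struct packet {               /* fields held in host byte order */
    uint32_t type;
    uint16_t length;
    uint16_t checksum;
    uint32_t address;
};

/* Encode the fields into buf in network byte order. */
void packet_encode(const struct packet *p, unsigned char buf[PACKET_WIRE_SIZE])
{
    uint32_t type = htonl(p->type), address = htonl(p->address);
    uint16_t length = htons(p->length), checksum = htons(p->checksum);

    memcpy(buf + 0, &type, 4);
    memcpy(buf + 4, &length, 2);
    memcpy(buf + 6, &checksum, 2);
    memcpy(buf + 8, &address, 4);
}

/* Decode the wire format back into host-order fields. */
void packet_decode(const unsigned char buf[PACKET_WIRE_SIZE], struct packet *p)
{
    uint32_t type, address;
    uint16_t length, checksum;

    memcpy(&type, buf + 0, 4);
    memcpy(&length, buf + 4, 2);
    memcpy(&checksum, buf + 6, 2);
    memcpy(&address, buf + 8, 4);

    p->type = ntohl(type);
    p->length = ntohs(length);
    p->checksum = ntohs(checksum);
    p->address = ntohl(address);
}
```

The encoded buffer can be passed to sendto() with a length of PACKET_WIRE_SIZE, and the receiver applies packet_decode() to what recvfrom() returns.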

EchoClient.c – #include's

#include <stdio.h>      /* for printf() and fprintf() */
#include <sys/socket.h> /* for socket(), connect(), sendto(), and recvfrom() */
#include <arpa/inet.h>  /* for sockaddr_in and inet_addr() */
#include <stdlib.h>     /* for atoi() and exit() */
#include <string.h>     /* for memset() */
#include <unistd.h>     /* for close() */

#define ECHOMAX 255     /* Longest string to echo */

int main(int argc, char *argv[])
{
    int sock;                               /* Socket descriptor */
    struct sockaddr_in echoServAddr;        /* Echo server address */
    struct sockaddr_in fromAddr;            /* Source address of echo */
    unsigned short echoServPort = 7;        /* Echo server port */
    socklen_t fromSize;                     /* Address size for recvfrom() */
    char *servIP = "172.24.23.4";           /* IP address of server */
    char *echoString = "I hope this works"; /* String to send to echo server */
    char echoBuffer[ECHOMAX+1];             /* Buffer for receiving echoed string */
    int echoStringLen;                      /* Length of string to echo */
    int respStringLen;                      /* Length of received response */

EchoClient.c -variable declarations

    /* Create a datagram/UDP socket */
    sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Construct the server address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));    /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                 /* Internet addr family */
    echoServAddr.sin_addr.s_addr = inet_addr(servIP);  /* Server IP address */
    echoServAddr.sin_port = htons(echoServPort);       /* Server port */

    /* Send the string to the server */
    echoStringLen = strlen(echoString);
    sendto(sock, echoString, echoStringLen, 0,
           (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));

    /* Recv a response */

EchoClient.c - creating the socket and sending

    fromSize = sizeof(fromAddr);
    respStringLen = recvfrom(sock, echoBuffer, ECHOMAX, 0,
                             (struct sockaddr *) &fromAddr, &fromSize);
    /* Error checks go here, e.g. that the packet was received from the same server */

    /* null-terminate the received data */
    if (respStringLen < 0)
        respStringLen = 0;
    echoBuffer[respStringLen] = '\0';
    printf("Received: %s\n", echoBuffer);   /* Print the echoed arg */

    close(sock);
    exit(0);
} /* end of main () */

EchoClient.c – receiving and printing
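A dotted-decimal string such as servIP has to be converted to a binary address before it is stored in sin_addr; htonl() only reorders the bytes of an integer and cannot parse text. The sketch below shows one way to fill a sockaddr_in with inet_pton(), which also reports malformed input. The helper name make_ipv4_addr() is an assumption for illustration, not part of EchoClient.c.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill addr for the IPv4 address ip and the given
 * port (host byte order). Returns 0 on success, -1 if ip is not a
 * valid dotted-decimal IPv4 address. */
int make_ipv4_addr(struct sockaddr_in *addr, const char *ip,
                   unsigned short port)
{
    memset(addr, 0, sizeof(*addr));      /* Zero out structure */
    addr->sin_family = AF_INET;          /* Internet address family */
    addr->sin_port = htons(port);        /* Port in network byte order */

    /* inet_pton() returns 1 on success and stores the address
     * already in network byte order */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

With this helper, the client could build its server address as make_ipv4_addr(&echoServAddr, servIP, echoServPort) and stop early if the call returns -1.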

int main(int argc, char *argv[])
{
    int sock;                         /* Socket */
    struct sockaddr_in echoServAddr;  /* Local address */
    struct sockaddr_in echoClntAddr;  /* Client address */
    unsigned int cliAddrLen;          /* Length of client address structure */
    char echoBuffer[ECHOMAX];         /* Buffer for echo string */
    unsigned short echoServPort = 7;  /* Server port */
    int recvMsgSize;                  /* Size of received message */

    /* Create socket for sending/receiving datagrams */
    sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Construct local address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));           /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                        /* Internet address family */
    echoServAddr.sin_addr.s_addr = inet_addr("172.24.23.4");  /* Local IP address */
    echoServAddr.sin_port = htons(echoServPort);              /* Local port */

    /* Bind to the local address */
    bind(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));

EchoServer.c

    for (;;) /* Run forever */
    {
        cliAddrLen = sizeof(echoClntAddr);

        /* Block until receive message from a client */
        recvMsgSize = recvfrom(sock, echoBuffer, ECHOMAX, 0,
                               (struct sockaddr *) &echoClntAddr, &cliAddrLen);

        printf("Handling client %s\n", inet_ntoa(echoClntAddr.sin_addr));

        /* Send received datagram back to the client */
        sendto(sock, echoBuffer, recvMsgSize, 0,
               (struct sockaddr *) &echoClntAddr, sizeof(echoClntAddr));
    }
} /* end of main () */

Error handling is a must
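As one possible illustration of what that error handling can look like, the sketch below runs the UDP echo exchange inside a single process over the loopback interface and checks the return value of every call. It is an assumption-laden demo rather than the EchoClient.c/EchoServer.c code: the function name udp_loopback_echo(), the use of 127.0.0.1, and the kernel-chosen port are all choices made here so the example runs without a remote echo server.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: send msg from a "client" socket to a "server"
 * socket in the same process, echo it back, and copy the reply into
 * out. Returns the reply length, or -1 if any call fails. */
int udp_loopback_echo(const char *msg, char *out, size_t outsize)
{
    struct sockaddr_in srv, from;
    socklen_t len = sizeof(srv), fromlen = sizeof(from);
    char tmp[256];
    ssize_t n = -1;
    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);

    if (server < 0 || client < 0) { perror("socket"); goto done; }

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);            /* let the kernel pick a free port */
    if (bind(server, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("bind"); goto done;
    }
    /* Learn which port the kernel assigned */
    if (getsockname(server, (struct sockaddr *)&srv, &len) < 0) {
        perror("getsockname"); goto done;
    }

    /* client -> server */
    if (sendto(client, msg, strlen(msg), 0,
               (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("sendto"); goto done;
    }
    n = recvfrom(server, tmp, sizeof(tmp), 0,
                 (struct sockaddr *)&from, &fromlen);
    if (n < 0) { perror("recvfrom"); goto done; }

    /* server -> client: echo the datagram back to its source */
    if (sendto(server, tmp, (size_t)n, 0,
               (struct sockaddr *)&from, fromlen) < 0) {
        perror("sendto"); n = -1; goto done;
    }
    n = recvfrom(client, out, outsize - 1, 0, NULL, NULL);
    if (n < 0) { perror("recvfrom"); goto done; }
    out[n] = '\0';                      /* null-terminate the reply */

done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    return (int)n;
}
```

The single cleanup label keeps both sockets from leaking on any failure path, which is a common C idiom; a real client would additionally compare the reply's source address with the server address it sent to.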

The setsockopt() function manipulates options associated with a socket. Options can exist at multiple protocol levels. However, the options are always present at the uppermost socket level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.

The setsockopt function

#include <sys/socket.h>
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);

The level argument specifies the protocol level at which the option resides. To set options at the socket level, specify the level argument as SOL_SOCKET. To set options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is interpreted by the TCP (Transmission Control Protocol), set level to the protocol number of TCP.
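To make the level argument concrete, here is a minimal sketch that sets and reads a TCP-level option. It assumes a POSIX system where the protocol number of TCP is available as the constant IPPROTO_TCP and TCP_NODELAY is declared in <netinet/tcp.h>; the helper names are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn TCP_NODELAY on (on != 0) or off.
 * The option lives at the TCP level, so level is IPPROTO_TCP,
 * not SOL_SOCKET. Returns 0 on success, -1 on failure. */
int set_tcp_nodelay(int fd, int on)
{
    int flag = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is enabled,
 * 0 if disabled, -1 on failure. */
int get_tcp_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

Passing SOL_SOCKET with a TCP-level option name is a common mistake; the call either fails or changes an unrelated socket-level option with the same number.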

The following options are supported for setsockopt():

SO_DEBUG Provides the ability to turn on recording of debugging information. This option takes an int value in the optval argument. This is a BOOL option.

SO_BROADCAST Permits sending of broadcast messages, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.

SO_REUSEADDR Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.

Level Argument

http://www.mkssoftware.com/docs/man3/bind.3.asp

SO_KEEPALIVE Keeps connections active by enabling periodic transmission of messages, if this is supported by the protocol.

If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option takes an int value in the optval argument. This is a BOOL option.

SO_LINGER Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure in the optval argument.

http://www.mkssoftware.com/docs/man3/close.3.asp



SO_OOBINLINE Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option takes an int value in the optval argument. This is a BOOL option.

SO_SNDBUF Sets send buffer size information. This option takes an int value in the optval argument.

SO_RCVBUF Sets receive buffer size information. This option takes an int value in the optval argument.

SO_DONTROUTE Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option takes an int value in the optval argument. This is a BOOL option.

TCP_NODELAY Specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option takes an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
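Unlike the boolean and integer options in the list, SO_LINGER takes a struct linger in optval. The following sketch shows how such a structure might be passed and read back; the helper names set_linger() and get_linger() are assumptions for illustration.

```c
#include <sys/socket.h>

/* Hypothetical helper: make close() linger for up to `seconds`
 * seconds while unsent data remains. Returns 0 on success, -1 on
 * failure. */
int set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* non-zero: lingering enabled */
    lg.l_linger = seconds;   /* linger interval in seconds */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: read the current linger setting into *out.
 * Returns 0 on success, -1 on failure. */
int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof(*out);
    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```

Note that optlen is sizeof(struct linger) here, not sizeof(int); passing the wrong size is a frequent source of EINVAL or silently ignored settings.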

RETURN VALUES

If successful, setsockopt() returns a zero. If a failure occurs, it returns a value of -1 and sets errno to one of the following values:

EBADF s is not a valid descriptor
ENOTSOCK s is not a socket descriptor
ENOPROTOOPT optname is unknown at indicated level
EFAULT optval is an invalid pointer
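The return-value convention can be checked directly. This sketch wraps one setsockopt() call, saves errno immediately after the failure, and reports it; the wrapper name try_set_reuseaddr() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical wrapper: enable SO_REUSEADDR on fd.
 * Returns 0 on success, otherwise the errno value that
 * setsockopt() reported (for example EBADF or ENOTSOCK). */
int try_set_reuseaddr(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
        int saved = errno;   /* save before any other call can change it */
        fprintf(stderr, "setsockopt(SO_REUSEADDR): %s\n", strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling the wrapper with -1 yields EBADF, and calling it with a descriptor that is open but not a socket (one end of a pipe, for instance) yields ENOTSOCK.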

Sample Usage:

int skt, sndsize, err;
err = setsockopt(skt, SOL_SOCKET, SO_SNDBUF, (char *)&sndsize, (int)sizeof(sndsize));

or:

int skt, rcvsize, err;
err = setsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&rcvsize, (int)sizeof(rcvsize));

int optval;
socklen_t optlen;
char *optval2;

// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);

// bind a socket to a device name (might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);

// see if the SO_BROADCAST flag is set:
optlen = sizeof optval; // must hold the buffer size before the call
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}

Example

DESCRIPTION

The getsockopt() function retrieves the current value for a socket option associated with a socket of any type, in any state, and stores the result in optval. Options may exist at multiple protocol levels, but they are always present at the uppermost socket level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.

The level argument specifies the protocol level at which the option resides. To retrieve options at the socket level, specify the level argument as SOL_SOCKET. To retrieve options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is to be interpreted by the TCP (Transmission Control Protocol), set level to the protocol number of TCP.

The value associated with the selected option is returned in the buffer optval. The integer pointed to by optlen should originally contain the size of this buffer; on return, it is set to the size of the value returned. For SO_LINGER, this is the size of a struct linger; for most other options it is the size of an integer.
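The value-result behaviour of optlen is easy to get wrong, so here is a small sketch using SO_TYPE: optlen is set to the buffer size before the call and inspected afterwards. The helper name socket_type() is an assumption for illustration.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the socket type of fd
 * (SOCK_STREAM, SOCK_DGRAM, ...), or -1 on failure. */
int socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);   /* in: size of the buffer */

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;

    /* out: len now holds the number of bytes actually stored */
    if (len != sizeof(type))
        return -1;
    return type;
}
```

Leaving len uninitialized before the call is undefined behaviour in practice: the kernel may copy back fewer bytes than expected or reject the call.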

The application is responsible for allocating any memory space pointed to directly or indirectly by any of the parameters it specified.

If an option has not been set with setsockopt(), getsockopt() returns the default value for the option.

The getsockopt function

#include <sys/socket.h>
int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);

http://www.mkssoftware.com/docs/man3/setsockopt.3.asp


SO_DEBUG Reports whether debugging information is being recorded. This option stores an int value in the optval argument. This is a BOOL option.

SO_ACCEPTCONN Reports whether socket listening is enabled. This option stores an int value in the optval argument. This is a BOOL option.

SO_BROADCAST Reports whether transmission of broadcast messages is supported, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.

SO_REUSEADDR Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.

SO_KEEPALIVE Reports whether connections are kept active with periodic transmission of messages, if this is supported by the protocol.

If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option stores an int value in the optval argument. This is a BOOL option.

http://www.mkssoftware.com/docs/man3/bind.3.asp

SO_LINGER Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option stores a linger structure in the optval argument.

SO_OOBINLINE Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option stores an int value in the optval argument. This is a BOOL option.

SO_SNDBUF Reports send buffer size information. This option stores an int value in the optval argument.

SO_RCVBUF Reports receive buffer size information. This option stores an int value in the optval argument.

SO_ERROR Reports information about error status and clears it. This option stores an int value in the optval argument.

SO_TYPE Reports the socket type. This option stores an int value in the optval argument.

SO_DONTROUTE Reports whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option stores an int value in the optval argument. This is a BOOL option.

SO_MAX_MSG_SIZE Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no meaning for stream-oriented sockets. This option stores an int value in the optval argument.




TCP_NODELAY Specifies whether the Nagle algorithm used by TCP for send coalescing is disabled. This option stores an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
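Among the options listed, SO_ERROR is the one that behaves differently from a plain flag: reading it returns the pending error and clears it. A minimal sketch of reading it follows; the helper name pending_error() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: fetch and clear the pending error on fd.
 * Returns 0 if no error is pending, a positive errno-style code if
 * one was pending, or -1 if getsockopt() itself fails. */
int pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

A typical use is after select() reports a non-blocking connect() as writable: a return value of 0 means the connection succeeded, anything else is the reason it failed.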

RETURN VALUES

If successful, getsockopt() returns a zero. If a failure occurs, it returns a value of -1 and sets errno to one of the following values:

EBADF The parameter s is not a valid descriptor.
ENOPROTOOPT The option is unknown at the level indicated.
ENOTSOCK The parameter s is a file, not a socket.

int skt, err;
int sockbufsize = 0;
socklen_t size = sizeof(int);

err = getsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&sockbufsize, &size);
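Setting and reading a buffer size can be combined to see what the system actually granted. The sketch below is an illustration with a hypothetical helper name, request_rcvbuf(); the reported size may differ from the requested one, since the kernel is free to round or adjust it (Linux, for example, reports a larger value than requested to account for bookkeeping overhead).

```c
#include <sys/socket.h>

/* Hypothetical helper: ask for a receive buffer of `bytes` bytes and
 * return the size the kernel reports afterwards, or -1 on failure. */
int request_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

Comparing the returned value with the request is a quick way to detect a system-imposed limit on buffer sizes.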

Thank You
