How to utilize the PHP CLI SAPI in a scalable way. Initially presented at the 2008 DC PHP Conference
PHP CLI
A Cinderella Story
Introduction
• Andrew Minerd is a software architect at the Selling Source, Inc. As a part of the architecture team he is responsible for the overall technical direction of the Selling Source software products.
• Mike Lively is a team lead with the Selling Source, developing an online loan servicing solution. This solution heavily uses and relies on background processing to perform many tasks ranging from sending legal documents to transferring money to and from bank accounts.
If you use Windows...
...please leave now.
Overview
• Why
• Identifying processes that can be backgrounded
• Walk through the evolution of a CLI script
  o Creating a single process
  o Creating multiple processes
  o Distributing a process across multiple machines
Why Background Processing
• Performance - Let your web server serve webpages
• Robustness - If a web service or email fails, it is easier to handle in the background
• Isolation - Using background processes can help isolate functionality and allow you to easily swap it out for different (sometimes better) services
• Efficiency - Consolidate resource requirements
Why Use PHP?
• Reuse
  o Existing development staff
  o Existing code
  o Existing infrastructure
• Quick prototyping
Identifying Suitable Processes
• Anything where an immediate response is not vital
  o Email notifications
  o Remote service calls
• Processing data in advance
  o Pre-caching
  o Aggregating data
• Even a few things where a somewhat immediate response is needed
  o Notify users upon completion
Single Process
• Advantages:
  o Easiest to implement
  o Don't have to worry about synchronization
  o Don't have to worry about sharing data
  o Already familiar with this paradigm
• Disadvantages:
  o You can only do one thing at a time
Introducing the CLI SAPI
• SAPI: Server API; PHP's interface to the world
• Special file descriptor constants:
  o STDIN: standard in
  o STDOUT: standard out
  o STDERR: standard error
• Special variables:
  o $argc: number of command line parameters (including the script name)
  o $argv: array of parameter values
• Misc
  o dl() still works
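To make the constants and variables above concrete, here is a minimal sketch of a CLI entry point. The function name main(), the --help flag and the upper-casing behaviour are illustrative assumptions, not part of the original talk; the streams are passed in as parameters so the same code works with STDIN/STDOUT/STDERR or with test streams.

```php
<?php
// Hypothetical entry point: upper-cases the arguments if any are given,
// otherwise upper-cases every line read from the input stream.
function main(array $argv, $in, $out, $err)
{
    $args = array_slice($argv, 1); // $argv[0] is the script name

    if (in_array('--help', $args, true)) {
        // diagnostics go to standard error, not standard out
        fwrite($err, "Usage: php {$argv[0]} [words...] < input.txt\n");
        return 1;
    }

    if (count($args) > 0) {
        fwrite($out, strtoupper(implode(' ', $args)) . "\n");
        return 0;
    }

    while (($line = fgets($in)) !== false) {
        fwrite($out, strtoupper($line));
    }
    return 0;
}
```

A real script would typically end with `exit(main($argv, STDIN, STDOUT, STDERR));` so that the return value becomes the process exit code that cron or a shell can inspect.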
Writing a cronjob
• Advantages
  o Automatically restarts itself
  o Flexible scheduling; good for advance processing
• Challenges
  o Long-running jobs
Overrun protection
• Touch a lock file at startup, remove at shutdown
• Work a little `ps` magic
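The lock file approach could be sketched as follows. The helper names acquire_lock() and release_lock() are hypothetical, and using fopen() mode 'x' (create, fail if the file exists) is one possible way to make the "check and touch" step atomic rather than something the talk prescribes.

```php
<?php
// Returns true if this process obtained the lock, false if another run holds it.
function acquire_lock($lockFile)
{
    // Mode 'x' creates the file and fails if it already exists, so the
    // existence check and the creation happen in a single step.
    $handle = @fopen($lockFile, 'x');
    if ($handle === false) {
        return false;
    }
    // Record our PID so a later run (or a human) can check it with `ps`.
    fwrite($handle, (string) getmypid());
    fclose($handle);
    return true;
}

// Removes the lock file if it is present.
function release_lock($lockFile)
{
    if (is_file($lockFile)) {
        unlink($lockFile);
    }
}
```

A cron script would call acquire_lock() first, exit quietly if it returns false, and otherwise register release_lock() with register_shutdown_function(). Shutdown functions do not run if the process is killed with SIGKILL, so a stale lock is still possible; that is where checking the stored PID with `ps` comes in.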
Work Queues
• Database
  o MySQL
  o SQLite
• Message queue
• Memcached
  o Possible, not necessarily optimal
MySQL Work Queues
• Segregate tasks on a specific table by auto_increment key
  o Access is very fast for MyISAM, can be even faster for InnoDB
• Create a separate table to hold progress
  o If progress == MAX(id), nothing needs to be done
  o LOCK TABLES / UNLOCK TABLES; easy synchronization
• Single point of failure, but the database probably already is one
SQLite Work Queue
• SQLite 3 only locks during active writes by default
• BEGIN EXCLUSIVE TRANSACTION prevents others from reading and writing
• Synchronized access to a progress/queue table
• Lock is retained until COMMIT
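As an illustration of the exclusive-transaction pattern above, here is a minimal sketch using PDO with the SQLite driver. The table names jobs and progress, their columns, and the function name claim_next_job() are assumptions made for the example, and the PDO handle is assumed to be in exception error mode.

```php
<?php
// Claims the next unprocessed job, or returns null if the queue is drained.
// Assumes tables jobs(id, payload) and progress(last_id) with a single row.
function claim_next_job(PDO $db)
{
    // Nobody else can read or write until we COMMIT or ROLLBACK.
    $db->exec('BEGIN EXCLUSIVE TRANSACTION');
    try {
        $lastId = (int) $db->query('SELECT last_id FROM progress')->fetchColumn();

        $select = $db->prepare(
            'SELECT id, payload FROM jobs WHERE id > ? ORDER BY id LIMIT 1'
        );
        $select->execute(array($lastId));
        $job = $select->fetch(PDO::FETCH_ASSOC);
        $select->closeCursor();

        if ($job !== false) {
            // Advance the progress pointer while we still hold the lock.
            $update = $db->prepare('UPDATE progress SET last_id = ?');
            $update->execute(array($job['id']));
        }

        $db->exec('COMMIT'); // releases the lock
        return $job === false ? null : $job;
    } catch (Exception $e) {
        $db->exec('ROLLBACK');
        throw $e;
    }
}
```

Each worker calls claim_next_job() in a loop and processes the returned payload outside the transaction, so the lock is only held for the short bookkeeping step.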
Memcached
• Perhaps already familiar
• Eases transition for processes dependent upon shared memory
• VOLATILE STORAGE
• Use as a job queue?
  o Add a lock key; on fail (key exists) block and poll
  o Read pointer
  o Read item
  o Increment pointer
  o Remove lock key
• Already capable of distributing storage across servers
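The lock-key steps above could look roughly like this. It is a sketch only: it assumes a client object with add(), get(), increment() and delete() methods that behave like those of the pecl Memcached class (add() returns false when the key already exists, get() returns false on a miss), and the key names queue:lock, queue:pointer and queue:item:N, the retry limit and the 10 second lock expiry are all invented for the example.

```php
<?php
// Pops the next job from a Memcached-backed queue, or returns null if the
// queue is empty or the lock could not be obtained within $maxTries polls.
function memcached_pop_job($cache, $maxTries = 50)
{
    // Add a lock key; on failure (key exists) block and poll.
    // The expiry is a safety net in case a lock holder dies.
    $tries = 0;
    while (!$cache->add('queue:lock', 1, 10)) {
        if (++$tries >= $maxTries) {
            return null;
        }
        usleep(10000);
    }

    // Read pointer, then read the item it points at.
    $pointer = (int) $cache->get('queue:pointer');
    $job = $cache->get('queue:item:' . $pointer);

    // Increment the pointer only if there really was an item.
    if ($job !== false) {
        $cache->increment('queue:pointer');
    }

    // Remove the lock key so the next worker can proceed.
    $cache->delete('queue:lock');

    return $job === false ? null : $job;
}
```

Because the storage is volatile, an eviction or restart silently loses both the pointer and any queued items, so this only suits jobs that can be regenerated or safely dropped.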
Persistent Processing
• Advantages:
  o Mitigate setup overhead by doing it once
• Disadvantages:
  o Persistent processes may be more susceptible to memory leaks
  o More housekeeping work than cronjobs
Process Control
• Signal handling
  o pcntl_signal - Commonly used signals
  o What are ticks
• Daemonizing
  o Fork and kill parent
  o Set the child to session leader
  o Close standard file descriptors
  o See: daemon(3)
Signals
• SIGHUP; terminal hangup, often used to ask a daemon to reload
• SIGTERM; system shutdown, kill
• SIGINT; sent by Ctrl+c
• SIGKILL (uncatchable); unresponsive, kill -9
• SIGCHLD; child status change
• SIGTSTP; sent by Ctrl+z
• SIGCONT; resume from stop, fg
• See: signal(7), kill -l
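A minimal sketch of catching SIGTERM and SIGINT in a persistent process follows. The ShutdownFlag class and run_until_signalled() are hypothetical names; the sketch uses pcntl_signal_dispatch() (available since PHP 5.3) to deliver pending signals at a safe point in the loop, which is an alternative to the declare(ticks=1) approach.

```php
<?php
// Records that a shutdown was requested; the main loop decides when to stop.
final class ShutdownFlag
{
    public static $requested = false;

    public static function handle($signo)
    {
        self::$requested = true;
    }
}

pcntl_signal(SIGTERM, array('ShutdownFlag', 'handle'));
pcntl_signal(SIGINT, array('ShutdownFlag', 'handle'));

// Hypothetical worker loop: calls $work() until a signal has been received
// or $maxIterations is reached, and returns the number of completed calls.
function run_until_signalled($work, $maxIterations = PHP_INT_MAX)
{
    $done = 0;
    while (!ShutdownFlag::$requested && $done < $maxIterations) {
        $work();
        $done++;
        // Hand any signals that arrived during $work() to the PHP handlers.
        pcntl_signal_dispatch();
    }
    return $done;
}
```

Letting the handler only set a flag means the current unit of work always finishes before the process exits, which avoids half-processed jobs on shutdown.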
Daemonize
function daemon($chdir = TRUE, $close = TRUE)
{
    // fork and kill off the parent
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("Could not fork!");
    } else if ($pid) {
        exit(0);
    }

    // become session leader
    posix_setsid();

    // close file descriptors
    if ($close) {
        fclose(STDIN);
        fclose(STDOUT);
        fclose(STDERR);
    }

    // change to the root directory
    if ($chdir) {
        chdir('/');
    }
}
Multiple Processes
• Advantages:
  o Take advantage of the multi-core revolution; most machines can now truly multiprocess
• Disadvantages:
  o Must synchronize process access to resources
  o Harder to communicate
Directed vs. Autonomous
• Directed: one parent process that distributes jobs to child processes
  o Single point of failure
  o No locking required on job source
• Autonomous: multiple peer processes that pick their own work
  o Need to serialize access to job source
  o Single peer failure isn't overall failure
• Split work into independent tasks
Forking
<?php
$pid = pcntl_fork();
if ($pid == -1) {
    die("Could not fork!");
} else if ($pid) {
    // parent
} else {
    // child
}
?>
Forking Multiple Children
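<?php
define('MAX_CHILDREN', 5);
$children = array();
$jobs = get_jobs();

// keep going until every job is started and every child has been reaped
while (count($jobs) || count($children)) {
    if (count($jobs) && count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            process_data($data);
            exit(0);
        }
    } else {
        // all slots busy or no jobs left; avoid spinning while we wait
        usleep(10000);
    }

    // reap finished children without blocking; stop once none are left,
    // because pcntl_waitpid() returns -1 when there is nothing to wait for
    while (count($children) && ($wait_pid = pcntl_waitpid(-1, $status, WNOHANG))) {
        if ($wait_pid == -1) {
            die("problem in pcntl_waitpid!");
        }
        unset($children[$wait_pid]);
    }
}
?>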
Shared Resources
• File/socket descriptors shared between parent and child
• Some resources cannot be shared
  o MySQL connections
• Use resources before forking
• Assume children will probably need to open and establish their own resources
• Allow your resources to reopen themselves
Shared Resources
<?php
// ...
// bad time to open a database connection
$db = new PDO('mysql:host=localhost', 'dbuser', 'pass');

while (count($jobs)) {
    if (count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            process_data($data, $db);
            exit(0);
            // When the child exits the database connection
            // will be disposed of.
        }
    }
    // ...
}
?>
Shared Resources
<?php
// ...

while (count($jobs)) {
    if (count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            // Much safer
            $db = new PDO('mysql:host=localhost', 'dbuser', 'pass');
            process_data($data, $db);
            exit(0);
            // When the child exits the database connection
            // will be disposed of.
        }
    }
    // ...
}
?>
Memory Usage
• Entire process space at time of forking is copied
• Do as little setup as possible before forking
• If you have to do setup before forking, clean it up in the child after forking
• Pay particular attention to large variables
Memory Usage
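<?php
define('MAX_CHILDREN', 5);
$children = array();
$jobs = get_jobs();

// keep going until every job is started and every child has been reaped
while (count($jobs) || count($children)) {
    if (count($jobs) && count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            unset($jobs); // <--- will save memory in your child where you do not need $jobs around anymore
            process_data($data);
            exit(0);
        }
    } else {
        // all slots busy or no jobs left; avoid spinning while we wait
        usleep(10000);
    }

    // reap finished children without blocking; stop once none are left,
    // because pcntl_waitpid() returns -1 when there is nothing to wait for
    while (count($children) && ($wait_pid = pcntl_waitpid(-1, $status, WNOHANG))) {
        if ($wait_pid == -1) {
            die("problem in pcntl_waitpid!");
        }
        unset($children[$wait_pid]);
    }
}
?>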
Shared Memory
• shmop_* or shm_*?
  o shm functions store and retrieve key/value pairs stored as a linked list
  o Retrieval by key is O(n)
  o shmop functions access bytes
• Semaphores
  o Generic locking mechanism
• Message queues
• ftok()
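To show how the shm_* functions and a semaphore fit together, here is a minimal sketch of a counter shared between processes. The function name shared_counter_increment(), the variable slot 1 and the 1024 byte segment size are assumptions for the example; it requires the sysvshm and sysvsem extensions.

```php
<?php
// Atomically increments a counter kept in System V shared memory and
// returns the new value. $key is a System V IPC key, e.g. from ftok().
function shared_counter_increment($key)
{
    $sem = sem_get($key);          // semaphore guarding the segment
    $shm = shm_attach($key, 1024); // creates the segment on first use

    sem_acquire($sem);             // blocks until we own the lock
    $value = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : 0;
    $value++;
    shm_put_var($shm, 1, $value);  // variables are addressed by integer key
    sem_release($sem);

    shm_detach($shm);
    return $value;
}
```

A key is usually derived with something like `ftok(__FILE__, 's')` so that every process running the same script arrives at the same segment. The segment and semaphore outlive the process until shm_remove() and sem_remove() are called.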
How to Talk to Your Kids
• msg_get_queue($key, $perms)
• msg_send($q, $type, $msg, $serialize, $block, $err)
• msg_receive($q, $desired, $type, $max, $msg, $serialize, $flags, $err)
• Use types to communicate to a specific process
  o Send jobs with type 1
  o Responses with PID of process
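The type convention above (jobs as type 1, replies typed with the worker's PID) could be sketched like this. The function name process_via_queue(), the upper-casing "work" and the 4096 byte maximum message size are assumptions for the example; it requires the sysvmsg and pcntl extensions.

```php
<?php
// Hypothetical round trip: a forked child takes one job from the queue,
// upper-cases it, and replies to the parent on the same queue.
function process_via_queue($queueKey, $job)
{
    $queue = msg_get_queue($queueKey, 0600);

    $pid = pcntl_fork();
    if ($pid == -1) {
        msg_remove_queue($queue);
        throw new RuntimeException('Could not fork');
    }

    if ($pid === 0) {
        // child: block until a job (type 1) arrives, then answer using
        // our own PID as the message type
        msg_receive($queue, 1, $type, 4096, $received);
        msg_send($queue, getmypid(), strtoupper($received));
        exit(0);
    }

    // parent: send the job, then wait for the reply addressed to the child's PID
    msg_send($queue, 1, $job);
    msg_receive($queue, $pid, $type, 4096, $reply);

    pcntl_waitpid($pid, $status); // reap the child
    msg_remove_queue($queue);
    return $reply;
}
```

With several children on one queue, any idle child picks up the next type 1 message, while the parent can collect each answer selectively by asking for a specific PID.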
How to Talk to Your Kids
• array stream_socket_pair($domain, $type, $protocol)
• Creates a pair of socket connections that communicate with each other
• Use the first index in the parent, use the second index in the child (or the other way around)
How to Talk to Your Kids
<?php
$socks = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pid = pcntl_fork();

if ($pid == -1) {
    die('could not fork!');
} else if ($pid) {
    // parent
    fclose($socks[1]);
    fwrite($socks[0], "Hi kid\n");
    echo fgets($socks[0]);
    fclose($socks[0]);
} else {
    // child
    fclose($socks[0]);
    fwrite($socks[1], "Hi parent\n");
    echo fgets($socks[1]);
    fclose($socks[1]);
}
/* Output (the order of the two lines may vary):
Hi kid
Hi parent
*/
?>
Distributing Across Servers
• Advantages:
  o Increased reliability/redundancy
  o Horizontal scaling can overcome performance plateau
• Disadvantages:
  o Most complex
  o Failure recovery can be more involved
Locking
• Distributed locking is much more difficult
• Database locking
  o "Optimistic" vs. "Pessimistic"
  o Handling failures when the progress is already updated
Talking to Your Servers
• Roll your own network message queues
  o stream_socket_server(), stream_socket_client()
• Asynchronous IO
  o stream_select()
  o curl_multi_*()
  o PECL HTTP
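As a small illustration of asynchronous IO with stream_select(), the sketch below reports which of several streams have data waiting. The function name ready_streams() is hypothetical; the streams could be socket pairs to children or connections created with stream_socket_client().

```php
<?php
// Returns the keys of the entries in $streams that can be read without
// blocking, waiting at most $timeoutSeconds for something to arrive.
function ready_streams(array $streams, $timeoutSeconds = 1)
{
    // stream_select() modifies its arguments, so work on copies.
    $read = $streams;
    $write = null;
    $except = null;

    $changed = stream_select($read, $write, $except, $timeoutSeconds);
    if ($changed === false || $changed === 0) {
        return array(); // error or timeout: nothing is ready
    }

    // Map the ready streams back to the caller's own keys.
    $keys = array();
    foreach ($read as $stream) {
        $keys[] = array_search($stream, $streams, true);
    }
    return $keys;
}
```

A dispatcher can call this in a loop, read one reply from each ready stream, and hand that worker its next job, so one slow worker never blocks the others.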
Failure Tolerance
• PHP cannot recover from some types of errors
• Heartbeat
  o Moves a service among cluster nodes
  o init style scripts start/stop services
• Angel process
  o Watches a persistent process and restarts it if it fails
• What if dependent services fail?
"Angel" Process
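<?php
function run($function, array $args = array())
{
    do {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid === 0) {
            // child: do the real work
            call_user_func_array($function, $args);
            exit;
        }
        // parent (the angel): block until the child exits, then start it again
    } while (pcntl_waitpid($pid, $s));
}
?>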
Angel as a Cron Job
• In your primary script write your pid to a file
• In the angel cron check for that pid file and if it exists, ensure the pid is still running
  o `ps -o pid= <pid>` or file_exists('/proc/<pid>')
• If the file does not exist, or the process cannot be found, restart the process
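The pid file check from the steps above might be sketched as follows. The function name process_is_running() is hypothetical, and the /proc lookup assumes a Linux system; on other platforms the `ps` variant would be needed instead.

```php
<?php
// Returns true if $pidFile exists and names a process that is still alive.
function process_is_running($pidFile)
{
    if (!is_file($pidFile)) {
        return false; // the primary script never started, or cleaned up
    }

    $pid = (int) trim(file_get_contents($pidFile));

    // Every live process has a directory under /proc on Linux.
    return $pid > 0 && file_exists('/proc/' . $pid);
}
```

The angel cron job calls this with the agreed pid file path and starts the primary script again whenever it returns false. A PID can be reused by an unrelated process after a crash, so a stricter check could also compare the command line found under /proc.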
Resources
• http://php.net/manual - as always
• http://linux-ha.org/ - Heartbeat
• http://dev.sellingsource.com/ - Forking tutorial
• http://curl.haxx.se/libcurl/c/ - libcurl documentation
• man pages
• http://search.techrepublic.com.com/search/php+cli.html