Channel: Life Scaling » LAMP

Connecting Several PHP Processes To The Same MySQL Transaction


The following concept is still under examination, but my initial tests were successful, so I thought it was time to share it.

Here is the problem: a batch job reads remote data through XML-RPC and updates a local MySQL database according to the XML-RPC responses. The database uses InnoDB, and the entire batch job should be transactional: any failure should trigger a rollback, and success should end in a commit.

The simple way is, of course, a single-process script with a linear workflow:

  1. START TRANSACTION locally.
  2. Make XML-RPC request, fetch data.
  3. Insert data into db as needed.
  4. Repeat 2-3 until Error or Finish.
  5. If Error ROLLBACK, if Finish COMMIT.
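The linear workflow can be sketched as a single PHP function. This is a minimal sketch and not the author's code. `runLinearBatch()`, the `$fetch` and `$insert` callables and the `RecordingDb` stand-in are all hypothetical names. `$fetch` is assumed to wrap the XML-RPC call, return `null` when there is nothing left, and throw on error.

```php
<?php
// Hypothetical sketch of the single-process workflow (names are illustrative).
// $db:     any object with a query($sql) method.
// $fetch:  wraps the XML-RPC call (assumed); returns null when finished, throws on error.
// $insert: writes one batch of fetched data through $db.
function runLinearBatch($db, callable $fetch, callable $insert)
{
    $db->query('START TRANSACTION');              // step 1
    try {
        while (($data = $fetch()) !== null) {     // steps 2 and 4
            $insert($db, $data);                  // step 3
        }
    } catch (Exception $e) {
        $db->query('ROLLBACK');                   // step 5, Error
        return false;
    }
    $db->query('COMMIT');                         // step 5, Finish
    return true;
}

// Stand-in for a real connection that only records the statements it receives.
class RecordingDb
{
    public $log = array();

    public function query($sql)
    {
        $this->log[] = $sql;
        return true;
    }
}

// Usage: two fetched batches, then the fetcher reports it is finished.
$batches = array(array(1, 2), array(3));
$db = new RecordingDb();
$ok = runLinearBatch(
    $db,
    function () use (&$batches) { return array_shift($batches); },
    function ($db, $rows) { $db->query('INSERT rows: ' . implode(',', $rows)); }
);
```

An exception from either the fetch or the insert ends the loop and rolls everything back. This matches the all-or-nothing requirement.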

This works, but there is a bottleneck: the XML-RPC request. It goes over HTTP to a remote server, and sometimes the XML-RPC server also takes a while to do the work that produces the response. Add network latency, and you get a single process that sits idle most of the time, waiting for responses.

So if a process just sits and waits most of the time, let's spread its work over several processes. While most of them are waiting on the network, at least one should be free to work with the local database. This makes much better use of our resources.
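A rough, idealized model shows why this helps and also where it stops helping. It is an assumption for illustration, not a measurement. Suppose each of `$items` units of work spends `$waitMs` waiting on the remote server and `$dbMs` in the local database, and the database part is serialized. With `$procs` processes, the run can be no shorter than either of two bounds: the total work divided among the processes, or the serialized database work. The function name and the numbers are hypothetical.

```php
<?php
// Idealized lower bound on the batch's wall-clock time, in milliseconds.
// Assumes perfect overlap of the remote waits, no fork/lock overhead, and that
// the database part of each item is serialized (as it is behind a semaphore).
function estimateBatchMs($items, $waitMs, $dbMs, $procs)
{
    $spreadWork = $items * ($waitMs + $dbMs) / $procs; // all work shared by the processes
    $serialDb = $items * $dbMs;                         // database access, one at a time
    return max($spreadWork, $serialDb);
}

// Hypothetical numbers: 100 requests, 900 ms remote wait and 100 ms of db work each.
$single = estimateBatchMs(100, 900, 100, 1);  // 100000 ms
$five   = estimateBatchMs(100, 900, 100, 5);  // 20000 ms
$twenty = estimateBatchMs(100, 900, 100, 20); // 10000 ms: the database is now the limit
```

In this model, adding children stops shortening the run once the number of processes passes (wait + db) / db. That is 10 with these numbers. Beyond that point the extra children only queue on the semaphore.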

Here is the multi-process workflow:

  1. START TRANSACTION locally.
  2. Fork children as necessary.
  3. From child, make XML-RPC request, fetch data.
  4. From child, acquire database access through semaphore.
  5. From child, insert data into db as needed.
  6. From child, release database access through semaphore.
  7. From child, repeat 3-6 until Error or Finish.
  8. From parent, monitor children until Error or Finish.
  9. From parent, if Error ROLLBACK, if Finish COMMIT.
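The workflow does not say how the children divide the remote requests among themselves. Here is one simple option, which is an assumption and not taken from the original code. The parent builds the list of requests before forking. Child number `$i` then handles every item whose position modulo the number of children equals `$i`. `partitionWork()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: round-robin assignment of work items to children.
// Returns one list per child; child $i would process $parts[$i] after the fork.
function partitionWork(array $items, $children)
{
    $parts = array_fill(0, $children, array()); // every child gets a (possibly empty) list
    $i = 0;
    foreach ($items as $item) {
        $parts[$i % $children][] = $item;
        $i++;
    }
    return $parts;
}

$parts = partitionWork(array('a', 'b', 'c', 'd', 'e'), 2);
// $parts is array(array('a', 'c', 'e'), array('b', 'd'))
```

In the forking script, the fork loop increments `$currentProcs` right after `pcntl_fork()` returns. A child could therefore use `$currentProcs - 1` as its index.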

Now, the workflow seems all well and good in theory, but does it work in practice? Can several PHP processes connect to the same transaction?

I was surprised to find out that the answer is yes. As long as all processes share the same connection resource, they all use the same underlying connection. In MySQL, the same connection means the same session, and therefore the same transaction. This holds provided a transaction was started and has not yet been committed or rolled back (either explicitly or implicitly).

The secret is to create the connection in the parent. When the parent forks, each child inherits a reference to the same connection. The caveat is that the processes must take turns using it, otherwise unexpected behavior occurs. Usually the connection hangs; my guess is that this happens when one child tries to read() from the socket while another write()s to it. To serialize access to the db connection, we use a semaphore. A process can use the connection only when the semaphore is available, and blocks until it is.
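One detail matters a lot here: whatever happens inside the critical section, the semaphore must be released. Suppose a query throws while the semaphore is held and the process never releases it. Then every other process blocks in `sem_acquire()` forever, including the parent trying to roll back. The sketch below shows the `try`/`finally` pattern (PHP 5.5+). It takes hypothetical `$acquire`/`$release` callables so it runs without the sysvsem extension. With sysvsem, those callables would wrap `sem_acquire($sem)` and `sem_release($sem)`.

```php
<?php
// Run $critical while holding a lock; release the lock even if $critical throws.
// With sysvsem, $acquire/$release would call sem_acquire($sem) / sem_release($sem).
function withLock(callable $acquire, callable $release, callable $critical)
{
    $acquire();              // blocks until no other process holds the lock
    try {
        return $critical();  // e.g. send a query and read its whole result
    } finally {
        $release();          // runs on normal return and on exception
    }
}

// Usage with stand-in callables that record the order of events.
$events = array();
$acquire = function () use (&$events) { $events[] = 'acquire'; };
$release = function () use (&$events) { $events[] = 'release'; };

$rows = withLock($acquire, $release, function () use (&$events) {
    $events[] = 'query';
    return array('row');
});

try {
    withLock($acquire, $release, function () { throw new Exception('query failed'); });
} catch (Exception $e) {
    $events[] = 'caught: ' . $e->getMessage();
}
```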

At the end of the workflow, the parent process acts much like the transaction manager in an XA transaction. Based on what the children report, it decides whether to commit or roll back.
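The parent's decision rule can be isolated as a small function. This sketch mirrors the status values used by the forking script in this post: 0 while a child is running, -20 for success and -19 for failure. The name `decideOutcome()` and the `STATUS_*` constants are hypothetical. The function also treats any unexpected status as a failure. One example is `false`, which `pcntl_getpriority()` returns when it fails.

```php
<?php
const STATUS_RUNNING = 0;
const STATUS_SUCCESS = -20;
const STATUS_FAILURE = -19;

// Decide what the parent should do, given one status per child.
// 'rollback' as soon as any child failed (or reported something unexpected),
// 'wait' while some children are still running, 'commit' only if all succeeded.
function decideOutcome(array $statuses)
{
    $running = false;
    foreach ($statuses as $status) {
        if ($status === STATUS_RUNNING) {
            $running = true;
        } elseif ($status !== STATUS_SUCCESS) {
            return 'rollback';
        }
    }
    return $running ? 'wait' : 'commit';
}

$outcome = decideOutcome(array(STATUS_SUCCESS, STATUS_RUNNING)); // 'wait'
```

If the parent gives up waiting while the outcome is still 'wait', it should roll back. The original script follows the same rule.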

Here is proof-of-concept code (this exact version has not been tested, but similar code was tested successfully):

The DBHandler Class

class DBHandler
{
	private $link;
	private $sem;

	// System V IPC key; every process sharing the connection must use the same key
	const SEMKEY = 123456;

	public function __construct($host, $dbname, $user, $pass)
	{
		// The old mysql_* extension was removed in PHP 7, so mysqli is used here.
		// Make mysqli throw mysqli_sql_exception (a subclass of Exception) on errors.
		mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
		$this->link = new mysqli($host, $user, $pass, $dbname);
	}

	private function enterSemaphore()
	{
		$this->sem = sem_get(self::SEMKEY, 1);
		if ($this->sem === false)
			throw new Exception('Could not get semaphore ' . self::SEMKEY);
		sem_acquire($this->sem); // blocks while another process holds it
	}

	private function exitSemaphore()
	{
		sem_release($this->sem);
	}

	public function query($sql)
	{
		$this->enterSemaphore();
		try
		{
			// MYSQLI_USE_RESULT: unbuffered, like the old mysql_unbuffered_query()
			$result = $this->link->query($sql, MYSQLI_USE_RESULT);
			if ($result === true)
			{
				// INSERT, UPDATE, etc..., no result set
				return true;
			}
			// SELECT etc..., read the whole result set while still holding the
			// semaphore, so no unread rows are left on the shared connection
			$retArray = array();
			while ($row = $result->fetch_assoc())
				$retArray[] = $row;
			$result->free();
			return $retArray;
		}
		catch (mysqli_sql_exception $e)
		{
			throw new Exception('Could not query: {' . $sql . '}. MySQL error was: ' . $e->getMessage(), 0, $e);
		}
		finally
		{
			// Always release, even on error; otherwise every other process
			// (including the parent's ROLLBACK) would block in sem_acquire() forever
			$this->exitSemaphore();
		}
	}

	public function beginTransaction()
	{
		$this->query('SET AUTOCOMMIT = 0');
		$this->query('SET NAMES utf8');
		$this->query('START TRANSACTION');
	}

	public function rollback()
	{
		$this->query('ROLLBACK');
	}

	public function commit()
	{
		$this->query('COMMIT');
	}
}

The Forking Process

// DBHOST, DBNAME, DBUSER and DBPASS are assumed to be defined in a config file
$pid = 'initial';
$maxProcs = isset($argv[1]) ? (int)$argv[1] : 0;
if ($maxProcs < 1)
{
	$maxProcs = 3;
}
$runningProcs = array(); // will be $runningProcs[pid] = status;
$dbh = null;
// Children report their status through their own process priority.
// Note: setting a negative priority normally requires root privileges.
define('PRIORITY_SUCCESS', -20);
define('PRIORITY_FAILURE', -19);

try
{
	$dbh = new DBHandler(DBHOST, DBNAME, DBUSER, DBPASS);

	$dbh->beginTransaction();

	// fork all needed children; each one inherits the same connection
	$currentProcs = 0;
	while ($pid && $currentProcs < $maxProcs)
	{
		$pid = pcntl_fork();
		if ($pid == -1)
		{
			throw new Exception("fork failed");
		}
		$currentProcs++;
		if ($pid)
		{
			$runningProcs[$pid] = 0; // parent: record the new child as running
		}
	}

	if ($pid)
	{
		// parent
		echo "+++ in parent +++\n";
		echo "+++ children are: " . implode(",", array_keys($runningProcs)) . "\n";

		// wait for children
		// NOTE -- here we do it with priority signaling
		// @TBD -- posix signaling or IPC signaling.
		while (in_array(0, $runningProcs, true))
		{
			if (in_array(PRIORITY_FAILURE, $runningProcs, true))
			{
				echo "+++ some child failed, finish waiting for children +++\n";
				break;
			}
			foreach ($runningProcs as $child_pid => $status)
			{
				$runningProcs[$child_pid] = pcntl_getpriority($child_pid);
				echo "+++ children status: $child_pid, {$runningProcs[$child_pid]} +++\n";
			}
			echo "\n";
			sleep(1);
		}

		echo "+++ checking if should commit or rollback +++\n";
		// commit only if every child reported success
		if (count(array_diff($runningProcs, array(PRIORITY_SUCCESS))) > 0)
		{
			echo "+++ some child had problem! rollback! +++\n";
			$dbh->rollback();
		}
		else
		{
			echo "+++ all my children successful! committing! +++\n";
			$dbh->commit();
		}

		// signal all children to exit
		foreach ($runningProcs as $child_pid => $status)
		{
			echo "+++ killing child $child_pid +++\n";
			posix_kill($child_pid, SIGTERM);
		}
	}
	else
	{
		// child
		$mypid = getmypid();
		echo "--- in child $mypid ---\n";
		echo "--- child $mypid current priority is " . pcntl_getpriority() . " ---\n";

		// NOTE -- following queries do not work, for example only
		$dbh->query("select ...");

		echo "--- child $mypid finished, setting priority to success and halting ---\n";
		pcntl_setpriority(PRIORITY_SUCCESS);
		// Never exit normally: PHP's shutdown would close the inherited connection,
		// ending the session (and the transaction) for all processes.
		// SIGTERM from the parent terminates the child without that cleanup.
		while (true)
		{
			echo "--- child $mypid waiting to be killed ---\n";
			sleep(1);
		}
	}
}
catch (Exception $e)
{
	// output error
	print "Error!: " . $e->getMessage() . "\n";

	// if parent -- rollback, signal children to exit
	// if child  -- make priority failure to signal
	if ($pid)
	{
		// $dbh is still null if connecting to the db failed
		if ($dbh)
			$dbh->rollback();
		foreach ($runningProcs as $child_pid => $status)
			posix_kill($child_pid, SIGTERM);
	}
	else
	{
		pcntl_setpriority(PRIORITY_FAILURE);
		$mypid = getmypid();
		// as above, wait for SIGTERM instead of exiting normally
		while (true)
		{
			echo "--- child $mypid waiting to be killed ---\n";
			sleep(1);
		}
	}
}

All of this sounds good, and it also worked well in a development environment. But it still has to be taken out of the lab and tested in a production environment. Once I give it a shot, I will update this post with benchmarks.

