Offload read queries to a replica DB for better performance

In most web application, which require a lot of find()’s especially if more than a couple of models are involved, you should probably consider offloading those operations to read-only replica of your DB. This is typically achieved by having a master/slave or master/master configuration. In high traffic application you might have a cluster of databases, but for the purpose of this example we’ll only use two data sources: “default” and “replica”.

Therefore our basic database.php will look something like this:

public $default = array(
    'driver' => 'mysql',
    'persistent' => false,
    'host' => 'production.example.com',
    'login' => 'user',
    'password' => 'password',
    'database' => 'production',
    'prefix' => '',
    'encoding' => 'utf8'
  );
 
  public $replica = array(
    'driver' => 'mysql',
    'persistent' => false,
    'host' => 'readonly.example.com',
    'login' => 'user',
    'password' => 'password',
    'database' => 'readonly',
    'prefix' => '',
    'encoding' => 'utf8'
  );

This prepares our application to use two data sources, as needed.

Next, let’s imagine we have a good ol’ blog and need to grab various information to build a list of posts.
In our Posts Controller, we’ll have some method that gets the required information:

$this->Post->getListofPosts();

The method above will have to involve additional models to get all of the needed info (Author, Tag, (PostsTag for the join table), PostRating… and maybe a few other models). The point is that this is enough operations already to consider offloading them to our read-only DB server.

The actual process is quite simple.
First, we’ll create a generic method in our App Model:

protected function _switchDataSource($models, $datasource = 'default') {
    if (is_array($models)) {
      foreach ($models as $model) {      
        ClassRegistry::init($model)->setDataSource($datasource);
      }
    }
  }

I hope this code is simple enough, but the implementation example is coming up…
It’s worth to note that ClassRegistry::init() will cache your model information (object instance, to be more precise) in memory, therefore in a more complex case (where you might have multiple find()’s) the newly switched data source will persist until switched back. Therefore it is important to remember to “reset” your data source once you are done with the read operation(s).

Now, here’s the basic usage sample (this snippet would be inside of our getListofPosts() method):

//let's switch our DS to replica
$this->_switchDataSource(array(
  'Post', 'PostsTag', 'Author', 'PostRating'
), 'replica');

//now we can execute our find with all of the above
//model data coming from the read-only DB
$posts = $this->find('all', array(
  'contain' => array(
  //include our models here
  ),
  'limit' => 35
));

//don't forget to switch the DS back to default
$this->_switchDataSource(array(
  'Post', 'PostsTag', 'Author', 'PostRating'
));

return $posts;

As you see the implementation is pretty simple. The only thing to keep in mind is that if you are getting some SQL errors, chances are you have not included all of the required models for the operation, the most common case is forgetting the join table model. To troubleshoot and see the results the debug kit is very helpful, because it will show you which data source is being used to run a particular set of queries.
Another hint, if you use the same set of models over and over you might as well assign them to a property in your model, so that if you need to add or change something you’d only do it in one place.
Using our example we can modify the code like so:

public $postQueryModels = array('Post', 'PostsTag', 'Author', 'PostRating');

....

$this->_switchDataSource($this->postQueryModels, 'replica');
  • http://www.jblotus.com James Fuller

    Great tip, a read only database is a good solution. On our site at work we try to use the read only slave in as many situations as possible. One time where you might not wish to use it though is with time-critical data. Occasionally the read only db will get backed up when it comes to replication. Also having a fallback to the master is probably a good idea.

  • http://josediazgonzalez.com Jose Gonzalez

    Would it not be more useful to have this as a behavior? You could automagically crawl the contains and conditions (to some default recursion level) to switch the datasource in the Behavior::beforeFind() and switch back in the Behavior::afterFind(). Of course you could provide an override for when the recursion isn’t useful enough, but it would be a nicer-looking implementation than this.

    Also, does this work for manually specified joins? Just curious.

  • teknoid

    @Jose Gonzalez

    1. I am pretty sure it does work with manual joins, but I need to look at the code of the app to be sure.
    2. This approach is a little safer than going full force with Behavior, because (in our case at least) all reads could not be offloaded to replica for the very reason James Fuller mentioned above (real-time data requirement). Plus it was much faster to implement in a short deadline … an hour, before DB blows up :)