
Million dollar site in CakePHP …

Is that possible? Yes, and it has been done.

But…

- CakePHP is slow. I’ve seen benchmarks on “Hello world” and Cake falls behind other frameworks.
- What’s your caching strategy?
- [crickets] …

  1. Cache content
    CakePHP comes with a built-in view caching mechanism. Granted, there is always a question of real-time data vs. performance, but I’ve yet to come across a project where at least some content could not be cached. Even a five or ten minute cache can be quite helpful if you’ve got millions of hits a day on your site. I’ve described some strategies in a post a while back, so give it a read if you are after the details of implementation.
    If you need something faster, consider adding a front-end cache such as Varnish, or nginx as a reverse proxy. (If you have a choice, consider replacing Apache, which requires a lot of tweaking, with nginx as your web server; it is quite fast out of the box.)

  2. Cache DB queries
    Optimizing queries is important. You should always use Containable and limit the fields being returned by each query. This optimization will only take you so far, however… and I would argue that before you go head-first into the nitty-gritty details of each query, you should invest in setting up memcached (nicely supported by Cake) to alleviate some load on your DB.

  3. Index your DB fields
    Do you have some JOIN’s? Do you perform searches on certain fields (e.g. “username”)? Then you’d better remember to index any fields in your DB that are used in a JOIN or searched on. There are a few ways to properly use indexes, but one thing is for sure: without them, your DB and your app performance are going to suffer greatly.

  4. De-normalize your DB structure, better yet use an appropriate DB system
    Speaking of JOIN’s, no matter how you slice it… they are costly for performance. Sometimes it helps to de-normalize your DB structure in order to avoid such expensive operations. An example would be a users table and a user_profiles table. I love to keep minimal information in the users table, but if I find that I keep JOIN’ing user_profiles to get additional information, it is probably time to consider de-normalizing the data and moving whatever piece of info I need into the users table to avoid an extra JOIN. (Counter cache is another prime example of this.)
    That being said, if you find yourself de-normalizing your DB quite heavily, perhaps it is time to consider an alternative to an RDBMS. Of course, I am going to suggest MongoDB. Where applicable, it is quite alright to use a mixture of DB systems… always use the right tool for the right job (a simple, but powerful statement).

  5. Create read/write DB replicas
    Do evaluate your read and write queries. Do you have a ton of admin features which require complex find() operations? Offload them to a read replica, so that they do not compete with your users’ (front-end) write queries. Increasing performance through replication is a nice trick, but you should be cautious not to read mission-critical data from a replica, because its data might be a little behind your master server.

  6. Offload heavy tasks to a background process
    I do hear this pretty often… “I need to export data to Excel, and it is taking forever”. This is a perfect example of a job that can be offloaded to a background operation. There is simply no need for a user to sit in front of the screen and wait X minutes for the Excel file to build and download. When a user requests an Excel report, add this task to your job queue manager and notify the user when the job is complete. (Have you looked at Gearman?)
    - Well, I need to have this real-time.
    - Sorry, but this is not going to happen. Having a user sit and wait is already not “real-time”, IMO. Not to mention the unnecessary stress this kind of operation will put on your DB. (This kind of requirement is a perfect time to consider replication as well.)

  7. Use AJAX, when needed
    The whole point of AJAX was to minimize the number of heavy requests to the server. Imagine you have an e-commerce site, where you have a page of the most popular t-shirts (purchased by the users in the last week) cached… for a week. Makes sense: you only rebuild this list once a week, based on the updated data in the DB. For a week, this popular page on your site is served statically, just like good ol’ HTML.
    But there is a gotcha… you also have shopping cart info as part of this page. Well, AJAX to the rescue. While the rest of the page is cached, one little div, which has the shopping cart summary, is easily updated by AJAX.
    Granted this is a very simple example, but think about how well this applies to other situations where you need to mix and match static and dynamic content.
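
To make the “Cache DB queries” point a bit more concrete, here is a minimal read-through cache sketch in plain PHP. It is only an illustration: ArrayCache and remember() are hypothetical names, and the in-memory array stands in for memcached so the snippet runs on its own. In a Cake app, Cache::read() and Cache::write() against a memcached-backed config would presumably play the same role.

```php
<?php
// Hypothetical in-memory stand-in for memcached, so the sketch runs on its own.
class ArrayCache {
    private $items = array();

    // Returns the cached value, or false on a miss (so a literal false cannot be cached).
    public function get($key) {
        if (!isset($this->items[$key])) {
            return false;
        }
        list($value, $expires) = $this->items[$key];
        if ($expires < time()) {
            unset($this->items[$key]); // entry is too old, treat it as a miss
            return false;
        }
        return $value;
    }

    public function set($key, $value, $ttl) {
        $this->items[$key] = array($value, time() + $ttl);
    }
}

// Read-through helper: return the cached value, or run $loader once and store its result.
function remember($cache, $key, $ttl, $loader) {
    $value = $cache->get($key);
    if ($value === false) {
        $value = call_user_func($loader);
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

// Usage: $dbHits counts how often the "expensive query" actually runs.
$cache = new ArrayCache();
$dbHits = 0;
$loader = function () use (&$dbHits) {
    $dbHits++;
    return array('Blue tee', 'Red tee'); // pretend this came from a slow find()
};

$first  = remember($cache, 'popular_tshirts', 600, $loader);
$second = remember($cache, 'popular_tshirts', 600, $loader);
```

The second call never reaches the loader, which is the whole point: for ten minutes the DB is left alone no matter how many visitors ask for the same list.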

A few other things worth looking into: APC, hiphop-php, Yahoo! performance rules.

Overall, I suggest not getting too hung up on trying to squeeze milliseconds out of your PHP code, unless you’ve exhausted all other options. Spend your time on proper architecture and caching strategies. Avoid premature optimization and do not trust “hello world” benchmarks when measuring something as complex as an entire framework.
Oh, and before you invest hours into setting up various caching and optimization tools, do not forget to run load and stress tests to help you identify bottlenecks up front.

Would love to hear your experience with optimization, caching and improving performance.

Make your CakePHP app ridiculously faster with “view caching”

Big or small, once your app goes to production you’d better start thinking about utilizing view caching ;)

Granted, view caching in a dynamic application can be tricky or impossible(?)…
but with proper usage wherever and whenever needed, it can make your app go off the charts in terms of performance.
If you are not familiar with view caching at all, you should read the CakePHP manual… yet I will give you some basic examples and a couple of things that may not be so well known.

1. Enable view caching for 10 minutes for the main (index) page

public function index() {
  $this->helpers[] = 'Cache';
  $this->cacheAction = '10 minutes';
}

In this little snippet we add the Cache helper only when it is required by the given action. No need to load it for the entire controller. (Once the page is “visited” for the first time, the given view will be cached… you can see the evidence of this in your app/tmp/cache/views directory).

2. Be mindful of what you are caching, certain things should remain dynamic

Whenever there is dynamic content within the view, such as a div which is updated by AJAX, you’d want to utilize the cake:nocache tags, like so:

<cake:nocache><?php echo microtime(); ?></cake:nocache>

… but there are a few “gotchas” to be aware of…

a. Callbacks such as beforeRender() or beforeFilter() do not fire by default.

For larger scale apps this nearly kills the ability to use view caching.
However, not all is lost… In your controller you can set up a property to ensure that callbacks do run for certain actions where they are truly required (as shown in the CakePHP manual):

public $cacheAction = array(
  'view' => array('callbacks' => true, 'duration' => 21600),
  'add' => array('callbacks' => true, 'duration' => 36000),
  'index'  => array('callbacks' => true, 'duration' => 48000)
);

An additional option, proposed by jrbasso, is to fire up the necessary logic by adding the following snippet to the layout:

<cake:nocache><?php
if (isset($controller) && is_a($controller, 'Controller')) {
  $controller->constructClasses();
  $controller->startupProcess();
}
?></cake:nocache>

Still a little overhead, but much, much faster overall. Both options achieve the same basic goal, and I will leave it up to you to see which one suits your needs better.

b. Spaces

Another excellent catch by Mr. jrbasso is that if you have spaces between your cake:nocache and php tags, the “no-caching features” may not work as expected.
So be sure to always place your dynamic PHP code like so:

<cake:nocache><?php
if($dynamic) {
  echo 'This is a timestamp: ' . microtime();
}
?></cake:nocache>

c. Don’t surround your elements with cake:nocache tags.

Your entire element will not be cached by doing this:

<cake:nocache><?php echo $this->element('my_element'); ?></cake:nocache>

Yep, you have to go inside the element and decide what to cache and what not to cache. In some cases that makes sense; it really depends on the element.
Alternatively, you can cache an entire element using the ‘cache’ key:

<?php echo $this->element('my_element', array('cache' => array(
   'time' => '+1 day',
   'key' => 'my_cached_element'
))); ?>

Since the element itself can have dynamic content, the ‘key’ can be used to create different “versions” of the cached element. Of course in the above example it is a static key, but in other cases you could create a dynamic key based on some conditions or criteria, which would make sense for your application.
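
As a rough sketch of what such a dynamic key could look like, the helper below builds a key from whatever inputs change the element’s output. The function name elementCacheKey() and the 'lang' / 'category' inputs are made up for illustration; they are not part of Cake.

```php
<?php
// Hypothetical helper: build a cache key that varies with the inputs
// that change the element's output.
function elementCacheKey($base, array $parts) {
    ksort($parts); // the same inputs in any order give the same key
    $suffix = array();
    foreach ($parts as $name => $value) {
        // keep only letters and digits so the key stays safe for file-based caches
        $suffix[] = $name . '_' . preg_replace('/[^a-z0-9]+/i', '_', (string) $value);
    }
    return $suffix ? $base . '_' . implode('_', $suffix) : $base;
}

$key = elementCacheKey('my_cached_element', array('lang' => 'en', 'category' => 12));
// $key is now "my_cached_element_category_12_lang_en"

// In the view, using the same call style as the static example:
// echo $this->element('my_element', array('cache' => array('time' => '+1 day', 'key' => $key)));
```

Each distinct combination of inputs gets its own cached copy, so keep the list of inputs short; otherwise you end up with many cache entries that are rarely hit.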

Real-time updates vs view caching

It goes without saying that once your view is cached any changes to the underlying model will not be reflected until the cache is invalidated. In some cases it is perfectly fine, in others you might want to clear the cache once the model has been updated.
It is quite easily achieved by adding the following to your model:

public function afterSave() {
  clearCache('related_view*');
}

Granted, on the first hit to the page the cache will be rebuilt and some “extra” weight will be put on the application, yet by using this technique you’ll have the benefit of using the cache and real-time reflection of the updated data.

Don’t let APC get in the way

I am not sure if other PHP opcode caches can cause the same trouble, but at least with APC we’ve found that it can conflict with Cake’s view caching. It would be a bit too much to go into the details of the issue, but if you use both APC and view caching for your application and notice some strange behavior, be sure to run APC with stat “ON” (apc.stat = 1, which is the default behavior).

To learn a little more about this, you can take a look here

In summary, I might as well throw this out: when comparing framework performance using examples such as “Hello world!”, don’t forget that any framework or application requires caching. Like many other things, ensuring high availability and excellent performance is quite easy in Cake; beyond that, be sure to utilize other tools (APC, memcached, etc.), which are just as necessary in any modern web application as the web server itself.

Speeding things up with materialized views and MySQL

First let’s see what Wikipedia says about materialized views:

A materialized view takes a different approach [from the regular sql view] in which the query result is cached as a concrete table that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of some data being potentially out-of-date. It is most useful in data warehousing scenarios, where frequent queries of the actual base tables can be extremely expensive.

This feature exists in some databases, but MySQL needs a work-around. Still, let’s see if it’s even worthwhile to consider.

All that being said (well, quoted), there are a couple of ways to look at this situation…

1. You will have speed of access, but the data might be out of date
2. You perform a lot of queries for the same data, which does not change very often

I prefer to be more optimistic and look at it from the second point of view.

Let’s consider the following situation:

You are looking for a listing of the best Irish pubs in your area.
The greatest solution would be to query a best_irish_pubs table with $this->BestIrishPub->find('all'); and be done with it…

While that would be an excellent solution, unfortunately our database is likely to be highly normalized and because of that a simple query is not going to cut it.
To follow along with the example, we can imagine that the following information might be required:

Pub type = Irish
Pool table = Yes/No
Bartender first name
Bartender last name
How many bartenders on shift from 4-8?
Out of the above how many are female?
Hours of operation?
Pub latitude
Pub longitude
Distance units = miles or kilometers
Pub owner’s name

The point of all this is that the data is likely to be scattered all over the database and can potentially require some complex calculations on the fly (using longitude and latitude to determine distance); attempting to squeeze all of these requirements into a single query is not impossible, but highly inefficient for the life of your database, web application… as well as your own.

To make the long story short, it becomes increasingly difficult to optimize a single complex query to squeeze out a few seconds of improved performance. If you’ve got some complex JOIN’s, an ORDER BY and other well known performance grinders, it is not unreasonable to see a single query take as much as… let’s say 30 seconds.
Even if you manage to make it 50% faster, it is still completely unacceptable by any web standards. You’d have to struggle to make it thousands of times faster (literally) just to get to a reasonable speed.

1. What about query caching?
In many cases this will be your silver bullet, although in some cases it won’t be possible to implement. If our web app is going to show the best Irish pub in any city in the world with a pile of different conditions (options), not to mention any required calculations, you can imagine that caching a specific query is going to be quite impossible.

2. What if we had the desired data in a single table?
That would certainly speed things up, but you don’t have to sacrifice your database design and ignore normalization in order to increase performance. Your data should be stored in such a way that your tables aren’t growing horizontally. For example, it is easy to be tempted to add a new field to some table rather than normalizing and separating the data into a few smaller and more manageable tables.
However, just as you like to organize your code using MVC to keep things from getting out of control ;) the database should be fundamentally well designed. One does not want to build on top of a crippled foundation.

3. Materialized views to the “rescue”…
The goal can be met half-way… Unfortunately MySQL does not have a materialized view capability. (One implementation I’ve found is here, but without trying it I am just going to proceed with a more manual approach).
The good news is that we know what sort of results are going to be required, therefore we can take our complicated query and make a regular view out of it. (It is quite simple to create a view once the SQL query is known; a little googling will have you up and running in a few minutes).
Hold on a second, wouldn’t that be the same exact query and now we throw in an underlying complexity of a MySQL view?! Shouldn’t things become even slower now?
Indeed the application would become even slower… if we just left everything as is.

4. Tables and cron to the rescue!
Here’s what’s going to happen next…
We will query our view and create a table that will hold all the relevant data (exactly like the view does). With one major difference: once the query is completed, the data in the table stays as is, regardless of what is going on in the rest of your database. Think of it as cached data, ready to be queried, worked on, calculated, etc. further.
The query can still take 30 seconds to execute, but in our case an update to a pub doesn’t happen very often. If we create a cron job which updates our “cache table” and runs every 3-4 hours, it could very well work for this “pub scenario”. For some applications the update could happen weekly; for others, every few minutes. The bottom line is that there is always a trade-off between up-to-the-second data accuracy and a dramatic increase in performance. Running that same query against a live DB will certainly upset your users, even if they get the most up-to-date results.
That being said, there are ways to manually trigger updates of your “cache table” by using the model’s afterSave() and afterDelete() callbacks, so “technically” you don’t have to worry about missing any updates. The frequency of these updates should be the best guide on how you’d implement this approach to properly balance any trade-offs.

5. Recap and a little SQL
First, we identify our complex or slow query that should not be running against a live DB.
Secondly, based on this query we build our view to hold the essential data in a single place.
Next, we create a “cache table” to replicate the materialized view functionality.
The best and simplest implementation I’ve found would look something like this:
[sql]
DROP TABLE IF EXISTS `best_irish_pubs_new`;
CREATE TABLE `best_irish_pubs_new` SELECT * from `best_irish_pubs_view`;
ALTER TABLE `best_irish_pubs_new` ADD INDEX (`lat`);

-- Continue to add as many indexes to the table as needed. The line above is for reference.
-- You can also set your database engine and collation at this point.

RENAME TABLE `best_irish_pubs` TO `best_irish_pubs_old`, `best_irish_pubs_new` TO `best_irish_pubs`;
DROP TABLE IF EXISTS `best_irish_pubs_old`;
[/sql]
(Thanks to the original poster http://dev.mysql.com/doc/refman/5.0/en/create-view.html… see comments for more details).
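
If you would rather drive the swap from a cron’d PHP script than from a raw SQL file, something along these lines could work. Treat it as a sketch under assumptions: refreshStatements() and refreshCacheTable() are hypothetical names, the PDO connection is whatever your app already uses, and the extra CREATE TABLE IF NOT EXISTS … LIKE step is my addition so that the very first run has a live table to rename.

```php
<?php
// Build the statements for one refresh of a "cache table" from a view.
// Identifiers are interpolated directly, so only pass trusted, hard-coded names.
function refreshStatements($table, $view, array $indexColumns = array()) {
    $new = $table . '_new';
    $old = $table . '_old';
    $sql = array(
        "DROP TABLE IF EXISTS `$new`",
        "CREATE TABLE `$new` SELECT * FROM `$view`",
    );
    foreach ($indexColumns as $column) {
        $sql[] = "ALTER TABLE `$new` ADD INDEX (`$column`)";
    }
    // First run only: make sure there is a live table to swap out.
    $sql[] = "CREATE TABLE IF NOT EXISTS `$table` LIKE `$new`";
    // RENAME TABLE swaps both names in a single statement.
    $sql[] = "RENAME TABLE `$table` TO `$old`, `$new` TO `$table`";
    $sql[] = "DROP TABLE IF EXISTS `$old`";
    return $sql;
}

// Hypothetical runner: executes the statements in order on an existing PDO connection.
function refreshCacheTable(PDO $pdo, $table, $view, array $indexColumns = array()) {
    foreach (refreshStatements($table, $view, $indexColumns) as $statement) {
        $pdo->exec($statement);
    }
}

$statements = refreshStatements('best_irish_pubs', 'best_irish_pubs_view', array('lat'));
```

Because the slow SELECT fills the _new table first and the rename happens at the very end, readers keep hitting the old data until the fresh copy is completely built.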
Now we can perform calculations and querying on a single table where the results have already been thoroughly prepared to be extracted. For example, the distance calculation can be done since we have latitude and longitude in the table, and we can order by the bartender’s name since we brought all this info in from other tables via JOIN’s… but the best news, of course, is that we have dramatically improved performance.
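
For the distance part, one common choice (my assumption, the original query is not shown here) is the haversine formula. A small PHP sketch, where haversineDistance() is a hypothetical helper and the two coordinates would come from the pub row and from the visitor:

```php
<?php
// Great-circle distance between two latitude/longitude points (haversine formula).
// $unit is 'km' or 'miles', matching the "distance units" option from the pub example.
function haversineDistance($lat1, $lng1, $lat2, $lng2, $unit = 'km') {
    $radius = ($unit === 'miles') ? 3958.8 : 6371.0; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);
    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLng / 2), 2);
    // min() guards against tiny floating point overshoot before asin()
    return 2 * $radius * asin(min(1.0, sqrt($a)));
}

// One degree of latitude along the same meridian, in kilometers.
$oneDegree = haversineDistance(0.0, 0.0, 1.0, 0.0);
```

Since the “cache table” already holds the coordinates, this calculation runs over a single flat table (in PHP or rewritten as an SQL expression) instead of being piled on top of the JOIN’s.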

A real world number is that by employing this technique we’ve cut down a 27.89 second query to under one second for a pretty complex resultset.

And there we have it… a long ass post, but hopefully a useful one nonetheless ;)