Digital Web Magazine

The web professional's online magazine of choice.

Being a PHP Lumberjack

Got something to say?

Share your comments on this topic with other web professionals

In: Articles

By Daryl L. L. Houston

Published on March 23, 2005

How many times have you sat staring blankly at a screen trying to track down an elusive bug? We all know the drill:

  1. Stare at code for five minutes
  2. Furrow brow
  3. Tweak something
  4. Execute code again
  5. Watch it break again
  6. Furrow brow
  7. Stare at code for five minutes
  8. ...

I know people who spend hours beating their heads against this wall, tweaking their code to within an inch of its life and distorting it blindly in hopes of solving it by brute force. I’ve done it myself.

There are better ways to hunt down logical errors in your code. Tools (like Zend’s debugger and profiler) aren’t an option for everybody, so I’d like to focus on a more generalized practice that can help you watch your code’s execution. Get out your double-bladed axe—we’re going logging.

Logging means writing information to a file for retrieval later. This is a valuable definition and a good practice that I’ll address briefly later, but I use the term in a more general sense. The point of logging is to document things that have happened during program execution. For our purposes, it doesn’t matter whether information is saved to the system or is printed to the screen for real-time use.

Seeing the Trees Despite the Forest

There’s a lot going on in the typical large Web application. It’s easy to overload yourself with logged information, obscuring the useful log output with unneeded information. In fact, There are only a few cases in which logging is really useful. Consider the following cases:

Let’s go over these one by one.

Determining whether program execution gets to point X

The first logging statement I write when a piece of new code isn’t producing any output is one that prints "GOT HERE" at the very top of the file. If it prints, I know that the code is being executed. I begin to move the line down the file until "GOT HERE" doesn’t print anymore. At that point, I know where the problem is—in the section just above the print statement.

This method helps me determine where an error is occurring, an important first step in debugging.

Determining whether or not a condition is being met (a special case of the point above)

I consider this a different kind of logging because it’s more targeted. In the first case, I’m using a two-by-four to swat a fly and simply locating where the code is going wrong. In this case, I have a better idea what’s wrong. I’m suspicious a specific condition is not being met so I’m using the more precise flyswatter.

Rather than simply printing "GOT HERE," I print the value of a variable I’m testing. I might also print the value of a variable that’s supposed to change within the condition, showing the "before" and "after" value (particularly useful in switch statements or blocks with multiple elses).

Determining whether value Y is being set as expected, when expected

Determining the value of a variable at a given point can help you figure out several types of problems. For example, you might discover that you’ve mistyped a variable name somewhere in your code and are setting an unused variable’s value rather than changing the value of an existing variable as expected. Printing the value of the variable rather than simply allowing the error to affect the subsequent portions of the program doesn’t help you find the location of the typo, but it clues you in that you need to look elsewhere in the code to find where the variable is being set. This sort of logging also comes in handy for troubleshooting scope issues. Some people still use $var rather than $_GET["var"], for instance, and on systems in which register_globals is disabled, this will cause problems. Testing the value of $var and seeing that it comes up blank might prompt you to test for $_GET["var"] and then to set $var to that value or to use $_GET["var"] directly.

Monitoring program execution step-by-step as it happens

Tired of writing one-off logging statements? Write some code that will print or save a series of logging statements. This will let you watch the execution of a program to determine any gaps, and is especially useful if used in conjunction with a debug flag. This affords you flexibility without requiring you to constantly comment out code as you switch from development to production mode and back.

Swinging the Axe

Now that we’ve spotted when to use logging, let’s consider how to use logging. First, we should decide whether program execution should stop after printing a given log statement—something that might need to happen if a query upon which others depend fails. For logging that should stop program execution, the die() or exit() functions can be used:

if($condition==true){
   die('GOT HERE');
   //Or:
   //exit('GOT HERE');
   //Or even:
   //print 'GOT HERE';
   //exit();
}

When logging information at a point that should not stop program execution, you can simply use print statements. In such cases, you might also consider building up your log information for display at the end of the program execution. Consider the following small program:

//Set $show_log to 1 to show the log and 0 to suppress it.
$show_log=1;
$log='';

for($count=0; $count<100; $count++){
   //If $count is evenly divisible by two, append a log statement to $log.
   if($count %2==0){
        $log .= "Got to iteration " . $count . ".";
   }
}

//If we've opted to show the log, print it.
if($show_log==1){
   print $log;
}

This code lets you choose whether or not to display a log of the program’s activity. The log is written to regardless of how the $show_log flag is set, and prints the log only if it’s in debug mode. This saves having to write an if statement for every log statement and makes it very easy to toggle log display on or off (something we might do when moving from a development to a production environment).

Sometimes it’s helpful to display data structures—such as arrays and objects—more complicated than what a simple print statement will accommodate. PHP provides the functions var_dump(), var_export() and print_r() for displaying such structures. Each function displays key/value (or property/value, in the case of objects) pairings, indenting to indicate hierarchy. There are some differences among the functions, however, as summarized in the table below.

Differences among the functions
var_dump()var_export()print_r()
RecursiveYesYesYes
Shows private/protected propertiesNoYesYes
Valid PHP syntaxNoYesNo
Prints multiple argumentsYesNoNo
Can suppress output NoYesYes

All of the functions are recursive. That is, they will print out arrays within arrays within arrays within objects, and so on. Only var_export() and print_r() display private properties and methods of objects, so if you need to print all properties and methods of an object, you should use one of these functions.

Only var_export() displays its output in valid PHP syntax, so you can eval() its output to create a new array without raising any errors.

Only var_export() can take an indeterminate number of arguments to print out (if you pass three arrays, the structures for each will be printed in succession). And only var_dump() and print_r() take a boolean second parameter that determines whether to print output immediately or return it (passing true causes a return; the default behavior is to print immediately).

Now that we’ve dissected the functions, let’s think about how they can really be useful. Say you’ve created a bunch of objects and you find that the value for one is never being modified as expected after you pass it to a function. I’ll assume you’re using PHP 4, so that objects aren’t automatically passed by reference. Take the following code, for example:

//Assumes you've defined the class myObject elsewhere.
$obj=new myObject();
$obj->val='one';
myfunc($obj);
print $obj->val;

function myfunc($obj){
    $obj->val='two';
}

In this code, we’ve instantiated an object, set its "val" property to "one," passed it as a parameter to the function "myfunc," and printed its "val" property. You might expect the print statement to print the string "two" because that’s what we changed the property to in myfunc(). But because we passed the object by value rather than by reference, myfunc() changed a copy of the object rather than the object itself. Now when we then print the "val" property after executing the function, it hasn’t changed because we’re printing the property from the original object. If you’re stumped on this, add some logging to determine the state of affairs at different times during program execution, as follows:

//Assumes you've defined the class myObject elsewhere.
$obj=new myObject();
$obj->val='one';

//Print the object structure
print "Before passing to function:<br />" . print_r($obj,1) . "<br /><br />";

myfunc($obj);

//Print the object structure
print "After passing to function:<br />" . print_r($obj,1) . "<br /><br />";

print $obj->val;

function myfunc($obj){
    $obj->val='two';
    //Print the object structure
 print "Inside function:<br />" . print_r($obj,1) . "<br /><br />";
}

We’ve added print statements before, inside, and after the function so that we can see the "val" property at three crucial points. This code produces the following output:

Before passing to function:<br />myobject Object
(
    [val] => one
)
<br /><br />Inside function:<br />myobject Object
(
    [val] => two
)
<br /><br />After passing to function:<br />myobject Object
(
    [val] => one
)
<br /><br />one

You can see the the function changed the value of its object correctly, but since it’s working on a copy of the object, the original object isn’t affected by the change. There are at least two workarounds for this issue. Change the myfunc() definition by prepending an ampersand to the parameter. This forces the function to take a reference to its parameter rather than make a copy of it. It also causes the function to return its copy of the object and assign that return value to the original object. Logging a data structure in this instance helps you to track down an issue that is easy to fix—with minimal effort and not a bit of head-scratching or brow-furrowing.

When using these data-structure functions in a standard Web application, you’ll find that helpful white space isn’t honored in your Web browser by default because browsers are designed to ignore tabs and multiple spaces. I work around this by passing the second boolean flag to suppress output where applicable and to print the output within pre or textarea tags to maintain formatting. Here’s the syntax:

print "<pre>" . print_r($obj,1) . "</pre>";

Counting Rings

Just as you can learn something about the age and history of a tree by looking at its rings, you can monitor the flow of a program’s execution by using the debug_backtrace() function. This returns an array of associative arrays with keys representing (if applicable) function, line, file, class, type and arguments for each function called to get to the current function. This can be particularly useful in modular code spanning many files and using many nested functions. To see it in action, check out the following code:

print foobar(2);

function foobar($num){
    //Call and return the value from foo()
    return foo($num);
}
   
function foo($foobar){
    //Call and return the value from bar()
    return bar($foobar * 2);
}
   
function bar($foo){
    //Print the backtrace to show what's happened so far in the program execution.
    print_r(debug_backtrace());

    //Multiply the parameter by 2 and return
    return $foo * 2;
}

Because it shows a history of where the code has been so far, the deeper into execution debug_backtrace() is called, the more information it displays. In this example, I’ve called the function in the deepest-nested function to display information about all the functions called to get to that point. The output in our case is as follows:

Array
(
  [0] => Array
    (
      [file] => /var/www/backtrace.php
      [line] => 11
      [function] => bar
      [args] => Array
        (
          [0] => 4
        )
    )
  [1] => Array
    (
      [file] => /var/www/backtrace.php
      [line] => 7
      [function] => foo
      [args] => Array
        (
          [0] => 2
        )
    )
  [2] => Array
    (
      [file] => /var/www/backtrace.php
      [line] => 3
      [function] => foobar
      [args] => Array
        (
          [0] => 2
        )
    )
)

The array lists the most recent function first. You can track the execution path by going to the end of the array to see that foobar was called with "2" as a parameter, followed by (from within that function) foo with "2" as a parameter and (within foo) bar with "4" as a parameter. Note that because debug_backtrace() returns an array, we need to use one of the data-structure-friendly print functions to display its output correctly.

Printing backtrace has proven most useful to me in debugging content management system applications with fairly complicated APIs in which almost absurdly (if necessarily) abstracted functions are called dynamically with many layers of nesting and across many files. Using this function has saved me much grief, preventing me from having to have 15 windows open and keeping track of what I’m viewing in each window to determine what the program flow is as I work my way from function to function to try to find a bug.

Watching the Trees as they Fall

By default, PHP suppresses output until all output has been prepared for display. This can be a problem if you’ve got code that takes a long time to execute but that you wish to keep tabs on as it runs.

I’ve inherited some code that does invoice posting. It has to iterate over many line items and perform a series of checks for each and then in some cases to perform several update or insert queries for each line item. If any query fails or if any of the checks aren’t passed, all queries are rolled back. Because this code is executed by our accounting department and extends the functionality of our accounting system, we use a Web interface for posting. In the past, when our accounting staff tried to post an invoice, they had to wait for a minute or so, staring at a blank screen with no idea what progress had been made or if the code was even executing. In order to make their experience a little less maddening, I wrote a class that uses output buffering to display output as it’s generated. (I won’t get into the specifics of output buffering in this article.) The code that follows can also be used to display debugging statements as they’re generated.

class StatusBar{

  var $spaces;
 
  function StatusBar(){
    //Flush the output buffer. This turns output buffering on if it’s not already on.
    ob_flush();

    /*This trick gets around a browser limitation that prevents output from being sent if it’s not longer than a minimum length. A little kludgey, but it works. We’re debugging, after all, so it doesn’t have to be too elegant. Note that these are comments, so they’re not actually displayed on the screen.*/
    for($i=0; $i<1000; $i++){
      $this->spaces .= "<!--buffer-->";
    }
    $this->spaces .= "n";
  }

  //Call this function to actually display a message.
  function flush($msg){
    print $msg . $this->spaces;
    ob_flush();
  }
     
}

The class is used as follows:

$statusbar=new StatusBar();

for($count=0; $count<100; $count++){
  //If $count is evenly divisible by five, print a log statement
  if($count %5==0){
    $sb->flush("Got to iteration " . $count . ".");
  }
}

For every fifth iteration, a message will be immediately printed to the screen. For fast, simple code like the example above, in which output is nearly instant anyway, the StatusBar class will be of little use. For code with a long execution time, the class can be a handy way to monitor progress. It could also be enhanced, as was my original intent (consider the name of the class), to print images that would display a status bar as a program executes.

Save the Trees

So far, all the logging we’ve done has been to the screen for immediate and temporary display. Debugging is immediate and temporary, after all. However, sometimes you’ll want to save your logs. There are a couple of ways to do it. One is simply to open and write to a file:

$fp=fopen('log.txt','w');
fwrite($fp,'log message here');
fclose($fp);

Another is to use the error_log() function to send your messages to any of several places. This function’s first parameter contains the message to send. The second optional parameter is the action to take, as listed in the table below. The third (optional) defines the destination of the message for some actions (email address or filename, for example), and the fourth (optional) defines extra headers to send with an email message.

error_log() parameters
Parameter Where message goes
0 System log as defined by PHP. An Apache error_log file, for example.
1 Mail address defined in third parameter and with headers specified in fourth. Uses mail() function.
2 PHP Debugging connection if remote debugging enabled. Probably pretty rarely used.
3 Appended to file listed in destination.

Chainsaw or Hatchet?

These logging functions can be a life saver as you try to debug your code, especially as your codebase becomes larger and more complex. These methods have saved me hours debugging large-scale Web applications, like substantially abstracted CMSs with many nested calls to hard-to-trace functions.

The type of logging you do depends on the nature and scope of the application at hand. For small one-off applications, simplified logging is fine. But for larger scale code, it’s worth it to put a little extra work into adding logging code that is executed or whose output is displayed only under certain conditions (like when a flag is toggled). You might also consider implementing different log levels, so that logging statements are given priority, with only high priority logs printed by default. Whatever implementation you choose, I think you’ll find that putting a little time into understanding how to log efficiently will save much time and frustration in the future.

Got something to say?

Share your comments  with other professionals (8 comments)

Related Topics: PHP, Programming

 

Daryl L. L. Houston is a literature geek by training and a computer geek by trade. He's done public Web site and private business systems programming for well-known and respected Fortune 500 companies in recent years. PHP is his weapon of choice, but he's done production work in perl, python, and jsp as well.

Media Temple

via Ad Packs