
PHP Design

· Writing Requirements Specifications

· Writing Design Documents

· Change Management

· Modularization Using include

· FreeEnergy

· Templates

· Application Frameworks

· PEAR

· URLs Friendly to Search Engines

Building a Web site with PHP is not the same as building a static Web site. If you choose simply to sprinkle PHP code occasionally throughout the site, the effect may be minimal, of course. If you choose to use PHP to generate every page, you will find many opportunities for transforming patterns into functions. As I wrote in Chapter 26, elements such as opening and closing body tags can be put into a function or an included file. The consequence of this situation is that you no longer have just a Web site. You have a Web application.

When this happens, it becomes more important to draw upon formal development techniques. Certainly, structured design is useful when building static Web sites. The case is made plainly in Web Site Engineering by Thomas Powell. The addition of PHP makes careful design critical. PHP applications may not be mission-critical endeavors that include thousands of programmers, but there are some ideas from software engineering that can benefit small projects. I can't cover every topic of software engineering as it applies to Web applications in the context of a chapter. I recommend reading Powell's book as an excellent starting point. I also recommend Pete McBreen's Software Craftsmanship. His ideas frame the experience of PHP-powered development well.

After introducing the basics of software requirements and design, I will explore some specific design issues and solutions.

27.1 Writing Requirements Specifications

Before you can design a system, it is important to understand what it's supposed to do. Too often this comes in the form of a verbal request such as, "We need a home page with a guest book and a visitor counter," which is never further defined. This usually leads to the building of a prototype that is 25 percent of what the client wants. Changes are made to the prototype, and the site becomes 50 percent of what the client wants. While the changes were being made, however, the client moved the target.

The solution to this problem is to set a target and stick with it. This should start with a statement of the goals for the project. In my experience the most important question left unasked is about motivation. When a client asks for a large, animated scene to appear on his index page, often the motivation is a desire to seem familiar with leading-edge technology. Instead of blindly fulfilling the client's request, it is better to look for the best solution for the Why? A slick graphical design can say more about the client's attention to advances in technology.

Once you have asked Why? enough times, you should have a list of several goals for the project. These goals should suggest a set of requirements. If one of the system's goals is to generate more business, one requirement may be to raise visitor awareness of items in the client's catalog. This may evolve into a requirement that products appear throughout the site on a rotational basis. This could be implemented as banners or kickers strategically placed within the site. Don't, however, tie yourself down with design issues. This earliest stage of site development should concentrate solely on the goals of the system.

From a solid base of goals, you can begin to describe the system requirements. This usually takes the form of a requirements specification document, a formal description of the black-box behavior expected from the site. The goals will suggest a collection of functional requirements and constraints on the design. As I've said, having a goal of increasing sales suggests, among other things, that the site should raise customer awareness of catalog items. Another requirement could be that the site provides some free service to attract visitors. An example is a loan company offering a mortgage calculator. It is a good idea to informally explore possible solutions to requirements, but it's still important to keep design decisions out at this time.

The requirements specification is formal and structured, but it should be understandable by nonexperts in the implementation technology. The description of the system's behavior serves partially as a contract between the client and developer. Clear statements will eliminate misunderstandings that have a high cost later in development. That is not to say that the document shouldn't be precise. When possible, state requirements in measurable terms. Constraining page size to 30K is an objective standard and easily tested. Requiring the site to inspire confidence in the client company is not easily measurable, but sometimes it's all you have.
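A measurable requirement such as the 30K page limit can be checked mechanically. The following is a minimal sketch, assuming the limit applies to the HTML alone and not to images or other assets; the function name pageWithinLimit and the default of 30720 bytes are my own choices for illustration.

```php
<?php
// Hypothetical helper: checks rendered HTML against a page-weight
// constraint such as the 30K limit mentioned above.
function pageWithinLimit(string $html, int $maxBytes = 30720): bool
{
    // strlen() counts bytes, not characters, which is what matters
    // for transfer size.
    return strlen($html) <= $maxBytes;
}

// A small sample page, well under the limit.
$page = "<html><body>" . str_repeat("x", 1000) . "</body></html>";
print(pageWithinLimit($page) ? "within limit\n" : "too large\n");
```

A check like this could run as part of acceptance testing, which is what makes the requirement objective.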

Table 27.1 lists six things toward which a requirements specification should aspire. It should only specify external behavior. Every requirement should be expressed as the answer to a What? question. It should specify constraints. These are best expressed as quantities: How many hits per day? Maximum page size? Maximum page depth? The requirements specification should allow you to change it later. While you should use natural language, don't write a long narrative. Number sections of the document and use diagrams where necessary. It should be a document that helps a future programmer learn about the system. Don't be surprised if that programmer is you six months later.

The requirements should pay attention to the entire life of the system. If the system needs to be able to recover from a catastrophic failure within an hour, write it into the specification. And the follow-up to this idea is that you should describe how the system deals with adversity—not just disaster, but also illegal user input. Some systems ignore user input that is not understood. How many times have you seen a "404 Document Not Found" error? It's nice when that page includes a link to the index page of the site.

Table 27.1. Properties of Requirements Specifications
Specifies only external system behavior
Specifies constraints on the implementation
Allows easy modification
Serves as a reference tool for system maintainers
Records forethought about the lifecycle of the system
Characterizes acceptable responses to undesired events

Keeping these guidelines in mind, refer to Table 27.2, which outlines the structure of a requirements specification. The overview should be a page or less that reviews the goals of the site. If the goals were detailed in another document, make this document available. It is important to preserve the thought that went into the project at each phase. The requirements build on the goals, and in turn the design builds on the requirements. But being able to refer to the original goals of the system will be helpful to the designer and even the implementer.

Table 27.2. Requirements Specification Document Structure
Overview of system goals
Operating and development environments
External interfaces and data flow
Functional requirements
Performance requirements
Exception handling
Implementation priorities
Foreseeable modifications
Design suggestions

The operating and development environments are sometimes overlooked in requirements specifications. This includes both the browser and the Web server. If you are developing an intranet application, you may be fortunate enough to optimize for a particular browser version. I've found that while a large company may impose a standard browser for the organization for which you've developed an application, another standard may apply to the users in another organization a thousand miles away. The most popular browsers operate closer to a standard than they did in the early days of the Web, so this is less of an issue than it was.

The Web server is perhaps more under your control and certainly less finicky about differences in source code. If you are using PHP, most likely you will be using Apache. It's a good idea to use identical versions of both Apache and PHP for your development and live environments.

For the most part, your list of external interfaces will include the Internet connection between the browser and the Web server, the local file system, and possibly a database connection. I find it helpful to create a diagram that shows the relationship between data elements, the simplest of which might be a box labeled Browser connected to a box labeled Server. The line would have arrows at each end to show that information travels in both directions. This diagram is a description of the context, not a design of the data structure. Whether you will be using a database may be obvious, but which database may not be. If the system will be storing data somehow, just show data flowing into a box that could be database or flat file. The goal is to describe how data moves around in the system.

The functional requirements will certainly be the largest part of the document. If you have drawn a data flow diagram, you may have a very good idea of how the system breaks up into modules. The more you can partition the functionality into distinct pieces, the easier it will be to group the functional requirements. I've written many requirements documents for Web applications that are essentially data warehouses. My approach has been to dedicate a section to each of the major data entities. A project management application might have a collection of project descriptions, a collection of users, and a collection of comments. Each of these would have a section in the functional requirements that lists first all the information it stores and then the ways the information can be manipulated.

The performance requirements are constraints on the functionality. You may wish to outline a minimum browser configuration for use of the site. Maximum page weights are a good idea. If the client is dictating that a certain technology be used, it should be noted in this section. It's good to know in advance that while you will be allowed to use PHP, you have to deal with Oracle and Internet Information Server on Windows XP.

The exception-handling section describes how the system deals with adversity. The two parts of this are disaster and invalid input. Discuss what should happen if the Web server suddenly bursts into flame. Decide whether backups will be made hourly, daily, or weekly. Also decide how the system handles users entering garbage. For example, define whether filling out a form with a missing city asks the user to hit the back button or automatically redisplays the form with everything filled out and the missing field marked with a red asterisk.
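The second form-handling option described above, redisplaying the form with the missing field marked, might look something like the following sketch. The field names, the REQUIRED_FIELDS constant, and both function names are hypothetical; a real form would need its own list of fields and its own markup.

```php
<?php
// Hypothetical list of required fields for an address form.
const REQUIRED_FIELDS = ['name', 'street', 'city'];

// Returns the names of required fields that are absent or blank.
function findMissingFields(array $input, array $required): array
{
    $missing = [];
    foreach ($required as $field) {
        if (!isset($input[$field]) || trim((string)$input[$field]) === '') {
            $missing[] = $field;
        }
    }
    return $missing;
}

// Renders one text input, keeping the submitted value and marking
// the field with a red asterisk when it was left out.
function renderField(string $field, array $input, array $missing): string
{
    $value = htmlspecialchars((string)($input[$field] ?? ''), ENT_QUOTES);
    $marker = in_array($field, $missing, true)
        ? ' <span style="color: red">*</span>'
        : '';
    return "<label>$field$marker <input name=\"$field\" value=\"$value\"></label>\n";
}

// Simulated submission with the city left blank.
$submitted = ['name' => 'Pat', 'street' => '123 Main St', 'city' => ''];
$missing = findMissingFields($submitted, REQUIRED_FIELDS);
foreach (REQUIRED_FIELDS as $field) {
    print(renderField($field, $submitted, $missing));
}
```

Writing this behavior into the requirements, rather than leaving it to the programmer, is the point of the exception-handling section.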

If the client has a preference for the order of implementation, outline it. My experience has been that, faced with a dire deadline before the project begins, the client will bargain for which functionality will appear in the first round. Other requirements may not be critical to the system, and the client is willing to wait. If there is a preference in this area, it is very important for the designer and implementers to know in advance.

Farther in the future are the foreseeable modifications. The client may not be ready to create a million-dollar e-commerce site just yet, but may expect to ask you to plug this functionality into the site a year from now. It may not make sense to use an expensive database to implement a 50-item catalog, but building a strong foundation for later expansion will likely be worthwhile.

The last part of the requirements specification is a collection of design hints. This represents the requirements writer's forethought about pitfalls for the designer. You might summarize a similar project. You might suggest a design approach.

27.2 Writing Design Documents

Once you have created a requirements specification document, you will have to decide whether to write a design document. Often it is not necessary, especially when a few people are working on a small project. You may wish to choose key elements of a complete design document and develop them to the point of usefulness.

The first part of design is concerned with the architecture of the system. The system should be broken into sections that encompass broad groups of functionality. A Web application for project management might break down into a module that handles project information, a module that handles users, and a module that handles timesheet entries. An informational Web site can be broken down by the secondary pages—that is, the pages one click away from the home page. The "About Us" section serves to inform visitors about the company itself, while a catalog area is a resource for learning about the items the company sells.

Depending on the type of site, you should choose some sort of diagram that shows the subsystems and how they relate to each other. These are called entity relationship diagrams. I almost always create a page-flow diagram. Each node in the graph is a page as experienced by the user. Lines representing links connect the page to other pages on the site. Another useful diagram is one that shows the relationships between database tables. Nodes represent tables, and you may wish to list the fields inside boxes that stand for the tables. Lines connect tables and show how fields match. It's also helpful to indicate whether the relationship between the tables is one to one or one to many.

The next phase of design is interface specification. This defines how subsystems communicate. It can be as simple as listing the URLs for each page. If the site has forms, all the fields should be enumerated. If you are tracking user sessions, you will want to specify how you will be doing this, with cookies or form variables. Define acceptable values for the session identifier. If the site will be communicating with files or a database, this phase will define names of files or login information for databases.
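Defining acceptable values for the session identifier is easiest when the rule can be written as a test. Here is a minimal sketch, assuming the interface specification says an identifier is exactly 32 lowercase hexadecimal characters; that format and both function names are assumptions for illustration, not a recommendation for any particular project.

```php
<?php
// Assumption for this sketch: session identifiers are exactly
// 32 lowercase hexadecimal characters.
function isValidSessionId(string $id): bool
{
    // \A and \z anchor the whole string, so a trailing newline fails.
    return preg_match('/\A[0-9a-f]{32}\z/', $id) === 1;
}

// Generates an identifier in the agreed format from random bytes.
function newSessionId(): string
{
    return bin2hex(random_bytes(16));
}

$id = newSessionId();
print(isValidSessionId($id) ? "valid\n" : "invalid\n");
```

Any module that receives a session identifier from a cookie or form variable can then reject values that fall outside the specification.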

The largest part of a design document is a detailed description of how each module works. At this point it's acceptable to specify exactly the method for implementing the module. For example, you may specify that a list of catalog items be presented using the ul tag. On the other hand, if it doesn't matter, leave it out. The programmer will have the best idea for solving the problem.

I suggest pursuing a style guide, which may be part of the design document or may stand alone. This document specifies the style of the code in the project. You'll find an example in Appendix G, but don't bother flipping there now. The style guide deals with issues like how to name variables and where to place curly braces. Many of these issues are arbitrary. What's important is that a decision is made and followed. A large body of code formatted according to a standard is easier to read.

For the rest of this chapter I'd like to present some design ideas you may choose to adopt. PHP's dynamic nature allows for structural designs that can't be achieved in plain HTML. It is a shame to waste this functionality by using PHP as a faster alternative to CGI. I encourage you to consider using PHP as the engine that powers a completely dynamic Web site.

27.3 Change Management

Anyone who's worked with a team on a Web application knows the pain of dividing the tasks among team members. For small teams, it usually works to shout over cubicle walls. For larger teams, you may need a manager to coordinate the development process. However, Gantt charts don't seem to fit the shoot-from-the-hip mentality of the typical Web programmer. It feels natural to wander through the files of the project, changing them as you tackle a problem without worrying if someone else is editing them.

Sometimes changes are lost, but people cope by keeping backups. Alternatively, team members can warn each other not to touch some files for a short period. If a file is destroyed, you may hunt through archives to find an older version. Developers can guard against losing newer changes by keeping local copies of every change they make, but it feels like a big hassle.

Web sites evolve through many iterations. The team works on a project, and it integrates the changes when it finishes. There are two typical methods for putting the changes into production. The brute force method involves replacing all application files. This ensures that you don't miss any files. Alternatively, you can copy just the new files and the files that changed.

Instead of trying to control the source code through ad hoc activities, consider using a source code control system. Popular among C programmers, source code control works well with most programming languages. The PHP development team uses source code control to coordinate the hundreds of people contributing to PHP, as do many open-source projects.

The overwhelming favorite source code control system among open-source developers is CVS (concurrent versions system). CVS is an open-source project itself. At its core is the functionality of the diff and patch utilities that are part of most operating systems. You can use diff to compare two files and find the differences. The patch utility can apply the differences to a third file to bring it up to date.

CVS keeps a repository for a project that includes every incremental change to every file. Users interact with the repository by running shell commands on the server. Remote users must use a remote shell, which is rsh by default. It's wise to avoid rsh and use ssh if you can, as rsh sends passwords and traffic through the net unencrypted. Some open-source projects provide a read-only account for grabbing a current development version without allowing changes.

After checking out files from a repository, a developer may make any number of changes to files without disturbing any other developer. Under normal use, CVS does not grant exclusive use of a file to one user. These are called unreserved copies. Developers work on files concurrently, and CVS takes care of tracking changes as they are checked in. CVS distributes changes on demand to developers. It merges incoming changes into a developer's working files even when those files contain local changes that haven't been checked in.

CVS does support reserved copies, but most users find them unnecessary. In most contexts, CVS can resolve differences between files without human intervention. When conflicts do occur, CVS alerts the developer and marks conflicting code plainly.

Although I present a brief tutorial here, find Karl Fogel's book, Open Source Development with CVS <http://cvsbook.red-bean.com/>. The chapters that deal with CVS specifically are free to download, but I recommend buying the book if you decide to use CVS. Beyond the mechanics of CVS itself, it documents how CVS fits into the development process. Also, keep an eye on the Subversion project <http://subversion.tigris.org/>, which aims to build a CVS replacement.

If you're running Linux or FreeBSD, CVS may be installed already. If not, use a package manager appropriate for your system, such as RPM or apt-get. If you're using Windows, you can run CVS clients with no problem, but CVS servers don't work well. You can set up a server that allows local CVS usage with which to experiment, but you need a UNIX operating system to use CVS seriously.

The CVS Web site <http://www.cvshome.org/> has links for downloading binaries for many operating systems. You can also download source code and compile it yourself, but I won't go over those steps. The compilation follows typical steps because it uses autoconf. See the installation instructions in the source code archive.

CVS requires just one binary that's typically installed as /usr/local/bin/cvs. This is the client application, but it also makes changes on the server through a remote shell. To start using a host as a CVS server, you only need to create a repository.

All CVS functionality goes through the cvs command-line utility. The init command to cvs creates a new repository. The -d option sets the path to the repository. CVS creates this directory and places several files inside it. Figure 27.1 is a capture from my shell as I created a new repository and listed the contents.

Figure 27.1 Creating a CVS repository.

# cvs -d /home/cvshome init
# ls -R /home/cvshome
/home/cvshome:
CVSROOT

/home/cvshome/CVSROOT:
Emptydir        config,v       loginfo    rcsinfo
checkoutlist    cvswrappers    loginfo,v  rcsinfo,v
checkoutlist,v  cvswrappers,v  modules    taginfo
commitinfo      editinfo       modules,v  taginfo,v
commitinfo,v    editinfo,v     notify     verifymsg
config          history        notify,v   verifymsg,v

/home/cvshome/CVSROOT/Emptydir:

I created this directory as the root user. This doesn't allow anyone else to use the repository. I created a group named cvs in /etc/group and used chgrp to allow users in this group to use the repository.

Traditionally, CVS uses a password server process on port 2401 for connections. Installation involves adding the server to inetd's list of daemons. CVS manages a set of users and passwords separate from those in /etc/passwd with the pserver daemon. All commands through the password server execute as a single user.

Using pserver is good for public repositories, such as those for open-source projects. If you're using it for your internal team, don't bother with it. It's complicated and less secure than SSH.

CVS uses rsh by default. Set the CVS_RSH environment variable to switch it to SSH. For example, I added the lines in Figure 27.2 to my .bash_profile file.

Figure 27.2 Additions to bash profile.

#make sure cvs uses SSH
CVS_RSH=ssh
export CVS_RSH

To access the CVS server remotely, you must use special notation. CVS uses colons to separate information about the authentication method and the hostname of the server. For example, :ext:leon@192.168.123.194:/home/cvshome matches my repository.
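To make the colon-separated notation concrete, the sketch below splits a repository string of the form shown above into its parts. It is written in PHP purely for illustration and handles only the :method:user@host:/path form used in this chapter; CVS itself accepts other forms that this simplified pattern does not cover. The function name parseRepository is hypothetical.

```php
<?php
// Splits a remote repository string of the form
// :method:user@host:/path into its parts. Returns null when the
// string does not match this simplified pattern.
function parseRepository(string $root): ?array
{
    $pattern = '/\A:([a-z]+):([^@:]+)@([^:]+):(\/.*)\z/';
    if (preg_match($pattern, $root, $m) !== 1) {
        return null;
    }
    return [
        'method' => $m[1],   // authentication method, such as ext
        'user'   => $m[2],   // login name on the server
        'host'   => $m[3],   // hostname or address of the server
        'path'   => $m[4],   // path to the repository on the server
    ];
}

print_r(parseRepository(':ext:leon@192.168.123.194:/home/cvshome'));
```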

In this mode, CVS will prompt you for your password each time you execute cvs. Some people find this annoying, so they generate an authorized key. This is a function of SSH, not CVS. You can read about this on the OpenSSH site <http://www.openssh.org/>.

Use the import command to create a project inside your repository. This command creates a directory in the repository and copies all the files in your current directory recursively. For example, I started a new project in a directory called myproject. Inside the directory is a single PHP script. To create a directory in the repository, I issued the commands in Figure 27.3. Note how I used backslashes to keep the lines from wrapping.

The -d option appears again, specifying the path to the repository. The -m option applies to the import command. It sets a comment to associate with the CVS action. This comment can be as long as you need, and if you leave out the -m option, CVS will launch an editor for you. The last three arguments specify the project name, the vendor tag, and the release tag. These names are up to you. The project name will be the name used for the directory on the server, and it's how you refer to the project, so choose a short name. What you choose for the vendor tag and the release tag usually isn't important. I use the name of the company and the word start by default.

Figure 27.3 Importing a project into CVS.

/tmp/myproject> cvs \
-d :ext:leon@192.168.123.194:/home/cvshome import \
-m 'starting my project' myproject mycompany start
leon@192.168.123.194's password:
N myproject/index.php

No conflicts created by this import

/tmp/development/myproject>

CVS created a directory on the server, but it hasn't changed any of the files I imported. To work with the files in the repository, you must make a checkout.

The checkout command copies files from the server to your local machine. It also creates directories named CVS in every subdirectory of the project. These subdirectories keep track of the status of the files and where they came from. After making a checkout, you no longer need to specify the path to the repository. CVS will find it in the CVS directory.

Figure 27.4 shows how I made a checkout of my new project.

Figure 27.4 Checking out a project from CVS.

~> cvs -d :ext:leon@192.168.123.194:/home/cvshome \
checkout myproject
leon@192.168.123.194's password:
cvs server: Updating myproject
U myproject/index.php
~>

Once you check out the project, you can start editing files. Other developers can make their own checkouts. When you finish working on a file, use a commit command to integrate your changes into the project. CVS examines all the files in the current directory and in any subdirectories. It then coordinates with the server to find changes and apply them to the server's copies of the files. Figure 27.5 shows the results of making a commit.

You and other developers can commit changes as often as you wish, and the server keeps the most current version at all times. Your own files do not receive updates unless you ask for them explicitly with the update command. CVS will check all files recursively. If the server has a newer version, it applies the changes to your files. Your changes are not lost. CVS does its best to merge your changes with those committed since you last updated your files.

Figure 27.5 Checking changes into CVS.

~/myproject>cvs commit -m 'added navigation code'
leon@192.168.123.194's password:
Checking in index.php;
/home/cvshome/myproject/index.php,v  <--  index.php
new revision: 1.2; previous revision: 1.1
done
~/myproject>

Updating your files often helps keep your work coordinated with other developers and avoids conflicts. Conflicts occur when two developers disagree on a particular part of the source code. For example, consider the following sequence of events. In the beginning state, a line in the source code states $a=3. Later, another developer changes the line to $a=5 and commits the file. This sets the official version of the line. If you issue an update before changing this line, you will receive the change with no conflicts, and you can change it yourself. However, if you change the line before issuing an update, you will encounter a conflict. CVS marks the conflicting sections of code and inserts both versions in the source code. To resolve the conflict, you must edit the file and choose one version or the other.
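The sequence of events above can be sketched as a merge of a single line, given the starting version and the two edits. This is a simplified illustration written in PHP, not the algorithm CVS uses, and the marker labels mine and theirs are hypothetical; the markers are only loosely modeled on the way conflicting sections are flagged.

```php
<?php
// Merges one line given the common starting version and two edits.
function mergeLine(string $base, string $mine, string $theirs): string
{
    if ($mine === $theirs) {
        return $mine;       // both sides agree
    }
    if ($mine === $base) {
        return $theirs;     // only the other developer changed it
    }
    if ($theirs === $base) {
        return $mine;       // only you changed it
    }
    // Both changed the line differently: keep both versions so a
    // person can choose between them.
    return "<<<<<<< mine\n$mine\n=======\n$theirs\n>>>>>>> theirs";
}

// You updated before editing: the committed change arrives cleanly.
print(mergeLine('$a=3', '$a=3', '$a=5') . "\n");

// You edited before updating: the result is a conflict.
print(mergeLine('$a=3', '$a=7', '$a=5') . "\n");
```

The first call yields the other developer's line unchanged. The second yields both versions between markers, which is the situation that requires editing the file by hand.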

Regularly updating files helps avoid conflicts. It also alerts you to changes in files. As you issue an update, CVS notifies you of which files have changed since your last update. You can also configure CVS to email changes to a mailing list. If all developers subscribe to the mailing list, they can monitor activity on the project. This isn't a substitute for proper communication among team members, but it reduces the need to consult constantly with each other about who's editing which file.

When you're ready to make the project live, you have two options. If releases are infrequent, you may wish to make an export of the project and replace existing files on the production server. Use the export command to make a checkout that contains no CVS directories.

For a site that gets frequently updated, I prefer making an ordinary CVS checkout on the production server. When making a new version of the site live, you need to log in to the production server and issue an update command. This is faster and less hassle than replacing all existing files. It also avoids those errors associated with missing files or incorrect paths.

27.4 Modularization Using `include`

Despite its name, the include statement is not equivalent to C's preprocessor command of the same name. In many ways it is like a function call. Given the name of a file, PHP attempts to parse the file as if it appeared in place of the call to include. The difference from a function is that the code will be parsed only if the include statement is executed. You can take advantage of this by wrapping calls to include in if statements. In versions of PHP before 4.0.2, the require statement always read the specified file, even inside an if block that was never executed. In later versions require is evaluated at runtime just like include, and the remaining difference is that a missing file causes a fatal error with require but only a warning with include. It has been discussed several times on the PHP mailing list that require is faster than include because PHP is able to inject the specified file into the script during an early pass across the code. However, this argument applied to the older behavior, and only to files specified by a static path. If the call to require contains a variable, it can't be executed until runtime. It may still be helpful to adopt a rule of using require for files the script cannot run without, placing it outside compound statements and specifying a static path.
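The conditional behavior of include described above can be demonstrated with a throwaway module. In this sketch the module is written to a temporary file so the example is self-contained; the function name loadIfNeeded and the variable $moduleLoaded are hypothetical.

```php
<?php
// Returns true only when the module was actually included and run.
function loadIfNeeded(bool $needModule, string $modulePath): bool
{
    $moduleLoaded = false;
    if ($needModule) {
        // Executed only when the condition holds. The included code
        // runs in this function's scope and sets $moduleLoaded.
        include($modulePath);
    }
    return $moduleLoaded;
}

// Write a throwaway module so the sketch is self-contained.
$modulePath = tempnam(sys_get_temp_dir(), 'mod');
file_put_contents($modulePath, '<?php $moduleLoaded = true;');

print(loadIfNeeded(false, $modulePath) ? "loaded\n" : "skipped\n");
print(loadIfNeeded(true, $modulePath) ? "loaded\n" : "skipped\n");

// Remove the temporary module.
unlink($modulePath);
```

Note that the included code shares the scope of the line that includes it, which is why the assignment inside the module is visible to the function.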

Almost anything I write in PHP uses include extensively. The first reason is that it makes the code more readable. The other reason is that it breaks the site into modules. This allows multiple people to work on the site at once. It forces you to write code that is more easily reused within the existing site and on your next project. Most Web sites have to rely on repeating elements. Consistent navigation aids the user, but it is also a major problem when building and maintaining the site. Each page has to have a similar code block pasted into it. Making this a module and including it allows you to debug the code once, making changes quickly.

You can adopt a strategy that consists of placing functions into include modules. As each script requires a particular function, you can simply add an include. If your library of functions is small enough, you might place them all into one file. However, you likely will have pieces of code that are needed on just a handful of pages. In this case, you'll want this module to stand alone.

As your library of functions grows, you may discover some interdependencies. Imagine a module for establishing a connection to a database, plus a couple of other modules that rely on the database connection. Each of these two scripts will include the database connection module. But what happens when both are themselves included in a script? The database module is included twice. This may cause a second connection to be made to the database, and if any functions are defined, PHP will report the error of a duplicate function.

In C, programmers avoid this situation by defining constants inside the included files. In PHP you can use the include_once statement. A function named printBold is defined in Listing 27.1. This function is needed in the script shown in Listing 27.2. I've purposely placed a bug in the form of a second include. Because the script uses include_once, PHP skips the module the second time rather than redeclaring the function.

Listing 27.1 Preventing a double `include`

<?php
    //print text inside bold tags
    function printBold($text)
    {
        print("<b>$text</b>");
    }
?>

Listing 27.2 Attempting to include a module twice

<?php
    //load printBold function
    include_once("27-1.php");

    //try loading printBold function again
    include_once("27-1.php");

    printBold("Successfully avoided a second include");
?>
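The interdependency scenario described earlier, in which two modules both rely on a database connection module, can be sketched in a self-contained way. The three module files here are created in a temporary directory, and the ConnectionLog class stands in for opening a real connection; all of these names are hypothetical.

```php
<?php
// Counts how many times the stand-in database module runs.
final class ConnectionLog
{
    public static $count = 0;
}

// Creates three throwaway modules, loads the two dependent ones,
// and returns how many times the database module actually ran.
function loadDependentModules(): int
{
    $dir = sys_get_temp_dir() . '/include_once_demo_' . uniqid();
    mkdir($dir);

    // Stands in for a module that opens a database connection.
    file_put_contents("$dir/database.php", '<?php ConnectionLog::$count++;');

    // Two modules that both depend on the database module.
    $dependent = '<?php include_once(__DIR__ . "/database.php");';
    file_put_contents("$dir/users.php", $dependent);
    file_put_contents("$dir/projects.php", $dependent);

    ConnectionLog::$count = 0;
    include_once("$dir/users.php");
    include_once("$dir/projects.php");

    // Clean up the temporary files.
    array_map('unlink', glob("$dir/*.php"));
    rmdir($dir);

    return ConnectionLog::$count;
}

print("connections: " . loadDependentModules() . "\n");
```

With plain include in the two dependent modules, the database module would run twice; with include_once it runs once no matter how many modules ask for it.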

27.5 FreeEnergy

I used the technique of including modules on several Web applications, and it led me to consider all the discrete elements of a Web page. Headers and footers are obvious, and so are other repeating navigational elements. Sometimes you can divide pages up into the content unique to the page, the stuff that comes before it, and the stuff that comes after it. This could be hard to maintain, however. Some of the HTML is in one file, some in another. If nothing else, you'll need to flip between two editor windows.

Consider for a moment a Web page as an object—that is, in an object-oriented way. On the surface, a Web page is a pair of html tags containing head tags and body tags. Regardless of the design or content of the page, these tags must exist, and inside them will be placed further tags. Inside the body tags a table can be placed for controlling the layout of the page. Inside the cells of the table are either links to other pages on the site or some content unique to the page.

FreeEnergy is a system that attempts to encapsulate major pieces of each page into files to be included on demand. Before I proceed, I want to state my motivations clearly. My first concern when developing a Web site is that it be correct and of the highest quality. Second is that it may be developed and maintained in minimal time. After these needs are addressed, I consider performance. Performance is considered last because of the relatively cheap cost of faster hardware. Moore's law suggests that eighteen months from now, CPU speed and memory capacity will have doubled for the same price. This doubling costs nothing but time. Also, experience has shown that a small minority of code contributes to a majority of the time spent processing. These small sections can be optimized later, leaving the rest of the code to be written as clearly as possible.

The FreeEnergy system uses more calls to include than you would make if you simply placed a few includes at the top of your pages. Hits to the file system do take longer than function calls, of course. You could place everything you might need in one large file and include it on every page, but you will face digging through that large file when you need to change anything. A trade has been made between the performance of the application and the time it takes to develop and maintain it.

I called this system FreeEnergy because it seems to draw power from the environment that PHP provides. The include statement in PHP is unusual and central to FreeEnergy, especially the allowance for naming a script with a variable. The content unique to a page is called a screen. The screen name is passed to a single PHP script, which looks up the screen name in a large array that matches the screen to corresponding layout and navigation modules.

The FreeEnergy system breaks Web pages into five types of modules: action, layout, navigation, screen, and utility. Action modules perform some sort of write function to a database, a file, or possibly to the network. Only one action module executes during a request, and it is executed before the screen module. An action module may override the screen module named in the request. This is helpful in cases where an action module is attempting to process a form and the submitted data are incomplete or otherwise unsatisfactory. Action modules never send data directly to the screen. Instead, they add messages to a stack to be popped later by the layout module. It is possible that an action module will send header information, so it's important that an action module produce no output of its own.

Layout modules contain just enough code to arrange the output of screen and navigation modules. They typically contain table tags for controlling the layout of a Web page. Inside the table cells, calls to include are placed. They may be invoking navigation modules or screen modules.

Navigation modules contain links and repeating elements. In the vernacular used by engineers I work with, these are "top nav," "bottom nav," and "side nav." Consider the popular site, Yahoo!. Its pages generally consist of the same navigation across the top and some at the bottom. Its top nav includes the logo and links to important areas of the site. If the Yahoo! site were coded in FreeEnergy, there would probably be a dynamic navigation module for generating breadcrumbs for the current section, such as Home > Computers and Internet > Software > Internet > World Wide Web > Servers > Server Side Scripting > PHP.

Screen modules contain the content unique to the particular page being displayed. They may be plain HTML, or they may be primarily PHP code, depending on context. A press release is static. Someone unfamiliar with PHP can prepare it. That person need only know that the screen module is an HTML fragment.

Any module may rely on a utility module in much the same way utility files are used in other contexts. Some utility modules are called with each page load. Others are collections of functions or objects related to a particular database table.

All modules are collected in a modules directory that further contains a subdirectory for each module type. To enhance security, it is placed outside of the Web server's document root. Within the document root is a single PHP script index.php. This script begins the process of calling successive modules and structuring their output with the standard HTML tags.
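As a rough sketch of how such an index.php might tie the pieces together, consider the following. It is not the FreeEnergy source: the function renderScreen, the screen map, and the module file names are all hypothetical, and the modules directory is created in a temporary location only so the example can run by itself. Action and utility modules are left out to keep it short.

```php
<?php
// Hypothetical sketch of a FreeEnergy-style index.php; every name is illustrative.

// Build a throwaway modules directory so the sketch runs on its own.
// On a real site this directory would already exist outside the document root.
$modules = sys_get_temp_dir() . '/fe_modules_' . uniqid();
foreach (array('layout', 'navigation', 'screen') as $type) {
    mkdir("$modules/$type", 0700, true);
}
file_put_contents("$modules/navigation/top.php", "<p>Home | About</p>\n");
file_put_contents("$modules/screen/home.php", "<p>Welcome.</p>\n");
file_put_contents("$modules/screen/about.php", "<p>About us.</p>\n");

// The layout module only arranges the output of the other modules.
file_put_contents(
    "$modules/layout/standard.php",
    '<?php include "$modules/navigation/$navigation.php"; ?>' . "\n" .
    '<?php include "$modules/screen/$screen.php"; ?>' . "\n"
);

// The screen map: each screen names the layout and navigation modules it uses.
$screens = array(
    'home'  => array('layout' => 'standard', 'navigation' => 'top'),
    'about' => array('layout' => 'standard', 'navigation' => 'top'),
);

function renderScreen($modules, $screens, $requested)
{
    // Only names found in the map are ever used to build a path, so a
    // request cannot make the script include an arbitrary file.
    $screen = isset($screens[$requested]) ? $requested : 'home';
    $navigation = $screens[$screen]['navigation'];
    $layout = $screens[$screen]['layout'];

    ob_start();
    print("<html>\n<body>\n");
    // include with a variable: the layout pulls in navigation and screen
    include "$modules/layout/$layout.php";
    print("</body>\n</html>\n");
    return ob_get_clean();
}

$requested = isset($_REQUEST['screen']) ? $_REQUEST['screen'] : 'home';
print(renderScreen($modules, $screens, $requested));
```

Looking the requested name up in the array before building a path is the important design choice here. Passing a request variable straight to include would let a visitor choose which file the server runs.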

27.6 Templates

Another approach to modularizing PHP applications can be called templatizing: keeping HTML in template files, separate from the PHP code that fills them in. Loose coupling is a fundamental principle of good system design. Aside from avoiding confusing people who don't understand PHP, this separation offers the benefit of switching to a different presentation language, such as XML, without disturbing the business logic.

Using templates, interface designers work in prototypical files (templates) composed mostly of HTML. They insert short bits of a simple templating language, which a PHP script parses in order to replace markers with generated information.
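The core of marker replacement fits in a few lines. The following is a minimal sketch, assuming markers are uppercase names in curly braces, the form used by the systems described later in this section. The function fillTemplate is a hypothetical helper, not part of any of those libraries.

```php
<?php
// Replace {MARKER} tokens in a template string with values from an array.
// fillTemplate is a hypothetical helper, not part of any library.
function fillTemplate($template, $values)
{
    return preg_replace_callback(
        '/\{([A-Z0-9_]+)\}/',
        function ($match) use ($values) {
            // Leave unknown markers untouched so mistakes stay visible.
            return array_key_exists($match[1], $values)
                ? $values[$match[1]]
                : $match[0];
        },
        $template
    );
}

$template = "<h1>{TITLE}</h1>\n<p>{BODY}</p>\n";
print(fillTemplate($template, array(
    'TITLE' => 'Hello',
    'BODY'  => 'Generated by PHP',
)));
```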

As with most solutions, there's a tradeoff. The cost of a templating system is increased work for PHP with each page load. PHP includes an efficient parser written in C by the geniuses at Zend. Writing your own parser in PHP itself is bound to be less than optimal, or so the argument goes. Yet, a simple syntax can help keep parsing fast, and some caching tricks can avoid most of the heavy lifting.

I'm optimistic about the average person being able to learn to program in PHP. Templating pessimistically guesses that the average person won't learn PHP but can understand a simpler middle ground between PHP and HTML. I like to teach people to understand PHP, but I also understand there's usually a context for a good tool.

FastTemplate is perhaps the oldest of the templating systems. It was ported from the original Perl implementation. It uses .tpl files to hold templates. These templates contain HTML and markers inside curly braces. A PHP script loads a template, sets values for each of the markers, and parses the template to produce a final chunk of HTML ready to send to the browser.

Listing 27.3 Main template

<html>
<head><title>{TITLE}</title>
</head>
<body>
<h1>{TITLE}</h1>
<table>
<tr>
<td valign="top">{SIDENAV}</td>
<td valign="top">{MAIN}</td>
</tr>
</table>
</body>
</html>

Listing 27.3 shows a simple template. Look for the markers in curly braces. This template uses three: TITLE, SIDENAV, and MAIN. These are chunks of content generated inside the main PHP script. The first is a simple variable assignment, and the second will contain another template file. The last is a standard name used by FastTemplate to stand for the main content of any screen. Listing 27.4, Listing 27.5, and Listing 27.6 are a few other templates used in this example.

Listing 27.4 Side navigation

<a href="home.php">Home</a><br>
<a href="about.php">About Us</a><br>
<a href="contact.php">Contact Us</a><br>

Listing 27.5 Table template

<table border="1">
<tr><th>n</th> <th>n^2</th> <th>n^3</th></tr>
{ROWS}
</table>

Listing 27.6 Row template

<tr>
    <td>{NUMBER}</td> <td>{SQUARE}</td> <td>{CUBE}</td>
</tr>

The side navigation is a simple set of links to other scripts, as you might expect. The table includes three columns for a number, its square, and its cube. A separate row template (Listing 27.6) defines the rows of the table. The PHP script in Listing 27.7 parses this template once for each row of the table.

Listing 27.7 Script using templates

<?php
    //get FastTemplate class
    require_once("class.FastTemplate.php");

    //instantiate
    //use templates in current directory
    $tpl = new FastTemplate(".");

    //set list of templates used
    $tpl->define(
        array(
            "main"=>"27-3.tpl",
            "side"=>"27-4.tpl",
            "table"=>"27-5.tpl",
            "row"=>"27-6.tpl"
        )
    );

    //set the value of the TITLE variable
    $tpl->assign(array("TITLE"=>"FastTemplate Test"));

    //get side navigation
    $tpl->parse("SIDENAV", "side");

    //create rows for the table
    for($n=1; $n <= 10; $n++)
    {
        //set values
        $tpl->assign(
            array(
                "NUMBER"=>$n,
                "SQUARE"=>pow($n,2),
                "CUBE"=>pow($n,3)
            )
        );

        //parse row template and append it to ROWS
        $tpl->parse("ROWS",".row");
    }

    //parse table, main and put it in MAIN
    $tpl->parse("MAIN", array("table","main"));

    //send entire contents to the browser
    $tpl->FastPrint("MAIN");
?>

Most of the code in this example ought to be easy to follow. The argument passed when instantiating the class names the directory that holds the template files; here it is the current directory. The assign method sets one or more variables to a fixed value, and the parse method parses a template. You must define marker values before parsing a template, of course.

This example produces a table of numbers generated in a loop. Each row of the table is appended to the ROWS variable by assigning variable values and parsing the template. Note that the call to parse uses a period before the name of the template, row. This tells FastTemplate to append instead of replace.
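To make the append behavior concrete, here is a small stand-in class that imitates the assign and parse calling style. MiniTemplate and its get method are hypothetical, and the class is far simpler than FastTemplate; it only illustrates how a leading period can mean "append to the existing value" instead of "replace it."

```php
<?php
// MiniTemplate is a hypothetical stand-in that imitates the assign/parse
// calling style described above. It is not FastTemplate's implementation.
class MiniTemplate
{
    private $templates = array();
    private $values = array();

    // Register a template's text under a short name.
    public function define($name, $text)
    {
        $this->templates[$name] = $text;
    }

    // Set one or more marker values.
    public function assign($values)
    {
        foreach ($values as $marker => $value) {
            $this->values[$marker] = (string) $value;
        }
    }

    // Fill in a template and store the result under $target.
    // A leading period on the template name appends to $target's old value.
    public function parse($target, $name)
    {
        $append = (substr($name, 0, 1) === '.');
        if ($append) {
            $name = substr($name, 1);
        }

        $pairs = array();
        foreach ($this->values as $marker => $value) {
            $pairs['{' . $marker . '}'] = $value;
        }
        $result = strtr($this->templates[$name], $pairs);

        if ($append && isset($this->values[$target])) {
            $result = $this->values[$target] . $result;
        }
        $this->values[$target] = $result;
    }

    public function get($marker)
    {
        return $this->values[$marker];
    }
}

$tpl = new MiniTemplate();
$tpl->define('row', "<tr><td>{NUMBER}</td><td>{SQUARE}</td></tr>\n");
$tpl->define('table', "<table>\n{ROWS}</table>\n");

for ($n = 1; $n <= 3; $n++) {
    $tpl->assign(array('NUMBER' => $n, 'SQUARE' => $n * $n));
    $tpl->parse('ROWS', '.row');   // append one row per pass
}

$tpl->parse('MAIN', 'table');
print($tpl->get('MAIN'));
```

Without the period, each pass through the loop would overwrite ROWS and the table would contain only the last row.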

FastTemplate also uses another syntax for repeating blocks. You mark part of the HTML with HTML comments that must follow a strict form. There's no room for adding extra spacing or breaking the comment onto two lines. These are called dynamic blocks, and they are really embedded templates.

PHPLib is a large framework for building Web applications. It includes a class that uses templates very similar to those used by FastTemplate. You must download the entire package to get the template class, but it's usable by itself.

Like FastTemplate, PHPLib's template class uses curly braces for markers. It also supports repeating blocks using HTML comment syntax. Other than the differences in method names, this class works like FastTemplate.

Two other similar solutions are AvantTemplate and TemplatePower. These classes use the same approach to templating defined by FastTemplate: markers that stand for replaceable values. They also add support for including templates directly instead of using a marker.

Choosing between these templating systems is largely a matter of personal preference. You might prefer the syntax of one of them over the others. TemplatePower claims to be six times faster than FastTemplate. Naturally, if you use PHPLib, its included templating class is your best choice.

The consequence of the extra layer keeping HTML and PHP logic separate is a hit to performance. Every page load requires parsing templates and filling in values for markers. It can have a significant effect on the time it takes to assemble a page. Some data must be regenerated with each request, such as the contents of a shopping basket, but most information on a Web site is static. We can save a lot of work if we cache the parsed templates.

In computer terms, a cache is temporary, fast storage. Space in the cache is limited, and data placed there is volatile. Caches rely on the idea that a request for data now predicts another request for the same data in the near future. If an application behaves this way and the cache is sufficiently large, you will experience a performance increase by using a cache.

The cachedFastTemplate class adds caching to the original PHP FastTemplate implementation. Two new methods allow reading from and writing to text files stored in /tmp. The write_cache method stores fully parsed templates in a directory named after the Web server's host name. The is_cached method will load the contents from the directory if the template was cached previously.

The appeal of this class is that it's a drop-in replacement for the original class. You don't need to update your templates. Changes to your PHP scripts are minor, and they will continue to function even without modification. They just won't cache.
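The caching idea itself does not depend on any particular templating class. The following sketch shows one way to do it, assuming a file per page in the system's temporary directory and a fixed lifetime in seconds. The function cachedRender and the file naming scheme are hypothetical and are not taken from cachedFastTemplate.

```php
<?php
// cachedRender is a hypothetical helper. It returns the cached copy of a
// page if one exists and is younger than $lifetime seconds; otherwise it
// calls $generate, stores the result, and returns it.
function cachedRender($key, $lifetime, $generate)
{
    $file = sys_get_temp_dir() . '/pagecache_' . md5($key) . '.html';

    if (is_file($file) && (time() - filemtime($file)) < $lifetime) {
        return file_get_contents($file);
    }

    $html = $generate();

    // LOCK_EX keeps two requests from interleaving their writes.
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// Count how often the expensive work actually runs.
$calls = 0;
$build = function () use (&$calls) {
    $calls++;
    return "<p>Expensive page, build number $calls</p>\n";
};

$key = 'demo-' . uniqid();
print(cachedRender($key, 60, $build)); // generated and stored
print(cachedRender($key, 60, $build)); // served from the cache file
```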

There are a few other templating systems that use caching, but Smarty is an industrial-strength solution. First, Smarty compiles templates into native PHP. The template file edited by interface designers is parsed only once. Calls to templates cause the PHP engine to run a .php file. This eliminates the overhead of running a parser written in PHP.

Compilation of scripts occurs behind the scenes, with no commands in your script. If a page request calls for a template that hasn't been compiled yet, Smarty compiles it. If the template file changes after this compilation, Smarty will recompile the next time your script uses the template.
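The compile-on-change idea can be sketched by comparing file modification times. This is not Smarty's code. The functions compileTemplate and display are hypothetical, the marker syntax is the simple {NAME} form used earlier in this section, and the compiled file does nothing more than echo escaped values.

```php
<?php
// Hypothetical sketch of compile-on-change; this is not Smarty's code.
// Markers such as {TITLE} become PHP echo statements in a compiled file.
function compileTemplate($source, $compiled)
{
    $php = preg_replace(
        '/\{([A-Z0-9_]+)\}/',
        '<?php echo htmlspecialchars($vars[\'$1\']); ?>',
        file_get_contents($source)
    );
    file_put_contents($compiled, $php, LOCK_EX);
}

function display($source, $compileDir, $vars)
{
    $compiled = $compileDir . '/' . md5($source) . '.php';

    // Compile only if there is no compiled copy or the template is newer.
    if (!is_file($compiled) || filemtime($source) > filemtime($compiled)) {
        compileTemplate($source, $compiled);
    }

    // From here on the PHP engine runs an ordinary script.
    include $compiled;
}

// Set up a template in a scratch directory so the sketch runs on its own.
$dir = sys_get_temp_dir() . '/tplc_' . uniqid();
mkdir($dir);
$source = "$dir/page.tpl";
file_put_contents($source, "<h1>{TITLE}</h1>\n");

display($source, $dir, array('TITLE' => 'Compiled on first use'));
display($source, $dir, array('TITLE' => 'Run again without compiling'));
```

The second call finds a compiled file that is at least as new as the template, so the regular expression never runs again until the template changes.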

Additionally, Smarty includes caching functionality, increasing the performance for static pages. For those pages with static content, Smarty will process the template into a plaintext file. As with other caching implementations, you can set an expiration time, after which the file will be regenerated.

Smarty's templating system includes more than just marker replacement. It also includes sophisticated control flow, such as if-else statements. This allows interface designers to make simple logical decisions without bothering programmers. The system also includes loops and a function for including other templates in place.

Templating systems are clearly a satisfying solution for some people; otherwise, they wouldn't be so popular. FastTemplate is simple, and I'm sure anyone comfortable with HTML can handle working around the markers. The complex solutions, such as Smarty, may be nearly as intimidating as PHP itself. This is not to suggest that Smarty has no value. Its approach certainly will be attractive to many programmers, and careful communication with novices can help keep them away from the more complex syntax.

Most of these templating systems use {name} as a marker for some value to be placed later by a PHP script. It's only slightly more complicated to write <?=$name?> or <?php echo $name ?>. The biggest disadvantage to using PHP tags is that they don't show up when a designer previews the unprocessed file in a browser, which treats them as unrecognized tags.
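For comparison, here is a minimal sketch of using an ordinary PHP file as the template, with the short echo tag standing in for a marker. The helper renderPhpTemplate and the temporary file are hypothetical; the point is only that output buffering lets a script capture what an included file prints.

```php
<?php
// renderPhpTemplate is a hypothetical helper that treats an ordinary PHP
// file as the template and captures its output in a string.
function renderPhpTemplate($file, $vars)
{
    // Turn array keys into local variables; EXTR_SKIP protects $file.
    extract($vars, EXTR_SKIP);

    ob_start();
    include $file;
    return ob_get_clean();
}

// Write a one-line template to a scratch file so the sketch runs on its own.
$file = sys_get_temp_dir() . '/greeting_' . uniqid() . '.php';
file_put_contents($file, '<h1><?= htmlspecialchars($title) ?></h1>' . "\n");

print(renderPhpTemplate($file, array('title' => 'Plain PHP as a template')));
```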

27.7 Application Frameworks

Taking application development to the next logical level, application frameworks attempt to organize reusable components into a ready platform for application development. The bargain made with these tools is trading some flexibility and performance for a large library of ready-made components. This can lead to rapid development.

BinaryCloud <http://www.binarycloud.com/> is a complete application-hosting environment written in PHP, meant for building enterprise-level applications. Alex Black and his company, Turing Studio, lead the maintenance of BinaryCloud. BinaryCloud compiles its own source files into PHP scripts. It uses the Smarty template engine discussed earlier in the chapter. The source code is freely available under a GNU license.

Another approach to Web site design with PHP is the Midgard project <http://www.midgard-project.org/>. The maintainers are Jukka Zitting and Henri Bergius. Rather than code a solution in PHP alone, they have pursued integrating PHP into their own application server. Midgard is capable of organizing more than 800,000 pages of content using a Web-based interface. For this reason it is well suited to operating magazine sites.

Midgard is an open-source project, of course. You can download an official release or grab a snapshot through CVS. Binary downloads are available as well.

Ariadne is a Web application framework from Muze, a development agency in the Netherlands. It's available under the GNU General Public License. Auke van Slooten leads the project. The source code can be downloaded from the Muze site <http://www.muze.nl/software/ariadne/>.

Ariadne stores PHP source code as objects in a MySQL database. These objects interact with each other using a virtual file system. A rich user interface is presented to the user through Web pages, but advanced users may dig deeper, as well. Another major component controls access rights for users or groups.

Horde <http://www.horde.org/> is the application framework used for IMP, a popular email client written in PHP. Chuck Hagenbuch started the Horde Project. Currently, Eric Rostetter maintains the project, which is available under a GNU license. The framework evolved from the backend of the original IMP application, and its heritage shows in its ability to build quality Web applications for communicating with Internet servers.

27.8 PEAR

PEAR <http://pear.php.net/> is the PHP Extension and Application Repository. It's part of the PHP project, and you get a copy of the core PEAR library when you install PHP. In some ways, PEAR is a parallel to Perl's CPAN. It collects many general-purpose PHP scripts into a cohesive library. You can fetch components as you need them using part of PEAR itself. Stig Bakken, a longtime PHP contributor, leads the PEAR project.

A core set of PEAR classes comes along with PHP. Although some packages have a narrow purpose, PEAR as a whole is general purpose. Downloading a PEAR package is easy. The PHP distribution includes a shell script named pear. Running pear without any arguments lists available commands. To get a list of packages available for installation, run pear remote-list. To install a package, execute something like pear install XML_Tree. The script downloads and installs the package.

Using a PEAR class is easy too. PHP keeps the downloaded PEAR classes in /usr/local/lib/php by default. This path should be in your include path, which means you can include a PEAR class simply by naming it. For example, require_once('XML/Tree.php') gets the XML Tree class. Listing 27.8 demonstrates the use of XML_Tree, which allows the creation of an XML document without having the DOMXML extension available.

Listing 27.8 Using a PEAR class

<?php
    //load XML_Tree
    require_once('XML/Tree.php');

    //create a document
    $tree = new XML_Tree;
    $root =& $tree->addRoot('catalog');

    $section =& $root->addChild('section');
    $section->addChild('A');
    $section->addChild('B');
    $section->addChild('C');

    $section =& $root->addChild('section');
    $section->addChild('X');
    $section->addChild('Y');
    $section->addChild('Z');

    //dump XML document
    header('Content-Type: text/xml');
    $tree->dump();

27.9 URLs Friendly to Search Engines

Search engines such as Google <http://www.google.com/> and All the Web <http://www.alltheweb.com/> attempt to explore the entire Web. They have become an essential resource for Internet users, and anyone who maintains a public site benefits from being listed. Search engines use robots, or spiders, to explore pages in a Web site, and they index PHP scripts the same way they index HTML files. When links appear in a page, they are followed. Consequently, the entire site becomes searchable.

Unfortunately, many robots do not follow links that appear to contain form variables. Links containing question marks may lead a robot into an endless loop, so they are programmed to avoid them. This presents a problem for sites that use form variables in links. Passing form variables in anchor tags is a natural way for PHP to communicate, but it can keep your pages out of the search engines. To overcome this problem, data must be passed in a format that resembles ordinary URLs.

First, consider how a Web server accepts a URI and matches it to a file. The URI is a virtual path, the part of the URL that comes after the hostname. It begins with a slash and may be followed by a directory, another slash, and so forth. One by one, the Web server matches directories in the URI to directories in the file system. A script is executed when it matches part of the URI, even when more path information follows. Ordinarily, this extra path information is thrown away, but you can capture it.

Look at Listing 27.9. This script works with Apache compiled for UNIX but may not work with other Web servers. It relies on the PATH_INFO environment variable, which may not be present in a different context. Each Web server creates a unique set of environment variables, although there is overlap.

Listing 27.9 Using path info

<?php
    if(isset($_SERVER['PATH_INFO']))
    {
        //remove .html from the end
        $path = preg_replace('/\.html$/', "", $_SERVER['PATH_INFO']);

        //remove leading slash
        $path = substr($path, 1);

        //iterate over parts, taking them as name/value pairs
        $pathVar = array();
        $v = explode("/", $path);
        $c = count($v);
        for($i=0; $i+1 < $c; $i += 2)
        {
            $pathVar[($v[$i])] = $v[$i+1];
        }

        if(isset($pathVar['message']))
        {
            //escape the value because it comes from the URL
            print("You are viewing message " .
                htmlspecialchars($pathVar['message']) . "<br>\n");
        }
    }

    //pick a random ID
    $nextID = rand(1, 1000);

    print("<a href=\"{$_SERVER['SCRIPT_NAME']}/message/$nextID.html\">" .
        "View Message $nextID</a><br>\n");

You may be accessing the code in Listing 27.9 from the URL http://localhost/corephp/27-9.php/message/1234.html. In this case, you are connecting to a local server that contains a directory named corephp in its document root. A default installation of Apache might place this in /usr/local/apache/htdocs. The name of the script is 27-9.php, and everything after the script name is then placed in the PATH_INFO variable. No file named 1234.html exists, but to the Web browser it appears to be an ordinary HTML document. It appears that way to a spider as well.

The code in Listing 27.9 doesn't really do much. It splits the path info into pairs used for variable name and value. The script pretends message is an identifier. It could be referencing a record in a relational database. I've added some code to use a random number to create a link to another imaginary record. Remember the BBS from Chapter 23? This method could be applied, and each message would appear to be a single HTML file.
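The splitting step is easier to reuse and test when it lives in a function of its own. The following sketch assumes the same convention of alternating names and values; parsePathInfo is a hypothetical helper, not part of PHP, and it ignores a trailing name that has no value.

```php
<?php
// parsePathInfo is a hypothetical helper, not part of PHP. It turns
// "/name/value/name/value.html" into an array of name => value pairs.
function parsePathInfo($pathInfo)
{
    // Strip a trailing .html and the surrounding slashes.
    $path = trim(preg_replace('/\.html$/', '', $pathInfo), '/');
    if ($path === '') {
        return array();
    }

    $parts = explode('/', $path);
    $vars = array();

    // Step through the parts two at a time; a name without a value is ignored.
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        $vars[$parts[$i]] = $parts[$i + 1];
    }
    return $vars;
}

print_r(parsePathInfo('/message/1234.html'));
print_r(parsePathInfo('/board/7/message/1234.html'));
```

The values arrive as strings taken straight from the URL, so a script that uses one to look up a database record should validate it first, for example by checking that it contains only digits.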

I've introduced only the essential principles of this method. There are a few pitfalls, and there are a few enhancements to be pursued. Keep in mind that Web browsers do their best to fill in relative URLs, and using path information this way may foil their attempts to request images that appear in your scripts. Therefore, you must use absolute paths. You might also wish to name your PHP script so that it doesn't contain an extension. This is possible with Apache by setting the default document type, using the DefaultType configuration directive. You can also use Apache's mod_rewrite. I encourage you to read about these parts of Apache at its home site <http://www.apache.org/docs/>.

By PHP with No comments

Way2discuss - Learn about PHP Opensource

Monday, April 11, 2011