Server Side Includes are directives which you can place into your HTML documents to execute other programs or output such data as environment variables and file statistics. Unfortunately, not all servers support these directives; the CERN server cannot handle SSI, but the servers from NCSA and Netscape can. However, there is a CGI program called fakessi.pl that you can use to emulate Server Side Includes if your server does not support them.
While Server Side Includes technically are not really CGI, they can become an important tool for incorporating CGI-like information, as well as output from CGI programs, into documents on the Web.
How do Server Side Includes work? When the client requests a document from the SSI-enabled server, the server parses the specified document and returns the evaluated document (see Figure 5-1). The server does not automatically parse all files looking for SSI directives, but only ones that are configured as such. We will look at how to configure documents in the next section.
Figure 5-1:
SSI sounds like a great feature, but it does have its disadvantages. First, it can be quite costly for a server to continually parse documents before sending them to the client. And second, enabling SSI creates a security risk. Novice users could possibly embed directives to execute system commands that output confidential information. Despite these shortcomings, SSI can be a very powerful tool if used cautiously.
Table 5-1 lists all the SSI directives. In this chapter, I'll discuss each of these directives in detail.
Command    Parameter   Description
echo       var         Inserts value of special SSI variables as well as other environment variables
include    --          Inserts text of document into current file
--         file        Pathname relative to current directory
--         virtual     Virtual path to a document on the server
fsize      file        Inserts the size of a specified file
flastmod   file        Inserts the last modification date and time for a specified file
exec       --          Executes external programs and inserts output in current document
--         cmd         Any application on the host
--         cgi         CGI program
config     --          Modifies various aspects of SSI
--         errmsg      Default error message
--         sizefmt     Format for size of the file
--         timefmt     Format for dates
The first thing you need to set is the extension(s) for the files that the server should parse in the server configuration file (srm.conf). For example, the following line will force the server to parse all files that end in .shtml:
AddType text/x-server-parsed-html .shtml
Internally, the server uses the text/x-server-parsed-html MIME content type to identify parsed documents. An important thing to note here is that you cannot have SSI directives within your CGI program, because the server does not parse the output generated by the program.
Alternatively, you can set the configuration so that the server parses all HTML documents:
AddType text/x-server-parsed-html .html
However, this is not a good idea! It will severely degrade system performance because the server has to parse all the HTML documents that it returns.
Now let's look at the two configuration options that you must set in the access configuration file (access.conf) that dictate what type of SSI directives you can place in your HTML document:
Options Includes ExecCGI
To exclusively enable Includes without Exec, you need to add the following:
Options IncludesNoExec
Before enabling either of these features, you should think about system security and performance.
For example, on the CERN server, all you need to do is:
You can get fakessi.pl from http://sw.cse.bris.ac.uk/WebTools/fakessi.html.
<HTML>
<HEAD><TITLE>Welcome!</TITLE></HEAD>
<BODY>
<H1>Welcome to my server at <!--#echo var="SERVER_NAME"-->...</H1>
<HR>
Dear user from <!--#echo var="REMOTE_HOST"-->,
<P>
There are many links to various CGI documents throughout the Web, so feel free to explore.
.
.
.
<HR>
<ADDRESS>Shishir Gundavaram (<!--#echo var="DATE_LOCAL"-->)</ADDRESS>
</BODY></HTML>
SSI directives have the following format:
<!--#command parameter="argument"-->
In this example, the echo SSI command with the var parameter is used to display the IP name or address of the serving machine, the remote host name, and the local time. Of course, we could have written a CGI program to perform the same function, but this approach is much quicker and easier, as you can see.
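To make the substitution concrete, here is a toy sketch in Perl of what expanding an echo directive amounts to. It is not how the NCSA server or fakessi.pl is implemented; the subroutine name expand_echo, the "(none)" placeholder for unset variables, and the host name www.example.com are all assumptions made for illustration.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Replace every <!--#echo var="NAME"--> directive in $text with the value
# of the environment variable NAME. The "(none)" placeholder for unset
# variables is an arbitrary choice made for this sketch.
sub expand_echo {
    my ($text) = @_;
    $text =~ s{<!--#echo\s+var="([^"]+)"\s*-->}
              {defined $ENV{$1} ? $ENV{$1} : "(none)"}ge;
    return $text;
}

# Hypothetical value, set here only so that the sketch prints something.
$ENV{SERVER_NAME} = "www.example.com";

print expand_echo(qq{<H1>Welcome to my server at <!--#echo var="SERVER_NAME"-->...</H1>\n});
```

Note that the pattern insists on the "#" immediately after the "<!--"; anything else is left alone as an ordinary HTML comment.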
All environment variables that are available to CGI programs are also available to SSI directives. There are also a few variables that are exclusively available for use in SSI directives, such as DATE_LOCAL, which contains the current local time. Another is DATE_GMT:
The current GMT time is: <!--#echo var="DATE_GMT"-->
which contains the Greenwich Mean Time.
Here is another example that uses some of these exclusive SSI environment variables to output information about the current document:
<H2>File Summary</H2>
<HR>
The document you are viewing is titled: <!--#echo var="DOCUMENT_NAME"-->,
and you can access it at a later time by opening the URL to:
<!--#echo var="DOCUMENT_URI"-->. Please add this to your bookmark list.
<HR>
Document last modified on <!--#echo var="LAST_MODIFIED"-->.
This will display the name, URL (although the variable is titled DOCUMENT_URI), and modification time for the current HTML document.
For a listing of CGI environment variables, see Table 2-1. Table 5-2 shows additional SSI environment variables.
Environment Variable      Description
DOCUMENT_NAME             The current file
DOCUMENT_URI              Virtual path to the file
QUERY_STRING_UNESCAPED    Undecoded query string with all shell metacharacters escaped with "\"
DATE_LOCAL                Current date and time in the local time zone
DATE_GMT                  Current date and time in GMT
LAST_MODIFIED             Last modification date and time for current file
<HR>
<ADDRESS>
<PRE>
Shishir Gundavaram WWW Software, Inc.
White Street 90 Sherman Street
Boston, Massachusetts 02115 Cambridge, Massachusetts 02140
shishir@bu.edu
The address information was last modified Friday, 22-Dec-95 12:43:00 EST.
</PRE>
</ADDRESS>
You can include the contents of this file in any other HTML document with the following command:
<!--#include file="address.html"-->
This will include address.html located in the current directory into another document. You can also use the virtual parameter with the include command to insert a file from a directory relative to the server root:
<!--#include virtual="/public/address.html"-->
For our final example, let's include a boilerplate file that contains embedded SSI directives. Here is the address file (address.shtml) with an embedded echo command (note the .shtml extension):
<HR>
<ADDRESS>
<PRE>
Shishir Gundavaram WWW Software, Inc.
White Street 90 Sherman Street
Boston, Massachusetts 02115 Cambridge, Massachusetts 02140
shishir@bu.edu
The address information was last modified on <!--#echo var="LAST_MODIFIED"-->.
</PRE>
</ADDRESS>
When you include this address file into an HTML document, it will contain your signature along with the date the file was last modified.
Here is the latest reference guide on CGI. You can download it by clicking
<A HREF="/cgi-refguide.ps">here</A>. The size of the file is
<!--#fsize file="/cgi-refguide.ps"--> bytes and was last modified
on <!--#flastmod file="/cgi-refguide.ps"-->.
The fsize command, along with its lone parameter, file, displays the size of the specified file (relative to the document root) in bytes. You can use the flastmod command to insert the modification date for a certain file. The difference between the SSI variable LAST_MODIFIED and this command is that flastmod allows you to choose any file, while LAST_MODIFIED displays the information for the current file. You have the option of tailoring the output from these commands with the config command. We will look at this later in the chapter.
Welcome <!--#echo var="REMOTE_USER"-->. Here is some information about you:
<PRE>
<!--#exec cmd="/usr/ucb/finger $REMOTE_USER@$REMOTE_HOST"-->
</PRE>
In this example, we use the UNIX finger command to retrieve some information about the user. SSI allows us to pass command-line arguments to the external programs. If you plan to use environment variables as part of an argument, you have to precede them with a dollar sign. The reason for this is that the server spawns a shell to execute the command, and that's how you would access the environment variables if you were programming in a shell. Here is what the output will look like, assuming REMOTE_USER and REMOTE_HOST are "shishir" and "bu.edu", respectively:
Welcome shishir. Here is some information about you:
<PRE>
[bu.edu]
Trying 128.197.154.10...
Login name: shishir In real life: Shishir Gundavaram
Directory: /usr3/shishir Shell: /usr/local/bin/tcsh
Last login Thu Jun 23 08:18 on ttyq1 from nmrc.bu.edu:0.
New mail received Fri Dec 22 01:51:00 1995;
unread since Thu Dec 21 17:38:02 1995
Plan:
Common, aren't you done with the book yet?
</PRE>
You should enclose the output from an external command in a <PRE>..</PRE> block, so that whitespace is preserved. Also, if there is any HTML code within the data output by the external program, the browser will interpret it!
(To use the exec directive, remember that you need to enable Exec in the Options line of the access.conf file, as described in the "Configuration" section earlier in this chapter.)
Having the ability to execute external programs makes things easier, but it also poses a major security risk. Say you have a "guestbook" (a CGI application that allows visitors to leave messages for everyone to see) on a server that has SSI enabled. Most such guestbooks around the Net actually allow visitors to enter HTML code as part of their comments. Now, what happens if a malicious visitor decides to do some damage by entering the following:
<!--#exec cmd="/bin/rm -fr /"-->
If the guestbook CGI program was designed carefully, to strip SSI commands from the input, then there is no problem. But, if it was not, there exists the potential for a major headache!
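As a rough illustration of what "stripping SSI commands" could look like, here is a minimal Perl sketch. The subroutine name defuse_ssi is hypothetical, and the approach shown (delete complete directives, then neutralize any leftover opening sequence) is only one possibility; a stricter guestbook might simply escape every "<" in the visitor's input.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return a copy of $comment in which no SSI directive can survive.
sub defuse_ssi {
    my ($comment) = @_;

    # Remove anything that looks like a complete SSI directive.
    $comment =~ s/<!--#.*?-->//gs;

    # Neutralize any opening sequence that is left over, including one
    # that was pieced together when the first pass deleted text.
    $comment =~ s/<!--#/&lt;!--#/g;

    return $comment;
}

print defuse_ssi(qq{Nice site! <!--#exec cmd="/bin/ls"--> Bye.}), "\n";
```

The second substitution matters: without it, input such as `<!-<!--#x-->-#exec ...-->` would collapse into a working directive once the inner piece is deleted.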
This page has been accessed <!--#exec cgi="/cgi-bin/counter.pl"--> times.
The idea behind an access counter is simple. A data file on the server contains a count of the number of visitors that have accessed a particular document. Whenever a user visits the document, the SSI command in that document calls a CGI program that reads the numerical value stored in the file, increments it, and writes the new information back to the file and outputs it. Let's look at the program:
#!/usr/local/bin/perl

print "Content-type: text/plain", "\n\n";

$count_file = "/usr/local/bin/httpd_1.4.2/count.txt";

if (open (FILE, "<" . $count_file)) {
    $no_accesses = <FILE>;
    close (FILE);

    if (open (FILE, ">" . $count_file)) {
        $no_accesses++;
        print FILE $no_accesses;
        close (FILE);
        print $no_accesses;
    } else {
        print "[ Can't write to the data file! Counter not incremented! ]", "\n";
    }
} else {
    print "[ Sorry! Can't read from the counter data file ]", "\n";
}

exit (0);
Since we are opening the data file from this program, we need the full path to the file. We can then proceed to try to read from the file. If the file cannot be opened, an error message is returned. Otherwise, we read one line from the file using the <FILE> notation, and store it in the variable $no_accesses. Then, the file is closed. This is very important because you cannot write to the file that was opened for reading.
Once that's done, the file is opened again, but this time in write mode, which creates a new file with no data. If that's not successful, probably due to permission problems, an error message stating that information cannot be written to the file is output. If there are no problems, we increment the value stored in $no_accesses. This new value is written to the file and printed to standard output.
Notice how this program, like other CGI programs we've covered up to this point, also outputs a Content-type HTTP header. In this case, a text/plain MIME content type is output by the program.
An important thing to note is that a CGI program called by an SSI directive cannot output anything other than text because this data is embedded within an HTML or plain document that invoked the directive. As a result, it doesn't matter whether you output a content type of text/plain or text/html, as the browser will interpret the data within the scope of the calling document. Needless to say, your CGI program cannot output graphic images or other binary data.
This CGI program is not as sophisticated as it should be. First, if the file does not exist, you will get an error if you open it in read mode. So, you must put some initial value in the file manually, and set permissions on the file so that the CGI program can write to it:
% echo "0" > /usr/local/bin/httpd_1.4.2/count.txt
% chmod 777 /usr/local/bin/httpd_1.4.2/count.txt
These shell commands write an initial value of "0" to the count.txt file, and set the permissions so that all processes can read, write, and execute the file. Remember, the HTTP server is usually run by a process with minimal privileges (e.g., "nobody" or "www"), so the permissions on the data file have to be set so that this process can read and write to it.
The other major problem with this CGI program is that it does not lock and unlock the counter data file. This is extremely important when you are dealing with concurrent users accessing your document at the same time. A good CGI program must try to lock a data file when in use, and unlock it after it is done with processing. A more advanced CGI program that outputs a graphic counter is presented in Chapter 6, Hypermedia Documents.
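Here is one way the locking might be sketched, assuming a Perl 5 interpreter and a system where flock is supported. The subroutine name increment_counter is hypothetical, and the data file path is taken from the command line purely for illustration. The file is opened once for both reading and writing, so the lock is held across the whole read-increment-write cycle.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Increment the counter stored in $count_file while holding an exclusive
# lock. Returns the new value, or undef if the file cannot be used.
sub increment_counter {
    my ($count_file) = @_;

    # "+<" opens for reading and writing without truncating the file.
    open(my $fh, "+<", $count_file) or return;
    flock($fh, LOCK_EX) or return;      # waits until no one else holds it

    my $line = <$fh>;
    my $no_accesses = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $no_accesses++;

    # Rewind and empty the file before writing the new value.
    seek($fh, 0, SEEK_SET);
    truncate($fh, 0);
    print $fh $no_accesses;

    close($fh);                         # flushes the data, releases the lock
    return $no_accesses;
}

# Illustration only: the data file is named on the command line.
if (@ARGV) {
    my $count = increment_counter($ARGV[0]);
    print defined $count ? $count : "[ Can't update the counter data file ]", "\n";
}
```

As with the original program, the data file must already exist and be writable by the server process.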
Why do you want to do this? Well, for kicks. Also, the sites may actually be mirrors of each other, so it doesn't matter which one you refer people to. By changing the link each time, you're helping to spread out the traffic generated from your site.
Place the following line in your HTML document:
<!--#exec cgi="/cgi-bin/random.pl"-->
Here's the program:
#!/usr/local/bin/perl

@URL = ("http://www.ora.com",
        "http://www.digital.com",
        "http://www.ibm.com",
        "http://www.radius.com");

srand (time | $$);
The @URL array (or table) contains a list of the sites that the program will choose from. The srand function seeds the random number generator with a value based on the current time and the process identification number, so that the sequence of random numbers differs from one invocation to the next.
$number_of_URL = $#URL;
$random = int (rand ($number_of_URL + 1));
The $number_of_URL contains the index (or position) of the last URL in the array. In Perl, arrays are zero-based, meaning that the first element has an index of zero. The rand function returns a number greater than or equal to 0 and less than its argument, so we pass it one more than the index of the last URL and truncate the result with int. In this case, the variable $random will contain a random integer from 0 to 3.
$random_URL = $URL[$random];

print "Content-type: text/html", "\n\n";
print qq|<A HREF="$random_URL">Click here for a random Web site!</A>|, "\n";

exit (0);
A random URL is retrieved from the array and displayed as a hypertext link. Users can simply click on the link to travel to a random location.
Before we finish, let's look at one final example: a CGI program that calculates the number of days until a certain event.
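You cannot pass query information to a CGI program as part of an SSI directive. For example, if you use the following directive:
<!--#exec cgi="/cgi-bin/count_days.pl?4/1/96"-->
the server will return an error.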
However, we can create a regular Perl program (not a CGI program) that takes a date as an argument, and calculates the number of days until/since that date:
<!--#exec cmd="/usr/local/bin/httpd_1.4.2/count_days.pl 4/1/96"-->
In the Perl script, we can access this command-line data (i.e., "4/1/96") through the @ARGV array. Now, the script:
#!/usr/local/bin/perl

require "timelocal.pl";
require "bigint.pl";
The require command makes the functions within these two default Perl libraries available to our program.
($chosen_date = $ARGV[0]) =~ s/\s*//g;
The variable $chosen_date contains the date passed to this program, minus any whitespace that may have been inserted accidentally.
if ($chosen_date =~ m|^(\d+)/(\d+)/(\d+)$|) {
    ($month, $day, $year) = ($1, $2, $3);
This is another example of a regular expression, or regexp. We use the regexp to make sure that the date passed to the program is in a valid format (i.e., mm/dd/yyyy). If it is valid, then $month, $day, and $year will contain the separated month, day, and year from the initial date.
    $month -= 1;

    if ($year > 1900) {
        $year -= 1900;
    }

    $chosen_secs = &timelocal (undef, undef, undef, $day, $month, $year);
We will use the timelocal subroutine (notice the & in front) to convert the specified date to the number of seconds since 1970. This subroutine expects month numbers to be in the range of 0-11 and years to be from 00-99. This conversion makes it easy for us to subtract dates. An important thing to remember is that this program will not calculate dates correctly if you pass in a date before 1970.
    $seconds_in_day = 60 * 60 * 24;
    $difference = &bsub ($chosen_secs, time);
    $no_days = &bdiv ($difference, $seconds_in_day);
    $no_days =~ s/^(\+|-)//;
The bsub subroutine subtracts the current time (in seconds since 1970) from the specified time. We used this subroutine because we are dealing with very large numbers, and a regular subtraction will give incorrect results. Then, we call the bdiv subroutine to calculate the number of days until/since the specified date by dividing the previously calculated difference by the number of seconds in a day. The bdiv subroutine prefixes the values with either a "+" or a "-" to indicate positive or negative values, respectively, so we remove the extra character.
    print $no_days;
    exit(0);
Once we're done with the calculations, we output the calculated value and exit.
} else {
    print " [Error in date format] ";
    exit(1);
}
If the date is not in a valid format, an error message is returned.
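For comparison, here is a sketch of the same calculation for readers running Perl 5, where the Time::Local module takes the place of timelocal.pl. This is a swapped-in technique, not the script above: it assumes that ordinary arithmetic on epoch seconds is exact on your Perl build (so bigint.pl is not needed), it works in GMT with timegm to sidestep daylight saving shifts, and the subroutine name days_between is hypothetical. Two-digit years are taken to mean 19xx, as in the original.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Return the whole number of days between midnight GMT on the date
# "mm/dd/yy" (or "mm/dd/yyyy") and the moment $now, given in seconds
# since 1970. Returns undef if the date is malformed or out of range.
sub days_between {
    my ($chosen_date, $now) = @_;

    $chosen_date =~ s/\s+//g;
    return unless $chosen_date =~ m|^(\d+)/(\d+)/(\d+)$|;
    my ($month, $day, $year) = ($1, $2, $3);
    $year += 1900 if $year < 100;       # assumption: "96" means 1996

    # timegm expects months in the range 0-11 and dies on bad values.
    my $chosen_secs = eval { timegm(0, 0, 0, $day, $month - 1, $year) };
    return unless defined $chosen_secs;

    return int(abs($chosen_secs - $now) / (60 * 60 * 24));
}

# A fixed reference moment (1 January 1996) keeps the output repeatable;
# a real script would pass the value of time instead.
my $reference = timegm(0, 0, 0, 1, 0, 1996);
print days_between("4/1/96", $reference), "\n";    # prints 91
```

Called from an SSI directive, such a script would print days_between($ARGV[0], time) and nothing else.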
[an error occurred while processing this directive]
By using the config command, you can modify the default error message. If you want to set the message to "Error, contact shishir@bu.edu" you can use the following:
<!--#config errmsg="Error, contact shishir@bu.edu"-->
You can also set the file size format that the server uses when displaying information with the fsize command. For example, this command:
<!--#config sizefmt="abbrev"-->
will force the server to display the file size rounded to the nearest kilobyte (K). You can use the argument "bytes" to set the display as a byte count:
<!--#config sizefmt="bytes"-->
Here is how you can change the time format:
<!--#config timefmt="%D %r"-->
The file address.html was last modified on: <!--#flastmod file="address.html"-->.
The output will look like this:
The file address.html was last modified on: 12/23/95 07:17:39 PM
The %D format specifies that the date should be in mm/dd/yy format, while the %r format specifies "hh:mm:ss AM|PM" format. Table 5-3 lists all the date and time formats you can use.
Table 5-3 -- SSI Time Formats
Format   Value                              Example
%a       Day of the week abbreviation       Sun
%A       Day of the week                    Sunday
%b       Month name abbreviation (see %h)   Jan
%B       Month name                         January
%d       Date                               01 (and not 1)
%D       Date as "%m/%d/%y"                 06/23/95
%e       Date                               1 (and not 01)
%H       24-hour clock hour                 13
%I       12-hour clock hour                 01
%j       Decimal day of the year            360
%m       Month number                       11
%M       Minutes                            08
%p       am | pm                            a.m.
%r       Time as "%I:%M:%S AM | PM"         07:17:39 PM
%S       Seconds                            09
%T       24-hour time as "%H:%M:%S"         16:55:15
%U       Week of the year (also %W)         49
%w       Day of the week number             5
%y       Year of the century                95
%Y       Year                               1995
%Z       Time zone                          EST
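These format codes follow the C library's strftime function, so you can try a format string from Perl before putting it into a config directive. The sketch below assumes a Perl 5 interpreter with the POSIX module; the date is the one from the flastmod example in the text, and the locale is set to "C" so that %p prints AM or PM regardless of the environment.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

# Use the plain "C" locale so the output does not depend on the environment.
setlocale(LC_TIME, "C");

# 7:17:39 PM on 23 December 1995, as
# (seconds, minutes, hour, day of month, month 0-11, year minus 1900).
my @when = (39, 17, 19, 23, 11, 95);

print strftime("%D %r", @when), "\n";    # prints 12/23/95 07:17:39 PM
```

Substituting another format string from the table for "%D %r" shows what the server would display for the same moment.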
First, do not omit the "#" character before the command:
<!--echo var="REMOTE_USER"-->
Second, do not add extra spaces between the "-" sign and the "#" character:
<!-- #echo var="REMOTE_USER"-->
If you make either of these two mistakes, the server will not give you an error; rather it will treat the whole expression as an HTML comment.
Copyright 1996, O'Reilly & Associates. All rights reserved.