Exercise 8: Site Search Engine CGI Script
Resources for this exercise:
- See the Course Schedule for due date. Exercises are due before the beginning of class on the due date.
- See the Homework Frequently Asked Questions (FAQ) page for answers to questions about homework exercises.
- See the CGI Basics tutorial.
- See the CGI section of the Resources page for general information on CGI scripts.
Windows users: In step 2.1 you will download a "tar" file (files bundled into a tar archive, not a zip archive). If you are doing this exercise in the Cabrillo lab, the Dell (Vista) machines will not automatically expand this file into a folder. You will need to do this exercise on one of the OmniPro (XP) machines in the lab, as these machines have the free program called "7-zip" that will expand a tar file.
In this exercise you will download, configure, install, and use a search engine CGI script. This script will allow others to search the pages of your Web site. Additionally, you will install a simple script to show you environment variables that are helpful in configuring CGI scripts such as search engines.
Step 1 - Install a script to show environment variables
1.1) Go to Clueless Lou's Web site on the basics of running CGI scripts and download a free, simple script called "envhtml.pl". It can be found at:
http://www.visca.com/clueless/basics.html#env
1.2) If all goes correctly, the zipped file will automatically expand into a file called "envhtml.cgi" after the file downloads. If it doesn't, you may need to manually open the zipped file with your decompression software.
1.3) You will not need to make any changes to the script. (But, if you wish to look at it with a text editor, open the Perl script called "envhtml.cgi".)
1.4) Create a folder called "envhtml" inside your cgi-bin folder in your public_html folder on the server.
1.5) Upload "envhtml.cgi" into this envhtml folder.
1.6) Of course, you need to make the CGI script executable. This means you need to change its file permissions to what Unix calls a permissions set of "755." Change the permissions of the script, "envhtml.cgi". Do not change the permissions of anything else.
- How to change file permissions using Fetch (Macintosh)
- How to change file permissions using CoreFTP (Windows)
- How to change file permissions using Cyberduck (Macintosh)
- How to change file permissions using Fugu (Macintosh)
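If you are wondering what "755" actually means, here is a minimal Python sketch that translates a numeric permissions set into the letters you may see in your FTP program or in a Unix directory listing. The function name describe_mode is made up for this illustration; it is not part of either script used in this exercise.

```python
import stat

def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a numeric mode such as 0o755."""
    # stat.filemode expects a full st_mode value, so flag it as a regular file first.
    return stat.filemode(stat.S_IFREG | mode)

# 7 = read + write + execute (owner), 5 = read + execute (group), 5 = read + execute (others)
print(describe_mode(0o755))
# For comparison, a typical non-executable file such as an HTML page
print(describe_mode(0o644))
```

The first digit applies to you (the owner), the second to your group, and the third to everyone else. The Web server needs the "execute" bit in order to run the script, which is why a CGI script gets 755 while ordinary pages do not need it.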
1.7) Point your browser at your "envhtml.cgi" page and test it. You should see a list of the current values for "environment variables." Some of this information is useful, even necessary, to know if you wish to get certain CGI scripts to run on the server. You may wish to print out this information; we will come back to this later.
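To give you a feel for what a script like this does, here is a rough sketch of the same idea written in Python. This is not the code of "envhtml.cgi" (which is a Perl script); the function name render_env_page is an assumption made for this example, and running it as a CGI script would require Python to be available on the server.

```python
import html
import os

def render_env_page(env) -> str:
    """Build a complete CGI response listing each environment variable in a table."""
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(value)}</td></tr>\n"
        for name, value in sorted(env.items())
    )
    # A CGI response starts with headers, then a blank line, then the page itself.
    return (
        "Content-Type: text/html\r\n\r\n"
        "<!DOCTYPE html>\n<title>Environment variables</title>\n"
        f"<table>\n{rows}</table>\n"
    )

if __name__ == "__main__":
    print(render_env_page(os.environ), end="")
```

The Web server fills in variables such as SCRIPT_FILENAME before it runs the script, so the script only has to read them and print them back as HTML.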
Step 2 - Install a site search engine
2.1) Now let's download a free search engine from Extropia.com. Extropia offers over a dozen free CGI scripts available for download. These scripts were written by Selena Sol, and are among the most widely used free CGI scripts on the Web.
Go to:
http://www.extropia.com/applications/search.html
Click on the "Download Now!" link. This will take you to a "download license" page. After carefully reading this entire page (well, you may choose not to read it), click on the button at the bottom of this page to accept these terms. This takes you to the page with a "Download Now!" link; click on it to start the download of the Site Search file archive in "tar" format.
2.2) If all goes correctly, the tar file (called "search.distribution.tar") will automatically expand into a folder called "keyword_search" after the file downloads. If it doesn't, you may need to manually open the tar file with your decompression software.
2.3) You will now have a folder called "keyword_search" containing four text files, and a Documentation folder. Yes, normally you would want to read the README.INSTALLATION file that is within the Documentation folder, but if you do you will see that this file is not very helpful.
You will need to modify two of these files. The CGI script, as always, will need to have the proper "path to Perl."
The file with the ".pl" extension is a Perl script which contains variables which need to be defined in order to make the CGI script work. Files such as these are usually easily identified because they have the word "define," "settings," or "config" in their file name.
2.4) First let's edit the CGI script. With a text editor, open the CGI script called "search_engine.cgi".
2.5) Of course, modify the very first line of the script to be the correct "path to Perl" on our server.
Find the very first line of this script that looks like this:
#!/usr/local/bin/perl -T
and change this line to:
#!/usr/bin/perl -T
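You will make this change by hand in your text editor, but the edit can also be sketched as a small Python function, which shows exactly what changes and what stays the same. The name set_interpreter and its default path are assumptions for this illustration only.

```python
def set_interpreter(script_text: str, perl_path: str = "/usr/bin/perl") -> str:
    """Replace the interpreter path on the first line, keeping flags such as -T."""
    first, newline, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        raise ValueError("the script does not start with a #! line")
    # Everything after the old path (for example "-T") is kept as-is.
    parts = first[2:].split(None, 1)
    flags = f" {parts[1]}" if len(parts) > 1 else ""
    return f"#!{perl_path}{flags}{newline}{rest}"

original = "#!/usr/local/bin/perl -T\nprint 1;\n"
print(set_interpreter(original))
```

Only the path changes; the "-T" flag and the rest of the script are left untouched.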
2.6) This script is written to only search files with names ending in ".html" or ".htm". To also search files whose names end in ".php", we need to make one more change to this script.
Find the line of this script (near the end of the file) that looks like this:
if (($filename =~ /htm.?/i) ||
and add a line under it so that there are now two lines:
if (($filename =~ /htm.?/i) ||
($filename =~ /php.?/i) ||
In other words, that section now looks like this:
while ($filename = readdir($dirhandle)) {
    if (($filename =~ /htm.?/i) ||
        ($filename =~ /php.?/i) ||
        (!($filename =~ /^\.\.?$/) &&
         -d "$directory/$filename")) {
        last;
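If you are curious how these two patterns behave, the following Python sketch mimics just the file name tests (it leaves out the part of the condition that checks for subdirectories). The names is_searchable and is_searchable_strict are made up for this example, and the stricter pattern is only a suggestion, not something the exercise asks you to install.

```python
import re

# Python equivalents of the Perl patterns /htm.?/i and /php.?/i
HTM = re.compile(r"htm.?", re.IGNORECASE)
PHP = re.compile(r"php.?", re.IGNORECASE)

# Hypothetical stricter variant: the name must END in .htm, .html, or .php
STRICT = re.compile(r"\.(html?|php)$", re.IGNORECASE)

def is_searchable(filename: str) -> bool:
    """True if the name contains 'htm' or 'php' anywhere, as in the edited script."""
    return bool(HTM.search(filename) or PHP.search(filename))

def is_searchable_strict(filename: str) -> bool:
    """True only if the name ends with one of the three extensions."""
    return bool(STRICT.search(filename))

for name in ["index.html", "ABOUT.HTM", "contact.php", "photo.jpg", "nightmare.txt"]:
    print(name, is_searchable(name), is_searchable_strict(name))
```

Because the patterns are not anchored to the end of the name, a file such as "nightmare.txt" (which happens to contain "htm") also passes the original test. For a typical student site this rarely matters, but it is worth knowing if unexpected files show up in your hit list.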
2.7) Save these changes to "search_engine.cgi" — save this file as a text file, of course.
2.8) Now with a text editor, open the Perl script file called "search_define.pl". You will need to define two settings in this file, the "root_web_path" and the "server_url".
The root_web_path defines the path to the "directory tree" (the folder along with all the subfolders it contains) that you wish to make searchable. The "server_url" is the URL, or Web address, for this site.
There is an important difference between a URL and an actual path on a server. The URL is simply the address you type into the browser to get to a Web page. But the path seen in a URL is not usually the actual directory structure of the server. To see the real path, let's look at the environment variables shown by the "envhtml.cgi" page.
Find the variable called SCRIPT_FILENAME. This shows the actual path on the server to wherever you installed envhtml.cgi. Note that this is different from the URL you would type into a browser to get to envhtml.cgi. You will use the first part of this path, but you will want to change the last part to point to the searchable directory.
2.9) Near the very beginning of "search_define.pl" you will see the two lines you need to change.
Find the line that reads:
$root_web_path = "/var/www/htdocs/";
and change it to reflect the path to the directory tree you wish to search.
Let's assume you wish to make your entire personal folder searchable.
Change this line to read:
$root_web_path = "/home/s78587/yourusername/public_html/";
Note that the tilde (~) does not appear in this path. Also note that this path ends in a slash.
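The step from SCRIPT_FILENAME to root_web_path can be sketched in a few lines of Python. The sample SCRIPT_FILENAME value below is an assumption, pieced together from the path shown in step 2.9 and the folder names from step 1.4; the real value is whatever your own "envhtml.cgi" page reports. The function name root_web_path_from is also made up for this example.

```python
def root_web_path_from(script_filename: str, web_root: str = "public_html") -> str:
    """Keep the path up to and including the web root folder, with a trailing slash."""
    parts = script_filename.split("/")
    if web_root not in parts:
        raise ValueError(f"{web_root!r} does not appear in {script_filename!r}")
    cut = parts.index(web_root)
    return "/".join(parts[: cut + 1]) + "/"

# Assumed example value; check your own envhtml.cgi page for the real one.
sample = "/home/s78587/yourusername/public_html/cgi-bin/envhtml/envhtml.cgi"
print(root_web_path_from(sample))
```

In other words, you keep everything through "public_html", drop the "cgi-bin/envhtml/envhtml.cgi" part, and make sure the result ends in a slash.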
2.10) The very next line also needs to be changed.
Find the line that reads:
$server_url = "http://you.yourdomain.com/directoryroot";
and change it to reflect the URL for your Web site. Change this line to read:
$server_url = "http://webhawks.org/~yourusername";
Note that the tilde (~) appears in this URL. Also note that this URL does not end in a slash.
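Why does one setting end in a slash while the other does not? One plausible reading is that the script turns each file it finds into a link by swapping the server path for the URL. The Python sketch below illustrates that idea; it is an assumption about how the two settings fit together, not a copy of the search script's own code, and the function name hit_url and the sample file are invented for the example.

```python
def hit_url(file_path: str, root_web_path: str, server_url: str) -> str:
    """Turn a file's path on the server into the URL a visitor would click."""
    if not file_path.startswith(root_web_path):
        raise ValueError("the file is not inside root_web_path")
    # root_web_path ends in a slash, so what remains has no leading slash...
    relative = file_path[len(root_web_path):]
    # ...and server_url has no trailing slash, so exactly one slash joins them.
    return f"{server_url}/{relative}"

print(hit_url(
    "/home/s78587/yourusername/public_html/ex8/index.html",  # hypothetical file
    "/home/s78587/yourusername/public_html/",
    "http://webhawks.org/~yourusername",
))
```

If the links in your hit list come out with a doubled or missing slash, the trailing slash on one of these two settings is the first thing to check.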
2.11) Save these changes to "search_define.pl" — as text, of course.
2.12) Create a folder called "keyword_search" inside your cgi-bin folder in your public_html folder on the server.
2.13) Upload the following three files into this keyword_search folder.
- search_engine.cgi
- search_define.pl
- cgi-lib.pl
Now you should have a folder within your cgi-bin called "keyword_search", and this folder should contain the three files listed above.
2.14) Of course, you need to make the CGI script executable. This means you need to change its file permissions to what Unix calls a permissions set of "755." Change the permissions of the script, "search_engine.cgi". Do not change the permissions of anything else.
2.15) Point your browser at "search_engine.cgi" and test it. (This means that you enter the URL of the script into the browser and go directly to the script. You should be able to figure out the URL of the script, right?) Make sure you can get a list of hits, and also click on the hits that it finds, to see if the links in the hit list work properly. Yay! It worked the first time, right?
Troubleshooting hint: If you cannot get any hit results (if no matter what you search for you see "Sorry, No Pages Were Found With Your Keyword"), then the root_web_path is probably incorrect.
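If you have a way to run a script on the server, a quick sanity check along the following lines can tell you whether a given root_web_path exists and contains anything to search. This is only a sketch: the function name count_searchable is made up, and it checks file extensions rather than reproducing the search script's exact matching rules.

```python
import os

def count_searchable(root_web_path, extensions=(".html", ".htm", ".php")):
    """Count pages under root_web_path; return None if the folder does not exist."""
    if not os.path.isdir(root_web_path):
        return None
    total = 0
    # os.walk visits the folder and every subfolder beneath it.
    for _folder, _subfolders, files in os.walk(root_web_path):
        total += sum(1 for name in files if name.lower().endswith(extensions))
    return total

print(count_searchable("/home/s78587/yourusername/public_html/"))
```

A result of None means the path itself is wrong (often a typo or a missing folder name), while a result of 0 means the path exists but holds no pages the search engine would look at.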
Step 3 - Incorporate the search box into your Web page
OK, now you wish to put a search box on your own Web page; how do you do it?
3.1) Create a new Web page for exercise 8.
3.2) This page must contain the HTML5 DocType, and the character encoding meta tag for UTF-8.
3.3) This page must contain links (either buttons or text links) to the HTML and CSS validators. See How to Put HTML and CSS Validate Buttons on Your Page for information on how I want you to do this. (Use the block of code on this page to make the button links or text links, so that clicking on the links automatically lets you know if the page validates or not.)
3.4) This page must display your name and say "Exercise 8."
3.5) This page's <title> must indicate that this is your exercise 8 page, so it should say something like "John Sky's DM160C Exercise 8".
3.6) This page must contain a relative link to your home page.
3.7) On this exercise 8 page, place a link to your "envhtml.cgi".
3.8) In the place where you want the search box to appear, place the following code on your Web page. Change the path to use your own folder name (although I use a full URL in this example, you could use a relative path to the script instead).
<form method="post" action="http://webhawks.org/~yourusername/cgi-bin/keyword_search/search_engine.cgi">
  <div>
    <strong>Enter your keywords:</strong>
    <input type="text" size="30" name="keywords" maxlength="80">
  </div>
  <div>
    <input type="checkbox" name="exact_match">Exact match search
    <input type="submit" value="Search">
  </div>
</form>
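When a visitor clicks "Search", the browser packs the form fields into the body of a POST request and sends it to the script. The Python sketch below shows roughly what that body looks like for this form; the function name form_body is invented for the illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

def form_body(keywords: str, exact_match: bool = False) -> str:
    """Build the URL-encoded body a browser would POST for the search form."""
    fields = [("keywords", keywords)]
    # An unchecked checkbox is left out entirely; a checked one with no
    # value attribute is submitted with the value "on".
    if exact_match:
        fields.append(("exact_match", "on"))
    return urlencode(fields)

print(form_body("cgi scripts"))
print(form_body("cgi scripts", exact_match=True))
```

The field names ("keywords" and "exact_match") come from the name attributes in the form, which is why you should not rename them: the search script looks for exactly those names.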
3.9) Make sure the HTML on this page validates (using the HTML5 DocType) with no errors, according to the W3C's online HTML Validator. You can easily do this by clicking on your HTML validator button. For more information, see How to Use the W3C HTML and CSS Validators for step-by-step directions.
3.10) Make sure the CSS used with this page validates, with no errors, according to the W3C's online CSS Validator.
3.11) If there are any errors, fix them, re-upload the page, and validate it again.
3.12) Make sure your home page has a working link to all your completed exercises, including this exercise, and that the HTML and CSS on your home page validates.
3.13) Submit the feedback form. To receive full credit for this exercise you must submit this feedback form before it expires. This exercise is due on 4/15/13. Allowing for a one-day grace period, the form will expire at the end of the day following the due date, which means the form for this exercise will expire at 12:00 AM on 04/17/13.