Internet Publishing - PPI 

"Materials used in this course are the property of the author. These lessons may be used only by course participants for self-study purposes. Application for permission to use these materials for other educational purposes such as for teaching or as a basis for teaching should be directly submitted to the author." 


Lesson 6: Configuring Servers

In this lesson, we will look at configuring an HTTP server. As examples, we will use win-httpd 1.4 and WebSite. Some web servers for Windows (for example, WebSite) have their own configuration programs. The configuration procedures can therefore differ considerably, although the functions available should be roughly the same.

For Windows 3.x users: Win-httpd is the Windows version of NCSA's httpd for UNIX, so the configuration is almost the same for the two. Configuration is done by editing text files.

For Windows95 users: In WebSite, configuration is done by menus in the configuration program "WebSite Server Properties".

Even though you may not run your own server in the future, it is useful to know what opportunities exist, so that you know what you can ask of (or demand from) your service provider. If you run the server yourself, you have full control and a lot of flexibility. If you do not run the server yourself, but only publish pages on a commercial server (for example, Compuserve or Eunet), you have less flexibility. Everything beyond publishing the static ("dead") documents themselves must be done in cooperation with your provider. This can seem like a hindrance, but it doesn't need to be. Serious providers have built up competence in this area and can give fast and helpful assistance.

This lesson corresponds to Chapter 5 of the book.

Comparable information for httpd 1.4 can be found at:
on Win 3.x's machine
on Per's practice machine

A little configuration information about WebSite can be found at:
on WebSite's server
on Per's practice machine

Configuration

We will try to make the same (or almost the same) configuration on the two web servers, win-httpd 1.4 and WebSite. The configuration process of the two servers is very different. In this lesson, you will find some general comments on what should be configured, as well as separate paragraphs for each type of server explaining how to do it.

Many of the commands in the configuration files specify paths to directories or files. These paths can be of two types:

virtual path 
Virtual paths are paths that appear in URLs. Take the URL
http://pb1.idb.hist.no/~per/
Here
/~per/
is a virtual path. 
physical path 
A physical path is the actual location on the hard drive. The physical path that corresponds to the URL above is c:/usr/www. A physical path can be absolute or relative. An absolute path begins at the root of the directory tree (/). A relative path gives the location of the file (or directory) as you would type it to reach the file (or directory), starting from the current directory. Relative paths always begin with a directory name (for example, usr/www/), while absolute paths always begin with a slash (/) or a drive letter (C:/ in MS-DOS). 
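The rule for telling absolute and relative physical paths apart can be written as a small check. The following is a minimal Python sketch, assuming paths are written with forward slashes as win-httpd expects; the function names is_absolute and resolve are hypothetical and only illustrate the rule stated above.

```python
import re

# A drive letter followed by a colon and a slash, e.g. "C:/" or "c:/".
_DRIVE = re.compile(r"^[A-Za-z]:/")


def is_absolute(path: str) -> bool:
    """True if the path starts at the root ("/") or with a drive letter ("C:/")."""
    return path.startswith("/") or bool(_DRIVE.match(path))


def resolve(path: str, current_dir: str) -> str:
    """Join a relative path onto the current directory; absolute paths are returned unchanged."""
    if is_absolute(path):
        return path
    return current_dir.rstrip("/") + "/" + path


print(is_absolute("c:/usr/www"))   # True
print(is_absolute("usr/www/"))     # False
print(resolve("usr/www/", "c:/"))  # c:/usr/www/
```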

Configuration files should be kept secret. Dishonest people may read information in the configuration files that gives them access to backdoors and security loopholes. No one besides the person running the server needs to see these files, so don't make them accessible from the web tree ("better safe than sorry" principle). 

Configuration of the Win 3.x - httpd 1.4

Configuration in win-httpd 1.4 is done by setting variables in different configuration files. These are flat ASCII files that you edit with a text editor (Notepad in Windows or edit in DOS). Using Word would be unnecessary and laborious.

As distributed, the files contain only 7-bit characters (American ASCII). I may be unnecessarily conservative, but I recommend avoiding special national characters (in Norway we have the characters æøåÆØÅ; I don't know how these characters appear in Greece or in the Netherlands). Even though one should in principle be able to write whatever one wishes in comment lines (lines that begin with #), one ought not to take unnecessary chances. I also use only 7-bit characters in file names and directory names. ("He who laughs last...")

There are four files which decide the configuration: 

httpd.cnf 
This is the primary server configuration file.
srm.cnf 
This is the resource map for the server.
access.cnf 
This file contains the global access (security) configuration.
mime.typ 
This file describes the document types (MIME types) that the server sends out.

The following rules apply for all files:

  1. All files are insensitive to upper and lowercase letters (case-insensitive). 
  2. Comment lines begin with the pound sign (#) and must be on a line of their own. 
  3. With the exception of access.cnf, the order of the directives does not matter. 

When you start the server, all files are read into the machine's memory. If you make changes to the configuration files, you must stop and restart the server for the changes to take effect. 

Files and paths must be written as in UNIX, that is with a slash (/), and not like in MS-DOS with a backslash (\). 

Virtual paths are connected to physical paths with the Alias directive in srm.cnf (see below). 

Configuration of the Windows95/NT - WebSite

In WebSite, there is a program called WebSite Server Properties with its own menu for configuring the server. In this program, virtual paths are written in UNIX format with "/", while the physical paths are written in MS-DOS format with "\".

Configuration of the Administrator

Win 3.x - HTTPD.CNF

If you have installed win-httpd in the standard manner, you should see the httpd.cnf file here. As you can see, it contains many comments. With "ServerAdmin" you define the administrator of the server: change ServerAdmin so that your own e-mail address is given in error messages. If there is anything you don't understand or are unsure about, check the book or the manual at: /httpddoc/setup/httpd/Overview.html

WebSite

In WebSite Server Properties, you choose the General "card" where you can specify the e-mail address for the administrator.

Configuring the Server Name

ServerName may also need to be changed. The machines I work with are, as you probably already know, called pb1.idb.hist.no (httpd 1.4) and pb.idb.hist.no (WebSite). These are boring names for machines. In addition, the names are tied to the physical machines: if I get a new machine, I will also get a new address. It would probably be wiser to tell the local name service on the Internet (DNS, the Domain Name System) that pb1.idb.hist.no has an alias, for example, ppi.idb.hist.no. Then all requests from the outside directed to ppi.idb.hist.no would be sent to pb1.idb.hist.no. In this way, users would be able to point to a web server with a more logical name. You have certainly noticed that most serious web servers have defined aliases that start with www.---.



Example: You probably accessed the lesson for this course from the server www.idb.hist.no. This is a UNIX machine which is actually called astfgl.idb.hist.no. You can try using the latter variant just to see if you get the same results.


Win 3.x - HTTPD.CNF

Back to the local configuration of your httpd. If there is an alias for a web server in the Internet's name service, your httpd should also use this alias when it identifies itself. That is, ServerName ought to be set to the same alias as the one registered in DNS (the Domain Name System). If no alias has been created for your machine, use the machine's own domain name. You will find this in your machine's TCP/IP set-up.

Those of you who are running a modem connection to the Internet will not, as a rule, be given a permanent domain name. You can choose any name you wish as you will be the only person with access to this server.

WebSite

In WebSite Server Properties you perform the same operation on the Identity "card". Here, you enter the domain name or IP address of your machine.

NB! If you use a stand-alone machine without a permanent IP address (for example, you are connected via modem), you ought to use localhost in place of the domain name.

Log Files

All web servers generate log files of what happens on the server. We will take a closer look at this when we come to server statistics.

For httpd 1.4, HTTPD.CNF will contain information about logging. Here, the names of the other configuration files are also defined. Therefore, this is the foundation for the configuration files (the mother of all configuration files).

For WebSite, use the Logging card in the WebSite Server Properties program to change the log configuration.

MIME Types

In the beginning, there was 7-bit ASCII. E-mail was used to send text messages written in American English. Everything was fine. Then people began sending foreign characters such as æøåÆØÅ, the demand for knowledge increased, multimedia came into being, and people wanted to send both apples and SNAKES over the Net. The solution was MIME: Multipurpose Internet Mail Extensions. MIME types describe what is being sent and how it is encoded. MIME types make it possible for the client to start a program that can display the transferred file; a sound clip, for example, can be played by a sound program. 

Win 3.x - httpd 1.4

The file mime.typ links file extensions (the last part of the file name) to MIME types; for example, files ending in .txt are of type text/plain.

The most basic MIME type is application/octet-stream. If a client receives a file of this type, it cannot display it in any way; it normally asks the user where to save it. In my mime.typ I have assigned some file extensions to this type so that you can download files of these types from my server:

        application/octet-stream       bin exe dll

On the client, you can set up a corresponding connection between file extension and viewer. 

Read more about this in the win-httpd manual: /httpddoc/setup/typesfor.html 
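To make the extension-to-type lookup concrete, here is a minimal Python sketch. It assumes each non-comment line of mime.typ holds a MIME type followed by the extensions that map to it, as in the mime.typ line shown above; the function names, the sample table, and the fallback to application/octet-stream for unknown extensions are illustrative assumptions, not win-httpd's actual behaviour.

```python
from typing import Dict


def parse_mime_types(text: str) -> Dict[str, str]:
    """Build an extension -> MIME type table from mime.typ-style text."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        mime_type, *extensions = line.split()
        for ext in extensions:
            # The configuration files are case-insensitive.
            table[ext.lower()] = mime_type.lower()
    return table


def mime_type_for(filename: str, table: Dict[str, str],
                  default: str = "application/octet-stream") -> str:
    """Look up a file's type by its extension (the part after the last dot)."""
    if "." not in filename:
        return default
    ext = filename.rsplit(".", 1)[1].lower()
    return table.get(ext, default)


# Hypothetical sample in the mime.typ format.
SAMPLE = """
# type                          extensions
text/html                       html htm
text/plain                      txt
application/octet-stream        bin exe dll
"""

types = parse_mime_types(SAMPLE)
print(mime_type_for("README.TXT", types))  # text/plain
print(mime_type_for("setup.exe", types))   # application/octet-stream
```

The client does the reverse: it reads the MIME type the server sends and picks a viewer for it.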

Document Location

Now we will look at configuring the location of the documents on the hard disk in relation to the URLs (virtual paths) that clients use to reach the files.

It is not always desirable to place all documents in one directory, for example, c:/httpd/htdocs. It is better to have a set-up that is independent of which server program is used: one year I may decide to run another web server that doesn't have a directory called httpd/htdocs. It is also easier to keep my files separate from the web server's files when making back-ups.

Win 3.x - httpd 1.4 Uses SRM.CNF - Server Resource Map

In the file srm.cnf important things are defined, such as DocumentRoot and Alias (and a few other things). 

DocumentRoot tells the server what the root of your web tree is. I have not changed this from the standard set-up. Instead, I have used alias to explain where my web files can be found: 

    Alias /~per/   c:/usr/www/

    Alias /files/ c:/usr/files/

The Alias command is written as Alias fakename realname. Fakename is the virtual path that the client gives in the URL. Realname is the absolute physical path. With the help of the Alias command, you can take the physical tree on your hard drive, cut it up, and graft it together again into a virtual tree, which becomes your web. 
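The translation from virtual to physical path can be sketched in a few lines of Python. This is a minimal illustration, assuming the two Alias lines above and a DocumentRoot of c:/httpd/htdocs (the default directory mentioned earlier); the function name to_physical is hypothetical, and trying longer fakenames first is a design choice of this sketch, not necessarily the order win-httpd uses.

```python
from typing import Dict

# Alias table from the srm.cnf lines above:
# virtual prefix (fakename) -> physical prefix (realname).
ALIASES: Dict[str, str] = {
    "/~per/": "c:/usr/www/",
    "/files/": "c:/usr/files/",
}

# Assumed DocumentRoot, for illustration only.
DOCUMENT_ROOT = "c:/httpd/htdocs"


def to_physical(virtual: str, aliases: Dict[str, str] = ALIASES,
                document_root: str = DOCUMENT_ROOT) -> str:
    """Translate the virtual path from a URL into a physical path on disk."""
    # Try longer fakenames first so that a more specific alias wins.
    for fakename in sorted(aliases, key=len, reverse=True):
        if virtual.startswith(fakename):
            return aliases[fakename] + virtual[len(fakename):]
    # No alias matched: the path is taken relative to DocumentRoot.
    return document_root.rstrip("/") + virtual


print(to_physical("/~per/index.htm"))    # c:/usr/www/index.htm
print(to_physical("/files/psp311.zip"))  # c:/usr/files/psp311.zip
print(to_physical("/index.htm"))         # c:/httpd/htdocs/index.htm
```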

Redirect is used to tell clients that a document has moved. 

WebSite

In WebSite Server Properties, choose the Mapping card for the so-called "mapping" functions. Here, you can define connections between virtual and physical paths. The "Document Root" URL is also defined here.



Example (httpd 1.4): Fredrik had an idea, which he described in the file didrik.html. Later on, the idea became a project, and it was wise to move the file to a project directory, /proj/didrik/. He therefore added an entry such as the following: 

    Redirect /~fredrik/ideas/didrik.html http://pc130.idb.hist.no/~fredrik/proj/didrik/didrik.html



If someone asks to see didrik.html under ideas, a message would be sent back saying that the file has moved to a new address. The client automatically picks up this message and asks to be sent to the new address. All this occurs behind the scenes. The user doesn't see these messages which are sent to and fro. 
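What the server does with a Redirect entry can be sketched as a table lookup that returns a status code and a Location header. This is a minimal Python illustration; the function name redirect_response is hypothetical, and the status code 302 ("Found") is an assumption based on what Apache's Redirect directive sends by default, so win-httpd may differ.

```python
from typing import Dict, Optional, Tuple

# Redirect table built from Fredrik's Redirect line:
# old virtual path -> new absolute URL.
REDIRECTS: Dict[str, str] = {
    "/~fredrik/ideas/didrik.html":
        "http://pc130.idb.hist.no/~fredrik/proj/didrik/didrik.html",
}


def redirect_response(path: str, redirects: Dict[str, str] = REDIRECTS
                      ) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) if `path` has moved, else None."""
    new_url = redirects.get(path)
    if new_url is None:
        return None  # not moved: serve the document normally
    # 302 Found: the client repeats its request at the Location URL
    # without the user having to do anything.
    return 302, {"Location": new_url}
```

For example, redirect_response("/~fredrik/ideas/didrik.html") returns the status 302 together with the new address, while any other path returns None and is served normally.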

Automatic Index

If you request a URL that points to a directory, that is, a URL ending with a slash (/), and no index.htm or index.html file can be found in that directory, the web server will generate an HTML document listing the contents of the directory. (The document is not saved, but is generated each time.) You have a lot of leeway with regard to what this document looks like.

I have a directory on the httpd 1.4 server where this is illustrated: httpd 1.4 files

On the WebSite server, you will find an example at: WebSite files

Win 3.x - httpd 1.4

If the directory contains a file called #readme.htm or #readme.txt, it is included in the generated document: the file appears in the file listing, and its contents are also displayed on the client. In this way, you can describe the contents of the directory to the users. If you want this file to have a name other than #readme, you can use the directive ReadmeName in the local srm.ctl file.

You can set the name of the file that serves as the default index of a directory. The most common choice is index.html on UNIX machines and index.htm on MS-DOS/Windows machines. A problem arises when these two worlds meet: if we have a tree (or a forest) of web documents that is visible from both MS-DOS/Novell and UNIX, and both UNIX and MS-DOS servers are running, the servers disagree. We can configure the servers to recognize both .htm and .html as HTML documents, but we can only ask each of them to recognize either index.htm or index.html as the default file for a directory. Since MS-DOS cannot handle the name index.html (its file extensions are limited to three characters), the solution would have to be index.htm on both systems. However, it is difficult to get UNIX people to limit themselves to an MS-DOS name. The best solution is perhaps to use a new, neutral name that everyone can accept, for example, index. (Since MS-DOS is now considered dead after the introduction of Windows95, this discussion may be moot.)

The automatic index looks into the individual HTML files and picks out <TITLE>the title of the document</TITLE>. There is a bug in win-httpd: TITLE tags must be written in uppercase letters. I have reported this defect, but have only received a semi-automatic answer.
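The automatic index described above can be sketched as follows. This is a minimal Python illustration, not win-httpd's code: the function names extract_title and make_index are hypothetical, the directory contents are passed in as a dictionary of file names to file contents rather than read from disk, and the title search is deliberately case-insensitive, so it does not share the uppercase-only bug.

```python
import html
import re
from typing import Dict, Optional

# re.IGNORECASE makes <title>, <TITLE> and <Title> all match.
_TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(document: str) -> Optional[str]:
    """Return the text between the TITLE tags, or None if there is none."""
    match = _TITLE.search(document)
    return match.group(1).strip() if match else None


def make_index(url: str, files: Dict[str, str]) -> str:
    """Build an HTML directory listing; `files` maps file names to contents."""
    rows = []
    for name in sorted(files):
        label = html.escape(name)
        # Only HTML files are searched for a title.
        if name.lower().endswith((".htm", ".html")):
            title = extract_title(files[name])
            if title:
                label += " - " + html.escape(title)
        rows.append(f'<li><a href="{html.escape(name)}">{label}</a></li>')
    heading = "Index of " + html.escape(url)
    return (f"<html><head><title>{heading}</title></head><body>\n"
            f"<h1>{heading}</h1>\n<ul>\n" + "\n".join(rows) + "\n</ul>\n</body></html>")
```

Because the listing is rebuilt on every request, a new file shows up in the index as soon as it is placed in the directory.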

For files that are not HTML files, you must write the descriptions yourself in the file #haccess.ctl in the same directory as the files. An entry in my #haccess.ctl file in the "/files/" directory looks something like this:

AddDescription `PaintShopPro for MSWindows 3.x` psp311.zip

Make sure that the backquotes (`) point in the right direction. It should also be possible to use regular quotation marks (").

WebSite

In WebSite Server Properties, choose the Dir Listing "card" to configure this function. You can define names for the header, footer and file description files. All these files begin with the #-sign so that they will not appear in the file listing, for example, #header.html, #footer.html and #filedesc.cnf. Try WebSite Files or the demo page at Per's Documentation.

The format for file descriptions is:

        (space) comments
        filename | description
        filename | description

Access Control

Earlier, I wrote about the different levels of web servers. I have now realized that one needs perhaps 4 levels: 

  1. Subconscious: Things you have saved and don't want to delete, but which you don't want to be constantly reminded about.
  2. Personal: Your own notes, ideas, etc. which you don't want to bother other people with, or things which you want to keep hidden.
  3. Internal: Notes and memorandums for internal purposes which belong to your organization.
  4. External: This is what you wish to show the world, things which are published.

It's not easy to know what belongs in each of these categories. One must pay attention to the needs of both oneself and the readers. If you have written down some thoughts about something, and the notes lie unstructured and jumbled, with unclear passages and large holes, a reader will need a long time to find out what your message is. It costs the reader a lot of time to get meaning from the text. This is a form of pollution, and it is much more visible in e-mail than on the WWW. A person who sends out a draft to all employees costs the company many work hours, and the benefit is meager. A much bigger document with content that is irrelevant to most people (for example, an invitation to the company soccer match) will, if it is clearly set up and easy to identify from the beginning, only be read by those who are interested. Everyone else will delete it immediately.

This holds true for the Web as well. If you have something to share, share it, but make the status of the document very clear and let the document identify itself. (A document that contains a lot of misspellings, lacks an introduction and formal identification of the author, etc. should be considered a working document.) If you do not want to share it, hide it. This can be done by limiting access to parts of the web tree by configuring the server.

Principle for Access Control

There are two methods for controlling access to information:

  1. Personal control: Whoever wishes to receive information must provide a userid and password. We can set up any number of users, each with his own password, and tie them to each individual directory we wish to protect.
  2. Address control: With this form of control, the server looks at the machine address of the person requesting information in order to decide whether that person is permitted access. This is impersonal in that the user doesn't need to give a userid or password. The user must be at a "legal" address in order to receive the information; in all other cases, a message is sent back saying that the requested information is not available. 



Example: On Per's practice machine, there is a directory that can only be read from machines whose domain name ends with "idb.hist.no" (the domain name of my department). Try this. If you are using a machine outside my Department of Computer Engineering, you will be sent a message saying that the requested area is restricted.

Both personal control and address control are tied to directories on the web server, so all files in the same directory are subject to the same control. The control mechanism is based on two criteria:

  1. Those (user or address) who are allowed to access information.
  2. Those (user or address) who are denied access to information.

Notice that both criteria can be used together, and that the order in which they are applied can be configured.

Example: Below is an image for configuring WebSite. The example illustrates address control.
 

Note in particular that the hosts allowed to read can be specified either by IP address (those beginning with 199.182) or by domain name (those ending in idb.hist.no). If domain names are used, the web server must be set up to look up names in DNS, which isn't always desirable. The most efficient way to control access is therefore to use IP addresses.
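The combination of allow and deny lists can be sketched in Python. This is a simplified model under stated assumptions: the rule that "deny,allow" lets an allow entry override a deny entry (default allow) while "allow,deny" lets deny win (default deny) is taken from the Order/Allow/Deny semantics of NCSA httpd and Apache, and win-httpd's access.cnf and WebSite may differ in detail. The function names matches and is_allowed are hypothetical; the patterns 199.182 and idb.hist.no come from the example above.

```python
from typing import Iterable, Optional


def matches(pattern: str, ip: str, host: Optional[str]) -> bool:
    """Does one allow/deny pattern match the client?"""
    if pattern == "all":
        return True
    if pattern[:1].isdigit():
        # IP prefix such as "199.182": compare whole number groups only.
        prefix = pattern.rstrip(".")
        return ip == prefix or ip.startswith(prefix + ".")
    # Domain suffix such as "idb.hist.no": needs the client's host name,
    # i.e. a DNS lookup. Without one (host is None) it cannot match.
    if host is None:
        return False
    suffix = pattern.lstrip(".").lower()
    host = host.lower()
    return host == suffix or host.endswith("." + suffix)


def is_allowed(ip: str, host: Optional[str], allow: Iterable[str],
               deny: Iterable[str], order: str = "deny,allow") -> bool:
    allowed = any(matches(p, ip, host) for p in allow)
    denied = any(matches(p, ip, host) for p in deny)
    if order == "deny,allow":
        return allowed or not denied   # allow overrides deny
    if order == "allow,deny":
        return allowed and not denied  # deny overrides allow
    raise ValueError("unknown order: " + order)


# Department-only directory: deny everyone, then allow our own addresses.
ALLOW = ["199.182", "idb.hist.no"]
DENY = ["all"]
print(is_allowed("199.182.10.5", None, ALLOW, DENY))       # True
print(is_allowed("10.0.0.1", "pc.example.com", ALLOW, DENY))  # False
```

The second case shows why IP addresses are cheaper: matching "idb.hist.no" requires the server to know the client's host name, which means a DNS lookup for every request.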

ACCESS.CNF Access Control

Win 3.x - httpd 1.4 Users

In httpd 1.4, there is a global access.cnf file that configures global access to the server. In addition, we can have local #haccess.ctl files in each directory. Here, we will only look at how to limit access to a few directories to users who are registered on your server with a password. 

I have a branch on the web tree of the practice PC pb1.idb.hist.no with notes which are password protected. You can try this directory by using the userid poi and the password mecpol .

Here is the method for creating protection: (The file #haccess.ctl was created in this directory with the following contents.) 

    AuthUserFile c:/httpd/conf/lovbruk.pwd
    AuthGroupFile c:/httpd/conf/empty.pwd
    AuthName Examples
    AuthType Basic

    <Limit GET>
    require user poi
    </Limit>


AuthUserFile is the physical path to a file that contains a list of users and their passwords. It looks like this: 

poi:Tx3uDv;2Io*Gc

Yes, this is an authentic example! If any of you break the code, let me know. This is the type of information one usually doesn't want to make public. It contains the userid (poi) and the password (encrypted). How did I make this file? Win-httpd includes a program called htpasswd.exe.

The user's manual for the program is at /httpddoc/setup/admin/UserManagement.html. This is an MS-DOS program, and the file is created with the command: 

  htpasswd -c C:\httpd\conf\lovbruk.pwd poi

When the program is run with the -c option, it creates a new AuthUserFile (password file). If you are adding a user to an existing password file, leave out the -c option. Next comes the path and file name of the password file, and finally the userid for which you are setting up a password.

While the program runs, you must type the password twice. The program then places an encrypted version of the password in the file, as shown above.

This access control applies for the directory where the #haccess.ctl file lies and the entire tree underneath it. However, it is the physical tree that we are talking about. If you have grafted together a virtual tree consisting of many physical branches, each branch must be protected separately.
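The point that protection follows the physical tree can be illustrated with a small lookup that walks up from a file's physical location to the nearest #haccess.ctl. This is a minimal Python sketch and a simplification: the function name governing_control_file is hypothetical, and the real server may combine the settings of several control files along the path instead of using only the nearest one.

```python
import os
from pathlib import PureWindowsPath
from typing import Callable, Optional

CONTROL_FILE = "#haccess.ctl"


def governing_control_file(physical_file: str,
                           exists: Callable[[str], bool] = os.path.isfile
                           ) -> Optional[str]:
    """Return the nearest #haccess.ctl in the file's directory or above it."""
    # .parents walks up the *physical* tree: c:/usr/www/private, c:/usr/www, ...
    for directory in PureWindowsPath(physical_file).parents:
        candidate = directory.as_posix().rstrip("/") + "/" + CONTROL_FILE
        if exists(candidate):
            return candidate
    return None


# Pretend only c:/usr/www/private is protected.
protected = {"c:/usr/www/private/#haccess.ctl"}
print(governing_control_file("c:/usr/www/private/notes/a.htm", protected.__contains__))
print(governing_control_file("c:/usr/files/psp311.zip", protected.__contains__))  # None
```

The second lookup returns None: c:/usr/files/ is a separate physical branch (grafted in with Alias), so it needs its own #haccess.ctl no matter where it appears in the virtual tree.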

WebSite Users

WebSite users can define users and passwords via WebSite Server Properties and the "cards" Users and Access Control. See the example above. The configuration program creates the necessary files for you.


Assignment: Configuring Web Servers

Your answer to this assignment shall include:

Students running httpd1.4:

  1. On your httpd machine, you shall create a directory outside httpddoc for your publications. In the exercise text, I call this directory C:/usr/www, but you can use another name. What did you call the physical directory? 
  2. Set up this physical directory in the WWW tree as the virtual directory /files. Describe how this was accomplished.
  3. Create two subdirectories under /files (this is the virtual name; you may have chosen another name for the physical directory). The subdirectories shall be called /publ and /private.
  4. Create password protection for the /private directory with the userid oliver and the password terces. What did you call the password file? 
  5. Place a few files in the /publ directory which are not called index.htm, so that the directory is displayed as an index. Create a #readme.txt file which describes the contents of the directory. Create a #haccess.ctl file describing each file. Which files have you placed in the /publ directory? 
  6. Copy the contents of #readme.txt for /publ and #haccess.ctl for both directories and include them in your e-mail. 
  7. Copy the password file into your e-mail answer. 
  8. For every file you copy, give the filename first.
  9. Copy the configuration file c:\httpd\conf\srm.cnf to your e-mail.

Students running WebSite:

  1. On your WebSite machine you shall create a directory outside htdocs for your publications. In the exercise text, I call this directory C:/usr/www, but you can use another name. What did you call the physical directory? 
  2. Set up this physical directory in the WWW tree as the virtual directory /files. Describe how this was accomplished.
  3. Create two subdirectories under /files (this is the virtual name; you may have chosen another name for the physical directory). The subdirectories shall be called /publ and /private.
  4. Create password protection for the /private directory with the userid oliver and the password terces. What did you call the password file? 
  5. Place a few files in the /publ directory that are not called index.html. The directory will then appear as an index when a client requests the URL for this directory. Create a #fildesc.ctl file that describes the contents of the directory. See the example in the directory ...WebSite\wsdocs\32demo\demotree\#fildesc.ctl. Which files have you placed in the /publ directory?
  6. Copy the contents of #fildesc.ctl for /publ/ into your e-mail text.

This assignment is due 29 April 1997. 


Date: 1 March 1997 Fredrik Wilhelmsen and Per Borgesen