Appendix A - Troubleshooting
We're not supposed to have to do this... but if all else fails, try a reboot. :o)

rm -rf /var/lib/dhcp/dhcpd.leases
Appendix B - Optimization

Server Hardware Optimization
DHCP Load Balancing Optimization

DHCP load balancing is controlled by the "split" or "hba" parameter values in the primary's failover declaration. The split option's parameter sets a threshold for deciding whether a request is handled by the primary or the secondary. The hba option's bit map can be used to precisely set which server will handle the request. This technique isn't perfect, but it does afford a large degree of control over which workstations will be serviced by the primary and which by the secondary. There's more about this in RFC-3074 and the dhcpd.conf man page.

How it works

A terminal's DHCP request provides the MAC address of the client's NIC. The dhcpd server "hashes" the 48 bits of this address into a repeatable 8 bit value. The purpose of hashing is to scale the large MAC address value down and make it appear somewhere in the range of 0 to 255. "Somewhere" is an important concept because this process essentially converts a MAC address into an 8 bit pseudo-random number. Requests whose hash value is below the split value are answered by the primary; the rest go to the secondary. The hba bitmap has one bit for each of the 256 possible hash values, and a set bit assigns that value to the primary.

Split 50/50:
hba ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff: # same as split 128

Split 25/75 (25% for the primary)

Get the idea? These examples are for fine tuning and extreme situations. The hash is pretty good and, most of the time, you won't need to bother with any of this... just set split 128 and you'll probably be content with the load balancing. The finer-grained adjustments can come in handy if you're dealing with servers that aren't the same speed or have different roles... in which case you'll want most of the load to fall on the faster and more capable machine. These techniques give you a way to make each server work as hard as you want it to... and no harder.

openMosix Optimization
There is a wealth of information about tuning openMosix for a variety of environments on the main openMosix site at SourceForge. For administering openMosix arrays, it doesn't get much better than the outstanding openMosixView.
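Returning to DHCP load balancing: here's a minimal Python sketch of how split and hba relate. It follows the dhcpd.conf man page in treating a set bit as "the primary answers" and split N as "the first N bits are set". The bucket_for_mac function is a hypothetical stand-in for the real RFC 3074 hash, and the bit order inside each octet is an assumption; whole octets (as in the split 128 example) come out the same either way.

```python
"""Sketch: how split and hba select the serving DHCP peer (assumptions noted)."""
import hashlib


def bucket_for_mac(mac):
    """Map a MAC address to a repeatable pseudo-random value in 0-255.

    STAND-IN ONLY: RFC 3074 defines its own hash function; SHA-256 is used
    here just to illustrate a repeatable 8 bit value.
    """
    octets = bytes(int(part, 16) for part in mac.split(":"))
    return hashlib.sha256(octets).digest()[0]


def split_to_hba(split):
    """Build the 32-octet (256-bit) bitmap equivalent to a split value.

    Buckets 0 .. split-1 belong to the primary.
    """
    if not 0 <= split <= 256:
        raise ValueError("split must be between 0 and 256")
    bitmap = bytearray(32)
    for bucket in range(split):
        # Assumed bit order within an octet; whole octets are unaffected.
        bitmap[bucket // 8] |= 1 << (bucket % 8)
    return bytes(bitmap)


def primary_serves(bucket, bitmap):
    """True if the bit for this bucket is set, i.e. the primary answers."""
    return bool(bitmap[bucket // 8] & (1 << (bucket % 8)))


def hba_text(bitmap):
    """Format the bitmap as the colon-separated hex list an hba statement takes."""
    return ":".join(f"{octet:02x}" for octet in bitmap)
```

For example, hba_text(split_to_hba(128)) gives sixteen "ff" octets followed by sixteen "00" octets, and split_to_hba(64) (25% for the primary) gives eight "ff" octets followed by twenty-four "00" octets.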
Appendix C - Backup and Disaster Recovery

A cheap, simple and expeditious backup strategy is to store all the unique files that distinguish the primary and secondary servers (edited during steps 4 and 5 of the installation process) on a floppy (or other backup media) and to make a duplicate of the primary hard drive. If a primary server fails, this drive can be used to get it back online immediately. If a secondary fails, the backup drive can be installed and the secondary configuration information from the floppy can be used to set it up for secondary operation.

This floppy will include all network settings for the primary and secondary and any configuration changes made during the original installation that distinguish them. For LTSP this will include:

/etc/ltsp/i386/etc/lts.conf

It's possible to create the backup floppy after the install, but it's much easier to do it as you edit the files during the forking of the primary and secondary configurations.

User Data is NOT backed up with this procedure!

There is no provision in this procedure for user data that has been stored on the ETS. If a network file server with a suitable backup strategy is available, one approach is to force terminal users to store their data there. If user data is stored on the ETS, you'll have to concern yourself with keeping the home directories on the two servers in sync in addition to backing up the data.
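The configuration-file backup described above is easy to script. Here's a minimal Python sketch that copies a list of files onto mounted backup media, recreating each file's directory path underneath a backup folder. Only lts.conf comes from the text; the destination folder in the usage note is a hypothetical example, and you'd add the other files you edited while forking the configurations.

```python
"""Copy the files that distinguish a primary from a secondary ETS to backup media.

Sketch only: the file list and destination are supplied by the caller.
"""
import os
import shutil


def backup_files(paths, dest_root):
    """Copy each file under dest_root, keeping its absolute path as a relative
    subdirectory (e.g. /etc/x -> dest_root/etc/x). Returns the paths written."""
    copied = []
    for path in paths:
        if not os.path.isfile(path):
            print(f"skipping missing file: {path}")
            continue
        target = os.path.join(dest_root, path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)  # copy2 also keeps timestamps and permission bits
        copied.append(target)
    return copied
```

For example, backup_files(["/etc/ltsp/i386/etc/lts.conf"], "/mnt/floppy/ets-primary") would copy lts.conf to /mnt/floppy/ets-primary/etc/ltsp/i386/etc/lts.conf, assuming the floppy is mounted at /mnt/floppy.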
Appendix D - Security Considerations

Here are a few ideas for securing your cluster servers. This is NOT a complete solution and NOT guaranteed to make your system secure. It's just a few things I've done to make it harder for people to get into the box and make unauthorized changes. There is nothing better than having the servers locked up and available only to authorized individuals... but if the hardware must be physically exposed, here are a few suggestions:

Physical Security
etc...
4. After editing lilo.conf, run lilo to install the changes.

System Security
1. Install firewall rules to control traffic into and out of the server's "outside" interface (eth0 in the example).
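Once firewall rules are in place, it's worth checking which services a given machine can actually reach. Here's a minimal Python sketch that tries a TCP connection to each port and reports success or failure. It only covers TCP (UDP services can't be checked this way), and it can't tell a closed port from a filtered one. The host and ports in the usage note are examples: the ETS address used in Appendix K, with SSH (22) and Webmin (10000).

```python
"""Check from a client machine whether TCP services on an ETS are reachable."""
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable or unresolvable name
        return False


def probe(host, ports, timeout=2.0):
    """Return {port: reachable} for each port in ports."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, running probe("10.104.75.9", [22, 10000]) from inside and outside the firewall shows whether the rules treat the two sides differently.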
Appendix E - Firewall Issues

A firewall was incorporated into the ETS to provide control over workstation access to the Internet and other resources on the trusted internal network. The Shorewall firewall was chosen because of its outstanding capabilities, ease of configuration and excellent documentation. Below are sample configuration files from a working ETS that define the firewall's interfaces and zones:

/etc/shorewall/interfaces

The default policy is to completely block the servers (and therefore the workstations) from accessing the internal network. The policy also allows any traffic from the workstations to the server and from the server to the workstations. Below is the sample configuration file that implements this policy.

/etc/shorewall/policy

Rules can override the default policy with specific IPs, ports and protocols. The rules in the following example provide the box with NTP, Samba file and printer sharing, DNS, SSH, VNC and Webmin services from the internal network. Domain authentication also requires that ports 135, 139 and 1024:65535 be open for TCP traffic and ports 137, 138 and 1024:65535 be open for UDP... which is a ridiculous range. I decided to simply open the server up to the Primary Domain Controller (PDC in the sample file) since it is a "trusted" machine on the network. The Backup Domain Controller (BDC) should also be included but isn't in the example provided. Below is a sample configuration file that implements these rules.

/etc/shorewall/rules

Troubleshooting

The easiest way to determine whether you've got a firewall issue with a service is to simply turn off the firewall and pass everything through. With Shorewall, this is easily accomplished from the command line:

shorewall clear

If the service starts working after you've done this, you'll have to fix your rules... then try again with:
Appendix F - Enterprise Integration

The ETS is a central platform for launching a variety of applications in an institutional or enterprise environment. Most often, the required applications will include web browsing, email and word processing. In order to centrally manage user logins, NT domain authentication is often required along with Samba printer and file sharing.

Configuring the ETS for Samba/Winbind and Domain Authentication

In order for the ETS to interoperate with existing Windows NT and 2000 environments, domain authentication of user logins is often necessary. Since this can be a "challenge" to get working, here are the basic steps and principal configuration files from a currently running ETS installation. The examples are for Samba 2.2.3a running on a Mandrake 8.2 system with an NT 4 domain controller. LTSP is running gdm for the display manager.

1. Install Samba (SMB and NMB) and Winbind.
4. Boot order is important: SMB and NMB must start before winbind. (probably not a problem for most distros)
8. You'll be prompted for your domain admin password. Type it in!
wbinfo -t
11. If you get "Secret is good", you're ready to authenticate users.
wbinfo -a MYDOMAIN\\myuserid%mypassword
12. If everything is working OK, you'll see that plaintext authentication succeeded and encrypted authentication failed.

Troubleshooting
1. If you aren't able to join the domain or there's no response from your domain server, you may have a firewall issue.

Aggravating things to watch out for when testing:
1. The wbinfo utility wants a specific format for authenticating users: wbinfo -a domain\\userid%password

Configuring the ETS for Enterprise Applications

The following section is under development. More information will be included in the next document release.
Web Browsing

While Mozilla 1.0.1 is currently the browser of choice for the ETS, the Phoenix Project provides a lightweight, "lean and mean" version of Mozilla for embedded applications where a reduced footprint and high performance are critical. Phoenix is just a browser... no email, IRC or news clients are included, so it is ideal for ETS installs where the server has limited memory and/or disk space. The Phoenix browser also has a set of concise and attractive themes that can be useful for customizing the overall appearance of an ETS.

File and Printer Sharing

While all Samba share configuration is done via /etc/samba/smb.conf, Webmin and the ETS Administration System provide a much easier and safer way to manage this subsystem. The Samba SWAT utility (which comes with Samba) should also be installed, so that Webmin can provide the fullest range of linked functionality.

E-mail

One of the most comprehensive and feature-laden e-mail clients for Linux is Evolution from Ximian. It is highly recommended for enterprise environments. The general release of Mozilla also provides mail and news clients that are good for less demanding applications.

Word Processing

The most widely used, comprehensive and cost-free office suite is OpenOffice. For a little money, you can purchase a supported version, Sun's StarOffice. There are small differences, but in general StarOffice and OpenOffice are functionally identical. Sun provides generous discounts for educational institutions and excellent documentation. OpenOffice is currently provided with the Mandrake 9.0 and Red Hat 8.0 distributions.

Wine Integration - Running Legacy Windows Applications under Linux

The most popular way to run legacy Windows applications under Linux is to use Wine. There are a number of different "flavors" of Wine, all derived from the original Wine Project. Each variation has chosen a different application area to focus on and refine. Some are free... others aren't.
Be forewarned that making Windows applications run under Wine (which is still in an "alpha" state) isn't an easy task. Windows is a single-user platform, so running Windows applications in a multi-user LTSP environment will require special considerations and great care.

Installing Wine
1. Log in as a non-root user.

That's it... the Wine binaries are now installed. Type wine at the command prompt and you should see "Wine 20021031" in the resulting usage message.

Configuring Wine

The installation process creates a hidden subdirectory in your home directory called .wine. We'll be doing virtually all of our work with the applications in this area, so let's make it the current directory and display its contents by typing:

cd ~/.wine

The most important file in the ~/.wine subdirectory is config... open it with your favorite text editor and read the header. The purpose of this file is to map Windows resources to Unix resources. You'll see Windows drive letters mapped to Unix paths, fonts, multimedia drivers, serial COM ports, registry keys... virtually everything that a Windows program will request can and must be properly mapped to a corresponding Unix resource. Some of the default drive letter to Unix path mappings are wrong for Mandrake 8.2 and must be fixed before we can verify our installation. Adjust your config file to match the following:

[Drive A]

[Drive D]

Verifying your Wine installation

Run the following commands (as a non-root user) to verify the Wine installation:

Fine Tuning your Wine Installation
1. Windows programs are generally designed for a single-user environment. If you want to run one in a multi-user environment, you may need to make an install for each user.
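To double-check the drive mappings after editing ~/.wine/config, here's a minimal Python sketch that lists each [Drive X] section's "Path" value and flags paths that don't exist as directories. It assumes the layout of the Wine config file of this era ([Drive X] sections holding quoted "Key" = "Value" lines, with ";" comments); check the header of your own file before relying on it.

```python
"""Sketch: list the drive-letter mappings in ~/.wine/config and flag bad paths.

Assumes [Drive X] sections containing quoted "Key" = "Value" lines.
"""
import os
import re

SECTION = re.compile(r'^\[Drive ([A-Za-z])\]$')
ENTRY = re.compile(r'^"([^"]+)"\s*=\s*"([^"]*)"')


def drive_paths(config_text):
    """Return {drive_letter: path} from each [Drive X] section's "Path" entry."""
    paths = {}
    drive = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):  # blank line or comment
            continue
        if line.startswith("["):  # any new section ends the previous drive
            match = SECTION.match(line)
            drive = match.group(1).upper() if match else None
            continue
        entry = ENTRY.match(line)
        if drive and entry and entry.group(1).lower() == "path":
            paths[drive] = entry.group(2)
    return paths


def missing_paths(paths):
    """Return the sorted drive letters whose Unix directory doesn't exist."""
    return sorted(
        letter for letter, path in paths.items()
        if not os.path.isdir(os.path.expanduser(os.path.expandvars(path)))
    )
```

To check your own file, read ~/.wine/config into a string and pass it through drive_paths and then missing_paths; any letter returned points at a directory that isn't there.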
Appendix G - Self Test Procedures

An automated Self Test and Notification System (STANS) for verifying that the ETS is fully operational is planned but not implemented at this time. The mon utility will handle some of this, but the higher level diagnostics will include:

1. Verification that the servers are within operational limits (RAM, disk space, CPU load, etc.) and a general health check.
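Until STANS exists, a minimal Python sketch of part of the first item (disk space and CPU load; RAM is not covered) might look like this. The thresholds are hypothetical defaults, not values from the ETS design, and os.getloadavg() is only available on Unix-like systems.

```python
"""Sketch of a basic ETS health check: free disk space and CPU load average.

The thresholds are hypothetical examples, not values from the ETS design.
"""
import os
import shutil


def check_health(path="/", min_free_fraction=0.10, max_load=4.0):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        problems.append(f"{path}: only {free_fraction:.0%} free")
    load1, _, _ = os.getloadavg()  # 1, 5 and 15 minute load averages (Unix only)
    if load1 > max_load:
        problems.append(f"1-minute load average {load1:.2f} exceeds {max_load}")
    return problems


if __name__ == "__main__":
    for problem in check_health() or ["all checks passed"]:
        print(problem)
```

Run from cron, a non-empty result could be mailed to an administrator, which is roughly the "notification" half of the planned system.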
Appendix H - Installing openMosix

Installing an openMosix binary package is a simple and quick procedure. The specifics presented here assume an RPM-based system using lilo as the boot manager:

1. Download the desired RPM kernel and userland tools from the openMosix site on SourceForge.

If you're having trouble seeing your other nodes, verify that you don't have a firewall issue. I cannot find any documentation on the ports and protocols openMosix uses for inter-node communication, so I can't specify them. Fortunately, the installations I've made so far haven't required the openMosix interface to be restricted.

For runtime configuration and programming, see the openMosixAPI.
Appendix I - Fail-over Procedures

Hard fail-over

An ETS hard fail-over occurs when the primary or secondary loses contact with its DHCP peer. If a server hardware failure is responsible, any client sessions running on the failed machine will be lost. Rebooting the affected workstation will re-establish a session with the remaining server via the DHCP fail-over mechanism.

Soft fail-over

It is sometimes necessary to force a fail-over condition in order to perform maintenance on the primary or secondary server. Unfortunately, there is no way to do this that is transparent to the users on the server that is targeted for shutdown.

*Caution*: This is an untested procedure that is currently in the development stages.

Probably the least disruptive way to take an ETS peer out of service is to place the DHCP server on the other ETS into a "partner-down" state near the end of a work shift. The machine in the "partner-down" state will assume the full load for all future DHCP requests. At this point, notify all the users who are currently running on the machine that needs to be removed from service and tell them to reboot at the end of their shift. The other ETS will pick up the new session requests and run the workstations during the planned outage. The process can be reversed (moving the operational ETS from "partner-down" to "normal") at the end of the next work shift in order to place both machines back online and restore them to their normal roles.
Appendix J - Duplicating the ETS Systems

Assuming an existing and fully functional ETS installation, here's a procedure to create a duplicate system. It assumes that eth0 and/or eth1 are using static IP addresses that will need to be reassigned:

1. Duplicate the primary and secondary hard drives from the operational system.

Scripting could provide automation and greater speed for this replication process.
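As a starting point for that scripting, here's a minimal Python sketch for one step: changing the static IP address in an interface configuration file. It assumes a Red Hat/Mandrake-style file such as /etc/sysconfig/network-scripts/ifcfg-eth0 with a single IPADDR= line; other distributions keep this information elsewhere, and the addresses in the usage note are examples.

```python
"""Sketch: reassign the static IP in a Red Hat/Mandrake-style ifcfg file.

Assumption: the file contains exactly one line like IPADDR=10.104.75.9.
"""
import ipaddress
import re

IPADDR_LINE = re.compile(r"^([ \t]*IPADDR[ \t]*=[ \t]*)\S+", re.MULTILINE)


def reassign_ip(ifcfg_text, new_ip):
    """Return ifcfg_text with the IPADDR value replaced by new_ip."""
    ipaddress.IPv4Address(new_ip)  # raises ValueError on a malformed address
    updated, count = IPADDR_LINE.subn(lambda m: m.group(1) + new_ip, ifcfg_text)
    if count != 1:
        raise ValueError(f"expected exactly one IPADDR line, found {count}")
    return updated


def reassign_ip_in_file(path, new_ip):
    """Rewrite the ifcfg file at path in place."""
    with open(path) as fh:
        text = fh.read()
    with open(path, "w") as fh:
        fh.write(reassign_ip(text, new_ip))
```

For example, reassign_ip_in_file("/etc/sysconfig/network-scripts/ifcfg-eth0", "10.104.75.19") would be run (as root) on the duplicate before bringing eth0 up. This handles only the ifcfg file; any other file that embeds the old address still needs its own change.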
Appendix K - System Administration

For administering the ETS, the browser-based Webmin was chosen for its well-designed and flexible architecture, comprehensive configuration coverage, wide third-party support and ease of use. It's also a secure system that uses SSL to create an encrypted channel and certificate-verified links between the administrator and the ETS. While the documentation for Webmin is extensive, it's a largely self-explanatory system with a relatively short learning curve.

To install Webmin, download the package from here or here. The instructions are here, but it's a simple matter to get it installed. On an RPM-based system, the command is:

rpm -i webmin-1.030-1.noarch.rpm

After the installation, bring up a local browser and start configuring Webmin and interacting with the local ETS host by typing the URL:

http://localhost:10000

If Webmin has been configured to use SSL, the URL will be:

https://localhost:10000

If Webmin hasn't been configured to use SSL, set it to do so... link security is free! You may also want to consider generating a server certificate for each server you administer to increase security and confidence during remote logins. To log in remotely, just replace localhost in the example with the IP address or name of the remote ETS:

https://10.104.75.9:10000

While Webmin uses SSL, don't place any ETS interface directly on the Internet. Internet access from an ETS should always be done via a well-maintained and very secure external firewall.

OpenSSH is another invaluable remote admin tool that works well with and complements Webmin. In situations where you'll need to maintain ETS machines on the other side of a firewall, you can maintain a portal-to-portal security envelope by using SSH to port forward through the firewall to the internal ETS machine... then using the https connection to Webmin on the ETS host.
For example, typing the following command will set port 8080 on your local machine to listen, open a secure connection to the remote firewall and tell it to forward any traffic from port 8080 on the originating machine to port 10000 on the ETS at the internal IP address 10.104.75.9:

ssh -L 8080:10.104.75.9:10000 myuserid@SomeRemoteFirewallWithSSH.com

After you've established this connection, you can bring up a browser on your local machine and access Webmin on the remote ETS with the following URL:

https://localhost:8080

Whether you're a Linux expert or a beginner, an excellent overall reference for administering Linux systems can be found here. The K12Linux Project also provides some excellent documentation for setting up and administering Linux systems.
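The port-forwarding command above can also be scripted. Here's a minimal Python sketch that builds the same ssh command as an argument list and starts it in the background. The -N option (don't run a remote command, just hold the tunnel open) is an addition that isn't in the command above; the user name, firewall host and addresses are the example values from the text.

```python
"""Sketch: build the ssh local port-forward command and the matching Webmin URL."""
import subprocess


def forward_args(local_port, ets_host, ets_port, user, gateway):
    """Return the argv for: ssh -N -L local_port:ets_host:ets_port user@gateway."""
    spec = f"{local_port}:{ets_host}:{ets_port}"
    # -N: don't run a remote command, just hold the tunnel open
    return ["ssh", "-N", "-L", spec, f"{user}@{gateway}"]


def webmin_url(local_port):
    """URL to open in the local browser once the tunnel is up."""
    return f"https://localhost:{local_port}"


def open_tunnel(args):
    """Start ssh in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(args)
```

For example, open_tunnel(forward_args(8080, "10.104.75.9", 10000, "myuserid", "SomeRemoteFirewallWithSSH.com")) sets up the same tunnel as the command above, after which webmin_url(8080) is the address to browse to.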