NOTE: The figures and tables mentioned in this sample chapter do not appear on the Web.
Laying out file systems on your servers and clients is one of the most important steps you will take in building a manageable network. How you arrange file systems on individual hosts and across a network can dramatically affect the performance of your systems and network, the reliability of your systems, security, cost, your backup strategy, and the amount of work that you have to do to maintain these file systems and the software and data they house. In fact, your file systems can easily be the single most significant factor in the overall performance and reliability of your network.
In this chapter, we will look at how file systems are organized and how this organization affects performance. We will offer suggestions on how you might arrange file systems on a network basis to make your network more manageable.
File Systems from the Roots Up
The file system that Unix uses to store everything from device drivers to users' data is undoubtedly familiar to you. With a single root from which all branches of the treelike structure derive, the file system provides a degree of consistency across the most divergent variations of Unix. Yet the simplicity of this hierarchical file system (playfully depicted in Figure 1.1) hides an increasing degree of complexity with every major release of Solaris. Early versions of Unix systems employed a single disk-based file system. The current version of Solaris supports many different types of file systems: the customary disk-based fast file system; network-based file systems; compact-disc read-only memory (CD-ROM) file systems; redundant array of inexpensive disks (RAID) systems, in which many disks combine into one logical file system; file systems on disk operating system (DOS) diskettes; and even pseudo or virtual file systems, which are composed not of files at all but of processes, network sockets, character device drivers, and the like. Even so, the convenient abstraction of the familiar treelike structure persists, and most Unix users are oblivious to the complex structures upon which this abstraction is built.
To begin our examination of Solaris file systems and how you can optimize the performance and utility of file systems across your network, let's first examine a traditional Unix file system (UFS) from the ground up. A single file is, first and foremost, a collection of data, whether ASCII or binary, that is stored in contiguous and/or noncontiguous chunks on a disk. These chunks are sized according to a storage unit called a block (for UFS, the default block size is 8192 bytes). You have probably also noticed that directory files are always some multiple of 512 bytes. When you first create a directory with mkdir, you will notice that it looks something like this:
solaris% mkdir mydir; ls -ld mydir
drwxr-xr-x 2 sdraven staff 512 Nov 11 11:15 mydir
and, later, as you add files, it jumps in size:
solaris% ls -ld mydir
drwxr-xr-x 2 sdraven staff 1024 Nov 22 16:05 mydir
This is because space for directory growth is allocated in 512-byte units. Because this space is preallocated, a directory can grow without the overhead it would incur if space had to be added each time its contents changed.
Files, of course, don't grow in units of blocks, but by the amount of data they contain. The space allocated for them, however, does grow in units of blocks, which allows them to grow up to the next increment of a block before additional disk space must be allocated. In general, all but the last block of a file is full. Figure 1.2 illustrates disk space allocation for a regular file.
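To make the arithmetic concrete, here is a minimal Python sketch, assuming the 8192-byte default block size and ignoring UFS fragments (real UFS can place a small file's tail in fragments smaller than a block, so actual waste is usually lower). The function names are illustrative, not part of any Solaris interface.

```python
BLOCK_SIZE = 8192  # default UFS block size, as stated in the text

def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Whole blocks allocated to hold file_size bytes (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size

def slack_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Allocated but unused bytes in the last, partially filled block."""
    return blocks_needed(file_size, block_size) * block_size - file_size

for size in (1, 8192, 8193, 20000):
    print(f"{size:>6} bytes -> {blocks_needed(size)} block(s), "
          f"{slack_bytes(size)} bytes of slack")
```

Only a file whose size is an exact multiple of the block size has no slack; every other file can grow into its last block without any new allocation.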
Some file systems milk this efficiency even further by allocating more than a single additional block when a file grows. These are referred to as extent-based, as opposed to block-based, file systems. By allocating a larger amount of disk space, the files can grow more contiguously. On the other hand, an extent-based file system runs a greater risk of wasting space that is allocated and not used.
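The tradeoff can be sketched as a toy Python model. It assumes a fixed extent of eight 8192-byte blocks; real extent-based file systems size their extents in more sophisticated ways, and `simulate_growth` is a hypothetical helper rather than anything a file system exposes.

```python
def simulate_growth(appends, block_size=8192, extent_blocks=1):
    """Grow a file by each byte count in `appends`.

    Returns (allocation_events, unused_bytes). extent_blocks=1 models a
    block-based file system; larger values model extent-based allocation.
    """
    size = allocated = events = 0
    for n in appends:
        size += n
        while size > allocated:          # out of reserved space: allocate more
            allocated += extent_blocks * block_size
            events += 1
    return events, allocated - size

# A file built from one hundred 1000-byte appends (100,000 bytes total).
writes = [1000] * 100
print("block-based: ", simulate_growth(writes, extent_blocks=1))   # (13, 6496)
print("extent-based:", simulate_growth(writes, extent_blocks=8))   # (2, 31072)
```

Fewer allocation events make contiguous placement more likely, at the price of more space reserved but not yet used.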
The separate blocks of data comprising a single file are linked to each other by on-disk record structures, which store the location of each of the blocks. This structure is sometimes referred to as a block map. In a similar manner, the available disk space on a file system is maintained in a free-block list. A more complex record structure, the inode, is used to store the descriptive data concerning each file, sometimes referred to as metadata (a common term for data that describes data). This metadata includes the file's owner (i.e., the user ID [UID]); the file's associated group (the group ID [GID]); the permissions matrix; the times of the file's last access, last modification, and last inode change; the size and type of the file; and so on. In fact, the only items not stored in this structure are the contents of the file (as mentioned, these are stored in the file's data blocks) and the file's name and location within the file system.
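On POSIX systems, the stat(2) call returns these inode fields. The following sketch uses Python's `os.stat` wrapper to print them; note that the file's name is not among them, which matches the description above, and that `ctime` is the inode-change time rather than a creation time. The helper name `inode_metadata` is illustrative.

```python
import os
import stat
import tempfile
import time

def inode_metadata(path):
    """Return the inode fields that stat(2) exposes for path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),   # type and permissions, as in ls -l
        "links": st.st_nlink,
        "size": st.st_size,
        "atime": time.ctime(st.st_atime),    # last access
        "mtime": time.ctime(st.st_mtime),    # last modification of the contents
        "ctime": time.ctime(st.st_ctime),    # last change to the inode itself
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    for field, value in inode_metadata(path).items():
        print(f"{field:6} {value}")
```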
A file's name is stored within another file called the directory file. The inclusion of the filename within a particular directory also determines where it "lives" within the tree structure. This location has almost nothing to do with the location of the data itself. In fact, the same file can exist in any number of locations, even in the same directory more than once if it has more than one name. If a single collection of data blocks comprising a single file has more than one file system identity, this duplication is only expressed through the separate directory entries and in the number of links indicated within the inode. Directory files, as you might suspect, also contain the inode numbers of the files they contain.
This provides a way for the file system to easily retrieve the metadata when a long file listing is requested with the ls -l command. Figure 1.3 illustrates the components of a single file in a traditional Unix file system.
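Reading a directory yields exactly this pairing of name and inode number. This Python sketch uses `os.scandir`, which on Unix reports each entry's inode number without a separate stat call (it also omits the `.` and `..` entries). An `ls -l` style listing would still need to stat each entry to reach the metadata in its inode. The helper name is illustrative.

```python
import os
import tempfile

def directory_entries(path):
    """Map each name in a directory to the inode number recorded with it."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}

with tempfile.TemporaryDirectory() as d:
    for name in ("alpha", "beta"):
        open(os.path.join(d, name), "w").close()
    for name, ino in sorted(directory_entries(d).items()):
        print(f"{ino:>10} {name}")
```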
File Systems: Short Cuts to Data
A file system can be said to be an interface to files: a way to address files and access their contents. When a user issues an ls command, for example, the file system receives a request for the contents of a directory file. If the user has read permission on the directory, its contents will be displayed. If the user issues an ls -l command, additional resources must be tapped. Information stored in the associated inodes must also be read and displayed, which also requires execute (search) permission on the directory. In addition, some of this information must first be resolved; the UID and GID fields will be looked up in the password and group files (or maps), and the textual names will be displayed in place of the numeric identifiers.
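The extra lookup that `ls -l` performs can be sketched with Python's `pwd` and `grp` modules, which call the C library's `getpwuid` and `getgrgid` routines and so consult whatever password and group sources the system is configured to use (local files or name-service maps). When no entry exists, this sketch falls back to the number, much as `ls -l` typically does. The function name is hypothetical.

```python
import grp
import os
import pwd

def owner_and_group(path):
    """Resolve a file's numeric UID and GID to names, as ls -l does."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:              # no password entry: show the number instead
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:              # no group entry: show the number instead
        group = str(st.st_gid)
    return user, group

print(owner_and_group("/"))
```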
Adding a single line to a file can cause it to extend beyond its current last block and require allocation of another block. The new block must be added to the block map. The length of the file as noted in the inode must be changed and the modification time updated to reflect the change, as well.
If a hard link is created to an existing file, the number of links field in the inode must be incremented. If a hard link is removed, the field must be decremented. If the number of links field drops to 0 (i.e., all instances of the file have been removed), the file space can then be reclaimed and the blocks can be restored to the free list. Figure 1.4 displays the file structures associated with two hard-linked files.
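This bookkeeping can be observed from user space. A minimal Python sketch (the file names are arbitrary) creates a hard link with `os.link`, confirms that both names share one inode, and watches the link count fall when one name is removed while the data stays reachable through the other.

```python
import os
import tempfile

def hard_link_demo(directory):
    original = os.path.join(directory, "report")
    alias = os.path.join(directory, "report.link")
    with open(original, "w") as f:
        f.write("data\n")
    os.link(original, alias)                 # second directory entry, same inode
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    links_before = os.stat(alias).st_nlink   # 2
    os.unlink(original)                      # removes one entry; count drops to 1
    links_after = os.stat(alias).st_nlink    # 1: the blocks are still in use
    with open(alias) as f:
        content = f.read()
    return same_inode, links_before, links_after, content

with tempfile.TemporaryDirectory() as d:
    print(hard_link_demo(d))   # (True, 2, 1, 'data\n')
```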
Adding or deleting a file involves a similar sequence of operations. When a file is added to a directory, an inode is reserved, space is allocated, the directory file is updated with the name and inode number of the new file, and the block map is updated. When a file is removed, its blocks are added to the free-block list, the directory file is updated as the file's entry is removed, and the inode is made available (provided there are no other links to the file).
When a user reads a file, the directory entry supplies the inode number, and the inode is used to determine file access privileges and to locate the file's data on disk. In fact, a file's access permissions are used in any file operation to determine whether the individual performing the operation has the proper privilege to do so. The file's content is then retrieved from disk.
If a file is moved, one or more directory files must be updated. If it is moved within a directory, a single directory file is modified. If it is moved from one directory to another, two directory files must be modified. Neither the block map nor the inode associated with the file is affected. If a file is moved from one file system to another, however, the process of establishing it on the new file system is like creating a new file. The existing inode can no longer be used because it belongs to the initial file system.
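A move within one file system can be checked from Python. `os.rename` wraps rename(2), which only works within a single file system; across file systems it fails with `EXDEV`, which is why mv falls back to copying the data and removing the original. This sketch (names arbitrary) moves a file into a subdirectory and compares inode numbers.

```python
import os
import tempfile

def rename_keeps_inode(directory):
    """Move a file into a subdirectory of the same file system; compare inodes."""
    os.mkdir(os.path.join(directory, "archive"))
    src = os.path.join(directory, "notes")
    dst = os.path.join(directory, "archive", "notes")
    open(src, "w").close()
    before = os.stat(src).st_ino
    os.rename(src, dst)          # updates two directory files; inode untouched
    return before == os.stat(dst).st_ino

with tempfile.TemporaryDirectory() as d:
    print(rename_keeps_inode(d))   # True
```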
The classification of files in Solaris can confuse neophytes. The first breakdown they often hear about is that of regular versus special files. The classification of regular files encompasses a wide range of file types as users think about them: binaries, scripts, data files, configuration files, and so on. The organizing element is this: The kernel doesn't make any distinction between any of these file types. Differences between them exist only at the user level (e.g., the content, whether they are executable, and so on). File types that don't fall into the regular category are directories, symbolic links, named pipes, sockets, and so on. These files are recognized as being different and are treated differently by the kernel. Table 1.1 lists file types and the characters used to designate them in a long listing (i.e., ls -l).
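The kernel's classification lives in the file-type bits of each inode's mode field. This Python sketch maps those bits to the characters `ls -l` prints; the table is our own illustration rather than a copy of Table 1.1, and the door entry reflects the `D` that Solaris `ls` shows for doors (Python's door test is simply false on systems without doors).

```python
import os
import stat
import tempfile

# Each file-type test in Python's stat module, paired with the character
# that ls -l prints in the first column for that type.
TYPE_CHARS = [
    (stat.S_ISREG, "-"),    # regular file
    (stat.S_ISDIR, "d"),    # directory
    (stat.S_ISLNK, "l"),    # symbolic link
    (stat.S_ISBLK, "b"),    # block special device
    (stat.S_ISCHR, "c"),    # character special device
    (stat.S_ISFIFO, "p"),   # named pipe (FIFO)
    (stat.S_ISSOCK, "s"),   # socket
    (stat.S_ISDOOR, "D"),   # Solaris door (false on systems without doors)
]

def type_char(path):
    """Return the ls -l type character for path, without following symlinks."""
    mode = os.lstat(path).st_mode
    for test, char in TYPE_CHARS:
        if test(mode):
            return char
    return "?"

with tempfile.TemporaryDirectory() as d:
    link = os.path.join(d, "to-root")
    os.symlink("/", link)
    for p in ("/", "/dev/null", link):
        print(type_char(p), p)
```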
Although the Solaris kernel does not differentiate between different types of regular files, users do. So do windowing systems. For this purpose, there is an underlying classing structure that identifies files by type. This structure enables the expected thing to happen when a user double-clicks on an icon within the file manager tool or drops it into another window (e.g., a print tool). In addition, the /etc/magic file is used to identify file types using embedded magic numbers. Not all file types have magic numbers, of course. For those that do, the offset (generally 0), type (length), and the identifying pattern are specified in the /etc/magic file. The entry 0 string %PDF-1.2 Adobe Portable Document Format (PDF) specifies that version 1.2 of the PDF format is identified by virtue of the fact that its files begin with the string %PDF-1.2. A user can determine the file type of a specific file by issuing the file command. This command will look at the first several bytes of the file and reference the /etc/magic file to determine the file type.
dumpxfer: executable /opt/LWperl/bin/perl script
log.out: ascii text
logmon: executable /bin/ksh script
mailman: executable /bin/ksh script
processlog: executable shell script
watcher: executable /bin/ksh script
nightlyreport: executable c-shell script
killit: whois: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
You can add file types to /etc/magic by editing the file and adding a line that gives the offset, type, value, and file type. For example, if you have a type of file called a testfile, which always starts with the string test, you would add the following /etc/magic entry:
#offset	type	value	File Type
0	string	test	testfile
With this entry, if you have a file called myfile with contents testing 1 2 3, the file command would identify the file type as shown here:
# file myfile
myfile:	testfile
Taking advantage of /etc/magic is a great way to help your users identify their files. Consider defining file types for any major applications that your users access.
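The lookup that file performs can be approximated in a few lines of Python. This is a simplified model, not the file command's actual algorithm: it assumes tab-separated fields, supports only `string` rules at fixed offsets, and returns the first match. The rules reuse the two entries discussed above; `parse_magic` and `identify` are hypothetical names.

```python
# Hypothetical rules in the style of /etc/magic: offset, type, value, description.
MAGIC_RULES = """\
#offset\ttype\tvalue\tFile Type
0\tstring\t%PDF-1.2\tAdobe Portable Document Format (PDF)
0\tstring\ttest\ttestfile
"""

def parse_magic(text):
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank lines and comments
        offset, kind, value, description = line.split("\t", 3)
        if kind == "string":               # numeric types are out of scope here
            rules.append((int(offset), value.encode(), description))
    return rules

def identify(data, rules):
    """Return the description of the first rule whose bytes match data."""
    for offset, value, description in rules:
        if data[offset:offset + len(value)] == value:
            return description
    return "unknown"

rules = parse_magic(MAGIC_RULES)
print(identify(b"testing 1 2 3\n", rules))   # testfile
print(identify(b"%PDF-1.2\n%...", rules))    # Adobe Portable Document Format (PDF)
```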
If you have never browsed through the wide range of file bindings available in your windowing system's binder tool, you will probably be impressed by the variety of file types that the windowing system can recognize and treat differently, although most of these you will probably never see. The kernel knows only the eight types listed in Table 1.1. Figure 1.5 illustrates a breakdown of files by type, from a user's point of view. The file types recognized by the kernel are circled. You can see from this incomplete breakdown that differentiating files by type can get to be quite arbitrary. The important thing to remember is that as far as the kernel is concerned, a regular file is a regular file. It will not try to stop you from making a file of C source code executable and running it any more than it would try to prevent you from printing a binary; you'll simply get some very odd results.
Directory files, as we've mentioned earlier, are special files that contain the names and inode numbers of the files they "contain." We put that word in quotes for a good reason. The relationship between directories and files exists only because of these entries. Directory entries, by the way (you might want to look at the include file dirent.h), also contain a file name length for each entry.
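The name-length field lets entries be packed tightly, and it also helps explain the 512-byte growth seen with ls -ld earlier. This sketch estimates a directory's size from its names. It assumes the traditional UFS entry layout (a 4-byte inode number, 2-byte record length, and 2-byte name length, followed by the NUL-terminated name padded to a 4-byte boundary) and ignores gaps left by deleted entries; the function names are hypothetical.

```python
DIRBLKSIZ = 512  # directories grow in 512-byte chunks, as described earlier

def entry_size(name: str) -> int:
    """Minimum on-disk size of one directory entry (assumed UFS layout)."""
    return 8 + ((len(name.encode()) + 1 + 3) & ~3)

def directory_size(names):
    """Bytes allocated if entries are packed without spanning a chunk."""
    chunks, used = 1, 0
    for name in [".", ".."] + list(names):
        size = entry_size(name)
        if used + size > DIRBLKSIZ:     # entry would cross a chunk boundary
            chunks, used = chunks + 1, 0
        used += size
    return chunks * DIRBLKSIZ

names = [f"file{i:010d}" for i in range(21)]   # 14-character names
print(directory_size([]))          # 512: an empty directory
print(directory_size(names[:20]))  # 512
print(directory_size(names))       # 1024
```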
Block special device and character special device files are unusual in that they do not contain data. Meant solely as access points for physical devices, such as disks and tapes, these files provide the means to access the related device drivers. The main difference between the two is how they handle input/output (I/O). Block special device files operate on blocks, while character special device files work on a character-by-character basis.
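Sockets are special files used for communication between processes. Network sockets generally connect processes on different systems, while the socket files that appear in a long listing are UNIX-domain sockets, which connect processes on the same system. Client/server applications, as well as many network services, use sockets.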
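A socket file of this kind can be created from Python. The sketch binds a UNIX-domain socket (the local variety that shows up as a special file), checks its file type, and passes one message from a client to the server within a single process; the file name and message are arbitrary.

```python
import os
import socket
import stat
import tempfile

def socket_roundtrip(directory, message=b"ping"):
    """Create a UNIX-domain socket file, then pass one message through it."""
    path = os.path.join(directory, "demo.sock")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)                  # bind() creates the socket file
        server.listen(1)
        is_socket = stat.S_ISSOCK(os.stat(path).st_mode)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            conn, _ = server.accept()
            with conn:
                client.sendall(message)
                received = conn.recv(1024)
    return is_socket, received

with tempfile.TemporaryDirectory() as d:
    print(socket_roundtrip(d))   # (True, b'ping')
```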
Named pipes, also called FIFOs (for "first in, first out"), are special files used for communication between processes, generally on the same system.
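A named pipe can be created and exercised from Python. Normally a reader and a writer would be separate processes; to keep this sketch self-contained, one process opens both ends, opening the read end non-blocking first so that neither `open` call waits. The file name and message are arbitrary.

```python
import os
import stat
import tempfile

def fifo_roundtrip(directory, message=b"hello through a FIFO\n"):
    """Create a named pipe and pass one message through it."""
    path = os.path.join(directory, "demo.fifo")
    os.mkfifo(path)
    is_fifo = stat.S_ISFIFO(os.stat(path).st_mode)
    # A blocking open of the read end would wait for a writer, so use O_NONBLOCK.
    reader = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    writer = os.open(path, os.O_WRONLY)   # succeeds at once: a reader exists
    try:
        os.write(writer, message)         # data goes through the pipe, first in, first out
        data = os.read(reader, len(message))
    finally:
        os.close(writer)
        os.close(reader)
    return is_fifo, data

with tempfile.TemporaryDirectory() as d:
    print(fifo_roundtrip(d))
```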
Symbolic links are pointers to other files. They are similar but not identical to shortcuts in Windows. Symbolic links (sometimes called symlinks or soft links) have as their contents the names of the files they point to.
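Because a symbolic link stores only a name, it can be read directly and can outlive its target. This Python sketch (names arbitrary) reads a link's contents with `os.readlink`, checks that the link's own size is the length of that name (as POSIX specifies for symlinks), and shows that removing the target leaves the link dangling, unlike a hard link.

```python
import os
import tempfile

def symlink_demo(directory):
    target = os.path.join(directory, "target.txt")
    link = os.path.join(directory, "shortcut")
    with open(target, "w") as f:
        f.write("contents\n")
    os.symlink("target.txt", link)   # the link stores this name, nothing more
    stored = os.readlink(link)       # read the link's own contents
    size = os.lstat(link).st_size    # length of the stored name, in bytes
    os.remove(target)                # removing the target leaves the link dangling
    dangling = os.path.islink(link) and not os.path.exists(link)
    return stored, size, dangling

with tempfile.TemporaryDirectory() as d:
    print(symlink_demo(d))   # ('target.txt', 10, True)
```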
Hard links, as you probably recognize, are not special files at all. As mentioned earlier in this chapter, a hard link is indistinguishable from the file it links to, except for its location within the file system; this is not true of symbolic links. In fact, either could be called the link or the file. There is no difference.
Controlling File Access with Access Control Lists
Recent versions of Solaris extend the common file access permissions with optional access control lists (ACLs). For example, if you want to associate a file with more than one group of individuals without having to go through the bother of creating a single group containing all of them or running the risk of making the file accessible to the world, ACLs will come in very handy. ACLs do not require any up-front preparation at the file system or directory level. This puts some of the responsibility for maintaining groups onto users, who can now, essentially, create their own groups. Before ACLs, it was nearly impossible to exclude a particular user from having access to any one of your files; now it's both possible and easy. Before ACLs, you could not associate more than one group with a file; now you can.
Each file in Solaris (or any version of Unix, for that matter) has a set of owner, group, and world permissions detailing what each of these entities can do with the file. This access control structure is stored in the inode and consulted each time a file access is attempted. We describe this traditional access control in Chapter 12. The subsidiary ACL is stored on disk and read into an in-core memory structure as needed. ACLs define accesses in excess of those provided by the traditional structure.
The content of an ACL entry can be described as three colon-separated fields, for example, user:mfs:-, where the first of the fields identifies the type of entry; the second holds the UID or GID for the entry type chosen; and the third holds the permissions to be set for that user or group. In our example, user:mfs:-, the user mfs is given no permission to the file in question. Even if the file permissions are set to rwxr-xr-x, mfs cannot read or execute it because the ACL specifically denies mfs this access.
The entry type field can be any of the values user, group, other, or mask. The mask entry sets the maximum permissions that can be granted through the named user entries and the group entries; it limits, rather than adds to, those entries, and it does not affect the owner.
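How the entry types interact can be modeled in a few lines of Python. This is a simplified sketch of POSIX-draft ACL evaluation (owner entry first, then a named user entry capped by the mask, then any matching group entries capped by the mask, then other). It compares names rather than the numeric UIDs and GIDs the real check uses, and `parse_entry` and `acl_allows` are hypothetical helpers.

```python
RWX = frozenset("rwx")

def parse_entry(text):
    """Split 'user:mfs:r-x' or 'mask:r-x' into (type, qualifier, permission set)."""
    parts = text.split(":")
    kind, qualifier, perms = (parts[0], "", parts[1]) if len(parts) == 2 else parts
    return kind, qualifier, frozenset(c for c in perms if c in "rwx")

def acl_allows(acl, owner, owning_group, user, groups, request):
    """Simplified ACL check: does `user` (a member of `groups`) get `request`?"""
    entries = [parse_entry(e) for e in acl]

    def lookup(kind, qualifier=""):
        return next((p for k, q, p in entries if k == kind and q == qualifier), None)

    mask = lookup("mask")
    mask = RWX if mask is None else mask
    request = frozenset(request)
    if user == owner:                         # owner: user:: entry, never masked
        return request <= (lookup("user") or frozenset())
    named = lookup("user", user)
    if named is not None:                     # named user entry, capped by the mask
        return request <= (named & mask)
    matching = [p for k, q, p in entries if k == "group" and
                ((q == "" and owning_group in groups) or (q != "" and q in groups))]
    if matching:                              # any matching group entry may grant
        return any(request <= (p & mask) for p in matching)
    return request <= (lookup("other") or frozenset())

acl = ["user::rwx", "user:mfs:-", "group::r-x", "mask:r-x", "other:r-x"]
print(acl_allows(acl, "em", "staff", "mfs", {"staff"}, "r"))   # False
print(acl_allows(acl, "em", "staff", "joe", {"staff"}, "r"))   # True
```

Note that mfs is refused even though mfs belongs to the owning group: the named user entry is consulted before any group entry.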
The two commands that are used to set and report on ACLs are setfacl and getfacl. The getfacl command displays the owner, group, and ACLs established for a particular file. With the -a option, the command displays the owner, group, and access control list for a file. With a -d option, the command displays the owner, group, and default ACLs established for the file. Examples of the command and responses follow:
spoon% getfacl regfile
# file: regfile
# owner: em
# group: staff
spoon% getfacl extfile
# file: extfile
# owner: em
# group: staff
Not all file system types support ACLs, but they are supported in UFS, NFS, cachefs, and LOFS.
The setfacl command sets, modifies, or removes the permissions for a file or group of files. It can be used to replace the ACL or to modify particular entries. It has a number of options.
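-s Sets the file's ACL, replacing all existing entries with those specified on the command line
-f Sets the file's ACL from the entries in the specified file
-d Deletes one or more entries from the file's ACL
-m Modifies one or more ACL entries, adding any that do not yet exist
-r Recalculates the mask permissions from the other ACL entries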
Keep in mind that the setfacl command doesn't create an ACL if it isn't needed. If the user isn't specifying permissions beyond those provided in the normal permissions fields of the inode, the command simply alters those values. In other words, the setfacl command can be used to modify the normal permissions for the normal users (owner, group, and world)-- just like a chmod or chgrp command-- or it can be used to add permissions for specific other users or additional groups, something that chmod and chgrp cannot do.
Where a setfacl command is similar to a chmod or chgrp command, a getfacl is similar to an ls -l. They do similar things, but use a different format both for the command and for the output.
Beginning with Solaris 7, the standard Solaris UFS file system incorporates a logging function that, while transparent to users, makes the file system considerably more reliable. Those of us who remember sitting through painfully long fscks will probably be delighted by the "File system is stable" messages that we now usually see when rebooting a system that has crashed. The reason for the change is logging. Because UFS file systems now maintain a log detailing updates to files, they have the means for recovering the state of a file system without an extensive analysis and rebuilding by fsck.
The fsck command makes five passes over a file system, checking different file system structures in each pass, as follows:
1. Blocks and sizes.
2. Pathnames.
3. Connectivity.
4. Reference counts.
5. Cylinder groups.
By examining the file system from each of these perspectives, fsck can repair damage and return the file system to a consistent state.
Most file system inconsistencies in the past were due to the constant state of fluctuation that file systems are in and the fact that changes to file systems usually require several operations, as shown in the previous section. Modified data blocks must be written back to disk; additional blocks must be allocated and represented in the block map; inodes must be created, removed, and updated; and directory structures must be updated to reflect new files, deleted files, and moved files. If a crash occurs when some, but not all, of these changes have taken place, an inconsistency is created. The fsck tool attempts to resolve this kind of inconsistency by examining the various structures and determining how to complete the interrupted file system changes. It examines inodes, the block map, and directory files in an effort to piece together a consistent file system by adding, removing, or modifying file system structures. For example, if a directory contains the name and inode number of a file that no longer exists, fsck can remove the reference. When fsck knows nothing about the state of a file system that it is asked to check, it is extremely thorough and time-consuming while examining and repairing it.
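The flavor of these repairs can be shown with a toy Python model. It is not how fsck is implemented: the "file system" is a pair of dictionaries, only regular files are modeled, and the checks loosely mirror the reference-count pass listed above plus the removal of dangling directory entries described in this paragraph. All names are hypothetical.

```python
def check_consistency(directories, inodes):
    """Toy consistency pass over an in-memory model of a file system.

    directories: {directory_name: {entry_name: inode_number}}
    inodes:      {inode_number: {"nlink": recorded_link_count}}
    Fixes are applied in place; a list of repairs is returned.
    """
    repairs = []
    references = {ino: 0 for ino in inodes}
    for dname, entries in directories.items():
        for name, ino in list(entries.items()):
            if ino not in inodes:                # entry points at a missing inode
                del entries[name]
                repairs.append(f"{dname}/{name}: removed entry for missing inode {ino}")
            else:
                references[ino] += 1
    for ino, count in references.items():
        if count == 0:                           # allocated but unreachable
            repairs.append(f"inode {ino}: unreferenced (a candidate for lost+found)")
        elif inodes[ino]["nlink"] != count:      # link count disagrees with entries
            repairs.append(f"inode {ino}: link count {inodes[ino]['nlink']} -> {count}")
            inodes[ino]["nlink"] = count
    return repairs

dirs = {"/home": {"a": 10, "b": 10, "ghost": 99}}
inodes = {10: {"nlink": 1}, 11: {"nlink": 1}}
for line in check_consistency(dirs, inodes):
    print(line)
```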
The way that logging works in Solaris today is as follows. First, there is a logging file structure in the same partition as the file system. Before any change is made to the file system, a record is written to this log that documents the intended change. The file system change is then made. Afterward, the log is once again modified-- this time to reflect the completion of the intended change. The log file, in other words, maintains enough information about the status of the file system to determine whether it is intact. If there are outstanding changes at mount time, the file structures associated with those changes are checked and adjusted as needed, and an examination of the entire file system is avoided.
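The sequence described above can be modeled in Python. This is a toy illustration of the idea (log the intent, make the change, log completion, and at mount time replay only what is unfinished); it is not how UFS lays out or replays its log, and every name in it is hypothetical. Replaying is safe here because each logged change simply sets a value, so applying it twice has the same effect as applying it once.

```python
class IntentLog:
    """Toy intent log: record the change, apply it, then record completion."""

    def __init__(self):
        self.records = []            # ("begin", txid, change) or ("done", txid)

    def outstanding(self):
        done = {r[1] for r in self.records if r[0] == "done"}
        return [(r[1], r[2]) for r in self.records
                if r[0] == "begin" and r[1] not in done]

def update(fs, log, txid, key, value, crash=False):
    log.records.append(("begin", txid, (key, value)))   # 1. log the intent
    if crash:
        return                                           # simulated crash mid-update
    fs[key] = value                                      # 2. change the file system
    log.records.append(("done", txid))                   # 3. log completion

def mount(fs, log):
    """Replay only the unfinished changes instead of scanning everything."""
    pending = log.outstanding()
    for txid, (key, value) in pending:
        fs[key] = value
        log.records.append(("done", txid))
    return len(pending)

fs, log = {}, IntentLog()
update(fs, log, 1, "inode 10 size", 8192)
update(fs, log, 2, "dir /home entry b", 11, crash=True)
print(mount(fs, log), fs)   # 1 {'inode 10 size': 8192, 'dir /home entry b': 11}
```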
Types of File Systems
There are a number of file systems that Solaris can support, out of the box or with some additional software. Before we get into all of the file systems, we will briefly describe the major categories of file systems: disk-based, memory-based, network-based, and so on. The framework within which these different file system types are supported simultaneously has come to be known as the virtual file system. The generalization of the inode structure into the vnode makes allowances for the non-UFS types.
The basic difference between different disk-based file systems is how they store and access data. Earlier in this chapter, we described how UFS structures such as inodes, block maps, and directories are used to organize data and make it available. It's easy to imagine a file system without directory files; it would simply have a "flat" file space, though it might still use inodes and lists to keep track of files and free and used space. UFS, along with probably every other file system available today, however, is hierarchical. It's fairly accurate to picture each file system as a structure between you and the data on disk that allows you to find and use portions you think of as files.
UFS is the standard Berkeley fat fast file system (FFFS) and the one we have been discussing. UFS is the file system you will most commonly encounter on Solaris systems. Others in this list are third-party file systems with features similar to those of UFS. VxFS is a product from VERITAS Corporation and QFS is from LSC. These two file systems provide some advantages over UFS, primarily by providing for extremely large files and file systems.
For UFS, the maximum for both is a terabyte; for the others the maximum is vastly larger, in principle 2^63 bytes (roughly 8 exabytes). Another difference is whether the file system is block- or extent-based, that is, whether files are extended a block at a time or by a run of contiguous blocks (an extent). A third is whether there is support for hierarchical storage management (HSM), in which data transparently flows from faster to slower media on the basis of infrequent use; HSM is not available in UFS, but is available in VxFS and QFS. Some of the major differences between these and the UFS file systems are shown in Table 1.3.
If you have a particular need for file systems larger than 1 TB or for some of the other features shown in Table 1.3, the article by McDougall cited in the Recommended Readings will provide further insight into the trade-offs of using these products.
In addition to Unix file systems, Solaris supports pcfs and hsfs. These file system types are most commonly encountered when dealing with diskettes and CD-ROMs, but they are not necessarily restricted to these media types.
The pcfs-type file system implements the DOS FAT file system. Once it is mounted, standard Unix commands such as ls and cd can be used with its files. One should be careful, however, in managing the differing line termination conventions between DOS and Unix files. Whereas DOS files end lines with both a carriage return and a line feed, Unix files end lines with only a line feed. The unix2dos and dos2unix commands overcome this minor inconsistency quite handily.
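If unix2dos and dos2unix are not at hand, the line-ending conversion itself is simple enough to sketch with awk. The helper names below are hypothetical stand-ins, not the Solaris commands, and they handle only line endings, not any character-set translation the real commands may perform.

```shell
#!/bin/sh
# Hypothetical stand-ins for dos2unix/unix2dos: line endings only.
# Both read standard input and write standard output.

# DOS -> Unix: remove one carriage return at the end of each line.
dos_to_unix() {
    awk '{ sub(/\r$/, ""); print }'
}

# Unix -> DOS: end every line with carriage return + line feed.
# (Running it on a file that is already in DOS format would double the CR.)
unix_to_dos() {
    awk '{ printf "%s\r\n", $0 }'
}
```

Typical use is as a filter, for example dos_to_unix < /pcfs/README.TXT > readme.txt, where the paths are only illustrative.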
The hsfs is the High Sierra file system, also referred to by the name ISO 9660. The Solaris implementation also supports the Rock Ridge extensions. While the file system is mounted, standard Unix commands can be used to access files and move around in directories. However, the underlying geometry and structure of an hsfs file system is dramatically different from that of UFS, and, of course, you cannot write to files on a CD-ROM.
File systems of type tmpfs may look and act like disk-based file systems, but they actually reside in memory rather than on disk. Their space comes from the system's virtual memory, the same pool of physical memory and swap that running processes draw on, so /tmp competes with processes for swap. Data is not written out to the swap area unless physical memory is exhausted. The contents of tmpfs file systems are, therefore, temporary. In fact, the /tmp file system, the only file system of this type on most Solaris systems, is emptied out on a system reboot. We briefly describe swapping a little later in this chapter.
The Network File System (NFS), now available on virtually every operating system, ties systems together by providing an intuitive file-sharing mechanism across a network and divergent platforms. Happily, differences in the underlying systems' processing (e.g., whether the processor orders bytes into words using the big-endian or little-endian method) are transparent. With automount support, NFS performance can be greatly improved.
Pseudo File Systems
Pseudo file systems, though they may appear as file systems to users, are actually abstractions through which various system and process data can be accessed. Pseudo file systems are used to represent such things as processes, network sockets, device drivers, and FIFOs. In our discussion of the /proc file system, we will clearly show the benefits of accessing information about running processes through pseudo file system abstraction.
Pseudo file systems include those listed in Table 1.4.
Cachefs is a file system type that inserts itself into the interaction between an NFS server and a client. Blocks of data obtained by the client are stored (or cached) in the cachefs portion of the local disk. Cachefs improves access to slow or heavily used file systems. The local copies of files are accessed after the initial fetch, making subsequent fetches unnecessary and reducing traffic on the network. Cachefs is optional and not used by default. File systems are not cached in this way unless you create a cachefs file system and specify it in your mount of the remote file system, either in your /etc/vfstab file or in automount maps.
Cachefs uses the concepts of back and front file systems: the back file system is the authoritative, actual file system, and the front file system holds the local copy. The back file system is unaware of the way its files are being cached locally. Most accesses are satisfied from the front file system. Whenever you access a file on the back file system, it is cached on the front file system.
The contents of the local cache can become outdated if the original file on the server is modified after the file has been cached. The system, therefore, keeps track of each cached file's attributes and checks these periodically against the copy on the server to determine whether modification has happened. If it has, the local copy is purged and the next access results in a new load of the file from the server. The time interval between checks of file attributes can be modified.
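The attribute check described above can be modeled in a few lines of shell. This is a toy, not cachefs itself: the helper cache_fetch is hypothetical, it compares modification times with find -newer on every call rather than at timed intervals, and ordinary files stand in for the back and front file systems.

```shell
#!/bin/sh
# Toy model of cachefs-style consistency checking; NOT the cachefs code.
# Hypothetical helper: BACK stands in for the server's file,
# FRONT for the locally cached copy.
cache_fetch() {
    back=$1
    front=$2
    # Reload if there is no cached copy, or if the back file's modification
    # time is newer than the cached copy's (find prints the path if so).
    if [ ! -f "$front" ] || [ -n "$(find "$back" -newer "$front")" ]; then
        rm -f "$front"              # purge the stale local copy
        cp -p "$back" "$front"      # reload; -p copies the back file's mtime
    fi
    cat "$front"                    # reads are served from the front copy
}
```

Because cp -p gives the cached copy the same timestamp as the original, later calls reuse the local copy until the original changes again.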
The cfsadmin command is used to build and manage caches. Its options are as follows:
-c Creates a cache within the specified directory
-d Removes the specified file system from the cache
-l Lists file systems stored in the specified cache
-s Requests a consistency check with the back file system
-u Updates the resource parameters of the specified cache directory (all file systems in the cache must be unmounted at the time you issue this command)
See the man page for the cfsadmin command for additional details. In the following example, a cache is created and used with the NFS file system /usr/local.
spoon# cfsadmin -c /vol/cache/cache1
spoon# mount -F cachefs -o backfstype=nfs,backpath=/usr/local,\
cachedir=/vol/cache/cache1 server:/usr/local /usr/local
Automount entries using cachefs might look like one of the following:
/home auto.home -fstype=cachefs,backfstype=nfs
* -fstype=cachefs,backfstype=nfs,cachedir=/vol/cache/cache1 \
server1:/export/home/&
In these examples, the arguments specify the following:
backfstype. The type of the file system being backed.
backpath. Where the back file system is currently mounted.
cachedir. The name of the local (front) directory created to be used as cache.
As the example shows, the cache is mounted at the same path as the file system it is backing. In other words, the mount of the cache is transparent as far as the user is concerned: the file system mounted as /usr/local is still referred to as /usr/local. This is not necessary, but it is a good convention to adopt.
It is important to understand that, as much as cachefs can provide a dramatic improvement in NFS performance, it can also degrade performance quite dramatically. The overall effect hinges on the balance between local and remote file accesses. The more often cached files can be reused, the greater the savings from not having to go back to the server to fetch them again. If, on the other hand, nearly every file access requires an update of the file from the server, little is gained. Further, when a cached file is updated on the client, the local copy is marked invalid and the file is written back to the server. When the file is needed again, it is fetched from the server. This behavior results in more NFS overhead than there would be if the file system were not cached.
In general, file systems that change frequently are not good candidates for cachefs because of the possible performance hits. File systems that change infrequently, such as /usr/local (usually a repository of site-specific tools), are good candidates. Cachefs options can be specified in automounter maps. In the wildcard entry shown here, we're using cachefs for an entire indirect map:
* -fstype=cachefs,backfstype=nfs,cachedir=/vol/cache/cache1 \
server:/export/home/&
Swapping should not be confused with paging. These are related but different system activities. A Solaris system replaces pages (fixed-size chunks of a process's code and data) in memory as needed to give all running processes opportunities to execute. This is paging. While this is happening, the page daemon is keeping track of memory requests. When the demand for memory is excessive and the system is having difficulty maintaining the free-memory list, entire processes, rather than just pages, are replaced in memory. This is swapping.
When swapping occurs, anonymous memory (basically, the process's data) is moved to the swap area while other pages are flushed to disk. Later, when a swapped-out process is allowed to run again, its pages in swap are returned to memory, and page faults (i.e., exceptions indicating that desired pages are not in memory) cause the pages corresponding to executable code to be reloaded from disk.
Though /tmp looks like a normal file system to users and has open permissions so that anyone can store files there, you don't want users to do so, both because these files will not survive a reboot and because an excessive use of the space in /tmp could affect performance if swapping were required. In older versions of Solaris, files were removed from /tmp on reboot, but not directories and their contents. This behavior has since changed and now a recursive operation cleans out everything.
A Solaris system with sufficient memory to hold all processes and data wouldn't actually need disk-based swap space at all. Few, if any, of our systems are ever this well endowed. At the other end of the spectrum, systems that use swap on a regular basis will obviously be suffering; their performance will be unacceptable. On an adequately endowed system, the swap -l command will often show results such as these, indicating that available swap (the disk-based portion) is only lightly used (i.e., most of it is free):
myhost% swap -l
swapfile dev swaplo blocks free
/dev/dsk/c0t3d0s1 32,25 8 524392 417400
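Because swap -l reports sizes in 512-byte blocks, the figures are easier to judge once converted. The awk filter below (swap_summary is a hypothetical name) totals every swap device listed and reports how much is in use; it reads the blocks and free values from the last two columns. The sample above works out to roughly 256MB of swap, about 20 percent of it used.

```shell
#!/bin/sh
# Hypothetical helper: summarize `swap -l` output read on standard input.
# Assumes blocks and free are the last two columns and are 512-byte
# blocks, so 2048 blocks = 1MB.
swap_summary() {
    awk '$1 != "swapfile" && NF >= 5 {
             total += $(NF - 1)   # size of each swap device, in blocks
             free  += $NF         # unused blocks on that device
         }
         END {
             if (total == 0) { print "no swap devices"; exit }
             printf "total %.0f MB, used %.1f%%\n", total / 2048, (total - free) * 100 / total
         }'
}
```

Usage: swap -l | swap_summary.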
Making Use of /proc
The /proc file system is an interesting and very useful file system. Though its contents do not correspond to files on disk, it appears to the user as a regular file system. Users can cd into /proc as with any directory for which they have execute permission. They can list its contents with ls or ls -l.
What the contents of the /proc file system represent, however, are not files but interfaces to kernel structures and running processes. By providing an interface that looks like a file system, /proc simplifies the work involved in accessing information available about these processes. For example, before the advent of /proc, programs that use information about kernel structures (e.g., the ps command, which reads and displays the process table) had to read kernel memory. With the /proc file system, these structures are more readily available. This is useful not only to the OS developers, but to any programmers or sysadmins needing access to this type of information.
Although /proc may not be every sysadmin's favorite place to poke around, it offers advantages in debugging. With images of running processes as readily available as files, they are much easier to analyze. Solaris provides a set of tools to facilitate access even further: the commands in /usr/proc/bin, discussed in the next section.
When you examine the contents of /proc, the first thing you'll likely notice is that the directory names correspond to the process IDs of running processes. The init process, for example, will show up as a directory named 1. Within each process directory are a number of files or subdirectories which represent different aspects of the corresponding process.
spoon% cd /proc
spoon% ls
0 1129 1281 167 2 22416 23334 27148 28524 3532 382
1 11352 13400 17018 20201 22882 23508 27153 28535 362 436
10382 11353 13403 17065 20281 230 23509 27159 28572 363 467
1039 11354 136 174 20318 23073 23514 28070 29141 366 468
1040 1136 14091 181 20319 23076 23520 28071 3 367 484
1042 1143 14092 18437 20378 23261 238 28072 309 369 5191
1058 1144 158 18494 20418 23264 23898 28437 311 374 5194
1060 11569 1608 187 208 23286 23901 28498 316 375 5705
1068 11570 1615 19024 21357 23289 24431 28508 322 376 5722
1107 11571 16279 19028 217 23308 248 28509 327 377 5725
1110 11591 16280 19570 22247 23311 254 28510 328 378 9562
1125 124 16338 19574 22252 23312 25463 28515 332 379 9877
1126 126 16376 19881 22362 23315 265 28517 349 380
1127 1277 166 19884 22364 23331 27147 28522 3529 381
If you were to compare this list with a ps -ef command, you would notice the matching process IDs.
spoon% ps -ef | head -11
UID PID PPID C STIME TTY TIME CMD
root 0 0 0 Mar 11 ? 0:02 sched
root 1 0 0 Mar 11 ? 76:54 /etc/init -
root 2 0 0 Mar 11 ? 4:55 pageout
root 3 0 0 Mar 11 ? 247:55 fsflush
nobody 3529 349 0 15:18:44 ? 0:01 /usr/sbin/httpd-apache -f /etc/httpd-apache.conf
root 362 1 0 Mar 11 ? 0:11 /usr/lib/saf/sac -t 300
root 124 1 0 Mar 11 ? 90:48 /usr/sbin/rpcbind
root 126 1 0 Mar 11 ? 0:06 /usr/sbin/keyserv
root 187 1 0 Mar 11 ? 3:05 /usr/sbin/cron
daemon 166 1 0 Mar 11 ? 0:09 /usr/lib/nfs/statd
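The same correspondence can be checked from a script. The sketch below (proc_pids is a hypothetical helper) lists the numeric entries of a /proc-style directory in numeric order; its output could then be compared with the PID column of ps -e. It looks only at directory names, not at the binary files inside each process directory.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric entries (PIDs) of a /proc-style
# directory, sorted numerically. The directory defaults to /proc.
proc_pids() {
    dir=${1:-/proc}
    for entry in "$dir"/*; do
        name=${entry##*/}           # strip the leading directory
        case $name in
            *[!0-9]*|'') ;;         # skip anything that is not all digits
            *) echo "$name" ;;
        esac
    done | sort -n
}
```

For instance, proc_pids | wc -l gives a count that should be close to that of ps -e | wc -l (which also counts a header line).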
Each process is actually represented by a directory that holds a number of files and other directories corresponding to different portions and attributes of the process. Note the various structures associated with a single process in the following example. The owner and group reflect those of the person running the process. Read the man page on /proc for additional information.
spoon% ls -l 23308
-rw------- 1 nici staff 1323008 May 31 12:02 as
-r-------- 1 nici staff 152 May 31 12:02 auxv
-r-------- 1 nici staff 40 May 31 12:02 cred
--w------- 1 nici staff 0 May 31 12:02 ctl
lr-x------ 1 nici staff 0 May 31 12:02 cwd ->
dr-x------ 2 nici staff 1184 May 31 12:02 fd
-r--r--r-- 1 nici staff 120 May 31 12:02 lpsinfo
-r-------- 1 nici staff 912 May 31 12:02 lstatus
-r--r--r-- 1 nici staff 536 May 31 12:02 lusage
dr-xr-xr-x 3 nici staff 48 May 31 12:02 lwp
-r-------- 1 nici staff 1536 May 31 12:02 map
dr-x------ 2 nici staff 288 May 31 12:02 object
-r-------- 1 nici staff 1952 May 31 12:02 pagedata
-r--r--r-- 1 nici staff 336 May 31 12:02 psinfo
-r-------- 1 nici staff 1536 May 31 12:02 rmap
lr-x------ 1 nici staff 0 May 31 12:02 root ->
-r-------- 1 nici staff 1440 May 31 12:02 sigact
-r-------- 1 nici staff 1232 May 31 12:02 status
-r--r--r-- 1 nici staff 256 May 31 12:02 usage
-r-------- 1 nici staff 0 May 31 12:02 watch
-r-------- 1 nici staff 2432 May 31 12:02 xmap
The /proc Commands
The set of commands that facilitate use of the information stored in a /proc file system, sometimes referred to as the proc tools, includes the following: pcred, pfiles, pflags, pldd, pmap, prun, psig, pstack, pstop, ptime, ptree, pwait, and pwdx.
The command pcred prints the effective, real, and saved UID and GID of a running process. The command pcred 1234, as shown here, shows that process 1234 has UID of 111 and GID of 11:
solaris% pcred 1234
1234: e/r/suid=111 e/r/sgid=11
The pfiles command lists open files associated with the specified process along with any file limits the process may have imposed. Each open file is described by a number of values, including the inode number, major and minor device numbers of the partition in which the file is stored, UID, GID, and permissions. Although the identity of each file may not be easily apparent from the display, any particular file can be identified by using a combination of the device numbers and inode number, as we shall explain shortly:
spoon% pfiles 28555
Current rlimit: 64 file descriptors
0: S_IFCHR mode:0666 dev:32,24 ino:127001 uid:0 gid:3 rdev:13,2 O_RDONLY|O_LARGEFILE
1: S_IFCHR mode:0666 dev:32,24 ino:127001 uid:0 gid:3 rdev:13,2 O_RDONLY|O_LARGEFILE
2: S_IFCHR mode:0666 dev:32,24 ino:127001 uid:0 gid:3 rdev:13,2 O_RDONLY|O_LARGEFILE
3: S_IFDOOR mode:0444 dev:163,0 ino:59379 uid:0 gid:0 size:0 O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[208]
15: S_IFCHR mode:0620 dev:32,24 ino:127110 uid:131 gid:7 rdev:24,6 O_RDWR FD_CLOEXEC
16: S_IFCHR mode:0620 dev:32,24 ino:127110 uid:131 gid:7 rdev:24,6 O_RDWR FD_CLOEXEC
17: S_IFCHR mode:0620 dev:32,24 ino:127110 uid:131 gid:7 rdev:24,6 O_RDWR FD_CLOEXEC
18: S_IFCHR mode:0620 dev:32,24 ino:127110 uid:131 gid:7 rdev:24,6 O_RDWR FD_CLOEXEC
19: S_IFCHR mode:0620 dev:32,24 ino:127110 uid:131 gid:7 rdev:24,6 O_RDWR FD_CLOEXEC
In the example shown, the process is using nine file descriptors, but only three actual files. This is evident from looking at the combination of the device numbers (e.g., 32,24) and the inode number (e.g., 127001). Since the same inode number can be used in different file systems for different files, only the combination of these values identifies a file uniquely.
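The device-plus-inode reasoning is easy to automate. This awk sketch (distinct_files is a hypothetical name) counts descriptors and distinct (device, inode) pairs in pfiles output; it accepts both dev:32,24 and dev: 32,24 spacing. Applied to the output above, it reports 9 descriptors and 3 distinct files.

```shell
#!/bin/sh
# Hypothetical helper: count descriptors and distinct files in `pfiles`
# output on stdin. A file is identified by its (device, inode) pair.
distinct_files() {
    awk '{
             dev = ""; ino = ""
             for (i = 1; i <= NF; i++) {
                 # accept both "dev:32,24" and "dev: 32,24"
                 if ($i ~ /^dev:/) { dev = substr($i, 5); if (dev == "") dev = $(i + 1) }
                 if ($i ~ /^ino:/) { ino = substr($i, 5); if (ino == "") ino = $(i + 1) }
             }
             if (dev != "" && ino != "") { fds++; seen[dev "/" ino] = 1 }
         }
         END {
             for (k in seen) files++
             printf "%d descriptors, %d distinct files\n", fds, files
         }'
}
```

Usage: pfiles 28555 | distinct_files.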
The pflags command displays the status of the process, as shown in the following example:
spoon% pflags 28555
data model = _ILP32 flags = PR_ORPHAN
/1: flags = PR_PCINVAL|PR_ASLEEP [ sigsuspend(0xeffff878) ]
sigmask = 0x00000002,0x00000000
The pgrep command searches for processes by name using the /proc interface. Many sysadmins used to using commands like ps -ef | grep sendmail will come to appreciate the simplification of pgrep sendmail.
The pkill command sends a signal to processes selected by name. Like the pgrep command, it matches processes against the name given on the command line. Sysadmins familiar with the killbyname script will find its operation familiar. By replacing command sequences such as ps -ef | grep inetd followed by kill -HUP 101 with a single command (e.g., pkill -HUP inetd), pkill saves time and simplifies scripting as well.
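One reason pgrep is an improvement is that ps -ef | grep sendmail also matches the grep process itself, because its own command line contains the word sendmail. Comparing the command name exactly avoids that. The filter below (pids_named is a hypothetical helper, a rough approximation of pgrep -x rather than a replacement for it) reads pid and command pairs, such as those produced by the POSIX form ps -e -o pid= -o comm=.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs whose command name is exactly $1,
# reading "pid command" lines on stdin.
pids_named() {
    awk -v name="$1" '{
        cmd = $2
        sub(/.*\//, "", cmd)        # compare the basename only
        if (cmd == name) print $1
    }'
}
```

For example, kill -HUP $(ps -e -o pid= -o comm= | pids_named inetd) does roughly what pkill -x -HUP inetd does.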
The pldd command lists the dynamic libraries used with the process.
The pmap command displays the process's address space with memory segment sizes, as shown in the following example output:
spoon% pmap 28555
00010000 140K read/exec /usr/bin/csh
00042000 12K read/write/exec /usr/bin/csh
00045000 68K read/write/exec [ heap ]
EF600000 648K read/exec /usr/lib/libc.so.1
EF6B1000 32K read/write/exec /usr/lib/libc.so.1
EF6B9000 4K read/write/exec [ anon ]
EF6C0000 8K read/exec /usr/lib/libmapmalloc.so.1
EF6D1000 4K read/write/exec /usr/lib/libmapmalloc.so.1
EF700000 168K read/exec /usr/lib/libcurses.so.1
EF739000 32K read/write/exec /usr/lib/libcurses.so.1
EF741000 12K read/write/exec [ anon ]
EF780000 4K read/exec /usr/lib/libdl.so.1
EF7B0000 4K read/write/exec [ anon ]
EF7C0000 116K read/exec /usr/lib/ld.so.1
EF7EC000 8K read/write/exec /usr/lib/ld.so.1
EFFF6000 40K read/write [ stack ]
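The sizes in pmap output can be totaled to see how much of a process's address space is private anonymous memory (the heap, stack, and [ anon ] segments) rather than file-backed mappings such as libraries. A sketch, with pmap_summary as a hypothetical name; for the output above it gives a total of 1300K, 128K of it anonymous. Keep in mind that summed mapping sizes describe address space, not physical memory actually in use.

```shell
#!/bin/sh
# Hypothetical helper: total the segment sizes in `pmap` output on stdin.
# Anonymous segments are labeled "[ heap ]", "[ anon ]" or "[ stack ]";
# file-backed segments show a path instead.
pmap_summary() {
    awk '$1 != "total" && $2 ~ /^[0-9]+K$/ {
             kb = $2 + 0            # "140K" + 0 yields 140
             total += kb
             if (index($0, "[") > 0) anon += kb
         }
         END { printf "total %dK, anonymous %dK\n", total, anon }'
}
```

Usage: pmap 28555 | pmap_summary. Any summary line that pmap itself prints (beginning with "total") is skipped so it is not counted twice.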
The prun command sets processes running and is the opposite of the pstop command.
The psig command lists the signal actions of the process.
The pstack command shows the stack trace for each process thread.
The pstop command stops processes, much as sending a SIGSTOP signal (kill -STOP) would; unlike the default kill command, which sends SIGTERM, it does not terminate them. It is the opposite of the prun command.
The ptime command displays timing information for processes-- real, user, and system. The following sample output illustrates:
spoon% ptime 28555
The ptree command prints the process trees containing the specified process IDs or users:
spoon% ptree 28555
158 /usr/sbin/inetd -s
28553 in.telnetd
28586 ptree 28555
The command pwait waits for the process to terminate.
The pwdx command prints the current working directory of the specified process, as shown in the following example:
spoon% pwdx 28555
28555: /home/nici
Building and Maintaining File Systems
With respect to function, there is little that is more fundamental to systems maintenance than your users' access to their files. Maintenance of file systems--ensuring their availability and integrity--is, therefore, one of the most important aspects of any sysadmin's job.
For many sysadmins, the task of building file systems is simply something that happens during installation. Whether you use the automatic layout feature or profiles in JumpStart, the building of file systems is automated. If you are adding a new disk to a system, on the other hand, you may need to repartition it and manually construct file systems to your own specifications. Once the disk has been physically attached to the system, you want to be sure that the system recognizes it. For Small Computer System Interface (SCSI) disks, you should be sure that the SCSI target is unique for the intended bus. On a system that already has a /dev/dsk/c0t3d0 and a /dev/dsk/c0t0d0 disk, the most common next choice is /dev/dsk/c0t1d0 (SCSI target 1). We recommend attaching new drives when the system is halted.
When you first power up a system after adding a drive, you should stop at the boot monitor (i.e., press L1-A or Stop-A before it gets too far along in the boot process) and type probe-scsi at the ok prompt. With OpenBoot 3.0, the recommended procedure is to setenv auto-boot? false and then do a reset. This interaction should look something like this:
Type 'go' to resume
Type help for more information
Unit 0 Disk Seagate ST15230N 1234
showing you that the new disk is recognized. If you don't see your disk at this point, you probably have a problem with your SCSI cabling or with the disk itself. It's pointless to proceed without resolving this problem.
In order to build support for a new disk, you need to reboot with the command boot -r. This command instructs the system to rebuild its device configuration, and it will configure the device support needed for the new disk. If for some reason you cannot easily boot the system at the console, you can achieve the same result indirectly by touching the file /reconfigure and then rebooting the system. (Yes, you can do this remotely.) The boot sequence will then include the rebuild and remove the file, preventing the rebuild from occurring during every subsequent reboot. The correct device nodes for the disk are automatically added. If you cannot reboot, or if you rebooted but forgot the -r, you can simulate the reconfigure boot with the following set of commands:
drvconfig. Reconfigures the drivers
disks. Looks for new disks and adds the correct /dev links
Once you're booting, you should notice references to the new disk that will look roughly like what follows:
sd1 at esp0: target 1 lun 0
/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0
<SEAGATE-ST15230N-0638 cyl 3974 alt 2 hd 19 sec 111>
This is a second indication that the disk is recognized and you will be able to access it when the system is fully booted. Most disks that you are likely to purchase will be preformatted and already partitioned. The question is, of course, whether they are partitioned to your liking. Usually, new disks are partitioned as if they are going to contain all the ordinary file systems for a Unix host. If you want to use a disk as one big file system, you don't need to change anything. Simply use slice 2 (once known as partition c), which represents the entire disk. Given the size and price of disks today, you might easily wind up with a 9- or 18-GB file system! You should consider the intended content and how you intend to back up file systems of this size before you choose to go this way. A decision to build a huge file system that might be filled to capacity should be partnered with a commitment to a digital linear tape (DLT) or other high-capacity backup device.
Formatting and Partitioning
Partitioning disks into usable chunks prevents one file system from spilling over into another. This can be a good thing if you're interested in protecting one file system from others. It's also a good thing if your backup strategies are different: some file systems you will want to back up much more frequently than others. On the other hand, breaking your disk into separate buckets costs you some amount of flexibility. You might easily wind up with excess space in one file system while another fills up.
In Solaris, there are strong conventions about which slice is used for what, but no hard and fast rules. Unless there is an overriding reason, we always recommend following the conventions. Being obvious is a time saver in the long run. Should someone else at some point try to recover data from one of your disks by attaching it to another system, they will be more confident working with a disk that is laid out in a familiar way. The conventional slice assignments are:
0 root (bootable)
1 swap (/tmp)
2 the entire disk
4 /usr/local (could be /usr/openwin)
7 /export/home
Physically, disks are composed of platters and heads that hover over the spinning platters in order to read and write. Accessing any particular file, therefore, depends on how much one of these heads has to move and how much disk I/O is backed up. Access time is composed of several things: how fast the disk is spinning, how far the head moves on the average, and so on.
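The rotational part of access time follows directly from the spindle speed: on average the desired sector is half a revolution away, so average rotational latency is 60,000 ms divided by twice the rpm. The sketch below (both helper names are hypothetical) adds an average seek time that you supply. Transfer time and queueing are ignored, and the 8.5-ms seek figure in the usage note is purely illustrative.

```shell
#!/bin/sh
# Average rotational latency: on average the wanted sector is half a
# revolution away, and one revolution takes 60000/rpm milliseconds.
rot_latency_ms() {
    awk -v rpm="$1" 'BEGIN { printf "%.2f\n", 60000 / (2 * rpm) }'
}

# Rough average access time: average seek (supplied by you, in ms) plus
# average rotational latency. Transfer time and queueing are ignored.
access_ms() {
    awk -v seek="$1" -v rpm="$2" 'BEGIN { printf "%.2f\n", seek + 60000 / (2 * rpm) }'
}
```

For a 7200-rpm disk, rot_latency_ms 7200 prints 4.17, and access_ms 8.5 7200 prints 12.67.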
Partitioning drives is done with the format command. You must be root to run it. Be careful if you are working on a live system. One of the authors once mistyped a disk address (the intended drive was one character different from the one she wound up using) and blew a file system out from underneath a dozen or more busy users. Along with the privilege of being root goes the possibility of making the Big Mistake. Double-check every choice before you hit return.
The format command is usually invoked simply as format, but it can also take the disk device as an argument (e.g., format /dev/rdsk/c0t1d0s2). Note that it is the raw device that is specified:
# format /dev/rdsk/c0t1d0s2
selecting /dev/rdsk/c0t1d0s2
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
If you enter format at this point, you'll wind up doing a low-level format on the disk. This will take a lot of time and is almost never needed. Only if you are using a drive that you suspect may have problems should you bother with the formatting, analysis, and repair options provided with the format command.
The next step is to repartition the drive. Type partition at the prompt. The following menu will appear:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
To view the existing partition, enter print.
Current partition table (original):
Total disk cylinders available: 8152 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 59 128.32MB (60/0/0) 262800
1 swap wu 60 - 179 256.64MB (120/0/0) 525600
2 backup wu 0 - 8151 17.03GB (8152/0/0) 35705760
3 stand wm 180 - 6975 14.19GB (6796/0/0) 29766480
4 home wm 6976 - 7487 1.07GB (512/0/0) 2242560
5 unassigned wm 7488 - 7743 547.50MB (256/0/0) 1121280
6 usr wm 7744 - 7999 547.50MB (256/0/0) 1121280
7 var wm 8000 - 8031 68.44MB (32/0/0) 140160
If you want to change the partitioning on this drive, you should be careful to preserve the continuity of the cylinders. Note how the ranges follow in sequence: 0-59, 60-179, 180-6975, and so on. If you alter the size of one partition, adjust the adjoining partitions accordingly and print the partition table again to be sure that the partitions still line up. To adjust a partition, enter its partition number and respond to the prompts. Here, we're combining two partitions into one and assigning the space to partition 6:
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[7488]:
Enter partition size[1121280b, 256c, 547.50mb]: 0c
Enter partition id tag[usr]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 7488
Enter partition size[1121280b, 256c, 547.50mb]: 512c
Note that the space can be allocated in blocks, cylinders, or megabytes. We prefer to work in cylinders. When you've finished partitioning the disk and are sure your partition map makes sense, write the label back to the disk:
Ready to label disk, continue? y
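The cylinder-continuity rule described above can also be checked mechanically. The sketch below (check_cylinders is a hypothetical helper) reads the rows of the partition table as shown by format's print command, ignores the backup slice and empty slices, and reports any slice that does not start on the cylinder right after the previous one ends. Running it against a saved copy of the print output before answering y to the label prompt is the safest way to use it.

```shell
#!/bin/sh
# Hypothetical helper: check that the slices in format's "print" table
# (read on stdin) occupy consecutive cylinder ranges. The backup slice
# (the whole disk) and empty slices are ignored. Exit status 1 on a gap
# or overlap.
check_cylinders() {
    awk '{
             line = $0
             gsub(/-/, " - ", line)     # accept "0 -59" as well as "0 - 59"
             n = split(line, f, " ")
             if (f[1] ~ /^[0-7]$/ && f[2] != "backup" && f[5] == "-" && f[n] + 0 > 0)
                 print f[4], f[6], f[1]  # start cylinder, end cylinder, slice
         }' |
    sort -n |
    awk 'NR > 1 && $1 != prev + 1 {
             printf "slice %s starts at cylinder %s; expected %s\n", $3, $1, prev + 1
             bad = 1
         }
         { prev = $2 }
         END { exit (bad ? 1 : 0) }'
}
```

The table printed earlier passes silently; moving the start of slice 6 to cylinder 7750 would produce a complaint about slice 6.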
You don't have to be using the format command to view the partition table. You can also list it with the prtvtoc command (at the normal Unix prompt), as shown here:
spoon# prtvtoc /dev/dsk/c0t0d0s2
* /dev/dsk/c0t0d0s2 partition map
* 512 bytes/sector
* 219 sectors/track
* 20 tracks/cylinder
* 4380 sectors/cylinder
* 8154 cylinders
* 8152 accessible cylinders
* 1: unmountable
* 10: read-only
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 35180160 521220 35701379
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount
0 2 00 0 262800 262799
1 3 01 262800 525600 788399
2 5 01 0 35705760 35705759
3 6 00 788400 29766480 30554879
4 8 00 30554880 2242560 32797439
5 0 00 32797440 1121280 33918719
6 4 00 33918720 1121280 35039999
7 7 00 35040000 140160 35180159
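The sector counts in prtvtoc output convert directly into the sizes that format reports. With 512-byte sectors, dividing by 2048 gives megabytes, so slice 0's 262800 sectors come to 128.32MB, matching the format listing. A sketch, with slice_sizes_mb as a hypothetical name; its optional argument is the sector size in bytes.

```shell
#!/bin/sh
# Hypothetical helper: convert prtvtoc sector counts (stdin) to megabytes.
# Optional $1 is the sector size in bytes (default 512, as in the output above).
slice_sizes_mb() {
    awk -v bps="${1:-512}" '
        $1 ~ /^[0-9]+$/ && NF >= 6 {      # slice rows; "*" comment lines are skipped
            printf "%s %.2f\n", $1, $5 * bps / 1048576
        }'
}
```

Usage: prtvtoc /dev/dsk/c0t0d0s2 | slice_sizes_mb.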
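TIP You can use the fmthard command to put a partition table on a disk and avoid the format command. If you have several disks with the same layout, this can save you time. Use prtvtoc to get the VTOC that you want to copy and feed it to fmthard; for example, prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2 copies the first disk's layout to the second.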
Creating New File Systems
After you've partitioned your new disk to your liking, it's time to build new file systems. You need to build a file system before you can use a partition (unless it's to be used as a raw partition). A certain percentage of the overall space will be used for overhead (the inodes, free-block lists, block maps, and superblocks) and for space reserved as file system elbow room (free space), since an absolutely full disk would be impossible to work with. As file systems become too full, they slow down. Space becomes harder to allocate, fragmentation increases, and file activity is likely to be higher. See the material on monitoring your environment in Chapter 9.
When preparing to create a file system, there are a number of decisions that you can make that might significantly influence its performance. These include the block size, ratio of disk space to inodes, and free space-- all of which are parameters that can be specified with the newfs command. The newfs command is, in effect, a front end to mkfs that supplies values for many of the command's options.
At a minimum, you have to tell newfs where to build the file system. A command like newfs /dev/rdsk/c0t2d0s2 would create a file system using all the space on the specified disk. For values for parameters different from the default, use the options shown here:
-a Alternate blocks per cylinder (default is 0).
-b Block size, in bytes (default is 8192).
-c Cylinders per cylinder group (default is 16).
-i Number of bytes of disk space per inode, which sets the inode density (default is 2048).
-f Size of a fragment, in bytes (default is 1024).
-m Percentage of free space (10 percent on small disks, less on larger disks).
-o Optimization method, space or time (default is time).
-r Rotational speed of the disk, in revolutions per minute (default is 5).
-s Size of the file system, in blocks (default is 2880).
-t Tracks per cylinder (default is 2).
-v Verbose: displays the parameters passed to mkfs (not used by default).
-N No change: doesn't create the file system, but displays the parameters that would be used (not used by default).
Here is an example of a newfs command that specifies one inode for each 8K of disk space and reserves 1 percent of the overall space for free space:
newfs -i 8192 -m 1 /dev/rdsk/c0t2d0s2
Because a UFS inode occupies 128 bytes on disk, this setting allocates 128 bytes of inode space for every 8192 bytes of data space, a ratio of 1 to 64. The default of 2048 bytes per inode gives a ratio of 1 to 16, so this setting shows that the user expects larger (and therefore fewer) files than is usually the case.
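The arithmetic behind the -i setting can be sketched as follows. It assumes 128-byte on-disk inodes and uses a 1-gigabyte file system purely as an example size; inode_budget is a hypothetical helper, and it ignores the other overhead newfs reserves.

```python
# Sketch: how the newfs -i value (bytes of data space per inode) translates
# into inode counts and inode-table size. Assumes 128-byte UFS on-disk inodes;
# the 1GB file system size is only an example.

INODE_BYTES = 128


def inode_budget(fs_bytes, nbpi):
    """Return (number of inodes, bytes of inode table, data bytes per inode byte)."""
    inodes = fs_bytes // nbpi
    table_bytes = inodes * INODE_BYTES
    ratio = nbpi // INODE_BYTES      # 1 byte of inode space per `ratio` bytes of data
    return inodes, table_bytes, ratio


one_gb = 1024 ** 3
for nbpi in (2048, 8192):
    inodes, table, ratio = inode_budget(one_gb, nbpi)
    print(f"-i {nbpi}: {inodes} inodes, {table // 2**20} MB of inodes, ratio 1:{ratio}")
```

Quadrupling the bytes per inode cuts both the inode count and the inode-table space to a quarter, which is why the setting only makes sense when files are expected to be large.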
If you specify the -N and -v options with newfs, you'll see output that details the resulting mkfs command and the parameters that would be used in creating a file system without creating it. Similarly, the mkfs command with a -m parameter reveals the parameters of an existing file system, as shown here:
# mkfs -m /dev/rdsk/c0t2d0s2
mkfs -F ufs -o nsect=80,ntrack=19,bsize=8192,fragsize=1024,
cgsize=32,free=4,rps=90,nbpi=4136,opt=t,apc=0,gap=0,nrpos=8,
maxcontig=16 /dev/rdsk/c0t2d0s2 4023460
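If you want to compare parameters across several file systems, a small parser helps. This sketch assumes the name=value layout shown above and that the raw device and size are the last two words; because it matches pairs anywhere in the text, wrapped output like the listing above is handled. parse_mkfs_m is a hypothetical helper.

```python
# Sketch: turn `mkfs -m` output (possibly wrapped across lines) into a dict.
# Assumes options appear as name=value pairs and that the raw device and the
# size are the last two words of the output.
import re


def parse_mkfs_m(output):
    """Return (params, raw_device, size) parsed from mkfs -m output."""
    params = {name: int(value) if value.isdigit() else value
              for name, value in re.findall(r"(\w+)=(\w+)", output)}
    device, size = re.search(r"(/dev/\S+)\s+(\d+)\s*$", output).groups()
    return params, device, int(size)


sample = """mkfs -F ufs -o nsect=80,ntrack=19,bsize=8192,fragsize=1024,
cgsize=32,free=4,rps=90,nbpi=4136,opt=t,apc=0,gap=0,nrpos=8,
maxcontig=16 /dev/rdsk/c0t2d0s2 4023460"""

params, device, size = parse_mkfs_m(sample)
print(device, size, {k: params[k] for k in ("bsize", "fragsize", "free", "nbpi")})
```

Comparing two such dictionaries is a quick way to spot how file systems built at different times differ.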
Note that the mkfs command includes a parameter for the type of file system. The newfs command defaults to UFS.
UFS supports block sizes up to 64K. The block size is specified when a file system is created. In addition, a smaller unit of space, the fragment, is created to allow for more efficient data storage within the last block of a file. Each block can be divided into 1, 2, 4, or 8 fragments, although data is still transferred in blocks. Superblock replication makes the file system more resistant to failure. Also, superblocks and inodes are placed so that damage to the file system is unlikely to destroy so much data that the file system cannot be largely recovered.
A raw partition does not contain a file system at all. Instead, the application using it (often a database) keeps track of where the data is on the disk. This kind of access can be faster, since updates to metadata and block-mapping structures are not required. On the other hand, it is not general purpose; only the controlling application knows how to access the data it contains.
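The space that fragments save can be worked out with a short sketch. It assumes the tail of the file fits in fragments of a single block and ignores indirect blocks (UFS uses fragments only for the tail of small files); allocated_bytes is a hypothetical helper.

```python
# Sketch of block/fragment arithmetic: whole blocks for the body of a file,
# fragments for its tail. Simplification: ignores indirect blocks and the
# fact that larger files use only full blocks.

def allocated_bytes(file_size, bsize=8192, fsize=1024):
    """Bytes of disk space allocated for a file of file_size bytes."""
    if bsize % fsize or bsize // fsize not in (1, 2, 4, 8):
        raise ValueError("a block must hold 1, 2, 4, or 8 fragments")
    full_blocks, tail = divmod(file_size, bsize)
    tail_frags = -(-tail // fsize)          # ceiling division
    return full_blocks * bsize + tail_frags * fsize


print(allocated_bytes(10_000))              # 1 block + 2 fragments
print(allocated_bytes(10_000, fsize=8192))  # no smaller fragments: 2 full blocks
```

A 10,000-byte file thus occupies 10,240 bytes with 1K fragments, against 16,384 bytes if every allocation had to be a full 8K block.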
The newfs command provides a simplified interface to the mkfs command. In the following command, we're building a file system on partition 5, specifying less than the normal percentage of free space. There are numerous other parameters that you can specify (especially if you deal directly with mkfs) to tune your file systems, but you should not use these options without considerable knowledge of how the file system will be used and the implications of each parameter.
spoon# newfs -m 4 /dev/rdsk/c0t0d0s5
setting optimization for space with minfree less than 10%
newfs: construct a new file system /dev/rdsk/c0t0d0s5: (y/n)? y
/dev/rdsk/c0t0d0s5: 1121280 sectors in 256 cylinders of 20 tracks, 219 sectors
        547.5MB in 16 cyl groups (16 c/g, 34.22MB/g, 16448 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 70336, 140640, 210944, 281248, 351552, 421856, 492160, 562464, 632768,
 703072, 773376, 843680, 913984, 984288, 1054592,
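The summary line newfs printed can be reproduced from the geometry in the earlier prtvtoc listing. This worked arithmetic assumes 16 cylinders per group (the -c default listed earlier) and megabytes of 2**20 bytes, which is what makes the figures match.

```python
# Worked arithmetic behind the newfs summary. Inputs from the listings:
# 1121280 sectors, 219 sectors/track, 20 tracks/cylinder, 512-byte sectors,
# and 16 cylinders per cylinder group.

SECTOR_BYTES = 512
sectors = 1121280
sectors_per_cyl = 219 * 20               # sectors/track * tracks/cylinder
cyls_per_group = 16

cylinders = sectors // sectors_per_cyl
groups = -(-cylinders // cyls_per_group)  # a partial last group still counts
total_mb = sectors * SECTOR_BYTES / 2**20
group_mb = cyls_per_group * sectors_per_cyl * SECTOR_BYTES / 2**20

print(f"{cylinders} cylinders; {total_mb}MB in {groups} cyl groups "
      f"({cyls_per_group} c/g, {group_mb:.2f}MB/g)")
```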
Only a few file system tuning parameters can be adjusted after a file system has been created; the tunefs command changes settings such as minfree, the optimization method, and maxcontig. The best source of advice on file system and general performance tuning is, of course, Adrian Cockcroft, who writes for Sun and for SunWorld magazine. The second edition of his Sun Performance and Tuning (with Richard Pettit) is included in the bibliography and recommended reading list.
Mounting and Unmounting
Sysadmins used to do a lot of mounting and unmounting of file systems; the mount and umount commands are, after all, privileged, and only the superuser can use them to mount or unmount a file system. Then automounter and vold came along, relieving sysadmins of having to be present and involved in every mount and, in the case of automounter, bringing considerable improvement in NFS performance at the same time.
With automounter, a reference to a mountable file system causes it to be mounted. A cd to a directory or an ls of its contents causes a file system to mount, provided the user issuing the command has proper permissions for the mount point and the client has permission to mount the file system in question. A mount point is simply a directory. However, those that automount uses are reserved; you cannot, for example, mount a file system on /home when automounter is running and the mounting of home directories is within its purview. Vold automatically mounts file systems on diskette or CD-ROM when they are inserted into the drive, provided, of course, that it is running (it is by default). When the eject command is issued, the file system is unmounted and the media ejected, provided that no one is using the mounted files; simply having cd'ed into a mounted directory counts as using it.
The Life Cycle of Mount Operations
Whenever a system is booted, file systems are mounted. Certain of these file systems are mounted early in the boot process; the system would not be able to boot without them. You might have noticed in the /etc/vfstab file shown earlier in this chapter that neither the root nor the /usr file system was specified to mount at boot time. These file systems house the files that direct the process of booting (described in Chapter 4) and must be available soon after the basic hardware checks. Other file systems are mounted later in the boot process, as configured in the /etc/vfstab file. Still others are mounted only as requested via a mount command or, as previously mentioned, in response to requests via automounter.
The /etc/vfstab file specifies file systems to be mounted along with their locations and options. It is read during the boot process by the start script /etc/rc2.d/mountall.
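The file itself is a whitespace-delimited list of file systems, with columns specifying the following:
device to mount: the block device (or other resource) to be mounted.
device to fsck: the raw device that fsck checks (- if none).
mount point: the directory on which the file system is mounted.
FS type: the type of file system (e.g., ufs or tmpfs).
fsck pass: the pass in which fsck checks the file system (- if it is not checked).
mount at boot: yes or no.
mount options: options passed to mount (- if none).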
Here is a sample /etc/ vfstab file:
#device            device              mount    FS      fsck  mount    mount
#to mount          to fsck             point    type    pass  at boot  options
/proc              -                   /proc    procfs  -     no       -
fd                 -                   /dev/fd  fd      -     no       -
swap               -                   /tmp     tmpfs   -     yes      -
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /        ufs     1     no       -
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr     ufs     2     no       -
/dev/dsk/c0t3d0s5  /dev/rdsk/c0t3d0s5  /opt     ufs     5     yes      -
/dev/dsk/c0t3d0s1  -                   -        swapfs  -     no       -
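A minimal sketch of reading this file, assuming whitespace-separated fields and "-" as the empty value, lists the entries whose mount-at-boot column says yes. The field names are our own labels for the seven columns, not identifiers used by Solaris.

```python
# Sketch: parse vfstab-style text and list entries marked "yes" in the
# mount-at-boot column. Field names are our own labels; "-" means no value.

FIELDS = ("device", "fsck_device", "mount_point", "fstype",
          "fsck_pass", "mount_at_boot", "options")


def parse_vfstab(text):
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comment lines
        values = line.split()
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields: {line!r}")
        entries.append({k: (None if v == "-" else v) for k, v in zip(FIELDS, values)})
    return entries


def boot_mounts(entries):
    return [e["mount_point"] for e in entries if e["mount_at_boot"] == "yes"]


VFSTAB = """\
#device            device              mount    FS      fsck  mount    mount
#to mount          to fsck             point    type    pass  at boot  options
/proc              -                   /proc    procfs  -     no       -
fd                 -                   /dev/fd  fd      -     no       -
swap               -                   /tmp     tmpfs   -     yes      -
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /        ufs     1     no       -
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr     ufs     2     no       -
/dev/dsk/c0t3d0s5  /dev/rdsk/c0t3d0s5  /opt     ufs     5     yes      -
/dev/dsk/c0t3d0s1  -                   -        swapfs  -     no       -
"""

print(boot_mounts(parse_vfstab(VFSTAB)))
```

Root and /usr do not appear in the result, consistent with their being mounted earlier in the boot process.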
NFS server support is not provided (by default) until run state 3 (discussed in Chapter 4). This means that systems do not export their file systems until they have moved into this run state, commonly referred to as multiuser with NFS. Refer to the file /etc/rc3.d/S15nfs.server to see how the NFS processes are started for an NFS server. The client script, /etc/rc2.d/S73nfs.client, allows mounting of remote file systems in run state 2 (this is also effective in run state 3, as we shall explain in Chapter 4).
The file that controls what file systems are shared by a host is /etc/dfs/dfstab. In the following sample dfstab file, we're using a netgroup rather than a string of host names. Refer to the netgroup(4) man page for more information on netgroups.
# place share(1M) commands here for automatic execution
# on entering init state 3.
# share [-F fstype] [-o options] [-d "<text>"] <pathname> [resource]
# e.g.,
# share -F nfs -o rw=engineering -d "home dirs" /export/home2
share -F nfs -o rw=utensils -d "apps" /apps
share -F nfs -o rw=utensils -d "archives" /archives
share -F nfs -o rw=utensils -d "home" /raid/home
share -F nfs -o ro=utensils -d "cdrom" /cdrom/cdrom0
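To check a dfstab file by script, the share lines can be parsed with Python's shlex and getopt modules. This sketch assumes the syntax in the comment above, with options separated by commas and names within an access list separated by colons, and it treats a missing -F as nfs; parse_share and parse_dfstab are hypothetical helpers.

```python
# Sketch: parse share(1M) lines from a dfstab file. Assumptions: options are
# comma-separated, access lists (rw=, ro=, root=) are colon-separated, and a
# missing -F means nfs.
import getopt
import shlex


def parse_share(line):
    """Return the fstype, options, description, and path of one share command."""
    argv = shlex.split(line)          # honors the quotes around the -d text
    if not argv or argv[0] != "share":
        raise ValueError(f"not a share command: {line!r}")
    opts, rest = getopt.getopt(argv[1:], "F:o:d:")
    flags = dict(opts)
    options = {}
    for item in flags.get("-o", "").split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value.split(":") if value else True
    return {
        "fstype": flags.get("-F", "nfs"),
        "options": options,
        "description": flags.get("-d"),
        "path": rest[0] if rest else None,
    }


def parse_dfstab(text):
    """Parse every non-comment share line in dfstab-style text."""
    return [parse_share(line) for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


DFSTAB = """\
# share [-F fstype] [-o options] [-d "<text>"] <pathname> [resource]
share -F nfs -o rw=utensils -d "apps" /apps
share -F nfs -o ro=utensils -d "cdrom" /cdrom/cdrom0
"""

for entry in parse_dfstab(DFSTAB):
    print(entry["path"], entry["options"])
```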
File systems of type UFS repeat superblock information at various intervals within a file system. These structures contain file system statistics (e.g., the number of files, the label, and the overall size). If the initial superblock is damaged, fsck gives you an opportunity to provide another, such as one of the backup locations that newfs reports (for example, fsck -F ufs -o b=32 /dev/rdsk/c0t0d0s5). Use reboot -n after such an fsck; avoiding the sync keeps you from overwriting your fixes when the output buffers are flushed.
Automounter is a standard Solaris feature that streamlines the management of shared file systems and reduces network traffic. Automounter also helps avoid file system problems in environments with many servers and many more clients by timing out on file system mounts instead of freezing one system's resources when a server from which it has mounted a file system becomes unavailable. In addition to the material presented here, we suggest that you refer to anything written by Hal Stern (see bibliography and recommended reading) regarding NFS and automounter to get a more thorough picture than we can present here.
Automounter uses centrally managed configuration information to specify the file systems and mount points for a network. By doing so, it helps to enforce consistency in the overall naming and arrangement of file systems. It reduces the overhead of NFS mounts by unmounting file systems after a period of nonuse (the default is 5 minutes). Automounter is one of the busy sysadmin's best friends: it provides what a user needs, when a user needs it, and it goes away by itself. It dramatically reduces NFS hangs and is fairly easy to administer. File systems that are automounted don't need to be specified in each potential client's /etc/vfstab file. In fact, all the client system needs is the mount point (created by automountd), and a reference to the file system will cause it to be mounted. The rest is accomplished through NIS maps or NIS+ tables, or their file equivalents, though these are rarely used.
The services provided by automount are actually brought about through the efforts of several components: the automount maps (detailing what gets mounted, where, and how), the user-level process that acts like an NFS daemon, and the kernel module, autofs, that invokes mounts and unmounts.
The daemon process, automountd, runs on clients. RPC requests are generated in response to requests for file systems, even ones as routine as a user changing directory (i.e., using cd) into a directory managed by automounter. Once the request has been handled and the file system is mounted, automounter steps aside and data transfers go directly to the NFS server (there would be a performance hit if automounter got involved in every transfer of data). After 5 minutes of inactivity, however, automounter unmounts the file system.
On large and frequently changing networks, few tools will save you as much headache as the automounter. The centralized management that automounter makes possible ends the days of hand-editing files (e.g., /etc/vfstab) on numerous hosts. Not only does automounter make it possible for users to log in virtually anywhere on your network and have their familiar home environment available to them, it also ensures that file systems unmount when not in use, which reduces the network traffic related to NFS. A third advantage is that automounter keeps a system from hanging if the server from which it is automounting file systems is not available.
Although automounter provides users with automatic mounting, some setup is required on the part of the sysadmin: a proper configuration of the automount maps and sharing entries on the systems from which these file systems derive.
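The idle timeout can be pictured with a small model. It is a hypothetical sketch, not automountd's code: each mount point records its last use, and a periodic sweep unmounts anything idle for at least the timeout (5 minutes, as in the text) unless it is busy, for example because someone has cd'ed into it.

```python
# Hypothetical model of automounter's idle timeout (not automountd's code).
# Times are plain numbers of seconds supplied by the caller.

IDLE_TIMEOUT = 5 * 60  # seconds


class AutoMounts:
    def __init__(self, timeout=IDLE_TIMEOUT):
        self.timeout = timeout
        self.last_used = {}            # mount point -> time of last access

    def access(self, mount_point, now):
        """A cd or ls mounts the file system if needed and refreshes its timestamp.
        Returns True if this access caused a new mount."""
        newly_mounted = mount_point not in self.last_used
        self.last_used[mount_point] = now
        return newly_mounted

    def sweep(self, now, busy=()):
        """Unmount file systems idle for at least the timeout; busy ones stay."""
        idle = [mp for mp, t in self.last_used.items()
                if now - t >= self.timeout and mp not in busy]
        for mp in idle:
            del self.last_used[mp]
        return sorted(idle)


mounts = AutoMounts()
mounts.access("/home/evan", now=0)
mounts.access("/home/sandra", now=200)
print(mounts.sweep(now=300))   # evan has been idle for 5 minutes
```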
Security and Automounter
Automounter is subject to the same security considerations as NFS. Even though the tool provides a convenient way to automate mounts, it still requires basic access privilege. Regardless of what automount maps may dictate, a remote file system cannot be mounted if it is not shared and cannot be mounted if access to the mount point is not available.
Automounter uses one or more files (usually two or three) to control what file systems are mounted and when. The /etc/auto_master file details the other files and maps that are used. The other two maps that you'll generally find in use are /etc/auto_home and /etc/auto_direct. The existence and configuration of these files depend on how automounter is used at a particular site.
# Master map for automounter
#+auto_master
/net     -hosts      -nosuid
/home    auto_home
/-       auto_direct
In this particular /etc/auto_master file, the /net entry tells automounter (i.e., automountd) that it is to manage the mounting of file systems in the /net directory. Any file systems exported and available to the local system can be accessed via the /net directory. The /home entry instructs automounter to manage home directories using the information found in the /etc/auto_home file. The third entry, starting with /-, calls the /etc/auto_direct map into use. We'll discuss direct maps in just a moment.
# Home directory map for automounter
#+auto_home
*       -rw     Rwanda:/raid/home/&
Entries in /etc/auto_home of the type user -nosuid host:/home/user cause an individual's home directory to mount when requested. An entry of the form * -nosuid server:/home/&, on the other hand, causes any home directory not previously specified to be mounted from the system named server. The nosuid option prevents suid program execution from the mounted file system.
The third commonly used automounter file that you will encounter is auto_direct. This file is used for direct maps. Let's briefly examine the difference between direct and indirect maps and the implications of these differences on your network.
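The effect of the * key and the & substitution can be modeled in a few lines. This is a simplified sketch, not the automounter's algorithm: it lets an exact key win over * regardless of line order and skips + (NIS inclusion) lines; the map contents are hypothetical.

```python
# Sketch of indirect-map lookup in the style of auto_home. Simplifications:
# an exact key always beats "*" regardless of line order, and "+mapname"
# inclusion lines are ignored rather than fetched from NIS.

def parse_indirect_map(text):
    """Map key -> (options, location); options lose their leading '-'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "+")):
            continue
        key, *rest = line.split()
        options = ",".join(f[1:] for f in rest if f.startswith("-"))
        location = next(f for f in rest if not f.startswith("-"))
        entries[key] = (options, location)
    return entries


def lookup(entries, key):
    """Resolve a key such as a user name; '&' in the location becomes the key."""
    entry = entries.get(key) or entries.get("*")
    if entry is None:
        return None
    options, location = entry
    return options, location.replace("&", key)


AUTO_HOME = """\
# Home directory map for automounter
evan    -nosuid  server:/export/home/evan
*       -rw      rwanda:/raid/home/&
"""

home_map = parse_indirect_map(AUTO_HOME)
print(lookup(home_map, "evan"))    # explicit entry
print(lookup(home_map, "sandra"))  # falls through to the * entry
```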
Direct and Indirect Maps
Automounter uses two types of maps to describe the mounting of file systems and directories onto mount points. They are called direct and indirect. For file systems using direct maps, the mounts occur on a directory (e.g., /export/home is mounted on /home). For file systems using indirect maps, the mounts occur within a directory (e.g., /export/home/sandra is mounted on /home/sandra). Figure 1.6 displays the result of both direct and indirect maps. Automounter knows what file systems it is required to watch. It can be said to own certain mount points and to intercept requests when they are made.
Here is an example of a direct map. The lines in this particular map direct automounter to use webserver:/usr/WWW whenever anyone refers to the directory /WWW, the intranet:/usr/intranet directory when anyone refers to /intranet, and mailserver:/var/mail when anyone refers to /var/mail.
# direct directory map for NFS mounts
/WWW         webserver:/usr/WWW
/intranet    intranet:/usr/intranet
/var/mail    -actimeo=0   mailserver:/var/mail
An indirect map might look like this:
evan      server:/export/home/evan
sandra    server:/export/home/sandra
It would mount the file systems /export/home/evan and /export/home/sandra on /home/evan and /home/sandra (these entries appear in the auto_home map). A more general solution is to use an entry such as the following one, which causes the home directory of any user on the file server server to be mounted as /home/username:
*         server:/export/home/&
The asterisk matches any key (that is, any user name), and the ampersand is replaced by that key.
Direct maps are a lot like indirect maps, but are more explicit. Each direct map entry provides a one-to-one mapping of source and mount point, much like a traditional mount. With indirect maps, the automounter manages more general-purpose associations between mount points and directories, using both literal and wildcard entries from the /etc/auto_* files; direct maps, in contrast, specify direct relationships.
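The difference in how the two map types are keyed can be sketched as a lookup function. The map contents mirror the examples above; resolve and the two dictionaries are hypothetical, and the real automounter divides this work between autofs and automountd.

```python
# Sketch contrasting direct and indirect maps (simplified; not automountd's code).
# Direct-map keys are absolute paths; an indirect map hangs off one mount point
# (/home here, per auto_master) and its keys are single names beneath it.

DIRECT = {  # path -> (options, location), as in the auto_direct example
    "/WWW": ("", "webserver:/usr/WWW"),
    "/intranet": ("", "intranet:/usr/intranet"),
    "/var/mail": ("actimeo=0", "mailserver:/var/mail"),
}
INDIRECT = {  # mount point -> {key: (options, location)}
    "/home": {
        "evan": ("", "server:/export/home/evan"),
        "*": ("", "server:/export/home/&"),
    },
}


def resolve(path):
    """Return (mount_point, options, location) for a referenced path, or None."""
    # Direct map: the path itself (or a parent of it) is a key.
    for mount_point, (options, location) in DIRECT.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            return mount_point, options, location
    # Indirect map: the first component below the mount point is the key.
    for parent, table in INDIRECT.items():
        if path.startswith(parent + "/"):
            key = path[len(parent) + 1:].split("/")[0]
            entry = table.get(key) or table.get("*")
            if entry:
                options, location = entry
                return f"{parent}/{key}", options, location.replace("&", key)
    return None


print(resolve("/var/mail/evan"))
print(resolve("/home/sandra/notes.txt"))
```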
Direct map entries can also take the form /filesystem host1,host2,host3:/filesystem. When this is the case, any of the hosts listed can supply the file system when a request for an automount is made. Whichever system responds first provides the mount: an easy way to balance load and reduce reliance on any single server without a lot of extra effort. You can also add weighting factors in parentheses to influence this process. A server without a weighting factor counts as 0, and the lower the weight, the more likely that server is to be chosen. An auto_direct entry of /apps server1,server2(10),server3(50):/apps would therefore encourage your systems to mount the /apps directory from server1.
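How weights steer the choice can be modeled as below. This sketch picks the lowest weight and breaks ties by response time; the real automounter also takes network proximity into account, which is not modeled here, and the response times are made-up inputs. parse_replicas and choose_server are hypothetical helpers.

```python
# Simplified model of choosing among replicated servers:
# lowest weight first (unweighted = 0), then fastest response.
# Network proximity, which the real automounter also considers, is ignored.
import re


def parse_replicas(location):
    """Split 'host1,host2(w),host3(w):/path' into ([(host, weight)], path)."""
    hosts_part, path = location.split(":", 1)
    replicas = []
    for item in hosts_part.split(","):
        m = re.fullmatch(r"([\w.-]+)(?:\((\d+)\))?", item)
        if not m:
            raise ValueError(f"bad server entry: {item!r}")
        replicas.append((m.group(1), int(m.group(2) or 0)))
    return replicas, path


def choose_server(replicas, response_ms):
    """response_ms: hypothetical measured times; hosts that did not answer are absent."""
    candidates = [(w, response_ms[h], h) for h, w in replicas if h in response_ms]
    return min(candidates)[2] if candidates else None


replicas, path = parse_replicas("server1,server2(10),server3(50):/apps")
print(choose_server(replicas, {"server1": 40, "server2": 5, "server3": 1}))
print(choose_server(replicas, {"server2": 5, "server3": 1}))  # server1 is down
```

Because an unweighted server counts as 0, writing the weights as server1(50),server2,server3(10) would actually steer mounts toward server2.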
As stated earlier, automounter is more useful the more systems you manage and the more your users move from one system to another, whether physically or using remote login commands. To simplify maintenance of your automount maps, the automounter provides a number of variables that take the place of literal entries. The variable ARCH, for example, represents the architecture of a particular machine. OSREL, on the other hand, specifies the OS release. If you were mounting /usr/lib from a remote system, you could more easily and more
File Systems and Performance
With respect to system performance, file systems are often the most significant factor, whether we're looking at performance on a single host or across a network. Because disk I/O is generally slower than any other system operation, it tends to be the weak link in the performance chain; faster central processing units (CPUs), increases in network bandwidth, and more efficient applications are easily lost in the shadow of disk I/O.
In general, however, the single improvement that will have the greatest impact on a system's performance is additional memory. A large amount of random access memory (RAM) reduces disk operations, since data can be kept in memory for longer periods of time.
Other factors that affect file system performance include the disks themselves (e.g., seek time and rotational latency), the block size, whether disk accesses are random or sequential, the degree to which caching is used to speed up disk I/O, the location of disks (local or remote), and so on.
Usage patterns differ depending on the operations a disk performs. Sequential reads, for example, are generally much more efficient than random-access reads. File systems can be configured to optimize a particular usage pattern, and this is a good idea if they are used in one particular way most of the time. In general, however, file system defaults provide a good general-purpose configuration.
If you know that a file system will be used primarily for sequential reads, a larger block size and a higher value for maxcontig (the read-ahead cluster size) will improve performance. File systems such as UFS adjust themselves to some degree to observed access patterns. If the file system detects that reads are happening sequentially, it uses a read-ahead algorithm and initiates reads for the following blocks as it reads a requested block. Different types of file systems use different algorithms. UFS goes into read-ahead mode only when the previous and current reads are sequential, there is only one reader of the file at the time, the blocks of the file are sequential on the disk, and the file I/O is done with read and write system calls. These conditions keep the file system from going into read-ahead mode when it might not be appropriate.
The default values for the read-ahead algorithm might not be optimal. For the best performance on highly sequential reads, an increase in the read-ahead cluster size is suggested. The read-ahead cluster size is set using the maxcontig parameter (newfs -C when the file system is created, or tunefs -a afterward). A maxcontig of 8 means 8 blocks; with the default block size of 8192 bytes, this represents 64K.
Where applicable, the use of raw partitions can provide a significant speedup in system performance. Increasingly, applications such as database software bypass the conveniences and the performance costs of file systems and access data on disks directly.
If a system has more than one disk, it's a good idea to spread file systems across the disks in such a way that disk I/O is likely to be balanced. This prevents the situation where one disk has a long queue of pending requests while another is basically idle, giving you more balanced I/O and better overall throughput. The iostat command will help you evaluate disk activity.
Local disks usually perform better than disks that must be accessed over a network, though not always, and trade-offs are made between convenience and performance. A good rule of thumb is expressed in the use of dataless clients, in which root and swap are local file systems and less frequently accessed file systems are accessed over the network. Though we are not particularly advocating the use of dataless clients, the design principle is a good one. By centralizing infrequently used file systems and keeping the most basic file systems local, you buy yourself some efficiencies of scale while maintaining reasonable performance on your clients.
There are other techniques to increase the speed or the reliability of file systems (both are important). Disk mirroring, for example, provides two copies of everything and is the greatest boon to reliability, but at the cost of doubling your disks. Disk striping, in which file systems span multiple disks, increases performance significantly but also increases the risk of data loss; the greater the number of disks involved in a striping operation, the greater the chance that one will fail. RAID offers many configurations for increased reliability, including the ability in RAID Level 5 to re-create the data on a lost disk from the data and parity on the other disks in the set, but it is generally expensive.
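The cluster-size arithmetic is simple enough to capture in two helper functions (hypothetical names), assuming the 8192-byte default block size and ignoring any upper limit the system imposes on maxcontig.

```python
# Worked arithmetic for the read-ahead cluster: maxcontig counts blocks,
# so the cluster size in bytes is maxcontig * block size.

def cluster_bytes(maxcontig, bsize=8192):
    return maxcontig * bsize


def maxcontig_for(target_bytes, bsize=8192):
    """Smallest maxcontig whose cluster is at least target_bytes."""
    return -(-target_bytes // bsize)   # ceiling division


print(cluster_bytes(8) // 1024, "KB")    # the example in the text
print(cluster_bytes(16) // 1024, "KB")   # maxcontig=16, as in the mkfs -m output
print(maxcontig_for(1024 * 1024))        # blocks needed for a 1MB cluster
```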
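One way to plan such a spread is a greedy largest-first assignment, sketched below. This is an illustrative heuristic, not a Solaris utility, and the per-file-system load figures are invented relative numbers of the kind you might estimate from iostat output.

```python
# Illustrative heuristic (not a Solaris utility): place file systems on disks
# largest-load-first, always onto the currently least-loaded disk.
# Load figures are hypothetical relative numbers.
import heapq


def balance(loads, disks):
    """Return {disk: [file systems]} spreading the summed load evenly."""
    heap = [(0, disk) for disk in disks]   # (total load so far, disk name)
    heapq.heapify(heap)
    placement = {disk: [] for disk in disks}
    for fs, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        total, disk = heapq.heappop(heap)  # least-loaded disk so far
        placement[disk].append(fs)
        heapq.heappush(heap, (total + load, disk))
    return placement


loads = {"/home": 50, "/var/mail": 30, "/opt": 20, "/usr/local": 10}
print(balance(loads, ["c0t0d0", "c0t1d0"]))
```

With these numbers the two disks end up carrying loads of 60 and 50 rather than, say, 80 and 30; iostat afterward tells you whether the estimates held.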
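Both trade-offs can be illustrated with a toy model: XOR parity lets any single lost block in a stripe be rebuilt from the others, while the chance that at least one disk in a stripe fails grows with the number of disks. The 3 percent per-disk failure probability and the independence of failures are assumptions for illustration; real RAID 5 also rotates parity across disks, which is not modeled.

```python
# Toy model: XOR parity as used by RAID 5 (parity rotation not modeled), and
# the chance that at least one of n independent disks fails.
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


def p_any_failure(p_disk, n_disks):
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p_disk) ** n_disks


stripe = [b"abcd", b"efgh", b"ijkl"]            # data blocks on three disks
parity = xor_blocks(stripe)                     # stored on a fourth disk
print(rebuild([stripe[0], stripe[2]], parity))  # the lost middle block
for n in (1, 2, 4, 8):
    print(n, "disks:", round(p_any_failure(0.03, n), 4))
```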
Like many aspects of running a network, choices of disk technology boil down to trade-offs. As Adrian Cockcroft (see references) says in his book, Sun Performance and Tuning, the choices are "fast, cheap, safe; pick any two."
Laying Out File Systems
The location of file systems across your network is an issue that you should consider critical to your system management strategy. The use of central servers for home directories, e-mail, and applications adds a level of serviceability that doesn't exist if your networked clients are all fully independent standalone hosts. At the same time, it introduces a certain complexity and raises some security concerns.
When making decisions about where to locate file systems, you should consider both the complexity of your network and its performance and reliability. The authors, in fact, suggest trading off performance in favor of an easier-to-manage overall file system arrangement across your network.
One way to look at the issue of file system layout is to consider that, on a network where each host's file systems are local, the workload of installing, managing, and backing up these file systems is on your shoulders or is dispersed throughout the network onto the shoulders of your users. If, on the other hand, file systems are primarily hosted on file servers, the effort is condensed onto your file servers and traffic is added to your network. On a network where file systems are dispersed but their use is primarily managed through client/server applications, the complexity and load are often on the applications themselves. In a typical network, there is often a mix of these layout strategies. Still, there is benefit in considering the pros and cons of each approach before making decisions regarding how you will distribute files and applications across a network.
There are two seemingly opposing goals that you should keep in mind when deciding how to lay out file systems on a network basis. One goal is to aggregate functions so that your network is easier to manage. The other is to isolate functions (minimizing the dependencies between systems) so that each system is easier to manage. Obviously, these two opposing goals require you to make some very thoughtful choices. You also need to consider performance. At the same time that there is an advantage to having smaller file systems and optimizing each for its particular role, there is a flexibility benefit in joining smaller file systems into larger ones so that available space is not reserved for anything in particular but for whatever need presents itself.
Related functions should be collocated whenever possible. If you use the same tools or even the same occasions to manage a certain function, it makes sense to have them on the same system. At the same time, a single-purpose file server will be much easier to support and will provide more overall reliability. Whenever two or more critical services are provided by a single system, there will be occasions when a failure of one service takes out the other.
A name server that is also a print server or a mail server introduces a certain amount of risk. One could argue that a single, extremely powerful system well equipped with disk and memory is a better choice than a number of less well endowed systems, each providing a single service. Unless you can afford a system with virtually unlimited resources and set it up with high availability in mind (see Chapter 13), however, the authors consider the separation of functions among a group of modest systems a better choice than providing all services on a single box.
Clients and File Systems
Before we get into the layout of client file systems, it makes sense to consider the various types of clients likely to be present on a network and, accordingly, the file systems likely to be resident on them. Solaris clients fall into three general classes: diskless, dataless, and standalone. Diskless clients have no disks; all file activity, even swapping, occurs over the network. Due to the burden that diskless clients put on a network, they are seldom used today. Their one-time popularity in highly security-conscious environments was short-lived.
Dataless clients have disks, but use them only for swapping and to hold the root file system. All system, application, and user files are accessed over the network. This keeps the more I/O-intensive operations off the network while providing for centralized system administration of the other file systems.
Standalone clients have a full complement of file systems on their local disks. Although they are capable of running without the network, these clients seldom actually run this way, for fairly obvious reasons. They generally use services available over the network, such as naming services, printing services, e-mail, and access to the Internet.
Most networks today are heterogeneous. In addition to Solaris servers (and Solaris clients), they have other Unix systems and a mix of Intel and Macintosh systems running variations of Microsoft Windows, MacOS, and Linux. These clients get certain types of support from the network, for example, home directories, applications, mail, and sometimes data directories and other shared directories to facilitate working together when the application itself doesn't run in a client/server mode.
The layout of an individual system should involve adequately sized partitions. Before you can make intelligent decisions about how much space to allocate to each partition, you need to know how the system is to be used. You can probably get by with a fairly standard configuration for your Solaris clients. We recommend this, especially if you have a lot of them, as this will make it possible for you to use JumpStart to install and upgrade them (see Chapter 6). It's generally a different story where your file servers are concerned. The default partitioning cannot anticipate the applications you'll install, the number of users you'll be supporting, or how often you purge directories.
Here are some general guidelines about space on server and client systems. Clearly, the amount of disk space available will impact how large your partitions should be. The values in the following listing are suggested ranges.
File System        Server (MB)    Client (MB)
/                  32-128         32-64
swap               100-200        50-100
/usr               400            400
/opt               150-500        150-200
/var               100-300        50-100
/usr/local         100-500        -
/home              100 * users    -
/export/install    500            -
/opt               100-1500       100-300
Things you should consider when planning partition sizes include:
For most networks, centralizing mail services makes a lot of sense: you only have to configure one server's sendmail file, and you can keep mail files in a single place where they are easy to manage. The central location also facilitates removal of lock files or POP files. You will probably want to provide a single mail server that corresponds to the MX record in DNS; individual hosts can get mail by mounting /var/mail or by using services such as POP3 or IMAP4.
Given the low prices and large disks often available on PC and Macintosh systems, it might make sense to limit the use they make of network resources. Some server resources, on the other hand, can facilitate file sharing without requiring users to take the extra step of transferring files back and forth with ftp or Fetch.
How do you decide what to mount and what to automount? One of the clear advantages of automounting is that it's all driven from maps that you create and administer centrally. This is in line with what we've been saying all along about the ideal configuration for sysadmins who want to go home at night. Avoid complexity, and when you can't, keep it where you can closely watch it. More important, the network load is considerably decreased with automount, and you won't suffer from system hangs when systems or file systems are unavailable (see Figure 1.7).
As a general rule, it is best to avoid hard-mounting remote file systems on NFS servers. Doing so is complicated and somewhat risky (it could lead to a deadlock in which each of two servers is waiting on the other). However, it isn't necessarily obvious how to avoid mounts if users telnet to your file servers, because their home directories will be automounted. One of the authors, in fact, recently saw a file server run out of swap space while trying to mount home directories from another server that had crashed. Though no one was logged in at the time, the server with the mount problem was a mail server, and it was attempting to automount a home directory each time a piece of mail arrived, because sendmail had to check for .forward files in the users' home directories. As the number of hung sendmail processes skyrocketed, swap space was soon exhausted. Codependency problems of this sort can be avoided, but only with careful assignment of server roles and an understanding of the processes involved.
TIP A tidy solution to the problems mentioned in the previous paragraph is to configure sendmail to look somewhere else for .forward files. This feature is available in current releases of sendmail (look for the ForwardPath option). Another option is to run mail and home directory services on the same machine, avoiding the requirement for cross-mounting and its potential consequences.
NFS starts up when a system boots or enters run state 3, but only if there are file systems to be shared. How does the system know? Easy: it looks at the file /etc/dfs/dfstab. If file systems are listed in that file, NFS will share them. The /etc/rc3.d/S15nfs.server script starts NFS.
NFS file systems should always be shared using host lists and should never be available to the world in general; sharing file systems with no access restrictions introduces a security risk. File systems can be shared read-only (as should always be the case with CD-ROMs, unless you want to encounter a lot of errors). The most common mode of sharing file systems is to include a list of hosts, separated by colons, and give them read/write access, like this:
share -o rw=congo:rwanda:malawi /export/home
Only under extreme circumstances should you allow another host to have the same authority on a file system as the local root. To allow this, use the following syntax:
share -o rw= congo, rwanda, malawi root= zimbabwe
If you are supporting netgroups (available only with NIS or NIS+), you can often save yourself a lot of trouble by using them in place of lists of hosts. Consider, for example, the following /etc/dfs/dfstab file:
# this is the dfstab file
share -o rw=congo:rwanda:malawi:chad:zaire:sudan:burundi:kenya:zimbabwe /home
share -o rw=congo:rwanda:malawi:chad:zaire:sudan:burundi:kenya:zimbabwe /usr/local
This file could be reduced to the following more approachable file:
# this is the dfstab file
share -o rw=africa /home
share -o rw=africa /usr/local
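When a share option names a netgroup, the access check amounts to expanding the group into host names, including any netgroups nested inside it. The Python sketch below assumes /etc/netgroup entries of the form africa (congo,-,) (rwanda,-,) eastafrica, with no spaces inside each (host,user,domain) triple and no continuation lines; the group eastafrica is a hypothetical example. Only the host field of each triple is used, since that is the field that matters for share access lists.

```python
from typing import Dict, List, Optional, Set


def parse_netgroup(text: str) -> Dict[str, List[str]]:
    """Map each netgroup to its members: host names or nested group names."""
    groups: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, *members = line.split()
        entries: List[str] = []
        for member in members:
            if member.startswith("("):
                host = member.strip("()").split(",")[0]  # (host,user,domain)
                if host and host != "-":
                    entries.append(host)
            else:
                entries.append(member)  # a nested netgroup name
        groups[name] = entries
    return groups


def expand_access_list(entries: List[str], groups: Dict[str, List[str]],
                       seen: Optional[Set[str]] = None) -> Set[str]:
    """Expand host names and netgroup names into a flat set of hosts."""
    seen = set() if seen is None else seen
    hosts: Set[str] = set()
    for name in entries:
        if name in groups:
            if name not in seen:  # guard against groups that include each other
                seen.add(name)
                hosts |= expand_access_list(groups[name], groups, seen)
        else:
            hosts.add(name)
    return hosts


netgroup_text = "africa (congo,-,) (rwanda,-,) eastafrica\neastafrica (kenya,-,) (sudan,-,)\n"
print(sorted(expand_access_list(["africa"], parse_netgroup(netgroup_text))))
```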
You would define the netgroup africa in the /etc/netgroup file. To view the file systems that a system is currently sharing, run the share command with no arguments on that system; the output looks like this:
-  /usr/man  ro=africa  "man pages"
-  /usr/WWW  rw=tonga:botswana:malawi:gabon:zaire:burundi  "WWW"
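If you want to audit shares from a script, the listing is easy to break into fields. This Python sketch assumes the four-column layout shown above (resource, path, options, quoted description) and uses shlex to keep quoted descriptions intact; it parses text you pass in rather than running share itself.

```python
import shlex
from typing import Dict, List, NamedTuple


class Share(NamedTuple):
    path: str
    options: Dict[str, List[str]]
    description: str


def parse_share_listing(output: str) -> List[Share]:
    """Split share-style listing lines into path, options, and description."""
    shares: List[Share] = []
    for line in output.splitlines():
        fields = shlex.split(line)  # keeps "man pages" as one field
        if len(fields) < 3:
            continue  # blank or unexpected line
        _resource, path, opts = fields[:3]
        description = fields[3] if len(fields) > 3 else ""
        options: Dict[str, List[str]] = {}
        for opt in opts.split(","):  # options are comma-separated
            key, _, value = opt.partition("=")
            options[key] = value.split(":") if value else []  # hosts are colon-separated
        shares.append(Share(path, options, description))
    return shares


listing = '-  /usr/man  ro=africa  "man pages"\n-  /usr/WWW  rw=tonga:botswana  "WWW"\n'
for share in parse_share_listing(listing):
    print(share.path, share.options, share.description)
```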
The NFS mount options are described in Table 1.6.
Securing NFS versus Secure NFS
There are a number of measures that you can take to improve the security of your NFS services. If your security concerns warrant it, you should probably go to the trouble of running Secure NFS. However, be prepared to deal with the added complexity and with the scarcity of Secure NFS products for non-Solaris clients.
Here are some guidelines for running NFS. Refer to Chapter 12 for more information on NFS and security:
Administering Secure NFS
Authentication is the process of validating a username and password entered at login. For Secure NFS, the authentication system is more rigid. You can choose Diffie-Hellman or Kerberos authentication (or both), each of which uses complex encryption to provide the validation information that supports the authentication process. Authentication services work because the public and private keys are established ahead of time. Public and private keys, the user elements of asymmetric authentication systems, can be thought of as reverse operations: one undoes what the other does. Keeping one private while openly disclosing the other sets the stage for both private messaging (public key used for encryption, private key used for decryption) and nonrepudiation (private key used for encryption, public key used for decryption).
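The "one undoes what the other does" relationship can be demonstrated with textbook RSA and deliberately tiny numbers. RSA is swapped in here purely to illustrate the idea; Secure NFS's Diffie-Hellman mechanism uses its key pairs to agree on a common key rather than to encrypt messages directly, and real keys are vastly larger. The values p = 61, q = 53, and e = 17 are a standard classroom example, not anything Solaris uses, and the modular inverse via pow() assumes Python 3.8 or later.

```python
from typing import Tuple

Key = Tuple[int, int]  # (exponent, modulus)


def make_toy_keypair() -> Tuple[Key, Key]:
    """Classroom RSA key pair; far too small to offer any security."""
    p, q = 61, 53
    n = p * q                          # public modulus, 3233
    e = 17                             # public exponent
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent: inverse of e
    return (e, n), (d, n)


def apply_key(message: int, key: Key) -> int:
    """Raise the message to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


public, private = make_toy_keypair()
locked = apply_key(65, public)      # private messaging: the public key locks
print(apply_key(locked, private))   # ...and the private key unlocks, giving 65
signed = apply_key(65, private)     # nonrepudiation: the private key "signs"
print(apply_key(signed, public))    # ...and anyone can check with the public key
```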
The commands to establish key sets for the Diffie-Hellman authentication are the newkey and nisaddcred commands. Your users will have to establish their own secure RPC passwords. Once created, these keys are stored in the publickey database.
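In the Diffie-Hellman scheme, each principal's public key is what ends up in the publickey database, and combining your own secret key with another party's public key yields a common key that both sides can compute but an eavesdropper cannot. The Python sketch below shows that arithmetic with toy values (prime 23, generator 5, secrets 6 and 15) chosen for readability; the names alice and bob are hypothetical. Secure RPC uses its own, much larger, fixed parameters and stores each secret key encrypted under the user's secure RPC password.

```python
def dh_public_key(secret: int, g: int, p: int) -> int:
    """Public value g**secret mod p (the kind of value a publickey database holds)."""
    return pow(g, secret, p)


def dh_common_key(my_secret: int, their_public: int, p: int) -> int:
    """Combine my secret key with the other side's public key."""
    return pow(their_public, my_secret, p)


P, G = 23, 5                       # toy modulus and generator
alice_secret, bob_secret = 6, 15   # hypothetical secret keys
alice_public = dh_public_key(alice_secret, G, P)  # 8
bob_public = dh_public_key(bob_secret, G, P)      # 19
print(dh_common_key(alice_secret, bob_public, P),
      dh_common_key(bob_secret, alice_public, P))  # both sides compute 2
```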
Verify that the name service is running. If you are using NIS+, the command to do this is nisping -u. This command will tell you if the service is up and when the last updates were made. It also will tell you about replica servers. For NIS, use ypwhich. It will tell you if ypbind is running and which NIS server the system is bound to.
The process that manages authentication keys is keyserv (/usr/sbin/keyserv), and it must be running for Secure NFS to work. Use keylogin to decrypt and store your secret key. If your login password is the same as your network password, you don't have to do this.
Add the option -o sec=dh to the share options. For example:
share -F nfs -o sec=dh /export/home
Add the entry to the auto_master map as well. For example:
/home auto_home -nosuid,sec=dh
NFS refuses access if the security modes do not match, unless -sec=none is on the command line.
The file /etc/.rootkey should be preserved across system upgrades and changes to ensure that the authentication continues to work. If you inadvertently destroy this file, you can create another with the keylogin -r command.
The process for Kerberos authentication involves updating the dfstab file and the auto_master map, as shown here:
share -F nfs -o sec=krb4 /export/home
/home auto_home -nosuid,sec=krb4
Again, NFS refuses the mount if the security modes do not match.
The decisions that you make regarding the layout of file systems on a single host and, even more so, across a network will determine, to a large degree, how easy your site is to manage and how well it runs. Solaris supports a plethora of file system types, many of which are not file systems in the traditional sense, but interfaces to the system. Become familiar with the tools and options at your disposal; it will be a good investment of your time.
In this chapter, we explained that:
Let us assume that the goal is not to avoid problems, but rather to have processes and procedures in place that force them to approach you in single file (no pun intended). The manageable network is not one that runs flawlessly, but one in which there is time and capacity enough to resolve each problem without major disruption. In this initial part of the book, we provide our thoughts and suggestions to help you accomplish this goal. We make suggestions about automating your system installation and patching by using JumpStart. We encourage you to plan file systems with reliability, security, performance, and manageability in mind. We provide insights into naming services such as DNS and Sun's NIS and NIS+. We detail the Solaris boot process as it moves from cold hardware to fully operational. We describe Solaris run states and provide instructions on modifying the processes that start or shut down automatically. We discuss PROM-level diagnostics that you can use to isolate problems.
We don't live in a world of infinite budgets and patient users, waiting for every problem to occur like the Maytag repairman. We assume you don't either. We're not sitting in front of a pile of brand new systems in an office space not yet filled with people and activity. We assume you're not either. Our networks are not composed entirely of Solaris servers and Solaris clients. We assume yours are not either. We want leisure hours, happy families, and personal down time. We assume you do too. Our vantage point is that of system administrators with too much to do, working in an environment where change is the only constant. We hope you will find something in this section to make setup and management of your systems less chaotic.