CTS 2301C (Unix/Linux Administration I) Project #3
Hard Disk Administration

 

Due: by the date shown on the syllabus and in Canvas

Background:

When using a modern journaling filesystem such as ext4, by default fsck (filesystem check) is never forced.  With a traditional (older) filesystem type such as ext2, the system defaults to checking the filesystem every so many reboots (technically after X number of mounts, but since all filesystems are usually mounted at boot time it comes to the same thing), or every so many months, or after an improper shutdown, whichever comes first.

A problem with traditional filesystem setup occurs with large disks.  Since all storage volumes have the same default setup, once a check is forced for any reason all filesystems are checked at once.  This can take a very long time!  A staggered schedule can be used to avoid this problem.  An issue with journaling filesystems is that an occasional error can still occur, and if no checking is ever performed the error can snowball, causing other problems.  (Also, hackers can induce errors even with a journaling filesystem.)

A staggered schedule means that each filesystem is still checked at the same frequency but not all on the same day.  For example, if you have only two filesystems, you can have each one checked every six months: the first filesystem every January 1st and July 1st, and the second filesystem every February 1st and August 1st.  This is done by setting the date of the last check to a different value for each filesystem, while keeping the same six-month interval between checks on both.
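The date-based staggering in the six-month example above can be sketched as a small Bash function.  This is a minimal sketch, not a required procedure.  It assumes GNU date (for the "+N month" arithmetic), a six-month interval, and hypothetical device names (/dev/sda1, /dev/sda2) that you must replace with your own.  The helper name stagger_dates is also hypothetical.  It only prints the tune2fs commands, so you can review them before running any of them as root.

```shell
#!/usr/bin/env bash
# Print (do not run) tune2fs commands that give each filesystem the same
# six-month check interval, but a last-checked date one month later than
# the previous filesystem.  stagger_dates is a hypothetical helper name.
# Usage: stagger_dates YYYY-MM-01 DEVICE...
stagger_dates() {
    local base=$1
    shift
    local i=0 dev last
    for dev in "$@"; do
        # GNU date (in UTC, to avoid DST surprises) adds i months to the
        # base date; tune2fs -T accepts a date in YYYYMMDD form
        last=$(date -u -d "$base +$i month" +%Y%m%d)
        echo "tune2fs -i 6m -T $last $dev"
        i=$((i + 1))
    done
}

# Example with hypothetical device names:
stagger_dates 2024-01-01 /dev/sda1 /dev/sda2
```

Choose a base date in the past, and use the first of a month so that adding months never rolls over into the following month.  A check then becomes due at the first mount after each filesystem's last-checked date plus the interval.  tune2fs treats the "m" suffix as a 30-day month, so over time the forced checks drift slightly away from the first of the month.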

The same staggering can be done if you force checks by the number of reboots.  If you have both filesystems checked every 20 reboots, say, change the current mount count so that the first filesystem thinks it has been mounted X times and the second “X+1” times.

This staggering of disk checking generalizes to any number of filesystems.  This way, except after an improper shutdown, only a single filesystem gets checked at any one time.
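The mount-count version generalizes in the same way.  The sketch below makes its own assumptions: the helper name and the device names are hypothetical, and instead of offsetting each filesystem by a single mount as in the two-filesystem example above, it spreads the starting counts evenly.  With N filesystems and a maximum of M mounts, a check then falls roughly every M/N reboots.  Like the previous sketch, it only prints the commands.

```shell
#!/usr/bin/env bash
# Print (do not run) tune2fs commands that give every filesystem the same
# maximum mount count but evenly spaced current mount counts.
# stagger_counts is a hypothetical helper name.
# Usage: stagger_counts MAX_MOUNTS DEVICE...
stagger_counts() {
    local max=$1
    shift
    if [ "$#" -eq 0 ] || [ "$max" -le 0 ]; then
        return 1    # nothing to stagger
    fi
    local step=$((max / $#))
    if [ "$step" -lt 1 ]; then
        step=1      # more filesystems than mounts: offset each by one
    fi
    local i=0 dev
    for dev in "$@"; do
        # -c sets max-mount-counts, -C sets the current mount-count
        echo "tune2fs -c $max -C $(( (i * step) % max )) $dev"
        i=$((i + 1))
    done
}

# Example with hypothetical device names:
stagger_counts 20 /dev/sda1 /dev/sda2 /dev/sda3
```

With a maximum of 20 and three devices, the printed counts are 0, 6, and 12.  The third filesystem therefore reaches its limit first, after about 8 more mounts, and the others follow at roughly 6-mount intervals.  An offset of one mount per filesystem, as in the text, also avoids simultaneous checks.  The even spread simply separates the checks further.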

The concept of a staggered schedule is used in many areas, such as applying updates to clusters of servers or scheduling backups.  Be sure you understand this concept.

In addition to setting a checking schedule, you will explore other disk-related commands: hdparm, to check drive capabilities, change settings, and perform timing tests; and smartctl, to perform health checks on your disks.  Note that this part of the project requires real hardware; it won't work on a virtual machine.

Description:

Answer the following questions and perform the indicated tasks:

Part I — Changing Tunable Parameters

  1. For each ext4 filesystem on your Fedora system, when are checks forced by default?  Use the tune2fs utility to find out.

    Note!  There is an issue with the SE Linux security subsystem that may prevent you from running tune2fs on some storage volumes.  If you are root, are using the correct device name as shown by the mount command, and still get various “permission denied” errors, then you may need to set SE Linux to permissive mode to complete this project.

    Run the command “getenforce”.  If the output shows “Enforcing” then you should run the command “setenforce 0” to change the mode to “Permissive”.  Now the tune2fs commands should work (even if they still produce error messages)!

  2. Next, adjust the parameters that control when fsck checks are forced.  Make sure all filesystems are checked regularly, but on a staggered schedule as described above.  Note that tune2fs works for ext4 filesystems as well as ext2 and ext3, but not for other types of filesystems.  (Other tools may or may not be available for other filesystem types.)  What commands did you use?

    Part II — Manually Checking Storage Volumes

  3. Why would it not be a good idea to run fsck on a mounted filesystem?  How (or when) can the root disk volume be safely checked?
  4. Unmount your /home volume with umount, and manually check it (and no other volume) for errors using the appropriate fsck command for the filesystem type used for that storage volume.  What is the type for your filesystem mounted at /home?  What is the name of the fsck utility for that type of filesystem?

    Note!  You can't unmount a volume (using the umount command) if it is in use by any process.  /home will be in use if you logged in as a non-root user, so you must log out and then log in as root.  It may be easier to do this from a virtual console rather than the GUI: after logging out, press control+alt+F2 to switch to a non-GUI console window.  (control+alt+F1 through F7 select different virtual consoles; one or more of them is usually a GUI.)  Later you can switch back to the GUI console using control+alt+F7 (on some systems F1 is the GUI console and F2 is the command line).  If you still can't unmount /home, use the command “fuser -m /home” to see which processes are using that volume.  Kill those processes, and then the umount command should work.

    What was the exact fsck command you used?  What was the output?

  5. Run the tune2fs utility on the storage volume for /home.  What are the mount-count and last-time-checked values now?

    If the “last checked” date hasn't changed, it is because fsck won't actually check if it doesn't think the filesystem needs it.  If this happened to you, repeat the previous step using the “-f” option to fsck.

    Remount /home using mount and examine the mount-count again.  Has it increased by one?
  6. Reboot your computer into single user mode.  (That is, reboot into run level 1; don't just change run-levels at the command line.)  On systems using systemd init, single user mode corresponds to the “rescue.target”; typing “1” always works.  How exactly did you do this?

    Hint:  You can add the run-level number 1 at the GRUB boot prompt if you have configured GRUB to show one.  To get a GRUB prompt at boot time if it doesn't show by default, edit /etc/default/grub and make sure that “GRUB_TIMEOUT” is not set to zero.  If it is, change it to “5” (the number of seconds to display the GRUB menu before booting automatically), and comment out the “GRUB_TIMEOUT_STYLE=hidden” line if present (otherwise you have to press the Escape key to show the menu).  Then regenerate the GRUB configuration so the change takes effect (on Fedora, “grub2-mkconfig -o /boot/grub2/grub.cfg”).  (See GRUB configuration settings for details.)

    Booting into single user mode hopefully causes the boot process to stop while the root storage volume is mounted in read-only mode, making it safe to run fsck on it (but don't do that yet).  However this varies by distro, so you can't count on it!  Some operating systems mount all filesystems even in single user mode, and some ask for the root password.  (Sometimes such distributions have an “emergency” mode that doesn't mount anything or require any password.)

    On modern Linux systems, including Fedora, the initial RAM disk is built by a tool called dracut, and its code runs before init.  Many useful GRUB (kernel) command line options are documented in dracut.cmdline(7).  One of these options lets you drop to a root shell before any volumes are mounted (including the real root filesystem).  Add the following to the kernel options at boot time (at the GRUB menu, press “e” to edit the selected entry):

    rd.break=pre-mount
    

    This should work better than single or emergency.  From the shell prompt that appears, you should be able to run the fsck command (the command name used depends on the type of the filesystem) on the (real) root filesystem.  (However, some volumes such as /home may not be available, and most commands won't be available either; this is because only the RAM disk is mounted and it doesn't have most commands or /etc files in it.)

    When done, exiting the shell should cause the boot process to continue.

  7. After booting into single user mode, once the shell prompt appears, run the commands “mount”, “findmnt”, and “lsblk”.  What storage volumes (if any) were mounted, other than root and swap?  Before you can run fsck safely, you must first un-mount any mounted filesystems.  Depending on your version of Unix or Linux, you may or may not be able to un-mount the root volume while in single user mode.  If not, you can probably remount it as read-only.  (Or try the dracut boot option mentioned above instead of single user mode; note that commands such as lsblk and files such as /etc/fstab will likely not be available then.  The mount command will be available, however.)

    You can un-mount (with umount) most filesystems if they are not busy.  But you may find that some filesystems are busy (the one holding /var/log/*, for example), and those can't be un-mounted until you stop (“kill”) whatever processes are using files on them, or wait for those processes to finish on their own.  One way to find those processes is the command:

       fuser /var/log/*

    As for the root filesystem, if you can't un-mount it, you can remount it as read-only with the correct mount options.  The command is:

       mount -no remount,ro /

    Now you can safely run the correct fsck utility for that type of filesystem.

    Note that the output of “mount” may not show the root filesystem mounted as read-only; it may still show it as “rw”!  This is because that status is saved in the file /etc/mtab, which is updated when you run mount.  Once the root filesystem is read-only, /etc/mtab can't be updated, so the old “rw” status remains.  However, the system does know the filesystem is mounted read-only; view /proc/mounts instead for accurate status.  (Modern Fedora no longer has /etc/mtab as a separate file; instead it is a symlink to /proc/mounts.)

  8. Run the correct fsck utility for the /home volume, after determining its type.  What is the output?  Now, run the command again, this time using the filesystem-specific option to force a check.  (Hint: you need to check the correct man page to find that option.)  What is the output this time, and why was it different from the first time?
  9. Bring your system fully up to your normal run-level (“3” for non-GUI, “5” for GUI).  From single user mode (or emergency mode), use the telinit command to change the run-level.  Note this won't work unless you remount any previously un-mounted filesystems, and remount the root filesystem as read-write!  (If you used the dracut option to interrupt the boot process, don't try telinit.  Simply exit the shell, and the boot to the normal run-level will resume.)

    Part III — Managing Storage Volume Health and Performance

    Note that some of these commands may not work on virtual disks.  You have three choices: install smartmontools for your host OS, use a Live Linux distro booted from a flash drive, or skip this section (this won't affect your grade).  If you choose to skip it, please read through all the material anyway.  To create a Live Kali Linux flash drive from Windows:

    1. Install Rufus, a tool for creating bootable flash drives.
    2. Download (and verify) Kali Linux (Live).
    3. Insert a USB flash drive you are willing to overwrite.  Run Rufus to create a bootable flash drive from your Kali Linux download.

    Do not try to run hdparm on your computer's disks as that command can be dangerous!  (Running smartctl should be safe.)

  10. Use smartctl --scan to identify your disks.  Which of the names shown are your hard disks?  If you have hardware RAID, which one of the names is for that?
  11. SATA, PATA, SAS, and older IDE disk drives can be examined and controlled with the hdparm command.  While it was designed for (E)IDE disk drives, many of its options work for SCSI drives as well.  Using hdparm, determine the disk geometry for your disk (and show it).  What option(s) did you use?
  12. Determine the drive identity using both the “-i” and “-I” options for hdparm.  What is the identity data for each of your drives, as shown using each option?  When might this information be useful when configuring your system?
  13. Using hdparm disable the write cache on your disks.  What was the exact command used?  When would this be a good setting to use?
  14. Perform a timing test on your disk to determine the throughput, using “hdparm -t disk”.  Record the MB/Sec value.  Repeat the test 9 more times, recording all ten values.  Now bring the system into single user mode (so that nothing is running) using telinit.  Repeat the previous test another 10 times.  Explain your results.
  15. If you have modern ATA or SCSI disks, you can get all sorts of information about your disk using the smartctl command, part of the smartmontools package.  Run the command (as root) “smartctl -i /dev/name-of-your-disk”.  Is SMART support enabled for this drive?  (If not but it is available, attempt to enable it with “smartctl -s on”, and try again.)  What are the make, model, and capacity reported for your disk?
  16. Perform a drive test.  (Note that these tests can be run regularly by smartd, if you configure it on Linux.)  Run the command “smartctl -t short /dev/name-of-your-disk”.  When the test has completed, check the result with “smartctl -l selftest /dev/name-of-your-disk”.  Were any problems reported?
  17. Next run a drive health check using the command “smartctl -H /dev/name-of-your-disk”.  Did your drive pass?
  18. Finally, examine the data maintained by your drive, using the command “smartctl -A /dev/name-of-your-disk”.  How many times has your drive been power-cycled (attribute 12)?  How many hours has it been powered up (attribute 9)?  Which attributes (if any) indicate the drive is about to fail?
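For question 18, reading the attribute table by eye is fine, but a small filter can pull out the two counters asked about and flag any attribute whose WHEN_FAILED column is not "-".  This is a minimal sketch under several assumptions: it expects the ten-column ATA attribute table that smartctl -A prints for SATA drives (NVMe drives use a different layout), it keeps only the first token of RAW_VALUE, smart_summary is a hypothetical name, and /dev/sda is a placeholder.  A WHEN_FAILED entry is only one indicator, so you should still compare each VALUE against its THRESH yourself.

```shell
#!/usr/bin/env bash
# Summarize "smartctl -A" output read on stdin (hypothetical helper).
# Prints attributes 9 (Power_On_Hours) and 12 (Power_Cycle_Count), plus a
# warning for any attribute whose WHEN_FAILED column is not "-".
smart_summary() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($1 == 9 || $1 == 12)
                print $1, $2, $10
            if ($9 != "-")
                print "WARNING:", $1, $2, "WHEN_FAILED=" $9
        }'
}

# Example (as root; the device name is a placeholder):
#   smartctl -A /dev/sda | smart_summary
```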

To be turned in:

The answers to the questions above and the portion of your system journal describing the changes you made.  Be sure to include a table or list that shows, for each filesystem, when checking will be forced.  (That is, list the schedule for checking, which is your disk checking policy, in an easy-to-read way; don't merely write down the commands you typed.)

A copy of the section of your journal showing just the changes for this project, with your name clearly printed at the top, should be submitted to the correct Canvas dropbox for this assignment.  The name(s) of other classmates you worked with must be included.

Send questions about the assignment to .  Please use a subject similar to “Unix/Linux Admin I, Project 3 (Hard Disk Administration) Questions” so I can tell which emails are questions about the assignment (and not submissions).

Please see your syllabus for more information about project grading and also about submitting projects.

Information on tune2fs utility:

tune2fs is a Linux utility that allows you to examine and change the settings in the superblock, which is the name given to the part of a filesystem that holds the filesystem label, its size and type, and other information.  This tool only works for ext2, ext3, and ext4 filesystem types.  You must be logged on as root in order to use tune2fs.  The command to examine the values in the superblock of some storage volume such as /dev/sda1 is:

     # tune2fs -l volume
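Because tune2fs -l prints many fields, a filter that keeps only the check-related lines can make the output easier to read.  This is a minimal sketch, assuming the field labels printed by current e2fsprogs: “Mount count”, “Maximum mount count”, “Last checked”, “Check interval”, and “Next check after” (the last appears only when an interval is set).  The name check_fields is hypothetical.  A maximum mount count of -1 (or 0) or a check interval of 0 means that kind of check is disabled.

```shell
#!/usr/bin/env bash
# Keep only the superblock lines that control forced checks.
# check_fields is a hypothetical helper; it reads "tune2fs -l" output on stdin.
check_fields() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval|Next check after):'
}

# Example (as root; the device name is hypothetical):
#   tune2fs -l /dev/sda1 | check_fields
# Or for every mounted ext4 filesystem:
#   for dev in $(findmnt -rn -t ext4 -o SOURCE); do
#       echo "== $dev"; tune2fs -l "$dev" | check_fields
#   done
```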

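There are four parameters that control when a check is forced; each is set with its own tune2fs option:

     -c max-mount-counts
     -C mount-count
     -i interval-between-checks[d|m|w]
     -T time-last-checked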

Do not attempt to change any other values!  This can be a dangerous command so be careful what you change!

The max-mount-counts parameter is the number of times the filesystem can be mounted before it will be automatically checked for errors using fsck.  Since most volumes are mounted once each time the system is booted, this often is a count of reboots.  The mount-count parameter is the number of mounts since the last check.

The interval-between-checks parameter is the amount of time that is allowed to pass before a check will be forced (at the next mount).  The time-last-checked parameter is the date and time when the filesystem was last checked.
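The two rules can be combined into a small decision function, which may help when you fill in your checking-policy table.  This is a simplified sketch, not e2fsck's actual code.  Its assumptions: check_due is a hypothetical name, all times are Unix epoch seconds, and a max mount count of 0 or less, or an interval of 0, disables the corresponding rule (as the tune2fs man page describes).  It ignores the "not cleanly unmounted" case.

```shell
#!/usr/bin/env bash
# Simplified model of when a forced check is due (hypothetical helper).
# Usage: check_due MAX_MOUNTS MOUNT_COUNT LAST_CHECKED INTERVAL NOW
#   LAST_CHECKED and NOW are epoch seconds; INTERVAL is in seconds.
check_due() {
    local max=$1 count=$2 last=$3 interval=$4 now=$5
    # Count-based rule (disabled when max-mount-counts is 0 or -1)
    if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
        echo "due: mounted $count times (max $max)"
        return 0
    fi
    # Time-based rule (disabled when interval-between-checks is 0)
    if [ "$interval" -gt 0 ] && [ "$now" -ge $((last + interval)) ]; then
        echo "due: check interval has elapsed"
        return 0
    fi
    echo "not due"
    return 1
}

# Example: max 20 mounts, 7 so far, last checked 100 days ago,
# 180-day interval (what tune2fs means by "6m").
now=$(date +%s)
if check_due 20 7 $((now - 100 * 86400)) $((180 * 86400)) "$now"; then
    echo "a check would be forced at the next mount"
fi
```

To feed it real values, GNU date can convert the “Last checked” string from tune2fs -l into epoch seconds, for example date -d 'Mon Jan  1 10:00:00 2024' +%s.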

Why two schemes?  Because many reboot cycles in a short interval of time often mean problems or changes are occurring, so checking every so many reboots is reasonable.  But normally a Unix system stays up for long periods of time without requiring any reboots, often months or years, so waiting for 10 reboots before checking for errors may allow some error to go undetected for a long time.  So it makes sense to scan the disk for errors every few months as well.  (If the system doesn't shut down normally, a scan is forced at the next reboot.)

See the man pages for more information about tune2fs, hdparm, smartctl, and smartd.