Error Handling in Scripts

Handling errors is part of programming. Even if you write flawless code, you can still run into error conditions. The environment on your computer changes over time, as you install and uninstall software, create directories, and perform upgrades and updates.

For example, a script that used to run without issue can run into difficulties if directory paths change, or permissions are changed on a file. The default action of the Bash shell is to print an error message and continue to execute the script. This is a dangerous default.

If the action that failed is critical to some other processing or action that happens later in your script, that critical action will not be successful. How disastrous that turns out to be depends on what your script is trying to do.

A more robust scheme would detect errors and let the script work out whether it needs to shut down or try to remedy the fault condition. For example, if a directory or file is missing, it may be satisfactory to have the script recreate it.

If the script has encountered a problem from which it cannot recover, it can shut down. If the script has to shut down, it can have the chance to perform whatever clean-up is required, such as removing temporary files or writing the error condition and shutdown reason to a log file.

Detecting the Exit Status

Commands and programs generate a value that is sent to the operating system when they terminate. This is called their exit status. It has a value of zero if there were no errors, or some non-zero value if an error occurred.

We can check the exit status—also known as a return code—of the commands the script uses, and determine whether the command was successful or not.

In Bash, an exit status of zero equates to true and any other value equates to false. If the response from the command is anything other than true, we know a problem has occurred and we can take appropriate action.
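
As a small illustration of that rule, the sketch below hands two standard commands, true and false, to an if statement. The helper name check is made up for this example.

```shell
#!/bin/bash

# check is a hypothetical helper: it runs whatever command it is given
# and reports how the shell interpreted the exit status.
check() {
  if "$@"; then
    echo "'$*' succeeded (exit status 0)"
  else
    echo "'$*' failed (exit status $?)"
  fi
}

check true    # prints: 'true' succeeded (exit status 0)
check false   # prints: 'false' failed (exit status 1)
```

The if statement takes its then branch on an exit status of zero and its else branch on anything else, which is why no explicit comparison is needed.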

Copy this script into an editor, and save it to a file called “bad_command.sh.”
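
A minimal sketch of such a script, written to be consistent with the walkthrough that follows, sits between the EOF markers below. The surrounding lines are scaffolding assumed for this example only: they write the script to a scratch directory and run it through bash so that the whole thing can be tried as one snippet.

```shell
# Scaffolding for this example only: put the script in a scratch directory.
workdir="$(mktemp -d)"

cat > "$workdir/bad_command.sh" <<'EOF'
#!/bin/bash

# bad_command is deliberately not a real command or function.
if ! bad_command; then
  echo "bad_command flagged an error."
  exit 1
fi
EOF

# Run it through bash and keep the exit status the script passes back.
status=0
bash "$workdir/bad_command.sh" || status=$?
echo "Script exit status: $status"   # prints: Script exit status: 1
```

Bash prints its own “command not found” message first, followed by the message from the body of the if statement.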

You’ll need to make the script executable with the chmod command. This is a step that’s required to make any script executable, so if you want to try the scripts out on your own machine, remember to do this for each of them. Substitute the name of the appropriate script in each case.
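
A quick sketch of that step, using a throwaway file created with mktemp so that it can be run anywhere. For the scripts in this article you would point chmod +x at the script's own name instead, for example bad_command.sh.

```shell
# Throwaway file standing in for one of the example scripts.
script="$(mktemp)"
printf '#!/bin/bash\necho "Hello from the script"\n' > "$script"

ls -l "$script"      # before: no "x" in the permission string
chmod +x "$script"   # add execute permission
ls -l "$script"      # after: "x" appears in the permission string

# The fourth character of the listing is the owner's execute bit.
owner_exec="$(ls -l "$script" | cut -c4)"
echo "Owner execute bit: $owner_exec"
```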

When we run the script we see the expected error message.

There’s no such command as “bad_command”, nor is it the name of a function within the script. It cannot be executed, so the response is not zero. If the response is not zero—the exclamation point is used here as the logical NOT operator—the body of the if statement is executed.

In a real-world script, this could terminate the script, which our example does, or it could try to remedy the fault condition.

It might look like the exit 1 line is redundant. After all, there’s nothing else in the script and it is going to terminate anyway. But using the exit command allows us to pass an exit status back to the shell. If our script is ever called from within a second script, that second script will know that this script encountered errors.

You can use the logical OR operator with the exit status of a command, and call another command or a function in your script if there is a non-zero response from the first command.

This works because the shell evaluates || from left to right and stops as soon as the outcome is known. The left-most command is run first. If it succeeds, the second command is not executed. But if the first command fails, the second command is executed. So we can structure code like this. This is “logical-or.sh.”
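
The script between the EOF markers below is a sketch of that structure, reconstructed from the description in the next few paragraphs; the exact wording of its messages is an assumption. The lines around it are scaffolding for this example: they save the script to a scratch directory, run it, and echo the exit status it returns.

```shell
# Scaffolding for this example only: put the script in a scratch directory.
workdir="$(mktemp -d)"

cat > "$workdir/logical-or.sh" <<'EOF'
#!/bin/bash

# Print the failed command's exit status ($?) and the message in $1,
# then stop the script with an exit status of one.
error_handler() {
  echo "Error: ($?) $1"
  exit 1
}

# bad_command does not exist, so the right-hand side of || runs.
bad_command || error_handler "bad_command failed, Line: ${LINENO}"
EOF

# Run the script (Bash's own "command not found" message on stderr is
# discarded to keep the output short), then report its exit status.
status=0
output="$(bash "$workdir/logical-or.sh" 2>/dev/null)" || status=$?
echo "$output"
echo "Script exit status: $status"   # prints: Script exit status: 1
```

Reading $? on the first line of the function matters: any earlier command inside error_handler would overwrite it with its own exit status.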

We’ve defined a function called error_handler. This prints out the exit status of the failed command, held in the variable $?, and a line of text that is passed to it when the function is called. This text is held in the variable $1. The function terminates the script with an exit status of one.

The script tries to run bad_command which obviously fails, so the command to the right of the logical OR operator, ||, is executed. This calls the error_handler function and passes a string that names the command that failed, and contains the line number of the failing command.

We’ll run the script to see the error handler message, and then check the exit status of the script using echo.

Our little error_handler function provides the exit status of the attempt to run bad_command, the name of the command, and the line number. This is useful information when you’re debugging a script.

The exit status of the script is one. The 127 exit status reported by error_handler means “command not found.” If we wanted, we could use that as the exit status of the script by passing it to the exit command.

Another approach would be to expand error_handler to check for the different possible values of the exit status and to perform different actions accordingly, using this type of construct:
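
One possible shape for this is sketched below with a case statement; an if/elif chain would do the same job. This variant of error_handler is hypothetical: it takes the exit status and the command name as arguments, and it only prints a message where a real script would retry, clean up, or exit.

```shell
#!/bin/bash

# Hypothetical variant of error_handler: $1 is the failed command's exit
# status and $2 names the command. It reports and returns; a real script
# would choose to retry, clean up, or exit in each branch.
error_handler() {
  local status="$1"
  local command_name="$2"

  case "$status" in
    126) echo "$command_name was found but could not be executed" ;;
    127) echo "$command_name was not found" ;;
    *)   echo "$command_name failed with exit status $status" ;;
  esac
}

bad_command 2>/dev/null || error_handler "$?" "bad_command"
# prints: bad_command was not found
```

Passing "$?" as an argument captures the status at the moment of the call, so the function can run other commands before inspecting it.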

Using set To Force an Exit

If you know that you want your script to exit whenever there is an error, you can force it to do that. It means you forgo the chance of any cleanup (and of doing any further damage, too) because your script terminates as soon as it detects an error.

To do this, use the set command with the -e (errexit) option. This tells the script to exit whenever a command fails, that is, returns a non-zero exit status. Also, using the -E (errtrace) option ensures that error trapping set up with trap works inside shell functions.

To also catch uninitialized variables, add the -u (nounset) option. To make sure that errors are detected in piped sequences, add the -o pipefail option. Without this, the exit status of a piped sequence of commands is the exit status of the final command in the sequence. A failing command in the middle of the piped sequence would not be detected. The -o pipefail option must come last in the list of options, because -o takes the word that follows it, pipefail, as its argument.
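
The effect of pipefail can be seen with a deliberately failing pipeline. This is a small sketch, with variable names chosen for the example; it runs the same pipeline twice, once with the option off and once with it on.

```shell
#!/bin/bash

# Each pipeline runs in a subshell so the option change stays contained.
status_default=0
( set +o pipefail; false | true ) || status_default=$?

status_pipefail=0
( set -o pipefail; false | true ) || status_pipefail=$?

echo "Without pipefail: $status_default"    # prints: Without pipefail: 0
echo "With pipefail:    $status_pipefail"   # prints: With pipefail:    1
```

In the first case the failure of false is masked by the success of true at the end of the pipeline.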

The sequence to add to the top of your script is:
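
Putting the options from the preceding paragraphs together gives a single set line. The second command is only there for this sketch; it lists the four options so you can confirm they are switched on.

```shell
#!/bin/bash
set -Eeuo pipefail

# Optional check: show the state of the four options just enabled.
set -o | grep -E '^(errexit|errtrace|nounset|pipefail)[[:space:]]'
```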

Here’s a short script called “unset-var.sh”, with an unset variable in it.
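
A sketch of what such a script could look like is shown between the EOF markers; the text of the second echo is an assumption. The remaining lines are scaffolding for this example that save the script to a scratch directory, run it, and report the result.

```shell
# Scaffolding for this example only: put the script in a scratch directory.
workdir="$(mktemp -d)"

cat > "$workdir/unset-var.sh" <<'EOF'
#!/bin/bash
set -Eeuo pipefail

# unset_variable has never been assigned a value.
echo "$unset_variable"
echo "This line should never be printed."
EOF

# Capture both stdout and stderr, plus the script's exit status.
status=0
output="$(bash "$workdir/unset-var.sh" 2>&1)" || status=$?
echo "$output"
echo "Script exit status: $status"
```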

When we run the script the unset_variable is recognized as an uninitialized variable and the script is terminated.

The second echo command is never executed.

Using trap With Errors

The Bash trap command lets you nominate a command or a function that should be called when a particular signal is raised. Typically this is used to catch signals such as SIGINT, which is raised when you press the Ctrl+C key combination. This script is “sigint.sh.”
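
The sketch below follows the same pattern, with two assumptions made so that it finishes unattended: the loop is limited to five passes, and on the third pass the script sends SIGINT to itself in place of a Ctrl+C key press. To try it by hand, replace the for loop with while true and drop the kill line.

```shell
# Scaffolding for this example only: put the script in a scratch directory.
workdir="$(mktemp -d)"

cat > "$workdir/sigint.sh" <<'EOF'
#!/bin/bash

# When SIGINT arrives, print a message and stop. 130 is the conventional
# exit status for a process ended by SIGINT (128 + signal number 2).
trap "echo; echo 'Caught SIGINT, terminating.'; exit 130" SIGINT

for counter in 1 2 3 4 5; do
  echo "Loop number: $counter"
  # Assumption for this sketch: stand in for the Ctrl+C key press.
  if [ "$counter" -eq 3 ]; then
    kill -s INT "$$"
  fi
  sleep 1
done
EOF

status=0
output="$(bash "$workdir/sigint.sh")" || status=$?
echo "$output"
echo "Script exit status: $status"
```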

The trap command contains an echo command and the exit command. It will be triggered when SIGINT is raised. The rest of the script is a simple loop. If you run the script and hit Ctrl+C you’ll see the message from the trap definition, and the script will terminate.

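We can use trap with ERR to catch errors as they occur. ERR is not a true signal; it is a pseudo-signal that Bash raises when a command returns a non-zero exit status. The errors can then be fed to a command or function. This is “trap.sh.” We’re sending error notifications to a function called error_handler.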
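
A sketch of a script along those lines appears between the EOF markers. The function bodies and message text are assumptions, and set -E is included so that the ERR trap also fires for commands that fail inside functions. The lines around the script are scaffolding for this example.

```shell
# Scaffolding for this example only: put the script in a scratch directory.
workdir="$(mktemp -d)"

cat > "$workdir/trap.sh" <<'EOF'
#!/bin/bash
set -E   # let functions inherit the ERR trap

# $1 is the exit status of the failed command, $2 the line it was on.
error_handler() {
  echo "Error: exit status $1 on line $2"
}

# Single quotes delay expansion of $? and $LINENO until the trap fires.
trap 'error_handler $? $LINENO' ERR

second() {
  echo "Inside second() function"
}

third() {
  echo "Inside third() function"
}

main() {
  echo "Inside main() function"
  bad_command   # does not exist, so the ERR trap fires here
  second
  third
}

main
EOF

# Bash's own "command not found" message on stderr is discarded here.
status=0
output="$(bash "$workdir/trap.sh" 2>/dev/null)" || status=$?
echo "$output"
```

Because this error_handler does not call exit, the script carries on with second and third after reporting the error.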

The bulk of the script is inside the main function, which calls the second and third functions. When an error is encountered—in this case, because bad_command doesn’t exist—the trap statement directs the error to the error_handler function. It passes the exit status from the failed command and the line number to the error_handler function.

Our error_handler function simply lists the details of the error to the terminal window. If you wanted, you could add an exit command to the function to have the script terminate. Or you could use a series of if/elif/fi statements to perform different actions for different errors.

It might be possible to remedy some errors; others might require the script to halt.

A Final Tip

Catching errors often means pre-empting the things that can go wrong, and putting in code to handle those eventualities should they arise. That’s in addition to making sure the execution flow and internal logic of your script are correct.

If you use this command to run your script, Bash will show you a trace output as the script executes:
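
The option that produces a trace is Bash's -x (xtrace) option, passed on the command line as bash -x followed by the script name. The sketch below applies it to a hypothetical three-line script, trace-demo.sh, created only for this demonstration.

```shell
# Scaffolding for this example only: a tiny script to trace.
workdir="$(mktemp -d)"

cat > "$workdir/trace-demo.sh" <<'EOF'
#!/bin/bash
name="world"
echo "Hello, $name"
EOF

# -x prints each command after expansion, prefixed with "+".
# The trace is written to stderr; 2>&1 merges it with the normal output.
trace="$(bash -x "$workdir/trace-demo.sh" 2>&1)"
echo "$trace"
# prints:
# + name=world
# + echo 'Hello, world'
# Hello, world
```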

Bash writes the trace output in the terminal window. It shows each command with its arguments—if it has any. This happens after the commands have been expanded but before they are executed.

It can be a tremendous help in tracking down elusive bugs.
