Shell Scripting: Mutex Locks

Mutexes in Linux shells#

Linux shells do not have built-in mutexes. However, it is possible to simulate them using the following approaches:

1. Implementation With `mkdir`#

The mkdir command in Linux is atomic on POSIX-compliant local file systems. This means that creation of a new directory either completes successfully or fails, and there is no intermediate state.

If two or more processes attempt to create the same directory simultaneously, only one will succeed, and the rest will receive an error. This makes the mkdir command a suitable and portable way for creating locks.

A process does not need to check whether the lock directory exists before entering the critical section. A separate check would leave a gap between checking and creating in which another process could take the lock, so the attempt to create the directory is itself the test.

If the directory is created successfully, then the lock is obtained and the process can enter the critical section. If mkdir fails because the directory already exists, then another process is entering or has entered the critical section. The process can then choose to wait or abort execution.

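Once the process is done, it deletes the lock directory to release the lock. A trap on EXIT can also be set to clean up the directory however the script exits, including when it exits early on an error. The trap should be set only after the lock is acquired; otherwise a process that failed to get the lock would delete another process’s lock directory on its way out.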

LOCK_DIR="/tmp/lock_file"

if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    # resource is already locked: exit or wait for it to be unlocked
    echo "Another process has the lock."
    exit 1
fi

# lock acquired on resource; remove the lock directory when the script exits
trap 'rmdir "$LOCK_DIR"' EXIT
# enter your critical section here...
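
The script above aborts when the lock is taken. For the "wait" option, a minimal sketch is to retry mkdir until a timeout expires. The helper names `acquire_lock` and `release_lock`, the one-second polling interval, and the demo directory name are assumptions made for illustration, not part of any standard tool.

```shell
#!/bin/bash
# Hypothetical helper: try to take the mkdir lock, polling until a timeout.
# Usage: acquire_lock DIR [TIMEOUT_SECONDS]
acquire_lock() {
    local lock_dir=$1
    local timeout=${2:-10}
    local waited=0

    # mkdir is both the test and the acquisition, so there is no gap
    # between checking for the lock and taking it.
    until mkdir "$lock_dir" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # gave up: someone else still holds the lock
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

release_lock() {
    rmdir "$1"
}

# Demo against a throwaway directory name (an assumption for this sketch).
DEMO_LOCK="${TMPDIR:-/tmp}/demo_mkdir_lock.$$"

first_attempt="failed"
if acquire_lock "$DEMO_LOCK" 5; then
    first_attempt="acquired"
fi

# A second attempt while the lock is still held gives up after about 1 second.
if acquire_lock "$DEMO_LOCK" 1; then
    second_attempt="acquired"
else
    second_attempt="timed out"
fi

release_lock "$DEMO_LOCK"
echo "first: $first_attempt, second: $second_attempt"
```

Polling keeps the sketch simple, at the cost of up to one second of latency after the lock is released.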

`mkdir` caveats#

The biggest problem with the mkdir approach is stale locks. A stale lock results when the lock directory is not deleted. This can happen in the following situations:
- When the script is killed by SIGKILL (signal 9) before the lock directory is cleaned up. This signal cannot be trapped by a shell script.
- When the script’s execution is abruptly stopped by a system crash or a power loss.

When a stale lock is left behind, subsequent executions fail to create the lock directory and never acquire the lock.

Other caveats:
- On network file systems, the atomicity of the mkdir command cannot be guaranteed due to issues such as caching.
- The -p flag must not be used. The mkdir approach relies on a non-zero exit code if the lock directory exists and a zero exit code if it is created successfully. With the -p flag, mkdir returns a zero exit code even when the lock directory already exists, which breaks the mutex implementation.
- This approach lacks lock management features such as lock timeouts, lock information, and stale lock cleanup. To use this approach, these features have to be implemented manually.
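
As an illustration of implementing one of these features by hand, the sketch below records the owner's PID inside the lock directory and uses `kill -0` to test whether that process still exists. The `pid` file name and the helper names `take_lock` and `lock_is_stale` are assumptions of this sketch. It only detects a stale lock; removing one safely is a separate problem, because two processes could both decide the lock is stale at the same time.

```shell
#!/bin/bash
# Hypothetical helpers; the "pid" file name is an assumption of this sketch.
LOCK_DIR="${TMPDIR:-/tmp}/demo_stale_lock.$$"

take_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$$" > "$LOCK_DIR/pid"   # record who holds the lock
        return 0
    fi
    return 1
}

# Returns 0 if the lock exists but its recorded owner is no longer running.
# Limitations: PIDs can be reused, and kill -0 also fails for processes
# owned by another user, so treat the answer as a hint only.
lock_is_stale() {
    local owner
    owner=$(cat "$LOCK_DIR/pid" 2>/dev/null) || return 1
    [ -n "$owner" ] || return 1
    # kill -0 sends no signal; it only tests whether the process exists.
    ! kill -0 "$owner" 2>/dev/null
}

# Demo 1: fake a stale lock left by a process that has already exited.
sleep 0 &
dead_pid=$!
wait "$dead_pid"
mkdir "$LOCK_DIR"
echo "$dead_pid" > "$LOCK_DIR/pid"
if lock_is_stale; then stale_check="stale"; else stale_check="live"; fi

# Demo 2: a lock held by this running script is not stale.
rm -rf "$LOCK_DIR"
take_lock
if lock_is_stale; then live_check="stale"; else live_check="live"; fi

rm -rf "$LOCK_DIR"
echo "dead owner: $stale_check, running owner: $live_check"
```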

2. Implementation With `flock`#

flock is a Linux utility used to manage advisory locks on open files from within a shell script or the command line. The utility helps to achieve synchronized access to a file, protecting its contents from accidental corruption. It relies on the underlying flock() system call, which associates the lock with the file descriptor a process holds for an open file.

The lock is an advisory lock and thus it is not enforced by the kernel. This means that a process can still have access to the file even if it is locked. All processes that access your resource must use flock and respect the lock for it to work.
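
A small sketch of what "advisory" means in practice, using a throwaway file whose name is an assumption: while the script holds an exclusive lock, a plain write that never asks for the lock still goes through, whereas a cooperating flock call is refused.

```shell
#!/bin/bash
DATA_FILE="${TMPDIR:-/tmp}/demo_advisory.$$"   # hypothetical data file
: > "$DATA_FILE"

# Take an exclusive lock on descriptor 8 and keep it held.
exec 8<>"$DATA_FILE"
flock -x 8

# A writer that never calls flock is not blocked by the lock.
echo "written without asking for the lock" >> "$DATA_FILE"

# A cooperating process that does ask is refused while the lock is held.
if flock -n "$DATA_FILE" -c 'true'; then
    cooperative="acquired"
else
    cooperative="refused"
fi

contents=$(cat "$DATA_FILE")
exec 8>&-            # closing the descriptor releases the lock
rm -f "$DATA_FILE"
echo "file contents: $contents"
echo "cooperating locker: $cooperative"
```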

It can be used to obtain shared or exclusive access to a file, wait on a lock on a file, or release a lock on a file. The default behavior of the utility is an exclusive lock. Any locks not released are automatically released when the process ends.

NOTE
The lock file can be either a separate lock file or the actual data file to be protected by the mutex.
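A personal recommendation is to use the data file as the lock file unless your requirements need otherwise, such as when protecting a directory, or when the data file is edited by tools that replace it with a new file (for example sed -i), since the lock stays attached to the old file. This way, you have fewer file system footprints and also simplify your script logic.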

`flock` flags#

Flag	Description
-w seconds	If the lock is not immediately available, wait up to the given number of seconds for it. If the time elapses and the lock is still not available, flock fails and exits.
-s	Obtain a shared lock used to read the file contents safely. Multiple processes can obtain a shared lock on the same file.
-x	Obtain an exclusive lock used to modify the file contents safely. Prevents any other process from acquiring a shared or exclusive lock.
-u	Release an existing lock held by the current process.
-n	If the lock cannot be acquired, flock fails immediately without blocking.
-o	Close the file descriptor on which the lock is held before executing the command. Useful when the command may fork a child process and the file descriptor and lock should not be inherited.
-c cmd	Pass the command string cmd to the shell and execute it with the lock held.
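
To make the difference between -n and -w concrete, the sketch below holds a lock in the background for three seconds and then tries both flags against it. The lock file name and the timings are assumptions chosen for the demo.

```shell
#!/bin/bash
# Throwaway lock file for the demo (the name is an assumption).
LOCK_FILE="${TMPDIR:-/tmp}/demo_flock_flags.$$"

# Hold an exclusive lock in the background for 3 seconds.
flock -x "$LOCK_FILE" -c 'sleep 3' &
holder=$!
sleep 1   # give the background job time to take the lock

# -n: fail immediately instead of blocking.
if flock -n "$LOCK_FILE" -c 'true'; then
    nonblocking="acquired"
else
    nonblocking="busy"
fi

# -w 5: block for up to 5 seconds; the holder exits after 3, so this succeeds.
if flock -w 5 "$LOCK_FILE" -c 'true'; then
    waiting="acquired"
else
    waiting="timed out"
fi

wait "$holder"
rm -f "$LOCK_FILE"
echo "-n: $nonblocking, -w 5: $waiting"
```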

Simple `flock` usage#

# Exclusively lock the file data.txt for writing
flock data.txt -c '
    DAT=$(cat data.txt);
    echo $((DAT+1)) > data.txt;
    cat data.txt
'

# Obtain a shared lock on data.txt for reading
flock -s data.txt -c 'cat data.txt'

`flock` with file descriptors#

A file descriptor is usually a non-negative integer used to uniquely identify a system resource in the context of a running process. The system resource can be a file, directory, pipe, device, or network socket.

You may already be familiar with the 3 default file descriptors that a process starts with:

FD No.	Abbreviation	Name	Default resource
0	stdin	Standard Input	Keyboard
1	stdout	Standard Output	Console Screen
2	stderr	Standard Error	Console Screen
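
Beyond these three, a script can open descriptors of its own with the shell's exec builtin. The following minimal sketch opens a scratch file (its name is an assumption) on descriptor 3, writes through the descriptor, and closes it again.

```shell
#!/bin/bash
# Scratch file for the demo (the name is an assumption).
SCRATCH="${TMPDIR:-/tmp}/demo_fd.$$"

exec 3>"$SCRATCH"        # open the file for writing on descriptor 3
echo "first line"  >&3   # write through the descriptor, not the path
echo "second line" >&3
exec 3>&-                # close descriptor 3

line_count=$(wc -l < "$SCRATCH")
line_count=$((line_count))   # strip the padding some wc versions add
echo "lines written: $line_count"
rm -f "$SCRATCH"
```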

flock can be used with file descriptors as well.

# In this example the data file is also the lock file

# open the data file for reading and writing on file descriptor 300
# (<> does not truncate the file; > would empty it before the lock is held)
DATA_FILE='data.txt'
exec 300<>"$DATA_FILE"

# obtain an exclusive lock on the file for writing. We use -n to override the default blocking behaviour
if ! flock -n 300; then
    echo "The DATA FILE is already locked. Exiting."
    exit 1
fi

# The exclusive lock was obtained so we can enter the critical section
echo "In critical section"

# ---------------------
# if the script terminates, the file descriptor will be closed and the lock released automatically.
# ---------------------

# if the script continues with further processing after the critical section, you can manually unlock the file or close the FD as follows:

### This only releases the lock on the file. The file descriptor is still open and can be used later.
flock -u 300

### This closes the file descriptor, in which case the lock is automatically released.
exec 300>&-
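
A related pattern, similar to the form shown in the flock manual page, scopes the lock to a subshell so that no explicit unlock or close is needed. This is a minimal sketch that assumes a separate lock file with a made-up name; descriptor 9 is an arbitrary choice.

```shell
#!/bin/bash
LOCK_FILE="${TMPDIR:-/tmp}/demo_subshell_lock.$$"   # hypothetical lock file

(
    # Descriptor 9 is opened by the redirection at the end of the subshell.
    flock -n 9 || exit 1
    echo "in critical section"
    # Leaving the subshell closes descriptor 9, which releases the lock.
) 9>"$LOCK_FILE"
status=$?

rm -f "$LOCK_FILE"
echo "subshell exit status: $status"
```

Because `exit 1` runs inside the subshell, it ends only the locked section; the calling script can inspect `$status` and decide what to do next.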

Example Use for Mutexes#

We’ll create a simple script to demonstrate mutexes in use in Linux.

Scenario#

Say we have a file X containing a list of directories. We also have a worker script that processes the directories in our file X. The worker script picks a directory in the file X and marks it as pending, P. When done processing it, it marks the directory as complete, C. We can assume it takes 10s to process a single directory.

Solution#

If we have 10 directories in the file X, a single worker will take 100s to process all of them.

We can launch multiple instances of our worker script to speed this up. This will, however, introduce a concurrency problem. Multiple workers might pick the same unprocessed directory before it is marked as pending, P. This will potentially corrupt the data in our directory. This is where the mutex comes in.

We can lock the file X exclusively when picking a directory and unlock it once we’ve updated the status. This will serialize reads and writes on our file X, ensuring our workers are synchronized.
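
The same read-then-update pattern can be seen in miniature with a shared counter. In this sketch, 20 background jobs each read a number from a file and write it back incremented; because every job takes the exclusive lock first, no update is lost. The file names and the job count are assumptions made for the demo.

```shell
#!/bin/bash
COUNTER_FILE="${TMPDIR:-/tmp}/demo_counter.$$"   # hypothetical shared file
LOCK_FILE="$COUNTER_FILE.lock"
echo 0 > "$COUNTER_FILE"

increment() {
    # The read-modify-write runs with the exclusive lock held on fd 9.
    (
        flock -x 9
        value=$(cat "$COUNTER_FILE")
        echo $((value + 1)) > "$COUNTER_FILE"
    ) 9>"$LOCK_FILE"
}

# Start 20 concurrent writers, then wait for all of them to finish.
i=0
while [ "$i" -lt 20 ]; do
    increment &
    i=$((i + 1))
done
wait

final=$(cat "$COUNTER_FILE")
echo "final count: $final"
rm -f "$COUNTER_FILE" "$LOCK_FILE"
```

Without the `flock -x 9` line, two jobs could read the same value and both write the same result, which is the same race as two workers picking the same directory.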

solution implementation#

/data/dir_0
/data/dir_1
/data/dir_2
/data/dir_3
/data/dir_4

#!/bin/bash

DATA_FILE="dirs.txt"
DATA_FILE_FD=200

# open data file for reading and writing and assign a file descriptor
eval "exec $DATA_FILE_FD<>\"$DATA_FILE\""

# function to attempt to lock the data file 3 times
function Lock_Data_File {
    local max_attempts=3
    local attempt=1

    while [ "$attempt" -le "$max_attempts" ]; do
        # wait for the lock to be available for 2 seconds
        if flock -w 2 "$DATA_FILE_FD"; then
            return 0
        fi
        ((attempt++))
    done

    echo "Failed to lock data file after $max_attempts attempts..!"
    return 1
}

# function to apply a sed expression to the data file without replacing it.
# sed -i writes a new file and renames it over the original, which would leave
# the lock attached to the old file, so the same file is rewritten instead.
function Update_Data_File {
    local updated
    updated=$(sed "$1" "$DATA_FILE") || return 1
    printf '%s\n' "$updated" > "$DATA_FILE"
}

# get initial directory to process
if ! Lock_Data_File; then
    exit 1
fi

LINE_INDEX_TEXT=$(awk '!/.+:.+/ {printf "%d:%s\n", NR, $0; exit}' "$DATA_FILE")

while true; do
    # if no more data to process, exit, this will release the lock and file descriptor as well
    if [ -z "$LINE_INDEX_TEXT" ]; then
        echo "No more directories to process"
        exit 0
    fi

    LINE_INDEX=$(echo "$LINE_INDEX_TEXT" | cut -d: -f1)
    LINE_TEXT=$(echo "$LINE_INDEX_TEXT" | cut -d: -f2)

    # mark the dir as processing and unlock data file
    Update_Data_File "${LINE_INDEX}s/^/P:/"

    flock -u "$DATA_FILE_FD"

    # process the dir, sleep for 10s to simulate long process
    echo "Processing $LINE_TEXT"
    sleep 10
    echo "Done Processing $LINE_TEXT"

    # lock the data file, mark the dir as processed and get next directory to process
    if ! Lock_Data_File; then
        exit 1
    fi

    Update_Data_File "${LINE_INDEX}s/^P:/C:/"

    LINE_INDEX_TEXT=$(awk '!/.+:.+/ {printf "%d:%s\n", NR, $0; exit}' "$DATA_FILE")
done

WARNING
This is a sample script you can use to reference or base your worker logic on and not a final script. There are unhandled errors, such as updating the data file, which you may have to address as well.

Summary#

Linux shells don’t provide built-in mutex implementations.
It is possible to simulate mutex behavior using file operations, specifically the mkdir and flock commands.
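The mkdir command is portable, but it is prone to stale locks and may not be atomic on network file systems.
The flock command is more robust because locks are released automatically when the holding process ends. It is part of util-linux, so it may not be available on non-Linux systems.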
The flock command and its associated flock system call use advisory locks. These are not enforced by the kernel.
With the flock command, the worker processes must respect the advisory locks for the mutex behavior to work.

Conclusion#

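Unless you have a good reason, prefer the flock command for mutex implementations in Linux shell scripts. It is more robust than the mkdir approach because locks are released automatically when the holding process ends, so a killed script does not leave a stale lock behind.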
