Lec 17: Process Groups and Terminal Signaling

1 Pipelines and Process Groups

In the last lesson and lab, we've been discussing job control and the mechanisms that enable it. Generally, job control is a feature of the shell and supported by the terminal device driver. The shell manages which jobs are stopped or running and notifies the terminal driver which job is currently in the foreground. The terminal device driver listens for special keys, like Ctrl-c or Ctrl-z, and delivers the appropriate signal to the foreground process, like terminate or stop.

That narrative is fairly straightforward as long as there is only one process running within a job, but a job may contain more than one process, which complicates the actions of the terminal device driver. Additionally, jobs can be further grouped together into sessions, and the mechanisms that enable all this interaction require further discussion. In this lesson, we will explore process grouping and how these operating system services support job control and the shell features we've grown to rely on (and love?).

1.1 Pipeline of processes

Consider the following pipeline:

sleep 10 | sleep 20 | sleep 30 | sleep 50 &

Here we have four different sleep commands running in a pipeline. The sleep command doesn't read from or write to the terminal; it just sleeps for the given number of seconds and then exits. None of the sleep commands are blocking or waiting on input from another sleep command, so they can all run independently. We just happened to put them in a pipeline, but what is the impact of that? How long will this job take to complete?

One possibility is that each sleep command will run in sequence. First sleep 10 runs, then sleep 20, then sleep 30, and finally sleep 50, and thus it would take 10+20+30+50 = 110 seconds for the pipeline to finish. Another possibility is that they all run at the same time, that is, concurrently or in parallel, in which case the job would complete when the longest sleep finishes, after 50 seconds.

These two possibilities, in sequence and in parallel, also describe two ways a pipeline could be executed. In sequence would imply that the shell forks the first item in the pipeline, lets it run to completion, then forks the second item, lets that run, and so on. In parallel would imply that the shell forks all the items in the pipeline at once and lets them run concurrently. The major difference between these two choices is that a pipeline executing in sequence would have a single process running at a time for each job, while a pipeline executing in parallel would have multiple concurrently running processes per job.

By now, hopefully, you've already plugged that pipeline into the shell and found out that, yes, the pipeline executes in parallel, not in sequence. We can see this as well using the ps command.

#> sleep 10 | sleep 20 | sleep 30 | sleep 50 &
[1] 4128
#> ps -o pid,args
  PID COMMAND
 3981 -bash
 4125 sleep 10
 4126 sleep 20
 4127 sleep 30
 4128 sleep 50
 4129 ps -o pid,args

1.2 Process Grouping for Jobs

The implication of this discovery, that all processes in the pipeline run concurrently, is that the shell must fork each of the processes individually. But, then, how are these processes linked? They are supposed to be a single job after all, and we also know that the terminal device driver is responsible for delivering signals to the foreground job. There must be some underlying mechanism to enable this behavior, and, of course, there is.

The operating system provides a number of ways to group processes together. Processes can be grouped into both process groups and sessions. A process group is a way to group processes into distinct jobs that are linked, and a session is a way to link process groups under a single interruptive unit, like the terminal.

The key to understanding how the pipeline functions is that all of these processes are placed in the same process group, and we can see that by running the pipeline again. This time, however, we also request that ps output the parent pid (ppid) and the process group id (pgid) in addition to the process id (pid) and the command arguments (args).

#> sleep 10 | sleep 20 | sleep 30 | sleep 50 &
[1] 4134
#> ps -o pid,pgid,ppid,args
  PID  PGID  PPID COMMAND
 3981  3981  3980 -bash
 4131  4131  3981 sleep 10
 4132  4131  3981 sleep 20
 4133  4131  3981 sleep 30
 4134  4131  3981 sleep 50
 4135  4135  3981 ps -o pid,pgid,ppid,args

Notice first that the shell, bash, has a pid of 3981 and a process group id (pgid) that is the same. The shell is in its own process group. Similarly, the ps command itself also has a pid that is the same as its process group id. However, the sleep commands are all in process group 4131, which is also the pid of the first process in the pipeline. We can visualize this relationship like so:

Figure 1: Processes grouping in pipelines

As you can see, the rule of thumb for process grouping is that processes executing as the same job, e.g., a single input to the shell such as a pipeline, are placed in the same group. Also, the process group id chosen is the pid of the first process in the job.

2 Programming with Process Groups

Below, we will look at how to program with process groups using system calls, and we will investigate this from the perspective of the programmer as well as how the shell automatically groups processes. We will use a series of fairly straightforward system calls, and to bootstrap that discussion, we outline them below with brief descriptions.

Retrieving pid's or pgid's:

pid_t getpid() : get the process id of the calling process
pid_t getppid() : get the process id of the parent of the calling process
pid_t getpgrp() : get the process group id of the calling process
pid_t getpgid(pid_t pid) : get the process group id of the process identified by pid

Setting pgid's:

pid_t setpgrp() : set the process group of the calling process to itself, i.e., after a call to setpgrp(), the condition getpid() == getpgrp() holds.
int setpgid(pid_t pid, pid_t pgid) : set the process group id of the process identified by pid to pgid. If pid is 0, the calling process is used, and if pgid is 0, the pid of the process identified by pid becomes its process group id, i.e., setpgid(0,0) is equivalent to calling setpgrp().

2.1 Retrieving the Process Group

Each process group has a unique process group identifier, or pgid, which is typically the pid of a process that is a member of the group. Upon a fork(), the child process inherits the parent's process group. We can see how this works with a small program that forks a child and prints the process group identifiers of both parent and child.

/*inherit_pgid.c*/
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char * argv[]){

  pid_t c_pid,pgid,pid;

  c_pid = fork();

  if(c_pid == 0){
    /* CHILD */

    pgid = getpgrp();
    pid = getpid();

    printf("Child:  pid: %d pgid: *%d*\n", pid, pgid);

  }else if (c_pid > 0){
    /* PARENT */

    pgid = getpgrp();
    pid = getpid();

    printf("Parent: pid: %d pgid: *%d*\n", pid, pgid);

  }else{
    /* ERROR */
    perror(argv[0]);
    _exit(1);
  }

  return 0;
}

Here is the output of running this program.

#> ./inherit_pgid
Parent: pid: 3630 pgid: *3630*
Child:  pid: 3631 pgid: *3630*

Notice that the process groups are the same, and that's because a child inherits the process group of its parent. Now let's look at a similar program that doesn't fork, and instead just prints the process group identifier of itself and its parent, which is the shell.

/*getpgrp.c*/
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char * argv[]){

  pid_t pid, pgid; //process id and process group for this program
  pid_t ppid, ppgid; //process id and process group for the _parent_

  //current
  pid = getpid();
  pgid = getpgrp();

  //parent
  ppid = getppid();
  ppgid = getpgid(ppid);

  //print the pid and pgid of this process and of its parent
  printf("%s: (current) pid:%d pgid:%d\n", argv[0], pid, pgid);
  printf("%s: (parent) ppid:%d pgid:%d\n", argv[0], ppid, ppgid);

  return 0;
}

If we run this program in the shell, you might expect the program and its parent to report the same process group. Why shouldn't that be the case? The program is the result of a fork from the shell, so the parent is the shell and the child is the program, and in the previous example the parent and child had the same process group. But, looking at the output, that is not what occurs here.

#> ./getpgrp
./getpgrp: (current) pid:3760 pgid:3760
./getpgrp: (parent) ppid:369 pgid:369

Instead, we find that the parent, which is the shell, is not in the same process group as the child, the getpgrp program. Why is that? This is because the new process is also a job in the shell and each job needs to run in its own process group for the purpose of terminal signaling. What we can now recognize from these examples, starting with the pipeline of sleep commands, is that a shell will fork each process separately in a job and assign the process group id based on the first child forked, as is clear upon further inspection of the output of these two examples:

#> sleep 10 | sleep 20 | sleep 30 | sleep 50 &
[1] 4134
#> ps -o pid,pgid,ppid,args
  PID  PGID  PPID COMMAND
 3981  3981  3980 -bash
 4131  4131  3981 sleep 10
 4132  4131  3981 sleep 20
 4133  4131  3981 sleep 30
 4134  4131  3981 sleep 50
 4135  4135  3981 ps -o pid,pgid,ppid,args

#> ./inherit_pgid
Parent: pid: 3630 pgid: *3630*
Child:  pid: 3631 pgid: *3630*

2.2 Setting the Process Group

Finally, now that we have learned to identify the process group, the next thing to do is to assign new process groups. There are two functions that do this: setpgrp() and setpgid().

setpgrp() : sets the process group of the calling process to itself. That is, the calling process joins a process group of one, containing only itself, where its pid is the same as its pgid.
setpgid(pid_t pid, pid_t pgid) : sets the process group of the process identified by pid to pgid. If pid is 0, then sets the process group of the calling process to pgid. If pgid is 0, then sets the process group of the process identified by pid to pid. Thus, setpgid(0,0) is the same as setpgrp().

Let's consider a small program that sets the process group of the child after a fork using a setpgrp() call from the child. The program below prints the process ids and process groups from both the child's and the parent's perspective.

/*setpgrp.c*/
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char * argv[]){

  pid_t cpid, pid, pgid, cpgid; //process id's and process groups

  cpid = fork();

  if( cpid == 0 ){
    /* CHILD */

    //set process group to itself
    setpgrp();

    //print the pid, and pgid of child from child
    pid = getpid();
    pgid = getpgrp();
    printf("Child:          pid:%d pgid:*%d*\n", pid, pgid);

  }else if( cpid > 0 ){
    /* PARENT */

    //print the pid, and pgid of parent
    pid = getpid();
    pgid = getpgrp();
    printf("Parent:         pid:%d pgid: %d \n", pid, pgid);

    //print the pid, and pgid of child from parent
    cpgid = getpgid(cpid);
    printf("Parent: Child's pid:%d pgid:*%d*\n", cpid, cpgid);

  }else{
    /*ERROR*/
    perror("fork");
    _exit(1);
  }

  return 0;
}

And, here's the output:

#> ./setpgrp
Parent:         pid:20178 pgid: 20178
Parent: Child's pid:20179 pgid:*20178*
Child:          pid:20179 pgid:*20179*

Clearly, something is not right. The child sees a different pgid for itself than the parent sees for it. What we have here is a race condition: when two processes run in parallel, you don't know which is going to finish the race first, and the result depends on that order.

Consider that there are two possibilities for how the above program will execute following the fork. In one, the child runs before the parent and the process group is set before the parent reads it. In the other, the parent runs first and reads the child's process group before the child gets a chance to set it. It is the latter that we see above: the parent ran before the child, and thus read the old pgid.

To avoid these issues when setting the process group of a child, you should call setpgid()/setpgrp() in both the parent and the child before anything depends on those values. In this way, the outcome no longer depends on scheduling. It will not matter which runs first, the parent or the child; the result is always the same, and the child is placed in the appropriate process group. Below is an example of that and the output.

/*setpgid.c*/
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char * argv[]){

  pid_t cpid, pid, pgid, cpgid; //process id's and process groups

  cpid = fork();

  if( cpid == 0 ){
    /* CHILD */

    //set process group to itself
    setpgrp(); //<---------------------------!

    //print the pid, and pgid of child from child
    pid = getpid();
    pgid = getpgrp();
    printf("Child:          pid:%d pgid:*%d*\n", pid, pgid);

  }else if( cpid > 0 ){
    /* PARENT */

    //set the process group of child
    setpgid(cpid, cpid); //<------------------!

    //print the pid, and pgid of parent
    pid = getpid();
    pgid = getpgrp();
    printf("Parent:         pid:%d pgid: %d \n", pid, pgid);

    //print the pid, and pgid of child from parent
    cpgid = getpgid(cpid);
    printf("Parent: Child's pid:%d pgid:*%d*\n", cpid, cpgid);

  }else{
    /*ERROR*/
    perror("fork");
    _exit(1);
  }

  return 0;
}

#> ./setpgid
Parent:         pid:20335 pgid: 20335
Parent: Child's pid:20336 pgid:*20336*
Child:          pid:20336 pgid:*20336*

3 Process Groups and Terminal Signaling

Process groups fit into the ecosystem of process settings through the terminal settings. Let's return to the terminal control function tcsetpgrp(). Before, we discussed this function as setting the foreground process, but, as its name suggests, tcsetpgrp() actually sets the foreground process group.

3.1 Foreground Process Group

This distinction is important because of terminal signaling. We know now that when we execute a pipeline, the shell will fork all the processes in the job and place them in the same process group. We also know that when we use special control keys, like Ctrl-c or Ctrl-z, the terminal will deliver special signals to the foreground job, such as those indicating to terminate or stop. For example, this sequence of shell interaction makes sense:

#> sleep 10 | sleep 20 | sleep 30 | sleep 50 &
[1] 24253
#> ps
  PID TTY          TIME CMD
 4038 pts/3    00:00:00 bash
24250 pts/3    00:00:00 sleep
24251 pts/3    00:00:00 sleep
24252 pts/3    00:00:00 sleep
24253 pts/3    00:00:00 sleep
24254 pts/3    00:00:00 ps
#> fg
sleep 10 | sleep 20 | sleep 30 | sleep 50
^C
#> ps
  PID TTY          TIME CMD
 4038 pts/3    00:00:00 bash
24255 pts/3    00:00:00 ps

We started the sleep commands in the background, we see that there are 4 instances of sleep running, and we can move them from the background to the foreground, where they are signaled to terminate via the terminal with Ctrl-c. All good, right? There is something missing: given that there are multiple processes running in the foreground, how does the terminal know which of those to send the stop or terminate signal? How does it differentiate which processes are in the foreground?

The answer is that the terminal does not identify foreground processes individually. Instead, it identifies a foreground process group. All processes associated with the foreground job are in the foreground process group, and instead of signaling processes individually, both the shell and the terminal think of execution in terms of process groups.

3.2 Orphaned Stopped Process Groups

Process group interaction has other side effects when you consider programs that fork children. For example, consider the program (orphan) below, which simply forks a child, and then both child and parent loop forever:

/*orphan.c*/
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char * argv[]){

  pid_t cpid;

  cpid = fork();

  if( cpid == 0 ){
    /* CHILD */

    //child loops forever!
    while(1);

  }else if( cpid > 0 ){
    /* PARENT */

    //parent loops forever
    while(1);

  }else{
    /*ERROR*/
    perror("fork");
    _exit(1);
  }

  return 0;
}

If we were to run this program, we can see that, yes, indeed, it forks and now we have two versions of orphan running in the same process group.

#> ./orphan &
[1] 24468
#> ps -o pid,pgid,ppid,comm
  PID  PGID  PPID COMMAND
 4038  4038  4037 bash
24468 24468  4038 orphan
24469 24468 24468 orphan
24470 24470  4038 ps

Moving the orphan program to the foreground, it can then be terminated by the terminal using Ctrl-c.

#> fg
./orphan
^C
#> ps -o pid,pgid,ppid,comm
  PID  PGID  PPID COMMAND
 4038  4038  4037 bash
24471 24471  4038 ps

Both parent and child are terminated, which is as expected since they are both in the foreground process group. While we might have expected an orphan to be created, this does not occur. However, let's consider the same program, but this time the child is placed in a different process group than the parent:

/*orphan_group.c*/
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char * argv[]){

  pid_t cpid;

  cpid = fork();

  if( cpid == 0 ){
    /* CHILD */

    //set process group to itself
    setpgrp();

    //child loops forever!
    while(1);

  }else if( cpid > 0 ){
    /* PARENT */

    //set the process group of child
    setpgid(cpid, cpid);

    //parent loops forever
    while(1);

  }else{
    /*ERROR*/
    perror("fork");
    _exit(1);
  }

  return 0;
}

Let's do the same experiment as before:

#> ./orphan_group &
[1] 24487
#> ps -o pid,pgid,ppid,comm
  PID  PGID  PPID COMMAND
 4038  4038  4037 bash
24487 24487  4038 orphan_group
24488 24488 24487 orphan_group
24489 24489  4038 ps
#> fg
./orphan_group
^C
#> ps -o pid,pgid,ppid,comm
  PID  PGID  PPID COMMAND
 4038  4038  4037 bash
24488 24488     1 orphan_group
24490 24490  4038 ps

This time, yes, we see that we have created an orphan process. This is clear from the PPID field, which indicates that the parent of the remaining orphan_group process is now init, which inherits all orphaned processes. This happens because the terminal signal for Ctrl-c is delivered to the foreground process group only, and the child is not in that group. The child is in its own process group and never receives the signal, and, thus, never terminates. It just continues on its merry way, never realizing that it just lost its parent. In this example lies the danger of using process groups: it's very easy to create a bunch of orphans that will just carry on if not killed. To rid yourself of them, you must explicitly kill them with a command like killall:

#> killall orphan_group
#> ps -o pid,pgid,ppid,comm
  PID  PGID  PPID COMMAND
 4038  4038  4037 bash
24494 24494  4038 ps

And good riddance …

IC221: Systems Programming (SP16)