
LARGE / MULTIPLE JOBS IN SKEW

Note: While these tips were written with Kellogg's Unix server in mind, they are also applicable to other single-server Unix systems. Tips on running jobs on cluster systems, such as SSCC, are covered separately. For more information, please review the manual pages for the commands mentioned.


Job control

A job or process can be in one of three states: running in the foreground, running in the background, or suspended. Most of the time, when you run a program, the system 'forks' a child process and assigns it a process ID (PID) number. This ID number is useful if you need to terminate the program. (A command built into the shell does not get its own process ID.)

When you run a command in the foreground, you cannot enter another command in the same terminal window until the program finishes. In practical terms, you will not see your prompt again until the job ends, is suspended, or is cancelled.

To run a job in the background, just add an ampersand ("&") at the end of the command. You can run in the background any job that does not require your input. For example, the following command would run a SAS program called "test.sas" in the foreground:

sas test

To run the same program in the background, type:

sas test &
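
When a job is started with "&", the shell records its process ID, which can be saved for later monitoring or termination. The following is a minimal sketch in Bourne-shell syntax (sh, ksh, bash), not csh; "sleep 1" is only a stand-in for a real program such as "sas test", and the variable names are illustrative.

```shell
# "sleep 1" stands in for a real long-running program (assumption).
sleep 1 &
bg_pid=$!              # $! holds the PID of the most recent background job
echo "started background job, PID $bg_pid"

# Other commands can run here while the job works in the background.

bg_status=0
wait "$bg_pid" || bg_status=$?   # block until the job ends; keep its exit status
echo "job $bg_pid finished with exit status $bg_status"
```

Saving the PID at launch avoids having to search for it later with ps.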

Managing your jobs:

  • To suspend a job running in the foreground, press CTRL-Z (hold down the Control key and press "z"). This stops the job temporarily (it can resume execution later).
  • To cancel a job running in the foreground, press CTRL-C. This terminates the job; it no longer exists.
  • Listing existing jobs: Within one session, you can see the list of active jobs using the jobs command. This command will list a job number, the status (running, suspended) and the command. If you use the "-l" option ("jobs -l"), it will display the process IDs as well. For example, if "test.sas" was running in the background:
     
     [1]  + 19454 Running                       sas test
     
     
    The job ID is 1 and the PID is 19454. The "+" indicates the most recently submitted job; a job submitted prior to "sas test" would be indicated by a minus ("-"). Note that the "jobs" command only lists jobs within one session. If you have logged out, or you are monitoring from a different session, use the ps command in combination with grep to get a list of all the jobs running under your login ID:

     
     % ps -ef | grep netid
    
          UID   PID  PPID  C    STIME TTY      TIME CMD
        netid 18598     1  0   Feb 23 ?        0:04 sas
        netid 19698 19690  0 08:10:23 pts/19   0:07 stata -k1000000
        netid 19815 19796  0 08:19:51 pts/21   0:06 emacs crrhs.do
    
     
     
    The "e" option lists information about all the processes running on the system, while the "f" option requests full information (including the owner of the process). The second column in this command's output is the process ID number, which you may use to terminate (see the "kill" command below) or stop (check the manual page on "stop" in skew) the job.

  • Sending a job to the background: If you submitted a job in the foreground and want to send it to the background, suspend it (CTRL-Z) and send it to the background using the "bg" command. For example, in the following command sequence, the user types the commands that follow the "%" prompt and presses CTRL-Z (shown as ^Z); the other lines are printed by the shell:

     
     % sas test
     ^Z
     Suspended
     % bg
     [1]    sas test &
     
     
  • Bringing a background job into the foreground: Use the "fg" command. If you have a single job running in the background, "fg" will bring it to the foreground. Otherwise, use "fg %j", where "j" is the job ID.

  • Terminating or "killing" a job: To terminate a job, use the "kill" command:
     
     kill PID
     
    
    where PID is the process ID. "kill" has much the same effect as CTRL-C, except that CTRL-C can only terminate a job running in the foreground. By default, "kill" sends a termination signal (TERM), which a program may catch or ignore. For "stubborn" jobs that do so, use the "-9" option, which sends the KILL signal; this signal cannot be caught:
     
     kill -9 PID
     
     
    The disadvantage of "kill -9" is that the job gets no chance to clean up after itself, so its child processes may be left running.

For more details, read the pages on Managing jobs and processes.
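
The advice on terminating jobs (try a plain "kill" first and keep "kill -9" as a last resort) can be sketched as a short script. This is an illustration in Bourne-shell syntax under stated assumptions: "sleep 60" stands in for a real job, and the one-second grace period is an arbitrary choice.

```shell
# "sleep 60" stands in for a real job (assumption).
sleep 60 &
pid=$!

# Ask politely first: plain kill sends the TERM signal.
kill "$pid"
sleep 1                                # arbitrary grace period

# "kill -0" sends no signal; it only reports whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid still present; sending KILL"
    kill -9 "$pid"
fi

wait "$pid" 2>/dev/null || true        # collect the terminated child
echo "PID $pid is gone"
```

Escalating only when needed gives the program a chance to close files and stop its own child processes.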

Tips for large jobs

There are two commands users may look into for jobs that are going to take a long time to run or for jobs that are computationally intensive:

  • nice and renice: The "nice" command alters the scheduling priority of a job. Lowering the priority of a background job by the maximum allowed (19) improves performance for users running processes in the foreground. The "niced" job still receives the CPU whenever it would otherwise be idle (which is most of the time), so a large job usually loses little running time when "niced", unless many users are doing the same. To "nice" a job, put the nice command before the command or program you want to run. The syntax depends on which "nice" is used: the version built into csh and tcsh takes "+19", while the standalone /usr/bin/nice takes "-19" (or "-n 19"). For example, to nice an SPSS job from a Bourne-type shell (sh, ksh, bash):
     
     nice -19 spss -m < pop.sps > pop.out &
     
    
    For more information, refer to the manual pages on nice and renice.

  • nohup: Some processes are susceptible to hangups when the user logs out: the logout process sends a hangup signal to all child processes, and programs running in the background that do not catch that signal will terminate. To run a process immune to hangups, use the "nohup" (no hang up) command. For example:

     
     nohup matlab < testing.m &
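
The two commands can be combined, with output sent to a log file so that results can be read after logging back in. The sketch below uses Bourne-shell syntax and the standalone nice ("-n 19"); the "sh -c" command is a stand-in for a real program, and the log file created by mktemp is a hypothetical location.

```shell
log=$(mktemp)    # hypothetical log location; a fixed name such as job.log also works

# nohup makes the job ignore the hangup signal sent at logout;
# nice -n 19 runs it at the lowest priority. The quoted command is a stand-in.
nohup nice -n 19 sh -c 'sleep 1; echo "job finished"' > "$log" 2>&1 &
job_pid=$!
echo "started PID $job_pid, logging to $log"

wait "$job_pid"  # only for this demonstration; normally you would log out instead
cat "$log"
```

If no redirection is given, nohup typically writes the job's output to a file named nohup.out in the current directory.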
     

Running multiple jobs

Skew is a shared resource. Refrain from running several jobs concurrently, especially jobs that use the CPU or memory intensively (e.g., Matlab or Gauss). Please review our Unix policies.

If you have multiple jobs, run them sequentially. There are two ways of accomplishing this:

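  1. Issue the sequence of commands, separated by semicolons (;). For example, if you had three SAS programs to run (f1.sas, f2.sas, f3.sas), you could run the three jobs one after the other in the background with the following command. The parentheses group the sequence so that the "&" applies to all three jobs rather than only to the last one:
     
     % (sas f1; sas f2; sas f3) &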
     
     
  2. Write a shell script: This is simply a text file with the sequence of commands you would type at the prompt. To create the script:
    1. Create a text file (call it 'statjobs', for example) in any text editor.
    2. Start the file with a reference to the shell (bash, ksh, tcsh, csh) used to interpret the commands in the script.
    3. Type the commands in the same way you would write them at the prompt.
    4. Make the file executable using the chmod command: chmod +x statjobs or chmod 700 statjobs.
    5. Run the script in the background: nice +19 statjobs & or statjobs &

     

    For example, the following script runs two Stata programs, pgm1.do and pgm2.do:
    
     #!/bin/csh
     stata -b do pgm1
     stata -k 1000000 -b do pgm2
     
    Shell scripts can automate some repetitive tasks, such as compiling a Fortran program and running it. A couple of sample shell scripts are available on our samples web page.
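
A script can also stop the sequence when one job fails, which a plain list of commands does not do. The following is a sketch only, written in Bourne-shell syntax rather than csh; "run_job" is a hypothetical placeholder for a real command such as "sas f1" or "stata -b do pgm1".

```shell
#!/bin/sh
# Hypothetical stand-in for a real command such as "sas f1".
run_job() {
    echo "running $1"
}

completed=0
for job in f1 f2 f3; do
    # Stop the whole sequence at the first failure, since later
    # jobs often depend on the output of earlier ones.
    if ! run_job "$job"; then
        echo "$job failed; stopping" >&2
        break
    fi
    completed=$((completed + 1))
done
echo "$completed job(s) completed"
```

Because the jobs run one at a time, only one of them competes for the CPU and memory at any moment.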

© 2001-2010 Kellogg School of Management, Northwestern University