Wednesday, January 9, 2013

Anatomy of command line arguments in Linux

http://mylinuxbook.com/command-line-arguments-in-linux-part2


When designing a simple C program or a full-fledged command line application, it is common to need arguments that are passed when the executable is run. These are known as command line arguments. They govern the behaviour of the program to some extent, since they are the inputs from which the output is computed or displayed.
Another use of command line arguments is to carry the various options of a command, be it on Linux, Windows or any other platform. In Linux, a command is usually an executable launched through the shell. In code, the entry point of this executable (in ELF format) is the main() method. The Linux shell communicates the command line arguments to the program by passing them to the main() method. In this article, we shall go through the advanced concepts related to command line arguments in Linux using C programming examples.


Command Line arguments in Linux


The main() method

The main() method can be defined in more than one way in a C program (see an interesting discussion on this here). The usual one takes no arguments:
 
int main()
If the program accepts command line arguments, main() can instead be defined as:
 
int main(int argc, char *argv[])
This definition accepts the following two arguments:
  • An int, argc, which is the number of arguments
  • An array of pointers, argv[], each of which points to one argument
Note that every argument is received as a NUL-terminated char array, in other words as a C string.
What happens internally when you run a program is that the shell or GUI parses the command and calls the execve() library function, which invokes the Linux system call of the same name. The first string in your command specifies the program to be run, and the rest are its arguments. All of these strings are NUL-terminated and are passed to the execve system call.
 int execve(const char *filename, char *const argv[], char *const envp[]);
  • argv is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed.
  • envp is an array of strings, conventionally of the form key=value, which are passed as the environment to the new program.
Both argv and envp must be terminated by a NULL pointer.
The kernel sets up the argument vector and environment and calls the entry point of the program.
An example will make the usage clearer. Note that if these parameters are not part of the main() definition, the arguments are not accessible inside main().
cmd.c
 
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 0;
    printf("Inside main\n");
    /* argv[0] is the program name; argv[1] to argv[argc - 1] are the arguments */
    for (i = 0; i < argc; i++)
    {
         printf(" %d-th argument received is %s\n", i, argv[i]);
    }

    return 0;
}
Note how the main() method is defined, and how we access the arguments inside it as an ordinary ‘int’ and an array of pointers.
How do we run this now? First, let's compile it.
 
$ gcc cmd.c -Wall -o cmd
Compilation went fine. The command line arguments are supplied when we run the built executable. Here is an example of how they are provided:
 
$ ./cmd --name "Rupali" --line 12
To have a look at the output of the executable, here it is:
Inside main
 0-th argument received is ./cmd
 1-th argument received is --name
 2-th argument received is Rupali
 3-th argument received is --line
 4-th argument received is 12
Interestingly, the argument at index zero, i.e. the first argument received, is the name of the program as it was typed (see here to learn how the process name in Linux can be changed through this argument). The remaining arguments are received in the same order as we provided them on the command line, i.e. ‘--name’, followed by ‘Rupali’, then ‘--line’ and finally ‘12’. Note that the shell removes the double quotes around Rupali before the program sees the argument. Since the loop iterated five times, five is the number of arguments received by the ‘main()’ method, i.e. the value of argc.
You can try running the same program with a different number and variety of command line arguments, as the program does not place any restrictions on them.

Retrieving command line arguments

There are standard functions available for retrieving the passed command line arguments in the main program.
These are:
 
getopt()
getopt_long()
The header files that declare them are, respectively:
unistd.h
getopt.h
Let us look at them one by one.

getopt()

Picking up its syntax from its man page, here is what its prototype looks like:
 
int getopt(int argc, char * const argv[], const char *optstring);
  • The first parameter, ‘argc’, is the number of command line arguments passed.
  • The second parameter is the array of pointers to the arguments, the same as in the ‘main()’ method.
  • The third parameter is an option string. It lists the option characters the program accepts. If an option requires some input, then its character in the option string is followed by a colon ‘:’.
  • The option string will be clearer through examples.
The following examples show some commands and the corresponding option string parameter that ‘getopt()’ would use.
 
Command               Option string used in getopt()
ls -lt                "lt"
find -n filename      "n:"
gcc -c -o output      "co:"
As for what the method returns: each call parses the next option and returns its option character, as listed in the option string. If an unrecognised option is encountered, it returns ‘?’. Once all the passed options have been parsed, it returns -1. There is also a case where it returns 1. It is discussed later in this section, as it requires knowledge of the option string.
This method is used for parsing short command line options, and is very useful in deciding the action to take for each option present. A short command line option is a single character. A command can have several command line options, but they are optional. That means it is up to the user to specify one when required, and to leave it out otherwise. The behaviour of the command should also be independent of the order of these options. Implementing such options by hand is possible, but comparing and searching the argument strings gets complicated: for every input option, the programmer has to do the parsing, the finding and the action.
To illustrate with an example, here is the usage of a simple ‘mkdevice’ command with two options, which creates a device node in the system.
 
mkdevice [options] 
Options:
-c : Create a Device with name as provided
-k : The device node to be created with this device number
From an end user's point of view, this command can be used in any of the following ways, as required. Of course, there are more ways to use it than these.
 
$ mkdevice -c mydevice
$ mkdevice -k 3
$ mkdevice -c mydevice -k 3
We need to parse the list of arguments to find ‘-c’, and then call its routine of action if it is present. Hopefully the sequence of parse, find and action now sounds more logical.
Coming back to the case where getopt() returns 1: this happens when the option string starts with a ‘-’ and getopt() encounters a non-option among the command line arguments. For example, if the command receives a filename as one of its command line arguments, that filename is a non-option. To receive such non-options, we use the following option string
"-co:"
Therefore, the getopt() method call looks like:
getopt(argc, argv, "-co:")
If the command name is ‘compiler’, this is what our command looks like:
compiler main.c -c -o main
Here, while parsing the first argument, ‘main.c’, getopt() will return 1, as it is treated as a non-option, and optarg will point to it.
Moving on, let us now implement our example command above using ‘getopt()’, to understand more about how this standard method is used.
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAXLEN 30

struct Device
{
    char name[MAXLEN];
    int number;
};

int main(int argc, char **argv)
{
    int optc = 0;   /* getopt() returns an int, so -1 can be told apart from any character */
    struct Device dev = {"dev_0", 112};
    int devNum = 0;

    while ((optc = getopt(argc, argv, "c:k:")) != -1)
    {
        switch (optc)
        {
            case 'c':
                printf("Creating the device of name %s\n", optarg);

                /* strncpy() adds no terminator when the source is too long */
                strncpy(dev.name, optarg, MAXLEN - 1);
                dev.name[MAXLEN - 1] = '\0';
                break;
            case 'k':
                devNum = atoi(optarg);
                dev.number = devNum;
                break;
            default:
                printf("Invalid Option!\n");
                exit(EXIT_FAILURE);
        }
    }

    printf("Device %s created !\n", dev.name);
    printf("Device : %s\n", dev.name);
    printf("Number : %d\n", dev.number);

    return 0;
}
Before digging into the implementation, let us check out what it does.
 
$ gcc mkdevice.c -Wall -o mkdevice
$ ./mkdevice -c mlbdev -k 12
Creating the device of name mlbdev
Device mlbdev created !
Device : mlbdev
Number : 12
To get a clear overall picture: the source code above simulates creating a device node (it only fills in a structure and prints it). The name of the device is ‘dev_0’ unless specified by the ‘-c’ option, and the device number is 112 by default unless specified by the ‘-k’ option.
To understand how the source code does that, the line of greatest interest is
 
 while ((optc = getopt(argc, argv,"c:k:")) != -1)
Here, each iteration fetches the next option given by the user on the command line and stores it in the variable ‘optc’. For each option there is a switch case, where the corresponding action is taken.
Note the variable optarg, which is neither declared nor defined anywhere in our program, but is still used. It is one of the global variables declared in unistd.h, and it points to the input string that follows an option which expects some input. In our example, when getopt() encounters the ‘-c’ option, it knows from the option string (“c:”) that a value is expected. Hence, it makes optarg point to the very next argument, which is the string “mlbdev” in our case.
The global variables that getopt() assigns values to are:
  • optarg : pointer to the argument supplied with an option that takes an argument.
  • optind : index of the next element of argv to be processed.
  • optopt : if an option is not recognised in the option string, or is missing its required argument, this variable is set to the actual option character.
You can take the liberty of using these global variables as you need, provided you understand them. The following example uses all of them.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAXLEN 30

struct Device
{
    char name[MAXLEN];
    int number;
};

int main(int argc, char **argv)
{
    int optc = 0;   /* getopt() returns an int */
    struct Device dev = {"dev_0", 112};
    int devNum = 0;

    /* First pass: just verify the name of the device before parsing anything. */
    while ((optc = getopt(argc, argv, "c:k:")) != -1)
    {
        if (optc == 'c')
        {
            printf("DEVICE name is %s\n", optarg);
            if (strlen(optarg) > 10)
            {
                /* check run-case 1 */
                printf("Invalid device name\n");
                exit(EXIT_FAILURE);
            }
            break;
        }
    }
    /* Reset the index to 1 so the actual parsing starts from the first
       argument again, whether or not -c was found in the first pass. */
    optind = 1;

    while ((optc = getopt(argc, argv, "c:k:")) != -1)
    {
        switch (optc)
        {
            case 'c':
                strncpy(dev.name, optarg, MAXLEN - 1);
                dev.name[MAXLEN - 1] = '\0';
                break;

            case 'k':
                devNum = atoi(optarg);
                dev.number = devNum;
                break;

            case '?':
                if (optopt == 'c' || optopt == 'k')
                {
                    /* check run-case 3 */
                    printf("Option -%c requires an argument\n", optopt);
                }
                else
                {
                    printf("Invalid Option!\n");
                }
                exit(EXIT_FAILURE);

            default:
                printf("Invalid Option!\n");
                exit(EXIT_FAILURE);
        }
    }

    printf("Device %s created !\n", dev.name);
    printf("Device : %s\n", dev.name);
    printf("Number : %d\n", dev.number);

    return 0;
}
Here is what happens as we run it.
run case 1
$ ./mkdevice -c ff123456789 -k 5
DEVICE name is ff123456789
Invalid device name
run case 2
$ ./mkdevice -c okdev -k 15
DEVICE name is okdev
Device okdev created !
Device : okdev
Number : 15
run case 3
$ ./mkdevice -c okdev -k
DEVICE name is okdev
Option -k requires an argument
The ‘getopt()’ method is there to make our lives easier, so that we need not worry about implementing the parsing, finding and action ourselves. It keeps parsing until it reaches the end of the options in the parameter list.

getopt_long()

Linux also supports options that are longer than one character. Such options are generally prefixed with a double dash, and are called long options.
For example:
 
mycmd --display --file file.txt
ls --all
A command can support both short and long options. To support long options, we have the ‘getopt_long()’ method. Here is the syntax taken from the man page:
 
    #include <getopt.h>

    int getopt_long(int argc, char * const argv[],
               const char *optstring,
               const struct option *longopts, int *longindex);
The first, second and third parameters are the same as in the ‘getopt()’ method. If there are no short options, and only long options with a double dash, then the parameter ‘optstring’ should be an empty string and not NULL. The fourth parameter, ‘longopts’, is a pointer to an array of structures named ‘option’, which is defined in getopt.h as
struct option
{
  const char *name;
  /* has_arg can't be an enum because some compilers complain about
  type mismatches in all the code that assumes it is an int.  */
  int has_arg;
  int *flag;
  int val;
};
The structure has a name field, which stores the name of the long option. has_arg stores no_argument (=0), required_argument (=1) or optional_argument (=2), depending on whether the option takes an argument. flag is a pointer which may be NULL or non-NULL. If it is non-NULL, then getopt_long() stores the value of the val member in the int that flag points to, and returns 0.
However, if it is NULL, getopt_long() returns the value ‘val’, the member of this struct. Where there are long and short options for the same behaviour, the member ‘val’ can be assigned the corresponding short option character, which can then be handled just as with ‘getopt()’. Hence, if every long option has a corresponding short option, we can conveniently use ‘getopt_long()’ with each structure having its flag set to NULL and its val equal to the short option character, which is then returned. An example will make this clearer.
Before the example, and coming back to the parameters of ‘getopt_long()’: the last parameter, ‘longindex’, if not NULL, points to a variable that is set to the index of the matched long option within the ‘option’ structure array. An important implementation point to note is that the last element of the array of ‘option’ structs must be filled with zeros.
Using the same ‘mkdevice’ example source as illustrated above, we’ll add the following long options to it using the getopt_long() method.
--create : create a device with the name given as argument
--number : give the device the device number given as argument
The modified source code:
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>

#define MAXLEN 30

struct Device
{
    char name[MAXLEN];
    int number;
};

int main(int argc, char **argv)
{
    int optc = 0;   /* getopt_long() returns an int */
    struct Device dev = {"dev_0", 112};
    /* Each long option maps to its short option character; the
       all-zero entry marks the end of the array. */
    struct option cmdLongOpts[] = {
        {"create", required_argument, NULL, 'c'},
        {"number", required_argument, NULL, 'k'},
        {0, 0, 0, 0}
    };
    int devNum = 0;

    while ((optc = getopt_long(argc, argv, "c:k:", cmdLongOpts, NULL)) != -1)
    {
        switch (optc)
        {
            case 'c':
                printf("Creating the device of name %s\n", optarg);

                /* Bound the copy by the destination size, not the source length */
                strncpy(dev.name, optarg, MAXLEN - 1);
                dev.name[MAXLEN - 1] = '\0';
                break;
            case 'k':
                devNum = atoi(optarg);
                dev.number = devNum;
                break;
            default:
                printf("Invalid Option!\n");
                exit(EXIT_FAILURE);
        }
    }

    printf("Device %s created !\n", dev.name);
    printf("Device : %s\n", dev.name);
    printf("Number : %d\n", dev.number);

    return 0;
}
Compiling and running it:
 
$ gcc mkdevice.c -Wall -o mkdevice
$ ./mkdevice --create mlbdev
Creating the device of name mlbdev
Device mlbdev created !
Device : mlbdev
Number : 112

$ ./mkdevice --create mlbdev --number 456
Creating the device of name mlbdev
Device mlbdev created !
Device : mlbdev
Number : 456

$ ./mkdevice --create mlbdev -k 321
Creating the device of name mlbdev
Device mlbdev created !
Device : mlbdev
Number : 321

Interesting notes

1) In case of long options, if the option string contains ‘W;’, that is, a ‘W’ followed by a semicolon, then the command also accepts long options prefixed with ‘-W’, in addition to the double dash (--).
For example, in our example source, if the call to getopt_long() is:
 
optc = getopt_long(argc, argv,"c:k:W;", cmdLongOpts, NULL)
Then, following example usage will also work.
 
$ ./mkdevice -Wcreate mlbdev -k 321
$ ./mkdevice -Wcreate mlbdev -Wnumber 321
2) Generally, short command line options are prefixed with a single dash and long command line options with a double dash.
 
$ ls -l -t
$ mkdir --help
3) Generally, command line options can be given in any order, and the behaviour does not change if the order of the options changes.
4) While working with the argument list, i.e. argv, make sure your own code never alters it.
