22 May 2009

Arrays and Pointer to arrays : One Dimension

In this snippet we will look at manipulating pointers to single-dimensional arrays with ease. For this we first need to review one-dimensional arrays. A subsequent snippet will cover multidimensional arrays.


***************************
Manipulating simple arrays
***************************

Suppose we have the following two declarations

int array[10];

and

int * parray[10];

Diagrammatically, the difference between the two arrays is shown in the two
figures below.

Figure-1:

              +——————————+
array [0]      |   int    |
              +——————————+
array [1]      |   int    |
              +——————————+
array [2]      |   int    |
              +——————————+
array [3]      |   int    |
              +——————————+
array [4]      |   int    |
              +——————————+
array [5]      |   int    |
              +——————————+
array [6]      |   int    |
              +——————————+
array [7]      |   int    |
              +——————————+
array [8]      |   int    |
              +——————————+
array [9]      |   int    |
              +——————————+




Figure-2:


             +----------+        +----------+
parray[0]    |  int *   | -----> |   int    | *parray[0]
             +----------+        +----------+        +----------+
parray[1]    |  int *   | -------------------------> |   int    | *parray[1]
             +----------+        +----------+        +----------+
parray[2]    |  int *   | -----> |   int    | *parray[2]
             +----------+        +----------+        +----------+
parray[3]    |  int *   | -------------------------> |   int    | *parray[3]
             +----------+        +----------+        +----------+
parray[4]    |  int *   | -----> |   int    | *parray[4]
             +----------+        +----------+        +----------+
parray[5]    |  int *   | -------------------------> |   int    | *parray[5]
             +----------+        +----------+        +----------+
parray[6]    |  int *   | -----> |   int    | *parray[6]
             +----------+        +----------+        +----------+
parray[7]    |  int *   | -------------------------> |   int    | *parray[7]
             +----------+        +----------+        +----------+
parray[8]    |  int *   | -----> |   int    | *parray[8]
             +----------+        +----------+        +----------+
parray[9]    |  int *   | -------------------------> |   int    | *parray[9]
             +----------+                            +----------+


           int * means pointer to int



In the first declaration, we access the ith int in the array with array[i]. In the second declaration, the ith int pointer (an element of the array parray) is accessed with parray[i], and the int it points to (which is not itself part of the array) is accessed with *parray[i].


***************************
Dual Nature of array names:
***************************

Array names have a dual nature. Sometimes the name signifies the whole array.
That is why, given

TYPE  array_name[M];

we can find the size of the array by printing out sizeof(array_name). For
example, given

long     array_of_long[100] ;

the size of the array array_of_long is given by sizeof(array_of_long).
Here the array name signifies the whole array.

At other times the array name behaves as a pointer. In most expressions
(sizeof and the & operator are the exceptions) the compiler converts the
array name into a pointer to the first element of the array. Thus given:

TYPE array_name[M];

array_name points to the first element, i.e. array_name[ 0 ]. Note
that the elements of the array are of the type TYPE, so array_name[ 0 ] can
be anything: an int, a char, a struct etc.

Diagrammatically, this can be viewed as:
Figure-3:

+——————————————+         +——————————+
|  array_name  | ——————> |   TYPE   | array_name[0]
+——————————————+         +——————————+
                        |   TYPE   | array_name[1]
                        +——————————+
                        |   TYPE   | array_name[2]
                        +——————————+
                        |   TYPE   | array_name[3]
                        +——————————+
                        |   TYPE   | array_name[4]
                        +——————————+
                        |   TYPE   | array_name[5]
                        +——————————+
                             .
                             .
                             .
                             .
                        +——————————+
                        |   TYPE   | array_name[M-1]
                        +——————————+


***************
Variable names:
***************

When we declare an array as below:

TYPE array_name[M];

The compiler provides the following variable names:

array_name[0], array_name[1],  . . . . ,  array_name[M-1].

These variable names can be manipulated like any other variable. Note
that all M variables are of the type TYPE.


*****************************
Incrementing the array names:
*****************************

We saw earlier that in the case of single dimensional array, the array name
(in our example this is array_name) is also a pointer to the first element,
which in our examples is  array_name[0].

We can't increment the array name itself (array_name++ is not allowed,
because the array name is not a modifiable value), but we can do, for example

p = array_name + 1;

When we do this, p points to the next element, i.e., array_name[1].

In general, if we do:

p = (array_name + i)

then p points to the ith element, i.e., array_name[i].

This gives an important result:

( array_name + i )  points to element  array_name[i] .



*******************************************************************
Relationship between index and pointer in single dimensional array:
*******************************************************************

We know that if p points to some object, then *p is that object.
Since

(array_name + i ) points to array_name[i] ,

it follows that

*(array_name + i)  =   array_name[i]

The above relation, rewritten as below, is the heart and soul of arrays and
pointers:

array_name[i] =  *(array_name + i)

An important corollary happens when we put i=0,  we have

array_name[0] =  *array_name


**************************************
Alternative declaration of array name:
**************************************

If we have a general single dimensional array declaration

TYPE array_name[M];

Then the array name can be considered to behave like the following
declaration, because array_name points to an element of type TYPE:

TYPE *array_name;  /* i.e., array name points to TYPE */

Thus, there are two ways to write the prototype of a function to which
the pointer array_name is passed.

The first method is
void f( TYPE  array_name[] );

the second method is
void f( TYPE  *array_name );



****************************************
Pointers to arrays (single dimensional):
****************************************

A pointer to a single dimensional array is declared as, for example,

char  (*p)[10] ;

Here  p points to the  whole array of 10 characters, as shown in the
diagram below



+——————————————+         +——————+
|      p       | ——————> | char |  0
+——————————————+         +——————+
                        | char |  1
                        +——————+
                        | char |  2
                        +——————+
                        | char |  3
                        +——————+
                        | char |  4
                        +——————+
                        | char |  5
                        +——————+
                        | char |  6
                        +——————+
                        | char |  7
                        +——————+
                        | char |  8
                        +——————+
                        | char |  9
+——————————————+         +——————+
|    p+1       | ——————> | char |
+——————————————+         +——————+
                        | char |
                        +——————+
                            .
                            .
                            .


Since p points to an array of 10 chars, (*p) is the array of 10
chars. That is, (*p) means the whole array of 10 characters. To
confirm this, we can print out sizeof(*p), which gives 10 bytes.

This also means that if we increment p by 1, its value changes by
10 bytes, because pointer arithmetic moves in units of the object pointed to.


***************
The  Technique:
***************
Let us look at a few tricks for manipulating pointers to an array.

Before coming to the tricks, the basics are as follows.
Any single dimensional array declaration has the form

TYPE     array_name[M] ;

while a pointer to an array is declared with the syntax

TYPE     ( *pointer_to_array )[M] ;

If we compare the two, we can see that (*pointer_to_array) takes the place of
the array name. This means that whatever results we had with
respect to array_name previously, if we substitute
(*pointer_to_array) for it we get the same result. This is the basis of the
trick. We will now apply this and see a few more methods.


We will compare the following two declarations to explain the methods.

char   (*p)[10] ;

char    array_name[10];

In the above case, (*p) plays the role of the array name.

Therefore:
1.    The size of the array array_name is given by sizeof(array_name). Since
(*p) plays the role of the array name, the size of the array (*p) is given by
sizeof(*p).


2.    The declaration of array_name gives us the variables array_name[0],
array_name[1], array_name[2], .... , array_name[9], each of which holds a
char. Similarly, once p points to an array, we have the variables (*p)[0],
(*p)[1], (*p)[2], .... , (*p)[9], each of which holds a
char.


3.    Just as array_name points to array_name[0], (*p) points to
(*p)[0].

If we perform (array_name + 1) it will point to array_name[1]. Similarly,
if we do (*p)+1 it will point to (*p)[1], which means that:

(*p)[1] =  *((*p)+1)

 In general we have:

 (*p)[i] =  *((*p)+i)

A corollary of this is:

(*p)[0]  =  **p


4.    However, the following analogy fails in the case of pointers to arrays.
We said that, given the declaration

TYPE array_name[M];

there are two ways of writing the prototype of a function that takes
array_name as a parameter:

void f( TYPE  array_name[] );

or

void f( TYPE  *array_name );

Replacing array_name by (*p) works for the first form, but not for the
second.

Thus, if we wish to pass a pointer to a single dimensional array to a
function, the function prototype should be

void f( TYPE  (*p)[] )

(writing the size, as in TYPE (*p)[M], additionally lets the function use
sizeof(*p) and p+1).

This will not work:
void f( TYPE  **p)



Of course, in all the above examples of pointers to arrays we have assumed
that p has been initialized correctly, perhaps with a pointer returned by
malloc or by some other means.

The following program shows more about manipulating pointers to
arrays:


#include <stdio.h>

int main(void)
{
 int array[10] = {0,1,2,3,4,5,6,7,8,9};

 int (*p)[10];

 int i;

 p = &array;     /* initialize p with the address of the whole array */

  for(i=0;i<10;i++)
      printf("%d, %d, %d \n",array[i], (*p)[i], *((*p)+i));

 return 0;
}


We will see the output as:

0, 0, 0
1, 1, 1
2, 2, 2
3, 3, 3
4, 4, 4
5, 5, 5
6, 6, 6
7, 7, 7
8, 8, 8
9, 9, 9

which shows that all the expressions in the printf statement are
equivalent.


This little trick works in the following case too.

Suppose we have

int (**q)[10];

If we have initialized q correctly, then (**q) plays the role of the array name.
Therefore

1.  The size of the array is sizeof(**q)

2.  The following variables are available: (**q)[0], (**q)[1],  ....,
(**q)[9].

3.  (**q) points to (**q)[0] and in general  (**q)[i]  =  *((**q)+i)


The following code will show  what we have meant so far.

The data structure used in the following code is shown here:



+--------------+         +--------------+         +------+          +--------------+
|      q       | ------> |      p       | ------> | int  |  <------ |     arry     |
+--------------+         +--------------+         +------+          +--------------+
                                                  | int  |  8
                                                  +------+
                                                  | int  |  12
                                                  +------+
                                                  | int  |  16
                                                  +------+
                                                  | int  |  20
                                                  +------+
                                                  | int  |  24
                                                  +------+
                                                  | int  |  28
                                                  +------+
                                                  | int  |  32
                                                  +------+
                                                  | int  |  36
                                                  +------+
                                                  | int  |  40  bytes
                         +--------------+         +------+
                         |     p+1      | ------> | int  |
                         +--------------+         +------+
                                                  | int  |
                                                  +------+
                                                      .
                                                      .
                                                      .

sizeof(*p) = 40 bytes,  assuming 32-bit integer.




#include <stdio.h>

int main(void)
{
 int arry[10] = {0,1,2,3,4,5,6,7,8,9};

 int (*p)[10];

 int (**q)[10];

 int i;


 p = &arry;   /* initialize p with the address of the whole array */

 q = &p;      /* initialize q with the address of p */

  for(i=0;i<10;i++)
      printf("%d, %d, %d, %d, %d, %d \n",
        arry[i], *(arry+i), (*p)[i], *((*p)+i), (**q)[i], *((**q)+i));

 return 0;
}

Result:
0, 0, 0, 0, 0, 0
1, 1, 1, 1, 1, 1
2, 2, 2, 2, 2, 2
3, 3, 3, 3, 3, 3
4, 4, 4, 4, 4, 4
5, 5, 5, 5, 5, 5
6, 6, 6, 6, 6, 6
7, 7, 7, 7, 7, 7
8, 8, 8, 8, 8, 8
9, 9, 9, 9, 9, 9

As we see, all the expressions in the printf statement have the same
meaning.


In the next snippet, we will discuss pointers to two-dimensional
arrays.

11 May 2009

Preventing Precedence Errors: Equality/Inequality

A very commonly observed precedence error in expressions is inadequate parenthesizing. Here we take examples of equality/inequality expressions:

if(var1 & var2 == 0)

The above is incorrect because == has a higher precedence than &. Therefore var2==0 is computed first, giving 0 or 1, and that result is then bitwise ANDed with var1. This was not our intention.

The correct way is:

if((var1 & var2) == 0)


In practice, we usually intend the test for equality/inequality to be the last operation performed. It will be performed earlier, however, if the expression contains operators of lower precedence than == or !=.

If we recall our three liner:

S-U-Ari
Re-Bit-Lo-C
Assignment-Comma


The "e" in Re stands for equality/inequality. When these operators appear in an expression together with operators of lower precedence, the == or != gets executed earlier. To change the order of evaluation we need to parenthesize appropriately. The operators with precedence lower than == and != are Bit-Lo-C-Assignment-Comma. In practice we mostly find the bitwise and assignment operators mixed with == and !=.

So when we see this mixture, i.e., == or != together with &, |, ^ or =, in a statement, the parentheses need care.

The following are some examples where precedence errors are very common.


a)    Test if FLAG_A is clear in message

incorrect
if(message & FLAG_A == 0)

correct
if((message & FLAG_A) == 0)


b)    Test if FLAG_A is set in message

incorrect
if(message & FLAG_A != 0)

correct
if((message & FLAG_A) != 0)


c)    Test if all flags are clear

incorrect
if ( message & (FLAG_A | FLAG_B | FLAG_C) == 0)

correct
if (( message & (FLAG_A | FLAG_B | FLAG_C)) == 0)


d)    Assign the result of getchar() to c and test it against EOF

incorrect
if ( c = getchar() != EOF)

correct
if (( c = getchar()) != EOF)

In all the above cases, the sub-expressions built from operators with precedence lower than == or != are parenthesized so that they are executed earlier.

In short, when an expression has an equality/inequality operator as the intended final operation, and there are operators in the expression whose precedence is lower than that of equality/inequality, we should get the parentheses correct. The operators with lower precedence than "Re" are Bit-Lo-C-Assignment-Comma.

16 April 2009

How to measure the size of a variable

Let us go through a small test to find out if this entire snippet is really worth reading. The test begins now.  We should complete it in 60 seconds!

Consider these declarations in a piece of code.

     char x, y, *p, **pp;
     char  *array[5], arr[5];
     char  *f(int), **g(int) ;

     struct node  a, *b, *h(int); /* assume the size of node is 20 */

Then, what will be the values of the following?

1.    sizeof(x)
2.    sizeof(y)
3.    sizeof(*p)
4.    sizeof(p)
5.    sizeof(**pp)
6.    sizeof(*pp)
7.    sizeof(pp)
8.    sizeof(* array [0])
9.    sizeof(array[0])
10.   sizeof(arr [5] )
11.   sizeof(*f(5))
12.   sizeof(f(5))    /* is the size of a function?! */
13.   sizeof(**g(6))
14.   sizeof(*g(6))
15.   sizeof(g(6))
16.   sizeof(a)
17.   sizeof(*b)
18.   sizeof(b)
19.   sizeof(*h(5))
20.   sizeof(h(5))


Assuming that the size of char is 1 byte and that pointers have a size of 4 bytes, here are the answers.

1.    1 byte
2.    1 byte
3.    1 byte
4.    4 bytes
5.    1 byte
6.    4 bytes
7.    4 bytes
8.    1 byte
9.    4 bytes
10.   1 byte
11.   1 byte
12.   4 bytes
13.   1 byte
14.   4 bytes
15.   4 bytes
16.   20 bytes
17.   20 bytes
18.   4 bytes
19.   20 bytes
20.   4 bytes

We are almost sure we got all the answers right, but did we beat the clock? Did we do it within 60 seconds?

Well! Here is how to get the measure of size really fast. For this we need to remember the following two things.

First: refresh what we discussed in the “Moving the Stars” snippet. That virtual equal-to sign helps a lot in getting the size right. Second: a pointer of any type has a size of 4 bytes (we are by default considering 32-bit machines; the pointer size can, of course, change with the machine).

Let us answer the questions we put earlier…

Question 1 and Answer:
     We have
      char x;

     Which can be read as
      char = x

     Therefore,
      sizeof(x) = sizeof(char) = 1

Question 2 and Answer:
     As above, the answer is again 1 byte

Question 3 and Answer:
     Given
      char *p;
     What is the sizeof(*p)?

     Because char = *p, therefore
      sizeof(*p)= sizeof(char) = 1

Question 4 and Answer:
     Given
      char  *p;
     What is the sizeof(p)?

     Because char *  = p, therefore,
      sizeof(p) = sizeof(char *) = 4

Question 5 and Answer:
     Given
      char **pp;
     What is the sizeof (**pp)?

     Because char = **pp, therefore
      sizeof (**pp) = sizeof (char) = 1

Question 6 and Answer:
     Given,
      char **pp;
      What is the  sizeof(*pp)?

     Because char * =*pp, therefore
      sizeof(*pp) = sizeof(char *) = 4

Question 7 and Answer:
     Given,
      char **pp;
     What is the sizeof (pp)?

     Because char ** = pp, therefore
      sizeof(pp) = sizeof(char **) = 4


Questions 8 to 10 follow the same pattern. Question 11 is interesting, because we are apparently trying to find the size of a function, which is unusual. What sizeof actually measures here is the size of the function's return value; the function itself is not called.

Let us take up Question 11 and Answer:

     Given char * f(int); what is sizeof(*f(5))?

     Because char = * f(int), therefore
      sizeof (* f(5)) = sizeof(char) = 1


Question 12 and Answer:

     Given char *f(int); what is sizeof(f(5))?

     Because char * = f(int), therefore
      sizeof (f(5)) = sizeof(char *) = 4.


Now that we know what we mean by size of a function, the remaining questions should be easy to answer quickly. Let's do it to be sure. This trick works with any C/C++ declarations.

By the way, why do we need to know the size of a variable correctly? The answer is: to malloc the correct amount of memory. If we give a wrong size for memory allocation, chances are that we will take days
to find out why the program misbehaves in strange ways.

This snippet has explained the aspects of sizeof in detail.

04 April 2009

Refresh your Pointers knowledge

This little piece of literature will let us prevent a category of defects related to pointers. But first, let us take a small test. It will help us to decide if the entire snippet  is really worth reading. So here are the questions.

Consider the following declarations in a piece of code.
     int x , y, *p, **pp ;
     int *array[5], arr[5];
     int *f(int), **g(int);


Now, for which of the following eight statements will the compiler complain, and for which will it not?
Let us try the test and check if we can get it right in less than sixty seconds!
1.    x = y;
2.    *p = array [0];
3.    *p = **pp;
4.    x = f (4);
5.    p = *g (3);
6.    array [0] = arr[0];
7.    arr [0] = * array [1];
8.    array [5] = **g (3);


Here are the answers. For all odd-numbered statements (1, 3, 5, 7) the compiler will not complain.
For all even-numbered statements (2, 4, 6, 8) the compiler will complain.

Let us now figure out quickly some of the mistakes we saw earlier. The way to find what works and what does not is rather simple. That is because we can consider a C/C++ declaration as an equation with an “=” sign, across which we can move the stars!

Hold on! Let us see what we mean by that.

Let us consider the declaration  int * x;

Let's imagine an "equal to" sign in between as follows

     int = * x ; /* this one tells you that *x is int */

We can move the star across the “=” sign, as shown here:

     int * = x ; /* this one tells that x is an integer pointer, i.e. a pointer to an integer */


One more example will make things clearer.

     int ** y;

Let's imagine it as:

     int = **y   (this means that **y is an integer).


After moving one star across:

     int * = *y   (this means that *y is an integer pointer).

After moving one more star across:

     int ** = y   (this means that y is a pointer to an integer pointer).



Now, let us combine the above two examples.
     In the first example we saw that *x is an integer.
     In the second example we saw that **y is also an integer.

So, the compiler will not complain if we write
     *x = **y
          or
     **y = *x

Similarly,
     x = *y   is also ok with the compiler because both sides are int *.

By now we have seen how to quickly find what will work and what will not work by MOVING THE STARS!
It is a good idea to look at the declarations first, to minimize such defects.

Let us go back to the quiz questions we discussed earlier. We will go through the answers quickly by using this  MOVING THE STARS! method.

We have copied and pasted the declarations over here for easy reference:
     int x , y, *p, **pp ;
     int *array[5], arr[5];
     int *f(int), **g(int);

1.    x = y  is correct, because both sides are int
2.    *p = array [0] is wrong, because LHS is int but RHS is  int *
3.    *p = **pp    is correct because both sides are int.
4.    x = f (4) is wrong because LHS is  int and RHS is int *.
5.    p= *g (3) is correct because both sides are int *
6.    array [0] = arr [0] is wrong because LHS is int *  but RHS is  int.
7.    arr [0] = * array [1]   is correct, because both sides are  int.
8.    array [5] = ** g (3) is wrong, because LHS is  int*  while RHS is int.


Experience shows that regularly using this little trick helps in writing as well as reading code faster.
Of course, it also helps us in reducing mistakes.

13 February 2009

goto

- A goto statement causes your program to unconditionally transfer control to the statement associated with the label specified on the goto statement.

- Because the goto statement can interfere with the normal sequence of processing, it makes a program more difficult to read and maintain. Often, a break statement, a continue statement, or a function call can eliminate the need for a goto statement.

- If an active block is exited using the goto statement, any local variables are destroyed when control is transferred from that block.

- Also, in C++ one cannot use a goto statement to jump over an initialization into the scope of the initialized variable. C permits such a jump, but the variable is then left uninitialized.

- A goto statement is allowed to jump within the scope of a variable length array, but not past any declaration of objects with variably modified types.

goto name1;
/* name1 is the identifier for the jumping location */
/* the label is name1, a valid identifier, followed by a colon */

…………. ………….
…………. ………….
…………. ………….
name1: Statement;



Ever since I started writing C code (which is more than a decade back) I have been instructed never to use the keyword 'goto'. It would not be out of place to say that we have all been doing goto-less programming. In fact, goto is much like a contradiction to the philosophy of C: C follows a structured instruction flow, and goto breaks that flow and starts jumping around the code sequence.

The inclusion of goto in ANSI C itself shows that it has some utility in programming, and it should be used whenever and wherever required. The most common use of goto is to jump out of multiple nested loops for a safe landing; it should not be used for jumping out of a single loop, where break does the job. When we need to leave several nested loops at once, say 3 or 4, goto is the tool to use.

While goto is the most efficient way to jump out of multiple loops, we must also remember that the landing code should not be within a loop (or a loop within a loop).

An example of such code using goto is here:

#include <stdio.h>

void display(int matrix[3][3]);

int main(void)
{
    int matrix[3][3] = { {1, 2, 3}, {4, 5, 2}, {8, 9, 10} };

    display(matrix);
    return 0;
}

/* Prints the elements in order and stops at the first one outside 1..6. */
void display(int matrix[3][3])
{
    int i, j;

    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
        {
            if ( (matrix[i][j] < 1) || (matrix[i][j] > 6) )
                goto out_of_bounds;   /* leaves both loops at once */
            printf("matrix[%d][%d] = %d\n", i, j, matrix[i][j]);
        }
    return;

out_of_bounds:
    printf("number must be 1 through 6\n");
}


And finally, here are some interesting facts about goto statements:

1. MATLAB does not have goto at all, and its absence sometimes makes programming much more difficult.

2. Java reserves goto as a keyword, but does not use it.

3. Dijkstra's letter against goto : http://www.cs.ubbcluj.ro/~adriana/FP/Requirements/dijkstra68goto.pdf

29 January 2009

csh programming should be avoided

I am continually shocked and dismayed to see people write test cases, install scripts, and other random hackery using the csh. Lack of proficiency in the Bourne shell has been known to cause errors in /etc/rc and .cronrc files, which is a problem, because you MUST write these files in that language.

The csh is attractive because the conditionals are more C-like, so the path of least resistance is chosen and a csh script is written. Sadly, this is a lost cause, and the programmer seldom even realizes it, even when they find that many simple things they wish to do range from cumbersome to impossible in the csh.


1. FILE DESCRIPTORS

The most common problem encountered in csh programming is that you can't do file-descriptor manipulation. All you are able to do is redirect stdin, or stdout, or dup stderr into stdout. 
Bourne-compatible shells offer you an abundance of more exotic possibilities. 

1a. Writing Files

In the Bourne shell, you can open or dup arbitrary file descriptors.
For example, 

exec 2>errs.out

means that from then on, stderr goes into the file errs.out.

Or what if you just want to throw away stderr and leave stdout
alone? Pretty simple operation, eh?

cmd 2>/dev/null

Works in the Bourne shell. In the csh, you can only make a pitiful 
attempt like this:

(cmd > /dev/tty) >& /dev/null

But who said that stdout was my tty? So it's wrong. This simple
operation *CANNOT BE DONE* in the csh.

Along these same lines, you can't direct error messages in csh scripts
out stderr as is considered proper. In the Bourne shell, you might say:

echo "$0: cannot find $file" 1>&2

but in the csh, you can't redirect stdout out stderr, so you end
up doing something silly like this:

sh -c 'echo "$0: cannot find $file" 1>&2'

1b. Reading Files

In the csh, all you've got is $<, which reads a line from your tty. What
if you've redirected stdin? Tough noogies, you still get your tty, which 
you really can't redirect. Now, the read statement 
in the Bourne shell allows you to read from stdin, which catches
redirection. It also means that you can do things like this:

exec 3<file1
exec 4<file2

Now you can read from fd 3 and get lines from file1, or from file2 through
fd 4. In modern, Bourne-like shells, this suffices: 

read some_var 0<&3
read another_var 0<&4

Although in older ones where read only goes from 0, you trick it:

exec 5<&0 # save old stdin
exec 0<&3; read some_var
exec 0<&4; read another_var
exec 0<&5 # restore it


1c. Closing FDs

In the Bourne shell, you can close file descriptors you don't
want open, like 2>&-, which isn't the same as redirecting it
to /dev/null.

1d. More Elaborate Combinations

Maybe you want to pipe stderr to a command and leave stdout alone.
Not too hard an idea, right? You can't do this in the csh as I
mentioned in 1a. In a Bourne shell, you can do things like this:

exec 3>&1; grep yyy xxx 2>&1 1>&3 3>&- | sed s/file/foobar/ 1>&2 3>&-
grep: xxx: No such foobar or directory

Normal output would be unaffected. The closes there were in case
something really cared about all its FDs. We send stderr to sed,
and then put it back out 2.

Consider the pipeline:

A | B | C

You want to know the status of C, well, that's easy: it's in $?, or
$status in csh. But if you want it from A, you're out of luck -- if
you're in the csh, that is. In the Bourne shell, you can get it, although
doing so is a bit tricky. Here's something I had to do where I ran dd's
stderr into a grep -v pipe to get rid of the records in/out noise, but had
to return the dd's exit status, not the grep's:

device=/dev/rmt8
dd_noise='^[0-9]+\+[0-9]+ records (in|out)$'
exec 3>&1
status=`((dd if=$device ibs=64k 2>&1 1>&3 3>&- 4>&-; echo $? >&4) |
egrep -v "$dd_noise" 1>&2 3>&- 4>&-) 4>&1`
exit $status;


The csh has also been known to close all open file descriptors besides
the ones it knows about, making it unsuitable for applications that 
intend to inherit open file descriptors.


2. COMMAND ORTHOGONALITY

2a. Built-ins

The csh is a horrid botch with its built-ins. You can't put them
together in many reasonable ways. Even simple little things like this:

% time | echo

which while nonsensical, shouldn't give me this message:

Reset tty pgrp from 9341 to 26678

Others are more fun:
% sleep 1 | while
while: Too few arguments.
[5] 9402
% jobs
[5] 9402 Done sleep |


Some can even hang your shell. Try typing ^Z while you're sourcing 
something, or redirecting a source command. Just make sure you have
another window handy. Or try 

% history | more

on some systems.
Aliases are not evaluated everywhere you would like them to be:

% alias lu 'ls -u'
% lu
HISTORY News bin fortran lib lyrics misc tex
Mail TEX dehnung hpview logs mbox netlib
% repeat 3 lu
lu: Command not found.
lu: Command not found.
lu: Command not found.

% time lu
lu: Command not found.


2b. Flow control

You can't mix flow-control and commands, like this:

who | while read line; do
echo "gotta $line"
done


You can't combine multiline constructs in a csh using semicolons.
There's no easy way to do this

alias cmd 'if (foo) then bar; else snark; endif'


You can't perform redirections with if statements that are
evaluated solely for their exit status:

if ( { grep vt100 /etc/termcap > /dev/null } ) echo ok

And even pipes don't work:

if ( { grep vt100 /etc/termcap | sed 's/$/###/' } ) echo ok

But these work just fine in the Bourne shell:
if grep vt100 /etc/termcap > /dev/null ; then echo ok; fi 
if grep vt100 /etc/termcap | sed 's/$/###/' ; then echo ok; fi

Consider the following reasonable construct:

if ( { command1 | command2 } ) then
...
endif

The output of command1 won't go into the input of command2. You will get
the output of both commands on standard output. No error is raised. In
the Bourne shell or its clones, you would say 

if command1 | command2 ; then
...
fi
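
For what it's worth, here is a small sketch of that Bourne shell form. The sample data fed through printf is invented for the example. The point is only that the pipe really is connected, and that if sees the exit status of the last command in the pipeline.

```sh
#!/bin/sh
# Made-up sample data standing in for /etc/termcap.
sample='vt100|dec vt100
xterm|xterm terminal emulator'

# grep reads printf's output through the pipe; `if` tests grep's exit status.
if printf '%s\n' "$sample" | grep vt100 > /dev/null; then
    found_vt100=yes
else
    found_vt100=no
fi

if printf '%s\n' "$sample" | grep wyse50 > /dev/null; then
    found_wyse50=yes
else
    found_wyse50=no
fi

echo "vt100: $found_vt100, wyse50: $found_wyse50"
```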


2c. Stupid parsing bugs

Certain reasonable things just don't work, like this:

% kill -1 `cat foo`
`cat foo`: Ambiguous.

But this is ok:

% /bin/kill -1 `cat foo`

If you have a stopped job:

[2] Stopped rlogin globhost

You should be able to kill it with 

% kill %?glob
kill: No match

but

% fg %?glob

works.

White space can matter:

if(expr)

may fail on some versions of csh, while

if (expr)

works! Your vendor may have attempted to fix this bug, but odds are good
that their csh still won't be able to handle

if(0) then
if(1) then
echo A: got here
else
echo B: got here
endif
echo We should never execute this statement
endif



3. SIGNALS

In the csh, all you can do with signals is trap SIGINT. In the Bourne
shell, you can trap any signal, or the end-of-program exit. For example,
to blow away a tempfile on any of a variety of signals:

$ trap 'rm -f /usr/adm/tmp/i$$ ;
echo "ERROR: abnormal exit";
exit' 1 2 3 15

$ trap 'rm tmp.$$' 0 # on program exit
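
A self-contained sketch of the same idea follows. The temp file name is hypothetical, and the work is done in a subshell only so that the cleanup can be observed from outside. A real script would usually set the traps at top level.

```sh
#!/bin/sh
# Hypothetical scratch file; mktemp is a safer choice where it exists.
tmpfile="${TMPDIR:-/tmp}/trapdemo.$$"

(
    trap 'rm -f "$tmpfile"' 0     # 0: run when this (sub)shell exits
    trap 'exit 1' 1 2 3 15        # HUP INT QUIT TERM become a normal exit,
                                  # which in turn fires the exit trap
    echo "scratch data" > "$tmpfile"
    [ -f "$tmpfile" ] && echo "working with $tmpfile"
    exit 0
)

# The subshell is gone, and so is its temp file.
[ -f "$tmpfile" ] || echo "temp file cleaned up"
```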



4. QUOTING

You can't quote things reasonably in the csh:

set foo = "Bill asked, \"How's tricks?\""

doesn't work. This makes it really hard to construct strings with
mixed quotes in them. In the Bourne shell, this works just fine. 
In fact, so does this:

cd /mnt; /usr/ucb/finger -m -s `ls \`u\``

Dollar signs cannot be escaped in double quotes in the csh. Ug.

set foo = "this is a \$dollar quoted and this is $HOME not quoted" 
dollar: Undefined variable.

You have to use backslashes for newlines, and it's just darn hard to
get them into strings sometimes.

set foo = "this \
and that";
echo $foo
this and that
echo "$foo"
Unmatched ". 

Say what? You don't have these problems in the Bourne shell, where it's
just fine to write things like this:

echo 'This is 
some text that contains
several newlines.'
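
To make the contrast concrete, here is a sketch of the three csh trouble spots above done in the Bourne shell. The variable names bar and baz are just placeholders added for this example.

```sh
#!/bin/sh
# Mixed quotes: backslash-escaped double quotes inside double quotes.
foo="Bill asked, \"How's tricks?\""
echo "$foo"

# A backslash keeps the dollar sign literal; $HOME is still expanded.
bar="this is a \$dollar quoted and this is $HOME not quoted"
echo "$bar"

# A newline inside quotes is just a newline, and it survives "$baz".
baz="this
and that"
echo "$baz"
```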


As distributed, quoting history references is a challenge. Consider:

% mail adec23!alberta!pixel.Convex.COM!tchrist
alberta!pixel.Convex.COM!tchri: Event not found.


5. VARIABLE SYNTAX

There's this big difference between global (environment) and local
(shell) variables. In csh, you use a totally different syntax 
to set one from the other. 

In the Bourne shell, this 
VAR=foo cmd args
is the same as
(export VAR; VAR=foo; cmd args)
or csh's
(setenv VAR foo; cmd args)
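
A quick sketch of the Bourne shell form, assuming nothing more than a child sh to play the part of the command: the assignment is visible to that one command and leaves the calling shell untouched.

```sh
#!/bin/sh
unset VAR

# VAR is placed in the environment of this one command only.
seen=`VAR=foo sh -c 'echo "$VAR"'`

# Back in the parent shell, VAR is still unset.
after=${VAR-unset}

echo "child saw: $seen, parent has: $after"
```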

You can't use :t, :h, etc on envariables. Watch:
echo Try testing with $SHELL:t

It's really nice to be able to say

${PAGER-more}
or
FOO=${BAR:-${BAZ}}

to be able to run the user's PAGER if set, and more otherwise.
You can't do this in the csh. It takes more verbiage.
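
Here is a short sketch of how those default expansions behave. The variable names on the left are made up for the example. Note the difference between the plain - form and the :- form when the variable is set but empty.

```sh
#!/bin/sh
unset PAGER
when_unset=${PAGER-more}       # unset: falls back to "more"

PAGER=
when_empty=${PAGER-more}       # set but empty: stays empty
when_empty_colon=${PAGER:-more} # :- treats empty like unset: "more"

PAGER=less
when_set=${PAGER-more}         # set: the user's choice wins

echo "unset=$when_unset empty=[$when_empty] colon=$when_empty_colon set=$when_set"
```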

You can't get the process number of the last background command from the
csh, something you might like to do if you're starting up several jobs in
the background. In the Bourne shell, the pid of the last command put in
the background is available in $!.
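
A minimal sketch of using $! with a couple of background jobs. Both jobs are invented for the example; (exit 7) merely fakes a job that fails.

```sh
#!/bin/sh
sleep 1 &
pid_sleep=$!        # pid of the job just put in the background

(exit 7) &
pid_fail=$!

# wait <pid> returns the exit status of that particular job.
status_sleep=0
wait "$pid_sleep" || status_sleep=$?
status_fail=0
wait "$pid_fail" || status_fail=$?

echo "sleep exited $status_sleep, the other job exited $status_fail"
```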

The csh is also flaky about what it does when it imports an 
environment variable into a local shell variable, as it does
with HOME, USER, PATH, and TERM. Consider this:

% setenv TERM '`/bin/ls -l / > /dev/tty`'
% csh -f

And watch the fun!


6. EXPRESSION EVALUATION

Consider this statement in the csh:
if ($?MANPAGER) setenv PAGER $MANPAGER


Despite your attempts to only set PAGER when you want
to, the csh aborts:

MANPAGER: Undefined variable.

That's because it parses the whole line anyway AND EVALUATES IT!
You have to write this:

if ($?MANPAGER) then
setenv PAGER $MANPAGER
endif

That's the same problem you have here:

if ($?X && $X == 'foo') echo ok
X: Undefined variable

This forces you to write a couple of nested if statements. This is highly
undesirable because it renders short-circuit booleans useless in
situations like these. If the csh were really C-like, you would
expect to be able to safely employ this kind of logic. Consider the
common C construct:

if (p && p->member) 

Undefined variables are not fatal errors in the Bourne shell, so 
this issue does not arise there.
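
For comparison, a sketch of the same test in the Bourne shell. The ${X+set} idiom is one common way to ask "is X defined?", and the result variables are made up for the example.

```sh
#!/bin/sh
unset X
# The right-hand test is never run when X is unset, and would be
# harmless even if it were.
if [ -n "${X+set}" ] && [ "$X" = foo ]; then
    first_try=ok
else
    first_try=skipped
fi

X=foo
if [ -n "${X+set}" ] && [ "$X" = foo ]; then
    second_try=ok
else
    second_try=skipped
fi

echo "unset: $first_try, set to foo: $second_try"
```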

While the csh does have built-in expression handling, it's not
what you might think. In fact, it's space sensitive. This is an
error

@ a = 4/2

but this is ok

@ a = 4 / 2


The ad hoc parsing csh employs fouls you up in other places 
as well. Consider:

% alias foo 'echo hi' ; foo
foo: Command not found.
% foo
hi



7. ERROR HANDLING

Wouldn't it be nice to know you had an error in your script before
you ran it? That's what the -n flag is for: just check the syntax.
This is especially good to make sure seldom taken segments of
code are correct. Alas, the csh implementation of this doesn't work.
Consider this statement:

exit (i)

Of course, they really meant

exit (1)

or just

exit 1

Either shell will complain about this. But if you hide this in an if
clause, like so:

#!/bin/csh -fn
if (1) then
exit (i)
endif

The csh tells you there's nothing wrong with this script. The equivalent
construct in the Bourne shell, on the other hand, tells you this:


#!/bin/sh -n
if (1) then
exit (i)
endif

/tmp/x: syntax error at line 3: `(' unexpected



RANDOM BUGS

Here's one:

fg %?string
^Z
kill %?string
No match.

Huh? Here's another

!%s%x%s

Coredump, or garbage.

If you have an alias with backquotes, and use that in backquotes in 
another one, you get a coredump.

Try this:
% repeat 3 echo "/vmu*"
/vmu*
/vmunix
/vmunix
What???


Here's another one:

% mkdir tst
% cd tst
% touch '[foo]bar'
% foreach var ( * )
> echo "File named $var"
> end
foreach: No match.


8. SUMMARY


While some vendors have fixed some of the csh's bugs (the tcsh also does much better here), many have added new ones. Most of its problems can never be solved because they're not actually bugs, but rather the direct consequences of ill design decisions. It's inherently flawed.

Do yourself a favor, and if you *have* to write a shell script, do it in the Bourne shell. It's on every UNIX system out there. However, behavior can vary.

There are other possibilities.

The Korn shell is the preferred programming shell of many sh addicts, but it still suffers from inherent problems in the Bourne shell's design, such as parsing and evaluation horrors. The Korn shell or its public-domain clones and supersets (like bash) aren't quite so ubiquitous as sh, so it probably wouldn't be wise to write a sharchive in them that you post to the net. When 1003.2 becomes a real standard that companies are forced to adhere to, then we'll be in much better shape. Until then, we'll be stuck with bug-incompatible versions of the sh lying about.

If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then Perl may be for you. You can get at networking functions, binary data, and most of the C library. There are also translators to turn your sed and awk scripts into Perl scripts, as well as a symbolic debugger. Tchrist's personal rule of thumb is that if it's the size that fits in a Makefile, it gets written in the Bourne shell, but anything bigger gets written in Perl.

03 January 2009

Let Us C by Yashavant Kanetkar

Anyone who is looking for a softcopy of 'Let Us C' by Yashavant Kanetkar can drop me a request in the comments section of this page. I will send the book by email (so please include your email address in the request).

Hope you will have a great time reading this book, as it has become the standard reference for C questions at the freshers level.

Keep coming back.

bbye!

01 January 2009

Ubuntu 8.10 'Intrepid Ibex' released!

Ubuntu 8.10 'Intrepid Ibex' has been launched with a Beta tag, and is ready for download from its website.

Ubuntu's main target has been the Windows user, and it is designed with the intention of getting that user to move from Windows to Ubuntu. The previous 8.04 edition already paid special attention to how easily a user can install Ubuntu on a Windows PC, and this feature remains in the 8.10 edition too.

The 8.04 'Hardy Heron' beta incorporated GNOME 2.22, which brought with it a whole raft of new features, changes, fixes and improvements. One of the most significant changes was the new Nautilus file manager, which uses the GVFS virtual file system.

In that version Firefox 3.0 Beta 4 was provided as the default browser.

CD and DVD burning got a lot easier with Brasero. It's wonderfully quick and easy to use.

The kernel was upgraded to 2.6.24-12.13, which brought with it power management for 64-bit users, kernel-based virtualization and the 'Completely Fair Scheduler' for processes.


New Features in Ubuntu 8.10
-------------------------------------------------

3G Support

Public WiFi has its limitations when it comes to constant connectivity. Improvements to the network manager in Ubuntu 8.10 make it simple to detect and connect to 3G networks and manage connectivity. This connectivity is delivered through an inbuilt 3G modem, through 'dongle' support, through a mobile phone or through Bluetooth. It is a complex environment that Ubuntu 8.10 simplifies through a single interface and the auto-detection of many of the most popular devices.

Write Ubuntu to and Install from a USB Drive

Ubuntu has been made available to users as an image for CDs and DVDs to date. But CDs and DVDs are slower, less portable and less convenient than USB sticks. Now, a simple application in Ubuntu will allow users to write Ubuntu to a USB drive, even a modified version of Ubuntu with their data on it, so it can be carried everywhere to plug in and use on any machine.

Guest Sessions

In a world of 'always on' pervasive computing it is more likely that users lend their computers to colleagues or friends at conferences, cafes or at parties so they can check email, etc. Guest sessions allow users to lock down a session easily so a guest can use the full system without interference with programs or data.

BBC Content

Starting the media players within Ubuntu (Totem Movie Player and Rhythmbox) launches a menu of selected content from the broadcaster that is free to air. This is a mixture of video, radio and podcasts and available in high quality, much of it playable using non-proprietary codecs. Content is constantly updated via the corporation's stream and will vary dependent on location, though some content will be available for every user.

Latest GNOME 2.24 Desktop Environment

The GNOME desktop environment project has released its latest version, which is incorporated into Ubuntu 8.10. New features include a new instant messaging client, a built-in time tracker, improved file management and toolbars, plus better support for multiple monitors with the ability to set screen resolution per monitor.


Finally, with this release of Ubuntu it seems almost certain that more and more users will start migrating to it. It happens to be the best of the available Unix flavours for personal systems such as desktops and notebooks.

bzero() and memset()

bzero() and memset() serve a similar purpose: filling a block of memory with a known value, most often zero. bzero() can only set the bytes to zero, whereas memset() can fill them with any byte value.
The function prototypes are as follows:

#include <string.h>
void bzero(void *mem, size_t bytes);
void *memset(void *mem, int val, size_t bytes);
 
mem : the memory segment to initialize, holds the address of the starting block.
bytes : Size of the memory segment.
val :  value to be filled in the segment.
 
(memset() is declared in the string.h header file, which must be included wherever it is used.
POSIX declares bzero() in strings.h, though many systems also expose it through string.h.)
 
 
These two functions have historically been used interchangeably, a consequence of the split lineage
of UNIX: bzero() was originally part of BSD Unix, and memset() was part of AT&T Unix. memset() was
later adopted into the ANSI C and POSIX standards, whereas bzero() was deprecated. In professional
code it is no longer advisable to use bzero(); use memset() instead, since bzero(mem, bytes) does
the same job as memset(mem, 0, bytes). (Most Unix-like platforms still provide both.)