We have already seen that
variables are memory cells that we can access by an identifier. These
variables are stored in concrete places in the computer's memory. For our
programs, the computer memory is simply a succession of 1-byte cells (the
minimum size for a datum), each one with a unique address.
A good simile for the computer memory is a street
in a city. On a street all houses are numbered consecutively with a unique
identifier, so if we talk about 27 Sesame Street we will be able to find
that place without trouble, since there can be only one house with that number
and, in addition, we know that the house will be between houses 26 and 28.
In the same way in which houses in a street are
numbered, the operating system organizes the memory with unique and consecutive
numbers, so if we talk about location 1776 in the memory, we know that there is
only one location with that address and also that it lies between addresses 1775
and 1777.
Address-of operator (&)
At the moment in which we declare a
variable it must be stored in a concrete location in this succession of cells
(the memory). We generally do not decide where the variable is placed;
fortunately that is done automatically by the compiler and the
operating system at runtime. Once an address has been assigned, however,
there are some cases in which we may be interested in knowing where the
variable is stored.
This can be done by preceding the variable identifier
with an ampersand sign (&),
which literally means "address of". For example:
ted = &andy;
would assign to variable
ted the address of variable
andy, since when preceding the name of the variable
andy with the ampersand
(&) character we are no
longer talking about the content of the variable, but about its address in
memory.
We are going to suppose that andy
has been placed in the memory address 1776
and that we write the following:
andy = 25;
fred = andy;
ted = &andy;
the result is that andy (stored at address 1776) contains 25,
fred also contains 25, and ted contains 1776.
We have assigned to
fred the content of variable
andy, as we have done on many other occasions in
previous sections of this tutorial, but to ted
we have assigned the address in memory where the value of andy
is stored, which we
have imagined to be 1776 (it
could be any address; this one is made up). The reason is that in the
assignment to ted we have
preceded andy with an ampersand
(&) character.
A variable that stores the address of another
variable (like ted in the
previous example) is what we call a pointer. In C++ pointers have
certain virtues and they are used very often. Further on we will see how
this type of variable is declared.
Dereference operator (*)
Using a pointer we can directly
access the value stored in the variable it points to, just by preceding the
pointer identifier with the dereference operator, an asterisk (*),
which can be literally translated as "value pointed by".
Therefore, continuing with the values of the previous example, if we write:
beth = *ted;
(that we could read as: "beth equal
to value pointed by ted") beth
would take the value 25,
since ted is
1776, and the value pointed by
1776 is 25.

You must clearly differentiate that
ted stores 1776,
but *ted (with an asterisk
* before) refers to the value
stored in the address 1776,
that is 25. Notice the
difference between including and not including the dereference asterisk (each
line carries a comment explaining how the expression could be read):
beth = ted; // beth equal to ted ( 1776 )
beth = *ted; // beth equal to value pointed by ted ( 25 )
Address-of operator
(&)
It is used as a variable prefix and can be translated as "address of",
thus: &variable1 can be read
as "address of variable1".
Dereference operator (*)
It indicates that what has to be evaluated is the content pointed to by the
expression considered as an address. It can be translated as "value pointed by".
*mypointer can be read as "value
pointed by mypointer".
At this point, continuing with the same example
started above, where:
andy = 25;
ted = &andy;
you should be able to clearly see
that all the following expressions are true:
andy == 25
&andy == 1776
ted == 1776
*ted == 25
The first expression is quite clear
considering that the assignment was andy=25;.
The second one uses the address-of operator (&),
which returns the address of the variable andy,
that we imagined to be 1776.
The third one is quite obvious since the second was true and the assignment to
ted was
ted = &andy;. The fourth expression uses the
dereference operator (*) that, as
we have just seen, yields the value contained in the address pointed to
by ted, that is
25.
So, after all that, you may also infer that while the
address pointed to by ted remains
unchanged the following expression will also be true:
*ted == andy
Declaring variables of type pointer
Because a pointer can
directly refer to the value that it points to, it becomes necessary to specify
which data type a pointer points to when declaring it. It is not the same to
point to a char as it is
to point to an int or a
float.
Therefore, the declaration of pointers follows this
form:
type * pointer_name;
where type
is the type of the data pointed to, not the type of the pointer itself. For
example:
int * number;
char * character;
float * greatnumber;
These are three declarations of
pointers. Each one points to a different data type, but all three are pointers
and on most platforms all three occupy the same amount of space in memory (the
size of a pointer depends on the platform for which the program is compiled).
The data to which they point, however, neither occupy the same amount of space
nor are of the same type: one is
int, another one is char
and the other one float.
Note that the asterisk (*)
that we use when declaring a pointer means only that it is a pointer,
and should not be confused with the dereference operator that we have seen a bit
earlier, which is also written with an asterisk (*).
They are simply two different tasks represented with the same sign.
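// my first pointer
#include <iostream>
using std::cout;

int main ()
{
  int value1 = 5, value2 = 15;
  int * mypointer;
  mypointer = &value1;   // mypointer = address of value1
  *mypointer = 10;       // value pointed by mypointer = 10
  mypointer = &value2;   // mypointer = address of value2
  *mypointer = 20;       // value pointed by mypointer = 20
  cout << "value1==" << value1 << " / value2==" << value2;
  return 0;
}

Output:
value1==10 / value2==20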
Notice how the values of
value1 and value2
have changed indirectly. First we have assigned to mypointer
the address of value1 using
the address-of operator, the ampersand sign (&).
Then we have assigned 10 to
the value pointed by mypointer,
which is pointing to value1,
so we have modified value1
indirectly.
To show that a pointer may take
several different values during the same program we have repeated the process
with value2 and the same
pointer.
Here is a slightly more complicated example:
// more pointers
#include <iostream>
using std::cout;

int main ()
{
  int value1 = 5, value2 = 15;
  int *p1, *p2;
  p1 = &value1;    // p1 = address of value1
  p2 = &value2;    // p2 = address of value2
  *p1 = 10;        // value pointed by p1 = 10
  *p2 = *p1;       // value pointed by p2 = value pointed by p1
  p1 = p2;         // p1 = p2 (value of pointer copied)
  *p1 = 20;        // value pointed by p1 = 20
  cout << "value1==" << value1 << " / value2==" << value2;
  return 0;
}

Output:
value1==10 / value2==20
Each line carries a comment describing how the code
can be read: ampersand (&) as
"address of" and asterisk (*) as
"value pointed by". Notice that there are expressions with pointers
p1 and p2
with and without the asterisk. The meaning of using or not using a dereference
asterisk is very different: an asterisk (*)
followed by the pointer refers to the place pointed to by the pointer, whereas a
pointer without an asterisk (*)
refers to the value of the pointer itself, that is, the address to which it is
pointing.
Another thing that may call your attention is the
line:
int *p1, *p2;
that declares the two pointers of
the previous example putting an asterisk (*)
for each pointer. The reason is that the type specified for all the declarations
on the same line is int (and not
int*). In a declaration the asterisk is not part of the
type written at the start of the line; it belongs to the individual name
that follows it, so it applies only to that name. Operator precedence, which
we talked about in
section 1.3: Operators,
plays no part here. It is enough that you know clearly that
you will have to put an asterisk (*)
before each pointer that you declare.
Pointers and arrays
The concept of array is very much
bound to that of pointer. In most expressions the identifier of an array is
converted to the address of its first element, just as a pointer holds the
address of the first element that it points to, so in many respects the two can
be used in the same way. For example, supposing these two declarations:
int numbers [20];
int * p;
the following assignment would be
valid:
p = numbers;
At this point
p and numbers
can be used to reach the same elements. The main difference is that
we can assign another value to the pointer p,
whereas numbers will
always refer to the same block of 20 elements of type
int with which it was defined. So, unlike
p, which is an ordinary pointer variable,
numbers behaves much like a constant pointer
(strictly speaking an array is not a pointer, but its name cannot be made to
refer to anything else). Therefore, although the previous expression
was valid, the following assignment is not:
numbers = p;
because numbers
is an array, and nothing can be assigned to an array
name.
Because p is an ordinary variable, all the
expressions that include pointers in the following example are perfectly valid:
// more pointers
#include <iostream>
using std::cout;

int main ()
{
  int numbers[5];
  int * p;
  p = numbers;      *p = 10;      // numbers[0]
  p++;              *p = 20;      // numbers[1]
  p = &numbers[2];  *p = 30;      // numbers[2]
  p = numbers + 3;  *p = 40;      // numbers[3]
  p = numbers;      *(p+4) = 50;  // numbers[4]
  for (int n=0; n<5; n++)
    cout << numbers[n] << ", ";
  return 0;
}

Output:
10, 20, 30, 40, 50,
In the chapter "Arrays" we used brackets
[] several times in order to specify the index of the
array element to which we wanted to refer. The bracket
operator [] is known as the offset
operator, and it is equivalent to adding the number within brackets to the
address held by a pointer and dereferencing the result. For example, the
following two expressions:
a[5] = 0; // a [offset of 5] = 0
*(a+5) = 0; // pointed by (a+5) = 0
are equivalent and valid whether
a is a pointer or an array.
Pointer initialization
When declaring pointers we may want
to explicitly specify to which variable we want them to point:
int number;
int *tommy = &number;
this is equivalent to:
int number;
int *tommy;
tommy = &number;
When a pointer is assigned
we are always assigning the address to which it points, never the value
pointed to. Keep in mind that at the moment of declaring a pointer, the
asterisk (*) indicates only that
it is a pointer; it is not the dereference operator (*).
Remember, they are two different things, although they are written with the
same sign. Thus, we must take care not to confuse the previous with:
int number;
int *tommy;
*tommy = &number;
which would not make
sense in this case: it attempts to store an address in the int pointed to by
tommy (which does not yet point anywhere), and a compiler rejects it.
As in the case of arrays, the compiler allows the
special case that we want to initialize the content at which the pointer points
with a string constant at the same moment as declaring the pointer variable:
const char * terry = "hello";
In this case static storage is
reserved for containing "hello" and
a pointer to the first char
of this memory block (that corresponds to 'h') is assigned to
terry. Standard C++ treats this storage as read-only, which is why the pointer
is declared as pointing to const char. If we imagine that
"hello" is stored at addresses 1702 and following,
it is important to indicate that
terry contains the value 1702
and not 'h' nor
"hello", although 1702
is the address where these characters begin.
The pointer terry
points to a string of characters and can be read exactly as if it were an
array. For example, the 'o' can be read in either of the following two ways:
terry[4]
*(terry+4)
Remember that to write
terry[4] is just the same as to write
*(terry+4), although the most usual expression is
the first one. Both yield 'o'.
If our temper changed and we wanted to replace the 'o'
by a '!' sign, we could not do it through terry, because the characters
of a string constant must not be modified. We would instead declare an array,
which receives its own writable copy of the characters (copy is a name chosen
here only for illustration):
char copy[] = "hello";
copy[4] = '!';
*(copy+4) = '!';
With either of those two assignments copy would contain "hell!".

Arithmetic of pointers
Conducting arithmetical operations
on pointers is a little different from conducting them on integer data
types. To begin with, only addition and subtraction are allowed;
the other operations make no sense in the world of pointers. But both
addition and subtraction behave differently with pointers according to
the size of the data type to which they point.
When we saw the different data types that exist, we
saw that some occupy more or less space than others in memory. For example,
among the integer types, char occupies 1 byte, while on many systems short
occupies 2 bytes and long occupies 4 (the exact sizes depend on the platform).
Let's suppose that we have 3 pointers:
char *mychar;
short *myshort;
long *mylong;
and that we know that they point to
memory locations 1000,
2000 and 3000
respectively.
So if we write:
mychar++;
myshort++;
mylong++;
mychar,
as you may expect, would contain the value 1001.
Nevertheless, myshort would
contain the value 2002, and
mylong would contain 3004 (with the sizes assumed above).
The reason is that when adding 1 to
a pointer we are making it point to the following element of the same type
with which it has been defined, and therefore the size in bytes of the type
pointed to is added to the pointer.
This is applicable both when adding
and subtracting any number to a pointer. It would happen exactly the same if we
wrote:
mychar = mychar + 1;
myshort = myshort + 1;
mylong = mylong + 1;
It is important to warn you that
the postfix increase (++) and decrease (--)
operators have a greater priority than the dereference operator asterisk (*),
therefore the following expressions may lead to confusion:
*p++;
*p++ = *q++;
The first one is equivalent to
*(p++): it increases
p (the address it points to, not the value
stored there) and yields the value that p pointed to before the increase,
which is simply discarded here.
In the second, because both increase operators (++)
are placed after the expressions to be evaluated and not before, first the value of
*q is assigned to *p
and then both q and
p are increased by one. It is equivalent to:
*p = *q;
p++;
q++;
As always, I recommend that you use
parentheses () in order to
avoid unexpected results.
Pointers to pointers
C++ allows the use of pointers that
point to pointers, which in turn point to data. In order to do that
we only need to add an asterisk (*)
for each level of indirection:
char a;
char * b;
char ** c;
a = 'z';
b = &a;
c = &b;
Supposing the randomly chosen
memory locations of 7230,
8092 and 10502 for a, b and c respectively,
the result could be described thus: a is stored at 7230 and contains 'z';
b is stored at 8092 and contains 7230; c is stored at 10502 and contains 8092.
The new thing in this example is variable
c, which we can talk about in three different ways,
each one of which corresponds to a different value:
c is of type (char **) with a value of 8092
*c is of type (char *) with a value of 7230
**c is of type (char) with a value of 'z'
void pointers
The void pointer is a
special type of pointer. void pointers can point to any data type, from
an integer value or a float to a string of characters. Their sole limitation is
that the pointed data cannot be accessed directly (we cannot use the dereference
asterisk * operator on them),
since its type and length are undetermined, and for that reason we will always
have to resort to type casting or assignment to turn our void pointer
into a pointer of a concrete data type which we can dereference.
One of its uses may be for passing generic
parameters to a function:
// integer increaser
#include <iostream>
using std::cout;

// size tells the function how many bytes the pointed value occupies
void increase (void* data, int size)
{
  switch (size)
  {
    case sizeof(char) : (*((char*)data))++; break;
    case sizeof(short): (*((short*)data))++; break;
    case sizeof(long) : (*((long*)data))++; break;
  }
}

int main ()
{
  char a = 5;
  short b = 9;
  long c = 12;
  increase (&a,sizeof(a));
  increase (&b,sizeof(b));
  increase (&c,sizeof(c));
  cout << (int) a << ", " << b << ", " << c;
  return 0;
}

Output:
6, 10, 13
sizeof
is an operator built into the C++ language that returns a constant value
with the size in bytes of its operand, so, for example,
sizeof(char) is 1,
because the char type is 1
byte long.
Pointers to functions
C++ allows operations with pointers
to functions. The greatest use of this is for passing a function as a parameter
to another function, since a function itself cannot be passed by value. In order to
declare a pointer to a function we must declare it like the prototype of the
function, except that the name is enclosed between parentheses
() and a pointer asterisk (*)
is inserted before the name. It might not be a very handsome syntax, but that
is how it is done in C++:
// pointer to functions
#include <iostream>
using std::cout;

int addition (int a, int b)
{ return (a+b); }

int subtraction (int a, int b)
{ return (a-b); }

// minus points to the function subtraction
int (*minus)(int,int) = subtraction;

int operation (int x, int y, int (*functocall)(int,int))
{
  int g;
  g = (*functocall)(x,y);   // call the function pointed by functocall
  return (g);
}

int main ()
{
  int m,n;
  m = operation (7, 5, addition);   // m = 7 + 5
  n = operation (20, m, minus);     // n = 20 - m
  cout << n;
  return 0;
}

Output:
8
In the example,
minus is a global pointer to a function that has
two parameters of type int;
it is immediately initialized to point to the function subtraction,
all in a single line:
int (*minus)(int,int) = subtraction;