C Pointers Explained, Really

While I was in college, a friend of mine complained that he was confused while programming in C, struggling to learn the syntax for pointers.

He gave the example of something like: *x=**p++ being ugly and unreadable, with too many operations layered on each other, making it hard to tell what was happening.  He said he had done a bit of programming with assembly language, but he wasn’t accustomed to the nuances of C.

I wrote the following explanation on our student message board, and I got a lot of good feedback.  Some people said that they had been programming in C for years, but not until they read my post did they finally understand pointers.  So here it is, unearthed from my backups and slightly edited.  I hope it helps someone again…


Message 1956 (8 left): Thu Jan 25 1990 2:44am
From: Bill! (easterb@ucscb)
Subject: Okay

Well, if you know assembly, you have a head start on many of the CS freshpersons here. You at least know about memory maps: RAM is a long long array of bytes. It helped me to learn about pointers if I kept this in mind. For some reason, books and instructors talking about pointers want to overlook this.

When I have some code:

main()
{
  int n;
  int *p;

There is a place in my memory that looks like this:

            :
Address:    :
         |-----|
0x5100   |     | n is an integer, one machine word big
         |-----|
0x5104   |     | p is a pointer, also one word big
         |-----|
0x5108   |     | other unused memory
         |-----|
            :
            :

Let’s give these variables some values. I set n to be the number 151.

n = 151;

I set the pointer p to point to the integer n.

p = &n;

That says, “the value of the variable p is assigned the address of the variable n”.

           :
Address:   :    Value at that address:
         |----|
0x5100   | 151| n
         |----|
0x5104   |5100| p
         |----|
0x5108   | ?  |
         |----|
           :
           :

Now I want to print out the value of n, by two ways.

printf("n is %d.\n", n);
printf("n is %d.\n", *p);

The * operator says, “give me the object at the following address.” The object’s type is the type that the pointer was declared as. So, since we declared “int *p”, the object pointed at will be _assumed_ by C to be an int. In this case, we were careful to make this coincide with what we were pointing at.

Now I want to print out the memory address of n.

printf("n is located at $%x.\n", &n);
printf("n is located at $%x.\n", p);

The & operator says, “tell me the address where the following object starts.” In this case, it is hex 5100 (I put a ‘$’ before it, to conform to the Assembly notation I am used to). Notice the _value_ of p is an address.

Hm. Does p have an address? Sure. It is a variable, and all variables have their own address. The address of p is hex 5104.

printf("p is located at $%x.\n", &p);

Here we are taking the address of a pointer variable, using the & operator.

main()
{
char name[] = "Bill";
char *p;
int *q;

Now we have an array to play with. Here’s how memory looks now:

       |---|
0x5100 |'B'| "name" is an address constant that has value hex 5100
       |---|
0x5101 |'i'| char: 1 byte
       |---|
0x5102 |'l'| char: 1 byte
       |---|
0x5103 |'l'| char: 1 byte
       |---|
0x5104 | 0 | char: 1 byte (the null byte that ends the string)
       |---|
0x5105 |   | p is a pointer: 1 word
       |---|
0x5109 |   | q is a pointer: 1 word
       |---|

p = name;

We set p to the value of name. Now p has value hex 5100 too. We can use the * dereferencing operator on p, and get the character ‘B’ as a result.

Now what happens if I do this:

++p;

The pointer p is incremented. What value does it have now? Hex 5101. Pretty simple.

Now let’s try something irresponsible:

q = name;

But q is a pointer to int! If we dereference q, it will take the word (typically 4 bytes) beginning at address “name” (which is hex 5100) and try to interpret it as an int. ‘B’, ‘i’, ‘l’, ‘l’ read as an int will be some large number, dependent on the byte order of your machine’s architecture. On ucscb, it becomes 1114205292. (To see how, line up the binary representations of the ASCII values for those 4 characters, run the 32 bits together, and read the resulting binary number as an integer.)

What we have just seen here is a key issue of pointers that I mentioned earlier: C assumes that what they are pointing at is an object of the type that the pointer was designed to point at. It is up to the programmer to make sure this happens correctly.

++q;

The int pointer is incremented. What value does it have now? Hex 5104. Huh?!? The answer is simple if you accept the above paragraph. It gets incremented by the size of the object it _thinks_ it is pointing at. It’s an int pointer, so incrementing it makes it advance a number of bytes equal to the size of an int.

Now print the dereferenced value of q (i.e. the value of the object q is pointing to). Well, it’s pointing at a null byte, and then the first 3 bytes of the char *p. Now we’re all messed up. Nice going. Try to convert _that_ to an integer representation. Well actually, C will do it happily. But it’ll be another weird number.

main()
{
  int n;

  n = 151;
  f(n);
}

f(x)
int x;
{
  printf("%d.\n", x);
}

Here is a simple program that passes an int “by value”. That is, it copies the value of n into the new variable x!

       |---|
0x5100 |151| n is an integer
       |---|
0x5104 |151| x is another integer
       |---|

When we mention x, we are using the value at location 5104, and we can change it, read it, whatever, and it won’t affect n, the int at location 5100.

But what if we want to have f() modify the value and then have that new value be available in main()? C does this by passing the variable “by reference”.

main()
{
  int n;

  n = 151;
  f(&n);
}

f(x)
int *x;
{
  printf("%d.\n", *x);
  *x = 451;
}

Pass the _address_ of n, and declare x as a _pointer_ to int. Actually, this is still passing by value, but the value being passed is the address, not the number.

       |----|
0x5100 | 151| n is an integer
       |----|
0x5104 |5100| x is a pointer to int
       |----|

Now in f(), when we make use of *x, we are referring to the value at location 5100. This is the location of n. After the assignment “*x = 451;”, this is what we have:

       |----|
0x5100 | 451| n is an integer
       |----|
0x5104 |5100| x is a pointer to int
       |----|

x still points to location 5100, but we have changed the value of the object at that location.

Well, those are the basics. You mentioned things like “*x=**p++” being ugly and unreadable. Well, yeah, but here is a diagram that may help:

       |----| here is a word in memory with initial value 0. 
0x5100 | 0  | no variable name
       |----|
0x5104 | 12 | here is a value, a word in memory. no variable name.
       |----|
0x5108 |5104| Here is an int pointer, pointing at the previous word.
       |----|
0x511c |5108| here is p, a pointer to int pointer.
       |----|
0x5120 |5100| here is x, a pointer. guess where it's pointing.
       |----|

First let’s see what p and x were declared as:

int *x; /* pointer to int */
int **p; /* pointer to pointer.
            The subordinate pointer is a pointer to int. */

You should know now what “*x” means. It means, “the value of location 5100.”

And you know what “*p” means, “the value of location 5108”. Now that value is another address! Okay, let’s dereference that address: “**p” and we find (by the declaration) an int.

Now “*x = **p” looks like, “this int at 5100 gets the value of that int at 5104.”

And what does “**p++” mean? Well, ++ binds tighter than *, so this is equivalent to: *( *( p++ ) )

Or, “pointer to pointer to int, and by the way, after we’re done, p has been incremented. But we looked where it was pointing before it got incremented, so we don’t care. Let the next statement worry about it.”


This content is copyright 2012 by Bill Karwin.  I’ll share it under the terms of the Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0 Unported.

http://creativecommons.org/licenses/by-nc-sa/3.0/

Comments

28 responses to “C Pointers Explained, Really”

  1. Bushinji

    Great post!
    I finally got it, thanks a bunch! 🙂

  2. Marin Todinov

    Dear Programming Guru,

    You are an absolute legend. I've been programming for 4 years and I have a master's in computer science. Your explanation of pointers has helped me increase my efficiency in recursive functions and made a map in my brain of how these basic fundamental structures work.

    Not only is your explanation clear, but it is excellent. Thank you so much for sharing this fundamental knowledge. I will repay the favour some day and teach someone else like you have taught me; it's the least I can do.

    Thank you again,

  3. Marin Todinov

    This comment has been removed by the author.

  4. quantumdude

    You should use "%p" for printing pointers, not "%x", because this can cause problems where the size of an unsigned integer is less than the size of a pointer (e.g. 64-bit Intel PCs). Also, for complete portability, the pointers will need to be cast to void pointers (but this usually isn't a problem and can be ignored).

  5. Gogus

    Hi,
    Great explanation. I would add only a few words about pure arrays (passing arrays as function arguments).

    Thanks,
    Bogdan

  6. i.love.it

    I think you are wrong when you said:
    "the first 3 bytes of the char *p"

    p is a character type and only has 1 byte?

    would q actually overlap its own bytes coming after p?

  7. Bill Karwin

    Hi i.love.it,

    p is not a character, p is a pointer.

    A pointer must be large enough to hold an address of memory. At the time I wrote this article, pointers were 4 bytes. Look at the addresses, p is stored in the 4 bytes starting at 0x5105, and q is stored in the 4 bytes starting at 0x5109.

    So incrementing q did cause it to point to a space of 4 bytes starting at 0x5104, which includes the null at the end of the string, and part of the pointer p.

  8. danmux

    Great explanation. I would only add something about the variable names themselves don't actually exist when the program runs, that they are just handy readable tokens for the human, and the compiler maintains a symbol table to convert them to addresses. This has been a common source of confusion in my experience.

  9. Chase S

    Really, no one teaches it like this?

    I figured this out many years ago, and have tried explaining it to many of my fellow students, perhaps I will just link them here from now on.

    Good work!

  10. Bill Karwin

    @danmux, thanks, that's a good point. I originally wrote that post for someone who said he was already familiar with assembly programming, so I figured that point about identifiers not being real in the compiled code was understood.

    @Chase S, thanks! You may even copy and distribute this post if you wish, under the license terms I mention at the end.

  11. Gathogo

    Very very well explained!!
    Thanks

  12. Anthony Cesaro

    First of all, thanks for the clear and concise explanation! I've been finding a lot of different attempts to better explain the concept and use of pointers and they all provide a good alternative to the way they are described in an academic context.

    In your final diagram that explains how *x=**p++ works, shouldn't the last two addresses be 0x510c and 0x5110 instead of 0x511c and 0x5120? I guess it doesn't matter if those are just arbitrary addresses, but to be consistent with the addresses above it I would think you would just add 4 bytes to them since they are pointers (4 bytes at the time of you writing this article as mentioned).

    I'm still learning this stuff on the side, as I am not a full-time programmer by trade (Unix systems engineer), but I'm trying to beef up my understanding of such lower level concepts to improve my troubleshooting and debugging abilities. Let me know if what I suggested as a correction isn't correct. 🙂

  13. Bill Karwin

    @Anthony Cesaro:

    Yes, you're right, that's a mistake on my part. If the addresses were contiguous, the next one 4 bytes after 0x5108 would be 0x510c. Good catch.

    But in practice, variables might not be allocated contiguously. They might be, but it's a detail handled by the operating system, and it might be implemented differently on another system. So you shouldn't assume variables are contiguous.

  14. Anonymous

    OMG! Finally I found well-explained one. Thank you so much!

  15. akmal niazi khan

    This blog is awesome and I learn a lot about programming from here. The best thing about this blog is that you go from beginner to expert level.

    Love from

  16. Priyansh Ramnani

    Thanks a lot sir! 🙂
    I had a great difficulty understanding pointers
    This made my concept crystal clear!!

  17. Bill Karwin

    Thanks for the link Prateek, that looks like a good article. It has nice illustrations too. My article suffers from the fact that I originally wrote it a long time ago in a text-only online community.

  18. Prateek Pandey

    Yes, Bill it is.

    Consider mentioning it in your article

  19. Michael Fulton

    Thanks for this excellent read. I'm going to let the Qt API introduce me to cpp (which I know isn't "real" cpp). C and cpp remind me of Node JavaScript in the sense that it will let you shoot yourself in the foot if you don't make good choices. Pointers are something I haven't dealt with before and this made it crystal clear. Fwiw I appreciated original formatting. 🙂

  20. Sujitkumar

    Nice tutorial. Thanks for sharing the valuable info about C training; it's really helpful. For anyone who wants to learn the C language, this blog is most helpful. Keep sharing updated tutorials…

  21. Mukul

    Thanks a lot .

  22. Jun Zhou

    Hi, thanks so much for your excellent tutorial. I am new to C programming and trying to understand pointers with the following simple test, but I got stuck

    int x=5,y=6, *p;
    p = &y;

    printf("1. address of y is %d\n",p);
    printf("2. address of y is %d\n",&y);
    printf("3. address of x is %d\n",&x);

    with codeblock compiler (16.01) running on a 64bit pc I got the returns as

    1. address of y is 2686740
    2. address of y is 2686740
    3. address of x is 2684744

    curiously I would like to see what "p+1" would be

    printf("4. p+1 = %d\n",p+1);

    and the result is

    4. p+1 = 2686744

    as this is the address for "x" I then go bit further to see what is the value of *p now and
    is it equals to x

    printf("5. *p = %d\n", *p);
    printf("6. x = %d\n", x);

    and the returns are

    5. *p = 6
    6. x = 5

    this made me realize that the address p+1 has nothing to do with *p, so I tried

    int *p1;
    *p1 = *(int *)(p+1);

    printf("the value of *p1 = %d\n",*p1);

    the compiling is OK but the run gives me this

    Process returned -1073741819 (0xC0000005)

    I am lost, so please help.

    thanks

  23. Michael Fulton

    Hi Jun,

    Maybe I can explain what p + 1 actually does in this case. When you add 1 to p, you add the size (in bytes) of 1 element that p is pointing to. For example, since p is pointing to an Integer, and integer is 4 bytes, the memory address of p will be advanced by 4 bytes when you add 1. If it was "long" data type, like `long *p;` then adding one would advance it 8 bytes (or whatever it is on your machine).

    Hope this helps.

  24. Jun Zhou

    Hi Michael,

    Thank you very much.

    I have no problem in understanding "p+1" has the value of "2686744" which seems to me is the
    address of "x". what make me lost is the last bit of the test with *p1, what I am expecting
    is that *p1 is a pointer pointing to x.

    Cheers,

  25. Michael Fulton

    (p + 1) -> pointer equivalent to x…
    Casting it to pointer (why?) With (int *)…
    Then dereferencing with *, so now you have the value of x (which is 5)… Then assigning that 5 to the *p1 which should be a memory address.

    If you want p1 to be the memory address of x, why not "int *p1 = p + 1"? Or if you want the value of x, int X2 = *(p + 1) so X2 == 5.

  26. Bill Karwin

    Jun Zhou, In your last test, you seem to be trying to set the value of the memory that p1 points to.

    In your first lines of code, you initialize the pointer p to the address of y (p = &y). This is okay because y has a real location where an integer value is stored.

    But in your last test, p1 is declared a pointer, but you give it no address of an int variable. Therefore p1 is uninitialized, and it is not guaranteed that it points to a valid location where any int can be stored. It's like it's pointing to a location off the edge of the world.

    Then when you set the value at *p1, it's undefined what happens. Is the value set? Can it be set? There's no telling. And your test seems to show that it's invalid.

    Pointers don't implicitly get a space to store the value they point to. That has to be allocated by a concrete variable.

    I would also comment that it's risky using pointer arithmetic to increment a pointer to point to a separate discrete variable. I don't think it's guaranteed that compilers will always store variables contiguously or in the same direction of increasing memory addresses. I wouldn't rely on that, except if I'm advancing a pointer through elements of an array.
