Programming in C for pets: Arrays and pointers

2026-04-04

The remaining blind spot in our example is char* argv[]:

#include <stdio.h>

int main(int argc, char* argv[])
{
    for (int i = 0; i < argc; i++) {
        puts(argv[i]);
    }
    return 0;
}

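Like int, char is a type name; it stands for characters. However, the asterisk that follows it tells us that argv does not hold characters: the asterisk means "pointer to". The brackets after argv stand for an array, so argv is declared as an array of pointers to characters. And since an array cannot be passed to a function as a whole, what main actually receives is a pointer to the first element of that array.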

It seems pet can't do without a drawing.

The memory of a computer consists of cells of one byte each. They are numbered from zero up to the last byte. With four gibibytes of memory, for example, the last cell is number 4294967295:

Cell number  0 1 2 3 4 5 6 7 8      4294967295
            ┌─┬─┬─┬─┬─┬─┬─┬─┬─┬···─┬─┐
Content     │9│4│0│2│6│8│0│3│1│    │5│
            └─┴─┴─┴─┴─┴─┴─┴─┴─┴···─┴─┘

The number of a cell is its address, i.e. just a number which can, for example, be printed:

#include <inttypes.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("%" PRIuPTR "\n", (uintptr_t) argv);
    return 0;
}

We won't dig into details of printf and inttypes.h, just compile and run:

cc --std=gnu23 -Wall -Wextra -pedantic -Werror -Wno-unused-parameter -O3 -o myprogram myprogram.c
./myprogram
140721166966376

The address can be different with each run. What is important is that it points somewhere. That's why humans call it a pointer.

An attentive reader might ask: how much memory does your computer have? 140721166966376 bytes is about 140721166966 kilobytes, or 140721166 megabytes, or 140721 gigabytes, or 140 terabytes... wow, how much is that in the end!!!???

Just 16 gibibytes.

140721166966376 is a virtual address. The address space on 64-bit CPUs can be up to 64 bits wide, so the maximum address is 2 to the power of 64 minus one. As in the parable of the grains on the chessboard, that is 18446744073709551615. The operating system translates virtual addresses to physical ones using page tables, and the CPU caches the results in the TLB (translation lookaside buffer).

In reality most CPUs use fewer bits for the address to reduce the size of page tables: x86-64, for example, uses 48 or 57. Anyway, that's a lot.

Addresses are usually represented in hexadecimal, and the printf function from the standard library has a special %p conversion specifier for that:

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("%p\n", (void*) argv);
    return 0;
}

Let's compile and run:

0x7ffc98a2f9a8

0x is the prefix used in C for hexadecimal numbers. 0x7ffc98a2f9a8 in decimal is 140722869303720. That is slightly different from the previous run, but pet has said that it can change.

So, argv is a pointer. More precisely, it points to the first element of an array of pointers. The bit width of pointers usually matches the bitness of the CPU: on 64-bit CPUs pointers are 64 bits wide, on 32-bit CPUs they are 32 bits wide. But that's not always true. For example, early x86 CPUs were 16-bit, but their address space was 20 bits wide.

To make our address-printing example work on any CPU we used uintptr_t. It's an unsigned integer type wide enough to hold any address. Without it we could carelessly assign a 64-bit address to a 32-bit integer and lose half of the bits. We'll study this aspect later, but now let's modify our program to print all pointers:

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("argv = %p\n", (void*) argv);
    for (int i = 0; i < argc; i++) {
        printf("argv[%d] = %p\n", i, (void*) argv[i]);
    }
    return 0;
}

If we run it with arguments we'll get:

./myprogram one two three
argv = 0x7ffce432a3c8
argv[0] = 0x7ffce432c56b
argv[1] = 0x7ffce432c577
argv[2] = 0x7ffce432c57b
argv[3] = 0x7ffce432c57f

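Let pet draw a picture. This time pet draws the memory not in a single row but in 8 columns, because a 64-bit pointer takes 8 bytes (one byte has 8 bits, as we know). For readability pet writes the most significant byte of each pointer first; a little-endian CPU such as x86-64 actually stores the bytes in the opposite order: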

Address: 0
        ┌──┬──┬──┬──┬──┬──┬──┬──┐
        │  │  │  │  │  │  │  │  │
        ·  ·  ·  ·  ·  ·  ·  ·  ·
        ·  ·  ·  ·  ·  ·  ·  ·  ·
        ·  ·  ·  ·  ·  ·  ·  ·  ·
Address: 0x7ffce432a3c8
        ├──┼──┼──┼──┼──┼──┼──┼──┤
argv  → │00│00│7f│fc│e4│32│c5│6b│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        │00│00│7f│fc│e4│32│c5│77│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        │00│00│7f│fc│e4│32│c5│7b│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        │00│00│7f│fc│e4│32│c5│7f│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        ·  ·  ·  ·  ·  ·  ·  ·  ·
        ·  ·  ·  ·  ·  ·  ·  ·  ·
        ·  ·  ·  ·  ·  ·  ·  ·  ·
Address: 0x7ffce432c56b
        ·  ·  ·  ┼──┼──┼──┼──┼──┤
                 │ .│ /│ m│ y│ p│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        │ r│ o│ g│ r│ a│ m│
        ├──┼──┼──┼──┼──┼──┼  ·  ·

Address: 0x7ffce432c577
        ·  ·  ·  ·  ·  ·  ·  ┼──┤
                             │ o│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        │ n│ e│
        ├──┼──┼  ·  ·  ·  ·  ·  ·

Address: 0x7ffce432c57b
        ·  ·  ·  ┼──┼──┼──┼  ·  ·
                 │ t│ w│ o│
        ·  ·  ·  ┼──┼──┼──┼  ·  ·

Address: 0x7ffce432c57f
        ·  ·  ·  ·  ·  ·  ·  ┼──┤
                             │ t│
        ├──┼──┼──┼──┼──┼──┼──┼──┤
        │ h│ r│ e│ e│
        ├──┼──┼──┼──┼  ·  ·  ·  ·
        ·  ·  ·  ·  ·  ·  ·  ·  ·
        ·  ·  ·  ·  ·  ·  ·  ·  ·

So, at the address 0x7ffce432a3c8 (i.e. argv) we have an array whose elements all have the same pointer type and take 8 bytes each. There are 4 pointers there, and this number is stored in argc. They are pointers to strings: we get them by accessing array elements by index, and we pass them to the puts function:

puts(argv[i]);

which prints the string to the screen.

Relationship between arrays and pointers

Actually, argv points to the first element of the array. If we add eight to that address, we get the address of the second element. If we add eight once again, we get the address of the third. And so on.

We can modify our example like this:

#include <stdio.h>

int main(int argc, char* argv[])
{
    for (int i = 0; i < argc; i++) {
        puts(argv[0]);
        argv++;
    }
    return 0;
}

So, on each iteration we print the first element, but since we increment argv each time, argv[0] is actually the i-th element of the original array.

"Wait", you say, "++ increments by one, but pointers take 8 bytes."

There's no mistake here. In C, pointer arithmetic takes the element size into account and automagically multiplies the integer value by the element size. Otherwise it would be a pain to write a program that could work without modifications on any CPU with any bit width of addresses.

However, if we treat argv as a pointer, not as an array, it would be more natural to declare it as a pointer to a pointer:

#include <stdio.h>

int main(int argc, char** argv)
{
    for (int i = 0; i < argc; i++) {
        puts(*argv);
        argv++;
    }
    return 0;
}

In a parameter list the declarations char* argv[] and char** argv are equivalent. We could leave the puts call as is, puts(argv[0]);, but with a pointer it's more natural to use the dereferencing operation. Dereferencing means getting the value stored at the given address: puts(*argv);

The last touch is merging dereferencing with increment:

#include <stdio.h>

int main(int argc, char** argv)
{
    for (int i = 0; i < argc; i++) {
        puts(*argv++);
    }
    return 0;
}

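In the expression *argv++ the postfix ++ binds tighter than the asterisk, but it yields the old value of the pointer. So the value is fetched from where argv pointed before, and argv itself moves on to the next element. This is one of the most widely used idioms in C and can be found in a great many programs.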

And since we're talking about increment, it can also be a prefix, i.e. *++argv. Here the pointer is incremented first, and then it is used to get the value. So, if we want to skip the program file name, we can do it this way:

#include <stdio.h>

int main(int argc, char** argv)
{
    for (int i = 1; i < argc; i++) {
        puts(*++argv);
    }
    return 0;
}

for loop revisited

When pointers are used in a for loop, the counter becomes redundant and the construct is suboptimal, at least as written. The counter i takes one CPU register. The pointer also takes one register. The value the counter is compared with takes either a register or a memory cell.

A leaner way to iterate with pointers is to compare the pointer against an end value, without the counter i:

#include <stdio.h>

int main(int argc, char** argv)
{
    for (char** end = argv + argc; argv < end;) {
        puts(*argv++);
    }
    return 0;
}

Careful with arguments

It's not a good practice to modify function arguments. Arguments can be used many times, and it's better not to touch them.

As an exception, very short functions can do that.

Like our examples.

But in most cases it's better to use a separate pointer, for example arg instead of argv:

#include <stdio.h>

int main(int argc, char** argv)
{
    for (char **arg = argv, **end = argv + argc; arg < end;) {
        puts(*arg++);
    }
    return 0;
}

Stylistics

In the declaration char **arg = argv, **end = argv + argc the asterisks belong to the variables, not to char, so each variable needs its own.

Such a declaration would be incorrect: char** arg = argv, end = argv + argc, because end would have char type, not char**.

It's better to avoid such complicated declarations. Do you see where it leads if we give ourselves the freedom to declare everything in one place? It leads to complicated, unreadable, and error-prone code.

KISS means keep it simple, stupid. This is one of the fundamental design principles, and it's better to stick to it.

Let's rewrite our loop:

#include <stdio.h>

int main(int argc, char** argv)
{
    char** arg = argv;  // use a separate variable instead of argv
    char** end = argv + argc;  // final value for arg
    while (arg < end) {
        puts(*arg++);
    }
    return 0;
}

The principle is simple: each variable is declared on a separate line. First, this leaves plenty of space for comments. Single-line comments start with //. When the compiler encounters this sequence, it ignores the rest of the line.

Second, and most important, the ambiguity with asterisks is eliminated. We attach them to the type, not to the variable. This is in line with type aliases declared with typedef:

typedef char** char_ptr_ptr;
char_ptr_ptr arg = argv, end = argv + argc;

Do you see? Both variables in this example are of type char**.

The fact that asterisks bind to variables and not to the type is more a quirk of the language than a feature. Pets must avoid this ambiguity.


Meow.

Pet has limited explanatory capabilities and might have missed something. Little kitties may ask questions on Mastodon, and pet will improve this article.