Arrays
Introduction
An array is a sequence of values of the same type, stored one after another in memory. An array is declared by writing the variable name followed by square brackets containing the number of members of the array.
int scores[100];
The above example declares an array scores containing 100 members, each of which is of type int.
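Note that when declaring an array, the size of the array must normally be given. The one exception, omitting the size when the array is initialized with curly brackets, is covered later.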
The members of an array are numbered from 0, so the array scores[100] is numbered from member 0 all the way to member 99; the last member's number is 1 less than the length of the array.
A member is referenced by writing its number in square brackets after the array name. The same notation is used to assign a value to that position.
scores[0] = 13;
scores[99] = 42;
The above example assigns values to the first and last positions of the array scores.
Note that C performs no bounds checking: referencing a non-existent array member (i.e. accessing the array out of bounds) is generally not reported as an error, so you must be very careful.
int scores[100];
scores[100] = 51;
In the example above, the array scores has only 100 members, so the location scores[100] does not exist. Writing to it is undefined behavior, yet it is usually not reported and the program appears to run normally. Typically the value is stored in the memory immediately after scores, which may belong to other variables, so their values are silently changed. Errors like this are easy to make and difficult to detect.
An array can also be initialized when it is declared, by listing a value for each member inside curly brackets.
int a[5] = {22, 37, 3490, 18, 95};
Note that a list in curly brackets can only be used when the array is declared; using it in a later assignment is reported as an error at compile time.
int a[5];
a = {22, 37, 3490, 18, 95}; // error reported
In the above code, the array a is declared first and then assigned with curly brackets, which results in an error.
The reason for the error is that a list in curly brackets is only allowed as an initializer, and an array as a whole cannot be assigned to: once an array variable is declared, the address it refers to cannot be modified, as explained later. For the same reason, an array that was initialized with curly brackets cannot later be reassigned with a new list.
int a[5] = {1, 2, 3, 4, 5};
a = {22, 37, 3490, 18, 95}; // error reported
In the above code, the array a cannot be reassigned after it has been initialized with curly brackets either.
When using curly brackets, the number of values inside them must not exceed the length of the array; otherwise the compiler issues a diagnostic (an error, or with some compilers such as GCC, a warning about excess elements).
If there are fewer values inside the curly brackets than the array has members, the remaining members are automatically initialized to 0.
int a[5] = {22, 37, 3490};
// Equivalent to
int a[5] = {22, 37, 3490, 0, 0};
If you want to set every member of the entire array to zero, the easiest way to write this is as follows.
int a[100] = {0};
When an array is initialized, you can also specify which positions receive values. This syntax, called a designated initializer, was introduced in C99.
int a[15] = {[2] = 29, [9] = 7, [14] = 48};
In the above example, positions 2, 9 and 14 of the array are assigned values and the values of the other positions are automatically set to 0.
The designated positions can be listed in any order; the following is equivalent to the example above.
int a[15] = {[9] = 7, [14] = 48, [2] = 29};
Assignment at specified positions and sequential assignment can be used in combination.
int a[15] = {1, [5] = 10, 11, [10] = 20, 21};
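In the example above, positions 0, 5, 6, 10 and 11 are assigned values: a value without a designated position goes to the position right after the previous one. All other positions are set to 0.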
C allows the number of array members inside square brackets to be omitted, in which case the length of the array will be determined automatically based on the number of values inside the curly brackets.
int a[] = {22, 37, 3490};
// Equivalent to
int a[3] = {22, 37, 3490};
In the above example, the length of the array a is determined as 3, based on the number of values inside the curly brackets.
If the number of members is omitted and designated positions are used, the length of the array is the largest position that receives a value, plus 1.
int a[] = {[2] = 6, [9] = 12};
In the above example, the largest specified position of the array a is 9, so the length of the array is 10.
Array length
The sizeof operator returns the length in bytes of the entire array.
int a[] = {22, 37, 3490};
size_t byteLen = sizeof(a); // 12 when int is 4 bytes
In the example above, sizeof returns the byte length of the array a, which is 12 on the common platforms where an int occupies 4 bytes (3 x 4 = 12).
Since all members of an array have the same type, each member has the same byte length. Dividing the byte length of the whole array by the byte length of one member therefore gives the number of members.
sizeof(a) / sizeof(a[0])
In the above example, sizeof(a) is the byte length of the whole array and sizeof(a[0]) is the byte length of one member; dividing the first by the second gives the number of members of the array.
Note that sizeof returns a value of type size_t, so sizeof(a) / sizeof(a[0]) is also of type size_t. The matching printf() placeholder is %zu; %zd is the corresponding signed form.
int x[12];
printf("%zu\n", sizeof(x)); // 48
printf("%zu\n", sizeof(int)); // 4
printf("%zu\n", sizeof(x) / sizeof(int)); // 12
In the above example, sizeof(x) / sizeof(int) gives 12, the number of array members. The values 48 and 4 assume a 4-byte int, but the quotient is 12 regardless of the size of int.
Multi-dimensional arrays
C allows arrays of multiple dimensions to be declared, with as many dimensions as there are square brackets, e.g. two square brackets for a two-dimensional array.
int board[10][10];
The above example declares a two-dimensional array with 10 members in the first dimension and also 10 members in the second dimension.
A multidimensional array can be understood as if each member of the upper dimension is itself an array. For example, in the above example, each member of the first dimension is itself an array with 10 members, so the whole two-dimensional array has 100 members (10 x 10 = 100).
A three-dimensional array would be declared using three square brackets, and so on.
int c[4][5][6];
To refer to each member of a two-dimensional array, two square brackets are used, specifying both dimensions.
board[0][0] = 13;
board[9][9] = 13;
Note that board[0][0] cannot be written as board[0, 0], because 0, 0 is a comma expression whose value is its second operand, so board[0, 0] is equivalent to board[0].
As with one-dimensional arrays, the first member of each dimension of a multidimensional array is numbered 0.
Multidimensional arrays can also use curly brackets to assign values to all members at once.
int a[2][5] = {
{0, 1, 2, 3, 4},
{5, 6, 7, 8, 9}
};
In the above example, a is a two-dimensional array, and each member of the first dimension is written as an array of its own. With this notation you do not have to supply a value for every member; any members left out are automatically set to 0.
Multidimensional arrays can also be assigned initial values by specifying the location.
int a[2][2] = {[0][0] = 1, [1][1] = 2};
In the above example, the values at positions [0][0] and [1][1] are specified, and the other positions are automatically set to 0.
Regardless of how many dimensions the array has, it is stored linearly in memory: a[0][0] is followed by a[0][1], a[0][1] by a[1][0], and so on. Thus a multidimensional array can also be initialized with a single level of curly brackets, and the following statement is exactly equivalent to the one above.
int a[2][2] = {1, 0, 0, 2};
Variable-length arrays
When an array is declared, its length can be given by a variable (or any non-constant expression) instead of a constant. Such an array is called a variable-length array (VLA for short). VLAs were introduced in C99 and became an optional feature in C11.
int n = x + y;
int arr[n];
In the above example, the array arr is a variable-length array because its length depends on the value of the variable n, which the compiler cannot determine in advance; the value of n is only known at runtime.
The fundamental feature of a variable-length array is that its length is only determined at runtime. The advantage is that the programmer does not have to guess a length in advance at development time; the program can give the array exactly the length it needs at runtime.
Any array whose length needs to be determined at runtime is a variable-length array.
int i = 10;
int a1[i];
int a2[i + 5];
int a3[i + k];
In the above example, the lengths of all three arrays are only known when the code runs (even i is a variable, not a constant), so the compiler cannot know them and all three are variable-length arrays.
Variable-length arrays can also be used for multi-dimensional arrays.
int m = 4;
int n = 5;
int c[m][n];
In the above example, c[m][n] is a two-dimensional variable-length array.
Addresses of arrays
An array is a sequence of values of the same type stored consecutively in memory, so once the starting address (the memory address of the first member) is known, the addresses of the other members can be deduced. See the following example.
int a[5] = {11, 22, 33, 44, 55};
int* p;
p = &a[0];
printf("%d\n", *p); // Prints "11"
In the above example, &a[0] is the memory address of the first member of the array a (whose value is 11), which is also the starting address of the whole array. Dereferencing this address (*p) in turn yields the value of the first member, 11.
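Since taking the start address of an array is a common operation and &array[0] is cumbersome to write, C provides a shorthand: the array name is equivalent to the start address, i.e. the array name is a pointer to the first member (array[0]). More precisely, when an array name is used in an expression it is converted to a pointer to its first member; the main exceptions are sizeof and the & operator, which act on the whole array.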
int a[5] = {11, 22, 33, 44, 55};
int* p = &a[0];
// equivalent to
int* p = a;
In the above example, &a[0] and the array name a are equivalent.
Consequently, passing an array name to a function is equivalent to passing a pointer variable, and inside the function the entire array can be accessed through that pointer.
A function that accepts an array as an argument can therefore declare its prototype in either of the following ways.
// Form one
int sum(int arr[], int len);
// Form two
int sum(int* arr, int len);
In the above example, passing in an array of integers is the same as passing in a pointer to an integer: in a parameter list, the array notation [] is interchangeable with the pointer notation *. The next example sums the members of an array through such a pointer.
int sum(int* arr, int len) {
int i;
int total = 0;
// Visit each of the len members
for (i = 0; i < len; i++) {
total += arr[i];
}
return total;
}
In the above example, the function receives arr, a pointer to the first member (the caller passes the array name), together with the length of the array, and uses the pointer to read each member and add it to the total.
The * and & operators can also be used on multidimensional arrays.
int a[4][2];
// fetch the value of a[0][0]
*(a[0]);
// is equivalent to
**a;
In the above example, a[0] is itself a pointer to the first member of the second dimension, a[0][0], so *(a[0]) takes out the value of a[0][0]. As for **a, it applies two * operations to a: the first takes out a[0] and the second takes out a[0][0]. Similarly, &a[0][0] of a two-dimensional array is equivalent to *a.
Note that the address pointed to by the array name cannot be changed. When declaring an array, the compiler automatically allocates a memory address for the array, which is bound to the array name and cannot be changed; the following code will report an error.
int ints[100];
ints = NULL; // error reported
In the above example, re-assigning the array name and changing the original memory address will report an error.
For the same reason, one array name cannot be assigned to another.
int a[5] = {1, 2, 3, 4, 5};
// Form one
int b[5] = a; // error reported
// Form two
int b[5];
b = a; // error reported
Both forms above would change the address of the array b, so both are reported as errors.
Addition and subtraction of array pointers
In C, integers can be added to or subtracted from an array name or a pointer to an array member, which moves the address forwards or backwards between members, i.e. from the memory address of one member to that of another. For example, a + 1 is the address of a[1], the member after the first one; and if p points to a member, p - 1 is the address of the previous member.
int a[5] = {11, 22, 33, 44, 55};
for (int i = 0; i < 5; i++) {
printf("%d\n", *(a + i));
}
In the above example, the array is traversed by moving a pointer: in each iteration of the loop, a + i is the address of the member at position i, and *(a + i) takes out the value at that address, which is equivalent to a[i]. For the first member of an array, *(a + 0) (i.e. *a) is equivalent to a[0].
Since array names behave like pointers, the following identity always holds.
a[b] == *(a + b)
The above code gives two ways of accessing array members: one using square brackets, a[b], and the other using a pointer, *(a + b).
If the pointer variable p points to a member of the array, then p++ makes it point to the next member; this technique is often used to traverse an array.
int a[] = {11, 22, 33, 44, 55, 999};
int* p = a;
while (*p != 999) {
printf("%d\n", *p);
p++;
}
In the above example, p++ makes the variable p point to the next member.
Note that the address bound to the array name cannot be changed, so in the above example you cannot increment a directly (a++ is an error); you must assign the address of a to the pointer variable p and then increment p.
Traversing an array is generally done by comparing the length of the array, but it can also be done by comparing the start address and the end address of the array.
int sum(int* start, int* end) {
int total = 0;
while (start < end) {
total += *start;
start++;
}
return total;
}
int arr[5] = {20, 10, 5, 39, 4};
printf("%i\n", sum(arr, arr + 5));
In the above example, arr is the start address of the array and arr + 5 is the end address, one past the last member. As long as the current address is less than the end address, the end of the array has not been reached.
Conversely, subtracting one address in an array from another tells you how many members lie between them. The following example uses this to work out the position of a member yourself.
int arr[5] = {20, 10, 5, 39, 88};
int* p = arr;
while (*p != 88)
p++;
printf("%td\n", p - arr); // 4
In the above example, subtracting the start address of the array from the address of a member gives the number of members between the start of the array and that member, which is also the member's position (4).
For multi-dimensional arrays, the addition and subtraction of array pointers means different things for different dimensions.
int arr[4][2];
// Pointer to arr[1]
arr + 1;
// pointer to arr[0][1]
arr[0] + 1;
In the above example, arr is a two-dimensional array, and arr + 1 moves the pointer to the next member of the first dimension, arr[1]. Since each member of the first dimension itself contains another array, arr[0] is a pointer into the second dimension, so arr[0] + 1 moves the pointer to the next member of the second dimension, arr[0][1].
When pointers to two members of the same array are subtracted, the result is the distance between them, counted in members rather than bytes.
int a[10];
int* p = &a[5];
int* q = &a[1];
printf("%td\n", p - q); // 4
printf("%td\n", q - p); // -4
In the above example, p and q point to positions 5 and 1 of the array respectively, so p - q is 4 and q - p is -4. The result has type ptrdiff_t, whose printf() placeholder is %td.
Copying of arrays
Since array names are pointers, copying arrays cannot simply be done by copying the array names.
int* a;
int b[3] = {1, 2, 3};
a = b;
The above code does not copy the array b to a; the result is only that a and b point to the same array.
The easiest way to copy an array is still to use a loop and copy the array elements one by one.
for (i = 0; i < N; i++)
a[i] = b[i];
In the above example, the array b is copied to the array a by assigning its members to a one by one.
Another way is to use the memcpy() function (declared in the header file string.h), which copies the whole block of memory occupied by the array in one call.
memcpy(a, b, sizeof(b));
The above example copies the memory occupied by the array b to the array a, which must be an array (or allocated memory) at least as large as b. This is usually faster than copying the members in a loop.
As arguments to a function
Declaring array parameters
When an array is passed to a function, both the array name and the length of the array are generally passed.
int sum_array(int a[], int n) {
// ...
}
int a[] = {3, 5, 7, 3};
int sum = sum_array(a, 4);
In the above example, the first argument to the function sum_array() is the array itself (the array name), and the second argument is the length of the array.
Since the array name is a pointer, if only the array name is passed, the function will only know the address of the start of the array, not the end, which is why the array length needs to be passed in as well.
If the function's parameter is a multidimensional array, the lengths of all dimensions except the first must be written into the function's definition; the length of the first dimension can be passed into the function as an argument.
int sum_array(int a[][4], int n) {
// ...
}
int a[2][4] = {
{1, 2, 3, 4},
{8, 9, 10, 11}
};
int sum = sum_array(a, 2);
In the above example, the parameter of the function sum_array() is a two-dimensional array. The first parameter is the array itself (a[][4]); it can be written without the length of the first dimension, which is passed in as the second argument, but must include the length of the second dimension, 4.
This is because all the function receives is the starting address of the array a and the number of members of the first dimension, 2. To compute the end address correctly, it must also know the length in bytes of each member of the first dimension. Writing int a[][4] tells the compiler that each member of the first dimension is itself an array of 4 integers, so each member is 4 * sizeof(int) bytes long.
Variable-length arrays as arguments
Variable-length arrays are written slightly differently when used as function arguments.
int sum_array(int n, int a[n]) {
// ...
}
int a[] = {3, 5, 7, 3};
int sum = sum_array(4, a);
In the above example, the array a[n] is a variable-length array whose length depends on the value of the variable n and can only be known at runtime. The parameter n must therefore come before the array in the parameter list, because a[n] refers to n and n must already be declared at that point; otherwise an error is reported.
Because function prototypes can omit parameter names, the prototype of a function with a variable-length array parameter can omit the variable name and use * in its place.
int sum_array(int, int [*]);
int sum_array(int, int []);
Both of the above prototypes are legal.
One advantage of variable-length array parameters is that a multidimensional array parameter no longer needs its later dimensions fixed in the declaration; they can be given by earlier parameters instead.
// The original form
int sum_array(int a[][4], int n);
// Variable-length array form
int sum_array(int n, int m, int a[n][m]);
In the above example, the parameter of the function sum_array() is a multidimensional array. In the original form, the length of the second dimension must be written into the declaration. In the variable-length array form it does not, because it is passed into the function as the argument m.
Array literals as arguments
C (since C99) allows array literals, known as compound literals, to be passed into functions as arguments.
// Array variables as arguments
int a[] = {2, 3, 4, 5};
int sum = sum_array(a, 4);
// array literals as arguments
int sum = sum_array((int []){2, 3, 4, 5}, 4);
In the above example, the two forms are equivalent. The second form omits the declaration of the array variable and passes the array literal directly into the function. {2, 3, 4, 5} is the array literal, and (int []) resembles a type cast, telling the compiler how to interpret the set of values.