Learning Objectives
- Explain why the use of functions is desirable and write C++ programs that make use of them.
- Explain the different types of variable scope in C++ and declare and use them appropriately.
- Explain the difference between passing arguments by value and reference and use this in C++ programs
- Implement recursive and iterative solutions to problems and compare and contrast them with regards to efficiency and simplicity
Previously we described how conditional and iterative control structures are implemented in C++, and in this chapter we list the third type of control structure, subprograms.
Often, to perform the same of a similar sequence of operations in different parts of our program rather than enter the same statements multiple times, it is better to reuse a single piece of code. This is considered good practice because it can make code:
A function is a sequence of statements defined by the programmer that is identified by name. The components of a function are:
Return_type function_name (argument_list)
{
statements
}
To illustrate the use of a function, consider the following example program which calculates the standard error of a data sample ( $ \frac{standard \ deviation}{\sqrt{sample \ size}} $ ) using a function named std_err
:
#include <iostream>
#include <math.h>
using namespace std;
// function prototype
double std_err (double std_dev, int n);
int main() {
const double std_dev = 3.0;
for (int i = 1; i < 30; i++)
cout << "Std dev = " << std_dev
<< ", Sample Size = " << i
<< ", Std err = "
<< std_err(std_dev,i)
<< endl;
return 0;
}
// function signature
double std_err (double std_dev, int n)
{
return std_dev / sqrt(n);
}
The fourth line of the program contains double std_err (double std_dev, int n);
which is the function prototype. This tells the compiler that the std_err
function exists and it will be fully defined later. Although it is possible to include the entire function definition before the main
function definition or even in header files.
The last four lines of the code above represent the function signature and it contains the following information:
double
std_err
double std_dev, int n
return std_dev / sqrt(n);
The function call denotes where the function is called, and in this case it is << std_err(std_dev,size) << endl;
. The function call will return a value of the return type.
Note that in C++ functions only return a single value but there are ways around this.
With larger programs, code readability can be improved if related code/functions are separated into separate source files which can also promote code reuse. It is typical to store:
#include
can be used to make sure that the appropriate code is available.
Default values can also be specified for function arguments, making it possible to omit the argument when the function is called - the missing argument will take on the specified value, for example consider this alternate definition of std_err
:
double std_err (double std_dev, int n=10)
{
return std_dev / sqrt(n);
}
Here the =10
after the argument n
means that if omitted the argument n
will take on the value 10. This function can still be called using both arguments, for example:
float se1 = std_err (sd, 5); // n takes a value of 5
float se2 = std_err (sd); // n takes default value of 10
When calling a function, its argument types must match the values passed. Consider a function:
Pretty_print(int x);
int a = 33;
pretty_print(a);
With an integer input, the types match and function completes without errors.
char c = 'x';
pretty_print(c);
char
can be converted to int
using ASCII form, and the function will still work.
string str = "hello";
pretty_print(str);
string
cannot be cast into int
- this will lead to a compilation error.
Function overloading is where multiple functions with the same name but different argument types can be built. Consider the following example:
int main() {
int a = 33;
pretty_print(a);
char c = 'x';
pretty_print(c);
string str = "hello";
pretty_print(str);
return 0;
}
void pretty_print (int x) {
cout << "Integer: " << x << endl;
}
void pretty_print (string x) {
cout << "String: " << x << endl;
}
The first two pretty_print()s call the int version of the function, whereas the third calls the string version. the varying overloaded functions can even have different numbers of arguments.
As C++ performs static type checking the compiler will decide which of the overloaded functions to use for each function call.
Many variables can only be used in a limited part of the overall program. This is known as the scope of the variable. there are 3 types of scope in C++:
A variable declared outside of all functions, usually at the beginning of the program will have global scope (ie. between the using namespace ---
and int main()
). Variables with global scope are accessible from anywhere in the program, after the declaration.
A variable which is a function argument or declared inside a function body will have Function scope. Function scope variables are only accessible inside the function, after the declaration.
A variable declared inside a code block (within the {…….}) with have block scope. Block scope variables are only accessible inside the block, after their declaration.
Note that, if at any point there are two variables with the same identifier in scope, any use of that identifier will access the ‘closest’ one in scope, i.e. block, then function, then global.
So far, when arguments have been passed to functions they have been passed by value. This is where the value of the variable is copied into a new variable inside the function. Any changes made to the new variable inside the function will not apply to the original variable.
Instead of copying the value of the variable, we pass a reference to the variable itself into the function. Any changes to the new variable inside the function will apply to the original variable.
This example will illustrate the difference between the two methods.
#include <iostream>
void test_by_value (int);
void test_by_reference (int&);
int main () {
int x;
x = 5;
cout << "x = " << x << endl ;
test_by_value (x);
cout << "x = " << x << endl;
test_by_reference (x);
cout << "x = " << x << endl;
}
void test_by_value (int n ){
n++
}
void test_by_reference (int& n){
n++
}
This program contains two functions with identical bodies. The only difference is that the first uses pass-by-value and the second uses pass-by-reference.
The &
after the argument data type tells the compiler to use pass-by-reference in the function definition. The output of this program will look like:
x = 5
x = 5 // pass-by-value
x = 6 // pass-by-reference
The test by value does not change the value of x, where the test by reference increments its value. Effectively we are extending the scope of the variable by passing the reference into the function so that the function can access it.
Pass-by-reference is useful when values are swapped over in functions affecting their global scope. Another possible reason for using pass-by-reference is program efficiency. If we are making many function calls in a program, it can be inefficient to copy large amounts of data from one place to another. Using pass-by-reference can improve efficiency in such cases.
Recursion is the ability of a function to call itself. For example, the factorial function can be defined as follows:
In mathematical terms, we can say that this recursive definition consists of two parts: the anchor (n=0) and the inductive step (n>0).
The anchor condition is the point at which no recursive calls will be made. It is sometimes called the ground case. With each iteration of a recursive there should be some progress towards this anchor condition.
This is where the function calls itself, and arguments passed should make some progress towards the anchor condition.
...
int main() {
int n;
cout << "Enter number:" << endl;
cin >> n;
cout << n << "th Fibonacci number is " << Fib(n) << endl;
return 0;
}
int Fib (int n) {
if ((n == 0) || (n == 1))
return 1;
else
return (Fib(n-1) + Fib(n-2));
}
The if statement, if ((n == 0) || (n == 1))
represent the anchor condition. The two recursive calls, return (Fib(n-1) + Fib(n-2));
represents the inductive step.
Recursive implementations can appear strange at first. To answer this it is useful to examine what exactly happens when a function is called.
When a function is called, the operating system needs to store information about that function:
All this information is collectively referred to as the environment of the function.
If recursive calls are being made the compiler needs to store and access all of the separate environments for the different function calls. Compilers use a data structure which can be visualised and is known as a stack. The program stack is a list of values accessed using last-in-first-out (LIFO) principle as illustrated in the table below. Two operations are defined on the stack:
Push/Pop | Contents of stack |
---|---|
Push 6 | 6 |
Push 11 | 11 6 |
Pop | 6 |
Push 8 | 8 6 |
Push 5 | 5 8 6 |
Pop | 6 8 |
When a function call is made, the current environment is pushed onto the stack and creates a new environment. When the function terminates, the compiler pops the saved environment off the stack, replacing the function’s environment. This process happens every time a function call is made, so if multiple (recursive) calls are made we will end up with multiple environments on the program stack. For example consider the following program:
#include <iostream>
using namespace std
int f1 (int);
int f2 (int);
int main () {
int x = 3;
cout << "f1(3) = " << f1(x) << endl;
return 0;
}
int f1 (int a) {
int p = a * 3;
return f2(p + 2);
}
int f2 (int b) {
int q = b * 2;
return q - 1;
}
The state of the program stack during execution of the program shown:
Push/Pop | Contents of stack |
---|---|
Push x=3 | x=3 |
Push a=3, p=9 | a=3 p=9 x=3 |
Pop | p=9 x=3 |
Pop | x=3 |
Creating environments and pushing and popping them onto and from the stack takes time. Recursive implementations typically make a lot more function calls than equivalent iterative implementations, and for this reason they tend to be a lot less efficient. Therefore, from an efficiency perspective, it is generally better to use iteration.
However, some solutions are much more elegantly and simply expressed in recursive form, so they can often be a trade-off between efficiency and simplicity.
All recursive functions contain at least one call to themselves. However there are a number of different types of recursive functions, based on how many recursive calls there are, and how many recursive calls there are, and where in the function they occur.
The recursive call is the last statement in the function.
The recursive call is the first statement in the function.
The recursive call occurs in the middle of the function.
This refers to the case where there are more than one recursive calls in a function, for example a recursive call in a function to create the Fibonacci sequence called fib(n) ( Fib (n -1) + Fib (n -2));
.
Functions X and Y are mutually-recursive when X calls Y and Y also calls X.
Tail recursion is considered the most efficient form of recursion (almost comparable in terms of efficiency with iteration) after the recursive call the storage can be immediately deallocated.
If a function is called a very large number of times in a program in which efficiency is a prime concern, it can become a problem. In such cases the compiler should be instructed to create an inline
function.
With inline
functions, rather than treating a function call in the normal way (i.e. using the program stack), the function body code is inserted/copied into the program to replace the function call. This means more code in the program, and hence longer compilation time, but fewer overheads at run-time. The example function below illustrates the definition of an inline
function:
inline double sdt_err (double std_dev, int n)
// compute standard error of mean given sample
// standard deviation std_dev and sample size n
{
return std_dev / sqrt(n);
}