
Contents


C++ is used by programmers across the globe. Derived from C, this object-oriented language is the building block of many complex systems. Learn the basics here. The current chapters are:

 1 • Introduction   2 • Control Structures   3 • Functions   4 • Advanced Data Types   5 • Object-Oriented Programming   6 • Inheritance   7 • Object-Oriented Design   8 • Operator Overloading   9 • Templates   10 • Memory Management 

1
Introduction


Learning Objectives

  • Explain the difference between interpreted and compiled programming languages.
  • Describe some potential advantages of object-orientated programming over other programming paradigms.
  • Write simple C++ programs making good use of variables, constants, expressions, assignment statements, I/O, basic data types, comments and pre-defined functions.

Introduction to Programming

Programming languages can be classified as either interpreted or compiled:

Interpreted Perl Python Javascript Matlab

Interpreted languages execute a program directly from its source code, with no separate translation step. With compiled languages, however, an intermediate step called compilation is performed first.

Compiled C C++ Java

Compilation refers to the process of translating a ‘high-level’ computer program into machine code, which is the ‘low-level’ language that the computer’s central processing unit (CPU) can execute. Matlab can be either interpreted or compiled, but is mostly used as an interpreted language.

Compilation translates the high-level (C++) code contained in the source files into object files, which contain machine-readable instructions. However, an object file cannot be executed on its own: it must first be linked to the libraries and other parts of the program that it relies on. This can be compared to a Lego car box set. Without instructions, anything can be made with the Lego bricks. Only with the specific set of instructions (the library) can the bricks be made into the specific model of car shown on the box.

Paradigms

Another major difference between programming languages lies in their programming paradigms:

Procedural Matlab C Fortran

Historically the dominant paradigm. Procedural programs work by executing a sequence of instructions provided by the programmer.

Declarative ML Prolog

This is a lesser-known paradigm. With declarative programs, statements about a problem and how it can be solved are declared, regardless of their order.

Object-Oriented Java C++ Smalltalk

In the late 1980s there was a shift in the dominant paradigm from procedural programming to object-oriented programming. Object-oriented programs allow the programmer to break down problems into objects, where objects are self-contained entities consisting of both data and operations on the data.

C++ extends the existing procedural programming language, C. Therefore, C++ is not a pure OOP language, but rather a mixture of procedural and OOP.

In OOP languages, instead of writing sequences of instructions, the programmer defines objects with attributes and behaviours. The objects communicate with each other, sending data, and requesting certain behaviours to be carried out. All OOP languages have the following three features:

1. Encapsulation

Also known as information hiding, it separates the logical properties of a data structure (what it does) from implementation details (how it does it).

It is a common technique that the human mind uses to tackle complex problems: break the problem down into smaller, simpler, well-defined sub-problems and solve each individually. By specifying the required inputs and outputs of a sub-problem we are abstracting away from the main problem.

2. Inheritance

This is a way of reusing code from existing objects. One object can inherit all of the attributes from existing objects, then more can be added.

Inheritance can lead to more reliable and easily understandable software.

3. Polymorphism

Polymorphism describes taking on many different forms. In OOP, it refers to objects having type dependent attributes/behaviours.

Dynamic binding is a related concept. It means waiting until run-time before deciding which type-dependent operation to use on an object.
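
The three features above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the class names Organ and Heart, the member rate_ and the function report are hypothetical, and classes themselves are only covered in later chapters.

```cpp
#include <string>

// Hypothetical classes, used only to illustrate the three OOP features.
class Organ {
public:
    virtual ~Organ() = default;
    // Behaviour that depends on the actual type of the object (polymorphism).
    virtual std::string describe() const { return "generic organ"; }
};

// Heart inherits from Organ and overrides one behaviour (inheritance).
class Heart : public Organ {
public:
    std::string describe() const override {
        return "heart, beating at " + std::to_string(rate_) + " bpm";
    }
private:
    int rate_ = 70;  // hidden implementation detail (encapsulation)
};

// Which describe() runs is decided at run-time (dynamic binding).
std::string report(const Organ& organ) {
    return organ.describe();
}
```

Calling report with a Heart object runs the Heart version of describe(), even though report only knows that it has been given some kind of Organ.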

Object-Oriented design

Many OO features are aimed at allowing greater code reuse. This is beneficial for the following reasons:

  • Faster code development
  • Easy to understand programs
  • Easy to debug programs
  • Easy to modify in future

OO design involves forming a model of the problem domain. The problem domain consists of objects with relationships between them. There are three types of relationships:

  • Has-a e.g. The heart has a blood flow
  • Is-a e.g. The heart is an organ
  • Uses-a e.g. The heart uses blood vessels

Most Integrated Development Environments (see next section) come with a useful debugger to help find errors in the code, and they help with the following:

  • Breakpoints
  • Step through code
  • Watch variables

C++ uses static type checking, whereas Matlab uses dynamic type checking.

Dynamic type checking Matlab

The validity of an operation, based on the data types of its operands, is checked at run-time.

Static type checking C++

The validity of an operation is checked at compile time.

Getting Started

There are several different integrated development environments (IDE) available to code, build and run C++ programs. These software packages define projects consisting of source code, specify compilation options and run the compiled code. Users running Windows/Linux should download Code::Blocks and Mac users should install CodeLite:

Note: Remember to check that you have a compiler installed!

The default program that loads in most IDEs when you start a new project is the hello world program. It shows the basic structure and syntax shared by all C++ programs:

main.cpp
// hello world program
#include <iostream>
using namespace std;
int main()
{
  cout << "Hello world" << endl;
  return 0;
}

This program can be broken down into a few key components:

  • main () {} all code that is to be executed is contained in the function called main, whose body is delimited by {…}. NOTE: function arguments go inside the brackets ()
  • int refers to the ‘return type’ of the main function (i.e. the data type of the value it returns)
  • ; unlike in Matlab, where the semicolon only suppresses output, C++ uses the semicolon to mark the end of a statement. NOTE: all statements must end in a semicolon
  • cout sends characters to standard output, which in most cases is the screen
  • endl ends the current line and sends the cursor to the next line (similar to \n, but it also flushes the output)
  • return 0 this line is a convention that indicates that the program has terminated successfully
  • // indicates a single-line comment (the C++ equivalent of % in Matlab). Comments spanning multiple lines can be written using /* … */
  • #include causes another source file to be included, so that what it declares can be used without compilation errors. In this case the iostream file declares cout and endl
  • using namespace tells the compiler which namespace to look in for names; in this case std (standard) is used, which contains cout and endl

Preprocessing is the step where the #include files are added. It occurs before compilation and pulls in all the required source files.

Expressions and Variables

The code below demonstrates a more complex program which includes variables and expressions:

// arithmetic program
#include <iostream>
#include <math.h>
using namespace std;
int main() {
  int x, sq_x;
  cout << "Enter a number:" << endl;
  cin >> x;
  sq_x = x*x;
  cout << "sq. root=" << sqrt(x) << endl;
  cout << "square = " << sq_x << endl;
  cout << "cube = " << sq_x*x << endl;
  return 0;
}
  • #include <math.h> declares some mathematical functions, such as sqrt
  • int x, sq_x; declares two variables, x and sq_x, of type int (integer). Unlike Matlab, all variables must be declared and their type specified
  • cin reads from standard input (the keyboard)

In the arithmetic program above various operators are used. They are summarised in the first table below, and their precedence is listed in the second:

Arithmetic operators  
Basic + - * /
Modulus (remainder) %
Increment/decrement ++x x++ --x x--

Note: the difference between the prefix and postfix versions of the increment/decrement operators lies in whether the value is returned before (postfix) or after (prefix) the variable is changed.
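
The difference can be made concrete with a small sketch. The struct IncrementResult and the functions postfix and prefix are hypothetical names introduced only for this illustration; each records both the value of the increment expression and the value of the variable afterwards.

```cpp
// Hypothetical helper type: holds what the expression yielded and
// what the variable contained once the expression had been evaluated.
struct IncrementResult {
    int expression_value;
    int variable_after;
};

IncrementResult postfix(int x) {
    int v = x++;   // v receives the old value, then x is incremented
    return {v, x};
}

IncrementResult prefix(int x) {
    int v = ++x;   // x is incremented first, then v receives the new value
    return {v, x};
}
```

Starting from 5, both versions leave the variable at 6, but the postfix expression itself evaluates to 5 whereas the prefix expression evaluates to 6.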

Operator precedence  
First increment, decrement
Second multiplication, division, modulus
Third addition, subtraction

C++ Data types

In C++ we must explicitly declare the type of all variables. The basic types are:

  • (unsigned) int, short, long
  • char
  • bool
  • double, float, long double

Be careful when mixing data types in an operation. The modulus operator % only works on integer types; it cannot be applied to float or double operands. Adding a bool to a char makes little sense conceptually, but C++ allows it: the bool is converted to 1 or 0 and the char to its character code (its ASCII value on most systems).

Note: Integer arithmetic can be different to floating point arithmetic, e.g. 1/3 evaluates to 0 because dividing two integers gives an integer result, with the fractional part discarded. To fix this, make at least one operand a floating point value: 1.0/3 evaluates to approximately 0.333
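
A minimal sketch of these conversions is given below. The function names are hypothetical, and the character code mentioned in the comment assumes an ASCII system.

```cpp
// Integer division: the fractional part of the result is discarded.
int integer_quotient(int a, int b) {
    return a / b;
}

// Converting one operand to double forces floating point division.
double real_quotient(int a, int b) {
    return static_cast<double>(a) / b;
}

// The char is converted to its character code and the bool to 1 or 0
// before the addition (on an ASCII system 'A' has the code 65).
int char_plus_bool(char c, bool b) {
    return c + b;
}
```

With these definitions integer_quotient(1, 3) gives 0, while real_quotient(1, 3) gives roughly 0.333.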

Constants

const declares an identifier whose value will not change:

const double pi = 3.1415926;

Pre-defined functions

The #include statement can be used to include header files that declare pre-defined functions. A few examples are given in the table below:

Header | Function | Arg | Operation
stdlib.h | abs(i) | int | Absolute value (int)
math.h | cos(x) | float | Cosine of x (radians)
math.h | fabs(x) | float | Absolute value (float)
math.h | pow(x,y) | float | x to the power y
math.h | sin(x) | float | Sine of x (radians)
math.h | sqrt(x) | float | Square root of x
math.h | tan(x) | float | Tangent of x (radians)
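
As a brief sketch of how some of these functions might be used, the hypothetical helpers below call sqrt, pow, fabs and abs. They use the C++ headers cmath and cstdlib, which provide the same functions as math.h and stdlib.h inside the std namespace.

```cpp
#include <cmath>
#include <cstdlib>

// Length of the hypotenuse of a right-angled triangle with sides a and b.
double hypotenuse(double a, double b) {
    return std::sqrt(std::pow(a, 2) + std::pow(b, 2));
}

// Distance between two floating point values.
double distance(double a, double b) {
    return std::fabs(a - b);
}

// Distance between two integers.
int int_distance(int a, int b) {
    return std::abs(a - b);
}
```

For example, hypotenuse(3.0, 4.0) gives 5 (up to floating point rounding).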



Written by Toby Morris & Tobias Whetton

2
Control Structures


Learning objectives

  • Write C++ programs making good use of conditional statements
  • Write C++ programs making good use of iteration statements

Types of Control Structure

Programs are said to have a flow of control: the order in which the program statements are executed. Normally in procedural programs the flow of control is sequential, i.e. the first statement is executed, then the second, and so on. Control structures are statements that allow the programmer to alter the normal flow of control. There are three basic types of control structure:

  1. Conditional
  2. Iterative
  3. Subprograms

Conditional Statements

These allow conditional execution. A variable can be tested with a particular condition. If this condition is met then the flow of control can be changed to an alternate set of statements.

if statements

In the Matlab module if statements were introduced and they have a similar syntax in C++. Below (temperature > 37.5) represents the conditional expression, which evaluates to either true or false. It is grouped together with statements in curly brackets to form a compound statement.

if (temperature >37.5) {
    hyperthermic = true;
    cout << "Warning - patient in danger!" << endl;
}
else {
    hyperthermic = false;
    cout << "Patient normal" << endl;
}

There is also an optional second part to the C++ if statement. If the conditions aren’t satisfied in the if statement an else statement can be used to carry out a different function.

Conditional Expressions

The C++ if-else statement requires conditional expressions to be specified. Any of the following comparison operators can be used in C++ conditional expressions:

Comparison Operators Logical Operators
== equal to && and
!= not equal to || or
> greater than ! not
< less than  
>= greater than or equal to  
<= less than or equal to  

Note the different inequality operator in C++ compared to Matlab. Matlab uses ~= for inequality whereas C++ uses !=.

All of the logical operators mentioned above are subject to rules concerning operator precedence when evaluating complex expressions in the same way that arithmetic operators are. The following table shows the order of operator precedence for both arithmetic and logical operators:

Precedence Operators
1st ++ -- !
2nd * / %
3rd + -
4th < <= > >=
5th == !=
6th &&
7th ||

Nesting Control Structures

It is possible to nest control structures in C++, i.e. include one control structure statement inside another. e.g.

if (temperature > 37.5) {
    hyperthermic = true;
    hypothermic = false;
}
else {
    if (temperature < 35)
        hypothermic = true;
    else
        hypothermic = false;
    hyperthermic = false;
}

Note how in the example above the {} are omitted around the inner if and else branches. Curly brackets are optional only when a single statement is to be executed after the conditional expression (or after else).

Make sure the nested statements are indented correctly. Properly indenting nested statements significantly improves readability.
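
When one if statement is nested directly inside the else part of another, the nesting is often written as an else if chain, which avoids ever-deeper indentation. The sketch below is one way of doing this; the function name classify_temperature and the strings it returns are hypothetical, while the thresholds are those of the example above.

```cpp
#include <string>

// Classifies a body temperature using the thresholds from the text.
std::string classify_temperature(double temperature) {
    std::string state;
    if (temperature > 37.5) {
        state = "hyperthermic";
    } else if (temperature < 35) {
        state = "hypothermic";
    } else {
        state = "normal";
    }
    return state;
}
```

Only one of the three branches is ever executed, so a temperature can never be classified as both hyperthermic and hypothermic.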

switch statements

A switch statement offers an easy way of coding situations where the same variable needs to be checked against a number of different values.

switch (expression) {
  case constant-1:
    statement-a;
    statement-b;
    break;
  case constant-2:
    statement-c;
    statement-d;
    break;
  case constant-3:
    statement-e;
    statement-f;
    break;
  default:
    statement-g;
}

The switch statement can only be used for equality tests; it cannot be used for other comparisons (e.g. >, <, etc). The effect of the break statement is to transfer control to the statement immediately following the switch statement. The default label is an optional case which is executed if no case label is matched.

switch statement example
char c;
cout << "Enter letter:\n";
cin >> c;
switch (c)
{
  case 'A':
  case 'a':
    cout << "Letter A\n";
    break;
  case 'B':
  case 'b':
    cout << "Letter B\n";
    break;
...
  default:
    cout << "Not a letter\n";
}

Note that in the above example the break statement is omitted after certain case labels, so that the same piece of code is executed for more than one label, i.e. if either 'A' or 'a' is matched. Also notice the use of \n, which is the newline character and has a similar effect to endl.
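
The same technique of sharing code between labels works with integer values. The sketch below uses a hypothetical function, days_in_month, which ignores leap years and returns 0 for an invalid month; these are assumptions made only for this illustration.

```cpp
// Returns the number of days in a month (1 = January, 12 = December).
// Leap years are ignored, and 0 is returned for an invalid month.
int days_in_month(int month) {
    int days = 0;
    switch (month) {
        case 2:
            days = 28;
            break;
        case 4:
        case 6:
        case 9:
        case 11:
            days = 30;   // shared by the four labels above
            break;
        case 1: case 3: case 5: case 7: case 8: case 10: case 12:
            days = 31;
            break;
        default:
            break;       // no label matched: days stays 0
    }
    return days;
}
```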

Iteration Statements

The second type of control structure is the iterative statement. Iterative statements are useful to execute the same piece of code a number of times.

for statements

The for statement will repeat a section of code for a specified number of iterations. It has the basic form:

for (init-stmnt; condition; update-stmnt)
    body-statement;

Usually, init-stmnt is used to initialise a control variable, which changes its value in each iteration of the for statement. The iteration will continue until the condition becomes false. update-stmnt is generally used to update the value of the control variable. The following example shows the for statement in action:

Square of numbers 1 - 10
int i;
for (i = 1;  i <= 10;  i++)
    cout << i * i <<endl;
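
A for loop is also a natural way to accumulate a result. As a small sketch, the hypothetical function sum_of_squares below adds up the squares instead of printing them, and declares the control variable inside the for statement itself.

```cpp
// Returns 1*1 + 2*2 + ... + n*n (0 if n is less than 1).
int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i * i;
    return total;
}
```

For n = 10 the loop body runs ten times and the function returns 385.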

while Statement

A while loop will repeat a section of code for as long as a condition remains true. It should be used when the number of iterations is not known in advance. It has the basic form:

while (condition)
      statement;

The statement is executed so long as the expression evaluates to true. The following example uses a while statement to read in a sequence of numbers from the user and print out the square of each one:

Squares of numbers
int i;
  cout << "Enter numbers, Ctrl-D to terminate" << endl;
  while (cin >> i)
    cout << "Square of " << i << " is " << i * i << endl;

do Statement

The do statement is similar to a while loop, as it should be used when the number of iterations is unknown. With the do loop the statement is guaranteed to be executed at least once as the expression is only checked after the first iteration. The general form of a do loop is:

do
  statement;
while (condition);

The following code uses a do loop to print out a sequence of numbers, each half the value of the previous, until the number becomes less than 0.01.

Do loop example
float i = 1.0;
do {
  cout << i << endl;
  i /= 2;
} while (i > 0.01);

Note the use of the /= operator. The statement i /= 2 is just short for i = i / 2. This short form can be used with any of the four main arithmetic operators, i.e. *=, +=, -=, /=.
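
The guarantee that the body of a do loop runs at least once can be seen in the following sketch. The function count_halvings is a hypothetical helper that counts how many times a value is halved before it is no longer greater than a threshold.

```cpp
// Halves value repeatedly and returns the number of halvings performed.
// The body always runs once before the condition is first checked.
int count_halvings(double value, double threshold) {
    int steps = 0;
    do {
        value /= 2;
        steps++;
    } while (value > threshold);
    return steps;
}
```

Starting from 1.0 with a threshold of 0.01, seven halvings are performed. Starting from a value already below the threshold, the function still returns 1, because the condition is only tested after the first iteration.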

Jump Statements

Another way of altering the flow of control of a program is to use a jump statement. The effect of a jump statement is to unconditionally transfer control to another part of a program. The three jump statements covered here are break, continue and goto.

Break statement

A break statement terminates the directly enclosing while, do, for or switch block. Execution resumes immediately following the terminated block.

Continue statement

The continue statement terminates the current iteration of the directly enclosing while, do or for loop. Execution resumes with the evaluation of the loop condition (in a for loop, the update statement is executed first).
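
The sketch below shows break and continue working together in one loop. The function name and the rule it implements (skip negative values, stop at the first zero) are hypothetical; it also uses an array, a data type that is introduced in a later chapter.

```cpp
// Adds up the positive values in an array, stopping at the first zero.
int sum_positive_until_zero(const int values[], int size) {
    int total = 0;
    for (int i = 0; i < size; i++) {
        if (values[i] == 0)
            break;      // leave the loop entirely
        if (values[i] < 0)
            continue;   // skip the rest of this iteration
        total += values[i];
    }
    return total;
}
```

For the values 4, -2, 5, 0, 9 the function returns 9: the -2 is skipped by continue, and the loop is terminated by break before the final 9 is reached.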

Goto statement (avoid using!)

The goto statement transfers control unconditionally to any location marked with a label. For example, goto mylabel; will transfer control to the statement marked mylabel:. Its use is discouraged because it reduces readability and makes the code far more confusing.




3
Functions


Learning Objectives

  • Explain why the use of functions is desirable and write C++ programs that make use of them.
  • Explain the different types of variable scope in C++ and declare and use them appropriately.
  • Explain the difference between passing arguments by value and reference and use this in C++ programs
  • Implement recursive and iterative solutions to problems and compare and contrast them with regards to efficiency and simplicity

Code reuse

Previously we described how conditional and iterative control structures are implemented in C++, and in this chapter we cover the third type of control structure, subprograms.

Often we need to perform the same or a similar sequence of operations in different parts of a program. Rather than entering the same statements multiple times, it is better to reuse a single piece of code. This is considered good practice because it can make code:

  • less work to write
  • less work to modify
  • less prone to error
  • more readable, helping others to make changes

Functions

A function is a sequence of statements defined by the programmer that is identified by name. The components of a function are:

  • Arguments or values that are passed to the function from the program (the types of each must be specified at compile time).
  • Return type or the value of a specified type which is returned.
  • Body which is where the programmer defines the sequence of statements to be executed.

Basic function syntax
return_type function_name (argument_list)
{
  statements
}

To illustrate the use of a function, consider the following example program which calculates the standard error of a data sample ( $ \frac{standard \ deviation}{\sqrt{sample \ size}} $ ) using a function named std_err:

Standard Error Program
#include <iostream>
#include <math.h>
using namespace std;

// function prototype
double std_err (double std_dev, int n);

int main() {
  const double std_dev = 3.0;
  for (int i = 1; i < 30; i++)
    cout << "Std dev = " << std_dev 
         << ", Sample Size = " << i
         << ", Std err = "
         << std_err(std_dev,i)
         << endl;
  return 0;  
}

// function definition
double std_err (double std_dev, int n)
{
  return std_dev / sqrt(n);
}

The line double std_err (double std_dev, int n); near the top of the program is the function prototype. It tells the compiler that the std_err function exists and will be fully defined later. It is also possible to place the entire function definition before the main function, or even in a header file.

The last four lines of the code above are the function definition, which contains the following information:

  • Return type - double
  • Function name - std_err
  • Argument list - double std_dev, int n
  • Function body - return std_dev / sqrt(n);

The function call denotes where the function is called, and in this case it is std_err(std_dev,i). The function call will return a value of the return type.

Note that in C++ functions only return a single value but there are ways around this.
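
One common way around the single return value is to pass extra arguments by reference (described later in this chapter) and let the function fill them in. The sketch below assumes a hypothetical function, min_and_max, that reports two results this way.

```cpp
// Writes the smaller of a and b into smallest and the larger into largest.
// The & makes smallest and largest refer to the variables of the caller.
void min_and_max(int a, int b, int& smallest, int& largest) {
    if (a < b) {
        smallest = a;
        largest = b;
    } else {
        smallest = b;
        largest = a;
    }
}
```

A caller declares two variables, for example int low, high;, and then calls min_and_max(7, 3, low, high); afterwards low holds 3 and high holds 7.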

Developing projects with multiple source files

With larger programs, code readability can be improved if related functions are split into separate source files, which can also promote code reuse. It is typical to store:

  • Function prototypes in a “.h” header file
  • Function bodies in a “.cpp” source file

#include can be used to make sure that the appropriate code is available.

Default values for arguments

Default values can also be specified for function arguments, making it possible to omit the argument when the function is called; the missing argument then takes on the specified value. For example, consider this alternative definition of std_err:

double std_err (double std_dev, int n=10)
{
  return std_dev / sqrt(n);
}

Here the =10 after the argument n means that if omitted the argument n will take on the value 10. This function can still be called using both arguments, for example:

float se1 = std_err (sd, 5); // n takes a value of 5
float se2 = std_err (sd); // n takes default value of 10

Function arguments

When calling a function, the types of the values passed must match its argument types (or be convertible to them). Consider a function:

void pretty_print(int x);
int input
int a = 33;
pretty_print(a);

With an integer input, the types match and function completes without errors.

char input
char c = 'x';
pretty_print(c);

char can be converted to int using ASCII form, and the function will still work.

string input
string str = "hello";
pretty_print(str);

string cannot be cast into int - this will lead to a compilation error.

Function overloading

Function overloading is where multiple functions with the same name but different argument types can be built. Consider the following example:

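#include <iostream>
#include <string>
using namespace std;

// function prototypes
void pretty_print (int x);
void pretty_print (string x);

int main() {
  int a = 33;
  pretty_print(a);
  char c = 'x';
  pretty_print(c);
  string str = "hello";
  pretty_print(str);
  return 0;
}
void pretty_print (int x) {
  cout << "Integer: " << x << endl;
}
void pretty_print (string x) {
  cout << "String: " << x << endl;
}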

The first two calls to pretty_print() use the int version of the function (the char is converted to an int), whereas the third uses the string version. Overloaded functions can even have different numbers of arguments.

As C++ performs static type checking the compiler will decide which of the overloaded functions to use for each function call.
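
As a sketch of overloading on the number of arguments, consider two hypothetical functions, both called area. The compiler selects one of them by counting the arguments in each call.

```cpp
// Area of a circle, selected when one argument is given.
double area(double radius) {
    const double pi = 3.1415926;
    return pi * radius * radius;
}

// Area of a rectangle, selected when two arguments are given.
double area(double width, double height) {
    return width * height;
}
```

The call area(2.0, 3.0) uses the rectangle version, whereas area(1.0) uses the circle version.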

Variable scope

Many variables can only be used in a limited part of the overall program. This is known as the scope of the variable. There are 3 types of scope in C++:

1. Global scope

A variable declared outside of all functions, usually at the beginning of the program (i.e. between the using namespace line and int main()), will have global scope. Variables with global scope are accessible from anywhere in the program, after the declaration.

2. Function scope

A variable which is a function argument or is declared inside a function body will have function scope. Function scope variables are only accessible inside the function, after the declaration.

3. Block scope

A variable declared inside a code block (within the {…}) will have block scope. Block scope variables are only accessible inside the block, after their declaration.

Note that, if at any point there are two variables with the same identifier in scope, any use of that identifier will access the ‘closest’ one in scope, i.e. block, then function, then global.
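
The ‘closest one wins’ rule can be sketched as follows. The variable name value and the three functions are hypothetical, and reusing one identifier at several levels like this is done here only to demonstrate the rule, not as recommended practice.

```cpp
int value = 1;   // global scope

// No closer declaration exists, so the global variable is used.
int global_value() {
    return value;
}

// The function scope variable hides the global one.
int function_scope_value() {
    int value = 2;
    return value;
}

// Inside the block, the block scope variable hides the other two.
int block_scope_value() {
    int value = 2;
    int seen_in_block = 0;
    {
        int value = 3;
        seen_in_block = value;         // uses the block scope variable (3)
    }
    return seen_in_block * 10 + value; // after the block, value is 2 again
}
```

Here block_scope_value returns 32: the 3 was read inside the block, and the 2 was read after the block had ended and its variable had gone out of scope.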

Passing variables into functions

Pass-by-value

So far, when arguments have been passed to functions they have been passed by value. This is where the value of the variable is copied into a new variable inside the function. Any changes made to the new variable inside the function will not apply to the original variable.

Pass-by-reference &

Instead of copying the value of the variable, we pass a reference to the variable itself into the function. Any changes to the new variable inside the function will apply to the original variable.

What’s the difference

This example will illustrate the difference between the two methods.

#include <iostream>
using namespace std;

void test_by_value (int);
void test_by_reference (int&);

int main () {
  int x;
  x = 5;
  cout << "x = " << x << endl;
  test_by_value (x);
  cout << "x = " << x << endl;

  test_by_reference (x);
  cout << "x = " << x << endl;
  return 0;
}

void test_by_value (int n) {
  n++;
}

void test_by_reference (int& n) {
  n++;
}

This program contains two functions with identical bodies. The only difference is that the first uses pass-by-value and the second uses pass-by-reference.

The & after the argument data type tells the compiler to use pass-by-reference in the function definition. The output of this program will look like:

x = 5
x = 5 // pass-by-value
x = 6 // pass-by-reference

The call to test_by_value does not change the value of x, whereas the call to test_by_reference increments it. Effectively, passing a reference extends the scope of the variable so that the function can access it.

Pass-by-reference is useful when a function needs to change variables belonging to its caller, for example to swap two values over. Another possible reason for using pass-by-reference is program efficiency. If we are making many function calls in a program, it can be inefficient to copy large amounts of data from one place to another. Using pass-by-reference can improve efficiency in such cases.
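
Both uses can be sketched briefly. The function names swap_values and count_char are hypothetical. The second function takes its string argument as a const reference, which avoids copying the string while still preventing the function from changing it.

```cpp
#include <cstddef>
#include <string>

// Swaps the two variables of the caller.
void swap_values(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

// Counts how often target occurs in text. The string is not copied,
// and const stops the function from modifying it.
int count_char(const std::string& text, char target) {
    int count = 0;
    for (std::size_t i = 0; i < text.size(); i++)
        if (text[i] == target)
            count++;
    return count;
}
```

If swap_values took its arguments by value it would only swap its own copies, and the variables of the caller would be left unchanged.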

Recursion

Recursion is the ability of a function to call itself. For example, the factorial function can be defined as follows:

$ n! = 1 $ if $ n = 0 $, and $ n! = n \times (n-1)! $ if $ n > 0 $

In mathematical terms, we can say that this recursive definition consists of two parts: the anchor (n=0) and the inductive step (n>0).

Anchor condition

The anchor condition is the point at which no recursive calls will be made. It is sometimes called the ground case. With each recursive call there should be some progress towards this anchor condition.

Inductive step

This is where the function calls itself, and arguments passed should make some progress towards the anchor condition.
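
The recursive definition of the factorial function translates almost directly into C++. The following is a minimal sketch; it assumes that n is not negative and is small enough for the result to fit in a long.

```cpp
// Recursive factorial. Assumes n >= 0; a negative n would never
// reach the anchor condition.
long factorial(int n) {
    if (n == 0)
        return 1;                     // anchor condition
    else
        return n * factorial(n - 1);  // inductive step
}
```

Each call passes n - 1 to the next, so every call moves one step closer to the anchor condition n == 0.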

Recursion example
...
int main() {
  int n;
  cout << "Enter number:" << endl;
  cin >> n;
  cout << n << "th Fibonacci number is " << Fib(n) << endl;
  return 0;
}

int Fib (int n) {
  if ((n == 0) || (n == 1))
    return 1;
  else
    return (Fib(n-1) + Fib(n-2));
}

The if statement, if ((n == 0) || (n == 1)), represents the anchor condition. The statement containing the two recursive calls, return (Fib(n-1) + Fib(n-2));, represents the inductive step.

Recursive implementations can appear strange at first. To understand how they work, it is useful to examine what exactly happens when a function is called.

Environment of a function

When a function is called, the running program needs to store information about that function call:

  • Values of arguments
  • Local variables (within function scope)
  • Memory slot allocated to hold the returned value
  • What originally called the function so that execution can continue from where it left off before the function.

All this information is collectively referred to as the environment of the function.

Program Stack

If recursive calls are being made, all of the separate environments for the different function calls need to be stored and accessed. This is done using a data structure known as a stack. The program stack is a list of values accessed using the last-in-first-out (LIFO) principle, as illustrated in the table below. Two operations are defined on the stack:

  • Push refers to an environment being added to the stack
  • Pop refers to an environment being removed from the stack

Push/Pop | Contents of stack (top first)
Push 6 | 6
Push 11 | 11 6
Pop | 6
Push 8 | 8 6
Push 5 | 5 8 6
Pop | 8 6
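
The same sequence of operations can be tried out with the std::stack container from the C++ standard library. This sketch only illustrates the LIFO principle with integers; it is not how the program stack itself is implemented, and the function name run_sequence is hypothetical.

```cpp
#include <stack>

// Performs the push/pop sequence from the table and returns the stack.
std::stack<int> run_sequence() {
    std::stack<int> s;
    s.push(6);    // stack: 6
    s.push(11);   // stack: 11 6
    s.pop();      // removes 11, the most recently added value
    s.push(8);    // stack: 8 6
    s.push(5);    // stack: 5 8 6
    s.pop();      // removes 5
    return s;     // stack: 8 6
}
```

After the sequence, top() returns 8 and size() returns 2: the value added last is always the first to be removed.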

When a function call is made, the current environment is pushed onto the stack and a new environment is created for the called function. When the function terminates, the saved environment is popped off the stack, replacing the function’s environment. This process happens every time a function call is made, so if multiple (recursive) calls are made we will end up with multiple environments on the program stack. For example consider the following program:

#include <iostream>
using namespace std;

int f1 (int);
int f2 (int);

int main () {
  int x = 3;
  cout << "f1(3) = " << f1(x) << endl;
  return 0;
}

int f1 (int a) {
  int p = a * 3;
  return f2(p + 2);
}
int f2 (int b) {
  int q = b * 2;
  return q - 1;
}

The state of the program stack during execution of the program is shown below:

Push/Pop | Contents of stack (top first)
Push x=3 (main calls f1) | x=3
Push a=3, p=9 (f1 calls f2) | a=3 p=9 x=3
Pop (f2 returns) | x=3
Pop (f1 returns) | (empty)

Creating environments and pushing and popping them onto and from the stack takes time. Recursive implementations typically make a lot more function calls than equivalent iterative implementations, and for this reason they tend to be a lot less efficient. Therefore, from an efficiency perspective, it is generally better to use iteration.

However, some solutions are much more elegantly and simply expressed in recursive form, so there can often be a trade-off between efficiency and simplicity.
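
For comparison, here is a sketch of an iterative version of the Fib function from the recursion example. The name fib_iterative is hypothetical; it follows the same convention as the recursive version, in which both Fib(0) and Fib(1) are 1.

```cpp
// Iterative Fibonacci, using the convention Fib(0) = Fib(1) = 1.
int fib_iterative(int n) {
    int previous = 1;   // Fib(0)
    int current = 1;    // Fib(1)
    for (int i = 2; i <= n; i++) {
        int next = previous + current;
        previous = current;
        current = next;
    }
    return current;
}
```

This version makes a single function call and performs n - 1 additions, whereas the recursive version recomputes the same values many times. The price is code that no longer mirrors the mathematical definition as closely.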

Types of Recursion

All recursive functions contain at least one call to themselves. However there are a number of different types of recursive functions, based on how many recursive calls there are and where in the function they occur.

Tail recursion

The recursive call is the last statement in the function.

Head recursion

The recursive call is the first statement in the function.

Middle recursion

The recursive call occurs in the middle of the function.

Multi recursion

This refers to the case where there is more than one recursive call in a function, for example the Fibonacci function Fib(n), which returns Fib(n-1) + Fib(n-2).

Mutual recursion (indirect recursion)

Functions X and Y are mutually-recursive when X calls Y and Y also calls X.

Tail recursion is considered the most efficient form of recursion (almost comparable in terms of efficiency with iteration), because after the recursive call nothing remains to be done in the calling function, so its storage can be deallocated immediately.
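
A function can sometimes be rewritten in tail-recursive form by carrying the partial result along in an extra argument. The sketch below does this for the factorial function; the name factorial_tail and the accumulator argument are hypothetical, and n is assumed not to be negative. Whether the call is actually turned into a loop depends on the compiler and its optimisation settings.

```cpp
// Tail-recursive factorial: the recursive call is the last action, and
// nothing is left to compute once it returns. Assumes n >= 0.
long factorial_tail(int n, long accumulator = 1) {
    if (n == 0)
        return accumulator;
    return factorial_tail(n - 1, n * accumulator);
}
```

The default value of 1 for the accumulator means that the function can simply be called as factorial_tail(5).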

Efficiency

If a function is called a very large number of times in a program in which efficiency is a prime concern, the overhead of the function calls can become a problem. In such cases the compiler can be asked to create an inline function.

With inline functions, rather than treating a function call in the normal way (i.e. using the program stack), the function body code is inserted/copied into the program to replace the function call. This means more code in the program, and hence longer compilation time, but fewer overheads at run-time. The example function below illustrates the definition of an inline function:

inline double std_err (double std_dev, int n)
// compute standard error of mean given sample
// standard deviation std_dev and sample size n
{
  return std_dev / sqrt(n);
}




4
Advanced Data Types


Learning Objectives

  • Describe the use of pointers.
  • Describe how one- and multi-dimensional arrays are defined and implemented.
  • Define and manipulate string variables.
  • Define and use new data types using struct and enum.
  • Perform file input/output operations.

Type checking

There are two main differences between C++ and Matlab with regard to data types, and these are summed up in the table below:

Language | Type Definition | Type Checking
Matlab | implicit: works out the type of a variable from the value it is assigned | dynamic: type consistency is checked at run time
C++ | explicit: stated by the programmer in the variable declaration | static: type consistency is checked at compile time

Pointers *

If we think of simple data types as containers that hold a value of the specified type, then pointers are the addresses of those containers. Just like ordinary data types, pointer variables still have an associated data type, for example, a variable may have the type ‘pointer to an int’, or ‘pointer to a char’. They are defined using the * symbol before the variable name in the variable declaration (however do not confuse this with dereferencing!). For example:

int *p1, *p2;  // pointers to 'int' values
char *cp;      // Pointer to a 'char' value

Declaring a pointer variable does not mean that it points to a valid value in memory - it must be initialised to point to something. For example:

int val = 1000;
p1 = &val;

The & symbol means ‘the address of’. In this case p1 points to the area of memory which holds the int value 1000. Consider another example below:

char *c = new char;
*c = 'x';

This code creates a pointer variable called c that points to a newly allocated char, and then stores 'x' in it. The new keyword allocates memory without first defining a named variable, unlike the previous example, where int val = 1000; was declared first. Similarly, we can delete the space pointed to:

delete c;

This statement frees the memory allocated by the new statement, so that the program can reuse it for something else. This means that c no longer points to a valid value and should not be dereferenced. It is good practice to delete memory allocated with new once it is no longer needed.
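The allocate, use and release steps can be sketched in a single function. The function name storeAndRelease is hypothetical, and nullptr assumes a compiler that supports C++11 or later:

```cpp
// Allocate an int with new, use it, then release it with delete.
// Returns the value that was stored so the caller can check it.
int storeAndRelease(int value)
{
  int *p = new int;   // allocate memory for one int
  *p = value;         // write through the pointer
  int copy = *p;      // read the value back before releasing
  delete p;           // release the memory; *p must not be used now
  p = nullptr;        // the pointer no longer refers to freed memory
  return copy;
}
```

Setting the pointer to nullptr after delete is a common defensive habit rather than a requirement; it makes an accidental later use easier to detect.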

It is possible to have multiple pointers pointing to the same memory space, for example:

int *p1, *p2;
int val = 1000;
p1 = &val;
p2 = &val;

When more than one pointer points to the same area of memory, if the value of the variable is changed via one pointer, it is changed via the other pointer as well. For example:

*p1 = 500;
cout << "p1 -> " << *p1 << ", p2 -> " << *p2 << endl;

The output produced by this code would be p1 -> 500, p2 -> 500. The * before a variable name dereferences the pointer (i.e. it returns the value pointed to), and this is effectively the opposite of &.
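As a small sketch of how & and * work together (the function names swapValues and roundTrip are hypothetical), a function can change its caller's variables when it is given their addresses:

```cpp
// Exchange two ints through pointers to them.
void swapValues(int *a, int *b)
{
  int tmp = *a;  // the value that a points to
  *a = *b;       // overwrite it with the value that b points to
  *b = tmp;
}

// & and * undo each other: dereferencing the address of a variable
// gives back the variable itself.
bool roundTrip(int val)
{
  return *&val == val;
}
```

A call such as swapValues(&x, &y) passes the addresses of x and y, so the exchange is visible in the caller once the function returns.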

Arrays

A 1-D array is a sequence of values of the same type (e.g. an array of 10 integers). The values are commonly referred to as elements. Higher-dimensional arrays are also possible such as 2-D arrays where every element is itself an array.

One-Dimensional Arrays

In C++ array variables must be explicitly declared, and the array size must be specified and fixed at compile-time (i.e. in the array variable declaration). Consider the following example:

int x[10]; // defines array of 10 integers
x[9] = 3; // assign to last one

The array size is specified in the square brackets [ ] after the variable name. Elements of an array are undefined until they are initialised. Square brackets are also used to access array elements, and note that array indices start at 0.

To assign an entire array in one statement when the array variable is declared, use curly brackets, e.g.

int x[10] = {1,2,3,4,5,6,7,8,9,10};

Note that in C++ commas are required as delimiters; spaces cannot be used to separate the element values.

When initialising an array in this way it is possible to omit the array size and let the compiler work out the size itself. Editing the previous example:

int x[] = {1,2,3,4,5,6,7,8,9,10};

Once an array has been declared only individual elements can be assigned and accessed, not the array as a whole. Also, it is not possible to display an entire array in a single statement; only individual array elements can be displayed using the cout statement. For example, to display the array variable x declared in the previous example the following code is required:

for (int xind = 0; xind < 10; xind++)
    cout << x[xind] << " ";
cout << endl;
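As a further sketch (the function name sumOneToTen is hypothetical), the same loop pattern can be combined with sizeof so that the element count does not have to be typed by hand when the compiler works out the array size:

```cpp
// Add up the elements of an array whose size is deduced by the compiler.
int sumOneToTen()
{
  int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  int n = sizeof(x) / sizeof(x[0]);  // total bytes / bytes per element
  int total = 0;
  for (int xind = 0; xind < n; xind++)
    total += x[xind];
  return total;
}
```

The sizeof calculation only works where the array itself is declared; it does not give the element count inside a function that receives the array as an argument.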

Higher-Dimensional Arrays

Multi-dimensional arrays can also be defined in C++. For example, to declare a 2-D array:

char a[10][10]; // 10 by 10 array of chars
int b[2][2] = {{2, 3}, {1, 4}}; // 2x2 array of ints

The size of the second dimension is specified in a second set of square brackets after the first. If the elements are initialised in the declaration, nested sets of curly brackets are used, with one inner set per row.

Higher-dimensional arrays are accessed in the same way as 1-D arrays: the second array index is added in its own set of square brackets after the first, for example:

int d = b[0][0] * b[1][1] - b[0][1] * b[1][0];

Arrays and Pointers

Arrays in C++ are closely tied to pointers. For instance, the name of an array of integers can be used as a pointer to an integer, where the value pointed to is the first element of the array and the other elements are in contiguous memory locations following the first element.

Consider the following example:

int p1[3] = {1000, 500, 750};

The array variable p1 (which can be used as a pointer to an int) points to the area of memory containing the first element (1000). The second and third elements are contained in the areas of memory immediately following this first element:

p1 -> 1000
      500
      750

So any array element can be accessed by dereferencing the pointer and moving forward a certain number of blocks of memory. In fact, the square bracket notation is simply a short-hand way of doing this. The same concept is applied for 2-D arrays, for example:

int p2[2][3] = {{1000, 500, 750}, {100, 200, 300}};

In this 2-D case, the array is laid out in memory as follows:

p2 -> 1000
      500
      750
      100
      200
      300

This explains why the sizes of higher-dimensional arrays need to be known at compile-time. From looking at the above illustration for a 2-D array alone, we would not know if p2 was a 2 x 3 or a 3 x 2 array (or indeed a 1-D array of 6 elements). Therefore, this information needs to be specified in the array declaration, and is remembered by the compiler so that it can access the array elements correctly.
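The equivalence between square brackets and pointer arithmetic can be checked with a short sketch. The function name indexingMatchesPointers is hypothetical; the arrays are the ones used in this section:

```cpp
// Returns true if indexing and pointer arithmetic agree for every element.
bool indexingMatchesPointers()
{
  int p1[3] = {1000, 500, 750};
  int p2[2][3] = {{1000, 500, 750}, {100, 200, 300}};

  // 1-D: p1[i] is shorthand for *(p1 + i)
  for (int i = 0; i < 3; i++)
    if (p1[i] != *(p1 + i))
      return false;

  // 2-D: p2 + r moves forward r whole rows of 3 ints,
  // then + c moves forward c ints within that row
  for (int r = 0; r < 2; r++)
    for (int c = 0; c < 3; c++)
      if (p2[r][c] != *(*(p2 + r) + c))
        return false;

  return true;
}
```

The step p2 + r only works because the compiler knows each row holds 3 ints, which is why the second dimension has to be part of the declaration.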

Passing Arrays as Function Arguments

Arrays can be passed as arguments to functions, just like any other variable. However, all array arguments to functions are effectively passed by reference, because the function receives a pointer to the first element of the array rather than a copy of the array. The syntax for passing array arguments to functions is:

void func (int x[]) {...}

For 1-D arrays, the array size doesn’t need to be specified when defining the function header. But for 2-D arrays at least the second dimension needs to be specified, although both can be specified:

void function (int x[2][3]){...}
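A minimal sketch of both cases follows. The function names doubleAll and sumGrid are hypothetical, and the element count is passed as a separate argument because the function cannot work it out from the array argument alone:

```cpp
// Double every element of a 1-D array. The change is visible to the
// caller, because the function works on the caller's array, not a copy.
void doubleAll(int x[], int n)
{
  for (int i = 0; i < n; i++)
    x[i] *= 2;
}

// Sum an array with 3 columns; the second dimension must appear in the
// parameter type, while the number of rows is passed separately.
int sumGrid(int g[][3], int rows)
{
  int total = 0;
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < 3; c++)
      total += g[r][c];
  return total;
}
```

After int a[3] = {1, 2, 3}; doubleAll(a, 3); the array a holds 2, 4 and 6.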

The following example illustrates the passing of an array variable to a function. The code displays a frequency table of true positive (TP), false positive (FP), true negative (TN), false negative (FN) values.

#include <iostream>
#include <iomanip>
#include "freq_table."
using namespace std;

void dispFreqTable(int freq[2][2])
{
  int rsum[2] = {0,0}, csum[2] = {0,0}, tot = 0;
  cout << "         | GT +ve | GT -ve | Total"
       << endl;
  cout << "---------|--------|--------|------"
       << endl;
  for (int r = 0; r < 2; r++) {
      // first row holds test-positive counts, second row test-negative
      cout << (r == 0 ? "Test +ve |" : "Test -ve |");
      for (int c = 0; c < 2; c++) {
          cout << setw(8) << freq[r][c] << "|";
          rsum[r] += freq[r][c];
          csum[c] += freq[r][c];
          tot += freq[r][c];
      }
      cout << setw(6) << rsum[r] << endl;
  }
  cout << "---------|--------|--------|------"
       << endl;
  cout << "Total    |" << setw(8)
       << csum[0] << "|" << setw(8) << csum[1]
       << "|" << setw(6) << tot << endl;
}

In the above program the dispFreqTable function takes a 2 x 2 array of integers as an argument. The first 2 in the argument type is optional, as only the second array dimension is required in the argument type. The setw function sets the width of the output of the next item in a cout statement. To use setw, the <iomanip> header must be included with #include <iomanip>.

The string Library

An array of characters is known as a string. The most common way of using strings is to #include the standard <string> library. Consider the following program:

#include <iostream>
#include <string>
using namespace std;

int main()
{
  string greeting = "hello", name;
  cout << "What's your name? ";
  cin >> name; 
  string a = greeting + " " + name;
  cout << a << endl;
  int n = name.length();
  cout << "Your name has " << n << " letters"
       << endl;
  return 0;
}

The string library defines its own versions of built-in C++ operators that can take strings as arguments, i.e. it overloads them. Here are a few definitions from the string library:

  • = assignment
  • + string concatenation
  • >> input (e.g. from cin)
  • << output (e.g. to cout)
  • == != > < >= <= string comparison operators (performed by character)
  • getline gets a line of text from standard input

The above program also illustrates the use of a special function that is associated with a string variable: name.length(). Here, name is a string variable and the function call length() is appended to it, separated by a full stop. This function call returns the number of characters in name. These special functions are known as member functions. Other member functions can be called in the same way, using string variables:

  • find finds an instance of a substring within a string
  • replace replaces a substring within a string by another string

The following program demonstrates the use of the find and replace member functions with string variables:

#include <iostream>
#include <string>
using namespace std;

int main() 
{
  string str ("Brazil are the best team in the world.");
  
  // can search for a constant string
  size_t found = str.find("the");
  if (found != string::npos)
     cout << "'the' found at: " << found << endl;
 
  found = str.find("the", found + 1);
  if (found != string::npos)
     cout << "second 'the' found at: " 
          << found << endl;

  // can search for another string variable
  string str2 = "England";
  found = str.find(str2);
  if (found != string::npos)
     cout << "'England' found at: " << found << endl;

  // the same variable can be reused for another search
  str2 = "Brazil";
  found = str.find(str2);
  if (found != string::npos)
     cout << "'Brazil' found at: " << found << endl;
  
  // replace a substring with another string 
  str.replace(str.find(str2), str2.length(), "England");
  cout << str << endl;
  
  return 0;  
}

The find function takes the substring to search for, which can be another string variable or a string constant, and optionally a second argument giving the position at which to start searching. It returns the index at which the substring starts. The type of the returned value is size_t, which is an unsigned integer type guaranteed to be big enough to hold the size of any object in memory.

If the substring is not found (e.g. ‘England’ in the above program), find returns a special value from the string library called string::npos. This means the constant npos belonging to the string class. The :: symbol is called the scope operator.

The replace function, as used here, takes 3 arguments: the start index of the substring to be replaced, the number of characters to replace, and the string to replace them with. The output of the program when run would be:

'the' found at: 11
second 'the' found at: 28
'Brazil' found at: 0
England are the best team in the world.
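The comparison operators and getline from the list of string library definitions can be sketched as follows. The function names earlier and firstLine are hypothetical, and an istringstream (from <sstream>) stands in for keyboard input so that the example needs no user interaction:

```cpp
#include <sstream>
#include <string>
using namespace std;

// Return whichever string comes first when compared character by
// character, using the overloaded < operator.
string earlier(string a, string b)
{
  return (a < b) ? a : b;
}

// Read one whole line, spaces included, from a block of text.
string firstLine(string text)
{
  istringstream in(text);  // behaves like cin, but reads from text
  string line;
  getline(in, line);       // stops at the newline, not at spaces
  return line;
}
```

Note that the comparison is based on character codes, so upper-case letters sort before lower-case ones. Reading with >> instead of getline would stop at the first space, so a full name such as "John Smith" would need two reads.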

The struct statement

Structures are another way of defining new data types. Whereas array types are used for storing a collection of values of the same type, structures are used for storing a collection of values of different types. Each component of a structure is called a member.

Consider the following example for defining a data type to store information about patients:

struct PatientData {
  string firstName;
  string lastName;
  unsigned int age;
  double bloodPressure;
};

This defines a new data type, called PatientData. Variables of type PatientData contain four values: two strings, an unsigned int and a double. Variables can be declared of type PatientData just as they can for built-in C++ types, as the following code illustrates:

PatientData p1;
cout << "Enter patient's name (first last):";
cin >> p1.firstName >> p1.lastName;
cout << "Enter " << p1.firstName << " "
     << p1.lastName << "'s age:";
cin >> p1.age;
cout << "Enter " << p1.firstName << " "
     << p1.lastName << "'s blood pressure:";
cin >> p1.bloodPressure;

struct provides a great mechanism for creating new types that group data together.
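Structures can also be stored in arrays and passed to functions. The sketch below repeats the PatientData definition so that it is self-contained; the function meanBloodPressure is a hypothetical helper and assumes that n is greater than zero:

```cpp
#include <string>
using namespace std;

struct PatientData {
  string firstName;
  string lastName;
  unsigned int age;
  double bloodPressure;
};

// Mean blood pressure over the first n patients in an array.
// Assumes n > 0; members are reached with the dot operator.
double meanBloodPressure(const PatientData patients[], int n)
{
  double total = 0.0;
  for (int i = 0; i < n; i++)
    total += patients[i].bloodPressure;
  return total / n;
}
```

Grouping the values in one struct means a single array holds everything known about each patient, rather than four separate arrays that have to be kept in step.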

The enum statement

Enumeration types are another way of creating new data types. A variable that is declared as an enumeration type can take any one of a pre-defined number of symbolic values.

For example, the following code creates a new data type to store information about chess pieces:

enum ChessPiece {Pawn, Rook, Knight, Bishop, King,
                 Queen, Empty};
enum Colour {White, Black, None};
struct Square {
  ChessPiece piece;
  Colour colour;
};

Here we have defined 3 new data types:

  • ChessPiece variables can take any one of these symbolic values: Pawn Rook Knight Bishop King Queen Empty
  • Colour variables can take any of these symbolic values: White Black None
  • Square variables contain two values: a ChessPiece and a Colour

Based on these newly defined data types, we can then go on to declare a chess board, and to start to fill it up with pieces:

Square b[8][8];
b[0][0].piece = Rook;
b[0][0].colour = White;

The variable b represents the chess board, and is a 2-D (8 x 8) array of Square. We have initialised the square indexed by [0][0] to be a white rook. The use of enum types is good practice if the data is symbolic, i.e. categorical data that is non-numeric and has no ordering.
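Enumeration values are often handled with a switch statement. As a minimal sketch, the hypothetical function pieceLetter below maps each ChessPiece value to a display letter; the letters themselves are an assumption made for illustration:

```cpp
enum ChessPiece {Pawn, Rook, Knight, Bishop, King,
                 Queen, Empty};

// Map each symbolic value to a one-letter code for display.
char pieceLetter(ChessPiece p)
{
  switch (p) {
    case Pawn:   return 'P';
    case Rook:   return 'R';
    case Knight: return 'N';
    case Bishop: return 'B';
    case King:   return 'K';
    case Queen:  return 'Q';
    default:     return '.';  // Empty square
  }
}
```

Because the cases are written with the symbolic names, the code reads in terms of the problem domain instead of unexplained numbers.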

File Input/Output

Reading and writing data from and to external files occurs much in the same way with advanced data types as standard ones. Consider the following example program:

#include <iostream>
#include <fstream> 
using namespace std;

int main()
{  
  ifstream inFile;
  inFile.open("data.txt");
  if (!inFile) {
    cerr << "Error opening file: data.txt" << endl;
    return 1;
  } 
  int ages[5];
  for (int i = 0; i < 5; i++)
    inFile >> ages[i];
  inFile.close();
  return 0;
}

In the above program, #include <fstream> allows the program to use file input/output operations. Variables can then be declared of type ifstream (for input files) or ofstream (for output files).

All files (input or output) must be opened before use, with the open() function in this case. If the stream variable evaluates to false after the call to open(), the file was not successfully opened (e.g. it does not exist or is locked).

The ifstream variable, inFile, can be used like cin to input data. cerr is an alternative output statement that sends data to standard error rather than standard output. It is good practice to separate normal program output from error messages in this way. All files (input and output) should be closed after use.

Although not illustrated in the above example, the same principles apply to the ofstream variables just like cout. In addition there are a number of other file input/output functions that we can make use of:

  • get gets a single character from the input file
  • put puts a single character into the output file
  • getline gets an entire line from the input file, i.e. it will read all the data up to the next newline character (unlike >>, which stops at any space, tab or newline)

The following example shows the use of getline:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
  ifstream namesFile;
  namesFile.open("names.txt");
  if (!namesFile) {
    cerr << "Error opening file: names.txt"
         << endl;
    return 1;
  }
  string names[10];
  for (int i = 0; i < 10; i++)
    getline (namesFile, names[i]);
  namesFile.close();
  return 0;
}
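Output files work in the same way as input files. As a minimal sketch, assuming the caller chooses a file name that can be written to, the hypothetical functions writeAges and sumAges below write some values with an ofstream and read them back with an ifstream:

```cpp
#include <fstream>
#include <string>
using namespace std;

// Write n ages to a text file, one per line.
// Returns false if the file could not be opened.
bool writeAges(string fileName, const int ages[], int n)
{
  ofstream outFile;
  outFile.open(fileName.c_str());
  if (!outFile)
    return false;
  for (int i = 0; i < n; i++)
    outFile << ages[i] << endl;   // used just like cout
  outFile.close();
  return true;
}

// Read the ages back and return their sum,
// or -1 if the file could not be opened.
int sumAges(string fileName)
{
  ifstream inFile(fileName.c_str());  // open in the declaration
  if (!inFile)
    return -1;
  int age, total = 0;
  while (inFile >> age)               // stops when no more numbers
    total += age;
  inFile.close();
  return total;
}
```

The loop while (inFile >> age) reads until the file runs out of numbers, which avoids having to know in advance how many values the file contains.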



Written by Tobias Whetton

5
Object-Oriented Programming


Learning Objectives

  • Understand what is meant by the terms encapsulation and information hiding.
  • Define classes in C++.
  • Describe the role of constructors and destructors in classes, and use them.
  • Differentiate between inspectors, mutators and facilitators.
  • Explain the difference between classes and instances.
  • Describe the static and const keywords when applied to data members and member functions.
  • Make appropriate use of class composition.
  • Write class diagrams in UML (Unified Modelling Language).

Object-Oriented Design

The last few chapters focused on the procedural side of C++; the remaining chapters focus on the OOP side. To recap, in OOP languages, instead of writing sequences of instructions, the programmer defines objects with attributes and behaviours. The objects communicate with each other, sending data and requesting certain behaviours to be carried out. Recall that all OOP languages have the following three features:

  1. Encapsulation
  2. Inheritance
  3. Polymorphism

In this chapter the concept of encapsulation will primarily be explored.

Encapsulation

Encapsulation is the grouping together of data (i.e. attributes) and the code that operates on the data (i.e. methods/behaviours). However, it is better defined as a way of restricting access to attributes/methods. This concept is often referred to as information hiding.

Encapsulation/information hiding allows the programmer to separate the logical properties (i.e. what something does) from its implementation details (i.e. how it does it). Encapsulation allows us to break down complex problems into simpler sub-problems. By viewing a complex problem in terms of its logical properties we can specify its behaviour purely in terms of its inputs and outputs (i.e. the public interface), and temporarily ignore the details of the algorithms used to achieve its desired behaviour.

Consider a surgeon panning across a medical image. The surgeon doesn’t need to know how the software moves the image; they just need to see the result. The panning operation is specified only in terms of its inputs and outputs.

In OOP this encapsulation is implemented by using the class.

Classes

A class is a structure containing both:

  • data members (variables)
  • member functions (operations/behaviour)

Here is the important difference between OOP and procedural languages. In traditional procedural languages there is a clear division between data and the operations which process the data. In OOP this division is not so clear. Rather than having separate data and functions, we have objects, which consist of data together with operations on that data. In C++, we define these objects using classes.

Classes can be thought of as a special kind of data type: when a variable of such a type is declared, an object (or instance) of the class is created. Consider the following example program (contained in three source files), which stores information about, and solves, quadratic equations.

The first source file, quadratic.h contains the class definition:

quadratic.h
class quadratic {
// data members
private :
  float _a, _b, _c;
// member functions
public :
  void set(float a, float b, float c);
  int solve(float& r1, float& r2);
};

The class keyword is followed by the class name quadratic. The list of data members and member functions of the class then follows, enclosed within curly brackets { }. The public and private keywords denote whether the following data members and member functions are visible from outside the class. This is the way C++ implements encapsulation: by making a data member private, it cannot be accessed from outside the class.

Also notice how all the private data member names start with an underscore, e.g. _a. This is a good convention as it shows someone reading the code what is a private data member and what is not.

Function bodies for the member functions can be specified in the class definition, but more commonly (to make things easier to read), they are specified outside. In this example, they are stored in the second source file, quadratic.cpp:

quadratic.cpp
#include <math.h>
#include "quadratic.h"

void quadratic::set(float a, float b, float c)
// assign values of equation coefficients
{
  _a = a ;
  _b = b;
  _c = c;
}

int quadratic::solve(float& r1, float& r2)
// solve equation, putting roots in r1 and r2
// return value indicates real roots (return=0)
// or complex roots (return=1)
{
  float x = _b * _b - 4 * _a * _c;
  if (x < 0) // complex roots
    return 1;
  else { // real roots
    r1 = (-_b + sqrt(x)) / (2 * _a);
    r2 = (-_b - sqrt(x)) / (2 * _a);
    return 0;
  }
}

The :: symbol is the scope operator, which specifies that these function bodies belong to member functions of the quadratic class. Member functions can automatically access any data member of their class, even private ones.

The third source file is main.cpp:

main.cpp
#include <iostream>
#include "quadratic.h"
using namespace std;

int main ()
{
  quadratic q; // define quadratic instance
  float a, b, c, root1, root2;
  cout << "Enter quadratic equation coefficients a b c: ";
  cin >> a >> b >> c;
  q.set(a, b, c);
  cout << "Equation: " << a << "x^2 + " << b << "x + "
       << c << endl;
  if (q.solve(root1, root2) == 0)
    cout << "Roots = " << root1 << " and " << root2
         << endl;
  else
    cout << "Complex roots" << endl;
  return 0;
}

Note the difference between the class quadratic, and the object/instance q. There is one class definition for quadratic, but we could have many instances of this class. The terms class and instance/object are analogous to data type and variable when talking about ‘normal’ (i.e. non-class) data.

In this driver program, we create an instance of the quadratic class, tell it which coefficients to use (by calling the set member function), and then ask it to solve the equation (by calling the solve member function).

Note that the syntax for member function calls (i.e. separating the object name from the function call by a full stop .) is the same as that seen for the length() function in the string library in Chapter 4. In fact string is just a class like quadratic, with its own data members and member functions.

UML Class Diagrams

Classes can be represented graphically using notation known as UML. The UML class diagram for the quadratic class is shown below:

[Figure: UML class diagram for the quadratic class]

In UML, each class is represented by a rectangular box. The box is divided into three sections separated by horizontal lines:

  • Class Name, top section.
  • Details of the data members of the class, middle section. Before each member, + indicates public and - indicates private.
  • Details of member functions, bottom section. Here the return type of the member function is indicated after the colon symbol.

Types of Member Function

To introduce some further C++ concepts, the following program is a more complex example. This particular program creates a database of student information. A C++ class is defined to store and display the student information. In this simple example only two students are created in the database.

student.h
#include <string>
using namespace std;

enum FullPart {Fulltime, Parttime};

class Student {

// data members
private:
  string _firstName;
  string _lastName;
  FullPart _programme;
 
// member functions
public:
  Student();
  Student(string fn, string ln, FullPart p);
  ~Student();
  void setName(string fn, string ln);
  void setProg(FullPart p);
  FullPart getProg() const;
  void Print() const;
};

The Student class contains three private data members: an enumeration type and two strings. Notice that there are special member functions that have the same name as the class, Student. These are called constructors, and they are called whenever a new instance of a class is created.

Here we have two constructors, so we can say that they are overloaded. One of the overloaded constructors takes no arguments, and this is known as the default constructor. The other takes 3 arguments, which are assigned in the student.cpp source file below.

There is another special function called ~Student(), and this is called a destructor. However, unlike constructors, there can only be one destructor for a class and it never takes arguments. It is called whenever an instance is destroyed, e.g. when it goes out of scope and is discarded.

student.cpp
#include <iostream>
#include "student.h"
using namespace std;

// constructor
Student::Student() {
  cout << "Student Constructor 1" << endl;
  _firstName = " ";
  _lastName = " ";
  _programme = Fulltime;
}

// constructor
Student::Student(string fn, string ln, FullPart p) {
  cout << "Student Constructor 2" << endl;
  _firstName = fn;
  _lastName = ln;
  _programme = p;
}

// destructor
Student::~Student() {
  // nothing to clean up yet; just report the call
  cout << "Student Destructor" << endl;
}

// mutator
void Student::setName (string fn, string ln) {
  _firstName = fn;
  _lastName = ln;
}

// mutator
void Student::setProg (FullPart p) {  
  _programme = p; 
}

// inspector 
FullPart Student::getProg () const {  
  return _programme; 
}

// facilitator
void Student::Print () const {  
  cout << "----------" << endl;
  cout << "Name: " << _firstName
       << " " << _lastName << endl;
  if (_programme == Fulltime)
     cout << "Full-time\n";
  else
     cout << "Part-time\n";
  cout << "---------" << endl;
}

Note that a class automatically has a default constructor if the programmer defines no constructors at all. However, if a constructor is defined with arguments as above, the default constructor is no longer generated automatically, and must be explicitly defined by the programmer if it is needed.

Other member functions of Student can be categorised as:

  • Inspectors: report the value of a data member (usually a private one)
  • Mutators: change or set the value of a data member (mutators are often necessary if we have private data members, as access to them is not permitted from outside the class definition)
  • Facilitators: cause an instance to perform some action or service.

Also notice that Print() and getProg() have the keyword const after them. This tells the C++ compiler that these functions will not change the values of any data members; if they attempt to, a compilation error results. This isn’t strictly necessary, but it helps to catch bugs that involve accidentally changed data members.

In the main() function below, instances of the Student class are created in two ways: either by defining initial values, Student s ("John", "Smith", Fulltime);, or not, Student s2;. One of the two overloaded constructors is called when a new instance is created, depending on whether the initial values are specified or not.
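Mutators are also a natural place to protect private data. The sketch below uses a hypothetical HeartRate class, and the range limits are assumptions chosen only for illustration. The mutator rejects implausible values, and the inspector is marked const because it changes nothing:

```cpp
// Hypothetical class: a heart rate kept within a plausible range.
class HeartRate {
private:
  int _bpm;
public:
  // default constructor: start from a resting value
  HeartRate() { _bpm = 60; }

  // mutator: refuses values outside the assumed range, so _bpm
  // can never hold an implausible value
  bool set(int bpm) {
    if (bpm < 20 || bpm > 250)
      return false;
    _bpm = bpm;
    return true;
  }

  // inspector: const, because it does not change any data member
  int get() const { return _bpm; }
};
```

Because _bpm is private, the only way to change it from outside the class is through set, so the check cannot be bypassed.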

main.cpp
#include <iostream>
#include <string>
#include "student.h"
using namespace std;

int main() {
  Student s ("John", "Smith", Fulltime);
  s.Print();
  
  Student s2;
  s2.Print();
  s2.setName ("Josephine", "Bloggs");
  s2.setProg (Parttime);
  s2.Print();
  
  return 0;
}
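The order in which constructors and destructors run can be made visible with a small sketch. The Tracer class and the eventLog vector are hypothetical names introduced for illustration; each constructor and destructor call appends a message to the log:

```cpp
#include <string>
#include <vector>
using namespace std;

vector<string> eventLog;   // records the order of events

class Tracer {
private:
  string _name;
public:
  Tracer(string name) {
    _name = name;
    eventLog.push_back("construct " + _name);
  }
  ~Tracer() {
    eventLog.push_back("destroy " + _name);
  }
};

void runScopes()
{
  Tracer a("a");
  {
    Tracer b("b");   // b lives only inside these braces
  }                  // b goes out of scope: its destructor runs here
  eventLog.push_back("after inner block");
}                    // a goes out of scope: its destructor runs here
```

After a call to runScopes(), eventLog holds, in order: construct a, construct b, destroy b, after inner block, destroy a.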

Static Data Members and Member Functions

In C++ classes, all objects/instances of a particular class have their own copies of data members. However, by making a data member static, only one copy of the data member is stored for all instances, irrespective of how many objects of the class are created. All objects access the same copy. Static data members must be defined and initialised outside the class definition, using the class scope operator ::.

To illustrate a static data member, let’s modify the previous example program by adding the possibility of storing a count of the total number of students in the database. Each time a new Student object is created/destroyed, one is added to/subtracted from the count. This is accomplished by defining an extra private static data member called _count in the Student class. This is incremented in the Student constructors and decremented in the Student destructor.

student.h count modification
...
class Student {

// data members
private:
  ...
  static int _count;

// member functions
public:
  ...
  static int Count() {return _count;}
};

The new inspector member function static int Count() {return _count;} is marked as static. A static member function can be called even if there are no instances of a class yet. This is necessary for inspectors of static data members. The corresponding function body for Count() is defined inside the class definition. This is always allowed even for normal member functions, but if we have too many long function bodies it can make the class definition hard to read.

student.cpp count modification
int Student::_count = 0;
...

// constructor
Student::Student() {
  ...
  _count += 1;
}

// constructor
Student::Student(string fn, string ln, FullPart p) {
   ...
   _count += 1;
}

// destructor
Student::~Student() {
  ...
  _count -= 1;
}

Normally, private data members are not accessible outside of the class. Defining and initialising a static data member is an exception to this rule (in this case the private data member _count).

main.cpp count modification
int main ()
{
   ...
   cout << "No. students = " << Student::Count()
         << endl;
   ...
}

When accessed from outside of the class definition, static data members and member functions must be accessed using the scope operator ::.
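The same idea can be tried in isolation with a self-contained sketch. The class name Counted and the function countWhileTwoAlive are hypothetical; the pattern is the one used for the Student count:

```cpp
class Counted {
private:
  static int _count;            // one copy shared by every instance
public:
  Counted()  { _count += 1; }   // a new instance has been created
  ~Counted() { _count -= 1; }   // an instance has been destroyed
  static int Count() { return _count; }
};

int Counted::_count = 0;        // definition outside the class

// Returns the count observed while two instances are alive.
int countWhileTwoAlive()
{
  Counted a, b;
  return Counted::Count();
}
```

Counted::Count() can be called before any instance exists, in which case it returns 0. Inside countWhileTwoAlive() it returns 2, and once a and b have gone out of scope it returns 0 again.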

Class Composition

All the examples in this chapter have used single classes. However, often more complex C++ programs will involve defining multiple classes. Each class represents an object in the problem domain and these objects may have relationships between them. The next example illustrates such a case.

This program defines classes to store and manipulate 2-D vectors. A vector consists of two points: the start point of the vector and the end point. A point is represented by its two co-ordinates.

vector.h
class Point {
private:
  float _x, _y;
public:
  Point() {_x = _y = 0;} // constructor
  ~Point() {};           // destructor
  void Set(float x, float y) {_x = x; _y = y;}
  float GetX() const {return _x;}
  float GetY() const {return _y;}
};

class Vector {
private:
  Point _start, _end;
public:
  // constructors
  Vector() {};
  Vector(Point s, Point e) {_start = s; _end = e;}
  // destructor 
  ~Vector() {};
  void Set(Point s, Point e) {_start = s; _end = e;}
  Point GetStart() const {return _start;}
  Point GetEnd() const {return _end;}
  void Print() const;
  float DotProd(Vector v2) const;
}; 

Point _start, _end; is a prime example of class composition, where an instance of one class is a data member of another class. Class composition is appropriate for a particular type of relationship in which one class consists of, or has, the other. In this case, Vector consists of two Points.

In Vector(Point s, Point e) {_start = s; _end = e;}, each assignment copies all data members from one Point instance to the other.

vector.cpp
#include <iostream>
#include "vector.h"
using namespace std;

void Vector::Print() const {
  cout << "(" << _start.GetX() << ", " << _start.GetY()
       << ")->(" << _end.GetX() << ", " << _end.GetY()
       << ")" << endl;
}

float Vector::DotProd(Vector v2) const {
  float x = _end.GetX() - _start.GetX();
  float y = _end.GetY() - _start.GetY();
  float x2 = v2.GetEnd().GetX() - v2.GetStart().GetX();
  float y2 = v2.GetEnd().GetY() - v2.GetStart().GetY();
  return (x * x2 + y * y2);
}

The following main() function illustrates the use of these classes:

main.cpp
#include <iostream>
#include "vector.h"
using namespace std;

int main() {
  Point p1, p2, p3;
  p1.Set(0,0);
  p2.Set(3,2);
  p3.Set(2,4);
  Vector v1(p1,p2);
  Vector v2(p1,p3);
  cout << "v1:" << endl;
  v1.Print();
  cout << "v2:" << endl;
  v2.Print();
  cout << "v1 . v2: " << v1.DotProd(v2) << endl;
  
  Vector v3 = v1;
  cout << "v3:" << endl;
  v3.Print();
  
  return 0;
}

Note that when a Point argument is supplied to a Vector constructor (Vector v1(p1,p2);), it is passed by value, i.e. a copy of the Point is made to create the new Vector instance.

A UML Class diagram illustrating the relationship between the Point and Vector classes is shown below:

[Figure: UML class diagram showing the composition relationship between Vector and Point]

The line between Vector and Point indicates a relationship between the classes, and the filled diamond indicates that it is a composition relationship.

Copy Constructors

In the previous point/vector example, when an instance was initialised from another instance (Vector v3 = v1;), it was actually invoking the default copy constructor. A copy constructor for a class is a constructor that takes a reference to another instance of the same class as its argument. It is called whenever an instance is created using another instance of the class as an argument, or if an instance is declared and assigned in a single statement. If we don’t define a copy constructor, a default copy constructor will be created for us, which will copy all corresponding data members from one instance to the other.

A copy constructor can be defined if specific behaviour is desired, as the code below illustrates:

vector.h copy constructor defined
class Point {
...
public:
Point(const Point& p) {
  _x = p.GetX();
  _y = p.GetY();
  cout << "Copy constructor ..." << endl;
}
...

Note that in this case the new copy constructor behaves like the default copy constructor and simply copies all data members; the only difference is that it also prints a message.

The copy constructor is like a normal Point constructor except that its only argument has the type const Point&. This is called a const reference type.
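To make the rules above concrete, here is a minimal sketch that counts how often a copy constructor runs. The Point class is a simplified stand-in for the one in this chapter, and the static copies counter and the helper functions SumByValue, SumByReference and CountCopies are hypothetical names added purely for illustration.

```cpp
// Simplified stand-in for the chapter's Point class (an assumption for
// illustration); it counts how many times the copy constructor runs.
class Point {
  public:
    Point(float x = 0, float y = 0) : _x(x), _y(y) {}
    Point(const Point& p) : _x(p._x), _y(p._y) { ++copies; }
    float GetX() const { return _x; }
    float GetY() const { return _y; }
    static int copies;  // hypothetical counter, not in the original class
  private:
    float _x, _y;
};

int Point::copies = 0;

// Pass-by-value: the argument is copied into p.
float SumByValue(Point p) { return p.GetX() + p.GetY(); }

// Pass-by-const-reference: no copy is made.
float SumByReference(const Point& p) { return p.GetX() + p.GetY(); }

// Returns the number of copies made by the statements below.
int CountCopies() {
  Point::copies = 0;
  Point a(1, 2);
  Point b = a;        // declared and initialised in one statement: copy
  Point c(a);         // created from another instance: copy
  SumByValue(a);      // pass-by-value: copy
  SumByReference(a);  // const reference: no copy
  b = c;              // plain assignment: assignment operator, not a copy constructor
  return Point::copies;
}
```

Only the three marked statements invoke the copy constructor; assigning to an instance that already exists uses the assignment operator instead.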



Written by Tobias Whetton

6
Inheritance

Learning Objectives

  • Represent inheritance hierarchies in UML and C++ and decide when to use inheritance and when to use composition
  • Explain how to use public, protected and private inheritance for code reuse in C++
  • Make good use of constructor chaining in C++ inheritance hierarchies
  • Explain the meaning of the OOP terms polymorphism and dynamic binding and describe how they are implemented in C++
  • Make use of abstract base classes and (pure) virtual functions in C++ inheritance hierarchies
  • Make use of member function hiding in C++ inheritance hierarchies
  • Define multiple inheritance hierarchies in C++ and use virtual base classes when appropriate.

Implementing Inheritance

By using inheritance, we can cause a class to inherit data members and/or member functions from another class. In C++, we refer to the class:

  • inheriting from as the base class or super class
  • inheriting to as the derived class or sub class

As an introductory example, let’s define a single class to store information and perform calculations about different geometric shapes.

shape.h
class Shape {
  public:
    Shape() {};
    float GetArea() { return _area; }
    float GetPerimeter() { return _perimeter; }
  private:
    float _area;
    float _perimeter;
};

The Shape class has one default constructor (Shape() {};) and two public functions, as well as two private data members. As this class stands, there is no way of computing the area or perimeter of the shape, since this detail depends on the specific shape being represented. Therefore, this class is a ‘general’ class for describing shapes and we probably wouldn’t want to define any instances of it as it stands.

Now suppose that we want to define classes for specific shapes, such as:

  • A circle with a radius but also an area and a perimeter (i.e. circumference)
  • A square with a side length but also an area and a perimeter

Therefore, two of the data members are the same for both circles and squares. Rather than duplicating the code, it can be reused by inheriting from Shape when defining the two new classes, Square and Circle. The code can be modified as follows:

shape.h with additions
class Shape {
  public:
    Shape() {};
    float GetArea() const { return _area; }
    float GetPerimeter() const 
    { return _perimeter; }
  protected:
    float _area;
    float _perimeter;
};

class Circle: public Shape {
  public:
    Circle(float r=0) {_radius = r;}
    void ComputeArea();
    void ComputePerimeter();
  protected:
    float _radius;
};

class Square: public Shape {
  public:
    Square(float s) {_side = s;}
    void ComputeArea();
    void ComputePerimeter();
  protected:
    float _side;
};

The line class Circle: public Shape is the location where inheritance is defined for the Circle class. In essence it means “the Circle class publicly derives from the Shape class”. Inheritance is indicated by the colon character : after the derived class name. The same applies to the Square class.

Note that the two data members in Shape are now protected rather than private. The difference is that private members of a base class, although still part of every derived object, cannot be accessed by the derived class’s own member functions, whereas protected members can. Apart from this they are identical: neither can be accessed from outside the class hierarchy.
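The listing above only declares ComputeArea() and ComputePerimeter(). As a sketch of how Circle might implement them, and of why the data members need to be protected, consider the following. The constant kPi and the zero-initialising Shape constructor are assumptions added for illustration.

```cpp
const float kPi = 3.14159265f;  // assumed constant, not part of the original listing

class Shape {
  public:
    Shape() : _area(0), _perimeter(0) {}
    float GetArea() const { return _area; }
    float GetPerimeter() const { return _perimeter; }
  protected:
    float _area;       // accessible to derived classes
    float _perimeter;
};

class Circle : public Shape {
  public:
    Circle(float r = 0) { _radius = r; }
    // Allowed because _area and _perimeter are protected in Shape.
    // If they were private, these two functions would not compile.
    void ComputeArea() { _area = kPi * _radius * _radius; }
    void ComputePerimeter() { _perimeter = 2 * kPi * _radius; }
  protected:
    float _radius;
};
```

Code outside the hierarchy still has to go through GetArea() and GetPerimeter(), so the data members remain hidden from the rest of the program.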

Types of Inheritance

The public keyword in the inheritance means that we are using public inheritance. Alternatives to public inheritance are private or protected inheritance, but these are not widely used. The precise definitions of the different types of inheritance are:

  • Public inheritance keeps the public members of the base class public, and the protected members protected, in the derived class
  • Protected inheritance makes the public and protected members of the base class protected in the derived class
  • Private inheritance makes the public and protected members of the base class private in the derived class.
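The three kinds of inheritance can be compared with a small sketch. The class names Base, PublicDerived, ProtectedDerived and PrivateDerived, and the Read() helper, are hypothetical and exist only to show what is accessible where.

```cpp
class Base {
  public:
    int Value() const { return 42; }
};

// Public inheritance: Value() stays public in the derived class.
class PublicDerived : public Base {};

// Protected inheritance: Value() becomes protected, so it can only be
// used inside the class (and classes deriving from it).
class ProtectedDerived : protected Base {
  public:
    int Read() const { return Value(); }
};

// Private inheritance: Value() becomes private in the derived class.
class PrivateDerived : private Base {
  public:
    int Read() const { return Value(); }
};

// PublicDerived p;     p.Value();   // compiles
// ProtectedDerived q;  q.Value();   // would not compile: Value() is protected here
// PrivateDerived r;    r.Value();   // would not compile: Value() is private here
```

In each case the derived class itself can still call Value(); what changes is whether code outside the class can.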

UML Notation for Inheritance

Inheritance can be represented using UML class diagrams. In the example below, the inheritance from Circle and Square to Shape is indicated by arrows joining the derived class to the base class.

UML DIAGRAM

Public and protected members contained in the base class, although not shown in the derived classes, are available to those classes. A # symbol is used to denote protected members in UML (recall that + denotes public and - denotes private).

Constructor Chaining

Let’s consider another example, which contains classes for storing information about doctors and consultants at a hospital. The gender, name, title and basic salary information needs to be stored about doctors. For consultants, the same information needs to be stored, plus their specialism. There is a clear relationship between doctors and consultants as they share four of the same data members. Rather than duplicating code in the two classes, inheritance can be used to reuse code.

doctor.h
enum GenderType {Male, Female};

class Doctor {
  public:
    Doctor(GenderType g, string n)
    {
      _gender = g;
      _name = n;
      _title = "Dr";
      _basicSalary = 30000;
    }
    void Display () const
    {
      cout << _title << "  " << _name << endl;
      if (_gender == Male)
        cout << "Male" << endl;
      else
        cout << "Female" << endl;
      cout << "Basic salary = "
           << _basicSalary << endl;
    }
  protected:
    GenderType _gender;
    string _name;
    string _title;
    float _basicSalary;
};

In this example, the following Consultant class inherits (or derives) from Doctor using public inheritance.

consultant.h
class Consultant : public Doctor
{
  public:
    Consultant(GenderType g, string n, 
               string s): Doctor(g, n)
    {
      _specialism = s;
      if (_gender == Male)
          _title = "Mr";
      else
          _title = "Ms";
      _basicSalary = 70000;
     }
     void Display() const
     {
       Doctor::Display();
       cout << "Specialism = "
            << _specialism << endl;
     }
  protected:
    string _specialism;
};  

The new concept of constructor chaining is illustrated above: Consultant( ... ) : Doctor( ... ). After the Consultant constructor function header, there is a colon followed by a call to the Doctor constructor. This causes the base class (i.e. Doctor) constructor to be called whenever the derived class (i.e. Consultant) constructor is called, with the base class constructor running first. In other words, the calls to the class constructors are ‘chained’ up the inheritance hierarchy. Constructor chaining happens automatically for default constructors (i.e. those with no arguments), but if we want it to happen for other constructors (i.e. those with arguments) we must explicitly define constructor chaining like this.
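The order in which chained constructors run can be observed with a short sketch. The classes Base and Derived, the log parameter and the ConstructionOrder() helper are hypothetical; each constructor simply appends its name to a string.

```cpp
#include <string>

class Base {
  public:
    Base(std::string& log, int id) : _id(id) { log += "Base "; }
    int GetId() const { return _id; }
  protected:
    int _id;
};

class Derived : public Base {
  public:
    // Chain explicitly to the Base constructor that takes arguments.
    Derived(std::string& log, int id) : Base(log, id) { log += "Derived"; }
};

// Builds a Derived instance and returns the order in which the
// constructors ran.
std::string ConstructionOrder() {
  std::string log;
  Derived d(log, 7);
  return log;
}
```

Because Base has no default constructor here, leaving out the : Base(log, id) part would be a compilation error, which is the situation the Consultant constructor avoids by chaining to Doctor(g, n).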

The below UML diagram shows the relationship between the Doctor and Consultant classes.

UML DIAGRAM

Member Function Overriding

The earlier example also illustrated another new concept called member function overriding or member function hiding. Notice that there are definitions for the Display() member function in both the Doctor class and the Consultant class. Normally, the Doctor Display() function would be inherited by Consultant, but in this case it would be incorrect as it doesn’t display the consultant’s _specialism. Therefore Consultant defines its own version of Display(), which hides the base class version.

Note that we can still call the base class version from the derived class version using the scope operator :: (Doctor::Display();).
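A stripped-down sketch of the same idea is given below. These Doctor and Consultant classes are simplified stand-ins for the ones above, and Describe() is a hypothetical function that returns a string instead of printing, so that the result is easy to inspect.

```cpp
#include <string>

class Doctor {
  public:
    std::string Describe() const { return "Doctor"; }
};

class Consultant : public Doctor {
  public:
    // Hides Doctor::Describe(); the base class version is reused via ::
    std::string Describe() const {
      return Doctor::Describe() + " (Consultant)";
    }
};
```

The hidden base class version is also still reachable from outside the class by qualifying the call, e.g. c.Doctor::Describe() for a Consultant instance c.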

Abstract Base Classes and Concrete Classes

In this chapter there have been two different examples of inheritance: from Circle/Square to Shape, and from Consultant to Doctor. Although the use of inheritance is clearly appropriate in both of these cases, they are slightly different. In the Consultant/Doctor example, it is possible, even likely, that we will want to create instances of both Consultant and Doctor, since they are both types of clinician at the hospital. However, for the Shape example we will never want to create an instance of the Shape class: it only exists so that we can derive real, specific shapes from it. Based on this distinction, let’s introduce a couple of new terms:

  • Abstract base class only exists to derive new classes from it (the Shape class is an example of an abstract base class)
  • Concrete class exists to create instances of itself (the Square, Circle, Doctor and Consultant classes are all examples of concrete classes)

Virtual Functions

An inheritance hierarchy can be thought of as defining a type/subtype relationship between classes, e.g. a circle is a type of shape. A virtual function defines a type dependent operation within an inheritance hierarchy.

To illustrate the use of virtual functions, let’s modify the shape example introduced earlier:

shape.h modified
class Shape {
  public: 
    Shape() {};
    float GetArea() const { return _area; }
    float GetPerimeter() const
    { return _perimeter; }
    // pure virtual functions
    virtual void ComputeArea() = 0;
    virtual void ComputePerimeter() = 0;
  protected:
    float _area;
    float _perimeter;
};

...

The keyword virtual specifies that the ComputeArea() and ComputePerimeter() functions are virtual functions, which represent type dependent operations in the inheritance hierarchy. In fact, these two functions are pure virtual functions. A pure virtual function is specified by adding an assignment to zero after the virtual function signature. A pure virtual function, as well as being a type dependent operation, means that the class that contains it will be an abstract base class, i.e. it will not be possible to create an instance of it. Attempting to create an instance of it will result in a compilation error.

In addition, all classes that derive from a class containing a pure virtual function will also be abstract base classes unless they provide an implementation of the virtual function. In the shapes example, it is now a requirement for both Circle and Square to implement ComputeArea() and ComputePerimeter(), otherwise it will not be possible to create any Circle or Square instances.
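As a sketch of what this requirement looks like in practice, the Square class below implements both pure virtual functions and therefore becomes a concrete class. The virtual destructor and the zero-initialising Shape constructor are additions for illustration and are not part of the original listing.

```cpp
class Shape {
  public:
    Shape() : _area(0), _perimeter(0) {}
    virtual ~Shape() {}  // added here; sensible for any polymorphic base class
    float GetArea() const { return _area; }
    float GetPerimeter() const { return _perimeter; }
    // pure virtual functions
    virtual void ComputeArea() = 0;
    virtual void ComputePerimeter() = 0;
  protected:
    float _area;
    float _perimeter;
};

class Square : public Shape {
  public:
    Square(float s) { _side = s; }
    // Implementing both pure virtual functions makes Square concrete.
    void ComputeArea() { _area = _side * _side; }
    void ComputePerimeter() { _perimeter = 4 * _side; }
  protected:
    float _side;
};

// Shape s;  // would not compile: Shape is an abstract base class
```

If Square left out either of the two functions, it would itself be abstract and a declaration such as Square sq(2); would be rejected by the compiler.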

The below UML diagram shows how the Shape.h relationships change now we have defined pure virtual functions. Displaying the member function names in italics in Shape indicates that these are virtual functions. Furthermore, the text <<abstract>> above the class name indicates that this is an abstract base class.

UML DIAGRAM

Virtual functions are one way in which the concept of polymorphism is implemented in C++.

Overriding Virtual vs. Non-Virtual

Virtual functions provide a way of specifying different implementations for a single member function at different points in an inheritance hierarchy. However, similar functionality can be produced by overriding a member function without using the virtual keyword, i.e. if we override an ‘ordinary’ (non-virtual) base class function in the derived class it will hide the base class function.

For example, two implementations of an inheritance hierarchy are shown below, the first using virtual functions and the second using member function overriding. In both implementations an instance of the base or derived class will call its ‘own’ version of the function fn.

class base {
  ...
  public:
    virtual float fn();
};

class derived : public base {
  ...
  public:
    float fn();
};

When a non-virtual member function is overridden the choice of which implementation from the inheritance hierarchy to use is always made at compile-time. With virtual functions this choice can be made at run-time.

class base {
  ...
  public:
    float fn();
};

class derived : public base {
  ...
  public:
    float fn();
};

The choice of which implementation to use is known as binding. If binding is made at run-time, it is known as dynamic binding. Dynamic binding is used whenever a virtual member function is called through a pointer (or a reference) to a base class. The following code illustrates the use of dynamic binding:

Shape *sq = new Square(4.5);
Shape *c = new Circle(2.7);
sq->ComputeArea();
c->ComputeArea();

Note that -> is shorthand for dereferencing a pointer then selecting a member, e.g. the following are the same: (*sq).ComputeArea() and sq->ComputeArea().

Here, the pointers sq and c both have a static (i.e. compile-time) type of Shape*. However, since Square and Circle both derive from Shape, the dynamic type of sq and c could be either Shape*, Square* or Circle*. Precisely which one of these three possibilities they are will not be decided until run-time. In other words, binding of sq and c is dynamic.

Dynamic binding is only used if the virtual function is accessed through a pointer (or a reference), as in the example above. If sq and c had types Square and Circle rather than Shape*, then dynamic binding would not be used: their type would be fully known at compile-time, so no dynamic binding would be necessary.
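The difference between the two kinds of binding shows up most clearly when the same object is used through a base class pointer. In the sketch below, the classes Base and Derived and the functions VirtualName(), PlainName() and CallThroughBasePointer() are hypothetical names chosen for illustration.

```cpp
#include <string>

class Base {
  public:
    virtual ~Base() {}
    virtual std::string VirtualName() const { return "Base"; }
    std::string PlainName() const { return "Base"; }  // non-virtual
};

class Derived : public Base {
  public:
    std::string VirtualName() const { return "Derived"; }  // overrides (dynamic binding)
    std::string PlainName() const { return "Derived"; }    // hides (static binding)
};

// Calls both functions through a Base pointer that points at a Derived.
std::string CallThroughBasePointer() {
  Derived d;
  Base* p = &d;
  // VirtualName() is chosen at run-time from the dynamic type (Derived);
  // PlainName() is chosen at compile-time from the static type (Base).
  return p->VirtualName() + "/" + p->PlainName();
}
```

The virtual call reaches the Derived version while the non-virtual call reaches the Base version, so the function returns "Derived/Base".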

Polymorphism in C++ is related to binding and can be split up into two separate terms:

  • Run-time polymorphism is where the choice of operation is made at run-time and involves dynamic binding (e.g. virtual functions)
  • Static polymorphism is where the choice of operation is made at compile-time (e.g. member function overriding)

Inheritance vs Composition

Composition refers to making one class a data member of another class. Inheritance also results in the members of one class (the base class) being made available to another class (the derived class). All the examples in this chapter could have been implemented using composition rather than inheritance, by including a data member of type Shape in the Circle and Square classes, or by adding a Doctor member to the Consultant class.

To decide between using inheritance and composition the following rules can be applied. If the relationship between two classes can be described as an:

  • is-a relationship, then it is best to use inheritance.
  • has-a relationship, then it is best to use composition.

For example, in the shapes example, a Circle is a Shape, so inheritance is the best option. With the vectors example, however, we cannot say that a Vector is a Point, but we can say that a Vector has a Point, so in this case composition is the appropriate mechanism to use.

Single and Multiple Inheritance

In all of the examples so far, each class has only inherited from a single base class. This is known as single inheritance. In C++ it is permissible to inherit from multiple base classes: this is referred to as multiple inheritance.

To illustrate this concept of multiple inheritance, let’s use a student database program example, which stores information about both overseas and UK students. First look at the UML class diagram below. There are three base classes: UKCitizen, Student and OverseasCitizen. Each has its own data members and corresponding mutator and inspector member functions. The UKStudent class derives from both Student and UKCitizen, and OverseasStudent derives from both Student and OverseasCitizen. This is logical since a UK student is both a student and a UK citizen, and an overseas student is both a student and an overseas citizen.

UML DIAGRAM

The Student class has a pure virtual function called Fees(): all students must pay fees, but the way the fees are computed varies. Therefore, Student is an abstract base class. However, UKCitizen and OverseasCitizen are not - they are normal base classes. An excerpt from the code that implements this inheritance hierarchy is shown below:

Student.h
class OverseasStudent: public OverseasCitizen,
                       public Student { 
...
};

class UKStudent: public UKCitizen,
                 public Student { 
...
};

Multiple inheritance is specified by simply listing multiple base classes when inheriting, and separating them by commas. Using this inheritance hierarchy is an efficient way to represent the different types of information stored about a student in the database, and permits good code reuse.
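Since the excerpt above elides the class bodies, here is a minimal sketch of how one branch of the hierarchy might be filled in. The data members _course and _passportNumber, their mutator and inspector functions, and the fee amount are all hypothetical placeholders; only the class names and the pure virtual Fees() function come from the description above.

```cpp
#include <string>

// Abstract base class: every student pays fees, computed per subtype.
class Student {
  public:
    virtual ~Student() {}
    virtual float Fees() const = 0;
    void SetCourse(const std::string& c) { _course = c; }
    std::string GetCourse() const { return _course; }
  protected:
    std::string _course;  // hypothetical data member
};

// Ordinary (concrete) base class.
class UKCitizen {
  public:
    void SetPassportNumber(const std::string& n) { _passportNumber = n; }
    std::string GetPassportNumber() const { return _passportNumber; }
  protected:
    std::string _passportNumber;  // hypothetical data member
};

// Inherits the members of both base classes.
class UKStudent : public UKCitizen, public Student {
  public:
    float Fees() const { return 1000.0f; }  // arbitrary placeholder amount
};
```

A UKStudent instance can then use the member functions of both base classes directly, while still being usable through a Student pointer wherever fees need to be computed polymorphically.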

Virtual Base Classes

Sometimes multiple inheritance can cause problems. This can be illustrated by the following example, which contains a number of classes to represent different types of animal.

Look at the UML diagram below. There is a base class called Animal, which contains a protected data member called _expectedLifeSpan. Two other classes derive from Animal: Predator and Endangered. Both of these classes inherit the _expectedLifeSpan data member. The SnowLeopard class multiply inherits from both Predator and Endangered, since snow leopards are endangered predators. Therefore, SnowLeopard would normally inherit two data members called _expectedLifeSpan. This would make any reference to _expectedLifeSpan in SnowLeopard ambiguous, and cause a compilation error.

UML CLASS DIAGRAM

The answer to this problem lies in the use of a virtual base class. If we need to define a hierarchy with such ambiguous inheritance we can define the root base class as a virtual base class, and then only one copy of each multiply-inherited member will be created. To make Animal a virtual base class we simply add the keyword virtual when defining inheritance from it. For example, consider the code below:

Animal.h
class Animal
{
  public:
    Animal() {};
    int GetExpectedLifeSpan()
    { return _expectedLifeSpan; }
    void SetExpectedLifeSpan(int val)
    { _expectedLifeSpan = val; }
  protected:
    int _expectedLifeSpan;
};

class Predator: public virtual Animal
{
  public:
    Predator() {};
    string GetMainPrey()
    { return _mainPrey; }
    void SetMainPrey(string val)
    { _mainPrey = val; }
  protected:
    string _mainPrey;
};

class Endangered: public virtual Animal
{
  public:
    Endangered() {};
    int GetNumLeft() { return _numLeft; }
    void SetNumLeft(int val) { _numLeft = val; }
  protected:
    int _numLeft;
};

class SnowLeopard: public Endangered,
                   public Predator
{
  public:
    SnowLeopard() {};
};

Note that virtual is only added where Predator and Endangered inherit from the Animal base class, not where SnowLeopard inherits from Endangered and Predator.

This inheritance problem will only occur if the inheritance hierarchy forms a ‘diamond-like’ shape. Fortunately such cases are quite rare, but it is good to know about this potential problem.
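One way to convince ourselves that only a single copy of the member exists is to write through one branch of the diamond and read through the other. The sketch below uses cut-down versions of the classes above; the helper SharedLifeSpan() is a hypothetical name added for illustration.

```cpp
class Animal {
  public:
    Animal() : _expectedLifeSpan(0) {}
    int GetExpectedLifeSpan() const { return _expectedLifeSpan; }
    void SetExpectedLifeSpan(int val) { _expectedLifeSpan = val; }
  protected:
    int _expectedLifeSpan;
};

class Predator : public virtual Animal {};
class Endangered : public virtual Animal {};
class SnowLeopard : public Endangered, public Predator {};

// Sets the life span via the Predator part of the object and reads it
// back via the Endangered part. With a virtual base class both parts
// share the same Animal sub-object, so the value read matches the value set.
int SharedLifeSpan(int years) {
  SnowLeopard s;
  Predator& asPredator = s;
  Endangered& asEndangered = s;
  asPredator.SetExpectedLifeSpan(years);
  return asEndangered.GetExpectedLifeSpan();
}
```

Without the virtual keyword, each branch would carry its own Animal sub-object, and a direct call such as s.SetExpectedLifeSpan(12) on a SnowLeopard would be rejected as ambiguous.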



Written by Tobias Whetton

7
Object-Oriented Design

Learning Objectives

  • Explain the basic steps involved in an object-oriented design process
  • Explain the difference between is-a, has-a and uses-a relationships and explain how to implement each type
  • Perform an object-oriented design given a set of test requirements and produce as a result an appropriate UML class diagram
  • Implement a UML class diagram

Object-Oriented Design Process

The typical object-oriented design process can be divided into four main stages:

  1. Identify the objects
  2. Determine the relationships between the objects
  3. Determine the attributes/behaviours of the objects
  4. Design the driver

Each of these stages is described in the following sections.

1. Identify the Objects

A key stage in object-oriented design is to identify objects given a program specification, or a description of the problem domain. The programmer should work out what things the program will need to deal with and note down potential classes, as well as identifying whether certain objects will allow the program to accomplish some of its objectives.

2. Determine the Relationships

Once a candidate list of potential objects has been identified, the relationships that link these objects can be thought about. There are three main types of relationship:

  • is a: if B is a subtype of A (e.g. a UK student is a student)
  • has a: if A has B and B cannot exist without A. This can also be thought of as A owns B, (e.g. the two real numbers in a complex number do not have any meaning/use outside of the complex number, so this is a has a relationship)
  • uses a: if A has B and B can exist without A (i.e. B’s existence does not depend on A’s existence)

To illustrate a uses a relationship, consider a theoretical program which represents information about simultaneous linear equations of the form ax + by = c, for example:

2x + y = 7
3x - y = 8

In this problem domain, all the equations have an order (an integer). It should be possible to display all equations. The linear equations should have three floating point coefficients (e.g. 2, 1 and 7 for the first equation shown above). Simultaneous equations should consist of 2 linear equations. It should be possible to display such a set of simultaneous linear equations in the form shown above, and also to solve the equations, i.e. display the values of x and y which satisfy the equations. A UML diagram which satisfies the conditions of this problem would look like so:

UML DIAGRAM

In this UML diagram a simultaneous equation has a linear equation (in fact it has two). To implement this has a relationship, composition is used. However, a simultaneous equation could instead use a linear equation, if we want to allow the program to handle more than just the scenario given.

Aggregation is used to implement a uses a relationship. Aggregation is similar to composition, but rather than making one class a data member of the other, we make a pointer to an instance of the contained class a data member of the containing class. The pointer typically points to an instance of the contained class that already exists outside of the scope of the containing class. Therefore, if the containing class is destroyed, the contained class can continue to exist.

To show the distinction between composition and aggregation consider the following two alternative implementations of the simultaneous equations problem domain.

equation.h using composition
class SimultaneousEquations
{
  public:
    SimultaneousEquations(){}
    void SetEquations(LinearEquation e1,
                      LinearEquation e2)
    { _e1 = e1; _e2 = e2; }
    void Display()
    { _e1.Display(); _e2.Display(); }
    void Solve();
  protected:
    LinearEquation _e1, _e2;
};

equation.h using aggregation
class SimultaneousEquations
{
  public:
    SimultaneousEquations(){}
    void SetEquations(LinearEquation *e1,
                      LinearEquation *e2)
    { _e1 = e1; _e2 = e2; }
    void Display()
    { _e1->Display(); _e2->Display(); }
    void Solve();
  protected:
    LinearEquation *_e1, *_e2;
};

The main difference between these two implementations is that when using composition the type of the data members is LinearEquation, whereas when using aggregation it is LinearEquation*. This difference means that, with aggregation, a SimultaneousEquations instance does not have a LinearEquation; it uses one that must already exist outside of the SimultaneousEquations class. In other words, the existence of the LinearEquation instances does not depend upon any SimultaneousEquations instance of which they are part.

For example, consider the following excerpt from the main() function that makes use of these classes when implemented using aggregation:

SimultaneousEquations s;
LinearEquation l1, l2;
l1.SetCoeffs(2.0, 1.0, 7.0);
l2.SetCoeffs(3.0, -1.0, 8.0);
s.SetEquations(&l1, &l2);
s.Display();
s.Solve();

SimultaneousEquations s2;
LinearEquation l3;
l3.SetCoeffs(2.0, 4.0, -5.0);
s2.SetEquations(&l1, &l3);
s2.Display();
s2.Solve();

Note that the & operator is used when passing the LinearEquation instances into the SimultaneousEquations instances. This is because the SetEquations member function expects pointer arguments.

The l1 instance of LinearEquation is used in both the s and s2 instances of SimultaneousEquations. This means that any change to l1 will result in a change to both s and s2. If composition was used instead, then copies of l1 would have been made and passed into s and s2 instances, so any subsequent changes to l1 would not affect s and s2.
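The contrast described above can be reproduced with two small container classes. In this sketch LinearEquation is reduced to the minimum needed, and OwnsEquation, UsesEquation and FirstCoeff() are hypothetical names standing in for the composition and aggregation versions of SimultaneousEquations.

```cpp
class LinearEquation {
  public:
    LinearEquation() : _a(0), _b(0), _c(0) {}
    void SetCoeffs(float a, float b, float c) { _a = a; _b = b; _c = c; }
    float GetA() const { return _a; }
  private:
    float _a, _b, _c;
};

// Composition: the container stores its own copy of the equation.
class OwnsEquation {
  public:
    void SetEquation(LinearEquation e) { _e = e; }
    float FirstCoeff() const { return _e.GetA(); }
  private:
    LinearEquation _e;
};

// Aggregation: the container stores a pointer to an equation that
// lives elsewhere and may be shared with other containers.
class UsesEquation {
  public:
    UsesEquation() : _e(nullptr) {}
    void SetEquation(LinearEquation* e) { _e = e; }
    float FirstCoeff() const { return _e->GetA(); }  // assumes SetEquation was called
  private:
    LinearEquation* _e;
};
```

If an equation is handed to both containers and its coefficients are changed afterwards, UsesEquation sees the new values whereas OwnsEquation keeps the values it copied. The price of aggregation is that the pointed-to equation must outlive every container that uses it.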

UML CLASS DIAGRAM

Notice in the above UML class diagram that aggregation is represented with an unfilled diamond, compared to the earlier composition, which is represented by a filled diamond.

3. Identify the Attributes and Behaviours

The interactions between objects are defined by their public interfaces, i.e. their corresponding sets of public attributes and behaviours. The public interface of an object should be carefully considered, including whether each member should be available to the entire program, restricted to the class hierarchy or restricted to the individual class itself. To illustrate the process of identifying attributes and behaviours, let’s return to the simultaneous equations example again.

The attributes are fairly straightforward, and were mentioned when the problem was set up. By logical thinking about the problem, the following behaviours can be identified:

  • Solve() a public member function of SimultaneousEquations
  • GetA(), GetB() and GetC(), public inspector functions in LinearEquation that provide access to the coefficients of the equation.
  • Display() a public type dependent member function in the Equation inheritance hierarchy
  • Display() a public member function in the SimultaneousEquations class

Note that candidate objects can be eliminated whilst identifying attributes/behaviours.
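To show how the inspector functions and the solving behaviour might fit together, here is a sketch that solves a pair of equations using Cramer’s rule. It assumes each equation has the form a*x + b*y = c, and it is written as a free function with a hypothetical signature rather than as the Solve() member function of SimultaneousEquations, whose actual implementation is not given in this chapter.

```cpp
#include <cmath>

class LinearEquation {
  public:
    LinearEquation() : _a(0), _b(0), _c(0) {}
    void SetCoeffs(float a, float b, float c) { _a = a; _b = b; _c = c; }
    float GetA() const { return _a; }
    float GetB() const { return _b; }
    float GetC() const { return _c; }
  private:
    float _a, _b, _c;  // assumed form: _a * x + _b * y = _c
};

// Solves the pair of equations by Cramer's rule. Returns false when the
// determinant is (nearly) zero, i.e. when there is no unique solution;
// in that case x and y are left unchanged.
bool Solve(const LinearEquation& e1, const LinearEquation& e2,
           float& x, float& y) {
  float det = e1.GetA() * e2.GetB() - e2.GetA() * e1.GetB();
  if (std::fabs(det) < 1e-6f) return false;
  x = (e1.GetC() * e2.GetB() - e2.GetC() * e1.GetB()) / det;
  y = (e1.GetA() * e2.GetC() - e2.GetA() * e1.GetC()) / det;
  return true;
}
```

With the coefficients used in the main() excerpt earlier, (2, 1, 7) and (3, -1, 8), the determinant is -5 and the function yields x = 3 and y = 1. Note that Solve() only needs the public inspector functions, which is one reason for including them in the public interface.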

4. Design the Driver

The driver can be thought of as the glue that binds the objects together, or the main algorithm of the program that makes use of the objects. In C++ the driver algorithm of the program corresponds to the main() function.

UML Notation

BASIC UML DIAGRAM

The basic format of class boxes in UML is illustrated below. Remember <<abstract>> above the class names indicates that the class is an abstract base class. All other classes will be concrete classes. The symbol to the left of each data member or member function indicates its visibility, i.e.

  • + public
  • # protected
  • - private

UML DIAGRAM

The scope of attributes and behaviours can also be indicated using UML. static data members are common to all instances of the class (i.e. they have class scope), whereas non-static members are specific to an instance of a class (i.e. they have instance scope). In UML terminology, C++ static members are known as classifier members, and non-static members are known as instance members. The UML notation for classifier members is to underline them, as illustrated below.

UML DIAGRAM
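In C++ terms, the distinction between classifier and instance members looks like the following sketch. The Counter class and its members are hypothetical and are only meant to show one member of each kind.

```cpp
class Counter {
  public:
    Counter() : _id(++_count) {}
    int GetId() const { return _id; }         // instance scope
    static int GetCount() { return _count; }  // class scope: no instance needed
  private:
    int _id;            // instance member: one per Counter
    static int _count;  // classifier member: shared by all Counters
};

// A static data member is defined once, outside the class.
int Counter::_count = 0;
```

Each new Counter increments the shared _count and stores the result as its own _id, so Counter::GetCount() reports how many instances have been created while GetId() differs from one instance to the next.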

UML notation for generalisation relationships is an arrow on the line from the derived class to the base class. The notation for has a (i.e. composition) relationships is a filled diamond on the line from the containing class to the contained class, with the diamond being at the containing class end. The notation for uses a (i.e. aggregation) relationships is an unfilled diamond instead of a filled diamond. Finally, for a virtual function (a type dependent operation in an inheritance hierarchy) the UML notation is italics.



Written by Tobias Whetton

8
Operator Overloading

Learning Objectives

  • Define overloaded operator functions in C++
  • Make an appropriate decision about whether an overloaded operator function should be global or a member function
  • Make good use of friends in C++
  • Explain in what situations the assignment operator should be overloaded in C++, and make good use of the ‘this’ pointer when implementing an overloaded assignment operator

Definition of Overloading

Overloading is where multiple functions are defined with the same name but have different prototypes.

There are some pre-defined operators for common basic data types. For example, the simple arithmetic operators (+, -, etc.) are defined for the int, float and char data types. Providing multiple ‘overloaded’ implementations of these basic operators is known as operator overloading. These basic operators are in many ways similar to functions as they have names, take arguments and return values. However, the way in which they are applied differs. If a C++ function called plus is written to perform addition, it would be used as follows:

x = plus (y, 2);

However, using the built-in operator + the same operation can be performed as follows:

x = y + 2;

In other words, the operator symbol + appears between two arguments, instead of before them. The + is a binary infix operator, as it takes two arguments and the operator appears in between them. The + - * / operators are all binary infix operators. The ++ and -- operators are unary operators as they take a single argument. They can be used either as prefix (before the argument) or postfix (after the argument) operators.
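The difference between the prefix and postfix forms is easiest to see from the value each expression produces. The two function names below are hypothetical; both start from x = 5.

```cpp
// Prefix: x is incremented first, then its new value is used.
int PrefixResult() {
  int x = 5;
  int y = ++x;  // x becomes 6, y receives 6
  return y;
}

// Postfix: the old value of x is used, then x is incremented.
int PostfixResult() {
  int x = 5;
  int y = x++;  // y receives 5, x becomes 6
  return y;
}
```

In both cases x ends up as 6; only the value handed to y differs.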

These built-in operators can be overloaded, but by default they are only defined for use with basic data types, i.e. int, char, bool, float, long int, short and double. If a new class is defined, then it will not be possible to use these operators with instances of the new class until they have been overloaded for it.

Suppose, for example, that a new class Rational is created to store information about and perform calculations with rational numbers (i.e. numbers that can be written as a ratio of two integers, e.g. 3/7). With this new class it would be convenient if the built-in arithmetic operators could be used to perform calculations, as in the following:

Rational r1 (2, 5); // define rational number 2/5
Rational r2 (5, 7); // define rational number 5/7
Rational r3;
r3 = r1 * r2; // use built-in multiply operator

However as the built-in operators are only defined for the basic data types, this isn’t possible. Operator overloading is the C++ mechanism allowing the use of built-in operators for classes that have been created from scratch.

Operator Overloading Implementation

To illustrate the use of operator overloading in C++, let’s consider the following scenario.

Real, rational and complex numbers are all types of number in the domain of numbers. All numbers can be negated but precisely what negation means is dependent on the type of number. Similarly, it should be possible to display all numbers but exactly how they are displayed will vary. It should not be possible to create an instance of a number, only real, rational or complex numbers. A real number is represented by a single floating point value. A complex number comprises two real numbers, one for the real (i.e. non-imaginary) part and one for the imaginary part. You should be able to compute the conjugate of a complex number (i.e. just negate the imaginary part). A rational number consists of two integers, one for the numerator and one for the denominator.

First let’s draw a UML class diagram of this scenario, using the OO design steps mentioned in the last chapter:

UML DIAGRAM

Using the UML class diagram, the following header file for rational numbers can be written:

Rational.h
class Rational : public Number 
{
  public:
    Rational () {}
    int GetN() const { return _n; } 
    void SetN(int val) { _n = val; } 
    int GetD() const { return _d; }
    void SetD(int val) { _d = val; }
    void Negate () { _n = -_n; }
    void Display () const
    {cout << _n << " / " << _d;}
  protected:
    int _n; 
    int _d;
};

In order to be able to use the basic arithmetic operators such as * with instances of the Rational class we must overload them so that they can take Rational instances as arguments. The following code, which can be added to the rational.h file, achieves this aim.

Rational operator* (const Rational& r1, const Rational& r2)
{
   Rational ret; 
   int n = r1.GetN()*r2.GetN();
   int d = r1.GetD()*r2.GetD();
   ret.SetN(n);
   ret.SetD(d);
   return ret;
}

See how the new overloaded operator function is not a member function of Rational (although normally it can be as we’ll see later). The name of an operator function is the word operator followed by the symbol for the operator we want to overload, e.g. operator+ for the addition operator, operator[] for the array subscript operator, etc.

The overloaded operator function returns a single value of type Rational. Recall that the * operator is a binary operator: therefore, the overloaded operator* function takes two arguments, both of type Rational. Since the arguments should not be modified as a result of the operation, both are specified as const arguments.
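
As a minimal sketch of the overloaded operator in use, the following version of Rational drops the Number base class and adds a two-argument constructor. Both changes are assumptions made only so that the example compiles on its own, and demo_multiply is a hypothetical helper, not part of the original header.

```cpp
#include <iostream>

// Simplified stand-in for the Rational class above: no Number base
// class, plus a two-argument constructor added for convenience.
class Rational
{
  public:
    Rational() : _n(0), _d(1) {}
    Rational(int n, int d) : _n(n), _d(d) {}
    int GetN() const { return _n; }
    void SetN(int val) { _n = val; }
    int GetD() const { return _d; }
    void SetD(int val) { _d = val; }
  protected:
    int _n;
    int _d;
};

// Global (non-member) overload, as in the text: multiply the
// numerators and the denominators. The result is not reduced.
Rational operator* (const Rational& r1, const Rational& r2)
{
  Rational ret;
  ret.SetN(r1.GetN() * r2.GetN());
  ret.SetD(r1.GetD() * r2.GetD());
  return ret;
}

// Hypothetical demonstration function.
void demo_multiply()
{
  Rational a(1, 2), b(3, 4);
  Rational c = a * b; // equivalent to operator*(a, b)
  std::cout << c.GetN() << " / " << c.GetD() << std::endl;
}
```

Calling demo_multiply() from main() displays 3 / 8, since the numerators and denominators are multiplied separately.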

Reference Types

In the previous example both arguments to operator* were defined as pass-by-reference parameters i.e. using the & symbol. When appended to a type (e.g. Rational&), the & symbol creates a new type called “a reference to” the type. A reference type is like a pointer except that it doesn’t need to be dereferenced to access the value pointed to. Also, when a reference type is const it means that we can’t change the object itself (for a const pointer it’s just the pointer that can’t be changed).

In Rational.h we could have defined both arguments as being of type Rational rather than const Rational&. A const reference type was used purely for efficiency reasons but this is very common when overloading operators. If the arguments had a Rational type, then copies of the Rational instances would be created each time the * operator was used; when using a const reference type only the reference is passed but still the value pointed to cannot change.
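
The differences between passing by value, by reference, by pointer and by const reference can be sketched with plain int arguments. The function names below are hypothetical and chosen only for illustration.

```cpp
// Pass by value: the function works on a copy, so the caller's
// variable is unchanged.
void add_one_by_value(int v) { v = v + 1; }

// Pass by reference: r is another name for the caller's variable,
// and no dereferencing is needed.
void add_one_by_reference(int& r) { r = r + 1; }

// Pass by pointer: the address is passed and must be dereferenced.
void add_one_by_pointer(int* p) { *p = *p + 1; }

// Pass by const reference: no copy is made, but the function may
// only read the value (writing "r = 0;" here would not compile).
int twice(const int& r) { return 2 * r; }
```

For an int there is little to gain from a const reference, but for a class instance such as Rational it avoids creating a copy on every call.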

Friends

In the earlier example, the overloaded operator function had to use the inspector functions provided by the Rational class (i.e. GetN() and GetD()) to access its data members. In this simple example, this is ok but for more complex classes this may become cumbersome. Also, because the overloaded operator function is specific to the Rational class, it seems natural that it should have access to private and protected members. As the implementation stands this is not possible. However, C++ provides a mechanism for such access: the friend keyword. Consider the following modified implementation of the Rational class:

rational.h
class Rational : public Number 
{
  public:
    Rational () {}
    int GetN() const { return _n; } 
    void SetN(int val) { _n = val; } 
    int GetD() const { return _d; }
    void SetD(int val) { _d = val; }
    void Negate () { _n = -_n; }
    void Display () const
    {cout << _n << " / " << _d;}
    friend Rational operator* (const Rational& r1,
                               const Rational& r2);
  protected:
    int _n; 
    int _d;
};

Rational operator* (const Rational& r1, const Rational& r2)
{
   Rational ret; 
   int n = r1._n*r2._n;
   int d = r1._d*r2._d;
   ret.SetN(n);
   ret.SetD(d);
   return ret;
}

Adding the prototype of the operator* function inside the Rational class definition, preceded by the friend keyword, gives the operator* function all of the access privileges that come with being a class member. Therefore when accessing the data members of Rational (e.g. int n = r1._n*r2._n;), they can be referred to directly rather than using the inspector and mutator functions.

Note: in the case of the * operator it would have been possible to implement the operator overloading by making the operator* function a member function. More on this later.

Overloading the Output Operator

Another example of operator overloading is overloading the output operator (i.e. <<). Consider the following addition to the previous example code:

rational.h addition
ostream& operator<< (ostream& os, const Rational& r)
{
  os << r._n << "/" << r._d;
  return os;
} 

Note that because operator<< accesses the data members of Rational directly, it will need to be made a friend of Rational.

The overloaded << operator function takes two arguments: an ostream reference and an instance of the class (which is a const reference again for efficiency reasons). cout is actually an instance of ostream: recall that << is a binary infix operator and when used to send data to standard output its left-hand argument should be cout; the right-hand argument should be the data sent. The ostream reference argument can be used just like cout in the function body of operator<<. Now the Rational class can be used as follows:

Rational r;
...
cout << r;

For this use of <<, the input arguments are cout (an ostream instance) and r (a Rational instance). The operator<< function returns a reference to the same ostream, which means it is possible to chain multiple << operators in the same statement, e.g.

cout << "Number: " << r << endl;

Note that ofstream is derived from ostream, so the overloaded << can also be used for output to external files.
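
Because the operator is written against ostream, it works with any stream derived from it. The sketch below uses ostringstream (an in-memory stream, chosen here instead of ofstream so that no file is written) and assumes a simplified Rational without the Number base class. The helper to_text is hypothetical.

```cpp
#include <iostream>
#include <sstream>
#include <string>

class Rational
{
  public:
    Rational(int n, int d) : _n(n), _d(d) {}
    // friend declaration gives operator<< access to _n and _d
    friend std::ostream& operator<< (std::ostream& os,
                                     const Rational& r);
  private:
    int _n;
    int _d;
};

std::ostream& operator<< (std::ostream& os, const Rational& r)
{
  os << r._n << "/" << r._d;
  return os; // returning the stream allows chaining
}

// Hypothetical helper: formats a Rational into a string by sending
// it to an in-memory output stream.
std::string to_text(const Rational& r)
{
  std::ostringstream out;
  out << "Number: " << r; // chained use of <<
  return out.str();
}
```

With these definitions, to_text(Rational(3, 8)) produces the string "Number: 3/8".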

Overloading the Assignment Operator

The assignment operator is already defined for classes that are defined by the programmer. For example, it is possible to use the assignment operator on instances of the Rational class as defined above. The result of this built-in assignment operator is to perform member-by-member assignment of all data members of the class. Therefore, so long as the assignment operation is defined for all of these data members we normally don’t need to overload the assignment operator ourselves.

Such member-by-member assignment is sometimes not appropriate for classes that have one or more pointers as data members, because only the pointers themselves are copied, not the values they point to. If this is not the desired behaviour, then the assignment operator must be overloaded to implement the required behaviour. Let’s look at the Rational class once more:

Rational.h
class Rational {
...
  public:
    Rational& operator= (const Rational &r)
    {
      // do the copy
      _d = r._d;
      _n = r._n;
      
      // return the existing object
      return *this;
    }
...
};

The overloaded assignment operator Rational& operator= (const Rational &r) { ... } is a member function of the Rational class. Note that the assignment operator is a binary operator (i.e. it takes two arguments), and that it has to change the value of one of these arguments (the one on the left-hand side of the assignment).

However, although assignment is a binary operator the operator= function only takes a single argument. This is because, when an overloaded operator function is a member function of the class, the left-hand argument is always the class instance itself. Therefore, the number of arguments to the overloaded operator function should be reduced by one. The single argument of the operator= function represents the right-hand side of the assignment operation, in this case an instance of the Rational class (a const reference again).

The This Pointer

In the last example, the function body returned a special variable called this, a built-in C++ pointer that is available in every non-static member function and points to the current instance of the class. In most cases we don’t need to use it: when we are in a member function of a class and refer to another member (i.e. a data member or member function), the name automatically refers to the one in the current instance.

However, when overloading the assignment operator we need to return the current instance as the return value of the overloaded operator function (this is a pointer so it needs dereferencing using the * symbol). Returning the current instance is required because an assignment operation does return a value; this is what makes it possible to chain assignments, e.g. x = y = 10. Therefore, the this pointer is needed to access the current instance.
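
The effect of returning *this can be sketched with a simplified Rational (no Number base class, and a two-argument constructor added as an assumption). The helper set_both is hypothetical.

```cpp
class Rational
{
  public:
    Rational() : _n(0), _d(1) {}
    Rational(int n, int d) : _n(n), _d(d) {}
    int GetN() const { return _n; }
    int GetD() const { return _d; }
    Rational& operator= (const Rational& r)
    {
      _n = r._n;
      _d = r._d;
      return *this; // dereference this to return the instance itself
    }
  private:
    int _n;
    int _d;
};

// Hypothetical helper: the chained assignment is evaluated as
// a = (b = value), which only works because operator= returns
// the instance that was assigned to.
void set_both(Rational& a, Rational& b, const Rational& value)
{
  a = b = value;
}
```

If operator= returned void instead, the inner assignment b = value would leave nothing for the outer assignment to use, and the chained statement would not compile.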

Assignment Operator vs Copy Constructor

A copy constructor is a class constructor that takes another instance of the same class as its argument. This is similar to the use of overloaded assignment operators, but there is a key difference. The code below illustrates this:

Rational r1, r2, r3;
...
Rational r4 = r1; // copy constructor used
r3 = r1;          // overloaded assignment used

The copy constructor is used when an instance is created and assigned to at the same time. The overloaded assignment operator is used when an existing instance is assigned to.

Note that both the copy constructor and the overloaded assignment operator are useful in the same situation, i.e. when a “deep copy” of pointer data members is needed (i.e. the objects pointed to are copied rather than the pointers themselves).
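
One way to see which of the two functions is called is to count the calls. The Tracker class and its static counters below are hypothetical, introduced only for this sketch.

```cpp
class Tracker
{
  public:
    static int copies;      // number of copy constructor calls
    static int assignments; // number of operator= calls
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker& operator= (const Tracker&)
    {
      ++assignments;
      return *this;
    }
};
int Tracker::copies = 0;
int Tracker::assignments = 0;

// Hypothetical demonstration function.
void copy_vs_assign_demo()
{
  Tracker t1;
  Tracker t2 = t1; // new instance: copy constructor
  Tracker t3(t1);  // new instance: copy constructor
  t3 = t2;         // existing instance: assignment operator
}
```

After one call to copy_vs_assign_demo(), Tracker::copies is 2 and Tracker::assignments is 1, even though two of the three statements use the = symbol.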

Overloading a Unary Operator

All the operator overloading examples we have seen so far have overloaded binary operators, i.e. those that take two arguments. For instance, for the * operator the overloaded function took two Rational arguments, and for the << operator it took an ostream and a Rational argument. The code outline below shows how to overload the unary ++ operator:

class Rational {
...
  public:
    Rational& operator++() // prefix
    {
      _n += _d;
      return *this;
    }
...
};

The overloaded operator function is a member function of the class, as it makes changes to its data members. The function takes no arguments as it just changes the current instance and returns the same instance with the this pointer.

List of Overloadable Operators

The following table summarises many of the operators that can be overloaded in C++, together with their normal (i.e. non-overloaded) meanings.

Operator Normal Meaning
+ Addition
- Subtraction
* Multiplication
/ Division
++ Increment
-- Decrement
= Assignment
() Function Call
[] Array Subscript
-> Indirect member
% Modulus
|| Logical OR
| Bitwise OR
&& Logical AND
! Logical NOT
!= Not Equal to
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
+= Add to
-= Subtract from
*= Multiply by
/= Divide by

Global or Member Functions

Most operators can be either global or member functions of a class. However, one rule is that if it is a member function, then the left-hand operand of the operator must be an instance of the class. For example, in the code shown below r1 is the left-hand operand and r2 is the right-hand operand.

Rational r1, r2;
...
r1 = r2;

As the left-hand operand (i.e. r1) is Rational, it is permitted for the overloaded assignment operator to be a member function. Similarly, the overloaded * operator in the previous example could have been a member function because its left-hand operand was also Rational.
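
As a sketch of that alternative, the * operator could be written as a member function along the following lines. The simplified Rational (no Number base class, two-argument constructor) is an assumption made so the example is self-contained.

```cpp
class Rational
{
  public:
    Rational() : _n(0), _d(1) {}
    Rational(int n, int d) : _n(n), _d(d) {}
    int GetN() const { return _n; }
    int GetD() const { return _d; }
    // Member version: the left-hand operand is the current instance,
    // so only the right-hand operand appears as a parameter. The
    // function is const because it does not change either operand.
    Rational operator* (const Rational& rhs) const
    {
      return Rational(_n * rhs._n, _d * rhs._d);
    }
  private:
    int _n;
    int _d;
};
```

With this version, the expression a * b is treated as a.operator*(b), so the data members of both operands are accessible without the friend keyword.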

However if the left-hand operand is not of the class type (such as operator<< in which the left-hand operand is ostream&), the operator must be overloaded as a global function. In this case it should be a friend of the class if it needs access to private or protected members.

A second rule is that the assignment (=), subscript ([]), call (()), and member selection (->) operators must be overloaded as member functions. Although most operators can be overloaded as either global or member functions, it is typical (and good practice) to make an overloaded operator function a member function if it changes the data members of the class. Otherwise it should be global.

Overloaded Operator Function Prototype

Once we have decided whether to make the overloaded operator function global or a member function, the next stage is to form its prototype (i.e. decide how many arguments it should have and of which type, and which return type it should have). The following are useful guidelines to forming the overloaded function’s prototype:

Member Functions

If it’s a member function, the left-hand operand becomes the current instance (i.e. *this). All other operands become function arguments. Therefore, we should define one fewer argument to the overloaded operator function, e.g. if it’s a binary operator, we should define one argument representing the right-hand operand. And if it’s a unary operator, we should define no arguments at all.

Operator Function Prototype
= Type& operator= (const Type&)
+= Type& operator+= (const Type&)
-= Type& operator-= (const Type&)
*= Type& operator*= (const Type&)
/= Type& operator/= (const Type&)
++ (prefix) Type& operator++ ()
-- (prefix) Type& operator-- ()
[] Type& operator[] (int)

Global Functions

If it’s a global function, for binary operators we should define two arguments to the overloaded operator function (as in the * example). If it’s a unary operator we should define one argument.

Operator Function Prototype
+ Type operator+ (const Type&, const Type&)
- Type operator- (const Type&, const Type&)
* Type operator* (const Type&, const Type&)
/ Type operator/ (const Type&, const Type&)
<< ostream& operator<< (ostream&, const Type&)
>> istream& operator>> (istream&, Type&)
== bool operator== (const Type&, const Type&)
> bool operator> (const Type&, const Type&)
>= bool operator>= (const Type&, const Type&)
< bool operator< (const Type&, const Type&)
<= bool operator<= (const Type&, const Type&)
! bool operator! (const Type&)
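
As an illustration of the comparison prototypes, the sketch below defines == and < as global functions for a simplified Rational. The class is a cut-down assumption (no Number base class), the operators return bool, and the comparisons assume positive denominators.

```cpp
class Rational
{
  public:
    Rational(int n, int d) : _n(n), _d(d) {}
    int GetN() const { return _n; }
    int GetD() const { return _d; }
  private:
    int _n;
    int _d;
};

// a/b == c/d exactly when a*d == c*b (cross-multiplication), so
// 1/2 and 2/4 compare equal without reducing either fraction.
bool operator== (const Rational& r1, const Rational& r2)
{
  return r1.GetN() * r2.GetD() == r2.GetN() * r1.GetD();
}

// a/b < c/d exactly when a*d < c*b, provided b and d are positive.
bool operator< (const Rational& r1, const Rational& r2)
{
  return r1.GetN() * r2.GetD() < r2.GetN() * r1.GetD();
}
```

Neither operator changes its operands, which is why both are written as global functions taking const references.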

String Library Revisited

The <string> header is not part of the core C++ language; it just provides a predefined class, which uses operator overloading to make it easy to use. For example, consider the following uses of `string` objects:

string s;
s = "hello";             // overloading assignment operator
cout << s << endl;       // overloading << operator
string s2 = s + "world"; // overloading + operator
char c = s[2];           // overloading [] operator

Each of the above statements uses an overloaded operator function that has already been defined in the string library. This is a good illustration of the point that operator overloading can be seen as part of information hiding.



Written by Tobias Whetton

9
Templates

Learning Objectives:

  • Explain the advantages of using function and class templates.
  • Make good use of function and class templates.
  • Explain the different types of template parameter, and make good use of them when defining C++ templates.
  • Explain the relationship between C++ templates and the object-oriented concept of polymorphism.

Introduction to Templates

Templates are used when a number of functions or classes are almost identical, except that they utilise different data types. They are used to make code shorter and easier to understand. A single template function or class can be written, which is then instantiated to use a number of different data types. There are two types of template: function templates and class templates.

Function Templates

Function templates allow us to define a single function that can be instantiated many times for different types. To introduce the concept of function templates, consider the ‘maximum’ example shown below.

max.h
template <class t>
t find_max(t a, t b) {
  if (a > b)
    return a;
  else
    return b;
}

To start with, the function signature t find_max(t a, t b) is preceded by template <class t>, which tells the compiler that this is a templated function. The identifier t is a type template parameter: in other words, t will be treated as a data type inside the function, but which type it is has not been specified yet.

The type t can be any type; the only restriction is that the comparison operator > must be defined for the type, since this is used in the function body. The > operator is already defined for all of the simple data types; for a class type it would have to be overloaded.
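
To sketch what this restriction means for a class type, the example below repeats find_max and applies it to a hypothetical Length class. Length, its Metres() inspector and the overloaded > operator are all inventions for this illustration.

```cpp
template <class t>
t find_max(t a, t b) {
  if (a > b)
    return a;
  else
    return b;
}

// Hypothetical class used only to illustrate the restriction.
class Length
{
  public:
    explicit Length(double metres) : _metres(metres) {}
    double Metres() const { return _metres; }
  private:
    double _metres;
};

// Without this overload, find_max could not be instantiated for
// Length because the expression a > b would not compile.
bool operator> (const Length& l1, const Length& l2)
{
  return l1.Metres() > l2.Metres();
}
```

With the overload in place, find_max(Length(1.5), Length(2.5)) returns the Length holding 2.5, using the same template that handles int and char.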

Now let’s examine the main() function that calls the find_max function:

main()
#include <iostream>
#include "max.h"
using namespace std;

int main () 
{
  // find maximum of two integers
  int x, y;
  x = 12;
  y = 11;
  cout << "Maximum of " << x << " and " << y
       << " is " << find_max(x, y) << endl;
   
  // find maximum of two chars
  char p, q;
  p = 'w';
  q = 'c';
  cout << "Maximum of " << p << " and " << q
       << " is " << find_max(p, q) << endl;
  return 0;
}

To make use of the templated function all we need is to #include the max.h file. Only the find_max implementations for int and char will be instantiated at compile time.

Class Templates

Class templates allow us to define a single templated class that can be instantiated for different data types. To illustrate this concept consider the following example which implements a class for representing pairs of numbers, including a member function that returns the maximum of the two numbers.

pair.h
template <class p>
class Pair{
  public:
    Pair(p val1, p val2) {
      _values[0] = val1;
      _values[1] = val2;
    }
    p getMax();
  private:
    p _values[2];
};

Note that the syntax for specifying a class template is the same as for function templates: we just add the line template <class p>, where p is the identifier of the type template parameter.

The function body for getMax() still needs to be defined. This can be done by adding the following code to the pair.h file after the class definition:

template <class p>
p Pair<p>::getMax()
{
  if (_values[0] > _values[1])
    return _values[0];
  else
    return _values[1];
}

Note that when a member function body is defined outside of the class definition, the scope operator :: has to be used to indicate that it is a member function rather than a normal function.

The slight difference to the normal syntax for member function definitions is that now we have to include the type template parameter inside angled brackets after the class name, i.e. Pair<p>. Similarly, when instances of the class are declared, the type must be given inside angled brackets, e.g. Pair<int> for integer data. This is shown in an example main() function below.

#include <iostream>
#include "pair.h"
using namespace std;

int main()
{
  // find maximum of two integers
  int x, y;
  x = 12;
  y = 11;
  Pair<int> myPair1(x, y);
  cout << "Max of " << x << " and " << y << " is "
       << myPair1.getMax() << endl;

  // find maximum of two chars
  char p, q;
  p = 'w';
  q = 'c';
  Pair<char> myPair2(p, q);
  cout << "Max of " << p << " and " << q << " is "
       << myPair2.getMax() << endl;

  return 0;
}

Let’s consider another example of class templates, an array class. The motivation for defining a new array class is that arrays in C++ include no bounds checking, i.e. if the array index is out of range then no error message is displayed. In the templated array class below, the [] operator is overloaded so that it can include such bounds checking.

templated_array.h
template <class element, int n>
class TemplateArray {
  private:
    int _size;       // number of elements in array
    element *_array; // the elements
  public:
    // default constructor
    TemplateArray() {
      _size = n;
      _array = new element[_size];
    }
    // constructor that initialises all values to 'val'
    TemplateArray (element val)
    {
      _size = n;
      _array = new element[_size];
      for (int i = 0; i < _size; i++)
        _array[i] = val;
    }
    // overload the [] operator
    element& operator[] (int);
};

This class allows the programmer to instantiate new arrays using any data type, and also any length. Therefore, this time the class template defined in the templated_array.h file has two parameters, as indicated in the first line template <class element, int n>:

  • type template parameter element indicates the type of elements in the array.
  • value template parameter, an int parameter called n, which specifies the number of elements to be included in an array.

In fact, any template definition (function or class) can use these two types of parameter.
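
The body of the overloaded [] operator is not given above. One possible sketch is shown below. It reports an out-of-range index by throwing std::out_of_range, which is an assumption (the class could equally display a message). The destructor and the disabled copy operations are also additions, made so that this stand-alone version does not leak or double-delete its heap array.

```cpp
#include <stdexcept>

template <class element, int n>
class TemplateArray {
  private:
    int _size;       // number of elements in array
    element *_array; // the elements
  public:
    TemplateArray() {
      _size = n;
      _array = new element[_size];
    }
    TemplateArray(element val) {
      _size = n;
      _array = new element[_size];
      for (int i = 0; i < _size; i++)
        _array[i] = val;
    }
    // Additions for this sketch: free the array, forbid copying.
    ~TemplateArray() { delete[] _array; }
    TemplateArray(const TemplateArray&) = delete;
    TemplateArray& operator=(const TemplateArray&) = delete;
    // overload the [] operator
    element& operator[] (int);
};

// Both template parameters are repeated after the class name when
// the member function is defined outside the class.
template <class element, int n>
element& TemplateArray<element, n>::operator[] (int i)
{
  if (i < 0 || i >= _size)
    throw std::out_of_range("TemplateArray index out of range");
  return _array[i]; // reference, so a[i] = x also works
}
```

An instance is declared by supplying both parameters, e.g. TemplateArray<int, 3> a(7); accessing a[3] then throws an exception instead of silently reading past the end of the array.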

Templates and Polymorphism

The word polymorphism refers to a single object taking a number of different forms.

  • Virtual functions are an example of run time polymorphism.
  • Member function overloading is an example of static polymorphism.
  • Templates are another example of static polymorphism.

A templated function or class is a single entity in the program, but it can take on a number of different forms for different data types. The different forms of the function/class are instantiated at compile-time.
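
The contrast between the two kinds of polymorphism can be sketched as follows. Shape, Circle, describe and twice are hypothetical names used only for this illustration.

```cpp
#include <string>

// Run-time polymorphism: which Name() is called depends on the
// actual type of the object, decided while the program runs.
class Shape {
  public:
    virtual ~Shape() {}
    virtual std::string Name() const { return "shape"; }
};

class Circle : public Shape {
  public:
    std::string Name() const { return "circle"; }
};

std::string describe(const Shape& s) { return s.Name(); }

// Static polymorphism: the compiler instantiates a separate
// version of twice for each type it is used with.
template <class t>
t twice(t value) { return value + value; }
```

Here describe is a single compiled function that behaves differently for a Shape and a Circle, whereas twice(21) and twice(1.5) cause two different functions to be generated at compile time.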



Written by Toby Thomas & Tobias Whetton

10
Memory Management

Learning objectives

  • Explain the difference between memory allocation on the stack and on the heap.
  • Make good use of the new and delete commands.
  • Recognise problems associated with memory management, such as memory leaks and avoid them.
  • Implement effective memory management.
  • Understand how new and delete can affect constructors/destructors of class instances.
  • Explain why a copy constructor and overloaded assignment operator can be necessary in C++ classes and what is meant by a deep copy.

Scope and memory

Memory management refers to arranging the allocation and deallocation of memory for the purpose of storing data for the program’s use. When a variable (or instance) is declared, the compiler allocates memory for it. The space must be freed up (deallocated) once it goes out of scope. Recall that there are three types of scope:

  • Global scope: Variable is declared before the main function (it can be accessed from anywhere within the program).
  • Function scope: Variable is declared inside a function (only accessible from within that function).
  • Block scope: Variable is declared inside a code block (i.e. { ... }) (only accessible from within that block).

The following code illustrates a potential problem that can occur when the concept of scope is combined with pointers:

int main()
{
  int *p;
  if (1 == 1) { // always true, just to make block scope
    int a = 10;
    p = &a; // p points to contents of a
  }
  
  // a has now gone out of scope so what does p point to
  // now? It may still work but strictly the value
  // output is undefined
  cout << *p << endl;
  
  return 0;
} 

In this code, the variable a has block scope, so it will go out of scope as soon as the if block finishes, at which point the memory allocated for it will be deallocated (freed up). However, the pointer variable p is still in scope as it has function scope, and p was assigned to point to the address of a inside the if block. Because of this, p ends up pointing to memory that is no longer valid, so the program’s behaviour will be undefined. The use of pointers and the rules of variable scope have caused an unexpected problem with memory management.

New and Delete Operators

The new operator allocates memory for a pointer to point to, without explicitly declaring a variable and taking its address. e.g.

int *x = new int;
*x = 5;

A useful property of new is that the memory will stay allocated until explicitly deallocated, even if the original pointer to it goes out of scope. This happens because the new operator does not allocate memory on the program stack; instead it uses another block of memory called the heap.

Memory allocated on the heap never goes out of scope except when the program terminates. The only way to deallocate memory that was allocated using new is to use the delete operator. The delete operator deallocates the memory pointed to by its argument, e.g.
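
As a sketch of why this matters, the block-scope example from the previous section becomes well defined if the value is allocated on the heap instead of being declared inside the block. The function name value_after_block is hypothetical.

```cpp
// Hypothetical rework of the earlier block-scope example.
int value_after_block()
{
  int *p;
  if (1 == 1) { // always true, just to make block scope
    p = new int; // heap allocation, not tied to the block
    *p = 10;
  }

  // The block has finished, but the heap memory is still
  // allocated, so reading through p is valid here.
  int result = *p;

  delete p; // explicit deallocation once we have finished
  return result;
}
```

The function returns 10, and the memory is released exactly once by the delete statement.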

int *x = new int;
...
delete x;

Memory Leaks

However, a new problem arises when using the new operator (mind the pun). A memory leak occurs when the program does not deallocate memory that it has finished using. Usually a C++ compiler handles all allocation and deallocation of memory using the scoping rules. But by allocating memory ourselves, we run the risk of creating memory leaks if we don’t deallocate it. Memory leaks are considered to be bad because they use up unnecessary memory, making programs less (space) efficient and causing them to run more slowly and, in severe cases, crash. To illustrate this point consider the function below:

int *getPtr (int val)
{
  int *x = new int;
  *x = val;
  return x;
}
...
int main() {
  for (int i = 0; i < 5; i++) {
    int *ptr = getPtr(i);
    cout << *ptr << endl;
    // no delete!
  }
}

Every time around the for loop in the main function, the getPtr function is called. This allocates a block of memory of one int value on the heap, and returns a pointer to it. The value pointed to is displayed and then the next iteration begins. However, the next iteration assigns a new value to ptr but has not deallocated the old block. This old block is left allocated with no pointer pointing to it, and therefore no way of freeing it up. This is a memory leak. The correct implementation is shown below:

int main() {
  for (int i = 0; i < 5; i++) {
    int *ptr = getPtr(i); // involves new operation
    cout << *ptr << endl;
    delete ptr; //  corresponding delete operation
  }
}

Ensure that every block of memory allocated with new is deleted exactly once: leaving out the delete causes a memory leak, while deleting the same block more than once will result in a program error. On top of this, be careful to only use delete to deallocate memory if it has been allocated using new (not just a ‘normal’ pointer which has been created).

Allocating/Deallocating Variable Length Arrays

With ‘normal’ array declarations the memory for the array will be allocated on the program stack, and therefore the size of the array must be known at compile time. For example the following code is not allowed in standard C++ because the size of the array declared cannot be determined at compile time:

int n;
cout << "Enter length of array: ";
cin >> n;
int a[n];

However, arrays are actually implemented using pointers, so it is possible to handle memory allocation for arrays ourselves on the heap using a new statement. When allocating memory for arrays on the heap, the restriction of knowing the array size at compile-time does not apply. The following modified code illustrates this point:

int n;
cout << "Enter length of array: ";
cin >> n;
int *a = new int[n];
delete[] a;

Note that delete[] is used rather than the normal delete as this tells the compiler that an array of data should be deleted at the given address. This form should always be used with regard to arrays.

Here is another example, illustrating all of this, just for fun:

...
int n;
cout << "Enter length of array: ";
cin >> n;
int *a = new int[n];
for (int i = 0; i < n; i++) {
    cout << "Enter value " << i << ": ";
    cin >> a[i];
}
cout << "Array values:" << endl;
for (int i = 0; i < n; i++) {
    cout << a[i] << endl;
}
delete[] a;

Allocating/Deallocating Class Instances

It is also possible to use new to allocate class instances. For example, consider the following code:

...
class Point {
  private:
    float _x, _y;
  public:
    Point() {_x = _y = 0;} // default constructor
    Point(float x, float y) // constructor
    {_x = x; _y = y;}
    ...
};

...
int main() {
  Point *p_ptr;
  if (true) {
    p_ptr = new Point(); // calls default constructor
    Point p1;            // calls default constructor
    Point p2(1.0, 2.4);  // calls second constructor 
  } // p1, p2 out of scope - destructor called for each
  delete p_ptr; // calls destructor for *p_ptr

  return 0;
}

This code creates an instance of the Point class using the default constructor (i.e. the one with no arguments), and makes p_ptr point to it. The class destructor is called whenever the instance gets deallocated using the delete statement.
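
The timing of these constructor and destructor calls can be made visible by counting live instances. The static counter live, the user-defined destructor and the function live_after_block below are hypothetical additions to Point for the purpose of this sketch.

```cpp
class Point {
  private:
    float _x, _y;
  public:
    static int live; // hypothetical count of existing instances
    Point() { _x = _y = 0; ++live; }
    Point(float x, float y) { _x = x; _y = y; ++live; }
    ~Point() { --live; } // runs on scope exit or on delete
};
int Point::live = 0;

// Returns the number of Point instances that still exist after the
// block has finished but before the heap instance is deleted.
int live_after_block()
{
  Point *p_ptr;
  if (true) {
    p_ptr = new Point(); // heap instance
    Point p1;            // stack instance
    Point p2(1.0f, 2.4f); // stack instance
  } // destructors for p1 and p2 run here

  int count = Point::live; // only the heap instance remains
  delete p_ptr;            // destructor for *p_ptr runs here
  return count;
}
```

The function returns 1: the two stack instances are destroyed automatically at the end of the block, while the heap instance survives until the delete statement, after which Point::live is back to 0.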

Notice that we did not define our own destructor for Point as the default destructor is created automatically by the compiler. We only need to define our own destructor if the class has used new and must therefore perform its own memory management. As an example, consider the following class, which dynamically allocates an array on the heap when the sample size is specified:

class Sample {
  public:
    Sample() { _n = 0; _x = nullptr; } // null, so delete[] is safe
    ~Sample() { delete[] _x; }
    int Size() { return _n; }
    float Data(int i) { return _x[i]; }
    void SetSample(float val[], int n) {
      _n = n;
      _x = new float[_n];
      for (int i = 0; i < _n; i++)
          _x[i] = val[i];
    }
    float Mean() const;
    float StdDev() const;
    void Display() const;
    void Sort();
  private:
    float *_x;
    int _n;
};

In this implementation, instead of a float array of fixed length, we now have a float pointer as the data member _x. This is allocated in the SetSample member function. Therefore, the destructor for Sample must delete this allocated memory. Failure to perform this memory management would result in a memory leak in the program.

Copy Constructors

Consider the following main function which makes use of the new Sample class.

int main() 
{
  float a[8] = {18.44,14.18,19.79,15.73,15.36,
                16.17,13.91,15.35};
  Sample samp1;
  samp1.SetSample(a, 8);
  samp1.Display();
  
  if (true) {
    // copies data member by data member
    Sample samp2 = samp1;
  } // samp2 out of scope so destructor is called
  
  // undefined output because samp2 destructor will
  // have deleted the data pointed to by samp1 ...
  samp1.Display();
  
  return 0;
}

Although at first sight this looks fine, when the Sample instance samp1 is displayed the second time it will produce an undefined output. This occurs as the pointer _x in samp2 points to the same block of memory as the pointer _x in samp1.

When samp2 goes out of its block scope, its destructor is called, deallocating the memory pointed to by both samp1 and samp2. So when samp1 tries to display afterwards the block of memory containing its sample data no longer exists. An additional problem is that when samp1 goes out of scope at the end of the main function, it will try to delete memory that has already been deallocated.

We can get around this problem by defining our own copy constructor to perform a ‘deep copy’ rather than a member-by-member copy. This is illustrated in the modified implementation of Sample shown below:

class Sample {
  public:
    Sample() { _n = 0; _x = nullptr; }
    Sample(const Sample& s) // copy constructor
    {
      _n = s.Size();
      _x = new float[_n];
      for (int i = 0; i < _n; i++)
          _x[i] = s.Data(i);
    }  
    ~Sample() { delete[] _x; }
    int Size() const { return _n; }
    float Data(int i) const { return _x[i]; }
    void SetSample(float val[], int n) 
    ...
  private:
    float *_x;
    int _n;
};

The new overloaded copy constructor allocates a block of memory using new and copies each element of the array pointed to by _x individually.
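
The effect of the deep copy can be checked with a cut-down, self-contained version of Sample. In this sketch the statistical member functions are omitted, the default constructor sets _x to nullptr, SetSample releases any previous array, and first_value_after_copy_destroyed is a hypothetical test function; all of these are assumptions rather than part of the original class. No assignment operator is defined here, so instances should only be copied by construction.

```cpp
class Sample {
  public:
    Sample() { _n = 0; _x = nullptr; }
    Sample(const Sample& s) // copy constructor: deep copy
    {
      _n = s.Size();
      _x = new float[_n]; // this instance gets its own block
      for (int i = 0; i < _n; i++)
        _x[i] = s.Data(i);
    }
    ~Sample() { delete[] _x; }
    int Size() const { return _n; }
    float Data(int i) const { return _x[i]; }
    void SetSample(const float val[], int n) {
      delete[] _x; // release any previously stored data
      _n = n;
      _x = new float[_n];
      for (int i = 0; i < _n; i++)
        _x[i] = val[i];
    }
  private:
    float *_x;
    int _n;
};

// Hypothetical test: copy samp1, let the copy be destroyed, then
// read samp1 again.
float first_value_after_copy_destroyed()
{
  float a[3] = {1.5f, 2.5f, 3.5f};
  Sample samp1;
  samp1.SetSample(a, 3);
  if (true) {
    Sample samp2 = samp1; // deep copy via the copy constructor
  } // samp2 destroyed: only its own block is freed
  return samp1.Data(0); // samp1's data is still intact
}
```

The function returns 1.5, because the destructor of samp2 only frees the block that samp2 allocated for itself.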

Overloaded Assignment Operator

A similar issue to that highlighted in the previous example with copy constructors comes up when performing assignments of classes that allocate memory on the heap. First, recall when the copy constructor is used and when the assignment operation is used:

  • The copy constructor is called when an instance is either declared using another instance as an argument, or declared and assigned to in a single program statement.
  • The assignment operation is used when an existing instance is assigned to.

Just as with copy constructors, if a ‘deep’ copy is required then the assignment operator must be overloaded. For example, a modified implementation of the assignment operator for Sample is shown below:

class Sample
{
  public:
    Sample() { _n = 0; _x = nullptr; }
    Sample(const Sample&); // copy constructor
    ...  
    ~Sample() { delete[] _x; }
    int Size() const { return _n; }
    float Data(int i) const { return _x[i]; }
    ...
    Sample& operator=(const Sample& s)
    {
      // nothing to do when an instance is assigned to itself
      if (this == &s)
        return *this;

      // delete if already data in this instance
      if (_n > 0)
        delete[] _x;
      
      // copy
      _n = s.Size();
      _x = new float[_n];
      for (int i = 0; i < _n; i++)
          _x[i] = s.Data(i);
      return *this;
    }
  private:
    float *_x;
    int _n;
};

In the implementation for the overloaded assignment operator Sample& operator=(const Sample& s), the main difference is that it first checks to see if there is already data stored in the current instance of Sample. If there is, it must delete it. The copy constructor did not need to do this as, by definition, the instance has only just been created. The overloaded assignment operator must always return the current instance using the this pointer.
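
A compact, self-contained sketch of such an assignment operator is given below. It is a cut-down Sample under stated assumptions: the statistical member functions are omitted, _x starts as nullptr, and a self-assignment check is included so that assigning an instance to itself does not delete the data it is about to copy.

```cpp
class Sample {
  public:
    Sample() { _n = 0; _x = nullptr; }
    Sample(const Sample& s) // deep-copy constructor
    {
      _n = s._n;
      _x = new float[_n];
      for (int i = 0; i < _n; i++)
        _x[i] = s._x[i];
    }
    ~Sample() { delete[] _x; }
    int Size() const { return _n; }
    float Data(int i) const { return _x[i]; }
    void SetSample(const float val[], int n) {
      delete[] _x; // release any previously stored data
      _n = n;
      _x = new float[_n];
      for (int i = 0; i < _n; i++)
        _x[i] = val[i];
    }
    Sample& operator=(const Sample& s)
    {
      if (this == &s) // self-assignment: nothing to copy
        return *this;
      delete[] _x;    // free the data currently held
      _n = s._n;
      _x = new float[_n]; // allocate a separate block
      for (int i = 0; i < _n; i++)
        _x[i] = s._x[i];
      return *this;
    }
  private:
    float *_x;
    int _n;
};
```

After b = a, the two instances hold equal values in separate blocks of memory, so changing or destroying one of them leaves the other unaffected.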



Written by Tobias Whetton