Advanced Data Types


Learning Objectives

  • Describe the use of pointers.
  • Describe how one- and multi-dimensional arrays are defined and implemented.
  • Define and manipulate string variables.
  • Define and use new data types using struct and enum.
  • Perform file input/output operations.

Type checking

There are two main differences between C++ and Matlab with regard to data types, and these are summed up in the table below:

Language | Type Definition | Type Checking
Matlab | implicit: works out the type of a variable from the value it is assigned | dynamic: type consistency is checked at run time
C++ | explicit: stated by the programmer in the variable declaration | static: type consistency is checked at compile time

Pointers *

If we think of simple data types as containers that hold a value of the specified type, then pointers are the addresses of those containers. Just like ordinary variables, pointer variables have an associated data type: for example, a variable may have the type ‘pointer to an int’ or ‘pointer to a char’. They are defined by placing the * symbol before the variable name in the variable declaration (do not confuse this with dereferencing, which uses the same symbol). For example:

int *p1, *p2;  // pointers to 'int' values
char *cp;      // Pointer to a 'char' value

Declaring a pointer variable does not mean that it points to a valid value in memory - it must be initialised to point to something. For example:

int val = 1000;
p1 = &val;

The & symbol means ‘the address of’. In this case p1 points to the area of memory that holds the int value 1000. Consider another example:

char *c = new char;
*c = 'x';

This code creates a variable called c that points to a char. The new keyword allocates memory without first defining a named variable (unlike the previous example, which used int val = 1000;). Similarly, we can delete the space pointed to:

delete c;

This statement frees the memory allocated by the new statement, so that the program can reuse it for something else. After this, c no longer points to a valid value and should not be dereferenced. It is good practice to delete memory allocated with new as soon as it is no longer needed.

It is possible to have multiple pointers pointing to the same memory space, for example:

int *p1 , *p2 ;
int val = 1000;
p1 = &val ;
p2 = &val ;

When more than one pointer points to the same area of memory, if the value of the variable is changed via one pointer, it is changed via the other pointer as well. For example:

*p1 = 500;
cout << "p1 -> " << *p1 << ", p2 -> " << *p2 << endl;

The output produced by this code would be p1 -> 500, p2 -> 500. The * before a variable name dereferences the pointer (i.e. it returns the value pointed to), and this is effectively the opposite of &.

Arrays

A 1-D array is a sequence of values of the same type (e.g. an array of 10 integers). The values are commonly referred to as elements. Higher-dimensional arrays are also possible such as 2-D arrays where every element is itself an array.

One-Dimensional Arrays

In C++, array variables must be explicitly declared, and the array size must be specified and fixed at compile time (i.e. in the array variable declaration). Consider the following example:

int x[10]; // defines array of 10 integers
x[9] = 3; // assign to last one

The array size is specified in the square brackets [ ] after the variable name. Elements of an array are undefined until they are initialised. Square brackets are also used to access array elements, and note that array indices start at 0.

To assign an entire array in one statement when the array variable is declared, use curly brackets, e.g.

int x[10] = {1,2,3,4,5,6,7,8,9,10};

Note that in C++ commas are required as a delimiter; spaces cannot be used to separate the element values.

When an array is initialised in this way, it is possible to omit the array size and let the compiler work out the size itself. Editing the previous example:

int x[] = {1,2,3,4,5,6,7,8,9,10};

Once an array has been declared, only individual elements can be assigned and accessed, not the array as a whole. Also, it is impossible to display an entire array in a single statement: only individual array elements can be displayed using the cout statement. For example, to display the array variable x declared in the previous example the following code is required:

for (int xind = 0; xind < 10; xind++)
    cout << x[xind] << " ";
cout << endl;

Higher-Dimensional Arrays

Multi-dimensional arrays can be defined in C++, for example declaring a 2D array:

char a[10][10]; // 10 by 10 array of chars
int b[2][2] = {{2, 3}, {1, 4}}; // 2x2 array of ints

The size of the second dimension is specified in a second set of square brackets after the first. If the elements are initialised, the initialiser has to match the shape of the array: in this case nested sets of curly brackets are used, one inner set per row.

Higher-dimensional arrays are accessed in the same way as 1-D arrays: the second array index is given in a second set of square brackets after the first, for example:

int d = b[0][0] * b[1][1] - b[0][1] * b[1][0];

Arrays and Pointers

Arrays are implemented by the C++ compiler using pointers. For instance, an array of integers can be treated as a pointer to an integer, where the value pointed to is the first element of the array and the other elements occupy contiguous memory locations following the first element.

Consider the following example:

int p1[3] = {1000, 500, 750};

The array variable p1 (which is in effect a pointer to an int) points to the area of memory containing the first element (1000). The second and third elements are contained in the areas of memory immediately following this first element:

p1 -> 1000
      500
      750

So any array element can be accessed by dereferencing the pointer and moving forward a certain number of blocks of memory. In fact, the square bracket notation is simply a short-hand way of doing this. The same concept is applied for 2-D arrays, for example:

int p2[2][3] = {{1000, 500, 750}, {100, 200, 300}};

In this 2-D case the array is laid out in memory as follows:

p2 -> 1000
      500
      750
      100
      200
      300

This explains why the sizes of higher-dimensional arrays need to be known at compile time. From looking at the above illustration for a 2-D array alone, we would not know if p2 was a 2 x 3 or a 3 x 2 array (or indeed a 1-D array of 6 elements). Therefore, this information needs to be specified in the array declaration, and is remembered by the compiler so that it can access the array elements correctly.

Passing Arrays as Function Arguments

Arrays can be passed as arguments to functions, just like any other variable. However, all array arguments to functions are treated as pass-by-reference, as arrays are essentially pointers. The syntax for passing array arguments to functions is:

void func (int x[]) {...}

For 1-D arrays, the array size doesn’t need to be specified when defining the function header. But for 2-D arrays at least the second dimension needs to be specified, although both can be specified:

void function (int x[2][3]){...}

The following example illustrates the passing of an array variable to a function. The code displays a frequency table of true positive (TP), false positive (FP), true negative (TN), false negative (FN) values.

#include <iostream>
#include <iomanip>
#include "freq_table."
using namespace std;

void dispFreqTable(int freq[2][2])
{
  int rsum[2] = {0,0}, csum[2] = {0,0}, tot = 0;
  cout << "         | GT +ve | GT -ve | Total" 
       << endl;
  cout << "---------|--------|--------|------"
       << endl;
  for (int r = 0; r < 2; r++) {
      // first row is the positive test result, second row the negative
      cout << (r == 0 ? "Test +ve |" : "Test -ve |");
      for (int c = 0; c < 2; c++) {
          cout << setw(8) << freq[r][c] << "|";
          rsum[r] += freq[r][c];
          csum[c] += freq[r][c];
          tot += freq[r][c];
      }
      cout << setw(6) << rsum[r] << endl;
  }
  cout << "---------|--------|--------|------"
       << endl;
  cout << "Total    |" << setw(8)
       << csum[0] << "|" << setw(8) << csum[1]
       << "|" << setw(6) << tot << endl;
}

In the above program the dispFreqTable function takes a 2 x 2 array of integers as an argument. The first 2 in the argument type is optional, as only the second array dimension has to be specified in the argument type. The setw function sets the width of the output of the next item in a cout statement. To use setw, the <iomanip> header must be included with #include <iomanip>.

The string Library

An array of characters is known as a string. The most common way of using strings is to #include the standard <string> library. Consider the following program:

#include <iostream>
#include <string>
using namespace std;

int main()
{
  string greeting = "hello", name;
  cout << "What's your name? ";
  cin >> name; 
  string a = greeting + " " + name;
  cout << a << endl;
  int n = name.length();
  cout << "Your name has " << n << " letters"
       << endl;
  return 0;
}

The string library defines its own versions of built-in C++ operators that can take strings as arguments (this is known as overloading). Here are a few definitions from the string library:

  • = assignment
  • + string concatenation
  • >> input (e.g. from cin)
  • << output (e.g. to cout)
  • == != > < >= <= string comparison operators (strings are compared character by character)
  • getline gets a line of text from standard input

The above program also illustrates the use of a special function that is associated with a string variable: name.length(). Here, name is a string variable and the function call length() is appended to it, separated by a full stop. This function call returns the number of characters in name. These special functions are known as member functions. Other member functions can be called in the same way, using string variables:

  • find finds an instance of a substring within a string
  • replace replaces a substring within a string by another string

The following program demonstrates the use of the find and replace member functions with string variables:

#include <iostream>
#include <string>
using namespace std;

int main() 
{
  string str ("Brazil are the best team in the world.");
  
  // can search for a constant string
  size_t found = str.find("the");
  if (found != string::npos)
     cout << "'the' found at: " << found << endl;
 
  found = str.find("the", found + 1);
  if (found != string::npos)
     cout << "second 'the' found at: " 
          << found << endl;

  // can search for another string variable
  string str2 = "England";
  found = str.find(str2);
  if (found != string::npos)
     cout << "'England' found at: " << found << endl;

  // the same variable can be reused for a different search string
  str2 = "Brazil";
  found = str.find(str2);
  if (found != string::npos)
     cout << "'Brazil' found at: " << found << endl;
  
  // replace a substring with another string 
  str.replace(str.find(str2), str2.length(), "England");
  cout << str << endl;
  
  return 0;  
}

The find function takes the substring to search for, which can be another string variable or a string constant, and optionally a second argument giving the index at which to start searching (as in the search for the second 'the' above). It returns the index at which the substring starts. The type of the returned value is size_t, an unsigned integer type that is guaranteed to be large enough to hold the size of any object in memory.

If the substring is not found (e.g. ‘England’ in the above program), find returns a special value called string::npos, meaning the constant npos from the string library. The :: symbol is called the scope resolution operator.

The replace function takes 3 arguments: the start index of the substring to be replaced, the number of characters to replace, and the string to replace it with. The output of the program when run would be:

'the' found at: 11
second 'the' found at: 28
'Brazil' found at: 0
England are the best team in the world.

The struct statement

Structures are another way of defining new data types. Whereas array types are used for storing a collection of values of the same type, structures are used for storing a collection of values of different types. Each component of a structure is called a member.

Consider the following example for defining a data type to store information about patients:

struct PatientData {
  string firstName;
  string lastName;
  unsigned int age;
  double bloodPressure;
};

This defines a new data type, called PatientData. Variables of type PatientData contain four values: two strings, an unsigned int and a double. Variables can be declared of type PatientData just as they can for built-in C++ types, as the following code illustrates:

PatientData p1;
cout << "Enter patient's name (first last):";
cin >> p1.firstName >> p1.lastName;
cout << "Enter " << p1.firstName << " " 
     << p1.lastName << "'s age:";
cin >> p1.age;
cout << "Enter " << p1.firstName << " "
     << p1.lastName << "'s blood pressure:";
cin >> p1.bloodPressure;

struct provides a great mechanism for creating new types that group data together.

The enum statement

Enumeration types are another way of creating new data types. A variable that is declared as an enumeration type can take any one of a pre-defined number of symbolic values.

For example, the following code creates a new data type to store information about chess pieces:

enum ChessPiece {Pawn, Rook, Knight, Bishop, King,
                 Queen, Empty};
enum Colour {White, Black, None};
struct Square {
  ChessPiece piece;
  Colour colour;
};

Here we have defined 3 new data types:

  • ChessPiece variables can take any one of these symbolic values: Pawn Rook Knight Bishop King Queen Empty
  • Colour variables can take any of these symbolic values: White Black None
  • Square variables contain two values: a ChessPiece and a Colour

Based on these newly defined data types, we can then go on to declare a chess board, and to start to fill it up with pieces:

Square b[8][8];
b[0][0].piece = Rook;
b[0][0].colour = White;

The variable b represents the chess board, and is a 2-D (8 x 8) array of Square. We have initialised the square indexed by [0][0] to be a white rook. The use of enum types is good practice if the data is symbolic, i.e. categorical, non-numeric data with no ordering.

File Input/Output

Reading and writing data from and to external files occurs much in the same way with advanced data types as standard ones. Consider the following example program:

#include <iostream>
#include <fstream> 
using namespace std;

int main()
{  
  ifstream inFile;
  inFile.open("data.txt");
  if (!inFile) {
    cerr << "Error opening file: data.txt" << endl;
    return 1;
  } 
  int ages[5];
  for (int i = 0; i < 5; i++)
    inFile >> ages[i];
  inFile.close();
  return 0;
}

In the above program, #include <fstream> allows the program to use file input/output operations. Variables can then be declared of type ifstream (for input files) or ofstream (for output files).

All files (input or output) must be opened before use, in this case with the open() function. If the file was not opened successfully (e.g. it does not exist or is locked), the stream variable evaluates to false, which is what the !inFile test checks.

The ifstream variable, inFile, can be used like cin to input data. cerr is an alternative output stream that sends data to standard error rather than standard output. It is good practice to separate normal program output from error messages in this way. All files (input and output) should be closed after use.

Although not illustrated in the above example, the same principles apply to ofstream variables, which are used just like cout. In addition, there are a number of other file input/output functions that we can make use of:

  • get gets a single character from the input file
  • put puts a single character into the output file
  • getline gets an entire line from the input file, i.e. it reads all the data up to the next newline character (spaces and tabs are kept as part of the line)

The following example shows the use of getline:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
  ifstream namesFile;
  namesFile.open("names.txt");
  if (!namesFile) {
    cerr << "Error opening file: names.txt"
        << endl;
    return 1; 
  }
  string names[10];
  for (int i = 0; i < 10; i++)
    getline (namesFile, names[i]);
  namesFile.close();
  return 0;
}


Written by Tobias Whetton