The visualisation you can use depends on the types of data involved. Here are three common combinations:
A common method is a joint contingency table. Datasets:
       | BSc | BA |
Male   | 11  | 2  |
Female | 20  | 7  |
No built-in command is available, so a function can be written, for example 'frequencyTable.m'. The content of the function is not important at the moment, but how to use it is:
>> help frequencyTable
usage: [] = frequencyTable( x, y )
x,y input variables: must have the same length
output is frequency table of unique values within x and y
>> load('fortyStudentDegreeData')
>> load('fortyStudentGenderData')
>> frequencyTable( degree, gender )
This is often also called a joint frequency table, and the values are often presented as a percentage of the data set size.
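As an illustration of what such a function might compute, here is a minimal sketch. It is not the course's frequencyTable.m: the variable names and the use of unique and accumarray are assumptions made for this example, and the data are rebuilt from the counts in the table above rather than loaded from the .mat files.

```matlab
% Hypothetical sketch only; not the course's frequencyTable.m.
% Example data rebuilt from the counts in the contingency table above.
gender = [repmat({'Male'}, 13, 1); repmat({'Female'}, 27, 1)];
degree = [repmat({'BSc'}, 11, 1); repmat({'BA'}, 2, 1); ...
          repmat({'BSc'}, 20, 1); repmat({'BA'}, 7, 1)];

% Map each label to an integer index; 'stable' keeps first-seen order.
[genderNames, ~, gIdx] = unique(gender, 'stable');
[degreeNames, ~, dIdx] = unique(degree, 'stable');

% Count how many students fall into each (gender, degree) cell.
counts = accumarray([gIdx, dIdx], 1, [numel(genderNames), numel(degreeNames)]);

% Express the counts as a percentage of the data set size.
percentages = 100 * counts / numel(gender);

disp(counts)
disp(percentages)
```

Rows follow the order of genderNames and columns the order of degreeNames, so the counts can be read against the table directly.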
You can also combine categorical and continuous variables using a box plot, as seen in chapter 1.
This covers three commonly used methods:
Datasets: 40 students in class:
One variable is placed on the x-axis and the other on the y-axis, with a dot drawn at the corresponding position for each individual in the class.
>> load('fortyStudentDistanceData.mat')
>> load('fortyStudentHeightData.mat')
>> scatter( heights, distance )
>> xlabel('Student height (cm)', 'fontsize', 18);
>> ylabel('Distance travelled to university (cm)', 'fontsize', 18);
After binning, the data can be shown as a 2D histogram. The counts per bin are displayed as intensities, or as heights in a 3D plot.
load('fortyStudentDistanceData.mat')
load('fortyStudentHeightData.mat')
heightNdistance = [ heights, distance ];
% for intensity plot
histArray = hist3( heightNdistance );
colormap( gray )
imagesc( histArray );
% for 3D plot
hist3( heightNdistance )
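With only 40 data points, the default 10-by-10 grid of hist3 leaves most bins empty. The sketch below shows one way to choose the number of bins and to label the intensity plot with data values instead of bin indices. The random data are made-up placeholders standing in for the course .mat files, the choice of 5 bins per variable is arbitrary, and hist3 requires the Statistics and Machine Learning Toolbox.

```matlab
% Placeholder data (made-up values) standing in for the course .mat files.
heights  = 150 + 45 * rand(40, 1);   % 40 heights in cm
distance = 40 * rand(40, 1);         % 40 distances, arbitrary units
heightNdistance = [heights, distance];

% Ask for 5 bins per variable instead of the default 10-by-10 grid.
% binCentres is a 1-by-2 cell array holding the bin centres of each variable.
[histArray, binCentres] = hist3(heightNdistance, [5 5]);

% Rows of histArray follow heights, so transpose to put heights on the x-axis.
imagesc(binCentres{1}, binCentres{2}, histArray');
axis xy          % make the y-axis increase upwards
colormap(gray)
colorbar         % show how intensity maps to counts
xlabel('Student height (cm)', 'fontsize', 18);
ylabel('Distance travelled to university', 'fontsize', 18);
```

Fewer bins give smoother but coarser pictures; the bin count is a trade-off worth trying at several values.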
A line graph is useful when each value of one variable corresponds to exactly one value of the other. It is commonly used when one of the variables represents time. Simply use the plot command, which was used in chapter 1:
load('interestRate.mat');
plot( Irate(:,1), Irate(:,2))
xlabel('Year', 'fontsize', 18 )
ylabel('Interest rate', 'fontsize', 18 )
Two main types of research design:
The correlation coefficient, r, is a numerical value for how linearly related two variables are. It is given by:
Equation to be added soon
where x_i and y_i, i = 1,…,n, are the paired variables and n is the sample size.
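For reference, the standard (Pearson) sample correlation coefficient can be written as below. This is a sketch that assumes Pearson's r is the coefficient meant here; the sample means \bar{x} and \bar{y} are notation introduced for this sketch.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

In this form r always lies between -1 and 1: values near ±1 indicate a strong linear relationship, and values near 0 a weak one.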
A strong correlation does not signify cause and effect. For example, there is a strong correlation between ice cream sales and incidents of drowning. Does ice cream consumption cause drowning? No, both are driven by a common factor, daily temperature.
The correlation coefficient can be greatly affected by a few outliers.
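The effect of an outlier can be seen in a small sketch using MATLAB's corrcoef, which returns a 2-by-2 matrix whose off-diagonal entry is r. The data below are made-up illustrative values, not the course data, and the variable names are chosen for this example only.

```matlab
% Made-up illustrative data: y rises roughly linearly with x.
x = (1:10)';
y = [2 1 4 3 6 5 8 7 10 9]';

% corrcoef returns a 2-by-2 matrix; the off-diagonal entry is r.
R = corrcoef(x, y);
rClean = R(1, 2);

% Append a single point that lies far from the trend.
xOut = [x; 11];
yOut = [y; -20];
R = corrcoef(xOut, yOut);
rOutlier = R(1, 2);

fprintf('r without outlier:  %.2f\n', rClean);
fprintf('r with one outlier: %.2f\n', rOutlier);
```

Here one extra point out of eleven is enough to move r from about 0.94 to about -0.18, which is why it is worth looking at a scatter plot before trusting the coefficient.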