Statistical terms and methods

The Aboriginal and Torres Strait Islander Health Performance Framework (HPF) uses the following statistical terms and methods for analysis.

First Nations people and non-Indigenous population descriptors

The term ‘First Nations people’ is now the preferred term used by the AIHW when referring to Aboriginal and Torres Strait Islander people. The term ‘Indigenous Australians’ is also used across the HPF website; as content on the website is updated, it will be progressively replaced with ‘First Nations people’.

In most of the data presented, ‘First Nations people’ refers to people who have identified themselves, or have been identified by a representative (for example, their parent or guardian), as being of Aboriginal and/or Torres Strait Islander origin. For a few data collections, such as those associated with government grants and payments, information on whether the person is accepted by their community as being of Aboriginal and Torres Strait Islander origin may also be required. See also Glossary.

Note that references to ‘First Nations people’ on this website refer to First Nations people in Australia only. Where data are compared with those for indigenous people internationally, for clarity, ‘Aboriginal and Torres Strait Islander people’ will be used.

The ‘non-Indigenous Australians’ descriptor is used where the data collection allows people who do not identify as First Nations to be identified separately. The label ‘other Australians’ refers to the combined data for non-Indigenous people and people whose Indigenous status was not stated.

Crude rates

A crude rate is defined as the number of events over a specified period (for example, a year) divided by the total population at risk of the event.

Age-specific rates

An age-specific rate is defined as the number of events for a specified age group over a specified period (for example, a year) divided by the population of that age group. Age-specific rates are useful for comparing rates across age groups when rates are strongly age-dependent.
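
Crude and age-specific rates use the same arithmetic; only the events and population being counted differ. The Python sketch below is illustrative only: all counts are hypothetical, and expressing rates per 100,000 people is an assumed choice (the multiplier varies between measures).

```python
# Hypothetical counts for illustration only.
EVENTS_BY_AGE = {"0-14": 10, "15-64": 90, "65+": 50}
POPULATION_BY_AGE = {"0-14": 200_000, "15-64": 350_000, "65+": 50_000}


def rate(events: int, population: int, per: int = 100_000) -> float:
    """Number of events divided by the population at risk, scaled per `per` people."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return events * per / population


def age_specific_rates(events_by_age: dict, population_by_age: dict, per: int = 100_000) -> dict:
    """Apply the same rate calculation separately within each age group."""
    return {age: rate(events_by_age[age], population_by_age[age], per) for age in events_by_age}


crude = rate(sum(EVENTS_BY_AGE.values()), sum(POPULATION_BY_AGE.values()))
by_age = age_specific_rates(EVENTS_BY_AGE, POPULATION_BY_AGE)
print(crude)   # 25.0 per 100,000 for the whole population
print(by_age)  # the 65+ group has a rate of 100.0 per 100,000
```

The gap between the 65+ rate and the crude rate shows why age-specific rates are useful when rates are strongly age-dependent.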

Age-standardisation

Age-standardisation is a method of reducing the influence of differences in age structure, so that summary rates can be compared between populations with different age structures. It is used throughout the HPF when comparing First Nations people with non-Indigenous Australians for a range of variables where age is a factor, and when monitoring trends over time.

There are two methods of age-standardisation: direct and indirect. The direct method is the one most commonly used in the HPF; however, some tables presenting data from the National Hospital Morbidity Data Collection use the indirect method.

Direct age-standardisation

Directly age-standardised rates are defined as the weighted average of age-specific rates, where the weight for each age group is that group's proportion of the standard population. In the HPF, the 2001 Australian standard population is used for direct age-standardisation. In general, 5-year age groups up to 75+ are used (0–4, 5–9, …, 70–74 and 75+). However, this may vary depending on the availability and/or size of the data: 5-year age groups may be combined into larger groups, for example 10-year age groups, up to 65+ or 55+. For some specific topics, such as smoking during pregnancy, age-standardisation is applied using 5-year age groups for women aged 15 to 44.
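
A minimal Python sketch of direct age-standardisation as a weighted average of age-specific rates. It assumes three broad age groups and uses placeholder standard-population weights (`STANDARD_POPULATION` is hypothetical, not the ABS 2001 figures); a real HPF calculation would use 5-year age groups and the 2001 Australian standard population. The two example populations are invented so that they share identical age-specific rates but differ in age structure.

```python
# Placeholder weights for illustration only; NOT the 2001 Australian standard population.
STANDARD_POPULATION = {"0-14": 400, "15-64": 500, "65+": 100}


def direct_age_standardised_rate(events_by_age, population_by_age, standard_population, per=100_000):
    """Weighted average of age-specific rates, weighted by the standard population's age distribution."""
    total_standard = sum(standard_population.values())
    asr = 0.0
    for age, standard_count in standard_population.items():
        weight = standard_count / total_standard
        age_specific_rate = events_by_age[age] * per / population_by_age[age]
        asr += weight * age_specific_rate
    return asr


# Two hypothetical populations: same age-specific rates, different age structures.
pop_a = {"0-14": 200_000, "15-64": 350_000, "65+": 50_000}
events_a = {"0-14": 10, "15-64": 90, "65+": 50}
pop_b = {"0-14": 100_000, "15-64": 350_000, "65+": 150_000}
events_b = {"0-14": 5, "15-64": 90, "65+": 150}

asr_a = direct_age_standardised_rate(events_a, pop_a, STANDARD_POPULATION)
asr_b = direct_age_standardised_rate(events_b, pop_b, STANDARD_POPULATION)
print(round(asr_a, 2), round(asr_b, 2))  # 24.86 24.86
```

Although the crude rates differ (25.0 versus about 40.8 per 100,000, because population B is older), the age-standardised rates are identical, which is the comparison age-standardisation is meant to allow.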

Indirect age-standardisation

The indirect method is recommended when age-specific numbers of events for the population being studied are not known, or when calculating rates for small populations where fluctuations in age-specific rates can affect the reliability of rates.
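
The indirect method needs only the total number of observed events in the study population, plus its age structure and age-specific rates from a standard population. The Python sketch below computes expected events and the standardised ratio (observed / expected) from hypothetical inputs. Multiplying this ratio by the standard population's crude rate to obtain an indirectly standardised rate is a common convention; the HPF's exact implementation is not described here, so treat this as an assumption.

```python
# Hypothetical inputs for illustration only.
STUDY_POPULATION = {"0-14": 2_000, "15-64": 3_000, "65+": 500}
STANDARD_RATES = {"0-14": 0.001, "15-64": 0.002, "65+": 0.010}  # events per person in the standard population
OBSERVED_EVENTS = 26  # only the total is needed, not counts by age


def indirect_standardisation(observed_events, study_population_by_age, standard_rates_by_age):
    """Return expected events and the standardised ratio (observed / expected)."""
    expected = sum(
        study_population_by_age[age] * standard_rates_by_age[age]
        for age in study_population_by_age
    )
    return expected, observed_events / expected


expected, ratio = indirect_standardisation(OBSERVED_EVENTS, STUDY_POPULATION, STANDARD_RATES)
print(round(expected, 6), round(ratio, 6))  # 13.0 2.0
```

A ratio of 2.0 means twice as many events were observed as would be expected if the study population experienced the standard population's age-specific rates.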

Rate difference and rate ratio

A rate difference is the absolute difference between two rates, calculated by subtracting one rate from another. For example, subtracting the rate for non-Indigenous Australians with a particular characteristic from the rate for First Nations people with the same characteristic shows the absolute difference between the two groups.

A rate ratio measures the relative difference between two population groups and is calculated by dividing one rate (for example, the rate for First Nations people with a particular characteristic) by another (for example, the rate for non-Indigenous Australians with the same characteristic). A rate ratio greater than 1 indicates a higher rate of the characteristic in the population of interest, and a ratio less than 1 indicates a lower rate of the characteristic in the population of interest.
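
Both measures are simple arithmetic on two rates. In this Python sketch the rates are hypothetical values per 1,000 population, and the function names are illustrative.

```python
def rate_difference(rate_interest: float, rate_comparison: float) -> float:
    """Absolute difference: rate for the population of interest minus the comparison rate."""
    return rate_interest - rate_comparison


def rate_ratio(rate_interest: float, rate_comparison: float) -> float:
    """Relative difference: rate for the population of interest divided by the comparison rate."""
    if rate_comparison == 0:
        raise ZeroDivisionError("rate ratio is undefined when the comparison rate is 0")
    return rate_interest / rate_comparison


# Hypothetical age-standardised rates per 1,000 population.
first_nations_rate, non_indigenous_rate = 30.0, 10.0
print(rate_difference(first_nations_rate, non_indigenous_rate))  # 20.0 (20 more per 1,000)
print(rate_ratio(first_nations_rate, non_indigenous_rate))       # 3.0 (three times as high)
```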

Relative standard error (RSE)

Sample surveys, particularly those conducted by the Australian Bureau of Statistics (ABS), are a major source of data for many statistics used in the HPF. The aim of sampling is to obtain a representative sample, so that the results are as close as possible to those that would have been obtained had the whole population been included. Because estimates based on a sample, rather than a full enumeration of the population, are subject to sampling variability, they may differ from the values that would have been produced if data had been obtained from the complete population.

The standard error (SE) of an estimate quantifies how much a sample estimate is expected to vary from its true value. In the HPF, the SE of a crude rate is generally calculated using the formula:

$$SE = \sqrt{\frac{p(1 - p)}{n}}$$

where p = m/n, m is the number of events and n is the size of the population. When p is very small (for example, for rare events such as deaths), the term (1 – p) is close to 1 and can be omitted from the formula.
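
The SE formula and its rare-event simplification can be sketched in Python as below. The counts (50 events in a population of 10,000) are hypothetical. The SE here is on the proportion scale; if the rate is reported per 1,000 or per 100,000, the SE would be multiplied by the same factor.

```python
import math


def se_crude_rate(m: int, n: int, rare_event: bool = False) -> float:
    """Standard error of p = m / n, where m is the number of events and n the population size."""
    p = m / n
    if rare_event:
        # For very small p, (1 - p) is close to 1 and is dropped, giving sqrt(p / n).
        return math.sqrt(p / n)
    return math.sqrt(p * (1 - p) / n)


se_full = se_crude_rate(50, 10_000)
se_rare = se_crude_rate(50, 10_000, rare_event=True)
print(f"{se_full:.6f} {se_rare:.6f}")  # 0.000705 0.000707
```

With p = 0.005 the two versions differ by well under 1%, which is why the (1 – p) term can be dropped for rare events.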

The relative standard error (RSE) is a measure of the reliability of an estimate in terms of sampling error. It expresses the standard error as a percentage of the estimate:

$$RSE = \frac{SE}{\text{estimate}} \times 100$$

In the HPF, only estimates with an RSE of less than 25% are considered reliable. Estimates with an RSE between 25% and 50% should be used with caution. Estimates with an RSE greater than 50% are considered too unreliable for general use because of high sampling error.
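
The RSE calculation and the reliability thresholds can be expressed as a short Python sketch. The estimates and SEs are hypothetical, and treating an RSE of exactly 50% as "use with caution" is an assumption, since the text does not say which band the boundary falls in.

```python
def relative_standard_error(estimate: float, se: float) -> float:
    """RSE as a percentage of the estimate."""
    return se / estimate * 100


def reliability(rse: float) -> str:
    """Classify an estimate using the 25% and 50% RSE thresholds."""
    if rse < 25:
        return "reliable"
    if rse <= 50:  # assumption: exactly 50% is treated as 'use with caution'
        return "use with caution"
    return "too unreliable for general use"


# Hypothetical (estimate, SE) pairs.
for estimate, se in [(12.0, 1.2), (4.0, 1.4), (2.0, 1.5)]:
    rse = relative_standard_error(estimate, se)
    print(f"RSE {rse:.0f}%: {reliability(rse)}")
```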

Confidence intervals

Even when the underlying rate in a population does not change, the observed rate from a sample may vary due to random chance. A confidence interval (CI) uses the standard error (SE) to measure the precision of an estimate. A 95% CI is a range of values around the estimate, constructed so that, over repeated sampling, 95% of such intervals would include the true value. A narrow CI indicates high precision (low random error); conversely, a wide CI indicates lower precision and hence greater uncertainty in the estimate. The 95% confidence interval of an estimate p is calculated as

$$LCL = p - 1.96 \times SE$$

and

$$UCL = p + 1.96 \times SE$$

where SE is the standard error, LCL and UCL are the lower and upper limits of the confidence interval, and 1.96 is the z-score corresponding to the 95% confidence level (p < 0.05).
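
Putting the SE and CI formulas together, this Python sketch computes a 95% CI for a hypothetical crude rate (50 events in a population of 10,000), reported per 1,000 people for readability.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level


def confidence_interval_95(p: float, se: float) -> tuple:
    """Return (LCL, UCL) for an estimate p with standard error se."""
    return p - Z_95 * se, p + Z_95 * se


# Hypothetical example: 50 events in a population of 10,000.
m, n = 50, 10_000
p = m / n
se = math.sqrt(p * (1 - p) / n)
lcl, ucl = confidence_interval_95(p, se)
print(f"{p * 1000:.2f} per 1,000 (95% CI {lcl * 1000:.2f} to {ucl * 1000:.2f})")
# 5.00 per 1,000 (95% CI 3.62 to 6.38)
```

A larger population with the same rate would give a smaller SE and therefore a narrower interval.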

Annual change and per cent change over time

Annual change and per cent change over the entire period are computed using one of two methods.

For some HPF measures, such as 3.15 Access to prescription medicines and 3.21 Expenditure on Aboriginal and Torres Strait Islander health compared to need, change over time in expenditure is calculated using the first and last years of data.
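
The text does not give the exact formulas for the first-and-last-year method, so the Python sketch below rests on stated assumptions: per cent change is taken as (last – first) / first × 100, and annual change as the average absolute change per year between the two end points. The expenditure figures are hypothetical.

```python
def first_last_change(values_by_year: dict) -> tuple:
    """Change between the first and last years only; intermediate years are ignored.

    Assumptions (not specified in the HPF text): annual change is the average
    absolute change per year, and per cent change is relative to the first year.
    """
    first_year, last_year = min(values_by_year), max(values_by_year)
    first, last = values_by_year[first_year], values_by_year[last_year]
    average_annual_change = (last - first) / (last_year - first_year)
    per_cent_change = (last - first) / first * 100
    return average_annual_change, per_cent_change


# Hypothetical expenditure ($ million) by year.
expenditure = {2016: 100.0, 2017: 104.0, 2018: 110.0, 2019: 120.0}
annual, per_cent = first_last_change(expenditure)
print(f"{annual:.2f} per year, {per_cent:.1f}% over the period")  # 6.67 per year, 20.0% over the period
```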

In most other measures, linear regression (based on the ‘least squares’ method) is used to calculate the annual change and the per cent change over time, so that information from all available years is used. The slope of the simple linear regression line (y = a + bt) is used as the annual change over the period. The per cent change is calculated by taking the difference between the first and last points on the regression line, dividing by the first point on the line and multiplying by 100. In the HPF, these analyses are generally undertaken only when there are at least 5 data points, with some exceptions where 4 data points are used.
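
The regression method can be sketched in Python using the closed-form least-squares slope. The rates are hypothetical, and the `min_points` check mirrors the general 5-point rule (it could be lowered to 4 for the exceptions mentioned). Fitted values are computed as y_mean + b × (t – t_mean), which is the same line as y = a + bt but avoids subtracting a large intercept.

```python
def linear_trend(years, values, min_points=5):
    """Least-squares slope (annual change) and per cent change along the fitted line y = a + bt."""
    n = len(years)
    if n != len(values):
        raise ValueError("years and values must have the same length")
    if n < min_points:
        raise ValueError(f"need at least {min_points} data points")
    t_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in years)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(years, values))
    b = sxy / sxx  # slope: annual change
    # Points on the fitted line at the first and last years (the line passes through the means).
    fitted_first = y_mean + b * (min(years) - t_mean)
    fitted_last = y_mean + b * (max(years) - t_mean)
    per_cent_change = (fitted_last - fitted_first) / fitted_first * 100
    return b, per_cent_change


# Hypothetical rates per 1,000 for 5 consecutive years.
years = [2015, 2016, 2017, 2018, 2019]
rates = [10.0, 11.5, 11.8, 13.9, 14.2]
annual_change, per_cent_change = linear_trend(years, rates)
print(f"{annual_change:.2f} per year, {per_cent_change:.1f}% over the period")
# 1.08 per year, 42.7% over the period
```

Using only the raw first and last values (10.0 and 14.2) would give 42.0%; the regression-based figure draws on all five years.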

Significance tests

Significance tests are undertaken to determine whether the difference between two estimates (such as the difference between numbers, means or proportions for First Nations people and non-Indigenous Australians, for remote and non-remote areas, or for two different years) is significantly different from 0. In the HPF, significance tests are conducted at the 5% level (p < 0.05).

The word ‘significant’

Significance tests determine whether the difference observed between two or more estimates, for different groups or over time, is likely to reflect a real difference or could have occurred due to random variation. A statistically significant result indicates, with a high level of confidence, that the difference is real rather than due to chance. In general, differences and changes over time highlighted in the HPF are statistically significant unless otherwise stated. However, it is important to note that ‘statistical significance’ does not imply practical significance, and the term should not be used outside its statistical context.

Where results are shown to be statistically significant, an * is placed alongside the statistics included in the data tables. However, not all relationships in the data tables have been tested for significance.

Statistical testing for rate differences and rate ratios

The hypotheses for testing a rate difference (RD) are

$$H_0: RD = 0$$

and

$$H_1: RD \neq 0$$

where RD = p₁ – p₂. If RD lies outside the interval (–1.96 × SE(RD), 1.96 × SE(RD)), H₀ is rejected. In this case, p₁ and p₂ are said to be significantly different at the p < 0.05 level; in other words, the rate difference RD is significantly different from 0 at the p < 0.05 level. Otherwise, if RD lies inside the interval (–1.96 × SE(RD), 1.96 × SE(RD)), p₁ and p₂ are not significantly different at the p < 0.05 level.
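
The decision rule for a rate difference can be sketched in Python as follows. The text does not state how SE(RD) is obtained, so this sketch assumes the two rates are independent and uses SE(RD) = √(SE(p₁)² + SE(p₂)²). All rates and standard errors are hypothetical.

```python
import math


def rate_difference_test(p1, se1, p2, se2, z=1.96):
    """Two-sided test of H0: RD = 0 at the 5% level.

    Assumption: p1 and p2 are independent, so SE(RD) = sqrt(SE(p1)^2 + SE(p2)^2).
    """
    rd = p1 - p2
    se_rd = math.sqrt(se1 ** 2 + se2 ** 2)
    significant = not (-z * se_rd <= rd <= z * se_rd)  # RD outside the interval -> reject H0
    return rd, se_rd, significant


# Hypothetical rates (as proportions) and their standard errors.
print(rate_difference_test(0.030, 0.002, 0.010, 0.001))  # RD of about 0.02: significant
print(rate_difference_test(0.012, 0.002, 0.010, 0.001))  # RD of about 0.002: not significant
```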

The hypotheses for testing a rate ratio (RR) are

$$H_0: RR = 1$$

and

$$H_1: RR \neq 1$$

where RR = p₁ / p₂. Because the sampling distribution of a ratio is skewed, the test is carried out on the log scale: if ln(RR) lies outside the interval (–1.96 × SE(ln(RR)), 1.96 × SE(ln(RR))), H₀ is rejected. In this case, the rate ratio is said to be significantly different from 1 at the p < 0.05 level. Otherwise, the rate ratio is not significantly different from 1 at the p < 0.05 level.
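
A Python sketch of the log-scale test for a rate ratio. The text does not specify how SE(ln(RR)) is calculated; this sketch assumes independent estimates and the common delta-method approximation SE(ln(RR)) ≈ √((SE(p₁)/p₁)² + (SE(p₂)/p₂)²). It also back-transforms the interval to give a 95% CI for RR, which excludes 1 exactly when the test is significant. Inputs are hypothetical.

```python
import math


def rate_ratio_test(p1, se1, p2, se2, z=1.96):
    """Two-sided test of H0: RR = 1 at the 5% level, carried out on ln(RR).

    Assumption: independent estimates and the delta-method approximation
    SE(ln RR) = sqrt((SE(p1)/p1)^2 + (SE(p2)/p2)^2).
    """
    rr = p1 / p2
    log_rr = math.log(rr)
    se_log_rr = math.sqrt((se1 / p1) ** 2 + (se2 / p2) ** 2)
    significant = not (-z * se_log_rr <= log_rr <= z * se_log_rr)
    # 95% CI for RR, obtained by back-transforming from the log scale.
    ci = (math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr))
    return rr, ci, significant


# Hypothetical rates (as proportions) and their standard errors.
rr, ci, significant = rate_ratio_test(0.030, 0.002, 0.010, 0.001)
print(f"RR {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, significant: {significant}")
```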

Tables include an * next to the rate ratio and rate difference to indicate that the rates for the First Nations and non-Indigenous populations are statistically different from each other at the p < 0.05 level. Where the results of significance testing differ between the rate ratio and the rate difference, the tests should be interpreted with caution.

Statistical testing for annual change

In the method where linear regression is used to calculate annual change, the parameters of the simple linear regression line (y = a + bt) are estimated using least squares, where y represents the number or rate of the characteristic under study, t the time (year), a the intercept and b the slope coefficient representing the annual change. To test whether the annual change is statistically significant, the hypotheses are

$$H_0: b = 0$$

and

$$H_1: b \neq 0$$

Under H₀, the statistic b / SE(b) follows a t-distribution with n – 2 degrees of freedom, where n is the number of years for which data are available. Let t* denote the two-sided critical value of this t-distribution at the 0.05 level. If b lies outside the interval (–t* × SE(b), t* × SE(b)), H₀ is rejected. In this case, the regression model is said to be significant at the p < 0.05 level; that is, the coefficient b (annual change) is significantly different from 0 at the p < 0.05 level. Otherwise, the annual change is not significantly different from 0 at the p < 0.05 level.
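
The slope test can be sketched in Python using only the standard library. Critical values t* are taken from a small table of two-sided 5% Student's t values for 2 to 15 degrees of freedom; for longer series they would need to be looked up elsewhere (for example with SciPy's `scipy.stats.t.ppf(0.975, df)`). The two series are hypothetical.

```python
import math

# Two-sided 5% critical values of Student's t (t*), keyed by degrees of freedom.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306,
             9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131}


def slope_t_test(years, values):
    """Least-squares slope b, its standard error, and whether b differs from 0 at p < 0.05."""
    n = len(years)
    df = n - 2
    if df not in T_CRIT_05:
        raise ValueError("no critical value tabulated for this number of years")
    t_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in years)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(years, values))
    b = sxy / sxx
    # Residual variance around the fitted line, with n - 2 degrees of freedom.
    residuals = [y - (y_mean + b * (t - t_mean)) for t, y in zip(years, values)]
    s2 = sum(r ** 2 for r in residuals) / df
    se_b = math.sqrt(s2 / sxx)
    t_star = T_CRIT_05[df]
    significant = not (-t_star * se_b <= b <= t_star * se_b)
    return b, se_b, significant


years = [2015, 2016, 2017, 2018, 2019]
print(slope_t_test(years, [10.0, 11.5, 11.8, 13.9, 14.2]))  # steady rise: significant
print(slope_t_test(years, [10.0, 12.0, 9.0, 13.0, 10.0]))   # noisy series: not significant
```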

Suppression of small numbers

In the HPF, small numbers are suppressed for confidentiality and reliability reasons.

Unless the data provider has a different requirement, the following rules of suppression are applied in the HPF:

If the incidence is 4 or less (but not zero), the number and rate will not be reported. The relevant cells will be filled with “n.p.” (not published).
If, after suppression, a suppressed small number can still be calculated from the total, another number must also be suppressed. Generally, the rate for this additionally suppressed cell (whose number is larger than 4) does not need to be suppressed, unless the suppressed number could be calculated from that rate.
For direct age-standardised rates, if the incidence is less than 20, the rate will also be replaced by “n.p.”.
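
The first and third rules translate directly into code; the second (secondary suppression) depends on table layout. The Python sketch below handles a single row with a published total, using a simplified, hypothetical rule: if exactly one cell is suppressed, the smallest remaining non-zero cell is also suppressed. This is an illustration, not the HPF's documented procedure, and it covers counts only (under the first rule, the rates for primarily suppressed cells would also be shown as "n.p."). The counts are hypothetical.

```python
NP = "n.p."  # not published


def suppress_count(count: int):
    """Primary suppression: counts of 1 to 4 are replaced by 'n.p.' (zero is published)."""
    return NP if 1 <= count <= 4 else count


def suppress_age_standardised_rate(count: int, rate: float):
    """Directly age-standardised rates based on fewer than 20 events are replaced by 'n.p.'."""
    return NP if count < 20 else rate


def suppress_row(counts: dict) -> dict:
    """Primary plus a simplified secondary suppression for one row with a published total.

    Simplification (hypothetical): a single suppressed cell could be recovered as the
    total minus the other cells, so the smallest remaining non-zero cell is suppressed too.
    """
    out = {key: suppress_count(value) for key, value in counts.items()}
    suppressed = [key for key, value in out.items() if value == NP]
    if len(suppressed) == 1:
        candidates = [key for key, value in out.items() if value != NP and value > 0]
        if candidates:
            out[min(candidates, key=lambda key: out[key])] = NP
    return out


# Hypothetical counts by remoteness.
print(suppress_row({"Major cities": 120, "Regional": 35, "Remote": 3}))
# {'Major cities': 120, 'Regional': 'n.p.', 'Remote': 'n.p.'}
print(suppress_age_standardised_rate(12, 45.3))  # n.p.
```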
