Statewide assessment of students
Understanding the basics of measurement

The Practising Administrator Vol 19 No 3 1997 pp 30-31

As Commonwealth, State and Territory governments increasingly turn their attention to the measurement of student achievement, Russell Boyle examines a few of the essential requirements of the statewide assessment instruments...

Statewide assessment is the large scale educational measurement of students. All States and Territories conduct statewide assessment programs in the senior secondary years for the certification of students and for selection into post secondary education courses, tertiary courses and employment.

The Commonwealth, State and Territory governments, however, are increasingly turning their attention to the introduction and development of statewide testing programs for younger secondary students and for primary school students. Education in Australia is, without doubt, going through a measurement phase. Politicians seem intent on measuring everything they can about student achievement.

Governments view the realisation of the intellectual potential of young Australians as the precursor of an internationally competitive economy. Statewide testing programs can provide governments with reliable and valid measurements of student achievement in relation to curriculum standards. These measurements can then be used by schools to design individual student learning programs and by politicians to develop policy that targets those areas most in need of funding assistance.

This year has seen a historic agreement between the Commonwealth, States and Territories to introduce, in 1998, the large scale literacy and numeracy testing of all Year 3 and Year 5 Australian students.

In his media release dated March 14, 1997, the Federal Schools Minister, Dr David Kemp, said that it was 'a national disgrace that 30 per cent of young teenagers cannot read properly. There has been no improvement in literacy standards in the past 20 years.' Later, in the same media release, Dr Kemp said that it 'is clear that our education policies are failing a large number of children. Clearly, this situation cannot continue. Parents, quite rightly, have demanded improvement.'

Many States and Territories already administer statewide testing programs. For example, the Learning Assessment Project (LAP) is becoming well established in Victorian schools. This project assesses Victorian Year 3 and Year 5 students in three learning areas. English and Mathematics are tested each year. The third learning area is different each year. This year it was Studies of Society and Environment. Schools and parents receive a LAP report that outlines achieved levels of performance and shows how students compare with other Victorian children. The Government is planning to extend the statewide testing of student achievement with the soon to be implemented Victorian Student Assessment Monitor that will test Year 7 and Year 9 Victorian students.

Measurement basics
To judge fairly the value of statewide student testing programs one needs to have some understanding of educational measurement theory and practice.

All measures of student achievement contain measurement error. Achievement is a continuous variable: between any two values on the scale there is an infinite number of intermediate values, so any instrument can only approximate it, although a more sensitive measurement instrument can produce a more accurate measurement. A measure of achievement for a particular student will therefore equal the student’s true score plus an error score. Because in practice it is not possible to obtain a student’s true score, the aim of educational measurement theory is to minimise measurement error.
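The relationship between observed score, true score and error score can be written compactly. The symbols below follow the usual notation of classical test theory, which is an assumption here, since the article itself names no symbols: X is the score a student actually obtains, T is the true score and E is the error score.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2 \quad \text{(assuming } E \text{ is uncorrelated with } T\text{)}
```

On this reading, minimising measurement error means making the error variance small relative to the variance of the observed scores.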

To be useful a test needs to produce reasonably consistent or generalisable results. In this regard consider the following points:

The timing of the test can be an important consideration. Would students receive the same scores if they did an equivalent form of the test last week, next week or next month? Probably not. The longer the time between tests the greater the likely variability of results for a given student. A test measures student achievement at a particular time only. A reliable test can produce consistent results regardless of the timing.

Consistent does not mean identical. Human factors such as tiredness, illness, mood, memory lapse and subsequent learning can all contribute to inconsistent results. The extent to which a test measures consistently whatever it measures is known as test reliability. The more reliable a test the more consistent will be the results on subsequent applications of the test and the less prevalent will be measurement error. There are well-known statistical procedures for measuring test reliability.
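One of the statistical procedures commonly used for this purpose is the test-retest method, in which the same students sit two equivalent forms of a test and the two sets of scores are correlated. The sketch below is illustrative only. The function names `pearson_r` and `simulate_test_retest`, the simulated true scores (mean 50, standard deviation 10) and the chosen error sizes are all assumptions made for the example, not figures from any statewide program.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_test_retest(n_students, error_sd, seed=0):
    """Estimate test-retest reliability for simulated students.

    Each observed score is a true score plus an independent error score.
    The true-score distribution (mean 50, sd 10) is a hypothetical choice.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n_students)]
    # Two sittings of equivalent forms: same true score, fresh error each time.
    first = [t + rng.gauss(0, error_sd) for t in true_scores]
    second = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pearson_r(first, second)


if __name__ == "__main__":
    for error_sd in (2, 5, 10):
        r = simulate_test_retest(2000, error_sd)
        print(f"error sd {error_sd:>2}: test-retest correlation = {r:.2f}")
```

Running the script shows the correlation between the two sittings falling as the error scores grow, which is the sense in which a more reliable test has less measurement error.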

The most important attribute of a test, and by far the most difficult one to attain, is test validity. The extent to which a test measures what it purports to measure is known as test validity. There is no such thing as a valid or invalid test. Rather there are degrees of validity. Test validity is a function of the meaningful interpretation of test results.

The validity of a test is lowered by the presence of test questions that are ambiguous or use unfamiliar language, by unclear directions, and by student uncertainty as to whether any penalties are applied for incorrect answers. Each of these factors may be controlled by thorough vetting procedures during test development.

People who develop statewide assessment programs have expertise in test construction, measurement theory and statistical analysis. However, they also need to be familiar with curriculum standards and be experienced in curriculum development and delivery. Statewide tests that fail to match test questions with curriculum content and standards will be of questionable validity.

There are other factors, far more difficult to control, that diminish test validity. One example is the extent to which the test modifies student behaviour. Some students, when faced with a statewide test, will perform normally, whilst other students will be frightened by the experience and still others may be highly motivated to perform.

Lack of control over the administration of the test across the state or around the country can also lower test validity. In some schools the invigilators may be under-prepared or, even worse, antagonistic towards the philosophy of statewide student assessment. The possibility of students cheating and the extent to which some teachers may modify their teaching in the lead-up to the test can have an impact on the meaningful interpretation of test results.

A test cannot be valid without first being reliable. Whilst reliability is a necessary condition for test validity, it is not a sufficient condition. A test of high reliability may have little validity because it is measuring something quite different from what it was designed to measure. Furthermore, reliable test results can be, and have been, used and interpreted in entirely inappropriate ways, especially when reported by the media.
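The point that reliability is necessary but not sufficient for validity has a standard expression in classical test theory. As a sketch, assume (the article does not state this) that validity is summarised by the correlation between test scores X and some criterion measure Y, and reliability by the correlation between two parallel forms X and X' of the test:

```latex
\rho_{XY} \le \sqrt{\rho_{XX'}}
```

Under these assumptions a test with a reliability of 0.64 could have a validity coefficient of at most 0.8, while nothing in the inequality prevents a highly reliable test from having a validity coefficient close to zero.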

Towards statewide assessment
Educational measurement is an inexact science. No statewide test can ever claim to be error free, completely reliable and valid. However, many of the criticisms of statewide tests can also be directed at school based tests, assignments, essays, and research reports. Even classroom observation is vulnerable to allegations of value judgements and teacher bias. The school based grades that teachers assign to students are usually a weighted aggregate of many and varied assessment tasks. As good as these may be, they can at best only compare the achievement of a particular student with that of other students in the same school. Considering the widespread community support for extramural activities such as the Australian Mathematics Competition, it would seem that parents and students hunger for statewide comparisons and student percentile rankings.

Despite a lack of complete control over some of the factors that contribute to test validity, experience should allow test developers, governments and schools to constantly improve on the degree of validity of statewide tests of student achievement. The Victorian Learning Assessment Project is arguably a more valid measurement instrument in 1997 than it was in 1995 when it was first introduced. The continued philosophical objection to the LAP by some Victorian teachers is possibly more detrimental to its validity than any other single factor.

How can politicians develop good education policy and efficaciously distribute scarce financial resources unless they have objective data upon which to base their deliberations? Statewide student assessment has the potential to highlight how best we can meet the present and future learning needs of young Australians. To succeed in this goal, testing programs will need to be developed, implemented and further improved on within a milieu of mutual respect and cooperation between governments, school systems and teachers.

If the teaching profession fails to support positively the introduction and development of statewide assessment programs, then the status of teachers will likely be reduced from a position of educational leadership to a position of mere providers of education.

Russell Boyle Vera Poh