SEGway October 2014

Research and Assessment News from SEG Measurement  
 
In This Issue
Five Scary Things About Assessment
A Pile of Items is Not an Assessment
Measuring Pain
About SEG Measurement
Upcoming Conferences
 


 


31 Pheasant Run
New Hope, Pennsylvania 18938
(267) 759-0617

Dear Scott,

This is a very scary issue of SEGway.  With the arrival of Halloween, we wanted to share some of the things that scare us most in the world of assessment and efficacy research.

We have learned a lesson or two about assessment in our work with you over the past 10 years. In this issue, we share some of the scary things about assessment we have learned from you along the way.

With the implementation of the Common Core and other new state standards, and the coming PARCC and Smarter Balanced assessments, it should come as no surprise that we are seeing a significant uptick in publishers and technology providers asking SEG for help with their assessment and effectiveness research for new and newly aligned products.

We begin with the five things that scare us most about assessment.  This is followed by a discussion of the difference between a "pile of items" and an assessment: just creating a set of items--even if they are good items--is a far cry from developing an effective assessment.  Our "Measurement Moment" this month discusses an important area of measurement that we don't often think of in those terms--the measurement of pain.

The fall conference season is here, and we are now at the AACE ELearn Conference in New Orleans; then we are on to the NJEA conference in November and the Early Childhood conference in Fort Lauderdale in December.  Please let us know in advance if you are attending any of these so we can meet and learn more about your work in the education marketplace.

Take a look at our website at www.segmeasurement.com, as it is continually updated with developments in the field.  And, feel free to email me at selliot@segmeasurement.com.  I'd love to hear what's on your mind.

 

Sincerely,

 


 

 

Scott Elliot

 

SEG Measurement 

The Five Scariest Things About Assessment

Assessment is a great tool for learning if it is used properly.  But when assessments are developed poorly or implemented improperly, the results can be pretty scary. There are so many ways to make mistakes, but we are limiting our discussion to five areas to avoid scaring you too much.
  • Test Misuse- Number one on our list is the use of tests for purposes for which they were not intended and have not been validated.  Tests are designed for a specific purpose or purposes. As with many things, it is impossible to be all things for all people.  If a test is well developed, it is likely to meet its intended purposes well.  The problems arise when that test is then used for a purpose other than the one for which it was intended.

    Before you can use a test for a new purpose, you need to find out whether it is well-suited for that purpose.  Measurement experts refer to this as validation; a test must be validated for its intended purpose, and the test user must gather evidence supporting its legitimacy for any new purpose. Here are a few examples of how tests can be misused:
     
    • A college admissions test is designed to predict how students will do in their freshman year of college.  While the test is shown to be valid for that purpose, a college proceeds to use it for placement into freshman English courses, without finding out whether it is useful for that purpose.
       
    • A publisher provides a school with a formative math assessment that is shown to be effective for providing ongoing feedback to students.  The school then decides to use the test as a high-stakes assessment for deciding whether students should be promoted to the next grade.

In short, just because a test works well for one purpose does not necessarily mean it is useful for other purposes.  If you think it can be used in new ways, check it out first!

  • Test Unreliability- Fundamentally, you need to be able to count on a test giving you consistent information from time to time and from form to form.  If you can't get the same information (e.g., score) on a test from one day to the next, how can you be confident in the information you are getting?  If the test is not reliable, then you really are not getting useful information.  You certainly would not want to make important decisions based on an unreliable test.

    There are several ways to evaluate the reliability of a test; SEG, or any other reputable testing organization, can help you make this determination.  The bottom line is: make sure the test is reliable; don't just assume that it is consistent enough to use for making important decisions.
     

  • Poor Alignment of Content-  All too often we see tests that are not well aligned to the content.  A test may otherwise follow good test development practice but fail to align properly with the content that is being taught.  Again, if a test is not precisely aligned to the domain of content you are interested in, the information you get from that test is likely not telling you what you need to know.

    This often happens when the developer of the test identifies the domain to be covered only in general terms, then rushes forward, writing items that do not really reflect what is being taught. For example, we once saw a test labeled American History that measured the range of important content from the inception of the nation through the current day.  But the text/course it was designed to measure only covered content up until 1900.  This misalignment severely diminished the value of the test.

    The best way to avoid poor alignment is to clearly identify the content to be measured, using detailed objectives, competencies, a content outline, or another definitional approach.  And clearly specify the extent to which each of these definitional elements should be covered.

  • Mistaking the Map for the Territory- Tests are proxies.  No matter how thorough the test, no matter what types of test items you are using, and no matter how good it is, a test is still a representation of the underlying knowledge, skill, attitude, or other construct about which you are trying to get information.  Tests are useful, but imperfect, representations of knowledge or skills, much like a map is a useful, but imperfect, representation of the territory it pictures.

    Remembering this is critical.  A student who gets an "80" on a test is not an "80".  This information should be used in conjunction with other information in order to get a more complete picture of the student's knowledge or skills.
     

  • Construct Irrelevant Variance- This is a great term to use when you want to impress people at cocktail parties.  It is a fancy way of saying that some of what you are measuring with a test is the result of something other than the knowledge or skills you intended to measure.  When this happens, the information you get from the test is, in part, information about something other than the content and skills you thought you were measuring.

    There are many possible sources of construct irrelevant variance.  Some common examples include speededness (the effect of not having enough time to complete the test properly), misunderstanding the instructions, unintended trickiness of the questions, and a lack of familiarity with the type or style of the questions. The bottom line: you should take reasonable steps to eliminate any extraneous influences on test scores.  You want to be sure that the results are truly a reflection of the test taker's knowledge and skill.

By engaging the help of a professional assessment development organization, such as SEG Measurement, you can avoid these and other pitfalls.
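For readers who like to see the mechanics, the reliability concern above is often quantified with Cronbach's alpha, a standard internal-consistency estimate. Here is a minimal sketch in Python; the right/wrong response matrix is invented purely for illustration, not real test data:

```python
# Cronbach's alpha: an internal-consistency reliability estimate.
# Rows are test takers, columns are items (1 = correct, 0 = incorrect).

def cronbach_alpha(scores):
    n_items = len(scores[0])

    def variance(values):  # sample variance (divides by n - 1)
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item across test takers, and of the total scores
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Invented response data for six students on five items
responses = [
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
print(round(cronbach_alpha(responses), 2))  # prints 0.7
```

Values above roughly 0.8 or 0.9 are commonly expected before a test is used for important decisions, though the bar depends on the stakes involved.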

 

SEG has worked with many educational publishers and technology providers, from start-ups to the largest industry players, to develop high quality assessment programs. With nearly 40 years of experience, we know what it takes to conduct sound efficacy research.  Please contact us to discuss your needs.  Email us

or call us at 800.254.7670 ext. 102.
 

A "Pile of Items" is Not an Assessment
Why developing a bank of good test items is not enough

So, you have written a great bank of test questions to accompany your new product.  You did your best to avoid the pitfalls described in the previous article. You made sure that you aligned the items to the standards, and you had them reviewed by educators and editors.  But this does not mean you have an "assessment". There is still a lot you do not know.


You need to administer the items to a set of test takers similar to those who will ultimately take the test and conduct a psychometric analysis of the items. We can learn a lot by administering the questions to a group of students and examining how students answered them.  Psychometrics is a scary word to many educators.  It evokes images of a mad scientist performing brain experiments on live patients.  Thankfully, psychometrics is, in reality, a lot less threatening. It is the science of measuring mental constructs like student knowledge, skills, and attitudes.  This science helps you evaluate the questions and the overall assessment. A proper psychometric analysis is also a professional requirement in the field of assessment and a common expectation of buyers.

 

How do we know if the items are any good?  Until we try them out on a group of students, we are really "guessing" at how good the questions are.  Psychometrics supports the review of each question to find out its level of difficulty, how effectively it discriminates, whether it "fits" well with the underlying construct, and other item characteristics.   For example, we may find that nearly everyone is getting an item wrong, and perhaps that more students were picking one of the incorrect responses than actually got the right answer.  The item looked pretty good at the outset, but now we can see that it is not very good at finding out the student's skill level and may actually be giving us little or no information.
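To make that item review concrete, here is a minimal sketch of a classical item analysis in Python: difficulty as the proportion answering correctly, and discrimination as the point-biserial correlation between the item score and the total score. The response matrix is invented for illustration, and for simplicity the item is correlated with a total that includes it (in practice the item is often excluded from its own total):

```python
def item_stats(responses, item):
    """Classical item statistics for one item.
    responses: rows = students, columns = items, 1 = correct."""
    n = len(responses)
    item_scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    difficulty = sum(item_scores) / n  # proportion answering correctly

    # Point-biserial: Pearson correlation of item score with total score
    mean_i, mean_t = difficulty, sum(totals) / n
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_scores, totals)) / n
    sd_i = (sum((i - mean_i) ** 2 for i in item_scores) / n) ** 0.5
    sd_t = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5
    return difficulty, cov / (sd_i * sd_t)

# Invented try-out data: five students, three items
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]

for item in range(3):
    p, rpb = item_stats(responses, item)
    print(f"Item {item}: difficulty={p:.2f}, discrimination={rpb:.2f}")
```

In this made-up data, item 2 comes out with negative discrimination: the students who do better overall tend to miss it. That is exactly the kind of red flag, invisible until the items are tried out, that a psychometric analysis surfaces.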

 

After reviewing the test questions, we still have several things we need to know. Is the test reliable--will it give a consistent score regardless of when it is administered and which form we administer?   A set of test items, without information about what the scores mean, has little overall meaning. How many questions a student answers correctly (or the percentage correct) is affected not only by the test taker's ability, but also by the difficulty (and other characteristics) of the test.   Consider for a moment that we are giving a test covering fourth grade math skills.  If we have very hard questions, the student will likely answer few questions correctly; if we have very easy questions, the student will likely answer many of the questions correctly--regardless of the student's ability! The number or percentage of questions the student answered correctly is not an effective way to determine the student's ability.  Raw scores are simply not a very good indicator of the student's skill level.
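The point about raw scores can be illustrated with a simple item response theory (Rasch model) sketch. The ability and difficulty values below are invented for illustration; the takeaway is that the same student is expected to earn very different raw scores on an easy form and a hard form:

```python
import math

# Rasch model: probability that a test taker with ability theta
# answers an item of difficulty b correctly.
def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

theta = 0.0  # one average-ability student (illustrative value)

# Two five-item forms: low difficulty values = easy items
easy_form = [-2.0, -1.5, -1.0, -1.0, -0.5]
hard_form = [0.5, 1.0, 1.0, 1.5, 2.0]

# Expected raw score = sum of per-item probabilities of success
expected_easy = sum(p_correct(theta, b) for b in easy_form)
expected_hard = sum(p_correct(theta, b) for b in hard_form)
print(f"Expected raw score on easy form: {expected_easy:.1f} of 5")
print(f"Expected raw score on hard form: {expected_hard:.1f} of 5")
```

With these invented values, the same student is expected to score about 3.8 of 5 on the easy form but only about 1.2 of 5 on the hard form--same ability, very different raw scores. Scaling methods based on models like this recover a comparable ability estimate from either form.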

  

Without proper statistical analysis, we have little understanding of what a given score on the test means.  A thorough psychometric analysis can help you understand the "true" meaning of these raw scores in relation to the underlying content area (construct) you are measuring.

 

There is much more to constructing a good test.  This is just a small look at why there is a big difference between a pile of items and an effective assessment. Working with SEG, or another qualified testing organization, can help you make sure that your tests are of high quality.

 

Let us know if we can help you gather the evidence you need to help convince your customers that your product is effective by emailing us at info@segmeasurement.com.

Measurement Moment 
It's painful! ...but how painful?
The most familiar examples of measurement lie in the physical world, such as weight, distance, and the like. And our work in education and assessment puts us in touch with the measurement of cognitive phenomena, such as knowledge and attitudes.  But there are many more important areas of measurement that affect our lives.

One area of measurement that we all experience, but few of us think of in terms of measurement, is the measurement of pain--identifying how much pain a given person has.  Why is this important? Knowing a person's pain level can help medical professionals diagnose what is wrong with the person and, more specifically, determine the best way to treat the pain.

The problem is that pain is thought to be a subjective judgment.  While there are some physical symptoms, it is widely believed that a subjective interpretation of pain level is the best way to assess pain.  Chances are that at some point--in a medical office or hospital--you have been asked to rate the amount of pain you are experiencing.  The most common approach is to ask the patient to rate how much pain they are in on a scale from 1 to 10, with 1 representing little pain and 10 representing severe pain.  This is often accompanied by a set of ten line drawings of faces showing progressively more pain in the expression (much like the happy and sad face line drawings we have all seen).

I am certainly not an expert on pain (although I have had and given my share!).  But even a cursory analysis and a quick look at the research in this area suggests that this approach may not be shedding as much light on a person's pain as we would like.  Because this is treated as a subjective phenomenon, many seem content to just say it's not precisely measurable in the traditional sense. The problem comes down to the traditional issues of validity and reliability. If my concept of pain and yours are so different, then what does a pain level of 3, 5, or 7 mean? This is a validity problem. And there is little that leads us to believe that the actual pain experienced will be consistently reported by the same person at different times, or by two people who are actually experiencing the same level of pain--a reliability problem.

I agree that this is challenging, and we are not likely to see the level of precision we often see in the physical world (e.g., measuring temperature). But I think we can do better. Somewhere between throwing up our hands and saying it's subjective and the precision of measurement in the physical world lies a better solution.  Perhaps we could apply some of what we have learned about measurement in psychology and education to develop more precise measures and do a better job of helping patients with their pain management.


About SEG Measurement

SEG Measurement conducts technically sound product efficacy research for educational publishers, technology providers, government agencies, and other educational organizations, and helps organizations build better assessments. We have been meeting the research and assessment needs of organizations since 1979. SEG Measurement is located in New Hope, Pennsylvania, and can be found on the web at www.segmeasurement.com.

 SEG At Upcoming Conferences
Let's Meet!

We were pleased to see many of our colleagues at Head Start's 12th National Research Conference.  Interest in educational products and services at the district and state level is increasing, and we are seeing a wave of new innovations. We look forward to seeing you at the upcoming conferences we will be attending.

 

If you would like to meet with a representative from SEG Measurement to discuss how we might help you with your assessment and research needs, please contact us at info@segmeasurement.com.