Dear STAR,
We first met two years ago, when I was fresh out of a relationship with MAP, that stalwart older brother of yours who had taken up hours of my 5th graders’ time. They took their time, and the results were okay; sometimes, at least, we thought so, but we were never sure. But oh, the time MAP and I spent together that could have been used for so many better things.
So when I heard about you, STAR, and how you would give me 46 reading skills in 11 different domains in just 34 questions, I was intrigued. After all, 34 timed questions meant that most of my students would spend about 15 minutes with you. You promised me flexibility and adaptation to my students with your fancy language, telling me that you “…combine computer-adaptive technology with a specialized psychometric test design.” While I am not totally sure what psychometric means, I was always a sucker for fancy words. Game on.
With your fast-paced questions, I thought of all the time we would save. After all, tests should be quick and painless so we can get on with things, right? Except giving my students only 120 seconds to read a question and answer it correctly meant they got awfully good at skimming, skipping lines, and in general worrying more about timing out than about reading the whole text. (Fun fact: a fellow teacher timed out of most of her questions when she took the test in training and still scored above an 11th-grade level.) For vocabulary, all they get is 60 seconds, because either they know the word or they don’t; never mind that some of my kids try to sound words out and double-check their answer, all within those precious seconds, just like I have taught them to do. I watched in horror as students’ anxiety grew. In fact, your 120-second time limit on reading passages meant that students started to believe that being a great reader was all about speed. Never mind that Thomas Newkirk’s research into reading pace tells us that we should strive for a comfortable pace, not a fast one. So yes, slow reader = bad reader. Thanks, STAR.
And yet, maybe it was just my first year with you. After all, we all have growing pains. But this year, it didn’t get better; it got worse. Students whose scores dropped four grade levels and students whose scores jumped four grade levels. Or how about those who made no growth at all? I didn’t know what to take credit for. Was I the worst teacher ever to have taught 7th grade ELA, or perhaps the best? You confused me, STAR, on so many occasions. So when students’ scores dropped significantly, they sometimes got to re-test; after all, perhaps they were just having a bad day? And sure, sometimes they went up more than 250 points, all in the span of 24 hours, but other times they dropped that amount as well. That is a lot of unmotivated or “bad day” students, apparently. And yet, you tell me that your scores are reliable. I guess they aren’t always; after all, at the 7th-grade reading level you only report a retest reliability of 0.82, which you say is really good but which doesn’t sound that way to me. 0.82 – shouldn’t it be closer to 1.0? In fact, when your company compared you to other recognized standardized tests, the coefficient dropped to 0.70 for 7th grade, but perhaps that was because of the small sample size, just 3,924 students? Who knows? I suppose I could email you to ask for more updated results, as it says in the very small footnote.
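(For readers who, like me, had to look it up: in classical test theory, reliability is the share of observed score variance that reflects the student rather than the noise. A back-of-the-envelope sketch, using my own arithmetic from the standard formulas rather than anything in STAR’s manual, taking r = 0.82 and writing SD for the scaled-score standard deviation:

\[
r = \frac{\sigma_T^2}{\sigma_X^2}, \qquad SEM = SD\sqrt{1-r} \approx 0.42\,SD, \qquad SD_{\text{diff}} = SD\sqrt{2(1-r)} \approx 0.60\,SD.
\]

In other words, nearly a fifth of the reported variance is error, and two testings of the same child can easily differ by over half a standard deviation with no change in the child at all.)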
Yet through all of this, you have dazzled me with your data, with all of the reports that I could print out and pore over. Perhaps you were not accurate for all of my students, but certainly you had to be for some. It wasn’t until a long night spent pondering why some of my students’ scores were so low that I realized that in your 0.81 reliability lies my 0.19 insecurity. After all, who are those kids whose scores are not reliable? I could certainly guess, but the whole point of having an accurate assessment is that I shouldn’t have to. So it doesn’t feel like you are keeping up your end of the deal anymore, STAR test. In fact, I am pretty sure that my own child will never make your acquaintance, at least not if we, her parents, have anything to say about it.
So dear STAR test, I love data as much as the next person. I love reliable, accurate data that doesn’t stress my students out. Data that doesn’t make them go quiet when they realize that perhaps they didn’t make the growth. I love data that I can rely on, and it turns out, STAR, I just don’t think you fit that description. Perhaps I should have realized that sooner when I saw your familial relationship with Accelerated Reader. Don’t even get me started on that killer of reading joy. You even admit in your technical manual that there may be measurement errors. You said, “Measurement error causes students’ scores to fluctuate around their ‘true scores’. About half of all observed scores are smaller than the students’ true scores; the result is that some students’ capabilities are underestimated to some extent.” Granted, it wasn’t until page 81. So you can wow me with all of your data reports, with all of your breakdowns and your fancy graphs. You can even try to woo me with your trend scores, your anticipated rate of growth, and your national percentile rankings. But it is not enough, because none of that matters if I can’t count on you to provide me with accurate results. It doesn’t matter if I can’t trust what you tell me about my students.
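(A quick translation of that admission, using the standard classical test theory model as my own gloss, not anything STAR-specific: every observed score X is a true score T plus a random error E,

\[
X = T + E, \qquad \mathbb{E}[E] = 0,
\]

and since the error is as likely to fall below zero as above it, about half of all observed scores land below the true score, exactly as the manual concedes. The fluctuation is not a rare glitch; it is built into the model.)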
So I wish I could break up with you, but it seems we have been matched for the long run, at least for now. All I can be thankful for is that I work for a district that sees my students as more than just one test, as more than just their points, because does anyone actually know what those points mean? I am thankful to work in a district that encourages us to use STAR as only one piece of the data puzzle, that chooses to see beyond it so we can actually figure out a child’s needs. But I know I am lucky; not everyone who is with you has that same environment. So dear STAR, I wish you actually lived up to all of your fancy promises, but from this tired educator to you: it turns out I don’t need you to see if my students are reading better, because I can just ask them, watch them, and see them grow as they pick up more and more books. So that’s what I plan on doing rather than staring at your reports, because in the end, it’s not really you, it’s me. I am only sorry it took me so long to realize it.
Best,
Pernille
PS: I am grateful that Renaissance Learning did reach out to me to discuss my post; here is their response:
Renaissance Learning is deeply committed to teacher success in the classroom. I am the STAR Product Marketer and read your blog regarding our product. I welcome the opportunity to talk with you about your concerns and help you get the best experiences with Renaissance!
I captured two primary issues from your blog:
- STAR Reading Time Limits
- Reliability
STAR Reading Time Limits
I wanted to make sure you know that you can set an extended time preference in the software to help reduce students’ test anxiety and frustration. The instructions for doing so are on page 217 in our STAR Reading software manual.
On page 12 of our STAR Reading technical manual there’s an overview of testing time by grade that illustrates guidance for timing. This information can be used to assess what the best time limits are for your students (based on analysis of testing conducted in the fall of 2011).
Reliability
Reliability is a far more complex topic. There are three things to look at when discussing this topic: Reliability, Validity and Standard Error of Measurement (SEM).
Reliability is the extent to which a test yields consistent results from one test administration to another. Validity is the degree to which it measures what it is intended to measure and is often used to judge a test’s effectiveness. Standard error of measurement (SEM) measures the precision of a test score. It provides a means to gauge the extent to which scores would be expected to fluctuate because of imperfect reliability, which is a characteristic of all educational tests. These elements are described in detail in the document Understanding Why STAR Test Scores Fluctuate.
STAR assessments have been independently reviewed and certified by the National Center on Response to Intervention (www.rti4success.org) and the National Center on Intensive Intervention (http://www.intensiveintervention.org), and received high ratings as a screening and progress-monitoring tool based on the criteria set forth to meet exceptional standards.
And my response:
Thank you for your response. The time limit is not something decided by me but by my district; still, the fact that the product even comes with one should be debated further. What does time have to do with reading comprehension and vocabulary knowledge, besides the selling point of being able to administer the test quickly?
The next point is the reliability; you seem to have missed the major point of the post, which is that when we do not know which children’s scores are reliable, it becomes very hard to use the test for anything. While I have read the document you linked again (I had read it before writing the post), it doesn’t yield any new information. In fact, it appears that teachers are expected to assume either that something is going on with the child or that it is a measurement error. The reliability for 7th grade, as reported by STAR itself, is 0.70, as referenced on page 25 of the manual. According to the technical manual, the SEM reported on page 41 in Table 12 is 71.74 for 7th grade. That is an incredibly high measurement error when it comes to kids’ scores, and yet even that wouldn’t cover the fluctuation we see in many students.
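To put numbers on that claim (rough arithmetic from the standard formulas, not figures from your manual): a 95% confidence band around a single score spans about ±1.96 × SEM, and the difference between two administrations has a standard error of √2 × SEM, so

\[
1.96 \times 71.74 \approx 141 \text{ points}, \qquad 1.96 \times \sqrt{2} \times 71.74 \approx 199 \text{ points}.
\]

Two scores taken a day apart could differ by nearly 200 points from measurement error alone, and yet many of the swings we see are larger still.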
While I appreciate your response, I stand by the post; it is a travesty that teachers are being evaluated based upon tests like this, particularly when they are meant to be diagnostic tools. And while the scores are probably accurate for some students, it is hard to figure out who they are accurate for and who they are not. My only wish for the future is that the test either becomes more accurate or somehow allows us to better determine which children’s scores are accurate.