
Dear STAR Test, We Need to Talk Again…

Three years ago, almost to this date, I wrote my first blog post about the STAR test, a test sold by Renaissance Learning and used in thousands of districts across the United States. That post started a discussion with the people behind STAR, and while I wish I could say that it created change (isn’t that, after all, what we always hope for?), it didn’t. Three years later, on the eve of my final STAR reading test of the year, I return to those same questions, once again hoping for some clarity, some light to be shed on how this test can be sold as a valid assessment tool.

Because, dear STAR test, it just doesn’t seem like you have evolved much since we first started together. In the three years since I last wrote to you hoping for some answers, you haven’t changed much. I guess I could count your fancy new interface as change, but really all it has done is make me spend more time searching for the things I need in order to figure out what my students’ results supposedly are and what they may mean. But your essence, a comprehensive reading test that will quickly give me an elaborate understanding of 46 reading skills in 11 different domains, remains the same. And much like with so many of your cousins, all of the other computer tests that are supposed to be useful in our instruction, I keep feeling like I get the short end of the stick. I feel like a fool when I tell my students to show off their knowledge, to prove to the computer what we already know: just how much they have grown, just how much stronger they are.

Because according to the tests today, I have pretty much made all of my students worse readers than when they started. Or amazing super readers whose results are so incredible I want to cry tears of joy. It seems to happen every year: the computer test tells us that they exploded, or that they didn’t grow, or that their abilities actually went backward, but the face-to-face assessments tell us a different story. The conversations we assess in their book clubs that show deep critical analysis and understanding. The written depth of their knowledge as they explore what it means to think about others’ stories and how those stories may affect them. How we see them share books, read books, recommend books.

And so that old letter stands the test of time, which is why I am reposting it, because honestly, now three years further into this relationship, I am still wondering why I bother. Why do I get my hopes up for reliable, usable data? Why do I tell my students to try their hardest? Why do we take the time to try to do it right? Because I want to believe in you, STAR, I really do, but at this point, I am just not sure you are worth my time.

So, dear STAR test, we need to talk… again

We first met five years ago; I was fresh out of a relationship with MAP, that stalwart older brother of yours that had taken up hours of my 5th graders’ time. They took their time, and the results were OK sometimes, or at least we thought so, but we were never sure. But oh, the time MAP and I spent together that could have been used for so many better things.

So when I heard about you, STAR, and how you would give me 46 reading skills in 11 different domains in just 30 or so questions, I was intrigued.  After all, 34 timed questions meant that most of my students would spend about 20 or so minutes with you.  You promised me flexibility and adaptation to my students with your fancy language where you said you “…combine computer-adaptive technology with a specialized psychometric test design.”  While I am not totally sure what psychometric means, I was always a sucker for fancy words.   Game on.

With your fast-paced questions, I thought of all the time we would save.  After all, tests should be quick and painless so we can get on with things, right?  Except giving my students only 90 seconds to read a question and answer it correctly meant they got awfully good at skimming, skipping lines, and in general being more worried about timing out than being able to read the whole text.

In fact, every year I have a child in tears who tells me that the timer popped up while they were still reading, that their anxiety is peaking because of that timer. (Fun fact: if a child times out of a question, it is treated as incorrect.) For vocabulary, all they get is 45 seconds, because either they know it or they don’t, never mind that some of my kids try to sound words out and double-check their answer all within those precious seconds, just like I have taught them to do. I watched in horror as students’ anxiety grew. Your 90-second time limit on reading passages meant that students started to believe that being a great reader was all about speed. Never mind that Thomas Newkirk’s research into reading pace tells us that we should strive for a comfortable pace, not a fast one. So yes, slow reader = bad reader.

And sure, we could just turn the time limit off, except that is not a decision I am allowed to make as an educator, because that power is given to the administration level, not the individual. On a larger scale, the fact that the product even comes with a time limit should be debated further. What does time have to do with reading comprehension and vocabulary knowledge, besides the selling point of being able to administer the test quickly, or, as you say, “there are time limits for individual items intended to keep the test moving and maintain test security”? What does that do to bolster the validity of the test? How is that supported by best practice?

And so for some reason, year after year, I keep hoping that this will be the year the data will truly be useful. The year I will gain knowledge that I can use to shape my teaching; isn’t that, after all, the whole purpose of collecting data on our students? But much like previous years, the results are a kaleidoscope of fragmented stories that refuse to fit together into a valid picture. Students whose scores dropped 4 grade levels and students whose scores jumped 4 grade levels. Students who made no growth at all. Once again, I spend the day questioning my capabilities as a teacher because I don’t know what to take credit for. Is it possible that I am the worst teacher ever to have taught 7th grade ELA, or perhaps the best? You confuse me, STAR, on so many occasions.

As in previous years, students whose score differences are significant sometimes get to retest; after all, perhaps they were just having a bad day? And sure, sometimes they have gone up more than 250 points, all in the span of 24 hours, but other times they have dropped that amount as well. That is a lot of unmotivated or “bad day” students, apparently. And yet you tell me that your scores are reliable, and you’re not alone; many studies say so too. Yet that is simply not what we see every day in our classroom. Although one study (sponsored by you) did point out that you are most reliable between 1st and 4th grade, so where does that leave my 7th graders?

The last time I dug around in your reports, I found that, according to your own research, your retest reliability at the 7th-grade reading level is only 0.73 (page 54). You say that is really good, but to me it doesn’t sound that way; shouldn’t it be closer to 1.0? And if we look at the Cronbach’s alpha reliability, that is only acceptable. I guess that’s what I keep coming back to. Is your reliability simply measured against other tests that are also problematic in their assessment methods and that we also know do not give us particularly valid results? Who knows; you would need a math degree to dig through your technical manual and make sense of all of the numbers.

Yet through all of this, you have dazzled me with your data. Even now, when I dig into your research, I keep getting tripped up in your promises of reliable test scores, of comparable test results, of scores that mean something; but what they actually mean, I am not quite sure. And all of those reports that I could print out and pore over! Perhaps you were not accurate for all of my students, but surely you had to be for some. It wasn’t until a long night spent pondering why some of my students’ scores were so low that I realized that in your 0.73 reliability lies my 0.27 insecurity. After all, who are those kids whose scores are not reliable? I could certainly guess, but the whole point of having an accurate assessment is that I shouldn’t have to. So it doesn’t feel like you are keeping up your end of the deal anymore, STAR test. In fact, I am pretty sure that my own child will never make your acquaintance, at least not if we, her parents, have anything to say about it.

So, dear STAR test, I love data as much as the next person. I love reliable, accurate data that doesn’t stress my students out. That doesn’t make them go really quiet when they realize that perhaps they didn’t make the growth. I love data that I can rely on, and it turns out, STAR, I just don’t think you fit that description, despite the efforts of those who take you. Perhaps I should have realized that sooner when I saw your family relationship with Accelerated Reader. Don’t even get me started on that killer of reading joy. You even admit in your technical manual that there may be measurement errors. You said, “Measurement error causes students’ scores to fluctuate around their ‘true scores’. About half of all observed scores are smaller than the students’ true scores; the result is that some students’ capabilities are underestimated to some extent.” Granted, it wasn’t until page 81. So you can wow me with all of your data reports. With all of your breakdowns and your fancy graphs. You can even try to woo me with your trend scores, your anticipated rate of growth, and your national percentile rankings. Your comparability to state tests. But it is not enough, because none of that matters if I can’t count on you to provide me with accurate results. It doesn’t matter if I can’t trust what you tell me about my students.

So I wish I could break up with you, but it seems we have been matched for the long run, at least for now. All I can be thankful for is that I work for a district that sees my students as more than just one test, more than just their points, because does anyone actually know what those points mean? I am so thankful that I work in a district that encourages us to use STAR as only one piece of the data puzzle, that chooses to see beyond it so we can actually figure out a child’s needs. But I know I am lucky; not everyone who is with you has that same environment. So, dear STAR, I wish you actually lived up to all of your fancy promises, but from this tired educator to you: it turns out I don’t need you to see if my students are reading better, because I can just ask them, watch them, and see them grow as they pick up more and more books. So that’s what I plan on doing rather than staring at your reports, because in the end, it’s not really you, it’s me. I am only sorry it took me so long to realize it.

Best,

Pernille

PS: In case it needs to be spelled out, this post does not reflect the official view of my employer.

9 thoughts on “Dear STAR Test, We Need to Talk Again…”

  1. Oh my gosh, were you reading the minds of my colleagues and me today? We were literally having these same frustrated conversations about STAR: our students aren’t growing, the test isn’t reliable, the report interface is now a disaster, and the list goes on and on. I am so glad we aren’t the only ones who are feeling this way. Thank you for vocalizing what we have been feeling and complaining to them about for the last year.

  2. Of course I love data! I love KNOWING my students have progressed. Why, oh why, do I continue to allow the results to demoralize me and bring me to tears?

    Ugh! This test is ridiculous!!!!

  3. Hi! STAR was used as a progress monitoring tool for our at-risk (Title I) students and for students with reading goals in their IEPs. I could have written your latest post. Granted, I could accommodate the students by turning off the timer (allowing for the accommodation of extended time), but that was one small fix to the bigger concerns I had with STAR. I am glad to say that as of this week my students took their final STAR benchmark, as my district is sunsetting STAR. It did not meet our needs, but more importantly, it did not meet the needs of our students. Good luck!

  4. Thank you, Pernille! Your blog is like the “Room of Requirement.” Always has what we need, just when we need it!

  5. So well written and well stated! Wish I could send this to my administration and fellow educators who totally trust and rely on STAR scores!

  6. I’m a special education/English teacher for 7th-8th graders. We use STAR to test our lowest kids to see if they are making progress. I often have to explain to parents at IEP meetings why these scores bounce up and down. My biggest problem is that most of my “low” kids are dyslexic. They are supposed to have text-to-speech accommodations, yet STAR does not offer TTS. In fact, STAR says that if a test is read to the student, it would be invalidated.
