Dear STAR Test, We Need to Talk

Dear STAR,

We first met two years ago. I was fresh out of a relationship with MAP, that stalwart older brother of yours that had taken up hours of my 5th graders’ time.  They took their time and the results were ok, sometimes; at least we thought so, but we were never sure.  But oh, the time MAP and I spent together that could have been used for so many better things.

So when I heard about you, STAR, and how you would give me 46 reading skills in 11 different domains in just 34 questions, I was intrigued.  After all, 34 timed questions meant that most of my students would spend only about 15 minutes with you.  You promised me flexibility and adaptation to my students with your fancy language when you said you “…combine computer-adaptive technology with a specialized psychometric test design.”  While I am not totally sure what psychometric means, I was always a sucker for fancy words.  Game on.

With your fast-paced questions, I thought of all the time we would save.  After all, tests should be quick and painless so we can get on with things, right?  Except giving my students only 120 seconds to read a question and answer it correctly meant they got awfully good at skimming and skipping lines, more worried about timing out than about reading the whole text.  (Fun fact: a fellow teacher timed out on most of her questions when she took the test in training and still scored above an 11th grade level.)  For vocabulary, they only get 60 seconds, because either they know it or they don’t; never mind that some of my kids try to sound words out and double-check their answer, all within those precious seconds, just like I have taught them to do.  I watched in horror as students’ anxiety grew.  In fact, your 120-second time limit on reading passages meant that students started to believe that being a great reader was all about speed.  Never mind that Thomas Newkirk’s research into reading pace tells us that we should strive for a comfortable pace and not a fast one.  So yes, slow reader = bad reader.  Thanks, STAR.

And yet, maybe it was just my first year with you.  After all, we all have growing pains.  But this year, it didn’t get better; it got worse.  Students whose scores dropped 4 grade levels and students whose scores jumped 4 grade levels.  Or how about those who made no growth at all?  I didn’t know what to take credit for.  Was it possible that I was the worst teacher ever to have taught 7th grade ELA, or perhaps the best?  You confused me, STAR, on so many occasions.  So when students’ scores dropped significantly, they sometimes got to re-test; after all, perhaps they were just having a bad day?  And sure, sometimes they went up more than 250 points, all in the span of 24 hours, but other times they dropped that amount as well.  That is a lot of unmotivated or “bad day” students, apparently.  And yet, you tell me that your scores are reliable.  Except I guess they aren’t always; after all, at the 7th grade reading level you only got a score of 0.82 retest reliability, which you say is really good but to me doesn’t sound that way.  0.82 – shouldn’t it be closer to 1.0?  In fact, when your company compared you to other recognized standardized tests, it dropped to 0.70 for 7th grade, but perhaps that was because of the small sample size, just 3,924 students?  Who knows?  I suppose I could email you to ask for more updated results, like it says in the very small footnote.

Yet through all of this, you have dazzled me with your data.  With all of the reports that I could print out and pore over.  Perhaps you were not accurate for all of my students, but certainly you had to be for some.  It wasn’t until a long night spent pondering why some of my students’ scores were so low that I realized that in your 0.82 reliability lies my 0.18 insecurity.  After all, who are those kids whose scores are not reliable?  I could certainly guess, but the whole point of having an accurate assessment is that I shouldn’t have to.  So it doesn’t feel like you are keeping up your end of the deal anymore, STAR test.  In fact, I am pretty sure that my own child will never make your acquaintance, at least not if we, her parents, have anything to say about it.

So dear STAR test, I love data as much as the next person.  I love reliable, accurate data that doesn’t stress my students out.  That doesn’t make them go really quiet when they realize that perhaps they didn’t make the growth.  I love data that I can rely on, and it turns out, STAR, I just don’t think you fit that description.  Perhaps I should have realized that sooner when I saw your familial relationship with Accelerated Reader.  Don’t even get me started on that killer of reading joy.  You even admit in your technical manual that there may be measurement errors.  You said, “Measurement error causes students’ scores to fluctuate around their ‘true scores’. About half of all observed scores are smaller than the students’ true scores; the result is that some students’ capabilities are underestimated to some extent.”  Granted, it wasn’t until page 81.  So you can wow me with all of your data reports.  With all of your breakdowns and your fancy graphs.  You can even try to woo me with your trend scores, your anticipated rate of growth, and your national percentile rankings.  But it is not enough, because none of that matters if I can’t count on you to provide me with accurate results.  It doesn’t matter if I can’t trust what you tell me about my students.

So I wish I could break up with you, but it seems we have been matched for the long run, at least for now.  All I can be thankful for is that I work for a district that sees my students as more than just one test, as more than just their points, because does anyone actually know what those points mean?  I can be so thankful that I work in a district that encourages us to use STAR as only one piece of the data puzzle, that chooses to see beyond it so we can actually figure out a child’s needs.  But I know I am lucky; not everyone who is with you has that same environment.  So dear STAR, I wish you actually lived up to all of your fancy promises, but from this tired educator to you: it turns out I don’t need you to see if my students are reading better, because I can just ask them, watch them, and see them grow as they pick up more and more books.  So that’s what I plan on doing rather than staring at your reports, because in the end, it’s not really you, it’s me.  I am only sorry it took me so long to realize it.

Best,

Pernille

PS: I am grateful that Renaissance Learning reached out to me to discuss my post; here is their response:

Renaissance Learning is deeply committed to teacher success in the classroom. I am the STAR Product Marketer and read your blog regarding our product. I welcome the opportunity to talk with you about your concerns and help you get the best experiences with Renaissance!

I captured two primary issues from your blog:

  1. STAR Reading Time Limits
  2. Reliability

STAR Reading Time Limits

I wanted to make sure you know that you can set an extended time preference in the software to help reduce students’ test anxiety and frustration. The instructions for doing so are on page 217 in our STAR Reading software manual.

On page 12 of our STAR Reading technical manual there’s an overview of testing time by grade that illustrates guidance for timing. This information can be used to assess the best time limits for your students (based on analysis of testing conducted in the fall of 2011).

Reliability

Reliability is a far more complex topic. There are three things to look at when discussing this topic: Reliability, Validity and Standard Error of Measurement (SEM).

Reliability is the extent to which a test yields consistent results from one test administration to another. Validity is the degree to which it measures what it is intended to measure and is often used to judge a test’s effectiveness. Standard error of measurement (SEM) measures the precision of a test score. It provides a means to gauge the extent to which scores would be expected to fluctuate because of imperfect reliability, which is a characteristic of all educational tests. These elements are described in detail in the document Understanding Why STAR Test Scores Fluctuate.
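To make those terms a little more concrete for readers of this post, here is a minimal sketch in Python, using made-up numbers rather than STAR’s published figures, of how reliability and SEM relate and what an SEM implies for a single observed score:

    import math

    # Minimal sketch with hypothetical numbers (not STAR's published figures).
    # Classical test theory relates the two quantities as
    # SEM = SD * sqrt(1 - reliability): the less reliable the test, the wider
    # the band of plausible "true scores" around any single observed score.

    def standard_error_of_measurement(sd: float, reliability: float) -> float:
        return sd * math.sqrt(1.0 - reliability)

    def plausible_band(observed: float, sem: float, z: float = 1.96):
        # Roughly a 95% band, assuming normally distributed measurement error.
        return observed - z * sem, observed + z * sem

    sd = 150.0          # assumed scaled-score standard deviation (hypothetical)
    reliability = 0.82  # the retest figure quoted in the letter above
    sem = standard_error_of_measurement(sd, reliability)
    low, high = plausible_band(600.0, sem)
    print(f"SEM of roughly {sem:.0f} points")
    print(f"An observed score of 600 is consistent with true scores from about {low:.0f} to {high:.0f}")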

STAR assessments have been independently reviewed and certified by the National Center on Response to Intervention www.rti4success.org and the National Center on Intensive Intervention http://www.intensiveintervention.org  and received high ratings as a screening and progress monitoring tool based on the criteria set forth to meet exceptional standards.

And my response:

Thank you for your response. The time limit is not something decided by me but by my district; still, the fact that the product even comes with one should be debated further. What does time have to do with reading comprehension and vocabulary knowledge, besides the selling point of being able to administer the test quickly?
The next point is reliability; you seem to have missed the major point of the post, which is that when we do not know which child’s scores are reliable, it becomes very hard to use the test for anything.  While I have read the document you linked again (I had read it before the post), it doesn’t yield any new information.  In fact, it appears that teachers are expected to assume it is either something going on with the child or a measurement error.  The reliability for 7th grade as reported by STAR itself is 0.70, as referenced on page 25 of this manual.  According to the technical manual, the SEM reported on page 41 in Table 12 is 71.74 for 7th grade.  That is an incredibly high measurement error when it comes to kids’ scores, and yet it still wouldn’t cover the fluctuation that we see in many students.
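To put those figures in perspective, here is my own back-of-the-envelope arithmetic (mine, not Renaissance’s), simply plugging the quoted SEM of 71.74 into the standard formulas:

    import math

    # Back-of-the-envelope arithmetic using only the SEM of 71.74 quoted above.
    sem = 71.74

    # Plausible range of true scores around a single observed score (about 95%).
    single_score_band = 1.96 * sem

    # Each administration carries its own error, so the expected spread of the
    # difference between two sittings is SEM * sqrt(2).
    sem_of_difference = sem * math.sqrt(2)
    retest_swing = 1.96 * sem_of_difference

    print(f"One sitting: observed score within about +/- {single_score_band:.0f} points of the true score")
    print(f"Two sittings: differences of up to about +/- {retest_swing:.0f} points can come from measurement error alone")

In other words, on those figures a single score could sit some 140 points away from a student’s true score, and two sittings could differ by close to 200 points, from measurement error alone.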
While I appreciate your response, I stand by the post; it is a travesty that teachers are being evaluated based upon tests like this, particularly when they are meant to be a diagnostic tool.  And while the scores are probably accurate for some students, it is hard to figure out who they are accurate for and who they are not.  My only wish for the future is that the test is either made more accurate or somehow allows us to better decide which children’s scores are accurate.

17 thoughts on “Dear STAR Test, We Need to Talk”

  1. Every time I read your blog, you have zeroed in on exactly what I am thinking and stated it in more eloquent terms than I ever could. You’ve done it again; I woke up 2 hours early, today, stressing about my kiddos’ STAR scores. Thank you for letting me know, time & time again, I am not alone. Or crazy. You are awesome, and I thank you for all you do!

  2. Four weeks into state testing (language arts and math), we started giving students the EZCBM test during advisory time (language arts–they’ll still have to do math later). We’ve been told we’ll see the state test results next October, or possibly in January 2017. So the one thing I can say for STAR is that if you’re going to force me to give a test, go ahead and have me give a one-day test with instant results. I wish ANY of this gave me information I could use. Or that I could refuse to give the tests without losing my job.

  3. Wow…you hit the nail right on the head…I couldn’t agree more with your thoughts regarding STAR……I loathe the idea that my students must take this ridiculous test 3 times a year for “benchmarking purposes”…..it is such a waste of valuable time for my students and for me as I go over the results and wonder why so many have dropped..and how others show such large increases…I’ll never figure it out! Here is what I know…I have created a classroom of children who LOVE to pick up a book and read it for the sheer pleasure of reading a book and enjoying a good story….for that I am thankful and proud to be called their teacher!

  4. My favorite part of the STAR results is when they identify a student’s Zone of Proximal Development (ZPD). For some students the ZPD can span 3-4 grade levels. For several reasons I don’t think that was what Vygotsky had in mind when he theorized ZPD. STAR gives a whole new meaning to Vygotsky’s concept.

  5. My “star” student, a “slow” reader, said she didn’t have enough time to read; half of the time she just guessed. I am a bad reader too. Like mother, like daughter. Good thing we don’t base our value on tests.

  6. 1. You can turn off time limits
    2. If scores change radically overnight, maybe your students aren’t always giving 100% effort
    3. As teachers, we are always supposed to pair qualitative (our eyes and ears) and quantitative data to make decisions. At least that’s what my principal says!

    • Thank you for your comment; unfortunately, I do not have the power to turn off the time limit or I would. Also, sure, some students don’t give it their best, but many do. I agree with your principal but know many who are not in the same situation.

      • Interesting contradiction on the time limits. Could that be technical issues? Like different software version in use, or the way the admin installed the software?

  7. I love STAR and the data it provides. In my opinion, the Instructional Planning report is key for a classroom teacher. I have never used a program that tells me the skills (not standards) a student needs to learn next. We especially love SGP so we can monitor growth of every child. As the person above stated, you can turn off or extend time limits for the assessments. And we use multiple types of data in our school to make ANY decisions related to students.

  8. I thank you for bringing attention to STAR. We are in our second year and I absolutely LOVE the concept of a short test and all of the amazing data! That said, I continue to be in a state of confusion as to the best way to use the data provided and what pieces pinpoint comprehension to assist our struggling readers. I wanted this test to be successful yet still have many questions.

  9. One of the aspects of living in Wisconsin Rapids that I will not miss (I have a new position in the SW part of WI) is working within walking distance of the headquarters of Renaissance Learning. I have also questioned their approaches to reading assessment, only to be ignored or provided a vague response that doesn’t address the inquiry. It’s a tough situation to be in as many community members and parents are employed by this computer company, maker of Accelerated Reader and STAR.

    So good of you Pernille to question this tool and allow Renaissance to respond. What they had to share in response really doesn’t address the fundamental problems with their program. For example, while a .70 is considered acceptable regarding reliability for a program, it is the lowest rate. In more simplistic terms, this means that for every ten students that take this assessment, seven (70%) students will find their results to be reliable. This is too low. You are absolutely correct in your conclusions.
