Dear STAR Test, We Need to Talk

Dear STAR,

We first met two years ago. I was fresh out of a relationship with MAP, that stalwart older brother of yours that had taken up hours of my 5th graders’ time.  They took their time and the results were ok; sometimes, at least we thought so, but we were not sure.  But oh, the time MAP and I spent together that could have been used for so many better things.

So when I heard about you, STAR, and how you would give me 46 reading skills in 11 different domains in just 30 or so questions, I was intrigued.  After all, 34 timed questions meant that most of my students would spend about 15 or so minutes with you.  You promised me flexibility and adaptation to my students with your fancy language where you said you “…combine computer-adaptive technology with a specialized psychometric test design.”  While I am not totally sure what psychometric means, I was always a sucker for fancy words.   Game on.

With your fast-paced questions, I thought of all the time we would save.  After all, tests should be quick and painless so we can get on with things, right?  Except giving my students only 120 seconds to read a question and answer it correctly meant they got awfully good at skimming, skipping lines, and in general being more worried about timing out than being able to read the whole text.  (Fun fact: a fellow teacher timed out of most of her questions when she took the test in training and still scored above an 11th-grade level.)  For vocabulary, all they get is 60 seconds because either they know it or they don’t, never mind that some of my kids try to sound words out and double-check their answer all within those precious seconds, just like I have taught them to do.  I watched in horror as students’ anxiety grew.  In fact, your 120-second time limit on reading passages meant that students started to believe that being a great reader was all about speed.  Never mind that Thomas Newkirk’s research into reading pace tells us that we should strive for a comfortable pace and not a fast one.  So yes, slow reader = bad reader.  Thanks, STAR.

And yet, maybe it was just my first year with you.  After all, we all have growing pains.  But this year, it didn’t get better, it just got worse.  Students whose scores dropped 4 grade levels and students whose scores jumped 4 grade levels.  Or how about those that made no growth at all?  I didn’t know what to take credit for.  Was it possible that I was the worst teacher ever to have taught 7th grade ELA or perhaps the best?  You confused me, STAR, on so many occasions.  So when students significantly dropped, they sometimes got to re-test; after all, perhaps they were just having a bad day?  And sure, sometimes they went up more than 250 points, all in the span of 24 hours, but other times they dropped that amount as well.  That is a lot of unmotivated or “bad day” students, apparently.   And yet, you tell me that your scores are reliable.  Yet, I guess they aren’t always; after all, at the 7th-grade reading level you only got a retest reliability score of 0.82, which you say is really good but to me doesn’t sound that way.  0.82 – shouldn’t it be closer to 1.0?  In fact, when your company compared you to other recognized standardized tests it dropped to 0.70 for 7th grade, but perhaps it was because of the small sample size, just 3,924 students?  Who knows? I suppose I could email you to ask for more updated results like it says in the very small footnote.

Yet through all of this, you have dazzled me with your data.  With all of the reports that I could print out and pore over.  Perhaps you were not accurate for all of my students, but certainly, you had to be for some.  It wasn’t until a long night spent pondering why some of my students’ scores were so low that I realized that in your 0.81 reliability lies my 0.19 insecurity.  After all, who are those kids whose scores are not reliable?   I could certainly guess, but the whole point of having an accurate assessment is that I shouldn’t have to.  So it doesn’t feel like you are keeping up your end of the deal anymore, STAR test.  In fact, I am pretty sure that my own child will never make your acquaintance, at least not if we, her parents, have anything to say about it.

So dear STAR test, I love data as much as the next person.  I love reliable, accurate data that doesn’t stress my students out.  That doesn’t make them really quiet when they realize that perhaps they didn’t make the growth.  I love data that I can rely on and it turns out, STAR, I just don’t think you fit that description.  Perhaps I should have realized that sooner when I saw your familial relationship with Accelerated Reader.  Don’t even get me started on that killer of reading joy.  You even mention yourself in your technical manual that there may be measurement errors.  You said, “Measurement error causes students’ scores to fluctuate around their “true scores”. About half of all observed scores are smaller than the students’ true scores; the result is that some students’ capabilities are underestimated to some extent.”  Granted, it wasn’t until page 81.  So you can wow me with all of your data reports.  With all of your breakdowns and your fancy graphs.  You can even try to woo me with your trend scores, your anticipated rate of growth and your national percentile rankings.  But it is not enough, because none of that matters if I can’t count on you to provide me with accurate results. It doesn’t matter if I can’t trust what you tell me about my students.

So I wish I could break up with you, but it seems we have been matched for the long run for now.  All I can be thankful for is that I work for a district that sees my students for more than just one test, for more than just their points, because does anyone actually know what those points mean?  I can be so thankful that I work in a district that encourages us to use STAR as only one piece of the data puzzle, that chooses to see beyond it so we can actually figure out a child’s needs.   But I know I am lucky; not everyone that is with you has that same environment. So dear STAR, I wish you actually lived up to all of your fancy promises, but from this tired educator to you: it turns out I don’t need you to see if my students are reading better because I can just ask them, watch them, and see them grow as they pick up more and more books.  So that’s what I plan on doing rather than staring at your reports, because in the end, it’s not really you, it’s me.  I am only sorry it took me so long to realize it.



PS: I am grateful that Renaissance Learning did reach out to me to discuss my post. Here is their response:

Renaissance Learning is deeply committed to teacher success in the classroom. I am the STAR Product Marketer and read your blog regarding our product. I welcome the opportunity to talk with you about your concerns and help you get the best experiences with Renaissance!

I captured two primary issues from your blog:

  1. STAR Reading Time Limits
  2. Reliability

STAR Reading Time Limits

I wanted to make sure you know that you can set an extended time preference in the software to help reduce students’ test anxiety and frustration. The instructions for doing so are on page 217 in our STAR Reading software manual.

On page 12 of our STAR Reading technical manual there’s an overview of testing time by grade that illustrates guidance for timing. This information can be used to assess what the best time limits are for your students (based on analysis of testing conducted in the fall of 2011).


Reliability is a far more complex topic. There are three things to look at when discussing this topic: Reliability, Validity and Standard Error of Measurement (SEM).

Reliability is the extent to which a test yields consistent results from one test administration to another. Validity is the degree to which it measures what it is intended to measure and is often used to judge a test’s effectiveness. Standard error of measurement (SEM) measures the precision of a test score. It provides a means to gauge the extent to which scores would be expected to fluctuate because of imperfect reliability, which is a characteristic of all educational tests. These elements are described in detail in the Understanding Why STAR Test Scores Fluctuate.

STAR assessments have been independently reviewed and certified by the National Center on Response to Intervention and the National Center on Intensive Intervention  and received high ratings as a screening and progress monitoring tool based on the criteria set forth to meet exceptional standards.

And my response:

Thank you for your response; the time limit is not something decided by me but by my district, but the fact that the product even comes with one should be debated further; what does time have to do with reading comprehension and vocabulary knowledge besides the selling point of being able to administer it quickly?
The next point is the reliability; you seem to have missed the major point of the post, which is that when we do not know which child’s scores are reliable or not, then it becomes very hard to use the test for anything.  While I have read the document you linked again (I had read it before the post), it doesn’t yield any new information.   In fact, it appears that teachers are expected to assume that a fluctuation is due either to something going on with the child or to a measurement error.  The reliability for 7th grade as reported by STAR itself is 0.70, as referenced on page 25 in this manual.  According to the technical manual, the SEM reported on page 41 in Table 12 is 71.74 for 7th grade.  That is an incredibly high measurement error when it comes to kids’ scores, and yet that wouldn’t cover the fluctuation that we see in many students.
While I appreciate your response, I stand by the post; it is a travesty that teachers are being evaluated based upon tests like this, particularly when they are meant to be a diagnostic tool.  And while scores are probably accurate for some students it is hard to figure out who they are accurate for and who they are not.  My only wish for the future is that the test is either more accurate or somehow allows us to better decide which children’s scores are accurate.
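
To put that SEM in perspective, here is a rough back-of-the-envelope sketch. It only uses the 71.74 figure quoted above; everything else is my own assumption and not something taken from the STAR manual: that measurement error is roughly normally distributed, that a test and a retest have independent errors with the same SEM, and that a 95% band (z = 1.96) is the right yardstick. The function names are made up for illustration.

```python
import math

SEM_GRADE_7 = 71.74   # SEM for 7th grade quoted above (technical manual, Table 12)
Z_95 = 1.96           # assumption: two-sided 95% band under a normal error model


def single_score_band(sem: float, z: float = Z_95) -> float:
    """Half-width of the band around ONE observed score."""
    return z * sem


def retest_difference_band(sem: float, z: float = Z_95) -> float:
    """Half-width of the band for the DIFFERENCE between two scores.

    Assumes both administrations have the same SEM and independent errors,
    so the standard error of the difference is sqrt(2) * SEM.
    """
    return z * math.sqrt(2) * sem


single = single_score_band(SEM_GRADE_7)
retest = retest_difference_band(SEM_GRADE_7)

print(f"One score: plus or minus about {single:.0f} points")
print(f"Test vs. retest: swings up to about {retest:.0f} points")
print(f"Is a 250-point swing outside that band? {250 > retest}")
```

Under those assumptions, a single score carries a band of roughly 141 points either way, and two sittings could differ by roughly 199 points from measurement error alone. The 250-point overnight swings described in the post fall outside even that, which is the point I was making: the reported error is large, and it still does not explain what we see.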

39 thoughts on “Dear STAR Test, We Need to Talk”

  1. Every time I read your blog, you have zeroed in on exactly what I am thinking and stated it in more eloquent terms than I ever could. You’ve done it again; I woke up 2 hours early, today, stressing about my kiddos’ STAR scores. Thank you for letting me know, time & time again, I am not alone. Or crazy. You are awesome, and I thank you for all you do!

    1. I appreciate your post so much! My daughter, who is a second grader, took this test and did not do well. First thing she said about the test was that they had to finish everything really fast. I’m not sure how emphasizing such strict time limits on each question is accurate, especially with students who struggle with fluency. This time limit encourages test anxiety and a whole lot of clicking. In my own third grade classroom I have used MAPS testing and would much rather spend the time on this test to allow students to show what they know and get more accurate results.

  2. Four weeks into state testing (language arts and math), we started giving students the EZCBM test during advisory time (language arts–they’ll still have to do math later). We’ve been told we’ll see the state test results next October, or possibly in January, 2017. So the one thing I can say for STAR is that if you’re going to force me to give a test, go ahead and have me give a one day test with instant results. I wish ANY of this gave me information I could use. Or that I could refuse to give the tests without losing my job.

  3. When my students grew several levels, that was me. When they dropped several, that was the test. Right?

  4. Wow…you hit the nail right on the head…I couldn’t agree more with your thoughts regarding STAR……I loathe the idea that my students must take this ridiculous test 3 times a year for “benchmarking purposes”…it is such a waste of valuable time for my students and for me as I go over the results and wonder why so many have dropped..and how others show such large increases…I’ll never figure it out! Here is what I know…I have created a classroom of children who LOVE to pick up a book and read it for the sheer pleasure of reading a book and enjoying a good story….for that I am thankful and proud to be called their teacher!

  5. My favorite part of the STAR results is when they identify a student’s Zone of Proximal Development (ZPD). For some students the ZPD can span 3-4 grade levels. For several reasons I don’t think that was what Vygotsky had in mind when he theorized ZPD. STAR gives a whole new meaning to Vygotsky’s concept.

  6. My “star” student, a “slow” reader, said she didn’t have enough time to read; half of the time she just guessed. I am a bad reader too. Like mother, like daughter. Good thing we don’t base our value on tests.

  7. 1. You can turn off time limits
    2. If scores change radically overnight, maybe your students aren’t always giving 100% effort
    3. As teachers, we are always supposed to pair qualitative (our eyes and ears) and quantitative data to make decisions. At least that’s what my principal says!

    1. Thank you for your comment, unfortunately, I do not have the power to turn off the time or I would. Also, sure, some students don’t give it their best but many do. I agree with your principal but know many who are not in the same situation.

      1. No, you cannot turn off time limits. You can only double the time limit. The time still exists.

      2. Interesting contradiction on the time limits. Could that be technical issues? Like different software version in use, or the way the admin installed the software?

    2. I would feel better about the Star test if it weren’t for the fact that I took it. I am an adult, advanced-degree-holding, fluent native English speaker who has no problems with time constraints and has always scored very highly on all reading tests. I am not bragging; I want to illustrate the effectiveness of this test. I took this test myself, on a good day, giving 100% effort, and I was scored at a 3.6 reading level, in the 4% rank.
      Something is off here, on the test itself, the administration of the test.. something.

  8. I love STAR and the data it provides. In my opinion, the Instructional Planning report is key for a classroom teacher. I have never used a program that tells me the skills (not standards) a student needs to learn next. We especially love SGP so we can monitor growth of every child. As the person above stated, you can turn off or extend time limits for the assessments. And we use multiple types of data in our school to make ANY decisions related to students.

  9. I thank you for bringing attention to STAR. We are in our second year and I absolutely LOVE the concept of a short test and all of the amazing data! That said, I continue to be in a state of confusion as to the best way to use the data provided and
    what pieces pinpoint comprehension to assist our struggling readers. I wanted this test to be successful yet still have many questions.

  10. One of the aspects of living in Wisconsin Rapids that I will not miss (I have a new position in the SW part of WI) is working within walking distance of the headquarters of Renaissance Learning. I have also questioned their approaches to reading assessment, only to be ignored or provided a vague response that doesn’t address the inquiry. It’s a tough situation to be in as many community members and parents are employed by this computer company, maker of Accelerated Reader and STAR.

    So good of you Pernille to question this tool and allow Renaissance to respond. What they had to share in response really doesn’t address the fundamental problems with their program. For example, while a .70 is considered acceptable regarding reliability for a program, it is the lowest rate. In more simplistic terms, this means that for every ten students that take this assessment, seven (70%) students will find their results to be reliable. This is too low. You are absolutely correct in your conclusions.

  11. From what I’ve read of the intent of STAR assessments, they are *only* supposed to be used for tracking growth of an individual student. Unfortunately, our son’s school district used it as a grade-level comparison and then denied him a grade skip when he was in Kindergarten (which he desperately needed) because of it. We ended up leaving the school district because of that decision.

  12. Thank you so much for your post. After my first experience using the Star Early Literacy with my kindergarteners I went to my friend Google to see if other teachers had similar concerns about validity and reliability. I too loved the idea of “quick” testing vs. stressful longer testing sessions. The problem is that the data I gathered from this assessment did not correlate at all with my classroom observations. I had very high students perform poorly and very low students receive inflated scores. Now that we are doing our second administration I am faced with trying to figure out how to explain to parents why this test says their child went backwards 100 points when their child has made great gains. Ironically the one student in my class who has not made much progress (and I’ve met with the parents about this)…had the highest growth. I’m frustrated. I’m frustrated that when my top reader scored very poorly in January, I was told I could retest and did so 10 minutes after the first administration and saw a difference of over 200 points….TWO HUNDRED POINTS…huh?!?! Something just doesn’t sit right with me.

  13. Totally agree with this post and your response to the email they sent you. I sit next to and carefully observe every student take their test and I also require them to read everything out loud to me. I administer the test one on one (which I know many teachers cannot do) But if there was a way for the teacher to select the questions that students completely guessed on versus the ones that they actually attempted, I think the assessment would be more accurate. Removing humans from the assessment equation can be detrimental and now we are seeing why. There should be a combination of computer tools and human capital.

  14. This is so true and I am being evaluated on this test. I had a student in first grade who did not know all his letters test into the 67th percentile. I questioned this result and had him retested. Sure enough the second test showed he was in the 10th percentile. HOWEVER, the district refused to remove the original test score so when he went up from the 10th to the 25th percentile eight weeks later, it still showed negative growth from the 67th percentile. I was deemed not a good teacher and the child was deemed to be on the critical list and was subjected to having to go to other teachers’ classrooms for “remediation.” Never mind that he was never exposed to books before he met me. What I was doing was not good enough ACCORDING TO THE TEST. I hate how administrators cling to these false representations of learning.

  15. I’m sitting here trying to decipher the many timed tests my dyslexic daughter must take to show she is behind in reading. Searching for some understanding on how they can actually get reliable data on her reading ability. The Star reading assessment just doesn’t do that. You give a child a test to measure if she can read on grade level, that will “dumb down” as the test goes on. Not to mention that after a minute of not answering it, it will move on to the next question. My daughter, like most dyslexics, has an auditory processing disorder. They get stuck on a word, a word they have seen a hundred times and they can’t read it. I’ve sat through more than a minute of her trying to sort out the word in her brain, to then just read the rest of the sentence with no problem. Except on these tests, it’s already on to the next question. And let me not start with the countdown clock to let her know the question is almost done while she is still trying to figure out a word.

  16. Hi, I am preparing a case study on top K-12 assessment products worldwide and want to understand the success story of Renaissance Learning, would want to connect with someone from the company to gain valuable insights. Please get me to the right person.

  17. I absolutely hate STAR test
    they always change the level of reading.
    if you get a question right it gets harder
    if you get one wrong, it gets easier

  18. Thank you for this! I am not a teacher, but a concerned mom.

    I was just informed that my 7 year old, who loves reading and has tested in the “gifted” range since preschool, is eligible for Title I services in second grade because she scored very low on the STAR reading test at the beginning of the school year. I was shocked and started questioning my daughter about whether she was really doing her best at school.

    I don’t know what happened on the test, but I feel awful that in my own panic over this test and having Title I recommended, I amped up my child’s anxiety about reading and took some of the joy out of reading for her.

  19. Thank you! It was so hard to find any info from real people (and many educators) on this test. My daughter scores in the single digits and despite at home practice, private tutors and a great improvement that we all see in her, she is a slow reader with somewhat low confidence and continues to score in the single digits. Placing her 2+ grade levels behind where she should be. I am going to try to ignore this test, advocate for more time for her to test take and let the school know my feelings on this. Thanks again!

  20. I am a 7th grader and got a 1334, which seems strangely high, and I thought it was a bit suspicious. Now, seeing that it is not very accurate, it makes sense, because if I was really that proficient in reading I should be better at spelling and I should have an A+, but I do not, so it does not seem very accurate. Thank you for the insight.

  21. My 3rd grader has an IEP. He has limited reading skills and is unable to read the STAR reading test proficiently with no accommodations. Yet he still has to take the test and becomes frustrated. Any advice?
    Thank you
    Cathy Pacheco

    1. I ran into this too and ended up at this blog after googling “STAR tests are horrible.” My child is NOT an independent reader and literally could not access the reading test as a result. I deliberately used those words with her teacher since it was clearly a disability access issue, as the test was trying to assess more than just my kid’s decoding skills. I understand why they could not have a reader were it testing their ability to decode and specifically read words, but the questions were comprehension-based and dealt with material that, independent reading aside, my kid WOULD be able to answer and answer well.

      The teacher ended up saying we could do the early literacy test instead — and my child got frustrated with the slowness of the directions and assumption that they could not operate a mouse/touchpad. Neither test was completely appropriate — the reading test was too reliant on independent reading to measure skills that WEREN’T independent reading, and the EL test failed to presume any competence in other areas.

      Meanwhile, my other, proficiently reading kid was given passages about business and accounting. My child is 8. They have no context for those passages which could have conveyed the same idea and tested the same skills outside of an overwhelmingly BORING context.

      I loathe these tests.

  22. I discovered this fall that the STAR domain scores that break the scaled scores down into separate areas are not based on any actual responses or data but instead are based on extrapolations from the scaled score. If you look at students with the same overall Scaled Score, their domain scores are identical. What’s the point then?

  23. Thank you for your posts and everyone for their replies. I too have a daughter in 7th grade who took the STAR test for math and didn’t do very well either; now knowing it’s a timed test explains so much. She has test anxiety already, and to place a timer on each question is just not appropriate and doesn’t measure whether they know the material. Math requires due process and if you are not allowed to go through the processes to complete the work thoroughly the results cannot be considered accurate in my opinion.

    Best regards,
    Anna Christenson
