
Sunday, February 8, 2015

PARCC Tests and Readability: A Close Look

I approach the subject of readability on the new PARCC tests with caution. Readability is the third rail for literacy specialists. While The Literacy Dictionary defines readability as "an objective estimate or prediction of reading comprehension of material in terms of grade level", such objectivity does not ensure accuracy. All sorts of formulas for estimating readability exist, and all of them are useful yet inaccurate or misleading in some way.

As I said in a previous post here, readability is too complex to be captured by a mere number as the currently popular Lexile measures attempt to do, or by a grade level as other traditional formulas try to do. Readability is best understood as a dynamic between the characteristics of the reader, the characteristics of the text and the particular task that is being attempted. The only real way to know if a text is "readable" for a student is to sit down with a child, hand them a text and see how they do with reading and talking about it.

Of course, this is not practical in a mass testing environment, so we need to use readability tools to determine the difficulty of texts. To that end, spurred on by my Facebook friends Heidi Maria Brown, Darci Cimarusti and Ani McHugh of Opt Out of Standardized Tests-New Jersey, I have decided to take a close look at the PARCC sample test reading comprehension passages and try to assess their readability, and therefore, their appropriateness for a testing environment.

Since readability formulas are notably unreliable, I first decided to use several different readability measures to see if I could get a closer approximation of level. The measures I use are all commonly used in assessing readability. All of them use two variables, with slight variations, to determine readability: word length and sentence length. They vary slightly in the weights they give these variables and in how these variables are determined.

The readability formulas I used were the Fry Readability Graph (Fry), the Raygor Readability Graph (RR), the Flesch-Kincaid Readability Tests (FK), the Flesch Reading Ease test (FRE) and the Lexile Framework for Reading. The Fry, Raygor and Flesch-Kincaid formulas yield a grade level readability estimate. The Flesch Reading Ease test provides an estimate of the "ease of reading" of a passage based on a child's age. Lexile measures are the preferred readability measure of the whole corporate education reform movement behind the Common Core and PARCC, so they must be included here as well. According to the Lexile Framework website, "Lexile measures are the gold standard for college and career readiness."

I have written about Lexile measures before here. The "chief architects" of the Common Core State Standards worked with the company that licenses the Lexile framework to realign the Lexile levels to raise the levels in every grade starting with grade 12 and stair stepping down to the early grades. You can find a comparison chart of the changes here. This was done, ostensibly, to ensure the college and career readiness of our graduating students. It was also done without any research base to back up the changes. Anyway, the Lexile Framework is the reformy "gold standard" so it is included here.

The Flesch Reading Ease test needs a bit of an explanation. Reading ease is estimated on a scale of 0 to 100. The higher the number, the easier the text is to read. Texts in the 90-100 range should be easily read and understood by an 11-year-old. Texts in the 60-70 range should be easily read and understood by a 13- to 15-year-old. Texts scoring in the 0-30 range are best understood by university graduates.
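To make the "word length and sentence length" idea concrete, here is a minimal Python sketch of the two Flesch measures. The coefficients are the standard published ones for Flesch Reading Ease and Flesch-Kincaid Grade Level. Everything else is an assumption for illustration: the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup. This is not the tool used to produce the figures in this post, so its numbers will not match them exactly.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting vowel groups (a rough heuristic, assumed here)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in "-le" endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text):
    """Return (reading_ease, grade_level) for a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Both formulas weigh the same two variables; only the weights differ.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


sample = "The cat sat on the mat. The dog ran."
ease, grade = flesch_scores(sample)
print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Longer sentences and longer words push the reading ease score down and the grade level up. Nothing in either formula knows anything about the reader or the task, which is exactly the limitation discussed earlier. Very short, simple sentences can even produce scores outside the nominal ranges.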

With that background, here is what I found on the PARCC sample tests, using one reading sample from each grade level. In each case I took a 300+ word sample.

Grade 3 Passage: A Once in a Lifetime Experience by Sandra Beswethrick

  • Lexile  680 (3rd Grade range is 520 - 820)
  • FK        3.5 (grade level)
  • Fry       4.0 (grade level)
  • RR       3.3 (grade level)
  • FRE     87
Summary: By all measures the reading passage seems challenging, but appropriate for the upper levels of grade 3.

Grade 4 Passage: Just Like Home by Mathangi Subramanian
  • Lexile   1100 (4th Grade range is 740 - 940)
  • FK        6.1
  • Fry       6.3
  • RR       6.0
  • FRE     80.5
Summary: By all measures this reading passage seems inappropriate for assessment in grade 4.

Grade 5 Passage: from Moon Over Manifest by Clare Vanderpool
  • Lexile   950 (5th Grade range is 830 - 1010)
  • FK        7.0
  • Fry       6.9
  • RR       6.5
  • FRE     74
Summary: While the Lexile measure would indicate the text is appropriate for 5th grade, all other measures would indicate that this would be an extremely challenging text for 10- or 11-year-olds.

Grade 6 Passage: Emancipation: A Life Fable by Kate Chopin
  • Lexile   970 (6th Grade range is 925 - 1070)
  • FK        8.7
  • Fry       8.6
  • RR       8.2
  • FRE     73.6
Summary: While the Lexile level would indicate the text is appropriate for 6th grade, all other measures indicate that this text will be very challenging for 11- to 12-year-olds.

Grade 7 Passage: from The Count of Monte Cristo by Alexandre Dumas
  • Lexile   1080 (7th Grade range is 970 - 1120)
  • FK        10.0
  • Fry        8.5
  • RR        8.5
  • FRE      64.6
Summary: While the Lexile level would indicate the text is appropriate for upper level 7th graders, all other measures indicate that this text will be very challenging for 12- to 13-year-olds.

Grade 8 Passage: Elephants Can Lend a Helping Trunk by Virginia Morell
  • Lexile   1110 (8th Grade range is 1010 - 1185)
  • FK        10.6
  • Fry        10.4
  • RR        6.4
  • FRE      51.1
Summary: The passage falls within the Lexile range for 8th grade. The Fry, Flesch-Kincaid (FK) and FRE scores all indicate that the passage would be more appropriate for 10th grade students or above. The Raygor (RR) score appears to be anomalous.

Conclusions: The stated purpose of the Common Core State Standards and the aligned PARCC test was to "raise the bar" based on the notion that in order to be "college and career ready" students needed to be reading more complex text starting in their earliest school years. The PARCC sample tests show that they have certainly raised the bar when it comes to making reading comprehension passages quite difficult at every grade level.

These results clearly show that even by the altered Lexile level standard the 4th grade passage is much too difficult for 4th grade children. I would hope that the actual PARCC would not include any material remotely like this over-reaching level of challenge for children. I would hope, but the inclusion of this passage in the sample does not give me confidence.

The other results show that the passages chosen are about two grade levels above the readability of the grade and age of the children by measures other than the Lexile level. The results of testing children on these passages will be quite predictable. Students will score lower on the tests than on previous tests. We have already seen this in New York where test scores plummeted when the new tests were given last year. English Language Learners (ELL) and students with disabilities will be particularly hard hit because these tests will prove extraordinarily difficult for them.
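The "about two grade levels" estimate can be checked with a little arithmetic on the figures reported in the passage summaries. The sketch below is one simple way to do it, under an assumption of my choosing: it averages the three grade-level measures (FK, Fry and RR) for each passage and subtracts the tested grade. The names `RESULTS` and `grade_gaps` are hypothetical, and other ways of combining the measures are equally defensible.

```python
from statistics import mean

# Grade-level estimates reported in the summaries: grade -> (FK, Fry, RR).
RESULTS = {
    3: (3.5, 4.0, 3.3),
    4: (6.1, 6.3, 6.0),
    5: (7.0, 6.9, 6.5),
    6: (8.7, 8.6, 8.2),
    7: (10.0, 8.5, 8.5),
    8: (10.6, 10.4, 6.4),  # the RR value here is the apparent anomaly
}


def grade_gaps(results):
    """Mean of the grade-level estimates minus the tested grade, per passage."""
    return {grade: mean(scores) - grade for grade, scores in results.items()}


gaps = grade_gaps(RESULTS)
for grade, gap in gaps.items():
    print(f"Grade {grade}: {gap:+.1f} grade levels")

# Average gap for grades 4 through 8, where the passages run hardest.
upper_gap = mean(gaps[g] for g in range(4, 9))
print(f"Grades 4-8 average: {upper_gap:+.1f} grade levels")
```

By this simple average the grade 3 passage sits a little over half a grade above level, while grades 4 through 8 average just under two grade levels above. Dropping the anomalous Raygor score for grade 8 would push that passage's gap to about two and a half grades.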

What happens when students are asked to read very difficult text? Students who find the text challenging but doable will redouble their efforts to figure it out. The majority of children, however, will find the text at their frustration level, and they may well give up. That is what frustration level in reading means. The ideal reading comprehension assessment passage will be easy for some, just right for most and challenging for some. The PARCC passages are likely to be very, very challenging for most.

What can schools and parents learn from these tests? These types of mass-administered standardized tests have never been very good at giving teachers or parents actionable feedback that they can use to help students. The tests are best used to help educational leaders and classroom teachers spot trends in performance of a district or a school over time and to make programmatic adjustments. When more than 70% of students fail to reach proficiency on a test, as was the case in New York when this test was tried, the only possible conclusion is that the test was not appropriate. Many students gave up in frustration. There is no actionable feedback available.

The results of the PARCC will no doubt feed into the education reform movement narrative that our kids, schools and teachers are failing. A cynic might think that this was deliberate. That this was a way to continue to discredit public school teachers, children and schools. If I wanted to advance this narrative, I would devise a test that arbitrarily raised the standards, provide some pseudo-science to make it appear reasonable, make sure students and teachers had limited time to adjust to the new testing standards and then broadcast the predictable results widely.

As a parent considering whether or not I want my child to take this test, I would want to know what I am going to learn by having my child participate in something that will likely cause frustration and which will give me very limited information on how my child is doing. The reformers will tell us that these tests are a "civil rights" issue. That having kids take these yearly tests is the only way we can know if all children are being well served by the school. As a parent, I would want the reformers to show me that the students are being well served by the test. Now there is a civil rights issue.