For further reading

Wednesday, February 25, 2015

From Text Complexity to Considerate Text

The Common Core State Standards call for kids to read lots of complex nonfiction text so they can be "college and career ready." As Appendix A of the English Language Arts section of the Common Core rather breathlessly puts it,

[T]he clear, alarming picture that emerges from the evidence... is that while the reading demands of college, workforce training programs, and citizenship have held steady or risen over the past fifty years or so, K–12 texts have, if anything, become less demanding. This finding is the impetus behind the Standards’ strong emphasis on increasing text complexity as a key requirement in reading.

As I have discussed in previous posts here, here and here, this Common Core call for employing more complex texts has led to much confusion and inappropriate instruction. The statement is also demonstrably wrong when it comes to readability on the K-3 level.

There is, however, another issue related to text complexity that I have yet to see anyone explore in the Common Core context. Text complexity is not an unqualified good. Indeed, it may be more reflective of the writer than of the reader. Just what is the responsibility of the author to the reader when writing any text?

Any act of reading is by definition an effort by a reader to comprehend, but it is also an attempt by a writer to be understood. There exists, in what Louise Rosenblatt has called the reading "transaction," an implicit contract between writer and reader. The writer promises to make every effort to be understood and the reader promises to make every effort to understand. So, if a reader's comprehension breaks down when faced with a complex text, is that a failing of the reader, a failing of the writer, or a little bit of both?

Nathaniel Hawthorne said, "Easy reading is damned hard writing." Shouldn't a reader expect the writer to put in the effort to write clearly, so that complexity is primarily a matter of the concepts discussed and not a product of the limitations of the writer? 

What makes a text complex? 

Zhihui Fang and Barbara G. Pace (2013) have identified five factors that make a text complex.
  • Vocabulary (high frequency of content-specific words)
  • Cohesion (lack of skillful use of cohesive elements can make text complex)
  • Grammatical metaphors (discussed below)
  • Lexical density (packing lots of content words into individual clauses)
  • Grammatical intricacy (lots of long sentences strung together with multiple clauses through coordination/subordination)
Grammatical metaphors are linguistic choices that a writer makes to communicate meaning in an atypical way. Instead of saying "the businesses failed and slowed down", the writer chooses to say "business failures and slowdowns." These atypical structures may make the text harder for a reader to comprehend.

What I have tried to demonstrate here is that most of what makes a nonfiction text complex is rooted in choices the author makes, choices that may present special challenges to the reader. I believe, however, that the writer has an obligation to consider the reader in making these decisions.

Musing on this issue took me back to some reading I had done long ago in graduate school about a concept called considerate text.

What is considerate text?

Considerate text was first explored by literacy researchers Bonnie Armbruster and Thomas Anderson (1985).  Essentially, Armbruster and Anderson posit that authors and editors can do several things in presenting information to make it easier for the reader to understand. They suggest that the following things make writing considerate:
  • Coherent structure (discussed below)
  • Introductory paragraph (sets up expectations for the reader)
  • Headings and sub-headings (guide the reader's thinking)
  • Vocabulary defined in context (assists comprehension)
  • Clear use of graphic elements like tables, charts and graphic organizers (aids in developing understanding)
Coherence needs a bit of explanation. Armbruster and Anderson identify two types of coherence. First, there is global coherence, which describes the overall organizational structure of the text. Regular, discernible structures, where the main idea and supporting details are easily identified, make for considerate text. Local coherence allows the reader to integrate ideas within and between sentences. The skillful use of conjunctions, transition words and clear pronoun referents makes a text locally coherent. Reading problems may arise when connections between sentences or between paragraphs are not clear.

As you can see, text complexity and considerate text have many intersections, and at the point of those intersections stands the writer. To what extent should the reader be held accountable for the writer's limitations?

Implications
  • Textbook authors and editors have a responsibility to produce text that is considerate of the reader. This is not dumbing down readability, as Appendix A of the Common Core suggests; it is practicing skilled writing and targeting that writing to the correct audience.
  • Teachers and curriculum directors need to choose textbooks and supporting readings that are appropriately considerate of the target readers. Reading material can be both considerate and appropriately informative.
  • At times, of course, students will need to read complex text, because not all writers are as skilled or considerate as others. Teachers need to learn to recognize the elements that make a text complex and plan activities that will help students deal with that complexity. Such activities would include preteaching vocabulary, paraphrasing grammatical metaphors and analyzing grammatically intricate sentences to unpack the meaning.
Forcing students to read more and more complex text under the pretext of college readiness is a mistake. The best preparation for successful reading in college is lots of successful reading experiences in elementary, middle and high school and lots of good instruction in making meaning from a wide variety of texts. In the meantime, it might be a good idea to ask those who write textbooks for college students to do the hard work necessary to write considerate text.

Sunday, February 22, 2015

Readability of Sample SBAC Passages

In three earlier posts, I took a look at the readability of sample passages for the PARCC assessments, which are being used to measure student progress on the Common Core State Standards (CCSS) in some states. You can find those posts here, here and here. As I stated in those posts, the concept of readability is complicated and includes quantitative measures like readability formulas, task considerations, and qualitative considerations, including assessing how the text will match up with the reader.

In this post, I look at the same measures as they relate to the Smarter Balanced Assessment Consortium (SBAC) tests that are being used in other states. I looked at one reading passage on each grade level of the SBAC from grades 3 through 8. I found significant differences in the readability of these passages from what I found in the PARCC tests.

First, the quantitative measures of the SBAC passages. I used several different readability formulas. Both the SBAC and PARCC tests use Lexile measures to determine readability, and I added other commonly used measures as a check against the Lexile levels. As I cautioned in previous posts, quantitative measures of readability are often imprecise, so I used several measures to see if I could get some sort of consensus on the passages.

For each passage below, the Lexile score is provided along with the Lexile range considered appropriate for that grade. The Flesch-Kincaid and Fry measures are stated in terms of grade level. The Flesch Reading Ease score attempts to state the relative ease of reading a passage: a score of 90-100 should be relatively easy for an average 11-year-old, while a score of 60-70 should be easily understood by a 13-year-old.
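For readers who want to experiment with these measures themselves, the two Flesch formulas are simple enough to compute in a few lines of code. Here is a minimal sketch in Python using the published Flesch constants; the syllable counter is a rough vowel-group heuristic of my own devising, so expect small differences from commercial readability calculators, and the sample sentence is only a stand-in.

```python
import re

def count_syllables(word):
    # Rough heuristic: count runs of consecutive vowels, dropping a
    # trailing silent 'e'. Real calculators use pronunciation
    # dictionaries, so small discrepancies are expected.
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_scores(text):
    # Both formulas depend on just two averages: sentence length in
    # words and word length in syllables.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0, 0.0
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average sentence length
    spw = syllables / len(words)   # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

# Stand-in text; the formulas are intended for samples of 100+ words,
# and very short or simple samples can score outside the nominal ranges.
ease, grade = flesch_scores("The fish fell from the sky. No one knew why it happened.")
print(f"Reading Ease: {ease:.1f}  FK Grade: {grade:.1f}")
```

Running several such measures side by side, as I do below, is a sensible hedge against any one formula's quirks.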

Quantitative Readability

3rd Grade Passage - A Few New Neighbors

  • Lexile Level -                                                      510 (3rd Grade range is 520 - 820)
  • Flesch-Kincaid Readability Measure (FK)      1.8 Grade Level
  • Fry Readability Graph (Fry)                             2.5
  • Raygor Readability Graph (RR)                       2.5
  • Flesch Reading Ease (FRE)                               94.9
Summary - This passage should be relatively easy to read for an average third grader.

4th Grade Passage - Coyote Tries to Steal Honey
  • Lexile       900 (4th Grade range is 740 - 940)
  • FK            4.9
  • Fry           5.2
  • RR           4.8
  • FRE         93.2
Summary - The consensus of the measures indicates that this passage falls in the upper range of readability for a fourth grader. A challenging, but not overly difficult, passage by these measures.

5th Grade Passage - A Cure for Carlotta
  • Lexile      660 (5th grade range is 830 - 1010)
  • FK           5.8
  • Fry          6.5
  • RR          4.5
  • FRE        76.8
Summary - The Lexile score seems out of step with the other measures on this passage. I will look more closely at the passage below.

6th Grade Passage - Fishy Weather Conditions
  • Lexile      1040 (6th grade range is 925 - 1070)
  • FK            7.5
  • Fry            8.1
  • RR            4.5
  • FRE          70
Summary - The Raygor measure is out of step with all the other measures, which provide a consensus that this is a challenging text for 6th graders. Again, we will look at qualitative aspects of the passage below.

7th Grade Passage - Life on the Food Chain
  • Lexile      900 (7th grade range is 970 - 1110)
  • FK            6.9
  • Fry            7.1
  • RR            4.5
  • FRE          68.3
Summary - Once again the Raygor measure is out of step with the others. The consensus is that this passage should be very readable for the average 7th grader.

8th Grade Passage - Ansel Adams, Painter with Light
  • Lexile     1090 (8th grade range is 1010 - 1185)
  • FK           8.3
  • Fry          8.8
  • RR          5.3
  • FRE        65.8
Summary - Once again the Raygor is anomalous, but the consensus here would be that the passage is appropriately challenging for average 8th grade readers.

Task Analysis

The task of a reader taking a standardized test is, of course, to answer questions. I looked at all the questions attached to these passages to determine what tasks were being required of students. For my analysis, I used the question categorization scheme developed by Dr. Taffy Raphael, Question Answer Relationships (QARs). QARs divide questions by the type of work the reader must do to find the answer to a question. Questions are categorized as follows.
  • Right There: These are literal level questions whose answers can be pointed to directly in the text.
  • Think and Search: These are comprehension level questions like main idea questions that require the reader to put together an answer from pieces of information throughout the reading.
  • Author and You: These are inferential questions, requiring the reader to use text evidence and his/her own background knowledge to answer the question.
  • On Your Own: These are questions that are unrelated to the reading of the text. These types of questions are rarely seen on standardized tests.
I looked at 46 questions attached to the passages described above. Here is the breakdown as described by QARs.
  • Right There:                   1
  • Think and Search:          17
  • Author and You:            28
  • On Your Own:                0
As would be expected from a test tied to the CCSS, a number of questions asked students to cite evidence for their answers. In the PARCC test this accounted for almost 50% of the questions. On the SBAC this percentage was closer to 30%. Every grade level included a question requiring students to determine the meaning of a word from context. This is also aligned with skills emphasized in the CCSS. Every passage also included questions aimed at the understanding of key ideas in the text and at an overall understanding of the text. While some questions were aimed at text analysis, the balance on the SBAC appeared to me to be more in keeping with a focus on general comprehension of the text than were the PARCC samples I looked at, which were more focused on passage analysis.

Qualitative Analysis

Since quantitative measures of reading difficulty are notably unreliable, a third factor we must look at is qualitative: how we think the text will match up with the readers who will be reading it. Ideally, a test passage will not disadvantage students because of differences in background knowledge or culture. In reality, we know that standardized tests have difficulty achieving this because they are targeted at such a broad audience. Here I look at each of these passages to determine, as best I can, how they will match with the target readers.

3rd Grade Passage - A Few New Neighbors

A straightforward and pleasant story that follows regular narrative structure. Vocabulary appears very appropriate for a third grade reader.

4th Grade Passage - Coyote Tries to Steal Honey

This passage is a folk tale that also follows a regular narrative structure. The trickster tale should be familiar to most fourth grade readers because so many folk tales are focused on a trickster, whether it is a rabbit, a raven or a coyote. Vocabulary in this tale appears to be well within the wheelhouse of most fourth grade readers. The use of figurative language may cause some readers minor issues in comprehension, but this passage appears to be appropriate for a fourth grade reader.

5th Grade Passage - A Cure for Carlotta

This story of a young boy's immigration to America on a ship from Italy is typical of many other stories aimed at elementary age students studying the story of immigration. The structure is a straightforward narrative with more descriptive detail than the passages for the younger students. Vocabulary load does not appear overwhelming for most fifth grade readers.

6th Grade Passage - Fishy Weather Conditions

This nonfiction passage is informative and entertaining. It explains the unusual phenomenon of fish falling from the sky in some areas of the world. The passage has a fairly high readability level for a sixth grade passage, likely due in part to the introduction of unfamiliar vocabulary. Words like "dissipate," "phenomenon," and "adaptation" might cause readers some challenges, but "dissipate" is directly defined in the passage and skilled readers can probably deduce the other meanings from context. Some figurative language like "connect the dots" may challenge some students. All in all, a challenging passage that will cause some grade 6 readers difficulty.

7th Grade Passage - Life on the Food Chain

This nonfiction passage provides a straightforward explanation of the food chain. The text is organized in such a way that it should be easy for 7th grade readers to follow. The vocabulary load is heavy, but almost all terms are clearly explained right in the text. Sentence structure is not overly complex. A fair passage to assess 7th grade readers.

8th Grade Passage - Ansel Adams, Painter with Light

This biographical piece is written in a narrative format, telling the story of how Ansel Adams came to be a great photographer who chronicled the beauty of the American West. The passage contains a good deal of fairly sophisticated sentence structures that may cause some readers difficulty, but in general the account is highly readable. There are few concerns with the level of vocabulary for an eighth grade reader. I think the passage is appropriate for an 8th grade assessment.

Conclusions:
  1. Unlike the passages I reviewed for the PARCC test, the passages I examined from the SBAC test strike me as fair representations of what children in those grades can and should be able to read.
  2. The questions asked about these passages seemed to me to be a good mix of comprehension based questions and analysis based questions. In general the questions seemed appropriate to the text.
  3. The passages chosen for the assessment all appeared to me to be straightforward enough that most students could follow them. There were no passages using archaic language or structures, no stories written long ago. Vocabulary was generally reasonable and often defined in the context of the passage.
Cautions:
  1. I sampled only one passage for each grade level, so other passages may have problems I did not see here. Only by actually having large numbers of students taking the test will we be able to tell if the test meets industry and common sense standards of validity and reliability. 
  2. Just because I have judged this test to be a reasonable test does not mean that I think this test, or any standardized test, can be used for making high stakes judgments about children, teachers or schools. The failure of standardized tests to be helpful in these areas has been well established. True understanding of individual readers' strengths and weaknesses is best done by professional educators working with children over time.
Test passages that offer most students at grade level the opportunity to demonstrate their actual reading ability can give teachers data that can help to inform instruction. In this cursory look at the SBAC test, it looks like these tests could meet that standard. Time will tell.

Sunday, February 15, 2015

PARCC Readability, Part 3: Considering the Reader

In Part 1 of this series on readability and the Common Core aligned PARCC assessments, I looked at the readability of the PARCC reading passages themselves. In Part 2 the focus was on the task that students were asked to do based on those passages, i.e., the questions that they had to answer after reading those texts. This third part of the series looks at the other factor that influences readability: the student.

At the start, I believe that it is fair to note that Appendix A of the Common Core State Standards, which deals with text complexity, recognizes the student as a critical consideration in determining the readability of a text. Here is part of what that document says about matching students to texts:

“[H]arder texts may be appropriate for highly knowledgeable or skilled readers, who are often willing to put in the extra effort required to read harder texts that tell a story or contain complex information. Students who have a great deal of interest or motivation in the content are also likely to handle more complex texts.”

And Appendix A adds significantly, “Teachers employing their professional judgment, experience, and knowledge of their students and their subject are best situated to make such appraisals.”

So the importance of the reader in any measure of readability is universally acknowledged. Unfortunately, standardized tests cannot match texts to readers the way a child's teacher can; therefore, the ease or difficulty of any reading passage can never be fully ascertained. This is why good standardized tests are pilot tested, to determine whether some passages cause some readers comprehension difficulties that are not based on reading skill alone.

Constructivist reading research has identified five things about the reader that matter in the reader’s comprehension of text: reading skills, reader prior knowledge, reader cognitive development, reader culture, and reader purpose. Presumably, we would want a reading comprehension test to measure student reading skills and cognitive development. We also know the purpose of reading in the case of any standardized test is to answer questions after reading. But what of prior knowledge and culture?

We would hope that the test would not advantage or disadvantage a student because of differences in prior knowledge or culture.  We would hope, but years and years of standardized testing have shown that this is never the case. Over their many years of use, standardized tests have not been able to keep prior knowledge and culture neutral in assessing reading comprehension.

Prior knowledge is critical to reading comprehension. Essentially, the more you know about a topic before you read about that topic, the better you will be able to comprehend that material and the better you will be able to accommodate any new material you encounter during that reading. In an article titled, Individual Differences that Influence Reading Comprehension, Darcia Narvaez, a professor at the University of Minnesota, puts it this way, “If a person [who] has a great familiarity with a grocery store reads a text about a grocery store, the person will activate a grocery store script.” This script aids the reader in making sense of text. If the reader has little or no script to bring to the text, say if the script were about a trip to an outdoor market in Morocco, the comprehension could be more limited.

Cultural bias in standardized tests has been well-documented over the years. Here is just one example that I read about in a technical report from The Center for the Study of Reading (1981) that looked at response differences of Black students and white students.

“This item involved a passage about a visit of Captain Cook to a group of islands in the South Pacific. The critical section was,

 ‘he called them the Friendly Islands because of the character of their people. Today, the Tongans still provide visitors with a warm welcome.’

The test item asked for the meaning of the word character as it was used in the story. Most whites chose nature, the answer scored as correct. Blacks frequently chose style. This is a term used more in black than white communities, and it can be argued that in its colloquial sense style is more apt than nature as a synonym for character. It is apparent, at least, that style is not a wrong answer.”

With all the study that has been put into issues of cultural bias, we can expect that passages and questions from any reputable test developer will be vetted for this bias, but again, these issues can be subtle and might best be accounted for through extensive pilot testing.

Now let’s take a look at the sample passages in the PARCC that I have discussed in the previous two posts and see how they measure up on the background knowledge and cultural bias scale from one literacy specialist’s perspective.

Grade 3 – A Once in a Lifetime Experience
This is a pleasant and innocuous story typical of the Highlights for Children magazine from which it was taken. As I said in a previous post, the quantitative readability seems appropriate. The story involves a camping trip with two friends and a dad. Some background knowledge of camping, fishing and boating might advantage students over those who have no such experience, but I do not think the impact would be great. More problematic, for me, is that answering one of the questions requires knowledge of the word "jostle." The word cannot be clearly defined through context, and I would not deem it a word that most third graders would know. While the text provides students with the definitions of "bail" and "adrift," students are left to their own devices on "jostle."

Grade 4 – Just Like Home
As I stated in my first post in this series, Just Like Home, by every quantitative measure, is too difficult for use in a reading comprehension test for fourth grade. As far as cultural bias or giving advantage to some who might have greater background knowledge, the concern is slight. The story is an obvious attempt to include a multi-cultural perspective in the test, and that is a good thing. It takes place on a school playground and the concerns seem fairly universal. My favorite part of the story is when the protagonist, Priya, says that the only thing she likes about her new school is art, because she did not have art in her old school. I immediately thought that art had been cut in her previous school so that the kids could do more test prep, but now my bias is showing.

Grade 5 – Moon Over Manifest
This passage is taken from the 2011 Newbery Award-winning book by Clare Vanderpool. The passage is well written, but it has some characteristics that may make it challenging for some readers. The story is set in the 1930s, and references to such things as pocket watches, satchels, storefronts and bustling townsfolk may prove problematic for some. As is typical of Newbery award books, the author uses lots of rich figurative language that puts a further burden on the reader, fine for some fifth grade readers but challenging for others with a less rich background knowledge. Finally, the passage assumes knowledge of what has gone before in the story: the main character is a veteran of hitching rides on trains with her father. This may throw off some fifth grade readers.

Grade 6 – Emancipation: A Life Fable
This fable, written in the 1860s by the well-regarded proto-feminist author, Kate Chopin, seems an odd choice for a test passage. Because it was written 150 years ago, it is replete with word choices and sentence constructions that may be unfamiliar to 11- and 12-year-old readers. Here are two examples:

Here he grew, and throve in strength and beauty under the care of an invisible protecting hand. Hungering, food was ever at hand.

Back to his corner but not to rest, for the spell of the Unknown was over him, and again and again he goes to the open door, seeing each time more Light.

The fable is actually a good example of why quantitative readability measures like Lexiles are problematic. They cannot measure the impact of arcane language or parse the allegorical nature of a fable. This is an extremely challenging passage for a sixth grader, one that would be best used in the classroom with plenty of support from the teacher, certainly not in a testing situation.

Grade 7 – from The Count of Monte Cristo
The Count of Monte Cristo is, of course, the classic adventure novel written in the 1840s by Alexandre Dumas. Written in French, the text has been translated and updated many times over the years. The translation used by PARCC is identified as in the Public Domain, so I would assume the test passage is from an older translation. We know that older texts, which employ language patterns and structures that are unfamiliar to many children, provide a greater reading comprehension challenge than more contemporary texts, so once again the difficulty of this text cannot be accurately measured by a Lexile score.

The text is replete with vocabulary that seventh graders will find challenging and that will impact their comprehension. Words in the passage include countenance, lucidity, well-nigh, loathsome, delirious, ascertain, and recurrent. While some of these words might be determined through context by a skilled reader, most cannot. The setting of the story, a dungeon in early nineteenth century France during the Bourbon Restoration, also would provide readers with a unique challenge.

The Count of Monte Cristo is a wonderful adventure story and would make entertaining reading for a certain subset of skilled middle school readers. The advisability of using it as general reading in a reading comprehension test for seventh graders is highly questionable.

Grade 8 – Elephants Can Lend a Helping Trunk
This is a non-fiction passage describing an experiment conducted by a team of scientists to test the social cognition of elephants. The passage is clearly written, is cohesive, and for the most part does not present an extraordinary vocabulary challenge. An included photo should help students visualize the experiment.

A minor quibble with this passage is that it was clearly written by a person who uses United Kingdom English, so crows are called "rooks" and some sentence patterns differ slightly from American English.

The biggest concern with this passage is that the understanding of the entire passage depends on the understanding of the words "cognitive" and "cognition." Indeed, the first two questions deal with these two terms. The main body of the passage discusses the elephants' skills at social cooperation, but the text never draws a clear line between an understanding of cooperation and an understanding of "cognition" as mental processes. I think this passage is appropriate for a high school class in psychology, but I question it as a test passage for 8th graders. I recently had a discussion with my college freshman class about the term "cognition" and some of its related terms: cognate, recognize, metacognition, cogitate. I can report that the word cognitive was not in most of my college freshmen's vocabulary.

Standardized tests by their very nature cannot be well matched to individual readers. Text matching takes a skilled and informed teacher with deep knowledge of her students, the reading to be done and the task to be completed. It is not fair to ask standardized tests to meet these criteria. I believe it is also not sound test science to choose as test passages those passages that contain vocabulary beyond most students in the grade, passages whose targeted audience was adults rather than children, and passages that use archaic language and sentence structures.

The high readability levels of most of these passages, the unique challenges of the questions asked on the test, and the failure of some of the passages to be considerate of the background knowledge and culture of many of the children who will be encountering these tests guarantee that many children will struggle. Ultimately, the results of the PARCC tests will tell us more about the tests themselves than they tell us about the students taking the tests. One thing they will tell us, I believe, is that these tests are not useful for making any high stakes decisions about individual children, teachers, or schools.


The one thing I can guarantee as an outcome of these tests is that, overall, children living in areas of affluence will do considerably better than children living in areas of poverty. I can guarantee this because it is true of every standardized test ever given, and so it always shall be. If education reformers want to learn from standardized tests, this is the lesson to be learned. The real issue in education is inequity, not the ability of a seventh grader to parse The Count of Monte Cristo.

Wednesday, February 11, 2015

PARCC Test Readability, Part 2: Looking at the Questions

This is the second in a series on issues related to the readability of the PARCC sample tests, tests that many students in the country will be taking for the first time this year. In the previous post I focused on the texts students were to read for the test. With one notable exception, I found the reading passages were aligned with the revised Lexile levels the test makers used for guidance, but about two years above grade level on other widely used measures of text difficulty.

Readability, however, is about more than the level of difficulty of the text itself. It is also about the reading task (what the student is expected to do with the reading) and the characteristics of the reader (prior knowledge, vocabulary, reading strategies, motivation).

In this post I will look at the second aspect of readability that must be considered in any full assessment of readability: the task that the reader faces based on the reading. Since this is a testing environment, the task is answering reading comprehension questions and writing about what has been read.

In any readability situation the task matters. When students choose to read a story for pleasure, the task is straightforward. The task is more complex when we ask them to read something and answer questions that someone else has determined are important to an understanding of the text. Questions need to be carefully crafted to help the student focus on important aspects of the text and to allow them to demonstrate understanding or the lack thereof.

There are many different types of questions that can be asked of a reader, of course. Some questions are easier than others. Bloom's Taxonomy offers a useful scaffold for understanding the levels of questions moving from literal to inferential to evaluative, but for our purposes here, I will use the categories developed by literacy professor, Dr. Taffy Raphael, known as Question-Answer Relationships or QARs.

QARs are useful because they describe the relationship between the question and the kind of work the student must do to answer the question. Four general types of questions are identified in this scheme:

  • Right There: The answer to the question is right in the text and is usually in the same sentence. The reader can point to the answer in the text. Bloom would call this the literal level of understanding.
  • Think and Search: The answer to the question is right in the text, but not in the same sentence. The reader needs to think about different parts of the text to come up with an answer. Bloom would designate this as the comprehension level.
  • Author and You: The answer is not in the text. The reader needs to think about what s/he knows, what the author says, and how these two things fit together. Bloom would call this the inferential level.
  • On Your Own: The text got you thinking, but the answer is inside your own head; it is not directly answered in the text. On Your Own questions are not usually a part of a standardized test.
I looked at the questions that followed one reading passage for each grade level 3-8 on the PARCC sample tests. The passages I chose were the same as those that I analyzed for readability in the previous post (A Once in a Lifetime Experience, Just Like Home, Moon Over Manifest, Emancipation: A Life Fable, The Count of Monte Cristo, and Elephants Can Lend a Helping Trunk). I looked at a total of 34 test items, which represented all of the questions asked about these passages. Using the QAR framework I found the following:
  • Right There Questions                  0
  • Think and Search Questions        16
  • Author and You Questions           18 
  • On Your Own                               0
Looking more closely at the questions, some other patterns emerged. The structure of the test called for students to answer one question and then determine what textual evidence supported their answer in the next question. In other words, if you got the first question wrong, it was very likely you would get the second question wrong because the second question was dependent on a correct answer on the first. Of course, it is also possible that the second question would help the test taker revise the answer to the first, but I think that would be more likely to happen in an instructional situation than a testing situation. Fully 16 of the 34 questions examined were dependent on a correct answer to the previous question. Essentially, if a student made one error on these comprehension questions, it likely would become two errors.
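To put hypothetical numbers on that effect: a student who misses 5 of the anchor questions would, because of the pairing, very likely miss the 5 linked evidence questions as well, turning 5 comprehension errors into 10 lost points and dropping a raw score from 29 of 34 (about 85%) to 24 of 34 (about 71%). The numbers are mine, not PARCC's, but the doubling effect is built into the question structure.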

Here are some more discoveries from a closer look at the questions:
  • Five questions were vocabulary-in-context questions
  • Sixteen questions focused on identifying textual support for an answer
  • Three questions dealt with the main idea of the story
  • Two questions dealt with the structure of the story
Conclusions: 
  1. The questions are generally focused on higher order thinking skills (inferential level). Literal understanding of the text (which would be assessed by Right There questions) is either assumed or not valued by the test designers. This is, I suppose, consistent with the CCSS call to "raise the bar", but it will certainly impact student scores on the test.
  2. The questions indicate that the test designers are very focused on the idea of "citing textual evidence" as a dominant reading skill. Textual evidence is repeatedly cited in the CCSS, so I suppose it is unsurprising that it plays a prominent role in this CCSS aligned test. Citing textual evidence is a highly valuable and necessary reading ability; whether it deserves to be addressed in almost fifty percent of the questions on a test of reading comprehension is certainly open to debate.
  3. There are other ways to know if students are using textual evidence. One way is, of course, to ask a Right There question. In order to answer a question that is "right there" in the text a student must, by definition, use textual evidence. The PARCC test completely ignores this way to assess the use of textual evidence in favor of inferential level questions tied to other inferential questions.
  4. The test designers seem to be more interested in textual analysis than in reading comprehension. Analysis of text is, of course, a part of reading comprehension, but it is only a part. Many of the questions on this test could be answered without a good understanding of the text as a whole. The focus seems to be on small bits of understanding (vocabulary in context, textual support) rather than on a more generalized comprehension of the text. The irony here is that while the CCSS calls for a higher level of critical thinking about text, these questions focus on a very narrow view of the text itself.
  5. There are some things to like about these tests. First of all, the tests use complete stories, rather than truncated segments of stories. This should assist students in building the understanding they need to reply to questions. Secondly, the test highlights an important aspect of being a thoughtful reader: citing textual evidence. This emphasis will surely spur teachers to address citing evidence in their instruction, which is a good thing.
Whenever a new test is rolled out, we know through past experience that test scores will go down. Over time schools, teachers, and students adjust and the trend then is for scores to go up. It will be no different with the PARCC tests. As the scores rise, some questions will arise like, "Have we been focused on the right things in these tests?" and "Have the tests led to better, more thoughtful readers?" Based on my analysis of these test questions, I am not confident.

In my next post, I will look at the third variable in readability: the reader.

Sunday, February 8, 2015

PARCC Tests and Readability: A Close Look

I approach the subject of readability on the new PARCC tests with caution. Readability is the third rail for literacy specialists. While The Literacy Dictionary defines readability as "an objective estimate or prediction of reading comprehension of material in terms of grade level," such objectivity does not ensure accuracy. All sorts of formulas for estimating readability exist, and all of them are both useful and inaccurate or misleading in some way.

As I said in a previous post here, readability is too complex to be captured by a mere number as the currently popular Lexile measures attempt to do, or by a grade level as other traditional formulas try to do. Readability is best understood as a dynamic between the characteristics of the reader, the characteristics of the text and the particular task that is being attempted. The only real way to know if a text is "readable" for a student is to sit down with a child, hand them a text and see how they do with reading and talking about it.

Of course, this is not practical in a mass testing environment, so we need to use readability tools to determine the difficulty of texts. To that end, spurred on by my Facebook friends Heidi Maria Brown, Darci Cimarusti and Ani McHugh of Opt Out of Standardized Tests-New Jersey, I have decided to take a close look at the PARCC sample test reading comprehension passages and try to assess their readability, and therefore, their appropriateness for a testing environment.

Since readability formulas are notably unreliable, I first decided to use several different readability measures to see if I could get a closer approximation of level. The measures I use are all commonly used in assessing readability. All of them use two variables, with slight variations, to determine readability: word length and sentence length. They vary slightly in the weights they give these variables and in how these variables are determined.
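For reference, the published Flesch formulas make those weights concrete (the Fry and Raygor graphs plot comparable word and sentence counts on a chart rather than computing a score):

Flesch Reading Ease = 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words)
Flesch-Kincaid Grade Level = 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59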

The readability formulas I used were the Fry Readability Graph (Fry), the Raygor Readability Graph (RR), the Flesch-Kincaid Readability Test (FK), the Flesch Reading Ease test (FRE) and the Lexile Framework for Reading. The Fry, Raygor and Flesch-Kincaid formulas yield a grade level readability estimate. The Flesch Reading Ease test provides an estimate of the "ease of reading" of a passage based on a child's age. Lexile measures are the preferred readability measure of the whole corporate education reform movement behind the Common Core and PARCC, so they must be included here as well. According to the Lexile Framework website, "Lexile measures are the gold standard for college and career readiness."

I have written about Lexile measures before here. The "chief architects" of the Common Core State Standards worked with the company that licenses the Lexile framework to realign the Lexile levels to raise the levels in every grade starting with grade 12 and stair stepping down to the early grades. You can find a comparison chart of the changes here. This was done, ostensibly, to ensure the college and career readiness of our graduating students. It was also done without any research base to back up the changes. Anyway, the Lexile Framework is the reformy "gold standard" so it is included here.

The Flesch Reading Ease test needs a bit of explanation. Reading ease is estimated on a scale of 0 - 100. The higher the number, the easier the text is to read. Texts in the 90-100 range should be easily read and understood by an 11-year-old. Texts in the 60-70 range should be easily read and understood by a 13- to 15-year-old. Texts scoring in the 0-30 range are best understood by university graduates.

With that background, here is what I found on the PARCC sample tests, using one reading sample from each grade level. In each case I took a 300+ word sample.

Grade 3 Passage: A Once in a Lifetime Experience by Sandra Beswethrick

  • Lexile  680 (3rd Grade range is 520 - 820)
  • FK        3.5 (grade level)
  • Fry       4.0 (grade level)
  • RR       3.3 (grade level)
  • FRE     87
Summary: By all measures the reading passage seems challenging, but appropriate for the upper levels of grade 3.

Grade 4 Passage: Just Like Home by Mathangi Subramanian
  • Lexile   1100 (4th Grade range is 740 - 940)
  • FK        6.1
  • Fry       6.3
  • RR       6.0
  • FRE     80.5
Summary: By all measures this reading passage seems inappropriate for assessment in grade 4.

Grade 5 Passage: from Moon Over Manifest by Clare Vanderpool
  • Lexile   950 (5th Grade range is 830 - 1010)
  • FK        7.0
  • Fry       6.9
  • RR       6.5
  • FRE     74
Summary: While the Lexile measure would indicate the text is appropriate for 5th grade, all other measures would indicate that this would be an extremely challenging text for 10- or 11-year-olds.

Grade 6 Passage: Emancipation: A Life Fable by Kate Chopin
  • Lexile   970 (6th Grade range is 925 - 1070)
  • FK        8.7
  • Fry       8.6
  • RR       8.2
  • FRE     73.6
Summary: While the Lexile level would indicate the text is appropriate for 6th grade, all other measures indicate that this text will be very challenging for 11- and 12-year-olds.

Grade 7 Passage: from The Count of Monte Cristo by Alexandre Dumas
  • Lexile   1080 (7th Grade range is 970 - 1120)
  • FK        10.0
  • Fry        8.5
  • RR        8.5
  • FRE      64.6
Summary: While the Lexile level would indicate the text is appropriate for upper level 7th graders, all other measures indicate that this text will be very challenging for 12- and 13-year-olds.

Grade 8 Passage: Elephants Can Lend a Helping Trunk by Virginia Morell
  • Lexile   1110 (8th Grade range is 1010 - 1185)
  • FK        10.6
  • Fry        10.4
  • RR        6.4
  • FRE      51.1
Summary: The passage falls within the Lexile range for 8th grade. The Fry, Flesch-Kincaid (FK) and FRE measures all indicate that the passage would be more appropriate for 10th grade students or above. The Raygor (RR) score appears to be anomalous.

Conclusions: The stated purpose of the Common Core State Standards and the aligned PARCC test was to "raise the bar" based on the notion that in order to be "college and career ready" students needed to be reading more complex text starting in their earliest school years. The PARCC sample tests show that they have certainly raised the bar when it comes to making reading comprehension passages quite difficult at every grade level.

These results clearly show that, even by the altered Lexile level standard, the 4th grade passage is much too difficult for 4th grade children. I would hope that the actual PARCC would not include any material remotely approaching this over-reaching level of challenge for children. I would hope, but the inclusion of this passage in the sample does not give me confidence.

The other results show that the passages chosen are about two grade levels above the readability of the grade and age of the children by measures other than the Lexile level. The results of testing children on these passages will be quite predictable. Students will score lower on the tests than on previous tests. We have already seen this in New York, where test scores plummeted when the new tests were given last year. English Language Learners (ELL) and students with disabilities will be particularly hard hit because these tests will prove extraordinarily difficult for them.

What happens when students are asked to read very difficult text? Students who find the text challenging, but doable, will redouble their efforts to figure it out. The majority of children, however, who find the text at their frustration level, may well give up. That is what frustration level in reading means. The ideal reading comprehension assessment passage will be easy for some, just right for most and challenging for some. The PARCC passages are likely to be very, very challenging for most.

What can schools and parents learn from these tests? These types of mass administered standardized tests have never been very good at giving teachers or parents actionable feedback that they can use to help students. The tests are best used to help educational leaders and classroom teachers spot trends in the performance of a district or a school over time and to make programmatic adjustments. When more than 70% of students fail to reach proficiency on a test, as was the case in New York when this test was tried, the only possible conclusion is that the test was not appropriate. Many students gave up in frustration. There is no actionable feedback available.

The results of the PARCC will no doubt feed into the education reform movement narrative that our kids, schools and teachers are failing. A cynic might think that this was deliberate, that this was a way to continue to discredit public school teachers, children and schools. If I wanted to advance this narrative, I would devise a test that arbitrarily raised the standards, provide some pseudo-science to make it appear reasonable, make sure students and teachers had limited time to adjust to the new testing standards and then broadcast the predictable results widely.

As a parent considering whether or not I want my child to take this test, I would want to know what I am going to learn by having my child participate in something that will likely cause frustration and which will give me very limited information on how my child is doing. The reformers will tell us that these tests are a "civil rights" issue, that having kids take these yearly tests is the only way we can know if all children are being well served by the school. As a parent, I would want the reformers to show me that the students are being well served by the test. Now there is a civil rights issue.

Monday, February 2, 2015

Text Complexity: Towards a More Nuanced Understanding

This blog has addressed the issue of text complexity on a number of occasions. Some initial concerns were laid out here and here. My concern that the Common Core State Standards (CCSS) approach to text complexity might actually exacerbate the achievement gap was addressed here. Finally, in a recent post I cited a concern from noted literacy researchers Valencia, Wixson and Pearson that text complexity was being misunderstood and misapplied here.

And the drum beat of concern about text complexity goes on. Last week the Teachers College Record published a commentary by Connecticut College professor Lauren Anderson and USC professor Jamy Stillman entitled (Over)Simplifying Complexity: Interrogating the Press for More Complex Text. The article ties in directly with my earlier stated concerns that the concept of text complexity as laid out in the CCSS would lead to confusing, poor instruction and to the continued widening of the achievement gap.

Anderson and Stillman looked at the efforts of a group of first grade teachers in a bilingual school to apply the CCSS call for more complex texts in their classrooms. These teachers reported that they were being pressed by administrators to use texts with more complexity. Both the teachers and the administrators seemed to possess a simplistic understanding of text complexity based on reading level (i.e., a higher Lexile level = a more complex text).

As Anderson and Stillman put it:

Ultimately, our data indicate that teachers experienced pressure from administrators to use complex texts, and that teachers understood “complex” to mean—and to mean to their administrators—more difficult in general. Indeed, the pressure—seemingly rooted in administrators’ concerns about readying students for the kinds of text passages they would encounter on standardized tests—manifested in their directing teachers to select whole class texts that would prepare students for that level of challenge.

In practice the teachers found that this did not work. It became clear to them that simply trying to help students navigate harder text based on a higher reading level caused a great deal of student struggle. Some struggle was expected; Anderson and Stillman report that these teachers bought into the narrative of low-achieving students' need for “grit.” But the struggle the students were experiencing went beyond what the teachers were comfortable with.

Indeed, the struggle was such that, as Anderson and Stillman see it, this simplistic approach to the concept of complexity and the resultant struggle led to the students not being able to engage in any type of meaningful dialogue around the text.

The teachers came to realize that their operant understanding of text complexity as higher Lexile level texts was not adequate. They were increasingly aware that they needed to revise their definition of text complexity to include the context of the reading situation, the background knowledge and skills of the students, and the reading instruction goals.

Anderson and Stillman sum it up this way:

[R]ather than treating ‘complex text’ as a gateway and/or necessary pre-condition for complex literacy learning, educators would be wise to nurture more nuance. Indeed, since even the most helpful, reliable measure of a text’s complexity will have its limitations, we advocate for an understanding of text complexity that is less about single or narrow measures, and more about process and pedagogy.

This is consistent with what I have reported in my previous postings. Complexity is more about the challenge embedded in the instruction than it is about the level of the text. The text must be accessible to the students for complex instruction and high level discussion to take place. A simplistic understanding of complexity, which, like these authors, I find to be rampant among school administrators, will only lead to less quality reading and discussion. It is ironic that the CCSS call for more complex discourse around books could well be undermined by a misguided call for more text complexity.

I’ll give Anderson and Stillman the last word.

[Our] findings suggest that the CCSS implementation process, even at a high performing school, pressed dynamic, dedicated, bilingual teachers—the kind of teachers for whom policymakers and practitioners alike clamor—to practice in ways that were ultimately less sensitive, scaffolded and responsive to students than any of them intended.