{"id":11769,"date":"2022-04-08T19:31:19","date_gmt":"2022-04-08T19:31:19","guid":{"rendered":"https:\/\/citejournal.org\/\/\/"},"modified":"2022-09-02T20:10:50","modified_gmt":"2022-09-02T20:10:50","slug":"technology-enhanced-tasks-to-assess-three-dimensional-science-sense-making-possibilities-and-lessons-learned-from-the-onpar-ngss-based-classroom-assessment-project","status":"publish","type":"post","link":"https:\/\/citejournal.org\/volume-22\/issue-2-22\/science\/technology-enhanced-tasks-to-assess-three-dimensional-science-sense-making-possibilities-and-lessons-learned-from-the-onpar-ngss-based-classroom-assessment-project","title":{"rendered":"Technology-Enhanced Tasks to Assess Three-Dimensional Science Sense-Making: Possibilities and Lessons Learned from the ONPAR NGSS-Based Classroom Assessment Project"},"content":{"rendered":"\n

In the late 1990s, Randy Bennett, the director of the National Assessment of Educational Progress Technology-Based Assessment Project, envisioned three phases to the reinvention of assessment through technology (Bennett, 1998). First, infrastructure building would be necessary to enable widespread use of computer-based assessments. Second, he foresaw that technology would support a transformation of question-and-response formats and scoring sophistication, enabling the field to move beyond multiple-choice items and dichotomous scoring. Third, he envisioned assessment being rooted in cognitive science, serving individual and institutional purposes, and allowing teachers and students to utilize assessment feedback to enhance learning.

In the decade following Bennett's prediction, assessment changed little in character (Tucker, 2009a). However, starting in 2010, the U.S. Department of Education supported several assessment consortia focused on developing large-scale summative online assessments that aimed to measure student mastery of new standards. Smarter Balanced, the Partnership for Assessment of Readiness for College and Careers, WIDA, English Language Proficiency Assessment for the 21st Century, and Dynamic Learning Maps transformed large-scale assessments from paper-and-pencil to online administration, making delivery more streamlined. Phase 1 of Bennett's transformation was realized, with administration of most large-scale assessments for K-12 students taking place via computer.

Another vision for technology-enhanced assessment has been to provide greater accessibility to students with disabilities and English learners (Almond et al., 2010). In the past, these students have received traditional tests with accommodations added on. In one study, researchers found that over 75 different assessment accommodation strategies were used for English learners, including dictionaries, glossaries, extra time, and test translations (Rivera & Collum, 2006). However, not all accommodations were appropriate to English learners' needs, and their use showed mixed results (Kieffer et al., 2009).

Rather than adding accommodations after an assessment has already been developed, researchers have envisioned that technology-enhanced assessments could be designed with accessibility principles at the outset to include the greatest number of students (Thurlow et al., 2006). For example, technology could offer the opportunity to embed tools within the assessment platform to reduce the need for post hoc accommodations or provide scaffolding to support student understanding (Almond et al., 2010). Research on accessibility may, in fact, dovetail with Phase 2 of innovation, reenvisioning how item stimuli are presented to students.

Numerous researchers have written about the potential of technology for Phase 2 innovation, asserting that technology in assessment has the potential to afford new opportunities that were not possible with paper-and-pencil tests, notably the ability to better assess the construct of interest as well as skills and reasoning abilities (Alonzo & Ke, 2014; Gane et al., 2018; Pellegrino & Quellmalz, 2010).

Proponents argue that technology offers the chance to change what is observed in the assessment context and how it is observed, because technology-enhanced assessments can deliver novel stimuli and gather unique responses that are not possible in traditional formats (Gane et al., 2018; Kopriva & Wright, 2017; Tucker, 2009b). Moreover, technology-enhanced assessments make it possible to gather evidence of student learning behaviors as well as to interpret that evidence. For example, technology-enhanced assessments can contain animations and graphics to present information to students dynamically and offer novel response types that enable students to draw, model, and carry out investigations. Computer algorithms can be built to interpret these behaviors automatically, easing the scoring burden on educators and reducing subjectivity.
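To illustrate how such automatic interpretation of response behaviors might work, the following minimal Python sketch scores a hypothetical drag-and-drop modeling response by comparing the components and links a student placed against a simple rubric. The function name, rubric structure, and point values are illustrative assumptions, not the scoring logic used by ONPAR or any of the other projects discussed here.

```python
# Hypothetical rubric-based scorer for a drag-and-drop modeling item.
# Names, rubric structure, and point values are illustrative assumptions,
# not the actual scoring logic of any project described in this article.

def score_model_response(placed_components, drawn_links, rubric):
    """Return a partial-credit score and feedback codes for one response.

    placed_components: set of component ids the student dragged onto the canvas
    drawn_links: set of (source, target) tuples the student connected
    rubric: dict with required components, required links, and point values
    """
    feedback = []
    score = 0

    # Award credit for each required component the student included.
    for component in rubric["required_components"]:
        if component in placed_components:
            score += rubric["component_points"]
        else:
            feedback.append(f"missing_component:{component}")

    # Award credit for each required causal link the student drew.
    for link in rubric["required_links"]:
        if link in drawn_links:
            score += rubric["link_points"]
        else:
            feedback.append(f"missing_link:{link[0]}->{link[1]}")

    return score, feedback


# Example: a simple food-web item with two required components and one link.
rubric = {
    "required_components": {"producer", "consumer"},
    "required_links": {("producer", "consumer")},
    "component_points": 1,
    "link_points": 2,
}
print(score_model_response({"producer", "consumer"}, set(), rubric))
# -> (2, ['missing_link:producer->consumer'])
```

In a classroom setting, rule-based interpretation like this is what allows responses to be scored and reported back to teachers immediately, without hand-scoring each student's work.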

Thus, technology may afford educators the opportunity to assess practices and skills that are better matched to the types of reasoning and response processes that are of interest (Gorin & Mislevy, 2013). This feature is especially useful under the Next Generation Science Standards (NGSS; NGSS Lead States, 2013), which call for students to develop three-dimensional science abilities (understanding of disciplinary core ideas together with mastery of science and engineering practices and crosscutting concepts) and for these dimensions to be assessed in an integrated way.

Assessing students' three-dimensional abilities has been one of the challenges facing science education since the inception of NGSS (Alonzo & Ke, 2016; Pellegrino, 2012, 2013; Pellegrino et al., 2014; Songer & Ruiz-Primo, 2012). Although science education has been at the forefront of exploring how to present and interpret complex questions in assessment environments (Pellegrino & Quellmalz, 2010), reviews of pre-NGSS science assessments indicate that most high-stakes science tests were unidimensional, focused on disciplinary core ideas. Further, most test formats were limited to multiple-choice questions, which limited the inferences that could be made about the other dimensions of NGSS (Sawchuk, 2019). Technology holds a great deal of promise for assessing students' three-dimensional abilities, because it provides a context for students to use and apply their reasoning skills in innovative ways.

To meet the challenge of NGSS assessment, several groups have embarked upon researching and designing technology-enhanced assessment materials intended to be used in classroom contexts. One such project is a collaboration involving the BEAR Center and Stanford University, which created online assessments for middle and early high school focused on two science topics, the physical behavior of matter and ecology, along with the practice of argumentation. Technology is utilized to administer the assessments through the Berkeley Assessment System Software; items are text based, and the tasks are hand-scored according to rubrics. Example tasks and scoring may be found at http://scientificargumentation.stanford.edu.

Another group, the Next Generation Science Assessment Collaborative, has created science assessment tasks, rubrics, and accompanying instructional resources for Grades 3-5 and Grades 6-8, available at https://ngss-assessment.portal.concord.org/. A variety of science content areas are covered by the assessments. Technology enables the use of models, videos, data analysis tools, and other tools that allow students to demonstrate understanding (Damelin & McIntyre, 2021). Upon completion of the project, the collaborative anticipates that 200 tasks will be available, along with scoring rubrics for teachers to use in classroom contexts.

A third project, High-Adventure Science, has created six classroom activities and related item sets pertaining to cutting-edge Earth science topics alongside the practice of argumentation. Starting around 2014, the project began to explore the use of technology to automatically score student-generated arguments for two of its new high school units. The assessments provide both individual student and classroom-level feedback to students and teachers. Example materials can be found at http://has.concord.org/index.html#interactives.

The ONPAR project, the focus of this article, has also created technology-enhanced assessment tasks for middle school science classroom use. The project leverages technology to design challenging, accessible assessments. The assessments are appropriate for students who struggle with text-heavy assessments, such as English language learners and students with disabilities in reading, as well as for mainstream students (Kopriva et al., 2021). The assessments also utilize technology to offer automatic scoring and reporting. The project has completed 12 units of science materials covering life, physical, and Earth sciences, and a total of 75 assessment tasks. Examples of the ONPAR approach to assessment are available on the project website at http://iiassessment.wceruw.org/projects/.

These four projects range in their utilization of technology: from administering assessments via computers in online formats, presenting information and providing interactive online tools, and utilizing accessibility resources, to scoring responses and providing reports automatically. These approaches illustrate the ways NGSS assessments are leveraging technology for the purposes of measurement and the various ways that educators will need to become adept in technology use in classroom contexts. Teachers will have to become comfortable administering assessments online, supporting student interaction with innovative item types, and interpreting and using assessment results that are scored automatically rather than by the teachers themselves.

Even though technology-enhanced assessments hold a great deal of promise for improving measurement of student knowledge and skills, adoption and implementation of new technology is challenging for educators (Koehler & Mishra, 2009), especially at a time when implementation of new standards also requires substantial instructional shifts from educators (Alonzo & Ke, 2014; Reiser, 2013). If educational measurement is to realize Bennett's third phase of reinvention and inform teaching and learning, classroom contexts and the way assessments are used by educators need to be researched and understood.

Koehler and Mishra noted the importance of understanding the affordances and constraints of new technologies and the ways they influence teacher behaviors. Their framework, technology, pedagogy, and content knowledge (TPACK), describes how teachers combine pedagogical content knowledge (Shulman, 1987) with their understanding of educational technologies. Successful use of technologies in classrooms, they claimed, requires that teachers develop a knowledge base consisting of content, pedagogical methods, and technology. Further, technology use is particular to each content domain and should be influenced by the pedagogical practices specific to each discipline (see Bull et al., 2019). Developing fluency with these three domains allows teachers to have a deep and flexible understanding of teaching with technology, which, in turn, helps them utilize technology to advance student learning.

Koehler and Mishra (2009) noted that external factors such as time, teacher beliefs about pedagogy (see also Ertmer, 2005), and access to training may influence teachers' success with technology adoption and implementation. They recommended that professional development be designed with these factors in mind.

Similarly, Gane et al. (2018) recommended that teachers receive support to understand and utilize technology-enhanced assessments, noting that educative supports have the potential to increase teachers' success. However, in-service teachers have indicated that they need more professional learning on classroom assessment (DeLuca & Klinger, 2009; Klinger et al., 2012), and many commercial classroom assessment packages do not provide a robust program of professional learning. Thus, when adopting a classroom assessment, especially one that is technology enhanced, it is important to consider the amount of training and support teachers need to use it successfully so that it can be fully leveraged for instructional aims.

The project described in this article researched and developed assessments with accessibility principles built in from the outset, aimed at all three phases of Bennett's vision for technology-enhanced testing. Assessments were fully delivered online, utilized innovative item types and scoring, and sought to provide instructionally useful assessment data. The ONPAR project researched and developed innovative multisemiotic (Kress & van Leeuwen, 2001) science assessment tasks for middle school science classroom use, utilizing visuals, action, sound, and language to communicate to and from students in the assessment environment.

Using a variety of communicative methods reflects the varied ways students learn and reason in science classrooms and addresses the access needs of students who may struggle with the language load of traditional tests, such as English language learners (ELLs; Kopriva, 2008; Logan-Terry & Wright, 2010). The project developed 12 units of science materials, including 75 assessment tasks, to assess students in middle school science classroom contexts. When the assessments were piloted, project participants also committed to participating in a series of professional learning meetings to learn about the assessment targets, how to implement the assessments, and how to utilize scoring and reporting information for instructional purposes.

Because the assessments were novel in all aspects, including the NGSS focus, the types of items, and the automatic scoring and reporting, the project sought to investigate through a survey the conditions that supported or hindered teachers' successful implementation and use of ONPAR classroom assessments. In the sections that follow, we provide an overview of the project and review results from project surveys to describe the overall success of implementation, as well as the factors that supported and hindered its success.

Description of Assessment Approach

The ONPAR assessment methodology is a unique multisemiotic approach that uses a wide range of representations both to present assessment items and to open up response types (http://iiassessment.wceruw.org/projects/). The theoretical underpinnings and empirical support for the assessment methodology come from the fields of semiotics (Jewitt, 2008; Kress, 2003, 2010; Kress & van Leeuwen, 2001), cognitive science (Gee, 2007; Graf & Kinshuk, 2008; Myers, 2015; Pellegrino et al., 2001), and Evidence Centered Design (ECD; Kane, 2013; Mislevy, 2009, 2013).

The assessment approach has been developed and researched through a series of federally funded grants and has demonstrated success in addressing the linguistic and cultural barriers encountered by low-English-proficient ELLs on assessments for large-scale, summative purposes, such as annual state accountability measures (Kopriva et al., 2016; Kopriva et al., 2021). Further, research has shown that ONPAR items can successfully measure challenging science concepts and skills of ELLs using novel computer-interactive techniques that largely redirect the language comprehension and production loads to multisemiotic representations (Kopriva et al., 2016; Kopriva & Wright, 2017). The current project sought to extend this line of research, applying the assessment methodology to develop assessments intended for use in middle school science classrooms.

To develop the assessments for the most recent project, a systematic approach was undertaken by project staff, using ECD as a starting point. Development of each ONPAR science unit began with identification of the NGSS Performance Expectations (PEs) on which assessment tasks and items would be based. Once the PE for a specific task was identified, it was then unpacked (Harris et al., 2016) to fully understand the depth and scope of the demonstrable student abilities that were expected. In unpacking, task developers focused on the connections between the dimensions. That is to say, rather than focusing on disciplinary core ideas (DCIs), science and engineering practices (SEPs), or crosscutting concepts (CCCs) on their own, task developers considered the relationships between at least two dimensions at a time (i.e., DCI & CCC, DCI & SEP, and SEP & CCC).

Understanding the standards this way supports the eventual development of multidimensional assessment items that incorporate at least two of the NGSS's three dimensions. By disentangling the connections between the dimensions while maintaining a vision of them in the whole of the PE, ONPAR domain analyses focused on the components of the PE that cue assessment task contexts, screen content, and interactive elements.
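To make the pairwise focus of unpacking concrete, the minimal Python sketch below shows one hypothetical way the relationships among dimensions could be recorded during domain analysis. The PE code, dimension labels, and evidence notes are illustrative assumptions, not reproductions of actual ONPAR design documents.

```python
# Hypothetical record of an unpacked NGSS performance expectation (PE).
# The PE code, dimension text, and evidence notes are illustrative only,
# not ONPAR's actual domain analysis artifacts.

unpacked_pe = {
    "pe": "MS-PS1-2",  # example middle school physical science PE
    "dci": "PS1.B: Chemical Reactions",
    "sep": "Analyzing and Interpreting Data",
    "ccc": "Patterns",
    # Each entry pairs two dimensions and notes the evidence an item
    # targeting that connection would need to elicit from students.
    "connections": [
        ("dci", "sep", "Interpret property data to decide whether a reaction occurred"),
        ("dci", "ccc", "Recognize patterns in substance properties before and after mixing"),
        ("sep", "ccc", "Use patterns in data tables as evidence when analyzing results"),
    ],
}

# Each item targets at least two dimensions at a time, so every item can be
# mapped to one of the three possible dimension pairs for the PE.
for first, second, evidence in unpacked_pe["connections"]:
    print(f"{first.upper()} & {second.upper()}: {evidence}")
```

Recording the unpacking this way keeps each of the three possible dimension pairs visible while preserving the whole PE, which mirrors the design reasoning described above.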

In addition to NGSS PEs, task designers used the current understanding of learning progressions in science education (Alonzo & Elby, 2019; National Research Council [NRC], 2012) to determine assessment goals and evidence. Learning progressions describe how students develop successively more sophisticated ways of reasoning about science content as they obtain more experience with phenomena and representations and improve their cognitive abilities (Smith et al., 2006). Because NGSS for middle school encompasses the entire Grades 6-8 band, task designers used learning progression endpoints for Grade 5 and Grade 8 (NRC, 2012) to bracket the conceptual understanding assessed on screens, in tasks, and across assessment units.

In addition to unpacking the NGSS, the assessment approach also considered test takers from the outset of development. The ONPAR approach is unique in its emphasis on the person dimension described by Kopriva et al. (2016) and Kopriva and Wright (2017). These researchers argued that assessments must be designed in ways that allow the widest range of test takers to demonstrate their abilities. Consequently, the experience of individuals taking the test factors significantly into all stages of task development.

At the item series level, this means identifying contexts that would be familiar or accessible to most learners. At the screen level, this translates into a set of design practices that provide users with multiple ways to understand and respond to questions. ONPAR screen design practices include the following (Kopriva & Wright, 2017; Wright, 2013):