
ORIGINAL RESEARCH article

Front. Educ., 07 November 2022
Sec. Higher Education
Volume 7 - 2022 | https://doi.org/10.3389/feduc.2022.914857

Validation of newly developed tasks for the assessment of generic Critical Online Reasoning (COR) of university students and graduates

  • Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

In recent decades, the acquisition of information has evolved substantially and fundamentally affects students’ use of information, so that the Internet has become one of the most important sources of information for learning. However, learning with freely accessible online resources also poses challenges, such as vast amounts of partially unstructured, untrustworthy, or biased information. To successfully learn by using the Internet, students therefore require specific skills for selecting, processing, and evaluating online information, e.g., to distinguish trustworthy from distorted or biased information and to judge its relevance with regard to the topic and task at hand. Despite the central importance of these skills, their assessment in higher education is still an emerging field. In this paper, we present the newly defined theoretical-conceptual framework Critical Online Reasoning (COR). Based on this framework, a corresponding performance assessment, the Critical Online Reasoning Assessment (CORA), was newly developed and underwent first steps of validation in accordance with the Standards for Educational and Psychological Testing. We first provide an overview of the previous validation results and then expand them by including further analyses of the validity aspects “internal test structure” and “relations with other variables”. To investigate the internal test structure, we conducted variance component analyses based on generalizability theory with a sample of 125 students and investigated the relations with other variables by means of correlation analyses. The results show the expected correlations with external criteria and confirm that the CORA scores reflect the different test performances of the participants and are not significantly biased by modalities of the assessment. With these new analyses, this study substantially contributes to previous research by providing comprehensive evidence for the validity of this new performance assessment of the complex, multifaceted construct of critical online reasoning among university students and graduates. CORA results provide unique insights into the interplay between features of online information acquisition and processing, learning environments, and the cognitive and metacognitive requirements for critically reasoning from online information in university students and young professionals.

Introduction

The digital age has transformed learning in higher education as well as the learning materials accessible to students (Ali, 2020; Banerjee et al., 2020). The acquisition and use of information has evolved substantially in recent decades and also fundamentally affects students’ learning (Boh Podgornik et al., 2016; Brooks, 2016; Maurer et al., 2020). University students nowadays prefer the Internet to traditional textbooks for information acquisition; moreover, in the increasingly prevalent digital teaching and learning contexts, students use learning resources that were not professionally produced, found by eclectically browsing the web, more often and more ubiquitously than recommended open educational resources (OER). The Internet has therefore become one of the most important sources of information for learning, not only for the preparation of papers or presentations but also when studying for exams (Brooks, 2016; Newman and Beetham, 2017; Maurer et al., 2020). The World Wide Web provides a flexible learning resource while also accelerating the dissemination and processing of information and knowledge (Braasch et al., 2018; Weber et al., 2019; Maurer et al., 2020). However, learning with freely accessible online resources also presents challenges (Qiu et al., 2017; Ciampaglia, 2018). Since content can be freely distributed on the Internet, vast amounts of unstructured, untrustworthy, inaccurate, or biased information are just as readily available to learners as credible, verified information (Walton et al., 2020). Dealing with the vast amount of information available online, on a platform characterized by low publication barriers and poorly established quality standards, requires students to be critically evaluative (Liu et al., 2014; Tribukait et al., 2017). Thus, the ever-changing information and learning environment has profound consequences for the imparting of knowledge in higher education (Harrison and Luckett, 2019; Weber et al., 2019; Maurer et al., 2020). To competently use and successfully learn from the information and resources openly accessible on the Internet, students must be able to critically search, select, review, and evaluate online information and sources based on relevant quality criteria (Sendurur, 2018; Molerov et al., 2020; Nagel et al., 2020). In the context of increasingly digital and self-directed teaching and learning processes in higher education, the successful use of digital media and the competent, critical use of online information constitute one of the most important student skills for successful study (Harrison and Luckett, 2019; Molerov et al., 2020), as has also been emphasized in a recent research review (Osborne et al., 2022). Competent, critical use of online information is thus classified as a so-called generic skill, which college graduates are expected to develop to operate successfully as professionals and responsible citizens of democratic societies (Binkley et al., 2012; National Research Council, 2012; Shavelson et al., 2018; Virtanen and Tynjälä, 2018; Zlatkin-Troitschanskaia et al., 2021a). In addition to professional knowledge, such skills include quantitative reasoning, critical literacy and thinking, ethical and moral reasoning, and written and oral communication that college graduates can draw upon to address life’s everyday judgments, decisions, and challenges.
As a current literature review indicates, nowadays, searching, evaluating, selecting, and using high-quality online information have additionally become generic skills important for successfully studying in higher education (Zlatkin-Troitschanskaia et al., 2021b).

So far, the related subskills have been assessed based on various theoretical constructs, such as “multiple-source use” (MSU; Braasch et al., 2018; Hahnel et al., 2019), “information trust” (Johnson et al., 2016; Leeder, 2019), and “web credibility” (Flanagin and Metzger, 2017; Herrero-Diz et al., 2019). While providing important insights into the individual subskills, these approaches have not yet systematically focused on the interplay between features of online information acquisition and learning environments and the (cognitive) requirements for critical reasoning from online information (Goldman and Brand-Gruwel, 2018). Another relevant research strand focuses on the aspect of communicating the selected and critically evaluated information to answer an initial question, as such communication skills are particularly needed in later (professional) life (Chan et al., 2017; Braun, 2021). Lawyers or physicians, for example, not only have to compile various reliable pieces of information on individual cases and draw conclusions from them, but also regularly exchange information with clients and patients in this process (e.g., Korn, 2004; Aspegren and Lønberg-Madsen, 2005).

A recent review consolidating information problem-solving and multiple source use approaches highlights existing desiderata in examining how evaluated information is used in more advanced analytical reasoning processes and what role the characteristics of information play in reasoning (Goldman and Brand-Gruwel, 2018). For instance, while students may differ in their judgment of the credibility of a source, drawing invalid inferences is generally wrong epistemically and indicates poor (online) reasoning skills. In addition, most of the tests used so far to measure these subskills have a closed-ended format, thus covering only limited aspects of dealing with online information and, in particular, failing to measure the actual reasoning process and the underlying procedural skills (Ku, 2009; Desai and Reimers, 2019). Moreover, these procedures no longer do justice to the current efforts of higher education institutions regarding the measurement of students’ competencies, which increasingly focus on a holistic representation of students’ capabilities to act (Shavelson et al., 2019).

To make these crucial student skills pertaining to the online information environment empirically measurable and to be able to specifically promote them, a new theoretical-conceptual framework of Critical Online Reasoning (COR) was developed (see section “Conceptual background”; for details, see Molerov et al., 2020). COR describes the abilities of searching, selecting, accessing, processing, and critically reasoning from online information, e.g., to solve a particular generic or domain-specific problem or task (for details, see Molerov et al., 2020). This involves critically distinguishing trustworthy from untrustworthy information and making argumentative and coherent judgments based on credible and relevant information from the online environment. Based on this conceptual framework, a COR performance Assessment (CORA) was newly developed and underwent initial validation (Molerov et al., 2020; Zlatkin-Troitschanskaia et al., 2021a). Based on the COR construct definition, CORA includes various authentic situational tasks in the online media environment, i.e., the real Internet, to objectively and validly assess students’ COR skills in a realistic performance assessment. This holistic assessment measures all required skill (sub-)dimensions and their interplay instead of only individual facets as would be the case, for example, with closed-ended tests (Davey et al., 2015; for a CORA task example, see Figure 1).

Figure 1. Sample task of the Critical Online Reasoning Assessment (CORA).

When measuring students’ COR skills through CORA, validity is one of the key quality criteria for the reliable interpretation of students’ test results. The Standards for Educational and Psychological Testing (hereafter referred to as “AERA Standards”) provide criteria for the reliable validation of educational tests (AERA, APA, and NCME, 2014). According to the AERA Standards, five aspects should be analyzed during validation and various sources of information should be used as evidence. The aspects to be analyzed are “test content,” “task- and test-response processes,” “internal structure of a test,” “interrelationships with other variables,” and “consequences of testing” (for details, see AERA, APA, and NCME, 2014). Therefore, the focus and central contribution of this paper is to present the comprehensive, multi-perspective and in-depth validation of the CORA as a novel performance-based test of generic student skills in higher education.

To validate the CORA tasks and interpret the test scores, initial validation steps have already been carried out:

1. Validity evidence regarding the CORA content was obtained through expert interviews and expert ratings of the CORA tasks (for details, see Molerov et al., 2020).

2. Validity evidence regarding the task response processes of the test takers was analyzed by Schmidt et al. (2020) on the basis of log files and eye-tracking data including gaze duration and fixations.

3. Initial validity evidence on the correlations with other variables was obtained by Nagel et al. (2020) through analyzing the extent to which participants’ web search behavior—specifically, the number and type of web pages accessed as well as the quality of the content on the web pages—is related to better task performance and thus to a more critically-reflective use of online information.

In this paper, further validation of the CORA tasks focusing on the two criteria ‘internal structure of the test’ and ‘correlations with other variables’ is presented and critically discussed. In this way, further validity aspects not yet considered are systematically and thoroughly investigated according to the AERA standards to obtain a comprehensive overview of the validity of the CORA. The results of the analyses are combined with the validity evidence outlined above to provide a comprehensive validity assessment of the new COR Assessment.

In Chapter 2, the definition of the COR construct, which serves as a basis for an appropriate interpretation of the CORA test results (Molerov et al., 2020), is explained in more detail. In addition, the COR Assessment framework is presented, including a sample task. Chapter 3 explains the validation approach of CORA, which is based on the model of argumentation-based validation of test score interpretations. Following the argumentation-based validation process (Mislevy et al., 2012), we briefly summarize the results of the previous validation studies on the content validity (section “Content validity”) and the validity of task response processes (section “Validity of task response processes”) of CORA, before the newly obtained validity evidence is presented (sections “Internal test structure” and “Relations with other variables”) and integrated with the previous validation work for CORA. Chapter 4 provides a critical discussion of the results, including the limitations of the work and an outlook on further research.

Conceptual background

The COR construct definition

To harness the potential of the Internet for learning, students require a variety of information acquisition and processing skills, which have previously been summarized under broad literacy constructs such as digital literacy (Reddy et al., 2020; Park et al., 2021), which is also related to media literacy (Koltay, 2011), information literacy (Limberg et al., 2012; Sanders et al., 2015; Walton et al., 2020), and computer literacies [e.g., information and communication technology (ICT) literacy, computer and information literacy (CIL); Siddiq et al., 2016; Makhmudov et al., 2020; see also, e.g., studies on multimedia learning, Mayer, 2009]. Particularly for students in higher education, current research takes basic computer knowledge (Rammstedt, 2013; Schlebusch, 2018) as well as multimedia (Naumann et al., 2001; Goldhammer et al., 2013) and general Internet skills, which are required for self-directed online learning, as a given (Rammstedt, 2013). However, numerous studies outline substantial deficits in students’ Internet-based learning in higher education that can hinder their study success. Based on prior research, we go beyond such broad literacy and general ability concepts and focus more specifically on modeling and validly assessing actual online information acquisition and processing skills, and in particular critical reasoning based on this online information. When modeling COR, we particularly draw on extended information problem-solving (IPS-I) models (Brand-Gruwel et al., 2009; Huang et al., 2019; Whitelock-Wainwright et al., 2020) to distinguish and describe the main processes involved in self-directed online learning. Thereby, we further expand these models by focusing on processes of argumentation as well as communication, which are not only important for students’ academic success but also key requirements that higher education graduates encounter on the labor market (Braun and Brachem, 2018). These skills can be summarized under the REAS facet: Reasoning based on Evidence, Argumentation, and Synthesis. Therefore, the COR model describes students’ key generic skills not only for searching, evaluating, and selecting (as in IPS-I models) but also additional processes including analyzing, synthesizing, and reasoning from (high-quality) online information, while self-directedly engaging with (more or less domain-specific) content or working toward course-related learning goals, e.g., outside of classrooms (e.g., preparing an essay at home). We differentiate between two main requirement areas for COR processes: generic and domain-specific, e.g., within particular study domains like Medicine or Law (for details, see Molerov et al., 2020). The focus of the analyses presented here is particularly on the generic COR skills required for researching more general topics that are not specifically related to a particular domain (for a differentiation between generic and domain-specific requirement areas for COR in higher education, see Zlatkin-Troitschanskaia et al., 2021a).

In our prior research, we theoretically analyzed the links and overlaps between the existing concepts and models for assessing students’ skills related to COR (for more details on these specific concepts, underlying constructs, and particularly overlaps and distinctions, see our differentiated descriptions in Molerov et al., 2020). Going beyond the abovementioned established “literacy” concepts and constructs like digital literacy and multiple source use, we especially draw on the triad model of critical alertness, reflection, and analysis (Oser and Biedermann, 2020). Thereby, we particularly focus on how students analytically reason from as well as justify and critically reflect on online information they used for their higher education studies and infer from and weight arguments and (covert) perspectives of (partly conflicting) sources and information pieces. Based on this theoretical rationale, we specify a set of skills assumed crucial for the acquisition and use of high-quality online information for learning in higher education, which we term Critical Online Reasoning (for details, see Molerov et al., 2020). Thereby, in addition to the abovementioned models and concepts, we also particularly draw on the U.S.-established concept of civic online reasoning. This concept describes the ability to successfully deal with online information and distinguish, for instance, reliable and trustworthy sources of information from biased and manipulative ones (Wineburg et al., 2016). While this concept focuses on the handling of online information on political and social topics in particular, our approach of COR has been expanded to encompass all cross-domain topics relevant for students’ learning in higher education and beyond. In addition, we further substantially expanded the concept of civic online reasoning as well as the information problem-solving models by Brand-Gruwel et al. (2009), to cover the whole process of searching, evaluating, selecting, analyzing, synthesizing, and reasoning from online information. In doing so, we also specifically incorporated a new reasoning facet, described as Reasoning based on Evidence, Argumentation, and Synthesis (for details, see Molerov et al., 2020).

To sum up, the COR concept draws closely on previous process and phase models of (online) information search, selection, and evaluation, in particular the information problem-solving models (Brand-Gruwel et al., 2009; Huang et al., 2019; Whitelock-Wainwright et al., 2020). Thereby, we also consider insights from related “web credibility” research, especially on multiple-source use and multiple-source comprehension (Braasch et al., 2018; Goldman and Brand-Gruwel, 2018; Hahnel et al., 2019). We expand the modeling of students’ information use in self-directed learning by adding a new critical reasoning component, i.e., Reasoning based on Evidence, Argumentation, and Synthesis (REAS). In addition, we also integrate a metacognitive regulative component, i.e., Metacognitive Activation (MCA) skills, that helps students decide when to employ COR skills (e.g., to initiate a critical evaluation; for more details, see Molerov et al., 2020).

Based on this conceptual work, to model and measure COR according to international testing standards by AERA, APA, and NCME (2014) in an evidence-centered design (Zieky, 2014; Mislevy, 2017), we specified its construct definition with three overarching and overlapping cognitive facets:

1. online information acquisition skills (OIA), e.g., selecting search engines or databases, specifying search queries;

2. critical information evaluation (CIE) skills, e.g., evaluating website credibility based on cues; and

3. reasoning skills, e.g., using evidence to generate and justify a valid argument based on a synthesis of accessed information (REAS), including accounting for common errors and biases as well as considering (contradictory) arguments and (covert) perspectives from (possibly conflicting) sources and information.

In addition, metacognitive (MCA) skills regulate the state-specific and situation-specific activation, continuation, and conclusion of the COR process within the encompassing information acquisition context, e.g., recognizing the need to use COR in learning-related contexts.

Based on this definition, we established COR as an operationalizable, multifaceted construct of students’ (meta)cognitive skills for goal-oriented and competent use of online information focusing on study-related contexts in higher education (for details, see Molerov et al., 2020).

The COR assessment framework

Methodologically, recent assessment research shows that tests with a closed-ended format are limited when it comes to validly measuring (meta)cognitive higher-order skills such as COR (e.g., Braun et al., 2020). In addition, they no longer do justice to the more recent efforts at universities to ensure the validity of testing procedures, which increasingly aim to holistically measure students’ capabilities to act (Shavelson et al., 2019). Closed-ended tests generally have limited ecological validity, as they fail to measure the procedural skills underlying the processing of (online) information used for learning, and, as studies show, students struggle to transfer the measured skills to more authentic, real-life situations (Ku, 2009; Davey et al., 2015; Desai and Reimers, 2019). It is thus evident that such a complex, higher-order skill construct as COR can be more validly measured through performance assessments (Shavelson et al., 2019) that simulate the online information environment and adequately reflect the formal and informal learning contexts and conditions students of higher education experience in real life. The focus on the online information environment is therefore, following the tradition of measuring higher-order cognitive skills by means of performance assessments (Braun and Brachem, 2018; Shavelson et al., 2019; Braun et al., 2020), reflected in task scenarios that employ real websites and Internet searches, including sources, platforms, and services that are typical for current online media.

Since designing and developing new performance assessment tasks is particularly resource-intensive and time-consuming, we first looked for existing assessments, which could be possibly adapted and used to validly measure COR skills. In the past, therefore, we tried to measure COR using an adaptation of an Internet-based assessment developed and validated in the United States by the Stanford History Education Group (SHEG) to assess the abovementioned recently established concept of “civic online reasoning” at the middle school, high school, and college level (Wineburg et al., 2018). It is an innovative holistic assessment of how students evaluate online information and sources, containing short evaluation prompts, real websites, and an open Internet search (Wineburg et al., 2016; Wineburg and McGrew, 2016). The Stanford History Education Group asked students, for example, to evaluate the credibility of information on political and social issues of mostly U.S.-centric civic interest and to justify their judgment, also citing web sources as evidence (Wineburg and McGrew, 2019).

Based on preliminary validation, however, we further developed and expanded the COR assessment framework. Since an adaptation of this US assessment for the German university context was not feasible due to fundamental differences between the systems of higher education in the two countries, the conceptual-theoretical framework was modified and expanded, resulting in the new construct definition of Critical Online Reasoning described above (Section “The COR construct definition”; for more details, see Molerov et al., 2020; Zlatkin-Troitschanskaia et al., 2021a). In this process, a corresponding test definition was developed that provided the basis for the design of new CORA tasks with new scenarios as well as corresponding scoring rubrics to rate students’ responses to the new tasks (for the description of the assessment and the ratings, see Section “Method and design”).

Our newly developed COR performance assessment allows for validly measuring all theoretically defined COR facets (see the section “The COR construct definition”) as we seek to demonstrate with the comprehensive validation presented in this paper.

Validity results

When developing the new COR assessment, the evidence-centered design (ECD) approach of Mislevy (2017) and the Standards for Educational and Psychological Testing of AERA, APA, and NCME (2014) were followed to ensure the development of a valid assessment from the very beginning (see section “Conceptual background”). Consequently, as part of the CORA development, we also developed a student model (based on the construct definition), as well as a task model and an interpretive model (based on the test definition), as—according to the evidence-centered design approach—the alignment of these models is necessary for designing valid assessments (Mislevy, 2017). We also followed the standards according to AERA, APA, and NCME (2014) with regard to test development, scoring, and test quality assurance, in particular by conducting initial validity tests during the development of CORA (Molerov et al., 2019, 2020). These were systematically complemented by analyses of the different types of validity outlined in the following.

Content validity

Molerov et al. (2020) conducted a qualitative evaluation of CORA according to the standards of AERA, APA, and NCME (2014), with a focus on the task content, i.e., analyzing the coverage of the theoretically derived COR construct facets by the tasks and the suitability of the requirements and content of the newly developed assessment and corresponding scoring approach for higher education in Germany. For this purpose, they analyzed the task content on the basis of 12 semistructured interviews with experts in the fields of computer-based performance assessments in higher education, media studies (focusing on online source evaluation or media literacy), linguistics, and cultural studies, which were then evaluated by means of content analyses.

The experts (1) confirmed that the CORA tasks measure the generic COR ability, (2) supported the assumption that CORA measures test participants’ personal construct-relevant abilities in terms of the defined construct definition, and (3) concurred that no specific domain knowledge is required to complete the tasks. The experts also recommended expanding the scope of the assessment, as it was observed that the tasks might be too difficult for first-year students. In addition, some experts referred to the problem that participants’ prior knowledge, interest, beliefs, or (political) attitudes regarding the task topic could influence their CORA performance.

The additional content analysis confirmed that the assessment and corresponding scoring scheme included two different types of CORA tasks, each prioritizing a different COR facet (online information acquisition and critical information evaluation; Molerov et al., 2020, p. 20). Following the indications from these analyses, a task format focusing more explicitly on the reasoning skills facet should be included in future assessments (Molerov et al., 2020, p. 20). Consequently, the tasks were expanded by two subtasks each, with a processing time of 10 min per subtask (see section “Conceptual background”).

Validity of task response processes

In a second validation approach focusing on the validity of task response processes, Schmidt et al. (2020) investigated how test participants’ cognitive processes during task-solving can be described and to what extent certain empirically distinct patterns exist in the participants’ task- and test-solving processes in relation to COR abilities. To this end, test-taking process data were collected through verbalizations, eye movements, response times, and computer clicks during the processing of the CORA tasks. Subsequently, Schmidt et al. operationalized the COR construct in two dimensions: at the level of COR ability, which is represented by the score in the CORA tasks (task performance), and at the level of process performance, which is indicated by gaze fixations and response times in the log files (online information processing).

The results showed that better process performance is associated with significantly higher scores, indicating a relationship between participants’ process performance and task performance. Through an analysis of test-taking processes, the two distinct patterns of avoidance strategy and strategic information processing were identified during CORA task-solving. Participants using the avoidance strategy exhibited both poorer process performance and poorer task performance, i.e., they spent most of their time on only one web page, resulting in many fixations that were all focused on one specific process step. In contrast, participants using strategic information processing showed better performance and more intensive processing of online information through a larger number of (total) process steps, which was in line with the theoretical assumption for CORA (for details, see Schmidt et al., 2020).

Internal test structure

Theoretical background

According to the argumentative validation process following AERA, APA, and NCME (2014), evidence for the validity of the CORA scores and their interpretation could already be shown regarding the CORA content and the test takers’ task response processes; initial evidence could also be obtained for correlations with other variables. The assessment’s internal structure is also an important validity aspect, since analyses thereof can “indicate the degree to which the relationships among test tasks and test components conform to the construct on which the proposed test score interpretations are based” (AERA, APA, and NCME, 2014, p. 13). A performance assessment such as CORA, which includes a free Internet search and open-ended written answers that are evaluated by raters, differs fundamentally from classical test procedures with regard to its structure. Therefore, analysis methods from classical test theory, such as item and reliability analyses (e.g., test–retest reliability or internal consistency coefficients), are not suitable for this assessment format, as they do not comprehensively take into account the complexity of the various possible influencing factors that are incorporated in performance assessments in contrast to conventional closed-ended assessments (Cronbach et al., 1972; for more details, see also Shavelson and Webb, 1981, and Shavelson et al., 1989). Following Shavelson and Webb (1981), the specifics of performance assessments can be sufficiently taken into account within the framework of Generalizability Theory. Generalizability Theory distinguishes between different components of the assessment, so-called facets, which can exert an influence on the test scores both individually and in interaction (Cronbach et al., 1963). In CORA, such facets are, in addition to test takers’ varying COR abilities and other individual characteristics, certain characteristics of the tasks used (e.g., task topic, format, formulation, or time limits) and effects of the raters, which can also exert a systematic influence and thus affect the test results (Goldman and Brand-Gruwel, 2018; Solano-Flores, 2020).

While certain influences on the test scores are desirable, in particular those of participants’ differing COR abilities or intentional variation of task difficulty, (uncontrolled) influences, for example those of rater effects, should be minimized. In the context of validating the CORA tasks, it should therefore be determined which influences the individual facets of the assessment exert on the scores and how they may interact with each other. The variance decomposition method used in this study allows for the analysis of the influencing factors across different CORA tasks (Jiang, 2018).
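
For orientation, the generic Generalizability Theory decomposition for a fully crossed person × task × rater design (a textbook formulation in the tradition of Cronbach et al. and Shavelson and Webb, not a model reported specifically for CORA) partitions an observed score and its variance as follows:

```latex
X_{ptr} = \mu + \nu_p + \nu_t + \nu_r + \nu_{pt} + \nu_{pr} + \nu_{tr} + \nu_{ptr,e},
\qquad
\sigma^2(X_{ptr}) = \sigma^2_p + \sigma^2_t + \sigma^2_r
  + \sigma^2_{pt} + \sigma^2_{pr} + \sigma^2_{tr} + \sigma^2_{ptr,e}.
```

The variance component analyses reported below estimate which of these components, and in particular the person-related ones, account for the variance in the CORA scores.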

Method and design

The process described in the section “Conceptual background” resulted in the new COR assessment framework, which is a computer-based holistic performance assessment that measures students’ and young professionals’ real-world information-processing, decision-making, and judgment skills. It contains criterion-sampled realistic situations that students may encounter in their public and private lives or when studying and working in professional domains (Davey et al., 2015; Shavelson et al., 2018, 2019). Each task consists of a short context description, an objective, and a request to conduct a free Internet search (for a task example, see Figure 1). The participants are prompted to evaluate the online information they found during their search and to write a short open-ended response based on the information found. The tasks are characterized by an open-ended information environment: to holistically capture the process of Internet research, test takers have unrestricted access to the Internet while processing the COR tasks and have to perform a live, open web search, find relevant and credible information, identify and exclude untrustworthy information, and write a short, coherent statement to answer the task prompt. While a processing time of 10 min per task was originally specified, the format was further adapted after the initial validation and extended to 20 min to capture the three COR facets (see section “Conceptual background”) more validly.

The rating scheme for the scoring of the answers was also accordingly extended and adapted to the new CORA task format, with a greater differentiation and weighting of the individual COR facets aligned with the construct definition. The resulting rating scheme thus distinguishes between six aspects: (1) formulating a clear answer regarding the question, (2) comprehending the task, (3) quality of sources used (for researching general topics as may be encountered in public and private life), (4) accurately evaluating sources, (5) correctly considering arguments of different quality, and (6) giving a reasoned explanation. Depending on the degree of fulfillment, 0–2 points per aspect can be awarded in increments of 0.5, with the respective degree of fulfillment for the point categories described in more detail by behavioral anchors. The different aspects are then included in the overall score with different percentage weightings, depending on their importance to the overall COR construct (for an excerpt of the scoring scheme, see Table 1). While the first part of the task specifically addresses the facets of Online Information Acquisition and Critical Information Evaluation, the second part requires the ability of Reasoning based on Evidence, Argumentation, and Synthesis (see Figure 1).

Table 1. Excerpt of the COR Scoring Scheme, OIA Facet, Subfacet “Quality of Researched Sources.”
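
To illustrate the scoring logic, the following minimal sketch (in R, the environment also used for the analyses below) shows how six aspect scores could be combined into a weighted overall task score. The aspect names and weights are purely illustrative assumptions, as Table 1 shows only an excerpt of the actual scheme.

```r
# A minimal sketch of the weighted scoring logic described above (not the
# published CORA scoring key): six rubric aspects are scored from 0 to 2 in
# steps of 0.5 and combined into an overall task score via percentage weights.

aspect_scores <- c(
  clear_answer         = 1.5,  # (1) formulating a clear answer
  task_comprehension   = 2.0,  # (2) comprehending the task
  source_quality       = 1.0,  # (3) quality of sources used
  source_evaluation    = 0.5,  # (4) accurately evaluating sources
  argument_handling    = 1.0,  # (5) considering arguments of different quality
  reasoned_explanation = 1.5   # (6) giving a reasoned explanation
)

# Hypothetical percentage weights summing to 1 (not the published weighting).
weights <- c(0.10, 0.10, 0.25, 0.20, 0.15, 0.20)

stopifnot(all(aspect_scores %in% seq(0, 2, by = 0.5)),
          abs(sum(weights) - 1) < 1e-9)

# Weighted overall task score, still on the 0-2 scale.
task_score <- sum(aspect_scores * weights)
task_score
```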

In addition to the written responses, participants’ browsing histories are recorded during their web search for further analysis (Nagel et al., 2020; Schmidt et al., 2020). Subsequently, the participants’ responses are evaluated by trained raters using the newly developed and validated rating scheme, which takes into account the quality of the sources they used, the correctness of their evaluation of the information found, and the quality of their statements. The collected log data are analyzed, for example, in terms of the number of online sources used and the quality and type of web pages accessed. For this analysis, a new media categorization scheme was developed based on established research approaches (Nagel et al., 2020).

To examine the extent to which different test facets contribute to the variance of the test scores, we analyzed their individual contributions to the total variance of the test scores with the method of variance component analysis (Jiang, 2018). To this end, we computed linear mixed-effect models using R (lme4-package; Bates et al., 2015), in which we differentiated the assessment facets person, i.e., influences specific to the individual participants, rater, i.e., influences of rater effects or the scoring method, and task, i.e., influences of task characteristics, as independent variables (Shavelson and Webb, 1981). The test score was used as the dependent variable. The data set was converted for the analyses so that there was an entry in the dataset for every possible combination of characteristics (see Figure 2; Jiang, 2018). Subsequently, we calculated the linear mixed-effect models by gradually adding the person, rater, and task facets as well as the respective interactions, and compared them on the basis of the residuals and the variance explained in each case.

Figure 2. Exemplary representation of the dataset format for calculating the variance component decomposition (adapted from Jiang, 2018).
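
The following sketch illustrates how such a variance decomposition can be set up with crossed random effects in lme4, assuming a long-format data frame as depicted in Figure 2. Variable and model names are illustrative, and the models shown correspond only roughly to the model series reported in Table 3.

```r
# Sketch of the variance decomposition via crossed random effects in lme4
# (Bates et al., 2015), assuming a long-format data frame 'dat' with one row
# per person x task x rater combination and the rubric score in 'score'.
library(lme4)

# Single-facet models for the direct effects (analogous to M1-M3).
m_person <- lmer(score ~ 1 + (1 | person), data = dat)
m_task   <- lmer(score ~ 1 + (1 | task),   data = dat)
m_rater  <- lmer(score ~ 1 + (1 | rater),  data = dat)

# Fuller model adding the person x task and person x rater interactions
# (analogous to M8).
m_full <- lmer(score ~ 1 + (1 | person) + (1 | task) + (1 | rater) +
                 (1 | person:task) + (1 | person:rater), data = dat)

# G-theory style decomposition: each facet's estimated variance component
# as a share of the total variance (including the residual).
vc <- as.data.frame(VarCorr(m_full))
vc$share <- vc$vcov / sum(vc$vcov)
vc[, c("grp", "vcov", "share")]
```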

The analyses were conducted with the data of 125 students of economics and economics education at a German university, who participated in the CORA study in 2019–2020. Participants were 61% female, reported a mean age of 22 years (SD = 2.8), and were on average in their second semester of study (SD = 1.82; see Table 2). Participation in the CORA study was voluntary; participants were recruited in obligatory introductory lectures. To increase test motivation, the students received credits for a study module for their participation.

Table 2. Sample description.

The study was conducted via an online assessment platform, which the participants could access individually using access data sent to them in advance. Prior to the survey, the students were informed that their web history would be recorded and that their participation in the experiment was voluntary; all participants signed a declaration of consent to the use of their data for research purposes. Subsequently, the participants were given a standardized questionnaire (approx. 10 min) collecting sociodemographic data such as gender, age, and study semester as well as their general (self-reported) media use behavior using the validated scale by Maurer et al. (2020). They were also asked to rate the reliability of various media types on a scale of 1 (not at all trustworthy) to 6 (very trustworthy). Due to limited test time, we used a booklet test design: students were randomly assigned two to three CORA tasks (out of a total of six available tasks), all of which shared the same structure and task description and differed only in topic (for more details on the tasks, see section “Conceptual background”). Participants were asked to enter their written responses to the open-ended questions in the assessment platform, from which they could subsequently log out by themselves. After the assessment, the answers were each scored by two trained human raters using the newly developed rating scheme (for more details on the scoring process, see section “Conceptual background”), and the scores of all raters for each participant and for each task were averaged to obtain the CORA score. Participants’ scores varied between tasks (task 1: m = 0.71, SD = 0.64; task 2: m = 1.3, SD = 0.59; task 3: m = 0.53, SD = 0.66; task 4: m = 0.63, SD = 0.54; task 5: m = 0.77, SD = 0.61), with an average overall score of m = 0.84 (SD = 0.51).
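
As a minimal sketch of the aggregation step described above (with assumed column names, using dplyr for convenience), the rater scores could be averaged per participant and task and then summarized as follows:

```r
# Assumed long-format data frame 'ratings' with columns person, task, rater,
# and score (the rubric score awarded by one rater for one response).
library(dplyr)

# Average the two raters' scores per participant and task.
cora_scores <- ratings %>%
  group_by(person, task) %>%
  summarise(task_score = mean(score), .groups = "drop")

# Task-level descriptives (cf. the reported per-task means and SDs).
task_stats <- cora_scores %>%
  group_by(task) %>%
  summarise(m = mean(task_score), sd = sd(task_score), .groups = "drop")

# Overall CORA score per participant, averaged across the assigned tasks.
person_scores <- cora_scores %>%
  group_by(person) %>%
  summarise(cora = mean(task_score), .groups = "drop")
```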

Results

Table 3 shows the results of the model calculations of the linear mixed-effect models (Jiang, 2018). First, separate models were computed for the direct effects of the considered facets person, task, and rater and compared to each other, showing that most variance is explained by the person facet (R² = 0.397), followed by the task facet (R² = 0.164). In contrast, an influence on the part of the raters was hardly observable (R² = 0.076). Even when combining the facets in pairs (M4–M6), the model including person and task explains most of the variance (R² = 0.435). Adding the facet rater in M7 leads only to a slight increase in the explained variance (R² = 0.451). When, in addition to the direct effects, the interaction effects between the facets are also taken into account, the greatest variance explanation is seen in M8, in which the interactions person × task and person × rater were included in addition to the person facet (R² = 0.713). In this model, the person × task interaction stands out in particular, which can be interpreted in the sense that there are not only general differences between the performances of the individual participants (direct effect of the person facet), but also that the demonstrated performances of the individual test takers differ depending on the task in question (person × task).

Table 3. Results of variance decomposition according to G-Theory using mixed-effect models.

In summary, the comparison of the individual facets and their interactions shows that the largest effect on the CORA score is that of the individual test takers’ personal characteristics or their interaction with the different tasks, with the effects of tasks and raters being present but much less pronounced.

Interpretation

The examination of the internal structure of the CORA tasks by means of variance decomposition confirms that, overall, by far the largest part of the score variances is explained by the test takers, as intended in the assessment. Here, it is also important to distinguish between the direct person effects and interaction effects of the participants with the tasks, both of which have an important influence: While the direct effect suggests that interindividual differences (in COR ability) among participants lead to different CORA performance, the interaction effects indicate that participants also perform differently intraindividually depending on the task they are working on. This can possibly be explained by the fact that certain task characteristics (e.g., formulation or the topic of the task) interact with differently developed personal characteristics of the test takers (e.g., different levels of ability in the individual COR facets, certain sociodemographic characteristics, or other personality traits) during task processing. For instance, although the tasks cover general (to the extent that this is possible) social topics, it can be assumed that the participants have a different degree of prior knowledge in certain subject areas due to individual interests, which influences them in their task performance. Which correlations between personal characteristics and CORA performance actually exist, and how these possibly interact with certain task characteristics, must be analyzed in detail in further investigations and falls within the validity criterion of “relationship with other variables” (AERA, APA, and NCME, 2014; section “Relations with other variables”).

While the largest effects can be explained by the test takers, the direct effects of the raters and the tasks turn out to be much smaller, which suggests that, in terms of assessment, there are only small systematic influences caused by the task properties (e.g., different difficulty) or by rater effects. Nevertheless, it is also necessary to analyze these in further studies, for example, with regard to the task difficulty of individual topics, to ensure the comparability of the respective results. In addition, to allow comparable conclusions about performance, the tasks should not be used alone but rather, as intended in the assessment, in combination where possible.

Overall, the analyses confirm that the CORA scores indeed reflect differences in the performance of the participants and are only marginally influenced by rater effects and task properties, which also speaks in favor of maintaining the methodological approach used (rating scheme, rater training, and standardized structure of the tasks).

Nevertheless, it is important for the further development and interpretation of CORA to investigate the causes of the observed rater effects more closely and, if necessary, to make adjustments with regard to the rating scheme, the training, and the selection of the raters. Even though the content validity of the tasks and the developed rating scheme has already been demonstrated by the findings of Molerov et al. (2020), it should also be ensured in further analyses and, if necessary, expert interviews that these actually cover only the COR skills and do not, for example, systematically disadvantage individual groups of people due to the task topics (e.g., men/women might have different preferences regarding health- or sport-related topics).

Relations with other variables

Theoretical background

The previous explanations have shown that (1) the content of the assessment covers the targeted COR skills as expected (section “Content validity”), (2) the difference in test scores results from the participants’ performance in the tasks and not from other aspects of the assessment (section “Internal test structure”), and (3) the tasks trigger different task-solving processes in the participants as expected (section “Validity of task response processes”). Subsequently, it is necessary for the further interpretation and use of the test scores to consider them in the context of further variables with which, according to the underlying COR construct, there should theoretically be (no) correlations. The testing of these relationships is referred to as convergent and discriminant validity, respectively (see also Campbell and Fiske, 1959). According to AERA, APA, and NCME (2014), this type of validity evidence belongs to the category “Evidence based on Relations to other Variables” and provides information on the extent to which the relationships of the test scores with other variables are consistent with the underlying construct and the proposed test score interpretation.

Previous studies, in which the construct related to COR, Civic Online Reasoning, was examined for middle school, high school, and college students in the United States, showed a positive correlation between COR-related skills and study progress (McGrew et al., 2018), in that college students performed better than high school students and high school students performed better than middle school students. Recent studies also concluded that these skills improved with increasing expertise and higher grade level (e.g., Nygren and Guath, 2020; Breakstone et al., 2021; Guath and Nygren, 2022). Similarly, since COR, according to its definition, can be enhanced by corresponding training, students’ COR ability should improve with increasing experience in online research and in writing scientific argumentative texts over the course of their studies (Molerov et al., 2020). Thus, in terms of convergent validity, students who are further along in their studies should perform better in CORA than students at the beginning of their studies.

No differences in COR-related abilities were found in previous studies with respect to gender (Breakstone et al., 2021). Moreover, according to the construct definition of COR, gender effects are not expected to occur in research on general social topics. Thus, in terms of discriminant validity, there should be no correlations between the participants’ gender and their CORA scores.

A central aspect of the COR construct is the critical selection, weighting, and use of suitable reliable sources for task-based research (Molerov et al., 2020). In this regard, studies showed that the selection and use of online sources depends to a large extent on their trustworthiness as perceived by users, so that sources perceived as trustworthy are preferred when searching for information (Wathen and Burkell, 2001; Harrison McKnight and Kacmar, 2007; Rowley et al., 2015). Accordingly, a correct assessment of the trustworthiness of (online) sources should also lead to an appropriate differentiation and use of trustworthy versus untrustworthy sources, and thus to better performance in terms of the COR construct (Molerov et al., 2020). Social media in particular, which include video platforms and online encyclopedias, are to be regarded critically in terms of their trustworthiness as they are considered less reliable in terms of their information content (Ciampaglia, 2018; Maurer et al., 2018). Consequently, using such sources may correlate with poorer CORA performance. The use of the Google search engine as an information platform should also be evaluated critically. Search engines such as Google are often the starting point for an Internet-based search and also constitute an important tool for professional fact-checkers when researching information (Speicher et al., 2015; McGrew et al., 2017). However, they display results from media with varying degrees of reliability (which is the reason they lend themselves to the abovementioned practices), and the first search results in particular are often sponsored (Wineburg et al., 2016). Thus, a reasonably low level of confidence in these websites as sources of information should lead to reduced use of these websites and thus higher research quality and a better performance in CORA.

Method and design

The examination of the assumed correlations took place within the same study framework and sample described in the section “Method and design.” Relationship analyses of the CORA score with participants’ age and gender and their media reliability ratings were conducted using correlation analyses (age and media reliability) and a two-sided t-test (gender) in Stata 17 (StataCorp, 2021).
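
The authors report these analyses in Stata 17; the following sketch shows analogous analyses in R with assumed variable names, purely to illustrate the procedure.

```r
# 'df' is assumed to hold one row per participant with the overall CORA score
# and the questionnaire variables (illustrative column names).

# Convergent validity: correlation of the CORA score with study progress.
cor.test(df$cora, df$semester)

# Discriminant validity: two-sided t-test of CORA scores by gender.
t.test(cora ~ gender, data = df)

# Relations with reported trust in specific media types (1-6 rating scale).
cor.test(df$cora, df$trust_video_platforms)
cor.test(df$cora, df$trust_online_encyclopedias)
cor.test(df$cora, df$trust_google)
```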

Results

As expected, students in higher semesters achieved better CORA scores than those in lower semesters (r = 0.25, p = 0.006); gender did not play a significant role [t(116) = −2.00, p = 0.05]. Regarding trust in different types of online media, the analyses revealed significant associations between CORA score and reported trust in video platforms (r = −0.239, p = 0.009), online encyclopedias (r = −0.187, p = 0.04), and Google as an information platform (r = −0.19, p = 0.038) for Internet research. These relationships are also reflected in the actual use of online media, where less frequent use of online encyclopedias (r = −0.245, p = 0.037) and Google as an information platform (r = −0.234, p = 0.047) are associated with a better CORA score. Lower trust in video platforms, online encyclopedias, and Google as an information platform as well as less usage of video platforms and online encyclopedias is thus associated with better CORA scores, and greater trust or more use with poorer CORA performance, respectively.

Interpretation

Viewing these results in the context of the external variables we controlled for in our study provides initial evidence that expected correlations exist with respect to both convergent (semester of study, media use) and discriminant (gender) validity. In line with the construct definition, no correlations of CORA performance with gender were found. In contrast, participants who were more advanced in their studies (and thus had already had more learning opportunities in terms of researching information on the Internet as well as writing argumentative texts) showed better CORA performance than students at the beginning of their studies. In addition to institutional learning opportunities, the correlation between general trust in specific types of media for obtaining information and CORA performance, which was expected according to the construct definition, also became evident. These analyses thus confirm that the theoretically formulated basic assumptions regarding the construct, namely the development of COR ability over the course of academic studies and the general importance of media types used, were reflected in the empirically observable correlations with the CORA scores.

Based on the analyses presented here, however, the basic assumptions of the construct cannot yet be considered comprehensively confirmed, since this would first require examining further correlations with other theoretically relevant external variables. With regard to learning opportunities, for example, it would have to be investigated more concretely to what extent COR-developing aspects are actually anchored in the curriculum of the study participants. Furthermore, to gain a better understanding of the (possible) development of COR skills within higher education and beyond, and to what extent these skills can be effectively fostered over the course of academic studies, the actual development of COR skills should also be investigated, for instance in the context of a specific targeted training with comparison groups and pre-post testing, provided that the COR tasks have been proven to be sufficiently valid (Zlatkin-Troitschanskaia et al., 2021a).

Similarly, in terms of media use, the analyses described above cover trust in some media types. While these are highly relevant, especially since, e.g., online encyclopedias are an important source of information for students (Selwyn and Gorard, 2016), the analyses are not exhaustive as they do not (yet) consider other types of higher quality information sources, such as online academic catalogues, professional magazines, or established news sites. Further investigation of the relationship between the CORA score and additional external variables, as well as analyses including participant cohorts other than students, are thus required to provide more comprehensive validation.

Discussion, limitations, and outlook on further research

Today, the Internet has become one of the most important sources of information for learning for university students and young professionals (Brooks, 2016; Newman and Beetham, 2017). However, relying on online resources for information acquisition also presents challenges, as content can be freely distributed on the Internet and vast amounts of unstructured, untrustworthy, inaccurate, or biased information are just as readily available to learners as credible, verified information (Qiu et al., 2017; Ciampaglia, 2018; Maurer et al., 2018). To competently use the information on the Internet, students must be able to critically search, select, review, and evaluate online information and sources based on relevant quality criteria (Molerov et al., 2020; Nagel et al., 2020). To make these skills empirically measurable and to be able to specifically promote them, we developed the new theoretical-conceptual framework of Critical Online Reasoning (Molerov et al., 2020) and a corresponding COR Assessment (CORA) in accordance with the evidence-centered design approach of Mislevy (2017) and the Standards for Educational and Psychological Testing of AERA, APA, and NCME (2014).

To ensure that the newly developed assessment actually measures COR abilities as defined by the construct, we followed the argumentative validation process described by AERA, APA, and NCME (2014), according to which the five aspects “test content,” “task- and test-response processes,” “internal structure of a test,” “interrelationships with other variables,” and “consequences of testing” should be analyzed during validation and various sources of information should be used as evidence. In the course of this, evidence for the validity of the CORA scores and their interpretation could already be shown regarding the CORA content (through expert interviews and expert ratings of the CORA tasks; Molerov et al., 2020), the task response processes of the test takers (on the basis of log files and eye-tracking data; Schmidt et al., 2020) and, initially, correlations with other variables (regarding the number and type of web pages accessed as well as the quality of the content on the web pages; Nagel et al., 2020).

Building on this previous research, the analyses presented here focus on the two criteria “internal structure of the test,” examined via variance component analysis based on Generalizability Theory, and “correlations with other variables.” The analyses were conducted with the data of 125 students of economics and economics education. The results of the analysis regarding the internal structure of the CORA confirmed that the largest effect on the CORA score is that of the individual test takers’ personal characteristics or their interaction with the different tasks, as intended in the assessment, with the effects of tasks and raters being present but much less pronounced. The analyses covering the validity facet “relationship with other variables” likewise confirm that the theoretically expected relationships hold with respect to both convergent (semester of study, media use) and discriminant (gender) validity. The results of the separate validity analyses are also consistent when viewed as a whole in the sense of a holistic validation argumentation: The presented correlations once again support the expert opinions in Molerov et al. (2020) that the tasks validly measure the COR construct. In addition, they complement the findings of Schmidt et al. (2020) and Nagel et al. (2020) by showing that “good” COR is not only characterized by an appropriate research strategy (i.e., strategic information processing with the use of a larger number of different sources and a larger number of process steps), but that the quality and appropriate evaluation of the sources used also play an important role. At the same time, it can be assumed that individual differences in web search behavior and media use are some of the factors that exert an influence on the CORA score in the context of the direct and interaction effects of the person facet.

For the purpose of further test validation, and for a deeper understanding of the COR construct, it is necessary to examine more closely which interindividual differences are captured by the direct person effect, and thus influence performance independently of the task, and which differences are sensitive to (which) CORA task characteristics, as reflected in the interaction effects. This concerns both the personal characteristics already examined and additional characteristics that should be considered, as also recommended by the experts interviewed by Molerov et al. (2020), such as personality traits and prior knowledge, and their relations to the different manifestations of the individual COR facets. While it can be assumed, for example, that personality traits exert an effect across tasks, the effects of prior knowledge, interests, beliefs, and (political) attitudes may vary depending on the task, and the different manifestations of the individual COR facets could become apparent through both types of effect. One way to investigate the role of such individual factors during task processing would be cognitive labs with think-aloud commentary (Leighton, 2017), the results of which would also complement the initial eye-tracking studies by Schmidt et al. (2020). These findings would be essential both for the development of specific training tools and for ensuring test fairness, and thus also for the validity facet “consequences of testing,” the only one that has not yet been investigated in detail. In addition, although the rater and task effects turned out to be rather small, their causes should be examined more closely in order to minimize possible systematic influences of, for example, rater behavior, the rating scheme, or the format and topics of the tasks. This could be done, for instance, through a systematic comparative analysis of the individual ratings and tasks or, if necessary, further expert interviews, also to ensure that the assessment does not systematically disadvantage particular groups of test takers, e.g., due to rater effects or task topics. Such analyses would also serve as a first step toward validating the CORA with regard to the consequences of testing.

In general, analyses including cohorts of students from other fields of study as well as non-student participants are needed to validate the use of the assessment for a broader population. Since the present analyses were conducted with a comparatively small sample, such studies would also help to confirm the obtained results and expand their scope. Although the sample size was sufficient for the analyses carried out, a larger sample would increase statistical power and allow the reported correlations to be estimated more precisely. In addition, the validation process needs to take into account the dynamics of the Internet: the constantly changing information and media landscape makes it difficult to compare results between participants and can quickly render individual CORA tasks outdated. As a result, it may become necessary to continuously develop new task topics, which in turn have to be examined for their validity.

First steps toward implementing these measures have already been taken in the BRIDGE project (Zlatkin-Troitschanskaia et al., 2021a). There, by administering the CORA to students in different study phases, the suitability of the task difficulty was re-examined, and the scope of the assessment was extended to young professionals. In addition, a comprehensive (sociodemographic) accompanying questionnaire was developed that covers a variety of personal characteristics, e.g., prior knowledge of and personal attitudes toward a topic, and thus allows more detailed analyses of influencing factors.

In summary, comprehensive validity evidence is available for the CORA for four of the five criteria for valid tests and test score interpretations recommended by AERA, APA, and NCME (2014), with the criterion “consequences of testing” requiring further investigation. Although further analyses are reasonable for all validity criteria and necessary in the sense of the argument-based validation approach of AERA, APA, and NCME (2014), it can nevertheless be concluded that the CORA is the first performance assessment available for Germany that can be used in a valid manner to assess the interplay between features of online information acquisition, learning environments, and the (cognitive) requirements for critical reasoning from online information in students and young professionals.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

M-TN co-developed and carried out the assessments, conducted the analyses, and co-wrote the article. OZ-T co-developed the assessment, supervised the development and validation process as well as the analyses, and co-wrote the article. JF contributed to the article by co-reviewing the literature relevant to the topic and by co-developing new assessment tasks and a scoring scheme. All authors contributed to the article and approved the submitted version.

Funding

This study is part of the PLATO project, which is funded by the German federal state of Rhineland-Palatinate.

Acknowledgments

We would like to thank the two reviewers and the editor, who provided constructive feedback and helpful guidance in the revision of this manuscript. We also thank all students from the Faculty of Law and Economics at Johannes Gutenberg University Mainz who participated in this study, as well as the raters who evaluated the written responses. Finally, we thank Katharina Frank, who contributed to the review of the literature relevant to the topic.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. ^For the overall CORA score, a sufficient interrater reliability of Cohen’s kappa = 0.80 (p < 0.001) was determined.

References

AERA, APA, and NCME (2014). Standards for Educational and Psychological Testing. Washington DC: American Educational Research Association.

Ali, W. (2020). Online and remote learning in higher education institutes: a necessity in light of COVID-19 pandemic. High. Educ. Stud. 10, 16–25. doi: 10.5539/hes.v10n3p16

Aspegren, K., and Lønberg-Madsen, P. (2005). Which basic communication skills in medicine are learnt spontaneously and which need to be taught and trained? Med. Teach. 27, 539–543. doi: 10.1080/01421590500136501

Banerjee, M., Zlatkin-Troitschanskaia, O., and Roeper, J. (2020). Narratives and their impact on students’ information seeking and critical online reasoning in higher education economics and medicine. Front. Educ. 5:570625. doi: 10.3389/feduc.2020.570625

Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48. doi: 10.18637/jss.v067.i01

Binkley, M., Erstad, O., Herman, J., Raizen, S., Ripley, M., Miller-Ricci, M., et al. (2012). “Defining twenty-first century skills,” in Assessment and Teaching of 21st Century Skills. eds. P. Griffin, B. McGraw, and E. Care (New York: Springer), 17–66.

Boh Podgornik, B., Dolničar, D., Šorgo, A., and Bartol, T. (2016). Development, testing, and validation of an information literacy test (ILT) for higher education. J. Assoc. Inf. Sci. Technol. 67, 2420–2436. doi: 10.1002/asi.23586

Braasch, J. L. G., Bråten, I., and McCrudden, M. T. (2018). Handbook of Multiple Source Use. New York: Routledge.

Brand-Gruwel, S., Wopereis, I., and Walraven, A. (2009). A descriptive model of information problem solving while using internet. Comput. Educ. 53, 1207–1217. doi: 10.1016/j.compedu.2009.06.004

Braun, E. (2021). Performance-based assessment of students’ communication skills. Int. J. Chin. Educ. 10:221258682110062. doi: 10.1177/22125868211006202

Braun, E., and Brachem, J. (2018). Erfassung praxisbezogener Anforderungen und Tätigkeiten von Hochschulabsolvent(inn)en (PAnTHoa). Zeitschrift für Hochschulentwicklung 209–232.

Braun, H. I., Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Borowiec, K. (2020). Performance assessment of critical thinking: conceptualization, design, and implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156

Breakstone, J., Smith, M., Wineburg, S., Rapaport, A., Carle, J., Garland, M., et al. (2021). Students’ civic online reasoning: a National Portrait. Educ. Res. 50, 505–515. doi: 10.3102/0013189X211017495

Brooks, D. C. (2016). ECAR study of undergraduate students and information technology, 2016. EDUCAUSE Center for Analysis and Research. Available at: https://library.educause.edu/~/media/files/library/2016/10/ers1605.pdf

Campbell, D. T., and Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol. Bull. 56, 81–105. doi: 10.1037/h0046016

Chan, C. K. Y., Fong, E. T. Y., Luk, L. Y. Y., and Ho, R. (2017). A review of literature on challenges in the development and implementation of generic competencies in higher education curriculum. Int. J. Educ. Dev. 57, 1–10. doi: 10.1016/j.ijedudev.2017.08.010

Ciampaglia, L. G. (2018). “The digital misinformation pipeline–proposal for a research agenda,” in Positive Learning in the Age of Information. A Blessing or a Curse? eds. O. Zlatkin-Troitschanskaia, G. Wittum, and A. Dengel (Wiesbaden: Springer VS), 413–421.

Cronbach, L. J., Gleser, G. C., Nanda, H., and Rajaratnam, N. (1972). The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York: John Wiley.

Cronbach, L. J., Nageswari, R., and Gleser, G. C. (1963). Theory of generalizability: a liberation of reliability theory. Br. J. Statis. Psychol. 16, 137–163. doi: 10.1111/j.2044-8317.1963.tb00206.x

Davey, T., Ferrara, S., Holland, P. W., Shavelson, R., Webb, N. M., and Wise, L. L. (2015). Psychometric considerations for the next generation of performance assessment: report of the center for K-12 assessment and performance management at ETS [white paper]. Educational Testing Service. Available at: https://www.ets.org/Media/Research/pdf/psychometric_considerations_white_paper.pdf

Desai, S., and Reimers, S. (2019). Comparing the use of open and closed questions for web-based measures of the continued-influence effect. Behav. Res. Methods 51, 1426–1440. doi: 10.3758/s13428-018-1066-z

Flanagin, A., and Metzger, M. J. (2017). Digital Media and Perceptions of Source Credibility in Political Communication. The Oxford Handbook of Political Communication, vol. 417. doi: 10.1093/oxfordhb/9780199793471.013.65

Goldhammer, F., Naumann, J., and Keßel, Y. (2013). Assessing individual differences in basic computer skills. Eur. J. Psychol. Assess. 29, 263–275. doi: 10.1027/1015-5759/a000153

Goldman, S. R., and Brand-Gruwel, S. (2018). “Learning from multiple sources in a digital society,” in International Handbook of the Learning Sciences. eds. F. Fischer, C. E. Hmelo-Silver, S. R. Goldman, and P. Reimann (New York: Routledge), 86–95.

Guath, M., and Nygren, T. (2022). Civic online reasoning among adults: an empirical evaluation of a prescriptive theory and its correlates. Front. Educ. 7:721731. doi: 10.3389/feduc.2022.721731

Hahnel, C., Schoor, C., Kroehne, U., Goldhammer, F., Mahlow, N., and Artelt, C. (2019). The role of cognitive load in university students’ comprehension of multiple documents. Zeitschrift für pädagogische Psychologie 33, 105–118. doi: 10.1024/1010-0652/a000238

Harrison, N., and Luckett, K. (2019). Experts, knowledge and criticality in the age of ‘alternative facts’: reexamining the contribution of higher education. Teach. High. Educ. 24, 259–271. doi: 10.1080/13562517.2019.1578577

Harrison McKnight, D., and Kacmar, C. J. (2007). “Factors and effects of information credibility,” in ICEC’07: Proceedings of the ninth international conference on Electronic commerce. eds. D. Sarppo, M. Gini, R. J. Kauffman, C. Dellarocas and F. Dignum; Association for Computing Machinery, 423–432.

Herrero-Diz, P., Conde-Jiménez, J., Tapia-Frade, A., and Varona-Aramburu, D. (2019). The credibility of online news: an evaluation of the information by university students. Cult. Educ. 31, 407–435. doi: 10.1080/11356405.2019.1601937

Huang, K., Law, V., Ge, X., Hu, L., and Chen, Y. (2019). Exploring patterns in undergraduate students’ information problem solving: a cross-case comparison study. Knowledge Manag. E-Learn. 11, 428–448. doi: 10.34105/j.kmel.2019.11.023

Jiang, Z. (2018). Using the linear mixed-effect model framework to estimate generalizability variance components in R. Methodology 14, 133–142. doi: 10.1027/1614-2241/a000149

Johnson, F., Sbaffi, L., and Rowley, J. (2016). Students’ approaches to the evaluation of digital information: insights from their trust judgments. Br. J. Educ. Technol. 47, 1243–1258. doi: 10.1111/bjet.12306

Koltay, T. (2011). The media and the literacies: media literacy, information literacy, digital literacy. Media Cult. Soc. 33, 211–221. doi: 10.1177/0163443710393382

Korn, J. (2004). Teaching talking: Oral communication skills in a Law course. J. Leg. Educ. 54, 588–596.

Ku, K. Y. (2009). Assessing students’ critical thinking performance: urging for measurements using multi-response format. Think. Skills Creat. 4, 70–76. doi: 10.1016/j.tsc.2009.02.001

Leeder, C. (2019). How college students evaluate and share “fake news” stories. Libr. Inf. Sci. Res. 41:100967. doi: 10.1016/j.lisr.2019.100967

Leighton, J. P. (2017). Using Think-Aloud Interviews and Cognitive Labs in Educational Research. New York: Oxford University Press

Limberg, L., Sundin, O., and Talja, S. (2012). Three theoretical perspectives on information literacy. Hum. IT 11, 93–130.

Liu, O. L., Frankel, L., and Crotts Roohs, K. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessment. ETS Res. Rep. Ser. 2014, 1–23. doi: 10.1002/ets2.12009

Makhmudov, K., Shorakhmetov, S., and Murodkosimov, A. (2020). Computer literacy is a tool to the system of innovative cluster of pedagogical education. Eur. J. Res. Reflect. Educ. Sci. 8, 71–74. doi: 10.6084/m9.figshare.12310661

Maurer, M., Quiring, O., and Schemer, C. (2018). “Media effects on positive and negative learning,” in Positive Learning in the Age of Information. A Blessing or a Curse? eds. O. Zlatkin-Troitschanskaia, G. Wittum, and A. Dengel (Wiesbaden: Springer VS), 197–208.

Maurer, M., Schemer, C., Zlatkin-Troitschanskaia, O., and Jitomirski, J. (2020). “Positive and negative media effects on university students’ learning: Preliminary findings and a research program,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO). ed. O. Zlatkin-Troitschanskaia (Wiesbaden: Springer), 109–119.

Mayer, R. E. (2009). Multimedia Learning. 2nd Edn Cambridge University Press.

McGrew, S., Breakstone, J., Ortega, T., Smith, M., and Wineburg, S. (2018). Can students evaluate online sources? Learning from assessments of civic online reasoning. Theor. Res. Soc. Educ. 46, 165–193. doi: 10.1080/00933104.2017.1416320

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news. Civic reasoning in a social media environment. Am. Educ. 41, 4–9.

Mislevy, R. J. (2017). Sociocognitive Foundations of Educational Measurement. New York: Routledge.

Mislevy, R. J., Behrens, J. T., Dicerbo, K. E., and Levy, R. (2012). Design and discovery in educational assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of Educational Data Mining 4, 11–48.

Molerov, D., Zlatkin-Troitschanskaia, O., Nagel, M.-T., Brückner, S., Schmidt, S., and Shavelson, R. J. (2020). Assessing university students’ critical online reasoning ability: a conceptual and assessment framework with preliminary evidence. Front. Educ. 5:577843. doi: 10.3389/feduc.2020.577843

Molerov, D., Zlatkin-Troitschanskaia, O., and Schmidt, S. (2019). “Adapting the civic online reasoning assessment cross-nationally using an explicit functional equivalence approach [paper presentation].” in Annual meeting of the American educational research association, Toronto, Canada.

Nagel, M.-T., Schäfer, S., Zlatkin-Troitschanskaia, O., Schemer, C., Maurer, M., Molerov, D., et al. (2020). How do University students’ web search behavior, website characteristics, and the interaction of Both influence students’ critical online reasoning? Front. Educ. 5:565062. doi: 10.3389/feduc.2020.565062

National Research Council (2012). Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st Century. Washington, DC: National Academies Press

Naumann, J., Richter, T., and Groeben, N. (2001). Validierung des INCOBI anhand eines Vergleichs von Anwendungsexperten und Anwendungsnovizen. Zeitschrift für Pädagogische Psychologie 15, 219–232. doi: 10.1024//1010-0652.15.34.219

Newman, T., and Beetham, H. (2017). Student digital experience tracker 2017: the voice of 22,000 UK learners. Jisc. Available at: https://repository.jisc.ac.uk/6662/1/Jiscdigitalstudenttracker2017.pdf

Nygren, T., and Guath, M. (2020). Students evaluating and corroborating digital news. Scand. J. Educ. Res. 66, 549–565. doi: 10.1080/00313831.2021.1897876

Osborne, J., Pimentel, D., Alberts, B., Allchin, D., Barzilai, S., Bergstrom, C., et al. (2022). Science Education in an Age of Misinformation. Report. Stanford. Available at: https://sciedandmisinfo.stanford.edu/

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: Critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of InformaTiOn (PLATO). ed. O. Zlatkin-Troitschanskaia. (Wiesbaden: Springer), 89–106.

Park, H., Kim, H. S., and Park, H. W. (2021). A Scientometric study of digital literacy, ICT literacy, information literacy, and media literacy. J. Data Info. Sci. 6, 116–138. doi: 10.2478/jdis-2021-0001

Qiu, X., Oliveira, D. F. M., Shirazi, A. S., Flammini, A., and Menczer, F. (2017). Limited individual attention and online virality of low-quality information. Nat. Hum. Behav. 1, 1–22. doi: 10.1038/s41562-017-0132

Rammstedt, B. (Ed.) (2013). Grundlegende Kompetenzen Erwachsener im internationalen Vergleich: Ergebnisse von PIAAC 2012. Münster: Waxmann.

Reddy, P., Sharma, B., and Chaudhary, K. (2020). Digital literacy: a review of literature. Int. J. Technoethics 11, 65–94. doi: 10.4018/IJT.20200701.oa1

Rowley, J., Johnson, F., and Sbaffi, L. (2015). Students’ trust judgements in online health information seeking. Health Informatics J. 21, 316–327. doi: 10.1177/1460458214546772

Sanders, L., Kurbanoğlu, S., Boustany, J., Dogan, G., and Becker, P. (2015). Information behaviors and information literacy skills of LIS students: an international perspective. J. Educ. Libr. Inf. Sci. 56, 80–99. doi: 10.12783/issn.2328-2967/56/S1/9

Schlebusch, C. L. (2018). Computer anxiety, computer self-efficacy and attitudes toward the internet of first year students at a south African University of Technology. Africa Educ. Rev. 15, 72–90. doi: 10.1080/18146627.2017.1341291

Schmidt, S., Zlatkin-Troitschanskaia, O., Roeper, J., Klose, V., Weber, M., Bültmann, A.-K., et al. (2020). Undergraduate students’ critical online reasoning: process mining analysis. Front. Psychol. 11:576273. doi: 10.3389/fpsyg.2020.576273

Selwyn, N., and Gorard, S. (2016). Students’ use of Wikipedia as an academic resource—patterns of use and perceptions of usefulness. Internet High. Educ. 28, 28–34. doi: 10.1016/j.iheduc.2015.08.004

Sendurur, E. (2018). Students as information consumers: a focus on online decision making process. Educ. Inf. Technol. 23, 3007–3027. doi: 10.1007/s10639-018-9756-9

Shavelson, R. J., and Webb, N. M. (1981). Generalizability theory: 1973–1980. Br. J. Math. Stat. Psychol. 34, 133–166. doi: 10.1111/j.2044-8317.1981.tb00625.x

Shavelson, R. J., Webb, N. M., and Rowley, G. L. (1989). Generalizability theory. American Psychologist 44, 922–932. doi: 10.1037/0003-066X.44.6.922

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Mariño, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). International Performance Assessment of Learning in Higher Education (iPAL): Research and Development. Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives, 193–214. doi: 10.1007/978-3-319-74338-7_10

Siddiq, F., Hatlevik, O. E., Olsen, R. V., Throndsen, I., and Scherer, R. (2016). Taking a future perspective by learning from the past–a systematic review of assessment instruments that aim to measure primary and secondary school students’ ICT literacy. Educ. Res. Rev. 19, 58–84. doi: 10.1016/j.edurev.2016.05.002

Solano-Flores, G. (2020). Boolean analysis of Interobserver agreement: formal and functional evidence sampling in complex coding endeavors. Educ. Meas. Issues Pract. 40, 26–36. doi: 10.1111/emip.12409

Speicher, M., Both, A., and Gaedke, M. (2015). “SOS: does your search engine results page (SERP) need help?” in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. (eds.) B. Begole, J. Kim, K. Inkpen and W. Woo; Association for Computer Machinery, 1005–1014.

StataCorp. (2021). Stata Statistical Software: Release 17. College Station, TX: StataCorp LLC.

Tribukait, M., Baier, K., Grzempa, H., Loukovitou, A., Sijakovic, R., Tettschlag, N., et al. (2017). Digital learning in European education policies and history curricula. Eckert. Dossiers, 13. Available at: https://repository.gei.de/handle/11428/232

Virtanen, A., and Tynjälä, P. (2018). Factors explaining the learning of generic skills: a study of university students’ experiences. Teach. High. Educ. 24, 880–894. doi: 10.1080/13562517.2018.1515195

Walton, G., Barker, J., Pointon, M., Turner, M., and Wilkinson, A. (2020). “Information literacy and the societal imperative of information discernment,” in Informed Societies. Why Information Literacy Matters for Citizenship, Participation and Democracy. ed. S. Goldstein (London: Facet Publishing), 149–164.

Wathen, C. N., and Burkell, J. (2001). Believe it or not: factors influencing credibility on the web. J. Am. Soc. Inf. Sci. Technol. 53, 134–144. doi: 10.1002/asi.10016

Weber, H., Becker, D., and Hillmert, S. (2019). Information-seeking behaviour and academic success in higher education: which search strategies matter for grade differences among university students and how does this relevance differ by field of study? High. Educ. 77, 657–678. doi: 10.1007/s10734-018-0296-4

Whitelock-Wainwright, A., Laan, N., Wen, D., and Gašević, D. (2020). Exploring student information problem solving behaviour using fine-grained concept map and search tool data. Comput. Educ. 145:103731. doi: 10.1016/j.compedu.2019.103731

Wineburg, S., Breakstone, J., McGrew, S., and Ortega, T. (2018). “Why Google can’t save us. The challenges of our post-Gutenberg moment,” in Positive Learning in the Age of Information. A Blessing or a Curse? eds. O. Zlatkin-Troitschanskaia, G. Wittum, and A. Dengel (Wiesbaden: Springer VS), 221–228.

Wineburg, S., and McGrew, S. (2016). Why students can’t google their way to the truth: fact-checkers and students approach websites differently. Educ. Week 36, 22–28.

Wineburg, S., and McGrew, S. (2019). Lateral reading and the nature of expertise: reading less and learning more when evaluating digital information. Teach. Coll. Rec. 121, 1–40. doi: 10.1177/016146811912101102

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating information: The cornerstone of civic online reasoning. Stanford Digital Repository.

Zieky, M. J. (2014). An introduction to the use of evidence-centered design in test development. Psicol. Educ. 20, 79–87. doi: 10.1016/j.pse.2014.11.003

Zlatkin-Troitschanskaia, O., Brückner, S., Nagel, M.-T., Bültmann, A.-K., Fischer, J., Schmidt, S., et al. (2021a). Performance assessment and digital training framework for young professionals’ generic and domain-specific online reasoning in law, medicine, and teacher practice. J. Supranat. Pol. Educ. 13, 9–36. doi: 10.15366/jospoe2021.13.001

Zlatkin-Troitschanskaia, O., Hartig, J., Goldhammer, F., and Krstev, J. (2021b). Students’ online information use and learning progress in higher education – a critical literature review. Stud. High. Educ. 46, 1996–2021. doi: 10.1080/03075079.2021.1953336

Keywords: critical online reasoning, online information, web search, validation, performance assessment, higher education

Citation: Nagel M-T, Zlatkin-Troitschanskaia O and Fischer J (2022) Validation of newly developed tasks for the assessment of generic Critical Online Reasoning (COR) of university students and graduates. Front. Educ. 7:914857. doi: 10.3389/feduc.2022.914857

Received: 07 April 2022; Accepted: 05 October 2022;
Published: 07 November 2022.

Edited by:

Edith Braun, Justus-Liebig Universität, Germany

Reviewed by:

Tobias Jenert, University of Paderborn, Germany
Heidi Hyytinen, University of Helsinki Center for University Teaching and Learning, Finland

Copyright © 2022 Nagel, Zlatkin-Troitschanskaia and Fischer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Marie-Theres Nagel, marie.nagel@uni-mainz.de
