Spatial skills, which represent the ability to visualize and imagine manipulating objects in one’s mind, are necessary for success in the science, technology, engineering, and mathematics (STEM) fields and are particularly relied upon by geoscientists. Although scholars recognize the importance of these skills, explicit training is inconsistently offered throughout courses. Furthermore, the relationship between spatial training and students’ perspectives on STEM fields is underexplored. To address this, we developed a case study that included over 700 students enrolled in introductory geology classes over three semesters. These students were randomly divided into control and experimental groups; the experimental group completed 10 spatial training assignments, and the control group completed the course as usual. We relied on situated expectancy-value theory to interpret changes in students’ perceptions of both the course and science overall, and asked the following research questions: (1) Do students who complete the spatial training assignments (i.e., the experimental group) have a statistically significant improvement in their final course grade, self-efficacy, and/or value when compared to the control group? (2) If so, what are the effect sizes of these changes? (3) Is there a minimum number of trainings that need to be completed to achieve this effect? (4) Is there a maximum number of spatial training assignments where we stop seeing improvement (ceiling effect)?

We surveyed all enrolled students using 38-question pre- and post-assessments of their self-efficacy (defined as a belief in their ability to succeed) and value. We found that between the control and experimental groups, there were significant differences in students’ pre- to post-changes in perception of science self-efficacy, class self-efficacy, and class value. We found non-significant between-group differences in final grade and science value. We interpret this to mean that using weekly spatial training assignments could increase students’ perceived self-efficacy in their introductory geology course as well as in science more broadly, potentially having ripple effects that support students’ long-term engagement with the sciences.

Findings suggest that practitioners should include explicit spatial training in their courses to improve students’ perceptions of the course and science overall. Building on this work may include describing to students the purpose of spatial training (which was deliberately avoided in this study) and outlining the research that supports the relationship between spatial skills and success in STEM fields. Future directions may also include longitudinal tracking of spatial and related skill development throughout students’ college careers.

Spatial skills have been repeatedly shown to be related to entrance, persistence, and success in the science, technology, engineering, and mathematics (STEM) fields (Pallrand and Seeber, 1984; Lord, 1990; Coleman and Gotch, 1998; Shea et al., 2001; Webb et al., 2007; Wai et al., 2009, 2010; Buckley et al., 2018), and have been found to be especially valuable to geoscientists (Hegarty et al., 2010; Ormand et al., 2014; Newcombe and Shipley, 2015). While the importance of these skills for geoscience students is known (e.g., Mosher, 2014, 2015; Mosher and Ryan, 2019), explicit training in spatial skills is not offered systematically at the kindergarten through twelfth-grade level (National Research Council and Geographical Sciences Committee, 2005). Therefore, this is an important set of skills to teach in undergraduate courses, and many college-level instructors report including some kind of training in their courses (Viskupic et al., 2021). Because spatial skills could serve as a filter for who is able to join and be successful in the geosciences, early training should be a priority for the geoscience community. Studies have repeatedly shown that spatial skills are malleable and can be improved through training (Baenninger and Newcombe, 1989; Wright et al., 2008; Terlecki et al., 2008; Uttal et al., 2013). Training students in these skills early is crucial to ensure that they can be successful and fully participate in their STEM courses (Sorby and Baartmans, 1996; Stieff and Uttal, 2015). Providing training in these skills may especially benefit students who are interested in choosing a STEM major.

While as a field we recognize that spatial skills are related to success in STEM, and the geosciences in particular, we have less understanding as to why that relationship exists. Here, we consider how situating spatial training within a course may play a role in students’ success in science, specifically in an introductory geology course. By providing students with repeated opportunities to build their spatial skills in the form of short, low-stakes assignments, we prepare students to engage with more complex material (e.g., determining the relative age of units in cross section, preservation of fossilized ripple marks), thus encouraging their success in geology and STEM more broadly. We hypothesized that completing these spatial training assignments would develop students’ belief in their ability to succeed (self-efficacy) in the course and science overall, and would improve their perceived value of the course and science. Additionally, we hypothesized that students who completed these trainings would experience greater academic success as measured by course grade.

This study seeks to answer the following research questions (RQs): (1) Do students who complete the spatial training assignments (i.e., the experimental group) have a statistically significant improvement in their final course grade, self-efficacy, and/or value when compared to the control group? (2) If so, what are the effect sizes of these changes? (3) Is there a minimum number of trainings that need to be completed to achieve this effect? (4) Is there a maximum number of spatial training assignments where we stop seeing improvement (ceiling effect)?

To address these questions, we designed a case study in which introductory-level geology students were randomly divided into control and experimental groups. Those in the experimental group were provided with 10 of the 11 spatial training assignments that were created and validated by Gold et al. (2018a). Students enrolled in the control group were given an alternative assignment that was similar in terms of the time commitment required but was deliberately non-spatial in nature. Completion of either the control or experimental assignments accounted for 5% of the students’ final grade in the course.

The Relevance of Spatial Skills

Spatial skills are what “…enable us to manipulate, organize, reason about, and make sense of spatial relationships in real and imagined spaces” (Atit et al., 2020, pp. 1–2). Although spatial skills have been shown to be important in STEM disciplines in a broad context, studies have shown that they are particularly relevant to geoscientists (Hegarty, 2014; Gagnier et al., 2016; Atit et al., 2020). From predicting the geometry of Earth’s interior based on surface data, to finding the best surface upon which to take a measurement at an outcrop, to visualizing global ocean currents, problems that require spatial skills permeate the entire geoscience discipline.

The term spatial skills encompasses many individual factors including, but not limited to, mental rotation, spatial visualization, and spatial relations. Of those skills, some have been determined to be more important in the geosciences. These factors include penetrative thinking, frames of reference, disembedding, mental folding, and scaling (Hegarty, 2014; Gagnier et al., 2016). This conclusion is supported by the discipline-based studies available on these specific factors. For example, Reynolds (2012), Ormand et al. (2014), Cohen and Hegarty (2014), Gold et al. (2018a), and Kreager et al. (2022) focused on some combination of mental rotation, penetrative thinking, and disembedding. Given that geology majors often discover the science in their introductory courses (Brock et al., 2006; Houlton, 2010; LaDue and Pacheco, 2013), it is pertinent that spatial skills be trained early (Stieff and Uttal, 2015) before students begin additional spatially intensive courses such as Structure, Sedimentology and Stratigraphy, and Mineralogy (Hambrick et al., 2012; Atit et al., 2020; Klyce and Ryker, 2022; Kreager et al., 2022). Developing a more solid foundation of spatial skills in these introductory courses can go beyond helping all students master content by setting majors up for greater success in their academic progression.

Because geology is such a spatially intensive discipline, we hypothesized that offering specialized training in spatial skills would be beneficial to students’ overall success in geology courses. This hypothesis is based on the idea that training spatial skills would improve students’ self-efficacy and value. With spatial training, we argue that students would be better equipped to tackle the spatial challenges given to them in geology classes. We propose that, in part, this influence on student behavior is due to the element interactivity effect, a component of cognitive load theory.

Cognitive load theory (Sweller et al., 2019) posits that human working memory can hold a limited number of pieces of information at a given time. Instructors utilize this theory when they avoid overloading students with novel information. However, sometimes it is necessary to use multiple pieces of information at the same time, an idea known as the element interactivity effect. Element interactivity represents how many things students must hold in their minds at a time to understand and solve a problem. For example, when determining the series of events that created a cross section, students must simultaneously recall rock types, unconformity types, the law of original horizontality, and depositional environments. Sweller et al. (2019) remind us that element interactivity varies based on the knowledge of the student and the nature of the information. Some of this information (e.g., unconformities) is ingrained in the minds of experts, whereas novices have to grapple with all of this information to solve the problem. In our study, we posit that providing spatial training would reduce the cognitive load of future course assignments by increasing the spatial skills (knowledge) of students. This increase in spatial skills would decrease the potentially negative element interactivity effect on learning. By explicitly teaching students the spatial skills necessary for success, we increase the material’s perceived utility and students’ interest in it (i.e., if you understand the content, then you can actually use and enjoy it) and lower the cost (i.e., effort) necessary for them to understand the material. Testing our hypothesis involves determining if training spatial skills in geology students increases their self-efficacy and value, which in turn would impact their ability to succeed (as described by situated expectancy-value theory).

Discipline-Based Research on Spatial Skills

While much of the work done to understand spatial skills was done by psychologists and cognitive scientists (Miyake et al., 2001; Hegarty et al., 2006; Uttal and Cohen, 2012; Uttal et al., 2013; Newcombe and Shipley, 2015; Hegarty, 2014), a new wave of research exploring the implications for specific disciplines was conducted via discipline-based education research (Shipley et al., 2013; Ormand et al., 2014, 2017; Sanchez and Wiley, 2014; Atit et al., 2015; Gagnier et al., 2016; Gold et al., 2018a, 2018b; Hannula, 2019; LaDue et al., 2021; Polifka et al., 2022; Kreager et al., 2023). This more recent work explored the implications of preparing students by offering training specifically designed to enhance their spatial skills. The spatial training used in these studies varied from domain-general (mentally rotating blocks) to domain-specific (visualizing three dimensions on topographic maps). For example, Ormand et al. (2017) employed both training types. To develop students’ domain-general spatial skills, students engaged in activities such as sketching objects or slicing through three-dimensional objects (such as a strawberry). Conversely, to train their domain-specific spatial skills, students used gesturing to demonstrate Miller indices and sketched geologic block diagrams. Ormand et al. (2017) found that geology majors trained in both types of spatial skills improved their ability to solve spatially intensive geological problems. By studying spatial skills in the discipline, the benefits of training both domain-general and domain-specific skills can be tested, including whether it is necessary to scaffold from general to specific spatial skills.

Theoretical Framework: Situated Expectancy-Value Theory

Situated expectancy-value theory (Eccles and Wigfield, 2020) posits that believing one is capable of completing a task, and finding the task to be valuable, contributes to overall achievement-related choices and success in completing a task. This theory lends itself to understanding how to promote student interest as well as increase effort and engagement (Wigfield, 1994; Eccles and Wigfield, 2002; Bandura, 1994). The core concepts of situated expectancy-value theory are self-efficacy and value. Self-efficacy is one’s belief in their ability to achieve the goals they set for themselves and has been described as a part of goal setting, effort, and resilience (Bandura, 1994). Insufficient self-efficacy lowers students’ goals and reduces the effort they are willing to invest to achieve those goals (Bandura, 1994). Value is broken down into intrinsic (interest), attainment (relevance to identity), utility (usefulness), and cost (sacrifice) (Eccles, 1983). Value is also related to students’ self-schema, perceptions of their own abilities and perceptions of how difficult tasks are (Eccles and Wigfield, 2002). While situated expectancy-value theory informs us that there are many factors that contribute to one’s overall performance, such as interpretation of experience and perception of others’ beliefs, Eccles and Wigfield (2020) suggest that all of these other factors load onto value and expectation of success (self-efficacy) before they affect achievement-related choices and performance. For this reason, it is possible to measure changes in self-efficacy and value and conclude that an intervention had some impact on the student. However, it is possible for that impact to occur anywhere on the flowchart (Fig. 1). Changes in these measures are also easier to assess than changes in other components of situated expectancy-value theory, such as “socializer’s beliefs,” which would require evidence about or from those outside of the classroom.

In this study, we look through the lens of situated expectancy-value theory to assess whether or not training spatial skills in introductory geology students has an impact on students’ efficacy and value. While these trainings could impact different points throughout the model, we chose to measure the final stages (self-efficacy and value) because they include the impact of any prior changes (e.g., a change in self-concept of one’s abilities or previous achievement-related experiences; see Fig. 1). If we are successful in increasing efficacy and value, then we would expect to see improvements in students’ achievement in the course. Situated expectancy-value theory builds on expectancy-value theory (Eccles, 1983) by emphasizing the contextual nature of students’ interpretations of experiences, which in turn affect choices such as taking additional classes or changing majors. This is especially relevant in this study because if students have a positive interpretation of their experience completing their spatial training assignments, then they will have a more positive outlook when completing spatially oriented tasks in their geology course. This reinforced self-efficacy and value supports students’ continued engagement with the course, their achievement-related choices, and their overall success.

We posit that by training spatial skills in a broad context (i.e., through domain-general weekly training assignments), students will have a greater sense of self-efficacy when completing inherently spatial assignments in their introductory geology course. We also posit that by having greater abilities to solve spatial problems, we will lower the cost (e.g., cognitive load) associated with completing spatial tasks in the course. Reducing the overall cost, or effort necessary to complete assignments, in turn increases students’ perceived value of the material. We propose that the spatial training assignments can reduce the cost associated with completing course assignments by reducing the overall element interactivity effect students experience on future assignments (Hanham et al., 2017; Sweller et al., 2019). This is done by increasing students’ spatial skills (knowledge), which better equips them to solve spatially oriented problems during their introductory geology course.

The goal of this study is to determine if implementing online spatial training assignments in an introductory geology course increased students’ self-efficacy, value, and success. To accomplish this, we utilized a case study (Yin, 2012; Fig. 2) in which the context is the introductory geology class and the cases are the experimental group (receiving spatial training) and the control group (receiving no training). The embedded units of analysis are self-efficacy, value, and success. In this section, we describe the spatial training assignments and corresponding placebo assignments, and then the efficacy-value survey, determination of success in the course, and the study context, and we conclude with a description of how data were cleaned and analyzed. Because instructors emphasizing the importance or relevance of assignments could alter students’ beliefs about and feelings toward the assignment, we tried to limit the amount of instructor discussion about the spatial training assignments as well as the placebo assignments. When students asked what the purpose of these assignments was, they were told that “the assignments contribute to a well-rounded geologic education” or something similar to minimize external effects on self-efficacy and value.

Spatial Training Assignments

We used an adaptation of the spatial training assignments developed by Gold et al. (2018a) as the intervention in the experimental group. These assignments include exercises such as inferring what the cross-sectional outline of an object would be if it were cut at different angles and locations (Santa Barbara Solids Test; Cohen and Hegarty, 2012; Fig. 3) and predicting how an object would look from different perspectives (Object Perspective/Spatial Orientation Test; Hegarty and Waller, 2004; Fig. 3). Each of these assignments was deployed using an online survey platform. Students could complete the training assignments an unlimited number of times. We adapted this training by providing 10 of the 11 total spatial training exercises as student assignments over the semester. We removed one of the original 11 exercises to accommodate the pre- and post-efficacy/value surveys at the beginning and end of the semester (Table 1). Each training assignment took about 15 minutes to complete.

Placebo Assignments

The placebo assignments were designed to be similar to the spatial training assignments in length and effort but were deliberately not spatial in nature. These assignments were similar to tasks typically asked of students taking introductory science courses and included summarizing current events in geoscience, writing summaries of scientists contributing to the discipline, and writing reflections on their study practices for an upcoming exam (Fig. 3). No incentives were provided for students to complete an assignment more than once.

Efficacy-Value Survey Development

A 38-question survey with four sections was designed to capture personal or demographic data (13 questions) and measure science self-efficacy (six questions), class self-efficacy (seven questions), as well as science and class value (six and five questions, respectively). An attention check question was used to confirm that students were thoughtfully completing surveys (see Data Cleaning section below for further details). The four sections of the survey align with different scales, that is, groups of questions measuring the same construct or idea. The two science-level scales were used to assess student attitudes toward science generally, while the two class-level scales were specific to the introductory geology course in which students were enrolled. To assess these variables, 10 novel survey questions were developed, and 14 others were revised from previously validated surveys known as the Science Motivation Questionnaire II and the Citizen Science Survey (Glynn et al., 2011; Hiller and Kitsantas, 2016). A full list of survey questions and their sources is available in Supplemental Material Table S1. Existing surveys were adapted by replacing terms (e.g., “in science classes”) with those relevant to this study (e.g., “this class”). Each question that assessed variables (i.e., not demographic questions) was administered using a Likert scale from one to six, and the responses to these questions were analyzed as continuous variables. The final course survey had a total of 24 Likert-scale questions to assess self-efficacy and value, one attention check question, and 13 questions related to personal identification and demographics.
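The survey structure described above can be sketched in a few lines of Python. The item counts come from the text; the dictionary keys, the function names, and the choice to score each scale as the mean of its items are assumptions made for illustration, since the scoring rule is not stated here.

```python
from statistics import mean

# Item counts per scale, as reported in the survey description.
SCALE_ITEMS = {
    "science_self_efficacy": 6,
    "class_self_efficacy": 7,
    "science_value": 6,
    "class_value": 5,
}
DEMOGRAPHIC_ITEMS = 13
ATTENTION_CHECKS = 1


def total_questions() -> int:
    """Total survey length implied by the reported item counts."""
    return sum(SCALE_ITEMS.values()) + DEMOGRAPHIC_ITEMS + ATTENTION_CHECKS


def scale_score(responses: list[int]) -> float:
    """Mean of one student's Likert responses (1-6) on a single scale.

    Assumption: scales are scored as item means; the text does not say so.
    """
    if not responses:
        raise ValueError("no responses")
    if any(r < 1 or r > 6 for r in responses):
        raise ValueError("Likert responses must be between 1 and 6")
    return mean(responses)


# Hypothetical responses from one student on the five-item class value scale.
example = [4, 5, 3, 6, 4]
print(total_questions())     # 38
print(scale_score(example))  # 4.4
```

The counts reproduce the reported totals: 24 Likert-scale items plus 13 demographic items and one attention check give 38 questions.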

We piloted the survey with 14 Geology 101 students (N1) at a large research university in the Southeastern United States in early Spring 2020. Pilot participants were encouraged to provide any and all feedback to a researcher present during the pilot survey. Feedback from participants included positive feelings regarding the length of the survey, confusion regarding the personal identification number used, and a question related to the participants’ length of time in college. The latter two questions were clarified before the survey was provided to the remaining Geology 101 students (N2 = 341).

Success in the Course

While there are many ways that success in an introductory geology course could be measured (e.g., likelihood of taking another geology course, decision to switch major, average score on laboratory assignments), we chose to use final course grade to provide the most generalizable and holistic view of student knowledge. We chose this to best account for different semesters, instructors, course design, and course delivery methods. There are limitations to this choice, including, but not limited to, assignments being weighted differently from semester to semester, variation in topics covered, and external factors that impact student success (e.g., the ongoing COVID-19 pandemic). However, we argue that final grade serves as a measure of the instructor’s perception of the student’s success in the course based on how the instructor values different course components, to which students are exposed from the course’s introduction (i.e., in the syllabus). We specifically chose not to use grades on laboratory assignments because standardized rubrics are not used across lab instructors; additionally, summative assignments (quizzes and tests) were not used alone because their administration varied among professors (e.g., subjects covered and test style).

Study Context

Data were collected from students enrolled in introductory geology classes taught at a large research university in the Southeastern United States from Spring 2021 through Spring 2022 (a total of three academic semesters). This four-credit hour course includes a required weekly laboratory taught by graduate teaching assistants. The course can be used to fulfill students’ general education science requirement and regularly enrolls 150–300 students per section. Approximately 80% of the students enrolled in this course identify as freshmen or sophomores, 75% as non-Hispanic White, and 93% as between 18 and 21 years old. Use of the experimental and control assignments varied slightly among semesters based on instructor input. These differences are described in the following sections and outlined in Table 1. As discussed further below, the instructor was included in statistical models as a random variable; this allowed us to simultaneously account for differences in data as well as variance due to the instructor.

Spring 2021

In this semester, 134 students participated in the study, with 77 in the experimental group and 57 in the control group. The introductory geology course was offered in an online synchronous format taught by a single lecturer. During the first week of the semester, students were randomly assigned to the control group or experimental group. All students were instructed to complete the efficacy-value survey within the first two weeks of class. Students were provided with course credit for completing these assignments. In total, survey completion amounted to 5% of students’ final grade in the course (or ~0.36% per survey). This 5% was earned by completing either the experimental group’s spatial training assignments or the control group’s placebo assignments, in addition to the pre- and post-surveys.

Fall 2021

Two sections of introductory geology were taught by different lecturers in Fall 2021. One instructor opted to implement the spatial training assignments, resulting in one section serving as the experimental group (102 students enrolled in the study) and one as the control group (226 students enrolled in the study). Because the experimental and control groups were in different course sections, no placebo assignments were given to students in the control group, based on instructor preference. Instead, extra credit was offered to students in the control group course who completed the pre- and post-surveys. Students in the experimental course continued to earn 5% of their final course grade by completing the spatial training assignments.

Spring 2022

During this semester, the introductory geology course was once again taught by two lecturers in two different sections of the course. In contrast to Fall 2021, however, each section was divided into control and experimental groups (rather than each section representing one group). In total, 155 students were enrolled in the experimental group, and 116 were enrolled in the control group. The placebo assignments were implemented once again. All students received 5% of their final course grade by completing either the spatial training assignments or the placebo assignments.

Data Cleaning

Efficacy-Value Survey

Students completed the efficacy-value survey via a link administered through our learning management system. Because survey completion did not require login, students were asked to provide multiple forms of identifying information, which were used to pair pre- and post-survey data. Once data were matched, students’ responses were paired with additional information from the learning management system (e.g., final grade and number of assignments completed).

Student data were matched using reasonable assumptions (e.g., two survey responses with the names “John Doe” and “Johnathon Doe” but the same student identification represented the same individual). When students completed either the pre- or post-survey more than once, we only retained their first survey response for analysis. When students had multiple submissions, the submissions were often identical (indicating students had clicked the submit button twice) or the second submission was done at or near the deadline (suggesting that students could not remember if they had already completed the assignment). For the latter case, we determined that the first response was more often the most thoughtful response and chose to retain it over the later submission. Students’ pre- and post-surveys were subjected to a series of filters to ensure the reliability of the data. These filters included removing surveys with failed attention checks (n = 155 survey submissions) and surveys submitted outside of the collection period (n = 7 survey submissions). Additionally, if a student was repeating the course (n = 8 students) or received a final grade below 40% (n = 24 students), their surveys were not used for the final analysis. This decision was made to ensure that data reflected only perceptions of this version of the introductory geology course, and that students had not stopped participating in the course. The average final grade of students included in the study was 85.57%, with a standard deviation of 10.31%, which suggests that students scoring below 40% were atypical, scoring more than four standard deviations below the mean. We also manually reviewed a random subset of six of these students to confirm that they had stopped participating in the course, having not submitted assignments after the end of the first month of the course. This resulted in a total of Nfinal = 733 students enrolled in the study, some of whom only had partial data due to the filters outlined above.
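The exclusion rules and the standard-deviation check can be illustrated with a short Python sketch. The record layout and all field names are hypothetical, and the sketch simplifies by attaching every filter to a single record, whereas the attention-check and collection-period filters were applied to survey submissions and the other two to students.

```python
from dataclasses import dataclass

MEAN_GRADE = 85.57   # reported mean final grade (%)
SD_GRADE = 10.31     # reported standard deviation (%)
GRADE_CUTOFF = 40.0  # students below this final grade were excluded


@dataclass
class Record:
    # All field names are hypothetical; the study's data layout is not given.
    student_id: str
    passed_attention_check: bool
    in_collection_period: bool
    repeating_course: bool
    final_grade: float


def keep(record: Record) -> bool:
    """Apply the four exclusion rules described in the text."""
    return (
        record.passed_attention_check
        and record.in_collection_period
        and not record.repeating_course
        and record.final_grade >= GRADE_CUTOFF
    )


def sds_below_mean(grade: float) -> float:
    """Distance of a grade below the reported mean, in standard deviations."""
    return (MEAN_GRADE - grade) / SD_GRADE


# A grade at the 40% cutoff sits about 4.42 standard deviations below the mean.
print(round(sds_below_mean(GRADE_CUTOFF), 2))  # 4.42
```

The final line reproduces the claim that the 40% cutoff lies more than four standard deviations below the mean: (85.57 - 40) / 10.31 is approximately 4.42.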

Weekly Training Assignments

A total of 430 students from the experimental group completed one or more of the weekly training assignments. We calculated the number of unique trainings they completed (maximum of ten). These counts were used to determine if there is a maximum or minimum number of trainings that leads to improvement on the scales measured.
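Because students could repeat a training an unlimited number of times, the count of unique trainings differs from the count of submissions. A minimal sketch of that distinction follows; the submission log format (a list of assignment numbers from 1 to 10) is an assumption made for illustration.

```python
MAX_TRAININGS = 10  # ten training assignments were offered


def unique_trainings(submissions: list[int]) -> int:
    """Count distinct training assignments among submitted assignment numbers."""
    valid = {s for s in submissions if 1 <= s <= MAX_TRAININGS}
    return len(valid)


# Hypothetical log: training 3 was submitted three times.
log = [1, 2, 3, 3, 3, 7]
print(unique_trainings(log))  # 4
```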

Data Analysis

Analyses were conducted in R (v. 4.2.1). Linear mixed models were estimated using restricted maximum likelihood with the package lme4 (v. 1.1-31). Because lme4 does not generate p-values, these models were submitted to the package lmerTest (v. 3.1-3) to obtain p-value estimates. The emmeans package (v. 1.8.2) was used to estimate marginal means and perform follow-up tests on linear mixed models, and the package effsize (v. 0.8.1) was used to calculate Cohen’s d values.

(RQ 1) Do students who complete the spatial training assignments (i.e., the experimental group) have a statistically significant improvement in their final course grade, self-efficacy, and/or value when compared to the control group?

We used a linear mixed modeling strategy to test the hypothesis that completing spatial training assignments will have a significant, positive impact on final grade, self-efficacy, and value. This method allows us to account for variance by instructor (using a random intercept), with a nested intercept of students. The nested intercept allows the model to correlate timepoints from the same student. For all models, the instructor served as a random intercept. The mixed model to test changes in self-efficacy and value included group (experimental or control) and time (change from pre- to post-survey score) as fixed effects. When modeling final grade as the outcome, we only considered the group to be a fixed effect, because there were no pre-scores for final grade. To ensure that results were robust to violations of normality, significant effects were rerun using robust linear mixed modeling as implemented by the R package robustlmm (v. 3.0-4), which conducts rank-based statistical analysis. Because the group-by-time interaction is the effect that tests our hypotheses, we only present these effects in the results. For significant group-by-time interactions, we conducted follow-up tests within groups using estimated marginal means.
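As a sketch of the model structure just described, the self-efficacy and value models can be written as follows. The notation is illustrative and assumed rather than taken from the analysis scripts: y is the scale score at time t (pre or post) for student i taught by instructor j, u is the instructor random intercept, and v is the random intercept for students nested within instructor.

```latex
\begin{align*}
y_{tij} &= \beta_0 + \beta_1\,\mathrm{Group}_{ij} + \beta_2\,\mathrm{Time}_{t}
          + \beta_3\,(\mathrm{Group}_{ij} \times \mathrm{Time}_{t})
          + u_j + v_{i(j)} + \varepsilon_{tij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_{i(j)} \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{tij} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

Here the coefficient on the group-by-time interaction is the term that tests the hypotheses. For final grade there is one observation per student, so the time terms and the student intercept drop out, leaving a fixed effect for group and a random intercept for instructor. In lme4 formula syntax, a model of this shape would typically be specified along the lines of score ~ group * time + (1 | instructor/student), where the variable names are hypothetical.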

(RQ 2) For survey scales that were significantly different between the control and experimental groups, what is the effect size of the change?

To determine the effect size of these changes, we determined Cohen’s d for each outcome and interpreted these according to the benchmarks provided by Cohen (1988). To assess practical significance, we compared mean difference scores across groups against thresholds that we determined to represent a meaningful change. These included a change of 3% or more in final grade and a change of 0.3, on average, for each survey scale.
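The effect size and practical significance checks can be sketched in Python. This is not the effsize implementation used in the study; it assumes the common pooled-standard-deviation form of Cohen's d, treats the thresholds as inclusive, and uses hypothetical function names. The label below 0.2 is a convention, not one of Cohen's named benchmarks.

```python
from math import sqrt
from statistics import mean, variance

# Thresholds for a meaningful change, as stated in the text.
GRADE_THRESHOLD = 3.0  # percentage points of final grade
SCALE_THRESHOLD = 0.3  # points on the 1-6 Likert scale


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cohen_label(d: float) -> str:
    """Conventional benchmarks (Cohen, 1988): 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


def practically_significant(mean_diff: float, threshold: float) -> bool:
    """True when a mean difference meets or exceeds the chosen threshold."""
    return abs(mean_diff) >= threshold


# Hypothetical change scores for two small groups.
experimental = [2.0, 4.0, 6.0]
control = [1.0, 3.0, 5.0]
d = cohens_d(experimental, control)
print(d, cohen_label(d))  # 0.5 medium
```

Separating the statistical benchmark from the practical threshold mirrors the two-part assessment described above: an effect can be labeled small by Cohen's benchmarks and still fall short of, or exceed, the 0.3-point threshold.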

(RQ 3) Is there a minimum number of spatial training assignments that need to be completed to achieve positive effects in the experimental group?

Using a separate mixed model, we conducted an additional analysis including only the experimental group to determine whether or not there were specific effects based on the number of trainings completed. The mixed model with final grade as the outcome measure had a fixed effect for the number of spatial trainings completed and a random intercept for instructor. Mixed models for the other outcome measures included fixed effects for time, spatial trainings completed and their interaction, and random intercepts for instructor and participant.

(RQ 4) Is there a maximum number of spatial training assignments where we stop seeing improvement (ceiling effect)?

Final grade was the only outcome related to the number of trainings completed (RQ 3). To assess whether or not there was a ceiling effect, or maximum number of trainings where we stopped seeing improvement, we visually inspected the relationship using a scatterplot comparing the outcome variable and the number of spatial trainings completed (Fig. 4). We chose this method of analysis because we determined the study was not sufficiently powered to conduct pairwise comparisons for each number of spatial trainings, and because of the convincing evidence of a linear relationship produced in the mixed model regression.

This study presents one possible explanation as to why spatial skills are related to increased entrance, success, and persistence in STEM disciplines. While there is a measurable relationship between efficacy, value, and training spatial skills, other theoretical frameworks could demonstrate a more robust relationship. This study primarily considered the spatial training assignments as a factor that affects student-perceived self-efficacy and value, as well as final course grade. However, other factors could have influenced these metrics, such as relationships with an instructor or teaching assistant, mental health during the COVID-19 pandemic, and previous experiences with science.

Both a strength and a limitation of this study is that descriptions of the spatial training and control assignments were not given to students (see Methods: Spatial Training Assignments and Methods: Placebo Assignments sections). This could have negatively impacted students’ perceptions of the assignments, or the class overall, given that explaining the relevance and importance of assignments improves students’ perception of them (Maltese and Tai, 2011, 2011; Anderson et al., 2013). Another limitation to consider is the differences in the study context from semester to semester. The exact procedure had to evolve with the changing classroom (e.g., ability to divide students in the same class into control and experimental conditions), meaning that we did not consistently implement either the control or experimental assignments (see Table 1). This could have altered the measurable impact the experimental assignments had on students. Finally, it is possible that a relationship found between final grade and the number of spatial training assignments completed could be due to student behavior instead of the intervention itself. For example, higher performing students are also more likely to complete all course assignments, making it difficult to determine the impact of a higher number of completed spatial trainings on grade.

(RQ 1) Do Students who Complete the Spatial Training Assignments (i.e., the Experimental Group) Have a Statistically Significant Improvement in their Final Course Grade, Self-Efficacy, and/or Value When Compared to the Control Group?

There were significant group-by-time interactions for the outcomes science self-efficacy, B = 0.17, t(451) = 2.37, p = 0.018; class self-efficacy, B = 0.28, t(463) = 3.96, p < 0.001; and class value, B = 0.17, t(467) = 2.15, p = 0.032. For science self-efficacy, estimated means for students enrolled in the experimental group had a trend-level (p < 0.1) pre- (estimated marginal mean = 4.70) to post-course (estimated marginal mean = 4.81) increase, M = 0.11, t(447) = 2.02, p = 0.088, 97.5% CI [0.01, 0.22], where M is the mean change and CI is the confidence interval. For this same measure, the control group had a decrease in the estimated mean, but this decrease was not statistically significant: pre- (estimated marginal mean = 4.67) to post-course (estimated marginal mean = 4.61), M = −0.066, t(457) = −1.32, p = 0.375, 97.5% CI [−0.18, 0.05]. Similarly, analysis showed an increase in means on the class self-efficacy scale for the experimental group (M = 0.16, t(458) = 3.1, p = 0.004, 97.5% CI [0.04, 0.27], robust regression p < 0.001) and a significant decrease in the means for the control group (M = −0.122, t(469) = 2.48, p = 0.027, 97.5% CI [−0.23, −0.01]), although this effect was no longer significant when applying robust regression (p = 0.147). Conversely, the class value scale was not significantly different from pre to post for students in the experimental group (M = 0.04, t(462) = 0.66, p = 1, 95% CI [−0.09, 0.16], robust regression p = 0.2), but had a significant decrease in mean for the control group (M = −0.13, t(473) = −2.42, p = 0.032, 95% CI [−0.25, −0.01], robust regression p = 0.014). None of the other group-by-time interactions were significant (all p-values > 0.278).
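One way to read the interaction estimates is as a difference between the two groups' pre-to-post changes. The short Python sketch below checks that reading against the values reported in this paragraph. It assumes the interaction was coded as experimental change minus control change, which the text does not state, and the agreement is only approximate because the reported values are rounded.

```python
# Within-group pre-to-post mean changes (M) and interaction estimates (B),
# copied from the results reported above.
reported = {
    "science self-efficacy": {"experimental": 0.11, "control": -0.066, "B": 0.17},
    "class self-efficacy": {"experimental": 0.16, "control": -0.122, "B": 0.28},
    "class value": {"experimental": 0.04, "control": -0.13, "B": 0.17},
}


def difference_in_changes(experimental_change, control_change):
    """Experimental pre-to-post change minus control pre-to-post change."""
    return experimental_change - control_change


# Implied interaction for each scale under the assumed coding.
implied = {
    scale: difference_in_changes(v["experimental"], v["control"])
    for scale, v in reported.items()
}

for scale, value in implied.items():
    print(f"{scale}: implied {value:.3f}, reported B = {reported[scale]['B']:.2f}")
```

Under this reading, a positive interaction can arise from an increase in the experimental group, a decrease in the control group, or both, which is why the within-group follow-up tests are needed to interpret each scale.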

(RQ 2) For Survey Scales that Were Significantly Different Between the Control and Experimental Groups, What Is the Effect Size of the Change?

We determined the effect size (Cohen’s d) for each outcome. There were significant differences for science self-efficacy, class self-efficacy, and class value, and we found that there was a small effect size for each (d = 0.22, 0.37, and 0.20, respectively; Table 2). None of these outcomes met the predetermined standard of a mean increase of 0.3 points on the Likert scale. The mean score on the science self-efficacy scale trended toward an increase (4.70–4.81, mean change of 0.11) in the experimental group and trended toward a decrease (4.67–4.61, mean change of −0.06) in the control group. Similarly, class self-efficacy had a significant increase for the experimental group (5.00–5.15) and a trend toward a decrease for the control group (5.00–4.88). The mean change in class self-efficacy was 0.15 for the experimental group and −0.12 for the control group. This pattern was not the case, however, with class value, which had no significant pre- to post-change in the experimental group (4.51–4.55, mean change of 0.04) but had a significant decrease in the control group (4.34–4.21, mean change of −0.13). Final grade and science value, which did not have significant results in our linear model, had insignificant effect sizes (0.08 and 0.10, respectively). The mean change for science value was 0.05 for the experimental group and −0.02 for the control group.

(RQ 3) Is There a Minimum Number of Spatial Training Assignments that Need to Be Completed to Achieve Positive Effects in the Experimental Group?

While the experimental and control groups did not significantly differ on final grade, students within the experimental group who completed more spatial trainings did have a higher final grade, B = 2.42, t(323) = 12.41, p < 0.001. Although completion of each survey was only worth 0.36% of students’ final grades, we determined a beta coefficient of 2.42, meaning that for each additional training completed by a student, the model predicted a 2.42% increase in students’ final grades. There were no significant effects on any survey outcome based on the number of trainings students completed.
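The gap between the credit earned for completion and the model's predicted change can be made concrete with simple arithmetic. The Python sketch below uses only the two figures reported above (0.36% of the final grade per completed item and a slope of 2.42). The function names are hypothetical, and treating the slope as a constant per-assignment change across the full range of 0 to 10 is an assumption of a linear model, not an additional finding.

```python
GRADE_WEIGHT_PER_ASSIGNMENT = 0.36  # percent of final grade per completion, as reported
MODEL_SLOPE = 2.42                  # predicted change in final grade per additional training (B)


def direct_credit(n_completed):
    """Grade percentage earned purely for completing n assignments."""
    return n_completed * GRADE_WEIGHT_PER_ASSIGNMENT


def model_predicted_difference(n_completed, baseline=0):
    """Predicted final-grade difference relative to a baseline number completed."""
    return (n_completed - baseline) * MODEL_SLOPE


for n in (1, 5, 10):
    credit = direct_credit(n)
    predicted = model_predicted_difference(n)
    print(
        f"{n:>2} completed: direct credit {credit:.2f}%, "
        f"model-predicted difference {predicted:.2f}%, "
        f"not explained by credit {predicted - credit:.2f}%"
    )
```

The portion not explained by completion credit is what any account of the relationship has to address, whether through effects of the training itself or through differences in the students who complete more assignments.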

(RQ 4) Is There a Maximum Number of Spatial Training Assignments Where We Stop Seeing Improvement (Ceiling Effect)?

We chose to only examine the possibility of a ceiling effect for final grade because it was the sole outcome found to be related to the number of trainings completed. The visual inspection showed that there was not a point where the number of trainings completed ceased to be related to final grade, demonstrating that there is not a measurable ceiling effect (Fig. 4). It is important to remember that the final grade outcome is a confounding variable, as completion of the spatial training assignments contributed to students’ final grades. This is discussed further in the Limitations and Discussion sections.

(RQ 1) Do Students Who Complete the Spatial Training Assignments (i.e., the Experimental Group) Have a Statistically Significant Improvement in their Final Course Grade, Self-Efficacy, and/or Value When Compared to the Control Group?

We found that there were statistically significant differences in pre- to post-change scores between the control and experimental groups for three of the four survey outcomes. Of these metrics, we found that the experimental group had an increase in class self-efficacy and a trend toward an increase in science self-efficacy, while there was a significant decrease in the control group’s class value, and no change in the experimental group’s class value. There were no significant differences in science value pre- to post-change scores between the control and experimental groups, nor were there significant differences between final grades in the control and experimental groups.

It would be reasonable to deduce that the science and class self-efficacy scales are related to one another, and that by increasing students’ self-efficacy in the course we would be able to increase it in science overall. For example, Dalgety et al. (2003) found that the presence of chemistry self-efficacy increased science self-efficacy. However, the lack of change in science value, even after we saw decreases in class value for the control, suggests that there is a more complex story to tell.

Improving science and class self-efficacy could still have impacts on students’ achievement-related choices (Eccles and Wigfield, 2020), interest in pursuing another course (Gold et al., 2018a), or other behaviors. We propose that these increases could be due to students recognizing spatial-related problems throughout their course materials and believing in their capability to solve them because of their success on the weekly spatial training assignments. Believing in oneself and developing self-efficacy would make students feel better while completing assignments, encouraging them to complete more of them. It is also possible that students’ class self-efficacy increased as they realized their ability to complete these spatial training assignments, therefore improving their belief that they are capable of succeeding in the course overall. This hypothesis would be best supported if we determined that it was the ease of these assignments, especially in comparison to the control group assignments, that influenced students’ beliefs and behavior. From other work we know that using frequent, low-stakes assignments improves student performance (Sotola and Crede, 2021), and that completing easy assignments can support the development of self-efficacy (Schunk, 1989). This is a question to be answered in future work, where we recommend that this study be completed with student interviews and a qualitative approach to data analysis. Finally, a possible explanation for this result is that students’ spatial skills have been improved, giving them the skills necessary to be successful in the course and on their assignments. This would be consistent with other available research. For example, Kreager et al. (2022) found that spatial skills are related to successfully interpreting a Wheeler diagram, McNeal et al. (2018) found that spatial skills were used to read and interpret meteorological charts, and Shipley et al. (2013) demonstrated that structural geologists use spatial skills to interpret cross sections and field sites. As students repeatedly succeed, their overall self-efficacy increases.

We were surprised to find that the control group had a measured decrease in class value during the semester, meaning they found the course less valuable to them upon completion of the course than they did at the beginning. This effect was buffered, however, for students enrolled in the experimental group, where there was no significant pre- to post-change in class value. This finding suggests that the experimental intervention does have an effect on class value, just not in the way we predicted. This is meaningful because it shows that spatial training can maintain students’ perceived value of the course, indicating they continue to find the course to be meaningful to their career, life goals, academic success, or to them personally. We propose that spatial training assignments increase students’ value because they help students fully understand the course content, rather than having to accept the course content as fact without having a complete grasp on it. Spatial skill training could provide students with the opportunity to thoughtfully engage with and think critically about the material, thus improving their sense of value (Jones, 2012). It is unclear whether the control group’s decrease in class value is consistent with findings from other courses, particularly introductory courses geared toward non-majors. Gilbert et al. (2012) found that the task value (usefulness) of introductory geology courses was higher for STEM majors than non-STEM or undecided majors. They also argued that a lack of success early in the course could lead to a “potential downward spiral” (p. 368) over the semester, which could be mediated by explicit connections to the relevance of course material. Additional work is needed to understand whether this semester-long decrease in class value with more traditional course assignments is “normal” and whether spatial assignments are the best interventions to address that decrease.

Science value did not have significant changes in either the control or the experimental group. Because we anticipated that class and science value would be related to one another, this finding is especially interesting. We interpret this lack of change to be an indicator that students are not seeing the connection between their introductory geology course and science as a whole. In this sense, we propose that students are able to perceive the attainment, intrinsic, utility, and cost value of their introductory geology course and its assignments, but may not be able to see this for science more broadly. This finding is not unprecedented, as others have reported that STEM courses are often perceived as “unrelated to reality” (Christensen et al., 2014, p. 174). This could be because students, though actively involved in their courses, may not be able to conceptualize their relationship with science. In introductory geology, students are able to improve their sense of value by completing assignments and attending class. Conversely, students may not see how their actions or activities relate to science more broadly and therefore are unable to improve their sense of value. Increasing value, especially attainment value, relies on students’ ability to self-identify how tasks, topics, and ideas relate to their sense of self. If students struggle to develop a sense of science identity, perhaps because of gender, background in science, or different social experiences, then they would also have a harder time increasing their sense of value (Robinson et al., 2018, 2019).

The lack of change to science value for both groups should serve as a call to action for instructors. We have known for years that introductory geoscience instructors, especially geology course instructors, have not emphasized the connections between course topics and societies (Egger, 2019). The geosciences are uniquely positioned to help students see how their understanding of science connects to their daily lives. Whether it be learning about natural hazards such as flash flooding or mass movements or understanding the long-term processes that lead to global climate change, the geosciences relate to all aspects of the human experience. Helping students see the connection between their lives and what they are learning motivates students to actively participate in their learning (Glynn et al., 2011) and helps increase students’ interest in science and science-related careers (Hulleman and Harackiewicz, 2009). It is entirely possible that improving students’ spatial skills will not have an effect on their perceived value of science until they see the connection between science and the geology class in which they are enrolled, which is historically a challenge (Dodick and Orion, 2003). Alternatively, it may be that spatial skills are unconnected with a general view of geology’s societal relevance.

Although increases in self-efficacy and value would be expected to support students’ achievement-related choices (Bandura, 1977; Eccles and Wigfield, 2002, 2020), such as studying for an exam or completing homework assignments, we found no differences between the control and the experimental group’s final grades. It is possible that the increases found on the survey outcomes were too small to be impactful on students’ final grades. This would mean that additional interventions to improve students’ self-efficacy and value would be needed to see a relationship between self-efficacy and final grade. Interventions to improve self-efficacy could include assignments intentionally designed for students to succeed and reflect on that success (Bandura, 1997); peer modeling, where students who recently succeeded on the assignment demonstrate how they did so (Margolis and McCabe, 2006); and focusing assignments on material that personally interests students (Ross et al., 2016). Course activities that are designed to be challenging but still attainable, involve explicit reflection on success, and incorporate encouragement from a respected other (e.g., teacher) have been shown to be particularly effective at improving academic self-efficacy, which has a relationship with learning in a course (Bandura, 1997; Margolis and McCabe, 2006; Britner and Pajares, 2006). It is also possible that the high final grades in the introductory geology course were too easy to achieve; therefore, the achievement-related choices that were influenced by self-efficacy and value did not have as substantial an impact on student success. As discussed in the Limitations section, it is possible that there were limited differences in final grades because students who completed more spatial training assignments could be higher performing students overall. Both the control and experimental groups had improvements in their final grade relative to the number of assignments they completed. 
In both groups, the improvements were greater than the percent increase we would expect to see were the increase only due to assignment completion. Both of these observations support the interpretation that students who completed low-stakes assignments (e.g., a weekly spatial training assignment worth less than 1% of the student’s final grade) are more likely to complete more of their other, higher stakes, assignments. This agrees with the findings of Sotola and Crede (2021), who also found that requiring students to complete frequent, low-stakes assignments improved their course performance. Karpicke and Roediger (2007) found that repeatedly recalling information improves retention of knowledge; in this case, practicing spatial skills in weekly assignments could serve as a form of repeated testing that improves students’ skills. Although spatial training did not improve course grade, it also did not hurt it when compared to standard introductory geology assignments (i.e., the control assignments). This means that it may be beneficial to exchange some assignments in geology courses for spatial training, which can relatively improve students’ course value, as well as significantly improve their self-efficacy in the class.

(RQ 2) For Survey Scales That Were Significantly Different Between the Control and Experimental Groups, What Is the Effect Size of the Change?

For the outcomes that had a trend toward an increase (science self-efficacy) and statistically significant difference between the control and experimental group (class self-efficacy and class value), the effect sizes were small (d = 0.22, 0.37, and 0.20, respectively). The overall pattern present was a trend toward an increase (science self-efficacy) or a significant increase (course self-efficacy) in scores for the experimental group and a significant decrease (course value) in the control group. Although each of these outcomes only had a small effect size, it is possible that the impacts of these changes could be seen in students’ beliefs or behaviors. Increasing students’ self-efficacy and value creates a positive feedback loop that supports their future goal setting, concept of self, and self-schemata (as shown in Fig. 1; Eccles and Wigfield, 2020; Bandura, 1994), and can encourage them to seek out additional opportunities to learn (Gogolin and Swartz, 1992). While measuring these additional metrics (Fig. 1) was beyond the scope of this study, we assert that their investigation could be a promising avenue for researchers interested in understanding how students develop a sense of self-efficacy in introductory courses. For example, if students feel positive about their ability to succeed in the course, they may feel more encouraged to study for the upcoming exam. Improving students’ self-efficacy also affects students’ stress resilience (Bandura, 1977) and academic resilience (Cassidy, 2015). This could include, for example, students feeling more confident in asking questions during the lecture or maintaining a sense of optimism about the course after receiving a bad grade.

We propose that the small but positive changes in science self-efficacy experienced by the experimental group, but not the control group, could be due to a positive effect from the spatial training assignments. While the explicit spatial training has a small impact, there does appear to be a small advantage to these over the control assignments. Just as Eccles and Wigfield (2020) describe, these positive experiences can develop self-efficacy (Fig. 1). This was also seen in other studies, which found that mastery experiences are the strongest developer of self-efficacy (Lent et al., 1996; Hampton, 1998). If these weekly spatial training assignments led to trend-level (p < 0.1) increases in science self-efficacy, it is possible that more intensive spatial training would lead to greater, more significant increases. This could include efforts similar to the work done by Sorby and Baartmans (1996), where students were enrolled in an entire course devoted to developing spatial skills. The course developed by Sorby and Baartmans (1996) led to more students persisting in the participating engineering program, especially female students (Sorby, 2009). On a smaller scale, this could include incorporating 15 minutes of spatial training assignments in the laboratory portion of introductory courses, including instruction to students on the value of the exercise and how it contributes to their overall learning in the course.

Additionally, we consider why students in the control group had a significant decrease in class value, compared to no significant difference for the experimental group. One primary factor we consider is cost, which is described by Eccles and Wigfield (2020) as the effort that students would have to expend to complete the assigned tasks. We propose, though, that this reduced cost is extended beyond the spatial training assignments to the overall course assignments. As students complete their spatial training assignments, they learn the skills needed to complete other course assignments (e.g., a relative dating exercise during the laboratory portion of the course). When students who have completed the spatial training assignments encounter the spatially oriented assignments in the course, the cost of completing them is reduced because they already have the necessary underlying skills to do so. This is done by reducing the element interactivity on the course assignments (Sweller et al., 2019). That is, if students already have the ability to mentally rotate an everyday object (e.g., a strawberry), they have to spend less cognitive energy on that element of a geology-specific task (e.g., mentally rotating a mineral along its axes). Rather than students simultaneously grappling with new spatial reasoning and geologic content, they can shift their focus to just the new geologic content, relying on their long-term storage to address the spatially oriented parts of the problem (Sweller et al., 2019).

Spatial training assignments could serve as a component of a classroom that fosters the development of self-efficacy and value, and when combined with other components of a classroom culture, they could have more significant effects on student behavior. The findings of this study could be more pronounced if students were informed of the relevance of the assignments they were completing. In this study, students were not privy to the decades of research that supports the necessity of spatial skills in the sciences, so they may not have understood the potential for these training assignments to impact their success. Having the additional benefit of students knowing that these assignments would help them could serve as a sort of “placebo effect” and influence student outcomes.

(RQ 3) Is There a Minimum Number of Spatial Training Assignments That Need to Be Completed to Achieve Positive Effects in the Experimental Group?

Completing two or more assignments was associated with a practically significant increase (more than 3%) in final grade. No significant relationship was found between the number of trainings and the other four survey outcomes (science self-efficacy, geology course self-efficacy, science value, and geology course value). While completing a higher number of required assignments would naturally lead to higher course grades, we note here that these assignments were only worth 5% of the overall course grade. However, for each spatial-training assignment students completed, their grade was predicted to increase by 2.42%, with no apparent ceiling effect (see RQ 4 sections). Two potential explanations for this are noted. First, it is possible that the impact of spatial trainings may have “spillover” effects, positively influencing student performance in other aspects of the course. However, given the lack of a statistically significant difference in course grade between the control and experimental groups, we find it more likely that students who earn higher grades in the course are also those more likely to complete a higher number of any assignment. Others have come to a similar conclusion, such as Cooper et al. (2006), who found a positive relationship between time spent on homework and teacher-assigned grades in courses. Additionally, Cooper et al. (1998) and Trautwein (2007) found that the number of homework assignments students completed was related to their academic achievement. To confidently say that the statistical relationship between final grade and the number of spatial training assignments completed is authentic, future work should assign different numbers of assignments to groups of students, and then assess any differences that may persist.
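The "two or more assignments" figure is consistent with a simple reading of the reported slope against the 3% practical-significance threshold, which the Python sketch below illustrates. It assumes a strictly linear relationship and a strict "more than 3%" comparison. The function name is hypothetical, and the 0.36% per-assignment credit used for contrast is the value given in the Results.

```python
def min_assignments_for_threshold(change_per_assignment, threshold, max_assignments=10):
    """Smallest number of assignments whose cumulative change exceeds the threshold.

    Returns None if the threshold is not exceeded within max_assignments.
    """
    for n in range(1, max_assignments + 1):
        if n * change_per_assignment > threshold:
            return n
    return None


PRACTICAL_THRESHOLD = 3.0  # percent change in final grade treated as meaningful
MODEL_SLOPE = 2.42         # model-predicted change per completed training (reported B)
CREDIT_ONLY = 0.36         # completion credit per assignment (reported in Results)

# Under the model's slope, the threshold is crossed at the second assignment.
print(min_assignments_for_threshold(MODEL_SLOPE, PRACTICAL_THRESHOLD))

# If completion credit were the only source of change, far more would be needed.
print(min_assignments_for_threshold(CREDIT_ONLY, PRACTICAL_THRESHOLD))
```

The contrast between the two calls shows why completion credit alone cannot account for the observed relationship, without indicating which of the two explanations offered above is responsible for the remainder.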

It is interesting that no other outcomes had a significant relationship with the number of trainings completed. Others have found that repeated practice that leads to even mild success improves self-efficacy (Bandura, 1977; Burnham, 2011; Joët et al., 2011; Loo and Choy, 2013), which led us to believe that students completing more spatial training assignments would have greater gains on these measures. Our model predicted the estimated marginal means of students who completed either one or ten spatial training assignments and found that students who completed only one assignment were likely to have decreases on each of the four survey outcomes. This is in contrast to the predicted increase on each survey scale for students who completed all ten assignments. The differences between the predicted outcomes of students who completed one or ten assignments were not statistically significant, however, so the directions of their improving or worsening scores may be unreliable. Our model did not predict that any of these estimated marginal means would meet our threshold of a 0.30 increase on the Likert scale. This further supports our interpretation that spatial training influences these measures, but whether these changes make a meaningful difference in student attitude and behavior is unknown.

(RQ 4) Is There a Maximum Number of Spatial Training Assignments Where We Stop Seeing Improvement (Ceiling Effect)?

The only outcome variable that was related to the number of spatial training assignments completed was final grade, and we determined that there was not a maximum number of trainings where we stopped seeing an effect. In part, this is due to the fact that completing the assignments contributed to students’ final grades. The rate at which final grade improved, however, is greater than what we would expect to see if the only driver for the increase in final grade was completion of the spatial training assignments. This means that there is some sort of “spillover” effect. The lack of a ceiling effect is encouraging, as it could mean that incorporating more spatially intensive assignments into geology courses could continue to support improvement in students’ final grades. This result agrees with others, such as that of Sotola and Crede (2021), who also found that having frequent, low-stakes assignments was beneficial to students’ final grades. Additionally, we know that students’ self-efficacy can be increased by giving them the opportunity to succeed on assignments by aligning the difficulty with their skill level (Schunk, 1989; Britner and Pajares, 2006). While our study design does not allow us to determine if the spatial training or the nature of the assignments is what drove the difference in final grades, future work could further investigate this difference by assigning students a varying number of spatial training assignments.

Research has consistently shown that spatial skills are correlated with entrance, persistence, and success in STEM fields (McGrew and Evans, 2004; Wai et al., 2009; Lubinski, 2010; Höffler, 2010). We found that training spatial skills has a small effect on students’ reported science and course self-efficacy, as well as their course value; per situated expectancy-value theory, each of these increases affects their achievement-related choices. These findings suggest that educators should consider exchanging some of their course materials for those that deliberately train spatial skills. Although this change in teaching may not lead to substantial improvements in students’ grades, it does appear to positively alter their self-efficacy in geology courses and possibly in science more broadly, as well as prevent a decreased valuation of the course overall. Just as Stieff and Uttal (2015) suggest, even small improvements in spatial skills can create significant opportunities for students who otherwise could have faced this as a barrier. To increase their effects, instructors should consider contextualizing these assignments by informing students of the research that supports the benefits of specialized spatial training. Considering the relatively low cost to educators to implement these assignments, we argue that implementing them is a worthwhile investment to support students’ long-term engagement with geology and science more broadly.

We found that by implementing weekly spatial training assignments in place of standard geology course assignments, we were able to increase students’ perceived self-efficacy in their introductory geology course, and in science more broadly. We also found that students who completed these assignments maintained their perceived value of the introductory geology course across the semester, whereas students enrolled in the control group had reduced ratings of value from pre to post. These results indicate that by training spatial skills in short, weekly assignments, we are able to change students’ perceptions of the introductory geology class and science more broadly, which could have implications for student recruitment and retention. We hope that these changes in student beliefs and behaviors are also found to improve their long-term relationship with science, encouraging students to continue to ask questions and explore the world in which they live. We found no differences between the control and experimental groups’ reported science value or final grades. We suggest that the disconnect between students’ perceived value of the class and science was due to students not connecting their geology course to science more broadly. Within the experimental group, we modeled significant differences between the final grades of students who completed just one spatial training assignment and those who completed the maximum of ten assignments. Because there were no significant differences between the final grades of the control and experimental groups, we interpreted this finding to mean that students who completed more spatial training assignments were more likely to complete other assignments in the course (e.g., homework, laboratory assignments), which thus led to a higher grade. This could be due to increases in student self-efficacy, or a relic of student behavior. 
Our model predicted that experimental students who completed one assignment would have a measured decrease in their survey outcomes, but that students who completed ten would have an increase. The differences our model predicted for the four survey outcomes did not meet our predetermined threshold for significance (0.30 on the Likert scale). However, because the size of the change on a Likert-scale survey that corresponds to meaningful change in students’ attitudes and behaviors is unknown, we suggest that educators consider implementing these or similar spatial training assignments to support students’ perceptions of their courses and science in general.
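
The threshold comparison described above can be illustrated with a minimal sketch. This is not the analysis used in this study: it assumes a simple linear relation between the number of assignments completed and the pre-to-post change in a survey outcome, and the intercept, slope, and function names are hypothetical placeholders. Only the 0.30 threshold comes from the text.

```python
# Illustrative only: the intercept and slope used below are hypothetical
# placeholders, not estimates from this study.
MEANINGFUL_CHANGE = 0.30  # predetermined threshold on the Likert scale (from the text)


def predicted_change(n_assignments: int, intercept: float, slope: float) -> float:
    """Predicted pre-to-post change for a student who completes n assignments,
    under an assumed linear model: change = intercept + slope * n."""
    return intercept + slope * n_assignments


def meets_threshold(low_n: int, high_n: int, intercept: float, slope: float) -> bool:
    """Return True if the predicted difference between two completion levels
    reaches the predetermined threshold."""
    diff = (predicted_change(high_n, intercept, slope)
            - predicted_change(low_n, intercept, slope))
    return abs(diff) >= MEANINGFUL_CHANGE


if __name__ == "__main__":
    # Hypothetical coefficients chosen so that one assignment gives a small
    # decrease and ten assignments give a small increase.
    intercept, slope = -0.12, 0.02
    for n in (1, 10):
        print(n, round(predicted_change(n, intercept, slope), 2))
    print(meets_threshold(1, 10, intercept, slope))
```

With these placeholder coefficients, the predicted change is negative at one assignment and positive at ten, yet the difference between the two stays below the 0.30 threshold, which mirrors the pattern reported above.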

¹Supplemental Material. Efficacy-value survey used as a pre- and post-assessment of students’ perspectives. Please visit to access the supplemental material, and contact with any questions.
Science Editor: David E. Fastovsky
Associate Editor: Cinzia Cervato

This work could not have been completed without the generous support of introductory geology instructors. We are also grateful to the Ryker Research Lab for thoughtful feedback; we especially thank Georgina Anderson for her contributions to survey development and validation, and Connor Chilton for thoughtfully reviewing the manuscript. We also thank Anne Gold, Nicole LaDue, and Carol Ormand for their support during the conception of this project, and Timothy Ransom for his expertise in software engineering. Finally, the statistical analyses were made possible thanks to the excellent work of Walker Pedersen at Professional Data Analysts. This work was supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-1450810, as well as grant no. EAR-2029920. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Additional funding was provided by the Geological Society of America’s Graduate Research Award.

Gold Open Access: This paper is published under the terms of the CC-BY-NC license.