Spatial reasoning is critical for success in science, technology, engineering, and mathematics (STEM) disciplines and particularly for geoscience. To evaluate capabilities among U.S. students, we assessed spatial reasoning skills in 345 introductory geology and upper-level structural geology students. The test measured students’ abilities to disembed, visualize, and mentally rotate objects. The results highlight an uneven distribution of spatial skills with a minimum score of 6% and a maximum score of 75% correct responses. Spatial skills are positively correlated with standardized test scores, motivation for learning, STEM major declaration, and number of science courses taken. Our analysis also indicates that the cumulative, informal training of childhood play has the ability to increase spatial reasoning. Spatial skill scores were significantly higher among students who played action, construction, or sports video games in childhood. Male and female students display significant differences in spatial skills, especially for mental rotation, with males outperforming females. However, gender disparities are fully mediated after adjusting for a variety of academic factors and whether students frequently played with construction-based toys. This indicates that gender differences are experiential rather than biological in origin. This study suggests that both formal academic training and extracurricular activities appear to develop spatial skills throughout students’ lives and indicates that systematic testing of spatial skills and formal training opportunities for students would facilitate improved spatial reasoning among students. We hypothesize that formal training opportunities for spatial reasoning could increase the potential pool of students who successfully enter STEM careers, including the geosciences, especially among women.

Spatial relations are an integral component of science, technology, engineering, and mathematics (STEM) disciplines. For example, chemists study the spatial structure of molecules; geoscientists, the spatial dimensions and genesis of landscapes over time; biologists, the three-dimensional DNA structure; and engineers, the design and manufacture of three-dimensional structures. Furthermore, scientific communication utilizes visuospatial representations such as graphs, diagrams, and maps that demand spatial skills for interpretation. Therefore, students’ abilities to visualize spatial relations are critical in STEM disciplines, especially in the geosciences.

Spatial reasoning skills are more strongly correlated with entry, performance, and persistence in STEM disciplines than Suite of Assessments (SAT) scores (Shea et al., 2001; Webb et al., 2007; Wai et al., 2009); they even correlate with creative accomplishments as measured by patents and publications (Kell et al., 2013). Despite this convincing evidence that spatial reasoning skills are important for student success in STEM disciplines, spatial reasoning skills are not systematically instructed or tested in K–12 education (National Research Council, 2006; Kastens et al., 2014; Ormand et al., 2014) or assessed among incoming undergraduate students, and they are seldom explicitly included in instruction at the college level. Not surprisingly, spatial skills are unevenly distributed in the population (e.g., National Research Council, 2006; Kastens et al., 2014), which presents a challenge to university instructors teaching spatially demanding courses. This uneven distribution of skills allows some students to perform discipline-specific tasks or interpret scientific communication easily, while others struggle (Kastens et al., 2009, 2014). However, spatial reasoning is not an innate ability. Rather, spatial reasoning has been shown to be trainable with a lasting effect (see meta-analysis in Uttal et al., 2013). For example, spatial reasoning improves through participation in training interventions such as discipline-specific course work (e.g., Sorby, 2001; Kozhevnikov et al., 2007; Titus and Horsman, 2009; Nielsen et al., 2011; Ormand et al., 2014), video gaming (Green and Bavelier, 2003; Feng et al., 2007; Cherney, 2008; Terlecki et al., 2008; Adams and Mayer, 2012), participation in sports (Ozel et al., 2002, 2004; Moreau et al., 2011, 2012; Pietsch and Jansen, 2012), playing music (Pietsch and Jansen, 2012), and playing with construction-based toys (Brosnan, 1998; Wolfgang et al., 2003; Coxon, 2012). 
However, many prior studies of these training approaches have been short in duration (e.g., interventions of hours to weeks), with limited exploration of the long-term efficacy of these interventions. Furthermore, we do not know how the combination of formal and informal life experiences influences the development of spatial skills.

Spatial skills have been studied extensively, and we know that spatial reasoning includes a “collection of cognitive skills” (National Research Council, 2006, p. 12). Embodied cognition research finds that “cognitive processes are deeply rooted in the body’s interactions with the world” and that “the mind must be understood in the context of its relationship to a physical body that interacts with the world” (Wilson, 2002, p. 625). Building on the findings that spatial reasoning is trainable and appears to be trained through one’s interactions with the world, but that spatial skills are very unevenly distributed in the population, we sought to measure baseline spatial skills in undergraduate students, including some geoscience majors, at an early stage of their college career. The early stage of students’ college careers is particularly important because a lack in spatial skills may discourage entrance or persistence in STEM majors (Hambrick et al., 2012; Hegarty, 2014). We collected demographic data about students’ lives, high schools, and college experiences to understand the environmental and social factors as well as childhood experiences that may influence baseline spatial reasoning skills. We employed regression analysis to distinguish those formal or informal life experiences that correlate with strong spatial reasoning skills. We hypothesize that social and environmental contexts (such as personal characteristics and academic or college preparation) as well as life experiences (such as childhood play) provide different training experiences for spatial reasoning skills. This study is in contrast to previous efforts, which have focused on the impact of interventions on students’ spatial skills or the predictive nature of early testing. With our approach, we explore the environment that has shaped the cognitive collection of students’ spatial reasoning skills. Our results help identify critical life experiences where spatial skills appear to be trained.

Theoretical Framework

Spatial skills can be defined as the reasoning that “concerns shapes, locations, paths, relations among entities and relations between entities and frames of reference” (Newcombe and Shipley, 2015, p. 179–180). Spatial reasoning skills can be isolated and are not the same as discipline-specific skills (Uttal and Cohen, 2012). Different approaches to categorizing the human understanding of space and spatial relations have been developed based on theoretical findings in cognitive psychology or deduced from statistical analysis of spatial assignments (Hegarty and Waller, 2005; Chatterjee, 2008; Newcombe and Shipley, 2015). Newcombe and Shipley (2015) propose a classification scheme that builds on and expands existing frameworks (e.g., Chatterjee, 2008) using a theoretical approach. They classify a wide variety of spatial skills in geoscience according to Chatterjee’s typology, differentiating between intrinsic (or within objects) and extrinsic (or between objects) as well as static and dynamic spatial skills. In our study, we focus on testing the intrinsic or within objects spatial skills (intrinsic-static and intrinsic-dynamic) but do not test extrinsic or between objects skills. With this approach, we follow others who have studied spatial skills in geology students (e.g., Titus and Horsman, 2009; Ormand et al., 2014), an example of one spatially demanding STEM discipline. The intrinsic-static skill assessed in this investigation is disembedding, the skill to isolate and attend to one aspect of a complex display or scene. Intrinsic-dynamic skills assessed include spatial visualization or penetrative thinking, which can also be described as visualizing spatial relations inside an object, and mental rotation, visualizing the effect of rotating an object.

Cognition research findings have demonstrated a dynamic connection between cognition and the environment (see reviews by Wilson, 2002; Barsalou, 2008; Lakoff, 2012) and with the social and cultural worlds (Anderson, 2003; Suitner et al., 2015). Spatial reasoning has been described as embodied cognitive processes with a dynamic connection between the brain and the environment (Amorim et al., 2006; Tversky and Hard, 2009; Kessler and Thomson, 2010; Herrera and Riggs, 2013). Following the embodied cognition theory that the brain is dynamically connected to the environment, a review of the existing literature highlights three factors that appear to form or train spatial skills (meta-analysis in Uttal et al., 2013). First, personal characteristics such as gender and motivation are one group of factors. Many studies have explored gender differences in spatial skills (e.g., Linn and Petersen, 1985; Baenninger and Newcombe, 1989; Voyer et al., 1995; Terlecki et al., 2008). In line with our hypothesis that spatial skills are trained informally throughout life, we include a measure of motivation for learning (Pintrich et al., 1991) as a personal characteristics variable. The second set of factors is broadly grouped under academic or college preparation. We explored standardized test scores, selection of major, and impact of course work on spatial skills. The third category that we hypothesized could train spatial reasoning skills is labeled play experiences and includes playing video games, playing with construction-based toys, and playing sports. Many intervention studies indicate that these activities train spatial skills (see summary in Uttal et al., 2013). In this study, we explore which of these variables correlate with spatial reasoning skills in undergraduate students, and also to what extent differences commonly attributed to personal characteristics can be accounted for by academic preparation and play experiences.

Study Design and Student Population

We measured the spatial skills of 345 undergraduates in geology courses from a large U.S. research university with over 30,000 students (Table 1). In the first week of the Fall 2014 semester, all participants completed our spatial skills test instruments during class time. To assess baseline skills of the broad undergraduate population, the majority of participants (80.2%, n = 277) were enrolled in introductory geology courses; these are large lecture hall classes that many non-geology major students take to fulfill their science requirement. We also tested a smaller group of undergraduates enrolled in an upper-division geology course (12.5%, n = 43), which consists of lectures and weekly labs. Testing these upper-level geology majors allowed us to test whether spatial skills change with increasing course work. This study was part of a larger intervention study, but here we only focus on the analysis of pretest, baseline results. To measure the test-retest effect of taking the test twice under the pre- and post-test design, we also recruited a third group of undergraduate students (7.2%, n = 25) from across campus with the prerequisite of not being geology or engineering majors to serve as an assessment-only comparison group. These students were all lower-level undergraduates and were included in this study as part of the introductory student group.

Spatial Skills Instrument Description and Validation

Our spatial skills instrument consisted of three spatial skill tests, each assessing a different key intrinsic spatial skill—mental rotation, disembedding, and penetrative thinking. We administered this three-part instrument following the testing protocol from Ormand et al. (2014) in their study of geology students’ spatial skills. The instrument contains abstract, non-discipline–specific test items so that students would not perceive a lack of content knowledge as a barrier for selecting an answer. Each part of the test instrument included instructions on how to answer the items, including a description of possible strategies to solve the exercises and an example. These instructions were readable at the middle school reading level (grades 6–7, Flesch-Kincaid readability index of 75). Each of the tests was time-limited, with students given three minutes for each test. All questions had a multiple-choice format. The test was completed using pencil and paper; students provided their answers on scantron sheets.

The first part of the test measured students’ mental rotation skills (Fig. 1A), consisting of ten items taken from the Purdue Visualization of Rotation Test (Guay, 1976). For each item, students looked at an abstract three-dimensional figure and at a rotated version of the same figure. They were then asked to look at a new three-dimensional figure and choose from five rotated versions of the same figure to identify the one that was rotated in the same way. Items were rotated both around single and multiple axes. The Purdue Visualization of Rotation Test and a revised version have been validated psychometrically by Yoon (2011) and Maeda et al. (2013).

The second test measured students’ disembedding skills (Fig. 1B), requiring participants to search a collection of patterns to find a given configuration. Eight items were chosen from the Hidden Figures test (Ekstrom et al., 1976). The instrument was developed and validated by Educational Testing Services (ETS), which rated its difficulty level as “high” (Ekstrom et al., 1976).

The third test measured students’ penetrative thinking skills (Fig. 1C) using 15 items from the Planes of Reference test (Titus and Horsman, 2009). Items on this test showed a three-dimensional figure and a cutting plane. Participants had to envision the cutting plane from a view looking straight at the plane surface and identify the correct two-dimensional shapes that outline the intersection of the object and the cutting plane from five answer choices. The test was developed based on items from Myers (1953) and Crawford and Burnham (1946) and has been validated by Witkin et al. (1977). It has previously been used with geology students (e.g., Titus and Horsman, 2009; Almquist et al., 2011; Ormand et al., 2014).

All three tests are drawn from the empirical literature and have reasonable surface and content validity. To ensure that the measures were appropriate for our sample, we calculated Cronbach’s alpha using imputed values for questions that were left blank. We used full-information maximum likelihood estimation to create ten imputed data sets. Average scores across these data sets are used to calculate reliability estimates. The instrument as a whole demonstrated sufficient reliability (α = 0.7), with two of three subscales scoring just below conventional thresholds (mental rotation and penetrative thinking, α = 0.6). The final subscale had low internal consistency (disembedding, α = 0.3); therefore, while reported, results on this aspect of spatial reasoning are interpreted conservatively.
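As an illustration of the reliability estimate described above, the following minimal Python sketch applies the standard Cronbach's alpha formula to item scores coded 0 (incorrect) or 1 (correct). The function name `cronbach_alpha` and the response matrix are hypothetical and are not data from this study; the sketch also omits the imputation step.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per student, one column per item)."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across students.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: five students, four items.
toy_responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))  # 0.8 for this toy matrix
```

Alpha rises when items vary together, so a low value such as the one reported for disembedding suggests that the items on that subscale do not move together strongly in this sample.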

Data

In addition to the assessment of spatial skills, we collected demographic and educational data on students’ personal characteristics, academic and college preparation, and play experiences that may have provided spatial skill training. Some academic information (e.g., standardized test scores, college courses completed, and declared major) and demographic data (e.g., gender) were provided by the university’s Office of the Registrar. All other student data (e.g., motivation, high school course work, and game playing experiences) were derived from a self-report survey administered to students following the spatial skill assessment.

The widely used and statistically validated Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich et al., 1991) was used to document students’ motivations for learning and their learning strategies and cognition. Following Hilpert et al. (2013), we administered a 43-item version of the full MSLQ to calculate values for the six subscales that explore students’ affective domains with respect to learning (i.e., self-efficacy, control of learning, intrinsic goals, task value, metacognitive regulation, and effort regulation). Students completed the questionnaire as part of their course homework in the first week of classes.

Academic preparation of the participating students was measured using the index score of the Colorado Commission of Higher Education (CCHE, 2015); this score combines students’ high school grade point averages and performance on standardized tests. The possible index values range from 45 to 146, with study participants having index scores ranging from 75 to 146 (mean [M] = 114; standard deviation [SD] = 12.3). We centered student indices around the mean, assigning scores higher than 114 positive values (ranging from 1 to 32), indicating achievement above the mean, and scores lower than 114 as negative (ranging from −1 to −39), indicating student achievement below the mean, to aid in interpretation of linear regression models. By doing so, the intercept in a regression analysis indicates the predicted spatial test score for students with average achievement scores.
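The centering step can be sketched in a few lines of Python. The sample mean of 114 and the observed range of 75 to 146 come from the text; the function and constant names are illustrative assumptions.

```python
SAMPLE_MEAN_INDEX = 114  # mean CCHE index reported for the sample

def center_index(raw_index, mean=SAMPLE_MEAN_INDEX):
    """Express an index score as its distance from the sample mean."""
    return raw_index - mean

highest = center_index(146)  # top of the observed range
lowest = center_index(75)    # bottom of the observed range
average = center_index(114)  # a student exactly at the mean
print(highest, lowest, average)  # 32 -39 0
```

Because a student at the mean has a centered value of 0, the index term drops out of the regression equation for that student, which is why the intercept can be read as the predicted score at average achievement.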

On the survey, students provided the type and number of pre-college science (physics, chemistry, and biology), engineering, and Earth science (including geography, environmental science, and astronomy) courses they had taken. The registrar’s office further provided the completed college-level courses. We counted the number of completed (minimum C grade) science, Earth science, and engineering and/or architecture courses both in college and high school for each participant.

On the survey, students further provided information about their experience with video game play. We asked the students to “list the top three video games they play(ed) the most” and to provide three open-ended answer options. We combined all video games that were listed in the responses. A total of 285 different video games were listed. Following Adams’ (2009) video game categories in Fundamentals of Game Design, we created a rubric that included common video game types and structures. Combining Adams’ definitions with work from Spence and Feng (2010) and Granic et al. (2014), we were able to identify six video game categories (Table 2). Five coders classified the 285 games into these six categories. We compared the five coders’ classifications and accepted game assignments with an agreement between coders of 80% or more. In cases where fewer than 80% of the coders agreed, a master coder revisited the code and assigned a category based on an additional review of the code and external sources (e.g., categories that are assigned by the gaming companies or by vending platforms such as Amazon). To create the video gaming variable, we summed for each participant the number of video games in all six categories of those students who played video games (n = 268). Thus, any student could be assigned a value from 0 to 3 for each of the six game categories.
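The coding rule described above can be sketched as follows, assuming each coder supplies one category label per game. The game names, the votes, and the helper names `consensus_category` and `category_counts` are hypothetical; only the 80% threshold and the use of five coders come from the text.

```python
from collections import Counter

AGREEMENT_THRESHOLD = 0.8  # share of coders that must agree

def consensus_category(codes, threshold=AGREEMENT_THRESHOLD):
    """Return the most common category if enough coders agree, otherwise None."""
    category, votes = Counter(codes).most_common(1)[0]
    if votes / len(codes) >= threshold:
        return category
    return None  # flag the game for review by the master coder

def category_counts(listed_games, game_categories):
    """Count how many of a student's listed games fall into each category."""
    return Counter(game_categories[g] for g in listed_games if g in game_categories)

# Hypothetical votes from five coders for two made-up games.
coder_votes = {
    "Game A": ["action", "action", "action", "action", "sports"],
    "Game B": ["action", "construction", "construction", "sports", "action"],
}
resolved = {game: consensus_category(codes) for game, codes in coder_votes.items()}

# Hypothetical final assignments and one student's three listed games.
final_categories = {"Game A": "action", "Game C": "action", "Game B": "construction"}
student_counts = category_counts(["Game A", "Game C", "Game B"], final_categories)
print(resolved, dict(student_counts))
```

With five coders, the 80% threshold means that at least four must agree. In the example, "Game A" is accepted as an action game, while "Game B" returns `None` and would go to the master coder. Because each student lists at most three games, each per-category count falls between 0 and 3.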

On the survey, students also indicated how often they had played with construction-based toys, listing Legos®, blocks, connectors, and Magna-Tiles® as examples of construction-based toys. Students selected from five frequency options ranging from “Very Often” to “Never.” We dichotomized the measure to reflect the qualitative difference between playing “very and quite often” versus “sometimes, rarely, and never” into frequent or infrequent play (Table 3).
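A minimal sketch of this dichotomization is shown below. The text names only the end points of the scale and the two groupings, so the exact wording of the three intermediate options and the function name `is_frequent_player` are assumptions.

```python
FREQUENT = {"very often", "quite often"}
INFREQUENT = {"sometimes", "rarely", "never"}

def is_frequent_player(response):
    """Map a five-level frequency response to True (frequent) or False (infrequent)."""
    answer = response.strip().lower()
    if answer in FREQUENT:
        return True
    if answer in INFREQUENT:
        return False
    raise ValueError(f"unexpected response: {response!r}")

coded = [is_frequent_player(r) for r in ["Very Often", "Sometimes", "Never", "Quite Often"]]
print(coded)  # [True, False, False, True]
```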

Lastly, we also asked the students to “list the three sports or physical activities that they participated in the most throughout their life.” Students listed a total of 92 unique sports or physical activities. We sorted the sports by the number of times each sport appeared; Table 4 lists the top 11 most popular sports.

Analysis

All descriptive and inferential statistics were calculated using SPSS Statistics Version 23. The spatial score was calculated as a percentage of correctly solved test items. Blank responses were treated as incorrectly solved items. This approach allowed us to compare the number of items a student can correctly solve under a given time restriction. Normality of the distribution of the spatial skill scores (Table 5) was tested to ensure that our analyses are valid. We then examined bivariate relationships between the resultant spatial skill scores and all other potential explanatory variables (i.e., personal characteristics, academic and college preparation, and play experiences). These bivariate associations generally suggested that ordinary least squares (OLS) regression was appropriate for examining and explaining differences in spatial reasoning, with results from residual plots indicating linear relationships and constant error variance. Thus, we proceeded to fit baseline OLS regression models that describe the distribution of spatial skills. We built a series of seven OLS regression models to explore which factors explain the distribution of students’ spatial skills. In each model, we introduced a new explanatory variable. Thus, OLS regression models A through G (Tables 6–9) helped us estimate, on average, how a variety of characteristics were associated with observed spatial skill scores in a large undergraduate student sample. In two summary OLS regression models (models H and J; Table 9), we explored the overall factors that appear to influence spatial skills.
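The scoring rule described above (percent correct, with blanks counted as incorrect) can be sketched as follows. The answer key, the responses, and the function name `percent_correct` are hypothetical; the analysis itself was run in SPSS.

```python
def percent_correct(responses, answer_key):
    """Score a test as percent correct; None marks an item left blank."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key must have the same length")
    # A blank (None) never equals a key entry, so it is counted as incorrect.
    correct = sum(1 for given, key in zip(responses, answer_key) if given == key)
    return 100 * correct / len(answer_key)

key = ["B", "D", "A", "C", "E"]        # hypothetical answer key
answers = ["B", "D", "C", None, None]  # two right, one wrong, two blank
score = percent_correct(answers, key)
print(score)  # 40.0
```

Dividing by the full number of items, rather than by the number attempted, is what makes the score reflect how many items a student solves within the time limit.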

Distribution of Participants’ Spatial Skills

The time limit for completion of each spatial test resulted in overall low total scores. Students left, on average, 31% of the questions blank. The mean score for the three-part instrument was 33.9% (SD = 11.9) correct responses (low kurtosis and skewness scores, <±1, indicate normal distribution and justify use of means rather than medians). The total scores ranged from 6% to 75% correct, showing substantial variation in participants’ spatial reasoning skills. Descriptive statistics for the three tests in the instrument are listed in Table 5.
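The skewness and kurtosis screen mentioned above can be illustrated with a short Python sketch using simple moment-based formulas. The data are hypothetical, and statistical packages may apply small-sample corrections, so their reported values can differ slightly from these.

```python
from statistics import fmean

def central_moment(values, order):
    """Average of the deviations from the mean raised to the given power."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])

def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical, symmetric toy data
skew = skewness(scores)
kurt = excess_kurtosis(scores)
within_rule_of_thumb = abs(skew) < 1 and abs(kurt) < 1
print(skew, kurt, within_rule_of_thumb)
```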

One possible explanation of the distribution of spatial skill scores is that they may be partially explainable by general intelligence. Accordingly, we examined the partial correlations between the three different aspects of spatial thinking and our closest measured proxy for intelligence, standardized test index scores. Participants’ mental rotation and penetrative thinking skills showed a significant (p ≤ 0.001) but weak positive partial correlation of 0.333, indicating that they are somewhat related but different skills associated more closely with each other than with standardized test scores (partial r = 0.070). However, disembedding appeared to be a different, unrelated skill (bivariate correlations: mental rotation and disembedding 0.002; penetrative thinking and disembedding 0.076). Low scores on the disembedding test indicated this was the most difficult test, which is corroborated by students’ comments after taking the test and is in line with the “high difficulty” rating by ETS.
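For readers unfamiliar with partial correlations, the following sketch shows the standard first-order formula, which removes the influence of a third variable z from the correlation between x and y. The input correlations are hypothetical and are not values from this study.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical bivariate correlations.
r_partial = partial_correlation(r_xy=0.4, r_xz=0.2, r_yz=0.2)
print(round(r_partial, 3))  # 0.375

# If z is unrelated to both x and y, controlling for it changes nothing.
r_unchanged = partial_correlation(r_xy=0.4, r_xz=0.0, r_yz=0.0)
```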

Personal Characteristics

Demographic and educational data for students are summarized in Table 1. Because the student population is limited in its racial and ethnic diversity, those factors are not considered in the reporting of the results.

Gender

The majority (61.2%, n = 211) of the students were male. Our data suggested a statistically significant difference between male and female students in overall spatial skills (b = −3.4, p ≤ 0.001, regression model A1; Table 6 and Fig. 2) and on one of three subscales (mental rotation: b = −8.3, p ≤ 0), with small to medium effect sizes (Cohen’s d = 0.29 and 0.42) for overall spatial reasoning and mental rotation, respectively. Because of the significant gender difference for mental rotation, we ran all subsequent OLS models with adjustments for gender on both the total spatial skill score and the mental rotation test score.
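The effect sizes above are reported as Cohen's d. A minimal sketch of one common way to compute it, dividing the difference in group means by the pooled sample standard deviation, is given below. The group scores are hypothetical, and the text does not state which variant of d was used.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

group_one = [2, 4, 6]  # hypothetical scores
group_two = [1, 3, 5]
d = cohens_d(group_one, group_two)
print(d)  # 0.5
```

By Cohen's conventional benchmarks, values near 0.2 are considered small and values near 0.5 medium, which is the basis for describing 0.29 and 0.42 as small to medium.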

Motivation

Using students’ MSLQ scores, we calculated values for the six subscales of that instrument (self-efficacy, control of learning, intrinsic goals, task value, metacognitive regulation, and effort regulation) and entered those subscales as explanatory variables in regression models of spatial skill scores (Table 6). Only two of these six subscales yielded statistically significant results at the p ≤ 0.05 level after controlling for gender. Students with higher scores on the subscale “task value” (measured on a 7-point Likert scale) scored higher on the spatial skills tests (b = 1.548, p ≤ 0.05, regression model B1, Table 6). “Task value” indicates that students view their learning of a task or content as something important and useful. Similarly, students with higher confidence in their ability to master a task or learning content (“self-efficacy”) demonstrated higher spatial skills (b = 3.894, p ≤ 0.0001). “Effort regulation,” a student’s ability to continue with their study efforts despite difficulties or distractions (Hilpert et al., 2013), did not quite reach conventional criteria for significance (Table 6). When using mental rotation as the dependent variable instead of the overall score, metacognition showed a negative association with spatial skills (b = −3.98, p ≤ 0.05, regression model B2, Table 6).

In regression model B1, motivation appears to explain the gender gap in overall spatial skills because there is no statistical significance of being female as an explanatory variable. However, when we further control for standardized test scores in regression models C1 and C2 (Table 6), the effect of gender is no longer mediated, and men display higher spatial scores, on average, than women with comparable CCHE index scores and levels of motivation. This suggests some suppression of gender differences in spatial test scores when we do not account for previous performance on standardized tests, which may be related to test-taking skills or general intelligence. In fact, women in the sample displayed significantly higher index scores than men (t = 3.467, p < 0.001), but, as indicated by model C1, this advantage is not enough to account for the gender gap in spatial skills.

Academic and College Preparation

Standardized Test Scores

Regression analysis suggests that students with higher standardized test scores performed significantly better on the spatial skills tests (b = 0.186, p ≤ 0.002, regression model C1, Table 6). However, bivariate correlations between the CCHE academic preparation index and the spatial skills test indicate only a weak positive correlation (r = 0.242, p ≤ 0.01). We interpret these contrasting results to mean that academic preparation test scores cannot fully explain the variation in spatial skills, but may act as a proxy for general intelligence, which is known to correlate with spatial skills (Coyle and Pillow, 2008; Koenig et al., 2008). While selected items on some standardized tests may require spatial reasoning (e.g., Kastens et al., 2014), the standardized test index used in our study was derived from SAT or ACT scores that have not been shown to accurately measure spatial skills.

Given the theoretical relevance of standardized test scores for spatial reasoning, as well as selection into college (and thus, our sample), we control for academic preparation (i.e., general intelligence) in all subsequent regression models discussed below. However, because this paper is primarily focused on sources of informal spatial skills training rather than innate traits of students, we do not formally discuss the findings for standardized index scores in great detail in subsequent models. It is important to note that any remaining significant associations with spatial skills in the following models are indicative of meaningful differences in spatial skills beyond those that can be explained by general intelligence, alone, as is demonstrated by the persistence of gender differences in models C1 and C2.

Selection of College Major

In this study, 50.7% (n = 175) of the participants were non-STEM majors, 37.7% (n = 130) were STEM majors (13.3% geology majors; n = 46), and 11.6% (n = 40) had not yet declared their major. Students pursuing a STEM major displayed ∼5% higher spatial skills when compared to the non-STEM majors (b = −4.7, p ≤ 0.0001, regression model D1, Table 7) even after adjusting for standardized test scores (i.e., proxy for general intelligence) and gender. Spatial skill test scores did not differ significantly between the full sample (N = 345) and introductory course participants (n = 276). The students who had not yet declared a major showed significantly lower spatial scores (b = −4.7, p ≤ 0.05, regression model D1, Table 7) than the STEM majors.

Introductory versus Advanced Geology Course Students

Our data set does not suggest a significant difference between geology majors in introductory geology classes (n = 17) and geology majors in advanced geology courses (n = 29), the majority of whom are male students (84%). Whether this means spatial skills do not increase with increased training in the discipline at the studied university is unclear. The lack of a significant difference is likely a result of having inadequate statistical power due to the small participant numbers. Differences in spatial reasoning skills between introductory and advanced students would be a fruitful avenue for future research with a larger sample.

Completion of Prior STEM Course Work

Analysis of the number of completed science, Earth science, and engineering and/or architecture courses both in college and in high school for each participant showed, not surprisingly, the number of science courses taken was moderately correlated with choice of a STEM major (r = 0.51, p ≤ 0.0001). Thus, we tested the impact of prior STEM course work in regression models E1 (overall skill score) and E2 (mental rotation score) with adjustment for differences in gender and standardized test scores (Table 7). On average, participating students had taken 2.5 science courses (SD = 1.3; median [Mdn] = 1.0), 1.4 Earth science courses (SD = 3.5, Mdn = 1.0), and 0.5 engineering courses (SD = 1.8; Mdn = 0.00). For each additional science course taken, students’ spatial scores improved by ∼0.6% independent of other controls (b = 0.572, p ≤ 0.0001, model E1, Table 7). No significant correlation was observed for engineering coursework.

Life Experiences and Childhood Play

Experience with Video Gaming

When asked about their video gaming experience and habits, 22.3% of the participating students indicated that they never played video games. Men (79%) were nominally more likely to report playing video games than women (75%), though the difference in play was not significant (z = 0.821, p = 0.412). Table 10 shows the categories of games that students most frequently played as well as a weighted average that signifies whether students prefer games from a certain category. Students who indicated that they played action games showed significantly higher spatial skills on the combined test (b = 1.938, p ≤ 0.05, regression model F1, Table 8) than those who did not play games. Construction games (b = 3.57, p ≤ 0.1) and sports games (b = −2.46, p ≤ 0.1) were marginally significant predictors for mental rotation scores. No other video gaming appeared to impact spatial skills in our student sample.

Experience Playing with Construction-Based Toys

Playing with construction-based toys may be an important source of informal spatial training for children that is likely to vary systematically by gender due to the socialization process. Not surprisingly, men (76%) were much more likely to have frequently played with construction-based toys such as Legos as a child than were women (42%) in the sample, with the difference being highly significant (z = 5.862, p < 0.001). Moreover, our data suggest a significant positive association between frequent play with construction-based toys and spatial skills (b = 3.65, p = 0.017, regression model G1, Table 8) after controlling for video games, gender, and index test scores, displayed in a 4% increase in spatial skill scores.
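The gender comparisons of play are reported as z statistics. The text does not name the test, so the sketch below assumes a pooled two-proportion z-test and uses hypothetical counts; it is not expected to reproduce the reported values.

```python
from math import erfc, sqrt

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z statistic and its two-sided p-value."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value

# Hypothetical counts: 60 of 100 in one group versus 40 of 100 in the other.
z, p = two_proportion_z(60, 100, 40, 100)
print(round(z, 3), round(p, 4))
```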

Experience with Sports or Physical Activities

Each of the most popular sports or physical activities (Table 4) was entered as a variable into similar regression models (not displayed). However, none of the sports appeared to explain the distribution of spatial skills, because they did not reach statistical significance. It may be that the frequency of play, rather than sport preference, is a more critical source of informal spatial skills training. Future research should consider this dimension of physical activity when investigating its potential association with spatial reasoning.

Combined Experiences

We combine the statistically significant variables identified in models A through G into new multivariate regression models (models H and J, Table 9) to more fully examine how all of these factors, together, explain a wider distribution of students’ measured spatial reasoning skills. These final two models (Table 9) include gender, standardized test index scores, two motivation factors (task value and self-efficacy), major, the frequency of playing with construction-based toys, and the frequency of playing certain types of video games (i.e., action, construction, and sports games). Importantly, gender (bfemale = −3.3, Cohen’s d = 0.28, p < 0.05), standardized tests (b = 0.137, p < 0.05), self-efficacy (b = 2.1, p < 0.01), and selection of major (bnon-STEM = −4.1, p < 0.01) continue to be important predictors of overall spatial skills (regression model H1). However, the gender effect is fully mediated (b = −2.142, Cohen’s d = 0.18, p > 0.10) once we adjust for the frequency of playing with construction-based toys (model J1; b = 2.712, p < 0.10).
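The mediation logic described here can be illustrated with a small, fully synthetic example. In the sketch below, the scores are constructed so that toy play raises the score by 4 points and gender has no direct effect. The baseline of 40, the effect of 4, and the group sizes of 100 are all hypothetical; only the 76% and 42% play rates echo the text. A regression that omits toy play attributes part of the toy effect to gender, and adding the toy variable removes that apparent gender effect. This is a sketch of the general technique, not the specification of models H1 and J1.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan solve)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# SYNTHETIC cells: (female, frequent_toy_play, count). Counts are hypothetical;
# they mirror the 76% (men) and 42% (women) play rates for 100 students each.
cells = [(0, 1, 76), (0, 0, 24), (1, 1, 42), (1, 0, 58)]
rows = []
for female, toys, count in cells:
    for i in range(count):
        noise = 1.0 if i % 2 == 0 else -1.0      # balanced residual within each cell
        rows.append((female, toys, 40.0 + 4.0 * toys + noise))

y = [score for _, _, score in rows]
b_without_toys = ols([[1.0, f] for f, _, _ in rows], y)      # intercept, female
b_with_toys = ols([[1.0, f, t] for f, t, _ in rows], y)      # intercept, female, toys

print("female coefficient, toys omitted: ", round(b_without_toys[1], 3))
print("female coefficient, toys included:", round(b_with_toys[1], 3))
```

In real data the gender coefficient rarely drops to exactly zero; the relevant pattern is that it shrinks and loses statistical significance once the mediator enters the model.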

Figure 3 depicts what the predicted gender differences in total skill scores (from regression model J1, Table 9) would look like for three different hypothetical scenarios. First is a normative scenario in which men “frequently” played with construction-based toys and women did not. This highlights the gender gap present in the earlier models. The second and third scenarios in Figure 3 demonstrate that there is no gender gap between women who “frequently” played with these toys and men who did not, or between women and men who both played frequently with those toys. Importantly, the other characteristics of the hypothetical male and female students depicted in these scenarios are held constant and reflect the model-predicted spatial skill test scores of a STEM-major student with average CCHE index scores (M = 114), “high” self-efficacy (6 out of 7), and one action game listed among their favorite video games. The height of the bars reflects the resulting predicted values for overall spatial test scores using the regression equation represented by model J1 in Table 9, with only toy and gender manipulated to demonstrate the above-described scenarios; error bars display the 95% confidence interval for gender in the model. The results in Figure 3 imply that, while the variables in models H1 and J1 overall fail to explain the vast majority of the variation in student spatial skill test scores (R² = 0.19), they do suggest that the widely observed gender disparity in spatial skills may be largely the result of different socialization practices (including different styles of play) during childhood and adolescence.
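The three scenarios can be expressed as simple arithmetic on the two model J1 coefficients quoted in the text (gender b = −2.142; construction-based toys b = 2.712). The sketch below assumes a purely additive model with no gender-by-toy interaction, so the intercept and all covariates that are held constant cancel out of the male-minus-female difference. It yields point estimates only. Whether a given gap is statistically distinguishable from zero depends on standard errors that are not reported in this section.

```python
# Coefficients quoted in the text for model J1 (units: percent correct).
B_FEMALE = -2.142   # female vs. male, other covariates held constant
B_TOYS = 2.712      # frequent construction-based toy play vs. not

def predicted_gap(men_played_frequently, women_played_frequently):
    """Predicted male-minus-female difference in overall score.

    ASSUMPTION: additive model, so every term shared by both hypothetical
    students (intercept, test scores, self-efficacy, major, games) cancels.
    """
    men = B_TOYS * int(men_played_frequently)
    women = B_FEMALE + B_TOYS * int(women_played_frequently)
    return men - women

scenarios = {
    "men frequent, women not": predicted_gap(True, False),
    "women frequent, men not": predicted_gap(False, True),
    "both frequent": predicted_gap(True, True),
}
for label, gap in scenarios.items():
    print(f"{label}: {gap:+.3f}")
```

The first scenario stacks both coefficients, whereas in the second the toy effect slightly more than offsets the gender coefficient, which is the pattern the figure is described as showing.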

Prior research demonstrated the importance of spatial reasoning skills for performance, success, and retention in STEM disciplines, particularly in the geosciences. Several meta-analyses have established that spatial skills can be trained through indirect (e.g., course work or video games) and direct interventions (e.g., spatial skills training) (Baenninger and Newcombe, 1989; Uttal et al., 2013). Most studies measured the effect immediately after the intervention; retention of trained skills is much harder to measure and was not part of most prior studies (Uttal et al., 2013). Herein, we focus on the retention of spatial skills through cumulative life experiences as measured in baseline skills early in undergraduate course work. We identified personal characteristics, academic preparations, and life experiences that appear most successful in developing spatial skills by early adulthood. Understanding which factors correlate strongly with spatial skills in undergraduate students enrolled in a geology course provides insight into effective ways that formal and informal training can be shaped.

As summarized in our final regression models (Table 9), gender is a consistent and significant predictor of spatial skills among undergraduate students enrolled in geology courses, with women demonstrating lower spatial skills than men even after controlling for several other factors related to spatial reasoning skills. We also found correlations between motivation for learning and spatial skills, suggesting that training and retention of spatial skills throughout life are guided by motivation. Spatial skills are also associated with academic preparation and training factors such as standardized test scores, selection of a STEM versus non-STEM major, and number of STEM courses taken. Among other life experiences, we find that spatial skills are higher among those who play action-oriented video games and those who played with construction-based toys as children. Importantly, after adjusting for all of these factors, the gender difference in spatial reasoning is no longer significant, implying that gender differences may be a result of greater informal training of boys in childhood and adolescence due to differences in the gender socialization process (Epstein and Ward, 2011).

Many STEM Students Enter College without Adequate Development of Spatial Reasoning Skills

The wide range of spatial reasoning skills displayed among undergraduate students in this study implies that K–12 education does not provide effective formal training of spatial skills for these students. In fact, we likely underestimate the range of spatial skills among graduating high school students because participants in this study have already passed a rigorous research university admissions process. Other studies have suggested significant positive benefits on career outcomes for individuals with higher spatial skills (e.g., Shea et al., 2001; Webb et al., 2007; Wai et al., 2009; Kell et al., 2013). K–12 education provides an obvious opportunity for training spatial skills, especially given the range of simple interventions that are likely effective in improving academic performance when provided at early ages.

The observed weak correlation between standardized test scores and spatial skill performance indicates that a systematic assessment of spatial skills in all students would reveal information about a yet-untested layer of students’ academic potential. Similar findings have been reported from longitudinal studies (Shea et al., 2001; Webb et al., 2007; Wai et al., 2009). Thus, quantifying spatial skills could facilitate optimized support for all students (e.g., through remediation or additional training for students with low spatial skill scores) and also provide opportunities for identifying talent that may not be apparent in standardized test scores alone (e.g., Kell et al., 2013). Providing formal training opportunities might also increase the number of students who are cognitively able to succeed in spatial tasks and, thus, increase the potential pool of students who successfully enter a geoscience or STEM career. Also, students with high standardized test scores but low spatial skills could potentially participate in training interventions before entering a geoscience or STEM discipline to increase their success in solving spatially demanding tasks that might otherwise prove difficult.

We find that students with an interest in and motivation for learning (task orientation), especially those with high self-efficacy about their learning success, display significantly higher spatial skills even after controlling for a proxy for general intelligence (standardized test scores). This result supports previous findings that have documented the positive influence of the affective domain (and, thus, motivation) on student learning in general (e.g., Wolters and Pintrich, 1998; Robbins et al., 2004; Dweck, 2006; McConnell and van Der Hoeven Kraft, 2011). Our findings imply that students with an interest in learning and high confidence about their learning success might have developed stronger spatial reasoning skills than their less motivated, less confident peers.

Our results also suggest that STEM college majors have statistically significantly higher spatial skills than students pursuing non-STEM majors. However, our study design does not identify causality. Are students with more developed spatial skills more likely to enter STEM disciplines? Or, do the science courses students take as part of their major train spatial skills in students? A closer look at the students who had not yet declared a major (n = 40) provides some insight. A year after the assessment, seven of the students had selected a STEM major, and 14 had selected a non-STEM major (12 students dropped out; eight were still undeclared). An independent samples t-test reveals that these students’ overall spatial skill scores do not exhibit a significant difference by newly declared major (t = 1.15, p = 0.264). Nevertheless, the STEM group’s mean scores (overall: M = 38.5, SD = 8.15; mental rotation: M = 50.0, SD = 17.32) are higher than those of the non-STEM group (overall: M = 32.9, SD = 11.52; mental rotation: M = 37.1, SD = 15.90), implying that some aspect of spatial skills may drive the selection of major. Further work is necessary to confirm this trend with a larger sample size and with a research design more suitable for identifying causality.
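The independent samples t-test for this small follow-up group can be approximated from the summary statistics alone. The sketch below makes three assumptions: the group sizes are the seven STEM and 14 non-STEM students named in the text, the first mean and standard deviation in each pair refer to the overall score and the second to mental rotation, and a pooled-variance (equal variances) test was used. Under those assumptions the overall comparison gives a t statistic of about 1.14, close to the reported t = 1.15, which lends some support to that reading.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t statistic (equal variances assumed) from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

# Means and SDs from the text; n = 7 (STEM) and n = 14 (non-STEM) are ASSUMED
# to be the group sizes behind those summary statistics.
t_overall, df = pooled_t(38.5, 8.15, 7, 32.9, 11.52, 14)
t_rotation, _ = pooled_t(50.0, 17.32, 7, 37.1, 15.90, 14)

print(f"overall score:   t({df}) = {t_overall:.2f}")
print(f"mental rotation: t({df}) = {t_rotation:.2f}")
```

With only 19 degrees of freedom, both comparisons have low statistical power, which is consistent with the call for a larger sample.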

Spatial Reasoning Skills Need Not Be Gender Specific

Increasing the diversity of geoscience or STEM graduates (and, subsequently, the geoscience or STEM workforce) is an important goal, both with respect to equity concerns and because the potential for increasing participation in STEM is highest in populations that are currently underrepresented. Females are an underrepresented group frequently targeted by programs designed to increase participation in the STEM workforce (Burke, 2007; Griffith, 2010; Fealing et al., 2015). As predicted by previous research (Linn and Petersen, 1985; Baenninger and Newcombe, 1989; Voyer et al., 1995; Parsons et al., 2004; Terlecki et al., 2008), males in our sample significantly outperformed females on spatial skill scores (Tables 6–9). However, the majority of the observed gender difference stems from different mental rotation skills. Depending on model specifications, males achieve 8% to 16% higher performance on mental rotation tests than females with medium to medium-large effect sizes (Cohen’s d = 0.42 to 0.63). No statistically significant gender differences were apparent for the other spatial skills tested (disembedding and penetrative thinking) or for the overall measure in the most fully developed model (model J1, Table 9).

In thinking about mediating the gender difference, an obvious question is whether observed disparities can be attributed to biological differences between the sexes or to sociocultural experiences inherent in the gender socialization process. Our findings indicate that sociocultural experiences, such as types of childhood toy play, can explain gender differences in spatial skills. In our study, 62% of students (75.6% of males and 41.5% of females) self-reported that they played frequently with construction-based toys. Students who played with construction-based toys as children showed significantly better performance on spatial skills tests, especially the mental rotation test. Once we also account for differences in motivation, test scores (i.e., intelligence), and major (model J1), play type fully mediates the gender disparity in total spatial skill scores (Table 9). Further, this mediation is not just an artifact of multicollinearity, since our analysis reveals only a weak correlation between gender and the frequency of playing with construction-based toys (r = 0.344, p ≤ 0.01). The data imply that childhood play with construction-based toys has a lasting training effect on males and females. Our study thus expands on the findings of Richardson (1994) and others (e.g., Moè, 2009) who found that gender differences observed for spatial skills may be attenuated by educational and sociocultural experiences, and that those experiences may be more important than biological predisposition for explaining differences in spatial skills. This is given further support in our sample, which exhibited significantly higher scores among female students (t = 3.467, p < 0.001) than their male counterparts on a proxy for general intelligence. These results point to the importance (and opportunity!) of sociocultural experiences over biological explanations. Thus, educational interventions, as well as less formal interventions such as encouraging construction-based play in girls, are likely to eliminate gender differences in spatial reasoning that may deter entry and success in geoscience or STEM fields.
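The correlation between gender and frequent toy play can be cross-checked against the percentages reported in the same paragraph. The sketch below assumes that r is a Pearson correlation between two binary indicators (a phi coefficient) and that the 62%, 75.6%, and 41.5% figures all refer to the same respondents. The implied share of men is derived from those percentages and is not a figure stated in the text.

```python
import math

# Percentages reported in the text.
p_male, p_female, p_all = 0.756, 0.415, 0.62

# Share of men implied by p_all = w * p_male + (1 - w) * p_female (DERIVED, not reported).
male_share = (p_all - p_female) / (p_male - p_female)

# Phi coefficient for two binary variables:
# covariance = w * (1 - w) * (p_male - p_female), divided by both standard deviations.
phi = ((p_male - p_female) * math.sqrt(male_share * (1 - male_share))
       / math.sqrt(p_all * (1 - p_all)))

print(f"implied share of men: {male_share:.3f}")
print(f"phi coefficient:      {phi:.3f}")
```

Under these assumptions the result is close to the reported r = 0.344; the sign of the coefficient depends only on how gender is coded.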

In contrast, when we focus on mental rotation skills, the gender difference in our study does not diminish with frequency of childhood play (model J2; Table 9), likely because the gender difference for mental rotation skills is very strong (model H2; Table 9). A controlled experimental study (Wolfgang et al., 2003) in which nine- to 14-year-olds were engaged in 20 hours of Lego play similarly implied persistence in spatial skill differences between the genders. This suggests that the cumulative effect of childhood play, including play at a young age, may be important for forming spatial reasoning skills for both genders. This result is important because it demonstrates the persistence of the training effects of early childhood activities.

Spatial Thinking Skills Can Be Developed in Many Ways

The cumulative effect of video gaming throughout an individual’s life appears to be correlated with an increase in overall spatial skills for those who frequently played action games, with weaker evidence suggesting potential impacts on mental rotation for those playing construction-based and sports-focused video games. This implies that the cumulative training of video game play over many years may have a lasting effect. It also appears to be detectable even outside a controlled setting, though the effect does seem to differ by the types of games being played. Our results corroborate findings from others who suggested that spatial reasoning skills improve with exposure to video gaming, especially with action games (Subrahmanyam and Greenfield, 1994; Feng et al., 2007; Adams and Mayer, 2012), sports games, and construction games (e.g., Okagaki and Frensch, 1994; De Lisi and Wolford, 2002; Sims and Mayer, 2002; Terlecki et al., 2008). However, we did not collect the needed data to test whether these findings are attributable to genre or frequency of play (e.g., students preferring action games may play more frequently than those playing other types of games).

Most of the prior studies on gaming explored the effect of controlled interventions (consisting of video game play) on gains in spatial skills using experimental designs. Such studies suggested that gaming’s short-term effect can account for the gender gap in spatial reasoning (Feng et al., 2007; Terlecki et al., 2008). However, the long-term data about video gaming as captured in our study suggest that video gaming’s long-term mediating effect on the gender gap may be more modest than its short-term effect, because we found no evidence for such mediation. While the self-reported data we collected for childhood play are subject to recall bias, there is little reason to suggest that this would systematically influence our findings, which are largely in line with prior work. Video gaming thus may offer one potential avenue for lasting interventions to develop spatial skills. The play-based nature of such training might appeal to many student populations, even students who are difficult to motivate with more traditional learning approaches.

We did not detect a statistically significant correlation between sports play and students’ spatial skills. A lack of specificity in the questionnaire may explain this result. Specifically, we did not document the amount of time that participants spent playing sports. Therefore, the results do not distinguish between athletes who engaged in a sport multiple times a week over years versus students who recreationally played the same sport on an occasional basis. Studies that found improvements in spatial skills with sports play (Ozel et al., 2002, 2004; Moreau et al., 2011, 2012) only included athletes who play sports frequently or students who were enrolled in regular sports programs, which may help account for why we did not observe a significant association of sports with any spatial skill outcomes. The long-term training effects of different sports should be the subject of future research, though it would be important to distinguish between type and frequency of play.

Limitations

The factors explored in the OLS models combine to explain only ∼20% of the observed variability in the spatial skill test scores (R² = 19% for total spatial skill test results, R² = 24% for mental rotation test results; models H and J; Table 9). Yet, the statistically significant trends in our data imply that the associations described are meaningful. The remaining variability in spatial skills and/or test performance must be associated with factors not isolated in our study. These could include other training experiences such as prior development of navigation skills, training in music, or other arts experience. It is also possible that our instrument did not accurately measure students’ maximum spatial reasoning potential, given the low-stakes character of the test (results were not part of the course grade), and, consequently, students might not have given the tests their best efforts. Alternatively, students subject to test anxiety might have been stressed by the time limit we put on our test.

The three spatial skills tested in this study only address visualization of intrinsic (within objects) spatial relations and not extrinsic (between objects) skills. However, extrinsic skills are also highly relevant for success in some STEM disciplines (Newcombe and Shipley, 2015). This may explain, in part, why two of the three spatial skill tests we conducted had relatively low internal consistency—there may be some omitted factors related to extrinsic skills. Future work should expand testing strategies to include extrinsic spatial skills.

Implications

Our findings suggest an opportunity to train spatial skills and provide all students with the necessary tools to succeed in STEM disciplines, including the geosciences. Newly developed science education standards for K–12 education in the United States (NGSS Lead States, 2013) focus on integrating “scientific practices” and “cross-cutting concepts,” providing an opportunity for regular training of spatial skills. Spatial reasoning, however, is only explicitly described for the Earth science curriculum (standard code: MS-ESS2-2, HS-ESS2-1). Curriculum developers who are creating new classroom materials to support teaching the Next Generation Science Standards (NGSS) may address this shortcoming by including spatially demanding components in other STEM curricular materials. For example, mental rotation could be part of engineering-focused instruction, or penetrative thinking could be included in chemistry or life science when discussing molecular or cell structures. Even more simply, informal training through instruction that incorporates construction-based toy play or video gaming has the potential to improve spatial reasoning skills and reduce gender disparities without a need to fundamentally restructure curricula.

Our sample of 345 undergraduate students at a large U.S. research university documents an uneven distribution in intrinsic spatial skills among students—a set of skills that are shown to be an important predictor for success in STEM disciplines. The wide distribution of spatial skills observed demonstrates the need for both formal and informal spatial skill training in order to provide equal opportunities for students in STEM disciplines, including the geosciences. Our results suggest that the lack of formal spatial skill training is mitigated among certain student groups through informal training during extracurricular activities throughout childhood and early adulthood. We find a statistically significant association between spatial skill scores and gender, motivation for learning, standardized test scores, selection of STEM or non-STEM major, prior STEM course work, playing with construction-based toys, and video gaming.

When several important motivational and play experiences are considered, including childhood play with construction-based toys, female students’ average spatial skills scores become statistically indistinguishable from male students’ scores. This finding indicates that the informal nature of childhood spatial skill development might systematically disadvantage females, who are often socialized to engage in activities that do not prioritize spatial interaction in the same way as traditionally male-affiliated activities (e.g., video gaming or playing with construction-based toys). Our results also highlight opportunities for reducing the gender gap; for example, early interventions, such as encouraging construction-based play in girls, may prove effective for improving spatial reasoning and facilitating success in STEM fields among the female population. Furthermore, systematic testing of spatial skills and formal training opportunities for students throughout their K–12 and college education might increase the number of students who are cognitively able to succeed in spatial tasks and, thus, increase the potential pool of students who successfully enter a geoscience or STEM career.

The authors would like to thank Thomas F. (Tim) Shipley for valuable input in the study design. We thank Jennifer Stroh, Alyssa Quintanilla, Peter Cirkovic, Alexander Swindell, and Mauricio Munoz for data entry; Samuel Gold for coding video games; and Susan Lynds for copyediting the manuscript. Funding for the study was provided to Anne Gold through a Chancellor’s Award of the University of Colorado in Boulder and a fellowship from the German Exchange Council (DAAD). Anke Friedrich hosted Anne Gold during the academic year 2015–2016 at the Ludwig-Maximilian University in Munich, Germany. Comments and suggestions from anonymous reviewers and Guest Associate Editor Julie Libarkin have significantly improved the quality of the manuscript.

Science Editor: Shanaka de Silva
Guest Associate Editor: Julie Libarkin