Regional till and weathered bedrock geochemical datasets provide a basis for data analysis and modelling in glaciated terrains. These large-scale surface geochemical datasets have great potential in mineral exploration, especially when machine learning and clustering methods are used to reduce the dimensionality of multivariate datasets. Here, self-organizing maps (SOMs) followed by k-means clustering were used to model the target areas and to produce initial prospectivity maps for gold (Au) in central Lapland, northern Finland. Because the till and weathered bedrock datasets are legacy data, an effort was made to level the data between the map sheets. The targeting till geochemical dataset did not contain the typical indicator elements for Au. Instead, elements associated with the Au deposits within the study area were used. Indicator associations in this study were Ni–Co with possible Cu. Elemental clusters resulting from the SOMs were flagged as interesting according to their element distributions. For the till, two potential clusters were identified: Ni–Co–Cr and Cu–V–Co. For the weathered bedrock, three clusters were specified: Ni–Co–Cr, V–Cu and Cu–Co. This study shows the potential of using legacy datasets for early targeting stages of mineral exploration, potentially reducing the footprint of mineral exploration.

For the past few years, Finnish Lapland has attracted increased interest for mineral exploration. For example, exploration for gold in Finnish Lapland has been ongoing for several decades (e.g. Eilu et al. 2007). Despite the decades-long history of exploration, the region is still considered to be under-explored, and highly prospective for various metals (e.g. Niiranen et al. 2015). The most prospective area for economic mineral deposits in northern Finland is the Central Lapland Belt (CLB), representing a Paleoproterozoic volcano-sedimentary sequence formed in a rifting Archean basement during a ∼500 Ma depositional interval between ∼2.44 and 1.90 Ga, and subsequently deformed during the Lapland–Kola orogeny between 1.93 and 1.77 Ga (e.g. Berthelsen and Marker 1986; Ward et al. 1989; Lehtonen et al. 1998; Tuisku and Huhma 1998; Daly et al. 2001; Evins and Laajoki 2002; Lahtinen et al. 2005; Hölttä et al. 2007; Köykkä et al. 2019). The CLB hosts the largest gold deposit in Europe (the Suurikuusikko Au deposit) as well as the Kevitsa Ni–Cu–PGE mine, along with several smaller metallic occurrences (Saattopora Au–Cu, Pahtavaara Au, Iso-Kuotko Au, Koitelainen V–Ti–Cr–PGE). Other notable discoveries include the Ikkari gold deposit (>4 Moz Au; Rupert Resources 2022) and the Sakatti Ni–Cu–PGE deposit (e.g. Brownscombe et al. 2015). The CLB is analogous to other greenstone belts, such as the Superior province in Canada, and Yilgarn in Western Australia (e.g. Robert and Poulsen 1997; Witt and Vanderhor 1998; Goldfarb et al. 2001; Patison 2007).

Multiple glaciation events have occurred in the Northern Hemisphere during the Quaternary period. This can be observed as a glaciogenic sediment cover several metres thick within glaciated terrains, such as in Fennoscandia. In the centres of large glaciers, under cold-based subglacial conditions, together with the forming of an ice divide zone, thick glacially transported deposits are preserved (Hirvas 1991; Johansson et al. 2011). Such conditions were prevalent repeatedly in central Lapland. As such, some older glaciogenic sediment layers and underlying pre-glacial, deeply weathered bedrock have survived the glacial erosion. It is thus difficult to study the local bedrock without soil excavation or drilling. Consequently, exploration in Finnish glaciogenic terrain has often relied upon geochemical till sampling, surface boulder studies and morphological mapping techniques based on Light Detection and Ranging (LiDAR; e.g. Lindroos 1981; Kauranne et al. 1992; Sarala et al. 2007; Sarala 2015a). In addition, in the case of the ice divide zone, pre-glacial weathered bedrock sampling and research have also been used. The same types of methods are largely used in other glaciated terrains, such as in parts of Fennoscandia, Canada and northwestern Russia (e.g. Klassen 1997; McClenaghan 2005; Broster et al. 2009).

The Geological Survey of Finland (GTK) has collected and analysed regional till samples (specifically, till and pre-glacial weathered bedrock) from the central Lapland area. These datasets are part of the targeting till geochemical data, also called the line till geochemical data, which were gathered and analysed in the early 1970s to early 1980s (Gustavsson et al. 1979). The data have been the subject of several studies (Hall et al. 2015; Sarala 2015a; Taivalkoski 2017, 2019; Raatikainen 2021) and have been evaluated for their accuracy and reliability (Taivalkoski 2017, 2019), and for their useability in mineral exploration (Taivalkoski 2017; Raatikainen 2021; Puchhammer et al. 2024). Taivalkoski (2017, 2019) concluded that the existing geochemical data performed well in the identification of metal anomalies, with some challenges related to the spatial distribution of element concentrations, mainly between different map sheets.

Modern mineral exploration utilizes computer-based data analysis and simulated models during the early stage of target generation. Over the years, computer capacity and modelling methods have developed considerably. Simultaneously, the amount of data produced yearly has also increased. To manage these growing amounts of data, the development of various software, such as ArcGIS, ioGAS, LeapfrogGEO and R, has made the visualization, classification, handling and modelling of relevant data rapid and efficient. Recently, methods based on neural networks and artificial intelligence, such as machine learning or self-organizing maps (SOMs), have started to be used in mineral exploration (e.g. Porwal et al. 2003, 2004; Fraser and Dickson 2007; Cracknell et al. 2015; Hulkki 2015; Carranza and Laborte 2016; Zekri et al. 2019; Torppa et al. 2021; Chudasama et al. 2022; Daviran et al. 2022; Piippo et al. 2022; Zuo and Carranza 2023).

SOMs are based on neural networks, and are used in an unsupervised clustering method capable of processing multivariate data (Kohonen 2001). As such, this method has gained popularity in mineral exploration (e.g. Fraser and Dickson 2007; Fraser and Hodgkinson 2009; Cracknell et al. 2015; Hulkki 2015; Tayebi and Tangestani 2015; Torppa et al. 2015; Carranza and Laborte 2016; Leväniemi et al. 2017; Daviran et al. 2022, 2023; Piippo et al. 2022). Here, SOMs were combined with k-means clustering to provide clustered and grouped metal associations as indicators for Au. The associations were needed due to the dataset lacking Au values and other indicator element values. The basis of the SOM is to randomly assign data points to a grid, subsequently testing all other points against these randomly assigned points. Clusters form when the best matching units are found. The best matching units have the most similar values with the randomly assigned point. Clustering is used to classify and group the data (Vesanto and Alhoniemi 2000; Bierlein et al. 2008; Torppa et al. 2021; Chudasama et al. 2022).

SOMs and comparable methods are effective for greenfield exploration efforts and areas without prior deposit knowledge. This is due to the data mining and pattern recognition abilities of such methods, after which experts can interpret the results. The method could be described as non-invasive, and furthermore it adheres to the green data principles. This is achieved with the use of old datasets, which reduce the need to go to the field for sampling. In other words, the contaminants associated with transporting samplers from place to place are avoided, thus minimizing the impact on the environment.

The aim of this study was to evaluate the usage of targeting till geochemical data, collected from the CLB by the GTK (GTK 2024), for mineral exploration targeting. The data were coupled with multivariate geochemical data processing methods, in this case SOMs, to produce prospectivity maps for the target area.

The study area is approximately 11 300 km², located primarily within the municipality of Sodankylä in central Lapland (Fig. 1a). Geologically, the study area belongs to the CLB and is located within the ice divide zone of the last glaciation event, which resulted in an average of roughly 2–3 m of soil cover overlying the pre-glacial weathered bedrock (Hirvas 1991).

Bedrock geology of the Central Lapland Belt

The CLB is comprised of Paleoproterozoic (∼2.50–1.88 Ga) supracrustal rocks, with mafic to ultramafic intrusives (2.44–2.05 Ga), as well as felsic porphyritic and lamprophyric rocks (1.92–1.88 Ga), together with syn- and post-orogenic granitoids (1.88–1.80 Ga) (Fig. 1; Lehtonen et al. 1998; Hanski and Huhma 2005; Hölttä et al. 2007; Brownscombe et al. 2015; Köykkä et al. 2019; Köykkä and Luukas 2021). The latest stratigraphical division of the CLB was presented by Köykkä et al. (2019), who divided the stratigraphy into six groups, from oldest to youngest: the Salla, Kuusamo, Sodankylä and Savukoski groups, the allochthonous Kittilä suite and the Kumpu group. These are correlated with tectonic stages starting from (1) initial rifting/early syn-rift (Salla group and Kuusamo group), (2) syn-rift (Sodankylä group), (3) syn-rift to early post-rift (Sodankylä group), (4) a passive margin stage (Kittilä suite and Savukoski group), and (5) a foreland basin system (Kumpu group) (Fig. 2).

Three main stages of deformation during the Svecofennian orogeny between c. 1.92 and 1.77 Ga (D1–D3), as well as a subsequent brittle deformational stage (D4), affected the rocks in the CLB (see e.g. Sayab et al. 2020 and references therein). Metamorphic facies in the CLB range from greenschist to amphibolite facies (e.g. Hölttä and Heilimo 2017).

Surficial geology of the Central Lapland Belt

About 90% of the study area is covered by till and peat deposits, and 5% by sand and gravel. Boulder fields cover about 4% of the study area, and exposed bedrock the remainder (Johansson and Kujansuu 2005). Glacial till is the most common type of surficial sediment in the study area, owing to the multiple glaciation events that occurred during the Quaternary period. Evidence of these glaciation events is seen in the six recognized till beds (Hirvas 1991). According to Hirvas (1991) and Johansson et al. (2011), the first three (I–III) till beds are interpreted to represent the most recent glaciation event, the Weichselian glaciation (c. 110–11.7 ka), the fourth (IV) till bed is ascribed to the Saalian glaciation (c. 300–130 ka) and the fifth and sixth (V and VI) till beds may represent the Elsterian (c. >300 ka) glacial stage, or a pre-Elsterian (c. >500 ka) glacial event. The sixth (VI) till bed may represent the Cromerian stage (c. >500 ka) (Hirvas 1991).

The mean depth of glaciogenic sediments within the study area is generally from 2 to 3 m (Hirvas 1991). However, in some areas, the depth of sediments may exceed 40 m (Hirvas 1991; GTK's targeting till data).

The till beds are underlain by pre-glacial weathered bedrock, which is a relatively unique feature in northern, glaciated terrains. This is due to cold-based glacial conditions, and the existence of the ice divide zone in central Lapland leading to minimal glacier movement and basal glacial erosion. As such, the weathered bedrock remains well-preserved under the historical ice divide zone. Some mixing of till and weathered bedrock has occurred, indicated by the presence of weathered bedrock before the deposition of till (Hirvas 1991; Johansson et al. 2011).

The weathering of the local bedrock predates the deposition of till. For example, the Sokli area, in eastern Lapland, contains one of the deepest weathering profiles (>100 m). Most of the weathering of local bedrock is believed to have occurred when the study area was at considerably lower latitudes (Hirvas 1991; Islam et al. 2002; Johansson and Kujansuu 2005). While it is possible for chemical weathering to occur in such cold climates due to the presence of organic acids (Munroe et al. 2007), weathering on such a large scale requires a warm and wet climate.

A definitive time frame for the weathering of the local bedrock remains uncertain (Hyyppä 1983). The earliest possible starting period for this weathering process is the Cambrian period (>541 Ma). Other suggested starting periods are the Carboniferous to Permian (280–345 Ma), Jurassic to Cretaceous (135–200 Ma) and Eocene to Oligocene (25–50 Ma) (see Johansson and Kujansuu 2005). The latest starting period suggested is the Neogene (late Tertiary) (<23 Ma) (Hirvas 1991; Hall et al. 2015). Note that different areas of central Lapland may have experienced weathering at different times (Hirvas 1991; Hall et al. 2023).

In central Lapland, further contributions to the soil are made by bogs. Notably, the Sodankylä area is the primary aapa mire occurrence area in central Lapland (Johansson and Kujansuu 2005). This could be problematic with regard to the interpretation of geochemical data in this area, as peatlands can be enriched in heavy metals due to organic metal complexing. This may cause false anomalies (i.e. too high a concentration of heavy metals).

Mineral deposits of the Central Lapland Belt

The CLB is one of the most important metallogenic belts in the Fennoscandian Shield, hosting both precious and base metal-rich mineral deposits (Fig. 1). For example, the Kevitsa mafic–ultramafic layered intrusion hosts the Kevitsa Ni–Cu–PGE deposit. The deposit is in the centre of the study area (Mutanen 1997; Mutanen and Huhma 2001).

The Sakatti Ni–Cu–PGE deposit is located in the municipality of Sodankylä. The Sakatti deposit is associated with komatiitic to picritic magmatism (Brownscombe et al. 2015; Konnunaho 2016). The mineralization occurs as disseminated, vein, semi-massive and massive sulfide mineralization. The principal ore minerals in the Sakatti deposit are pentlandite, pyrrhotite, chalcopyrite and pyrite (Brownscombe et al. 2015).

The Pahtavaara Au deposit is located within the municipality of Sodankylä. The host rock is a pyroclastic, ultramafic volcanic unit (Korkiakoski 1992), which is part of the Savukoski group (Mutanen 1997), formerly classified as the Sattasvaara komatiite complex (Saverikko 1985). Gold is present in the native form and is hosted by carbonate-rich veins (Korkiakoski 1992).

The Ikkari Au deposit occurs within a structurally complex area with altered and metamorphosed ultramafic-mafic and sedimentary rocks belonging to the Savukoski and Kumpu groups, respectively. Gold occurs in the native form, and is strongly associated with pyrite (Rupert Resources 2022).

The Koitelainen deposit is a Cr–V–Ti–PGE–Au deposit. The deposit is located in the municipality of Sodankylä, to the south of the Loka reservoir, within the Koitelainen layered intrusion. The main occurrence of chromium and PGEs is hosted by oxide minerals, such as chromite, ilmenomagnetite and ilmenite (Mutanen 1997; Mutanen and Huhma 2001).

Geochemical data

The geochemical dataset was provided by the GTK. The dataset is part of the targeting till dataset collected and analysed in the 1970s and 1980s (Gustavsson et al. 1979; Taivalkoski 2017). The dataset consists of 17 elements: Si, Al, Fe, Mg, Ca, Na, K, Ti, V, Cr, Mn, Co, Ni, Cu, Zn, Pb and Ag.

The sampling method varied between sampling campaigns. Most of the samples were collected using a percussion drilling method (Cobra drill), and from test pits. In addition, samples were collected using auger drilling (not recommended due to the poor stratigraphical control).

The collected samples include upper sediment (i.e. till, sand, gravel, silt) and weathered bedrock. The selected dataset from the study area includes about 10 200 sampling points (drilled holes), consisting of about 54 300 till samples, and around 3600 weathered bedrock samples. A single sample point typically includes several till samples, and often a weathered bedrock sample (if the bedrock surface was reached). The data also include the profile sampling points, where the till samples were collected at 1-m intervals from the surface to the bedrock surface. For till, the deepest sample points were included, reducing the total number of samples to 52 700. For weathered bedrock data, the uppermost samples were included. This reduced the data points to 4200.

The analytical method was based on semi-quantitative optical emission spectrometry using an ARL Model 31000 emission quantometer (EKV) (Gustavsson et al. 1979). The feeder and excitation unit were Danielsson's tape machine (Danielsson and Sundkvist 1959a, b; Danielsson et al. 1959). This instrument uses an adhesive tape onto which the sample material is fed, and subsequently transported to the spark gap where the sample is excited/vaporized (Danielsson et al. 1959). The excitation takes place in argon gas by a low-voltage electric arc for 10 seconds (Gustavsson et al. 1979).

Given the different sampling campaigns, problems with the data levelling between different map sheets have been found (Salminen and Hartikainen 1985; Salminen 1995; Taivalkoski 2017, 2019). Further difficulties have been encountered with some data symbology, specifically the question mark (?) and asterisk (*) symbols. To manage these difficulties, the data marked by question marks and asterisks were removed.

Pre-processing workflow for the geochemical data

The targeting till data obtained by sampling techniques other than percussion drilling or test pits were rejected. Of the sediment sample data, only the till sample data were chosen. All available data from weathered bedrock samples were accepted for further processing. The targeting till dataset was separated into till data and weathered bedrock data.

Data points recorded as below the detection limit were replaced with half of the reported detection limit. For weathered bedrock, this could be done more precisely, and values below the detection limit were replaced according to the detection limit of their respective map sheet. Values over the upper detection limit were used as recorded, for both datasets. Zinc, Pb and Ag, each with >85% of the samples below the detection limit, were not used in the data analysis. Aluminium was also not used in the analysis because levelling its values gave questionable results. Finally, 13 elements were used for further treatment: Ti, V, Ni, Co, Cr, Cu, Si, Fe, Mg, Ca, Na, Mn and K.
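The substitution and screening rules above can be sketched in a few lines. This is only an illustration: the encoding of below-detection-limit values as `None`, the function names and the example numbers are assumptions, not the format of the GTK dataset.

```python
def substitute_censored(values, detection_limit):
    """Replace values recorded as below the detection limit with half that limit.

    Assumption: a censored measurement is encoded as None.
    """
    return [detection_limit / 2 if v is None else v for v in values]


def censored_fraction(values):
    """Fraction of measurements recorded as below the detection limit."""
    return sum(v is None for v in values) / len(values)


def keep_element(values, max_censored=0.85):
    """Screening rule from the text: drop elements with >85% censored values."""
    return censored_fraction(values) <= max_censored


# Hypothetical concentrations for one element on one map sheet.
example = [12.0, None, 30.0, None]
cleaned = substitute_censored(example, detection_limit=10.0)
```

For the weathered bedrock case, the same function could be called once per map sheet with that sheet's own detection limit.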

As stated, the concentration levels in the data differed between map sheets, and thus a levelling effort had to be made. Both datasets were subjected to the response ratio (rr) technique (Mann et al. 2005; Folger and Gray 2013). For the rr technique, a map sheet is selected. The values of an element within that map sheet are sorted in ascending order, and the mean of the lowest quartile is calculated (Mann et al. 2005; Folger and Gray 2013). Every concentration value of that element within that map sheet is then divided by this mean.
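A minimal sketch of the rr calculation for one element on one map sheet follows. The function name and the handling of very small samples (at least one value in the lowest quartile) are assumptions made for illustration.

```python
def response_ratio(values):
    """Divide each value by the mean of the lowest quartile of the values.

    Assumes strictly positive concentrations, i.e. censored values have
    already been substituted.
    """
    ordered = sorted(values)
    n_low = max(1, len(ordered) // 4)  # size of the lowest quartile
    background = sum(ordered[:n_low]) / n_low
    return [v / background for v in values]


# Hypothetical Ni values (ppm) from a single map sheet.
ni_rr = response_ratio([8.0, 2.0, 4.0, 6.0])
```

The result is dimensionless: a value of 1 corresponds to the background level of that map sheet, which is what makes sheets with different absolute levels comparable.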

After the rr technique, levelling was done between map sheets as proposed by Daneshfar and Cameron (1998) (Fig. 3). Two adjacent map sheets are chosen (Figs 3, 4). To each side, from the boundary of the map sheets, equally wide bands are used (Fig. 3). In this case, 5 km bands were used. Every data point within the sampling area (Fig. 3) was selected and compared against the points of the other map sheet in a qq plot. Before subjecting the rr-transformed values to the qq plot, they were logarithmically transformed (Daneshfar and Cameron 1998; Williams 2021). The qq plot was drawn with the ioGAS software. The reduced major axis (RMA) was utilized in ioGAS for plotting the regression line (Till 1974; Demetriades et al. 2022). The regression line is used as a levelling function for the map sheet, in which the values are deemed skewed. After levelling a map sheet, the calculated fixed values are used to level the next adjacent map(s) (Fig. 4).
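The boundary-band levelling can be sketched as follows. This is a simplified illustration rather than the ioGAS procedure: the quantile interpolation, the number of quantiles, the back-transformation from log space and all function names are assumptions.

```python
import math
from statistics import mean, pstdev


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def rma_fit(x, y):
    """Reduced major axis line y = intercept + slope * x.

    Quantile pairs are both non-decreasing, so the slope is taken as positive.
    """
    slope = pstdev(y) / pstdev(x)
    return slope, mean(y) - slope * mean(x)


def level_sheet(target_band, reference_band, target_values, n_q=99):
    """Level a sheet against a reference sheet using their boundary bands."""
    probs = [(i + 1) / (n_q + 1) for i in range(n_q)]
    tx = sorted(math.log10(v) for v in target_band)
    ry = sorted(math.log10(v) for v in reference_band)
    qx = [quantile(tx, p) for p in probs]  # qq plot, x axis
    qy = [quantile(ry, p) for p in probs]  # qq plot, y axis
    slope, intercept = rma_fit(qx, qy)
    # Apply the regression line as the levelling function, then back-transform.
    return [10 ** (intercept + slope * math.log10(v)) for v in target_values]
```

If the band values of the target sheet were exactly twice those of the reference sheet, the fitted slope would be 1 and every value of the target sheet would be halved.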

The levelling, for both datasets, started from map sheet 3714. This was because it had distinguishable characteristics, such as the Pahtavaara area, which is noticeable in many of the geochemical maps. Using map sheet 3714 as a reference, the neighbouring map sheets 3712 and 3732 were levelled (Fig. 4). Depending on how pronounced the differences between 3714 and 3723 were, 3723 could be levelled by using 3714 as reference. For certain elements, 3713 was more suitable to be levelled by 3714 or 3731. The same style of levelling was prevalent for 3722, where the usual route was through 3721. However, for certain elements, using 3724 gave better results.

The levelling routes described above were the most common for the till dataset. For the weathered bedrock, several changes were made. Map sheets 3711 and 3713 were removed (Fig. 4). Together these two sheets had about 30 data points. Furthermore, there were very few data points within the 5 km boundary bands, making the results questionable at best. It was also noticed that the best way to level 3721, 3722 and 3732 was in the direction of the dotted line shown in Figure 4.

Self-organizing maps

SOMs are an unsupervised, neural-network-based clustering method proposed by Kohonen (2001). As with any neural network, the SOM is based on mathematical functions that are intended to emulate the neuron networks of a brain. The SOM, specifically, is based on self-organizing phenomena (Kohonen 2001). For geology, SOMs are useful for a number of reasons; one of the main reasons is that with SOMs the input multidimensional data can be reduced to 2D (and, in some cases, 3D) format (Fig. 5). In other words, multivariate data can be reduced to a simple map. The pattern recognition capabilities of SOMs are the other reason. This is based on the SOM algorithm finding similarities between the vectors of the input data. In this study, the GisSOM software developed by the GTK was used; it can be downloaded from GitHub. For specific questions about the functions and processes of the software, it is recommended to read the manual by Hautala et al. (2021) or contact GTK.

In the SOM method, the variables of data points are converted into n-dimensional vectors (nD). After which, the number of nodes, their geometry and architecture are decided; this is referred to as the grid space. According to Vesanto and Alhoniemi (2000) and Hautala et al. (2021), the number of nodes and the grid space architecture can be calculated using the following function,
$$A = \sqrt{5\sqrt{n}} \qquad (1)$$
where A is the rounded number for SOM architecture. This means that if the result of function (1) were 10.56, the architecture for the SOM is rounded to 11. Thus, the overall architecture is 11 rows by 11 columns. The number of data points (or vectors) in the inserted dataset is denoted by n.
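As a worked illustration, the sketch below assumes that function (1) follows the commonly used heuristic of about 5√n map units on a square grid, so that the side length is the square root of that number. The function name is hypothetical, and the exact form used in GisSOM should be checked against its manual.

```python
import math


def som_side_length(n):
    """Side length A of a square SOM grid for n input vectors.

    Assumes total map units ~ 5 * sqrt(n), rounded half up to an integer side.
    """
    side = math.sqrt(5 * math.sqrt(n))
    return math.floor(side + 0.5)


# With the 52 700 till samples reported above, this gives a side of 34,
# i.e. a 34 x 34 grid.
till_side = som_side_length(52_700)
```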

For every node i, a random data vector is assigned from the input dataset. This data vector is referred to as a codebook vector (Kohonen 2001), or a seed vector (Fraser and Dickson 2007). Subsequently, vector x from the input layer is compared against all codebook vectors (Fig. 5). The node i on the grid whose vector values are most similar to the compared x (shortest Euclidean distance) is referred to as the best matching unit (BMU). When the BMU is found, the properties of the weight vector w, which is related to the codebook vector, are modified so that the resulting w more closely resembles the vector x (Fig. 5). Subsequently, the properties of the neighbouring nodes are also altered, to a degree that depends on their proximity to the BMU (Fig. 5). Here the chosen geometry has an effect: if a square geometry is chosen, there are four immediate neighbours, whereas if a hexagonal shape is chosen, there are six. The same procedure is repeated for every x in the dataset for N iterations. The goal of the process is for the SOM output layer to resemble the distribution of the input layer in a d-dimensional (2D or 3D) surface map.
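The training loop described above can be sketched as follows. This is a generic, minimal SOM and not the GisSOM implementation: the rectangular grid (the study used hexagonal nodes), the Gaussian neighbourhood, the linearly decaying learning rate and radius, and all names are assumptions.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(codebook, key=lambda node: math.dist(codebook[node], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise every node with a randomly chosen input vector.
    codebook = {(r, c): list(rng.choice(data))
                for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = 1 - step / total_steps
            lr = lr0 * frac                      # decaying learning rate
            radius = max(radius0 * frac, 0.5)    # shrinking neighbourhood
            bmu = best_matching_unit(codebook, x)
            for node, w in codebook.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for j in range(len(w)):
                    # Pull the weight towards x, strongest at the BMU.
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return codebook
```

Because every update moves a weight part of the way towards an input vector, the trained weights always stay within the range spanned by the input data.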

The result can be separated into individual weight maps, also called feature maps. If a clustering function was used, a cluster map and a u-matrix can also be extracted. Together, these are the SOMs.

After completing the SOM iterations, a simplified codebook vector clustering for the results can be compiled. Here, k-means clustering was used to simplify the results. In k-means clustering, central points, or centroids, are created. First, the data points are assigned randomly to the centroids. These groups of points function as proto-clusters. The number of centroids needed is decided by giving a value to k. In other words, k is the number of centroids, and thus the number of clusters. Assigning a value to k might need to be done multiple times. When a value is decided, the cluster centroids are computed for all k number of clusters. The data points are moved to the cluster whose centroid is the closest based on Euclidean distance. The aim is to create a k number of clusters with properties that discriminate clusters substantially from each other.
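A minimal sketch of the k-means procedure described above is given below, applied to any list of vectors (for example, SOM codebook vectors). The random initial partition follows the description in the text; the reseeding of empty clusters, the stopping rule and the names are assumptions.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after at most `iterations` update rounds."""
    rng = random.Random(seed)
    # Random initial assignment of points to k proto-clusters.
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(iterations):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # assumption: reseed an empty cluster
                members = [rng.choice(points)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Move each point to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels, centroids
```

In practice the procedure is run for several values of k, and the resulting partitions are compared with a clustering index.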

To evaluate how well the clustering has performed, several indices can be used. In this study, the Davies–Bouldin (DB) index (Davies and Bouldin 1979) was used because, at the time this study was performed, it was the only indexing method included in the GisSOM software. The DB index is used to infer the relative appropriateness of data partitioning (Davies and Bouldin 1979). The resulting value is a system-wide average of the similarity of each cluster with its most similar cluster. The assumption for similarity is that the data density is a decreasing function of distance from a vector characteristic of a cluster. Generally, with the DB index, the lower its value, the better the clustering result (Davies and Bouldin 1979; Torppa et al. 2021).
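For illustration, a common formulation of the DB index is sketched below: cluster scatter is the mean Euclidean distance of members to their centroid, and cluster separation is the distance between centroids. Whether GisSOM uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition (lower is better).

    Requires at least two clusters with distinct centroids.
    """
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        scatter[c] = sum(math.dist(p, centroids[c]) for p in members) / len(members)
    # For each cluster, take the similarity to its most similar other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)


pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
compact = davies_bouldin(pts, [0, 0, 1, 1])    # tight, well-separated clusters
stretched = davies_bouldin(pts, [0, 1, 0, 1])  # elongated, close clusters
```

In this toy case the compact partition scores 0.2 and the stretched one 5.0, which matches the rule that a lower value indicates a better partition.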

A hexagonal node shape was utilized. In total, 1000 learning cycles were used for the SOMs (termed epochs in the software). For k-means clustering, 100 iterations were used with a maximum k value of 25 (Fig. 6). These configurations, in the software, were used for both datasets.

For the till geochemical data, a 34 × 34-size architecture (Function 1) was used to construct the SOMs. Elemental component maps for till are shown in Figure 7a and Figure 8a. The u-matrix and clustering results are shown in Figure 9a and b, respectively. The DB index values for all 25 clustering variations are shown in Figure 6. The boxplots are shown in Figure 10a and Figure 11a. Based on k-means clustering, the data were classified into eight clusters (Fig. 9b), using the DB index as a tool to evaluate the optimal number of clusters (Fig. 6a). Figure 12 shows the resulting till prospectivity map.

Comparing the u-matrix (Fig. 9a, c) and the clustering results of the SOMs (Fig. 9b, d), clustering can be verified as follows: the u-matrix (unified distance matrix) indicates the closeness (or similarity) to the adjacent nodes on the map (Ultsch and Vetter 1994; Fraser and Dickson 2007). This is denoted by cool colours (shades of blue) for similar adjacent nodes, and warm colours (shades of red) for larger differences. A coloured u-matrix may represent a topographical region in which cool colours (blue) form a topographic low, or a ‘valley’ of similar nodes, and warm colours, topographically higher regions (or ‘walls’) that may represent class boundaries or different groups. The clusters from k-means clustering correlate relatively well with the areas in the u-matrix map. Furthermore, component planes of individual elements can also be used to predict the forming clusters (Fig. 7). In component planes, regions of higher weights of an element are indicated by warmer colours, and vice versa. All the component planes represent the same SOM space. Thus, higher weights in the same locations between two component planes indicate a correlation between those elements. As an example, in Figure 7a, Co, Cr and Ni have generally higher weights within the same location, implying a positive correlation, whereas Ti has very low values within the same area, implying a negative correlation. This same location in Figure 9b is marked as cluster 5.
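The u-matrix idea can be illustrated with a short sketch that computes, for every node, the mean Euclidean distance between its weight vector and those of its grid neighbours. A rectangular grid with four neighbours is assumed here for brevity (the study used hexagonal nodes), and the function name and the codebook layout `{(row, col): vector}` are hypothetical.

```python
import math


def u_matrix(codebook, rows, cols):
    """Mean distance from each node's weights to its grid neighbours' weights.

    Assumes a rectangular grid with at least two nodes.
    """
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if (r + dr, c + dc) in codebook]
            u[r][c] = sum(math.dist(codebook[(r, c)], codebook[n])
                          for n in neighbours) / len(neighbours)
    return u


# Toy 1 x 3 map: two similar nodes and one very different node.
toy = {(0, 0): [0.0], (0, 1): [0.0], (0, 2): [10.0]}
toy_u = u_matrix(toy, 1, 3)
```

Low values correspond to the 'valleys' of similar nodes, and high values to the 'walls' that may mark class boundaries.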

For till, the most interesting or the most meaningful clusters are deemed to be numbers 5 and 4 (Fig. 12b, c). This is based on elemental boxplots and their distribution (Fig. 10a), and the location of the bedrock units together with known deposits (Figs 1b and 12).

For the weathered bedrock data, a 19 × 19 SOM was calculated (Function 1), followed by the determination of seven clusters (Fig. 9d). The u-matrix can be seen in Figure 9c. The corresponding elemental component maps are in Figures 7b and 8b. The DB values for all weathered bedrock cluster variations can be found in Figure 6b. The boxplots for each cluster are shown in Figures 10b and 11b. The resulting weathered bedrock prospectivity map can be found in Figure 13. In the case of weathered bedrock, the number of clusters with the lowest DB value was not selected; the one with the second lowest value was used instead (Figs 6b, 9d). This is due to the assumption that weathered bedrock should be very representative of the underlying bedrock, so two clusters cannot describe the bedrock in the Sodankylä/CLB area. After choosing the second lowest DB value, three interesting clusters were recognized: clusters 2, 4 and 3 (Fig. 9d). This identification is based on the element boxplots and their spatial distribution (Figs 10b and 11b).

The evaluation of mineral potential and the targeting of different mineral deposit types in northern Finland using SOMs have previously been conducted by Chudasama et al. (2022) and Laakso et al. (2022). Regional till geochemical data have also been utilized in mineral exploration and targeting throughout Finland, as most of Finland is covered by surficial glacial deposits. Earlier studies focusing on the targeting of geochemical data have encountered problems with the artefacts and inequalities between the different map sheets, resulting from gradual changes in sampling methods, sample handling and complications regarding analysis (Gustavsson et al. 1979).

In this study, pre-processing, rr-transformation and the use of the method outlined by Daneshfar and Cameron (1998) for targeting till geochemical data resulted in more equal map sheets (e.g. Singer and Kouda 2001; Raatikainen 2021). The use of SOMs is an effective tool for the processing of multivariate data, and this has been noted previously (Vesanto and Alhoniemi 2000; Fraser and Dickson 2007; Bierlein et al. 2008; Fraser and Hodgkinson 2009; Hulkki 2015; Tayebi and Tangestani 2015; Torppa et al. 2015, 2021; Leväniemi et al. 2017; Ranta et al. 2021; Chudasama et al. 2022). Integrating SOMs with k-means clustering allows for the identification of the most prospective areas.

The results of the promising areas in a spatial context are shown in Figures 12 and 13, for the till and weathered bedrock geochemical data, respectively. The interesting clusters for till are clusters 5 and 4 (Fig. 12b, c). As highlighted in Figure 10a and the rest of the element distribution maps in Figure 11a, cluster 5 is characterized by high values for Co, Cr, Ni and Mg. Furthermore, cluster 5 has the lowest values for Ti, Na and K. Through geochemistry, this could be described as mafic-derived till debris. The spatial distribution of the data points of cluster 5 supports this deduction. Furthermore, some mixing with metasedimentary sources seems possible, such as graphite paraschist.

Cluster 4 has high values of Cu, V and Co (Fig. 10a). It is associated with mafic to ultra-mafic sources, because its data points are in close proximity to or within bedrock areas in which mafic to ultra-mafic rocks occur (Fig. 12c). Furthermore, the research area has four known Cu mineralization occurrences within it. Cluster 4 is located on top of two such mineralizations, east of Madetkoski. At Tepsa, cluster 4 is within 2 km of the Cu mineralization. For the three other mineralizations, the closest cluster 4 data point is 7 km or more away.

Another interesting cluster is 0, which has its points in the eastern, SE and southern corners of the study area (map sheets 3732, 3731, 3713 and 3711; see Fig. 4 and Fig. 12a). This cluster is interesting mainly because of its Ti distribution. Apart from Ti, the boxplots suggest that the sources for cluster 0 are feldspar-rich rocks such as granite or arkose quartzite. In other words, cluster 0 is derived from feldspar-rich rocks and has an elevated average Ti content.

Clusters 1, 3 and 6 occur in the same areas as cluster 0 (Fig. 12a). The only major differences between these three clusters are in their Na and K contents and, to a certain extent, their Si and Mg contents. The source for these clusters is expected to be the same as for cluster 0, namely the feldspar-rich rocks that commonly occur within the area occupied by clusters 1, 3 and 6.

Clusters 2 and 7 are distributed across the whole study area (Fig. 12a). Because of this, these two clusters are deemed unprospective. The difference between them can be seen in the scatterplots (Fig. 14), which show that cluster 2 has distinct levels of Ca, K, Mn and Na when compared with cluster 7. For cluster 7, the Si content sets it apart from cluster 2 (Fig. 11a).

For weathered bedrock, the most interesting clusters are 2, 4 and 3 (Fig. 13b, c). Cluster 2 is deemed the most interesting, and thus the most prospective, because its data points coincide closely with known mineralization. Well-indicated locations are the Pahtavaara area (Au), Tepsa (Cu), Kutuvuoma (Au, Cu), the Visasaari occurrence (Fe), Naattua (Au), Maaselkä (Cu, Co), Lomalampi (Cu, Au, Pd, Pt), Kirakka-aapa (mainly Au, with Ni, Cu, Co) and Allivuotso (Ni, Co) (Fig. 13b, c). Sakatti (multi-metal, mainly Ni, Cu), Kevitsa and Koitelainen each have a data point close by (Fig. 13b, c) and are therefore less clearly indicated. Cluster 4 is the second most interesting cluster. It is situated on top of two out of three Cu mineralizations. Most of its data points are located within mafic to ultra-mafic bedrock areas, and map sheets 3714 and 3723 (Fig. 4) host most of them. Cluster 3 is the smallest of the weathered bedrock clusters, with only 11 data points. Its Cu and Co values are very high, which makes it interesting. Moreover, the Maaselkä (Cu, Co) occurrence is indicated by this cluster, and the Kirakka-aapa (Au) occurrence has a data point less than 2 km away.

Cluster 0 mainly indicates mafic rocks, with some outlier concentrations associated with arkose quartzite and other metasedimentary rocks (Fig. 13a). Cluster 1 resides mainly on top of gabbroic, granodioritic and other mafic volcanic rock areas (Fig. 13a). Its elemental distributions (Figs 10 and 11b) show a clear mafic influence, although a sizeable number of the data points reside on the arkose quartzite areas, mainly in the SE part of the study area. Cluster 5 follows the mafic parts of the bedrock (Fig. 13a) and trends towards the Sodankylä group's mafic volcanic rocks. Most of its data points are contained within map sheets 3714, 3723 and 3724 (Fig. 4). Cluster 6 is the most widely spread cluster (Fig. 13a), although it has fewer data points within the eastern areas (map sheets 3741, 3732, 3731).

Some element associations have been recognized in earlier studies, such as the association of Co–Ni with Cu, which collectively indicate Au occurrences (see table 10.2.1 in Niiranen et al. 2015, and references therein). On this basis, cluster 5 of till and cluster 2 of weathered bedrock could be described as the most reliable geochemical indicator clusters for Au. This is further supported by Figures 12c and 13b, in which the markers of Au deposits appear to follow cluster 5 of till and cluster 2 of weathered bedrock, respectively. Thus, the dataset can be used for Au prospecting, despite the absence of widely accepted pathfinder elements such as As, Bi, Sb and Te.

SOMs also have potential for use in the targeting of other metal deposits. These include Ni–Cu–PGE and Cr–Co–Ni deposits, because the known deposits are indicated well by the clusters. The highest priority targets would be in areas where the till and weathered bedrock geochemical data contain overlapping clusters with the same element associations. For example, in this study, one interesting area is on the NW side of Tepsa (Figs 12c, 13b), where cluster 5 of the till geochemical data and cluster 2 of the weathered bedrock data meet. In that area, both clusters show a Ni–Co–Cr association. Although the clusters are not at precisely the same location, glacial transport could account for the difference between the locations in the prospectivity maps. Thus, knowledge of the glacial history of target areas is essential in the interpretation and implementation of the results of relevant datasets (see Sarala 2015b).
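The search for areas where a till cluster and a weathered bedrock cluster coincide, allowing for some displacement by glacial transport, can be sketched as a nearest-neighbour check. The example assumes point coordinates in a projected system with metre units. The function name `overlapping_targets` and the 2000 m tolerance are hypothetical and are not taken from the study. Ice-flow direction is ignored here, although a real application would probably need to take it into account.

```python
import math

def overlapping_targets(till_pts, bedrock_pts, max_offset_m=2000.0):
    """Pair each till point with its nearest weathered bedrock point.

    till_pts, bedrock_pts: lists of (easting, northing) tuples in metres.
    Returns (till_point, bedrock_point, distance_m) tuples for pairs whose
    separation does not exceed max_offset_m (an assumed tolerance).
    """
    if not bedrock_pts:
        return []
    pairs = []
    for t in till_pts:
        nearest = min(bedrock_pts, key=lambda b: math.dist(t, b))
        distance = math.dist(t, nearest)
        if distance <= max_offset_m:
            pairs.append((t, nearest, distance))
    return pairs
```

Calling the function with the points of till cluster 5 and weathered bedrock cluster 2 would return the candidate locations where both datasets show the same element association within the chosen tolerance.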

A possible source of error in the results relates to the methods used for collecting the samples and assigning soil types. In certain situations, sampling was carried out by summer trainees or other less-trained personnel. Consequently, some soil type assignments for the weathered bedrock and till samples could be erroneous, because it is difficult to distinguish highly weathered bedrock from sand and silt, or weathered bedrock with rock fragments from till, and vice versa. In other words, it is highly likely that some samples have been mislabelled. Because the till samples in this study represent the deepest till samples, it is possible that some of them are misidentified weathered bedrock samples. Likewise, some weathered bedrock samples may be misidentified till or other sediment samples. This could explain why the till geochemical results follow the local bedrock composition better than the weathered bedrock results.

Data analysis using rr-transformations and SOMs, followed by k-means clustering of the existing multivariate regional till geochemical and weathered bedrock geochemical datasets, shows their potential use in mineral exploration.

The clustering results could be divided into clusters that are interesting and uninteresting for prospecting. For till, the interesting clusters were 5 and 4, because their element and spatial distributions are favourable for prospecting. Using these distributions, it was possible to assign element associations to these clusters. For cluster 5, a Ni–Co–Cr association was recognized, and many Au deposits lie in close proximity to this cluster. For cluster 4, a Cu–V(–Co) association was recognized, with the spatial distribution indicating many Cr, Fe and V deposits as well as some Cu, Ni and Au deposits.

For the pre-glacial weathered bedrock, clusters 2, 4 and 3 were the most interesting for mineral exploration purposes. Of these three clusters, cluster 2 was the most interesting because its elemental distribution points towards the Ni–Co–Cr association and its spatial distribution indicates many Au, Cu, Ni and Co deposits. Furthermore, cluster 2 exhibits properties similar to those of cluster 5 in the till dataset. For these reasons, cluster 2 was the most useful cluster for Au exploration. Based on the same distribution factors, cluster 4 was deemed interesting for its V–Cu element association and its indication of Cu deposits. Cluster 3 was interesting for its Cu–Co element association and its spatial indication of Cu and Co deposits.

The use of till and weathered bedrock data in tandem provides an effective tool for targeting various mineral deposit types in central Lapland, including Au, Ni, Co, Cr, Cu and V.

All data (open access), as well as the GisSOM software, were provided by the Geological Survey of Finland (GTK). Johanna Torppa is thanked for her help with using the GisSOM software. Brandon Datar is thanked for checking the English of the manuscript. Anonymous experts are thanked for proposing valuable improvements to the earlier version of the text.

MR: conceptualization (equal), formal analysis (lead), investigation (lead), methodology (lead), visualization (lead), writing – original draft (lead); PS: conceptualization (equal), funding acquisition (equal), project administration (lead), supervision (lead), writing – review & editing (equal); J-PR: conceptualization (supporting), formal analysis (supporting), investigation (supporting), methodology (supporting), supervision (equal), writing – review & editing (equal).

This work was funded by the K. H. Renlund Foundation.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The datasets generated during and/or analysed during the current study are available in the Geological Survey of Finland repository, https://hakku.gtk.fi/en/locations?id=13.