SAMPLING OF CHEMICAL ATTRIBUTES IN FOREST SOILS

Information on sampling adequacy for representing the distribution of soil chemical attributes is fundamental for rationalizing the use of correctives and fertilizers. The objective was to evaluate the variability of these attributes and to determine the minimum number of composite samples needed to represent the fertility of forest soils. The total planted area was 9,101 ha, comprising 265 commercial eucalypt stands. A total of 687 composite soil samples were obtained for chemical analysis. The performance of two exploratory analysis techniques and six sampling procedures was evaluated. The attributes P, K, Ca, Mg and S presented the highest coefficients of variation (>35%). In contrast, the distributions of Al, organic matter and, mainly, pH were the most homogeneous. The sample error decreased as the number of composite samples increased. Representativeness of all chemical attributes (sample error of 5%) was achieved with a minimum of 309 (one per 29 ha, 1:29) and 295 (1:31) composite samples for simple random sampling and random sampling stratified by altitude class, respectively. Both procedures were promising for soil sampling, especially when the boxplot was applied to identify and remove outliers.


INTRODUCTION
Corrective and fertilizer recommendations are based on information from sampling procedures that, if properly carried out, contribute to achieving the desired forest productivity. The samples should be collected in a planned manner and adequately represent the fertility of the site to be fertilized (HEIM et al., 2009; OLIVEIRA et al., 2014). Sample units need to be distributed in sufficient quantity so that the estimates of nutrient means are accurate and reliable (CANTARUTTI et al., 1999; BRUS, 2015).
Brazil has 7.84 million hectares of eucalypt and pine stands and, in general, soil sampling has been carried out per plot, usually with an area of less than 50 ha. Although sampling is carried out per plot, fertilization recommendations are generic and do not consider stratification criteria such as soil type, geological formation, textural class or topography. The reasons for this generalization are operational ease or the technological limitations that companies face in implementing precision forestry. On the one hand, it is not difficult to find companies that adopt between one and three fertilization recommendations per crop year. Often, these companies plant more than 1,000 ha monthly. On the other hand, farmers who do not have adequate technological infrastructure rarely apply more than one fertilization recommendation, regardless of the size of the area to be planted.
Chemical analysis is routinely performed on a composite soil sample, formed by the homogeneous mixture of several simple samples. Given the predominance of research aimed at defining the number of simple samples needed to form a composite (MACHADO et al., 2007; SANTOS et al., 2009; LIMA et al., 2010; SANTOS et al., 2013; GUARÇONI et al., 2017), it becomes relevant to determine the number of composite samples needed to evaluate soil fertility. The sample size is influenced by the purpose of sampling, the precision required, the method of selection and distribution of sample units, the available resources and, above all, the variation of the characteristic of interest (SHIVER; BORDERS, 1996). The difficulty in representing fertility results from the spatial unevenness of soil chemical composition, since variations can occur naturally due to pedogenetic processes and/or anthropic actions such as the use and management of the soil (GÓMEZ et al., 2009; WASTOWSKI et al., 2010; TIAN et al., 2017; JIMÉNEZ-AGUIRRE et al., 2018; LEOPIZZI et al., 2018).
It is widely held that random sampling is not appropriate for determining fertility because of the spatial dependence of soil chemical attributes (GUARÇONI et al., 2017). However, random sampling itself, based on "classical" statistics, can generate independence between sample units when it is truly random, which would justify its use (SHIVER; BORDERS, 1996; GUARÇONI et al., 2017).
The simplest way to estimate the distribution of soil attributes is simple random sampling. It is a probabilistic procedure in which each population unit has the same chance of being selected, independently of the others selected (SHIVER; BORDERS, 1996). It is the fundamental procedure for selecting sample units, from which all other random sampling procedures were derived in order to obtain greater economy and/or precision. A random or zigzag distribution of collection points is widely used to obtain simple soil samples. Accordingly, several studies have analyzed the representativeness of chemical attributes by applying simple random sampling or geostatistical analysis (HEIM et al., 2009; LIMA et al., 2010; OLIVEIRA et al., 2014; OLIVEIRA et al., 2015; GUARÇONI et al., 2017). Defining strata is laborious compared with the unrestricted distribution of sample units, but it has great potential for controlling sample error (SHIVER; BORDERS, 1996).
Another common procedure is stratified random sampling, which is based on dividing the population into more homogeneous subpopulations called strata. In this procedure, sample units are selected randomly, with a restriction imposed a priori or a posteriori by the area of each stratum. Successful stratification depends on knowledge of the area, so that the strata are as homogeneous as possible internally and distinct from each other. Traditionally, soil sampling is done by first dividing the property into homogeneous plots for the distribution of collection points (CANTARUTTI et al., 1999; IAC, 2018).
The spatial variability of fertility between distant points is mainly a consequence of pedogenetic processes (SANTOS et al., 2009). Therefore, stratifying extensive areas according to the factors that influence pedogenesis (climate, parent material, relief, time and organisms) can be an alternative to reduce the sampling error, which is incurred when only part of the population is evaluated (SIQUEIRA et al., 2017; SHIVER; BORDERS, 1996). Obtaining strata that discriminate the distribution of chemical attributes reduces fertilizer waste on soils without nutrient deficiency and, on deficient soils, reduces underdosing, making it possible to reach adequate nutrient levels for higher productivity (MACHADO et al., 2007).
Collecting fewer sample units than the required sample size can cause, in some cases, misinterpretations that do not represent actual fertility conditions (LIEß, 2015). Lack of precision can result in unbalanced application of fertilizers and compromise the yield of forest stands or other crops. On the other hand, it is well known that selecting more sample units without bias provides lower variability and error estimates (SHIVER; BORDERS, 1996). It is expected that, from a given point, an increase in sampling intensity will no longer correspond to an increase in precision and that, beyond this point, each additional unit measured only adds cost to the sampling.
Records on the application of simple and stratified random sampling procedures to determine the optimum number of composite samples for soil fertility evaluation, specifically for forest plantations, are rare. The following hypotheses were tested: i) the sampling intensity required for soil chemical attributes decreases with the removal of outliers; and ii) the number of composite samples needed to represent chemical attributes varies among stratification criteria. Thus, the objective was to evaluate the variability of soil chemical attributes and to determine the minimum number of composite samples needed to represent the fertility of forest soils using different sampling procedures.

MATERIAL AND METHODS
The present study was carried out in stands of Eucalyptus sp. located in the municipalities of Carbonita, Capelinha, Itamarandiba, Minas Novas, Turmalina and Veredinha. The predominant climate in this region of the Jequitinhonha Valley, in the state of Minas Gerais, is Cwa, temperate rainy (mesothermic) with dry winter and rainy summer, according to the Köppen international system. According to the National Institute of Meteorology, annual precipitation and mean annual temperature range from 850 to 1,250 mm and from 20 to 24 ºC, respectively.
The total planted area evaluated was 9,101 ha, comprising 265 commercial stands. The plots were distributed across different soil types, geological formations, drainage densities (from 0.64 to 1.17km -2 ), texture classes and altitudes (from 819 to 1,127 m). The plots were concentrated in the quadrant between latitudes 17ºS and 18ºS and longitudes 42ºW and 43ºW.
Each sample unit (s.u.) was characterized by one composite soil sample. The number of composite samples, or pilot sample, was defined according to the area of each plot, a routine procedure adopted by the company between 2013 and 2015. Two composite samples were obtained in plots smaller than 30 ha, three in plots between 30 and 50 ha and four in plots larger than 50 ha. Each composite sample consisted of the homogeneous mixture of five simple samples, randomly collected between the planting rows (one simple sample per inter-row). Following this routine procedure, 687 composite soil samples were obtained from the 0 to 20 cm depth layer.
The soil samples were placed in plastic containers, identified and sent for chemical analysis. The chemical composition was determined according to Raij et al. (2001): phosphorus (P), potassium (K), calcium (Ca) and magnesium (Mg) by the ion-exchange resin method; sulphur (S, SO4 -2 ) by turbidimetry after extraction with calcium phosphate; aluminum (Al) in KCl; organic matter (MO) by colorimetry; and potential of hydrogen (pH) in H2O.
For each chemical attribute, the performance of two exploratory analysis techniques was evaluated: the first without removal of outliers (P1), using all the observations, and the second with removal of outliers by applying the boxplot technique (P2). This technique is based on graphical analysis that represents the variation of the data through quartiles. All observations beyond the boxplot critical limits were identified as outliers. The critical limits were defined from the interquartile range ($Q_3 - Q_1$), with the upper limit given by $Q_3 + 1.5(Q_3 - Q_1)$ and the lower limit by $Q_1 - 1.5(Q_3 - Q_1)$, in which $Q_1$ and $Q_3$ are the first and third quartiles, respectively.
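The boxplot rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the routine used in the study: the quartile convention (linear interpolation between order statistics) is an assumption, since the text does not state which one was applied, and the pH values are hypothetical.

```python
def quartiles(values):
    """Return (Q1, Q3) by linear interpolation between order statistics.

    Assumption: the study does not state its quartile convention.
    """
    data = sorted(values)

    def quantile(p):
        pos = p * (len(data) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (pos - lo) * (data[hi] - data[lo])

    return quantile(0.25), quantile(0.75)


def boxplot_limits(values):
    """Critical limits: Q1 - 1.5*(Q3 - Q1) and Q3 + 1.5*(Q3 - Q1)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(values):
    """Split observations into those kept and those flagged as outliers."""
    lower, upper = boxplot_limits(values)
    kept = [v for v in values if lower <= v <= upper]
    removed = [v for v in values if v < lower or v > upper]
    return kept, removed


# Hypothetical pH readings, for illustration only (not data from the study).
ph_values = [4.1, 4.3, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 6.9]
kept, removed = remove_outliers(ph_values)
print(removed)
```

Applying the rule attribute by attribute, as in the study, means that a composite sample can be flagged for one attribute and kept for the others.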
For each exploratory analysis technique, the following sampling procedures were evaluated: Simple Random Sampling (ACS) and Stratified Random Sampling with strata defined by soil type (ACE1), geological formation (ACE2), drainage density (ACE3), textural class (ACE4) and altitude (ACE5). In total, twelve combinations of the two exploratory analysis techniques and the six sampling procedures were evaluated. Descriptive statistics were calculated for each combination: coefficient of variation (CV), mean ($\bar{x}$) and sample error (E), in percentage.
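As a rough sketch of these descriptive statistics for simple random sampling, the relative sample error can be written as E = t · CV / √n. The text does not give the exact expression used, so this form is an assumption. The standard normal quantile stands in for Student's t (reasonable only for large samples such as the 687 used here), and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def describe(values, confidence=0.95):
    """Return (mean, CV in %, relative sample error in %).

    Assumption: E = z * CV / sqrt(n), with the normal quantile z used
    in place of Student's t.
    """
    n = len(values)
    avg = mean(values)
    cv = 100.0 * stdev(values) / avg
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    error = z * cv / sqrt(n)
    return avg, cv, error


# Hypothetical nutrient contents, for illustration only.
contents = [10, 12, 8, 11, 9, 10, 13, 7, 10, 10]
avg, cv, error = describe(contents)
print(f"mean={avg:.2f}  CV={cv:.1f}%  E={error:.1f}%")
```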
The sample size representing the mean value of each chemical attribute was calculated in order to meet the pre-established error of 5%.Assuming the population as infinite, the calculation of the sample size for the sampling procedures was performed according to Shiver and Borders (1996).
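For an infinite population under simple random sampling, the sample size is commonly written as n = (t · CV / E)². The sketch below assumes this form and uses the normal quantile in place of Student's t, so results may differ by a few units from values computed with t. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size(cv_percent, error_percent, confidence=0.95):
    """Minimum n for a target relative error, infinite population.

    Assumption: n = (z * CV / E)^2 with the normal quantile z; the study
    may have used Student's t, which gives slightly larger values.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil((z * cv_percent / error_percent) ** 2)


# CV of 44.70% is the value reported for Mg (without outliers) in the Discussion.
n_mg = sample_size(44.70, 5.0)
print(n_mg)
```

With the CV reported for Mg, this approximation lands close to the 309 composite samples reported for simple random sampling.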
The exploratory analysis technique and the sampling procedure that resulted in the smallest sample size meeting the pre-established precision were selected for subsequent analysis (regression and calculation of the maximum area of land per s.u.). Simple random sampling was also selected because of its unrestricted random character and wide use.
The minimum number of composite samples was calculated to meet sample errors of 1, 5, 10, 15, 20, 30, 40, ... and 100%. These data were submitted to non-linear regression analysis using the Levenberg-Marquardt iterative method. The three-parameter logistic model, $n = \alpha (1 + \beta e^{-\gamma E})^{-1}$, was fitted to estimate the sample size ($n$) as a function of the sample error ($E$); $\alpha$, $\beta$ and $\gamma$ are model parameters.
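The (error, sample size) pairs that feed the regression can be generated as below. This is a sketch under assumptions: steps of 10 percentage points are assumed for the elided part of the error grid, the normal quantile replaces Student's t, and the logistic form n = alpha / (1 + beta·exp(−gamma·E)) uses parameter names chosen here for illustration. The fitting step itself (Levenberg-Marquardt) is not reproduced.

```python
from math import ceil, exp
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)

# Assumption: the elided part of the grid ("40, ... and 100%") uses steps of 10.
ERRORS = [1, 5, 10, 15] + list(range(20, 101, 10))


def size_curve(cv_percent, errors=ERRORS):
    """Return [(error %, minimum n)] for one attribute, infinite population."""
    return [(e, ceil((Z_95 * cv_percent / e) ** 2)) for e in errors]


def logistic(error, alpha, beta, gamma):
    """Three-parameter logistic model; parameter names are assumptions."""
    return alpha / (1.0 + beta * exp(-gamma * error))


# CV reported for Mg (without outliers) in the Discussion.
curve = size_curve(44.70)
for error, n in curve:
    print(f"E = {error:>3}%  ->  n = {n}")
```

Each attribute and sampling procedure yields one such curve, to which the logistic model would then be fitted with a non-linear least-squares routine.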
The goodness-of-fit test employed was Graybill's F (F (H0)). The accuracy of the fitted equations was evaluated through the Root Mean Square Error (RQEM) and the Mean Absolute Deviation (MDA). Lower values of RQEM and MDA imply higher predictive quality. The data on the number of outliers removed, the coefficient of variation and the asymptote of the equations were submitted to Pearson correlation analysis (r). The maximum area of land per s.u. was obtained as the ratio between the total planted area (dividend) and the minimum representative number of composite samples.
A significance level of 5% was used in all analyses. The analyses were performed with the software ESRI ArcMap 10.3.1, Curve Expert 1.4 and R version 3.3 (R CORE TEAM, 2017).
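The accuracy statistics and the area-per-sample ratio can be sketched as follows. The text does not say whether RQEM and MDA were computed in absolute or relative terms, so the absolute form is assumed. The observed and predicted values are hypothetical, while 9,101 ha and 309 composite samples are figures reported in the study.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error (RQEM), absolute form assumed."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def mean_absolute_deviation(observed, predicted):
    """Mean of absolute deviations (MDA), absolute form assumed."""
    deviations = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(deviations) / len(deviations)


def hectares_per_sample(total_area_ha, n_samples):
    """Maximum area of land represented by one composite sample."""
    return total_area_ha / n_samples


# Hypothetical observed vs. fitted sample sizes, for illustration only.
observed = [300, 80, 20, 10]
predicted = [296, 82, 22, 9]
print(rmse(observed, predicted), mean_absolute_deviation(observed, predicted))

# Figures reported in the study: 9,101 ha and 309 composite samples (ACS).
print(int(hectares_per_sample(9101, 309)))
```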
RESULTS
When the boxplot technique was applied, no composite sample had all of its chemical attributes identified as outliers. The maximum number of outliers identified in the same composite sample was 4 (pH, P, Ca and S), a condition found in only one s.u. Next, 12 s.u. presented 3 outliers (multiple attributes) and 35 s.u. had 2 outliers (multiple attributes). Al did not show outliers.
The effect of each outlier removed was most pronounced for P; for a maximum allowed error of 5%, each removal reduced by about 6 to 7 the number of composite samples needed to represent it. The combination of outlier removal and random sampling stratified by altitude class presented the best performance (Table 2). For random sampling stratified by altitude class, the removal of outliers decreased the minimum representative number of composite samples to be collected from 636 (one per 14 ha, 1:14) to 295 s.u. (1:31) (Table 2). This procedure differed from simple random sampling by 49 and 14 s.u. in the presence and absence of outliers, respectively. Both procedures, without outliers, were selected for subsequent analysis.
Graybill's F test was not significant (p > 0.05) for any of the equations generated to estimate the sample size of the chemical attributes (Table 3). The equations for the attributes of lower variability were more accurate, with lower values of RQEM and MDA. As the coefficient of variation increased (Table 1), the asymptote of the ACS (r = 0.95; p ≤ 0.05) and ACE5 (r = 0.92; p ≤ 0.05) equations also increased. Fewer composite samples implied larger sample errors (Figure 2). The maximum area of land per s.u. under different sample errors is given in Table 4.
Table 3. Precision statistics of the equations obtained for estimation of the minimum representative quantity of composite samples (without outliers) as a function of the sample error.

DISCUSSION
The coefficients of variation indicated that P, Mg, Ca, S and K presented greater heterogeneity (CV > 35%), requiring a greater sampling effort to represent them. This effect is due to the wide variation of their contents in the sampled area. The variability of these attributes agrees with that observed by Machado et al. (2007), Lima et al. (2010) and Oliveira et al. (2015).
The initial set of composite samples represented the total planted area. Although this area was composed of genotypes with different ages and planting spacings, the MO and pH distributions were the most homogeneous (CV < 20%), with few outliers identified (0.15 to 1.02% of the observations). Low variability of these chemical attributes (CV < 12%) in a eucalyptus crop was also verified by Lima et al. (2017). The outliers were removed by applying the boxplot only to concentrate the analytical results closer to their respective central tendencies, reducing the sampling error. Although mean nutrient values were relatively close between the exploratory analysis techniques for most soil chemical attributes, the probability of obtaining the same mean in a new sampling was higher after the removal of outliers. The elimination of outliers should be treated with caution, considering statistical and biological aspects whenever possible. The use of reference limits for the interpretation of chemical analyses makes it possible to evaluate the consistency of analytical results and to identify inconsistent values (CANTARUTTI et al., 1999).
As expected, the elimination of outliers reduced the coefficient of variation of the chemical attributes; this reduction (in percentage points) increased with the number of outliers removed (r = 0.95, p ≤ 0.05). Mg had the most outliers (10.77%), followed by P (8.73%) and Ca (5.68%) (Table 1). The other chemical attributes had less than 5% of their data identified as outliers. The sample error decreased by up to 1.66 percentage points (P, equivalent to a reduction of 33.40%).
It is important to emphasize the need to control sampling and non-sampling errors to minimize variation in the measured soil properties. Non-sampling errors occur when samples are collected, recorded or analyzed erroneously, giving rise to values that deviate from the central tendency (SHIVER; BORDERS, 1996; HEIM et al., 2009; OLIVEIRA et al., 2014). Different instruments and collection personnel, nutrient mobility, irregularity of the factors responsible for pedogenesis and inappropriate soil management contribute to the variability of the analytical results (ACQUA et al., 2013; NICOLITCH et al., 2016; TIAN et al., 2017). Even in the minimum cultivation system, fertilizer applications, when non-uniform or made in rows, accentuate variability. The elimination of the entire composite sample should be considered when there is evidence of contamination during collection or preparation of the material. Whenever possible, it is recommended to compare analytical results with the history of chemical analyses and to opt for qualified laboratories. According to IAC (2018), this history makes it possible to observe trends, monitor the evolution of fertility, detect possible analytical problems and, if necessary, correct soil management.
The largest sample error found in the collected composite samples was for P (E = 4.99%, P1 + ACS) (Table 1). The minimum representative number of composite samples calculated to meet the maximum error of 5% varied between the sampling procedures. When simple random sampling without outliers was adopted, the representativeness of all chemical attributes was obtained with 309 s.u. (1:29) (Table 2). This sampling intensity was defined by the attribute of greatest variability, in this case Mg (CV of 44.70%). In established forest stands, dividing the area into homogeneous units smaller than 29 ha is suggested for a better distribution of collection points. Similarly, other ways of distributing collection points can be found in the literature, such as division into areas no larger than 10 ha (CANTARUTTI et al., 1999) or 20 ha (IAC, 2018). A minimum of 2 composite samples at sites with a total area of 29 ha or less is indicated to avoid the risks associated with discarding an analytical result.
The most efficient stratification criterion used two altitude classes, with a minimum of 295 composite samples (1:31, equivalent to 95.47% of the number required by ACS), calculated according to Mg. In practice, this difference was relatively modest. At the 9,101 ha scale, this is an indication that the distribution of chemical attributes tended to randomness. It should be noted that the heterogeneity of soil properties depends on the scale analyzed (SIQUEIRA et al., 2017). Therefore, it is probable that stratification becomes quite advantageous at other scales, including with other stratum definition criteria.
The definition of homogeneous strata allows the implementation of different fertilization recommendations, one per stratum. Since strata discriminate the distribution of chemical attributes, nutritional estimates become more accurate as stratification intensifies. Consequently, the pursuit of precision and accuracy requires investment in sampling strategy planning, field team training and fertilization quality control. In this scenario, stratification by altitude class (ratio of 1:31) resulted in 248 s.u. for the stratum of up to 1,000 m and 47 s.u. for the stratum above 1,000 m, assuming a maximum of two simultaneous recommendations in the sampled area. For stratified random sampling, the allocation of composite samples can be made proportionally to the area of each stratum (SHIVER; BORDERS, 1996).
The sampling intensity that exclusively represents Al, MO and pH varied relatively little or did not change when the boxplot was applied to remove outliers. Therefore, stratification and outlier elimination are not indicated to meet the sampling adequacy of these attributes. The economic viability of analyzing only certain attributes should be investigated in order to reduce analytical costs.
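Proportional allocation by stratum area can be sketched as below. The stratum areas are hypothetical placeholders, since the study's stratum areas are not given in the text, so the resulting split differs from the 248 and 47 s.u. reported. Largest-remainder rounding is one possible way to keep the total fixed and is also an assumption.

```python
def proportional_allocation(total_samples, stratum_areas):
    """Allocate samples to strata in proportion to area.

    Uses largest-remainder rounding so the allocations sum to total_samples.
    """
    total_area = sum(stratum_areas.values())
    exact = {name: total_samples * area / total_area
             for name, area in stratum_areas.items()}
    allocation = {name: int(value) for name, value in exact.items()}
    leftover = total_samples - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical stratum areas (ha); only their sum, 9,101 ha, comes from the study.
areas = {"up to 1000 m": 7600.0, "above 1000 m": 1501.0}
allocation = proportional_allocation(295, areas)
print(allocation)
```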
The equations for estimating the minimum representative number of composite samples fitted the data well (Table 3). The shape of the curves generated (Figure 2) showed differences in the rate at which the number of s.u. decreased as the sample error increased. The curves of the attributes of lower variability (pH, MO and Al) are easily distinguished below the other curves, owing to the greater homogeneity of the distribution of these chemical attributes in the soil of the sampled area. Sampling became progressively more costly as more composite samples were obtained to increase accuracy. Assuming larger sample errors for simple random sampling, such as 20 or 30%, the minimum representative number for all attributes was 20 (1:455) and 9 s.u. (1:1011), respectively. A variation of 15 percentage points in the error (from 5 to 20%) reduced the requirement by 289 s.u., whereas a further 10 percentage points (from 20 to 30%) reduced it by only 11 s.u. This asymptotic trend continued down to 7 composite samples (sampling error beyond 100%).
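The diminishing return described above can be checked with simple arithmetic on the reported figures. The sketch uses only values stated in the text (9,101 ha; 309, 20 and 9 composite samples at 5, 20 and 30% error for simple random sampling); the variable names are illustrative.

```python
TOTAL_AREA_HA = 9101

# Sample error (%) -> minimum composite samples, as reported for ACS.
reported = {5: 309, 20: 20, 30: 9}


def area_ratio(n_samples):
    """Hectares represented by one composite sample, rounded to the nearest ha."""
    return round(TOTAL_AREA_HA / n_samples)


errors = sorted(reported)
savings = []
for low, high in zip(errors, errors[1:]):
    saved = reported[low] - reported[high]
    per_point = saved / (high - low)  # samples saved per percentage point of error
    savings.append((low, high, saved, per_point))

for error in errors:
    print(f"E = {error}%: n = {reported[error]}, 1:{area_ratio(reported[error])}")
for low, high, saved, per_point in savings:
    print(f"{low}% -> {high}%: {saved} fewer s.u. ({per_point:.1f} per point)")
```

Relaxing the error from 5 to 20% saves roughly 19 s.u. per percentage point, against about 1 s.u. per point from 20 to 30%, which is the flattening seen in the curves.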
When the applied amount of limestone and/or other industrial waste used to supply Ca and Mg is standardized, the representativeness (E = 5%, Scenario C2) of the other attributes was obtained with a minimum of 284 (1:32) and 286 s.u. (1:32) for simple random sampling and sampling stratified by altitude class, respectively (Tables 2 and 4). A common operational practice in eucalyptus cultivation by large companies is the use of industrial waste from the pulp and charcoal/steel production chains to supply the demand for Ca and Mg. The application of these wastes is, in general, standardized at a single dose for all management units, considering operational ease and the reuse of waste. As with industrial waste, the application of standard doses of dolomitic limestone has become routine in forest areas because of the low levels of Ca and Mg in soils and the adoption of the nutritional balance method. By this method, even if the Ca content in the soil is 0 cmolc dm -3 , 600kg ha -1 of CaO is recommended for high productivity, such as 50m 3 ha -1 year -1 (SANTANA et al., 2014), an amount equivalent to 2.2ton ha -1 of dolomitic limestone (30% CaO).
In cases where S is not a direct object of the recommendation, because it is an accompanying ion in fertilizers, and the applied amount of Ca and Mg is standardized, the sampling intensity needed to represent the other attributes (E = 5%, Scenario C3) falls to 270 composite samples (1:34) for random sampling stratified by altitude class (Tables 2 and 4). This premise was based on excluding three chemical attributes from the determination of the sample size needed to represent soil fertility.
The initial sampling intensity of 1:13 (687 composite samples: 9,101 ha) can be reduced by more than 50%, as long as outliers are removed. This result has great practical importance because it demonstrates that soil sampling can be faster, less costly and less laborious. The choice of sampling strategy should consider the exploratory analysis technique and the sampling procedure, as well as the costs involved in geoprocessing, travel, sampling and chemical analysis.
A balance between the number of composite samples, sampling errors, non-sampling errors and resource availability is required to represent soil fertility. When the representativeness of chemical attributes essential for plant establishment and growth is neglected, silvicultural assessments and decisions become inaccurate and can be mistaken. Sampling adequacy analysis is crucial to minimize soil sampling costs without sacrificing accuracy.
$n = \alpha (1 + \beta e^{-\gamma E})^{-1}$, in which $\alpha$, $\beta$ and $\gamma$ are parameters of the logistic model; RQEM = root mean square error; MDA = mean of absolute deviations; p F(H0) = probability value of Graybill's F test; ACS = simple random sampling; and ACE5 = stratified random sampling, with two strata of altitude.
P2 = identification and removal of outliers by applying the boxplot; ACS = simple random sampling; ACE1, ACE2, ACE3, ACE4 and ACE5 = stratified random sampling with stratification according to soil type, geological formation, drainage density, textural class and altitude, respectively.

Table 4. Summary of the relation between the maximum size of the plot (ha) to obtain one composite sample and the sampling error.
C1 = scenario in which all chemical attributes are considered; C2 = scenario where only Ca and Mg are not direct objects of the recommendation; C3 = scenario where only Ca, Mg and S are not direct objects of the recommendation; ACS = simple random sampling; and ACE5 = stratified random sampling, with two strata of altitude.