SPATIAL STATISTICS ON AMAZON RAINFOREST ASSESSMENT: SPATIALLY STRATIFIED INVENTORY PROCESSING

Biomass and wood volume estimates in forest ecosystems are fundamental to a variety of studies focusing at forest dynamics. These estimates are usually carried out through forest inventory techniques which rely upon statistical computations. This work aims at providing a new methodological approach to forest inventory processing when data is georeferenced. Specifically, geostatistical modelling is performed through ordinary cokriging using tree basal area and tree richness as a cofactor in an Amazonian rainforest site. The spatial interpolation provided the tools for the creation of two disjoint forest strata, which are processed following the principles of Stratified Forest Inventory. The spatially stratified forest inventory processing has shown a 14.29% decrease in error as directly compared to simple random sampling processing. Only two strata have been used following spatial interpolation, albeit it is argued that theoretically any number of them could be generated. The procedure is methodologically feasible and offers a framework to future research on its development and reach. Particularly, the geometries of forest strata and the behavior of spatial interpolation along a gradient of forest vertical structures are of potential interest in future work.


INTRODUCTION
Biomass and wood volume estimates are of utmost importance on a wide range of scientific investigations focusing at forest dynamics (BRAZ and MATTOS 2015;ENE et al., 2016;DUFFY et al., 2017). In tropical forests, however, researchers are confronted with great variability concerning forest parameters and generating good estimators represent a real challenge . To deal with such shortcomings, the development of statistical techniques to extract information from these forests have long been of interest for both foresters and ecologists and much research is still needed to cogent methodology (GWENZI et al., 2017).
In this context, applications of spatial statistics in forest research have been explored in a plethora of studies over the last decade, which represents a real breakthrough on the field (GALEANA-PIZAÑA et al., 2014;PELISSARI et al., 2017;BABCOCK et al., 2018). Particularly, mapping of tree richness and biomass distributions have flourished, and results are encouraging to future research (YADAV and NANDY, 2015). Thus, integrating spatial statistics to estimates of forest parameters is a promising field of investigation and much remains to be explored.
Forest inventories are procedures that make use of sampling techniques to estimate these parameters. Errors are tacitly included in the inductive nature of inventory estimates and therefore their deflation along with FLORESTA, Curitiba, PR, v. 51, n. 4, p. 883-892, out/dez 2021. Kauai, F. et.al. ISSN eletrônico 1982-4688 DOI: 10.5380/rf.v51 i4. 73881 884 tighter confidence intervals is highly desirable (MOLTO et al., 2013;PÉLLICO NETO et al., 2017;KAUAI et al., 2019). This is usually achieved through stratification of sites, wherein a forest landscape is split into smaller geographic units which are supposedly more homogeneous, with respect to the parameter of interest, than the entire region (WHITE et al., 2013). To do so, the distribution of the sought parameter must be known a priori. In the case of forest plantations this is routinely done since these are well-controlled and mapped (SANQUETTA et al., 2014), but in tropical forests the task requires greater work since one does not know the distribution of species beforehand nor the geographical composition of regions.
Many parameters traditionally used in wood volume and forest biomass estimates have been shown to be spatially dependent in a series of studies (AKHAVAN, 2010;BERRILL and O'HARA, 2016;PELISSARI et al., 2017). One of such kind is forest basal area, which is widely employed in wood volume modelling (CYSNEIROS et al., 2016) and has been shown to mediate forest productivity along with tree richness (VILÀ et al. 2013). If biodiversity is a strong driver of forest productivity (FORRESTER and BAUHUS, 2016;DUFFY et al., 2017), then it is reasonable to expect coregionalization holds for basal area and tree richness. The present work builds upon this observation by investigating the use of geostatistics analysis to compute wood-volume inventory confidence intervals.
Here two hypotheses are considered; I) basal area can be spatially interpolated using species richness as a cofactor on geostatistical models (i.e., coregionalization exists), and II) more accurate estimators and smaller errors are yielded using the proposed geostatistical modelling. The performance of the procedure is directly tested against a simple random sampling simulation for total commercial wood volume in the studied region.
A simulation was performed over a georeferenced database from a natural tropical forest in the Amazon basin and encompasses the two considered hypotheses. The high levels of tree richness in Amazonian rainforests (STEEGE et al., 2013) as well as the sparse species distributions  provide an opportunity to test the hypotheses herein considered. Finally, we address a potential application to sustainable forest management in the Brazilian Amazon. Potential applications emerge in the context of continuous monitoring of forest dynamics which can be laborious in Amazonian forest ecosystems, where the high levels of biodiversity and complexity require sampling intensities that can make the procedure infeasible (SANQUETTA et al., 2014).

Study site and data
Simulations were conceived over a Unity of Annual Production (UAP), which consists of a geographic region that is part of an ensemble of forests under sustainable management in Brazil. The UAP consists of a tropical rainforest and has a surface area of 1,997.575 hectares located on the northern region of the Rondônia State, which belongs to the Amazon basin. The major climatic type according to Köppen classification is Aw. The access point to the UAP has UTM coordinates 498,195.65 E and 9,055,578.01 N. Figure 1 displays the location map of the studied site.  Data come from a census carried out prior to harvesting, wherein the geographic location of every tree with diameter at breast height (DBH) equal to or greater than 40 cm had been recorded. Along with the geographical information, tree attributes such as height, DBH, rigorous commercial wood volume (measured after harvesting) and species identification are registered. In total, 59 species are recorded.

Simulation setting
Forest inventory simulation was conceived through Simple Random Sampling (SRP) with 20 m x 100 m sample plots and 4% sampling intensity (393 sample plots). SRP has been used for its simplicity, and moreover each point has equal chance of being sampled, which guarantees greater geographical coverage to subsequent stratification. From these sample plots forest attributes information was extracted; sample plot tree basal areas, sample plot commercial wood volumes and sample plot tree richness. As tree richness is a discrete measure, or the count of tree species per sample plot, this variable has been made continuous by dividing the number of species in a plot by the total number of species found in the entire region and multiplying this ratio by 100.
Wood volume statistical inference in the studied site has followed the principles of forest inventory processing for SRP. The procedure consists of evaluating the parameter standard error, which according to Sanquetta et al. (2014) can be computed as follows: where is the ℎ observation, ̅ is the mean, the number of sample plots and ̅ the standard error. Thus, the confidence interval (CI) is computed as: where corresponds to the t-student scalar for a given degree of freedom and precision.
The wood volume confidence interval obtained through this procedure shall serve as a basis to which the methodology, developed in the following sections, can be directly compared to.

Spatial statistics
For geostatistical analysis, sample plots basal area and tree richness, derived from the initial simulation, were used. Inspection of the data showed that both variables presented a non-normal distribution, thus requiring the transformation of these observations, which has been done through the computation of their natural logarithms.
Many mathematical functions are available for semivariogram modelling. Here, considering the nature of the distribution of variances as a function of distance, the spherical function has been implemented, which is extensively described elsewhere (see MATHERON, 1963). The spherical model was chosen because it produced the best fits when compared to the exponential and Gaussian models.
The geostatistical technique employed for spatial interpolation was ordinary cokriging. According to Goovaerts (1998), the 1 attribute estimator, in the case of only a second attribute 2 , is given by: where parameters ∝ 1 and ∝ 2 are weights determining the influence of attributes measured in locations , that is, the set { 1 ( 1 ), 1 = 1, … , } represents the 1 attribute values measured 1 times at locations 1 . Thus, a linear system of equations may be solved for ∝ 1 and ∝ 2 such that the variance of the error, i.e., ( (1) * ( ) − ( )), is minimized. Pearson's correlation was computed to determine the correlation of the two attributes.
The interpolation error was assessed through the Root Mean Square Error (RMSE), calculated as follows.
where ̂( ) is the estimate, ( ) the observation and the number of observations. In addition, the coefficient of correlation (adjusted R²) was used.
This procedure allowed for the generation of two classes, i.e., two strata. Theoretically, any number of classes could be generated by this process, however, for simplicity, the entire region was split into two classes according to a limit of basal area defined in such a way that each class has approximately half the total number of observations. Borderlines of each class were set by creating intervals over the geostatistical layer created after geostatistical interpolation in ArcMap TM 10.5 software by Esri. ArcMap TM is the intellectual property of Esri © and is used herein under license. These intervals are constructed such that shapefiles are approximately equal in area.

Stratified inventory processing
The sample plots scattered over the inventoried region, from SRP, were intersected with the classes generated by spatial interpolation. This provided two sampling sets, each of which samples a class with differing basal area mean. The scenario corresponds to a Stratified Inventory Processing (SIP) and according to Sanquetta et al. (2014), the estimator of the variance is computed as: where ℎ 2 is the weight of the ℎ class in the population (i.e. class area over total area), ℎ 2 is the variance inside class ℎ, ℎ the number of samples plots in class ℎ and the potential number of sample plots in the population. Thus, the confidence interval is computed as follows: For a stratified mean ̅ , which is defined to be the mean of each class multiplied by its corresponding weight. We refer to SIP attached to the spatial statistics procedure described in the previous section as Spatially Stratified Inventory Processing (SSIP), for simplicity.

Data analysis
Statistical divergence between classes for both basal area and tree richness, as well as for wood volume, was assessed through Multivariate Analysis of Variance (MANOVA). This procedure has been implemented to test if significant differences existed between classes, which provides information to whether the proposed stratification is relevant. According to MacCulloch (1990), MANOVA is conceived as a linear combination of observations , that is: where is a single combined response variable. Furthermore, confidence intervals for commercial wood volume means inside each class were achieved through Bootstrap resampling with 5,000 iterations. This allowed us to spot any overlapping between classes, since the objective was to have completely disjoint classes.
In this work all confidence intervals were conceived at a 95% probability level. Data have been treated in R Studio 3.4.1 (R Development Core Team 2017) along with ggplot2 (WICKHAM, 2009) and gstat packages (PEBESMA, 2004). Spatial data have been analyzed and treated in ArcMap TM 10.5 throughout the work.

RESULTS
The resulting equation from the linear regression between predictions and measurements is approximated as = 0.57744 + 43.04404. Both the slope and the intersection are statistically significant (p < 0.05). The coefficient of correlation R² was 0.61, the RMSE 0.67 and no anisotropy had been identified in the model. Pearson's correlation for basal area and tree richness was 0.72. Semivariogram analyses were conducted, and the results indicate similar spatial dependence, for both response variables, at a scale of about 650 m. Semivariograms and the stratification process maps are shown in Figures 2 and 3   MANOVA identified significant statistical divergence (p < 0.05) between classes for the three tested variables (i.e. basal area, tree richness and commercial wood volume) with 0.027 computed for Pillai's trace. The decomposition of the MANOVA into pairwise analyses has also shown statistically significance for all groups (p < 0.05).
Bootstrap resampling simulation yielded the wood volume density plot means shown in Figure 4. Class 1 wood volume mean was inspected to be 58.180 m³. ha -1 with a confidence interval, at 95% probability, ranging from 52.800 to 64.030 m³. ha -1 . On the other hand, class 2 wood volume mean was 72.425 m³. ha -1 with its confidence interval ranging from 65.180 to 80.075 m³. ha -1 . Thus, the two classes come out to be disjoint with respect to the confidence intervals at a 95% probability level. Data descriptive statistics is shown in Table 1 and wood volume density plots are shown in Figure 5.   Figure 5. Wood-volume density plots without classes (i.e., prior to interpolation) and for each class created after geostatistical modelling. Figura 5. Gráficos de densidade de volume de madeira sem classes (i.e. antes da interpolação), e para cada classe gerada após interpolação.
The relative processing error obtained from SSIP was 6.23% and the confidence interval ranged from 119,884.424 to 135,807.063 m³. Diversely, SRS yielded an error of 7.12% with the confidence interval ranging from 119,448.307 to 137,789.352 m³, thereupon reporting 14.29% increase in error as compared to SSIP. However, both estimators have overestimated total commercial wood volume stock, which has been observed to be 111,548.600 m³.

DISCUSSION
The two most common and well-supported hypotheses to explain diversity effects on the functioning of ecosystems are the sampling and niche complementarity effects (FORRESTER and BAUHUS, 2016). The latter postulates that higher species diversity induces greater functional traits abundance which optimizes ecosystem resources partitioning, thereupon increasing ecosystem functioning. The sampling effect or selection effect (MENSAH et al., 2018), on the other hand, draws the attention to the fact that the probability of dominant species occurrence increases with increasing diversity levels, driving ecosystem productivity. These hypotheses have been considered in depth (LIANG et al., 2015;LIANG et al., 2016;DUFFY et al., 2017) for explaining the positive diversity-productivity relationship, which is supported to some extent by the present work given the inspected coregionalization of basal area and tree richness.
Basal area coupled with tree richness as a cofactor in geostatistical modelling provided us with the tools to forest stratification. As White et al. (2013) points out, sample plots must reflect forest attributes variability and therefore stratified forest inventory procedures represent a feasible solution to ecosystem assessment. In this context, remote sensing techniques have been extensively implemented to stratify natural forest sites (WHITE et al., 2013). However, we argue that scale and cost of such technologies may result in an application bottleneck, since sample plots must be set up regardless of prior strata for an ecosystem not previously sampled. This issue may be circumvented through the methodology here deployed.
The disjoint classes obtained allowed for a clearer analysis of the procedure potential for it would not have been easy an evaluation on a greater number of them. Nevertheless, an infinite series of classes could be derived once the interpolation is set, giving rise to a wide range of different class geometries. Moreover, the methodology has been applied to a specific forest vertical structure (i.e., DBH ≥ 40cm). Applying geostatistics to modelling of basal area and species richness on larger DBH intervals may provide more accurate tools to biomass or wood volume estimates in tropical forest ecosystems, since some species have different distribution patterns  (PÉLLICO NETO et al., 2017) and these are indirectly considered in the models.
For a direct comparison of procedures, nonetheless, SSIP showed superior performance as compared to SRS, with a 14.29% relative error reduction for the same probability level. However, this result must be confirmed in future research by running many pairwise simulations for both SSIP and SRS, with the former being computed as described earlier. The scope of this work rests solely on the method formulation along with a direct comparison, and conclusions about its higher performance are not imperative without further investigation.
One important application of the proposed methodology concerns natural forest management in the Brazilian territory. After harvesting activities, prescribed by sustainable management plans, forest sites under management must undergo monitoring throughout the stablished cut cycle (SANQUETTA et al., 2014). To reduce sampling intensities in this process, it might be feasible to stratify forest sites and to allot sample plots strategically to forest dynamics monitoring. This might be achieved by using census data that comprise forest vertical structures spanning a larger DBH interval, which is the case for Brazilian tropical forests under management.
It is often required from ecological studies the assessment of mortality rates, wood volume growth rates, as well as potential damage in forest ecosystems caused by sustainable management activities. These analyses are carried out by continuous forest sampling (SANQUETTA et al., 2014), which in the case of Amazon ecosystems must be implemented with such a sampling intensity that, from an operational point of view, can make the procedure infeasible. To circumvent such shortcomings, a higher sampling intensity, or even a census as in the case of sustainable forest management in Brazil, can be used to create permanent strata, as proposed in the present work, to sample continuously the ecosystems in such a way that workload is decreased in subsequent forest evaluations. It remains to be studied, however, whether the variances of measured parameters within generated strata are conserved through continuous measurements in time.
The Brazilian Agricultural Research Corporation (EMBRAPA) has published guidelines for the setup of permanent sample plots to continuously measure natural forests in Amazon (SILVA et al., 2005). These guidelines provide recommendations for sample plot sizes and shapes which are more suitable to efficiently capture phytosociological parameters in the assessment of forest dynamics, after sustainable management plans have been carried out. The method developed and proposed in the present work might be suitable within this context, potentially deflating estimate errors or reducing sampling intensity.

CONCLUSIONS
Basal area and tree richness have proven to be feasible variables for spatial interpolation through ordinary cokriging.
• Forest site stratification can be done prior to inventory processing allowing for more homogeneous geographic units and therefore potentially deflating errors. • Stratified forest inventory showed augmented performance as compared to simple random sampling procedure. • We address forest vertical structures and classes' geometries as important subjects of future investigation.
• The proposed methodology for inventory processing in different classes of diameters must be assessed in order to explore a wider range of applications. • Lastly, the spatially stratified inventory processing having as its basis methodologies other than SRS must be studied in order to assess the quality of generated estimators.