HOW TO ESTIMATE BLACK WATTLE ABOVEGROUND BIOMASS FROM HETEROSCEDASTIC DATA?

This study developed a system of equations for predicting total aboveground and component biomass in black wattle trees. A total of 140 black wattle trees at age 10 years were measured regarding their diameter at 1.30 m height above the ground (d), total tree height (h), basic wood density (branches and stem), and biomass (stem, crown, and aboveground). We evaluated the performance of linear and nonlinear allometric models by comparing the statistics of Radj., RRMSE%, and BIC. Nonlinear models performed better when predicting crown biomass (using only d as an independent variable), and stem and aboveground biomass (using d and h as independent variables). Adding basic density did not significantly improve biomass modeling. The residuals had non-homogeneous variance; thus, the fitted equations were weighted, with weights derived from a function containing the same independent variables of the fitted biomass function. Subsequently, we used a simultaneous set of equations to ensure that the sum of each component's estimated biomass values was equal to the total biomass values. Simultaneous fitting improved the performance of the equations by guaranteeing the components' additivity, and weighted regression allowed to stabilize error variance, ensuring the homoscedasticity of the residuals.


INTRODUCTION
Forest biometricians have been using empirical models to estimate total tree biomass and its components, such as leaves, branches, fruits, bark, stems, and roots. Allometric models take place from easy-to-measure variables such as diameter at 1.30 m above the ground (d) and total tree height (h). Nevertheless, direct biomass determination, although more accurate, is a highly costly process. Therefore, statistical modeling has been used in forest inventories to reduce the field's operational effort, consequently reducing costs.
Biomass modeling is usually done independently, with the models being fitted without considering the interdependence between tree components. Notwithstanding, the sum of the components' biomasses does not produce the same result as obtained using the total biomass equation, not being biologically consistent (BEHLING FLORESTA,Curitiba,PR,v. 51,n. et al., 2018). Thus, biomass modeling requires estimators that guarantee the additivity of estimates: compatibility between component biomass and total biomass estimates. An alternative to ensure the additivity of the components is using systems of simultaneous equations (SANQUETTA et al., 2015).
A good system of equations must meet three premises, being i) the performance in the quality of biomass estimates; ii) consistency, referring to the additivity of components with the total biomass; and iii) efficiency of the equation estimators, resulting in less variance of the estimated parameters. Among the various systems of equations applied in forest biometry, the method of simultaneous fitting of equations SUR (Seemingly Unrelated Regression) stands out, which has been used in several studies (GERTRUDIX et al., 2012;SANQUETTA et al., 2015;and FU et al., 2017).
The SUR method, developed by Zellner (1962), estimates all models' parameters simultaneously so that each equation can contain information provided by the other equations. The additional information is used to describe the system, resulting in greater accuracy in estimating the parameters. The increase in efficiency in estimating parameters is proportional to the increase in the correlation between the error terms of the different equations (JUDGE et al., 1988). When studying the system of equations, Parresol (2001) concluded that the SUR procedure, for linear functions, and the NSUR (Nonlinear Seemingly Unrelated Regressions) procedure, for nonlinear functions, resulted in more efficient estimators for component and total biomass.
The residuals from forest biomass modeling are usually highly heteroscedastic due to the non-constant variance of component and total biomass from the increase in trees' dimensions (d and h). In these conditions, modeling the variance structure is an interesting alternative to correct residual's heteroscedasticity through weighted regressions (PARRESOL, 2001). Modeling this structure allows for its stabilization, that is, the weight functions allow the distribution of the residuals to be homogeneous, which is an indispensable condition for the validation of hypothesis tests during the fitting process. To this end, it is suggested that biomass estimates by systems of simultaneous equations be made by applying either the WSUR (Weighted-Seemingly Unrelated Regressions) or the WNSUR (Weighted-Nonlinear Seemingly Unrelated Regressions) procedure, considering the satisfactory performance of these methods in previous research (e.g., KRALICEK et al., 2017).
Thus, this study aimed to develop a system of equations to predict the aboveground biomass and its components in black wattle stands (Acacia mearnsii De Wild.). It was expected that this system improves biomass estimation while reducing the impact of residual heteroscedasticity.

Data description
The database is derived from black wattle (Acacia mearnsii) stands, at age 10 years, established in three different locations in the Rio Grande do Sul State (Cristal; Encruzilhada do Sul and Piratini), southern Brazil. These regions are encompassed by a humid subtropical climate (Cfa), according to Köppen classification (ALVARES et al., 2013). A total of 12 circular plots with an area of 315 m² were randomly installed. Within these plots, all trees were bucked for biomass determination, totaling thus 140 trees. The initial spacing was 3 m x 1.75 m, comprising 1,904 trees per hectare. All trees were measured with respect to their diameter at 1.3 m above the ground (d), total height (h), stem biomass (wood + bark), and crown biomass (branches + leaves + reproductive organs). It worth noting that all branches were suffered no distinction (thin, large, and dead) in this study.
Subsequently, stem and crown biomass components were weighed using a 5-g precision digital scale. For laboratory analysis, samples were taken from different portions of the stem and crown. These samples were weighed with a 1-g precision digital scale and oven-dried at 65 °C until constant weight. To determine the dry biomass of the different components and the total individual aboveground biomass, we used moisture weighting, obtained through the following equation: Where: = (dry) biomass (in kg); = oven-dried biomass of the sample (in kg); = green biomass of the sample (in kg), and the subscript i refer to each biomass component.
Also, the basic wood density was also determined for both stem and branches (representing the crown component). This process followed the NBR 11941 instructionshydrostatic balance method (ABNT, 2003 Where: = basic wood density (in g.cm³); = dry mass of sample (in g); = saturated volume (in cm³), and the subscript j refer to each sample.

Data analysis
A total of 140 trees were used for fitting the allometric equations. Six allometric models (four linear and two nonlinear) were examined when predicting both components and total biomasses ( Table 1). The coefficients of all equations were estimated using the package nlme (PINHEIRO et al., 2019) for linear and nonlinear models in the software R (R CORE TEAM, 2018). Table 1. Allometric models to estimate the aboveground biomass of black wattle trees. Tabela 1. Modelos alométricos testados para a estimativa da biomassa de árvores de acácia-negra. Model Where: = biomass (in kg); = diameter at 1.3 m above the ground (in cm); ℎ = total tree height (in m); = wood basic density (in g.cm -³); β 0 ,β 1 , β 2 , β 3 = coefficients to be estimated; = associated error (with normal distribution, zero mean, and constant variance), and subscripts i and j refer to the biomass of i tree in j component.

Adjusted coefficient of determination (
. 2 ), relative root mean square error ( %), and the Bayesian information criterion ( ) were assessed to indicate the model performance . As the number of parameters varies among the evaluated models, the added parameters were accounted for when calculating the goodness-of-fit indicators.
summary(modcopa) dados$estcopa = predict(modcopa, newdata = dados, level = 0) dados$rescopa = (dados$B_copa- Where: = number of observations; ̂ = predicted biomass; = observed biomass; ̅ = mean observed biomass; = number of model parameters; = natural logarithm; = likelihood; and = number of variance components, and the subscript i refer to each observation. The Shapiro-Wilk test verified the residuals' normality at a 5% probability. Additionally, we used a graphical analysis to examine residuals' homogeneity. In cases when heteroscedasticity was observed, weighted regression was used to ensure residual homoscedasticity. According to the statistical criteria described, weighted regression was performed for each component's best model and the whole plant. Thus, the residuals () were predicted by the natural logarithm of the original model's independent variables: ln(̂2) = 0 + 1 ln( 1 ) + ⋯ + ln ( ) The best performing models for estimating the biomass of each component and total biomass were refitted using the diagonal weight matrix, calculated according to the following equation: Where: = weight attributed to observation i; X 1… 1… = independent variable raised to the coefficient estimated for the residual equation (see Equation 12).
To ensure the biomass components' additivity, the models were fitted using a system of simultaneous equations (PARRESOL, 2001). A system of equations formed by stem biomass, crown biomass, and total biomass was considered. The WNSUR (Weighted-Nonlinear Seemingly Unrelated Regression) system was applied to the SAS statistical program, and the graphics were edited using R software. The performance analysis of each model was based on the same goodness-of-fit statistics. The procedure was performed as follows: Where: = stem biomass (in kg); = crown biomass (in kg); = aboveground biomass (in kg); 1,2 = independent variables; 1,2 = parameters of the regression models; = associated error associated; and subscript i refer to each observation. Table 2 displays the descriptive statistics for black wattle tree attributes. Biomasses presented the highest variations, evidenced by the coefficient of variation (CV%) values -ranging from 63.74 to 91.83%. The crown component, however, presented the higher variation. Basic wood densities (for both stem and branches) presented the lowest variations, with CV% of 9.29% for stem component and 11.56% for branches.  Table 3 displays the estimated coefficients and goodness-of-fit statistics for all allometric equations when predicting stem, crown, and aboveground biomass. The overall goodness-of-fit statistics were satisfactory, although a slight superiority of nonlinear models, M4 for the crown biomass, and M5 for both stem total (aboveground) biomass. The inclusion of wood basic density (ρ) as an auxiliary variable promoted no improvements in biomass estimates.

RESULTS
The goodness-of-fit statistics were poorer for crown (Wcr) estimates, in which the R 2 adj. ranged from 0.70 to 0.84 and 36.07 to 50.37% for RRMSE. The parameter 2 of M5 was non-significant, and thus this model became identical to M4. Therefore, we removed this from Table 3. M4 presented the best performance when predicting crown biomass (R 2 adj. = 0.84, RRMSE = 36.21%, and BIC = 11.61). The estimates of stem biomass (Wst) were always more accurate when than those for the crown component. The R 2 adj. was never lower than 0.91, and the RRMSE higher than 19.16%. Both M2 and M6 presented non-significant coefficients, and its removal resulted in mathematical formulations identical to M1. M5 presented the best performance when predicting stem biomass (R 2 adj. = 0.96, RRMSE = 13.12%, and BIC = 13.98). Where: 0,1,2,3 = estimated coefficients; R 2 adj. = adjusted coefficient of determination; RRMSE (%) = relative root square mean error; BIC = Bayesian Information Criterion; W = Shapiro-Wilk test; * = significant at 5% probability; ** = significant at 1% probability.
Aboveground biomass (W) estimates behaved similarly to stem biomass estimates (Wst), as this component corresponded to approximately 84% of W. The adjusted coefficient of determination ranged from 0.89 to 0.96, while the RRMSE ranged from 14.14 to 22.15%. M5 presented the best performance, with higher R²adj. (0.96), and lower BIC (15.01) and RRMSE (14.14%). The parameter B3 from M6 was non-explicit significant, and its removal resulted in similarity with M2. Thus, this model was removed from Table 3. The intercept of M3 was also non-significant; we removed this parameter and refitted the model.
The Shapiro-Wilk test indicated that all models' residual distribution, including stem and crown components, presented non-normal distributions. Thus, we weighted the residual variance. This process ensures biomass estimation reliability and compatibility among tree components when new inventory data are available. Figure 1 displays the residual distribution, in which we noticed a remarkable increase in residuals as the diameter at 1.3 m above the ground (d) increased. Thus, the error variance is not constant (or heteroscedastic).
This phenomenon occurs because the variability of component biomass tends to increase when individuals' dimensions increase. With the violation of homoscedasticity, the weighted regression method was used to ensure more consistent estimates for each of the biomass components and total biomass to obtain less variance of the parameter estimates.
Weighted regressions considered weights (w) for each component according to the equations: Weighted regression was performed based on the structure of the original equation. The best-performed models for each component were refitted, accounting for the weights for each observation.
After weighting, the new residual distribution presented homoscedasticity, and a small variance was noticed, indicating, thus, the improvement in the models' variance structure, reducing errors in estimating the dependent variables ( Figure 2). Table 4 displays the estimated parameters considering the weighted regression. The weighting process improved the fitting performance, providing a biological interpretation of the parameters (represented by its components additivity). Thus, biological consistency was guaranteed: compatibility between component biomass and the predicted aboveground biomass. The use of simultaneous weighted regression also improved residual distribution, allowing error modeling to guarantee the homogeneity of the variances (Figure 2). Table 3. Fitted coefficients and fitting statistics when predicting crown, stem, and aboveground biomass of black wattle trees using simultaneous equations. Tabela 4. Coeficientes e estatísticas de ajuste para as biomassas de copa, fuste e aérea de árvores de acácia-negra usando ajuste simultâneo.

DISCUSSION
Several authors addressed the inclusion of basic wood density in allometric models, such as Chave et al. (2014), Tashi et al. (2017), andCoutinho et al. (2018). These authors found that this auxiliary variable improved the performance of allometric equations. However, in this study, we noticed non-explicit improvement when adding the basic wood density in allometric models for predicting the aboveground biomass (and for its components). As we used trees at the same age, we believe that the basic wood density presented no apparent relationship with biomass. However, if trees from different stands (and different ages) were sampled, the basic wood density could improve the estimates. Studies show that basic density tends to increase with age due to the increase in cell wall thickness and the decrease in cell width (ZAQUE et al., 2018).
The performance of allometric models (Tables 3 and 4) suggested that the aboveground biomass of black wattle trees can be estimated using traditional variables such as the diameter at 1.3 m above the ground (d) and total tree height (h). Thus, the development of an accurate allometric equation allows biomass estimation without cutting and weighing new trees in similar conditions.
The best-performed model indicated that the relationship between component biomass and the variable d is not linear, as evidenced by António et al. (2007).
However, the graphical analysis revealed heteroscedasticity of variance in all components' residuals, notably for trees with larger diameters (Figure 1). Heteroscedasticity is common in equations involving the relationship between biomass and tree attributes (XIANG et al., 2011). It is expected that the sum of trees component biomass estimates correspond to the predicted total biomass since total biomass comprises all components. However, although additivity can be determined considering total biomass as the sum of the components obtained through an independent fitting, this estimation is inefficient compared to the application of seemingly unrelated regressions. Thus, it can be said that additivity is a fundamental property of biomass equations, ensuring biological consistency. Recently, several researchers have used SUR and NSUR to develop additive biomass equations, such as Dong et al. (2014), Sanquetta et al. (2015), Zhao et al. (2015), Zheng et al. (2015), and Coutinho et al. (2018).
The application of seemingly unrelated regression ensured a better performance in the quality of estimates, guaranteeing the homogeneity of the residuals (one of the assumptions of the regression). Furthermore, the condition presented in Table 4 ensured biologically consistent results, in which component biomasses are additive to aboveground biomass. In other words, the application of seemingly unrelated regression in this study provided positive aspects in tree biomass modeling. Likewise, when studying the use of simultaneous fitting, Correia et al. (2008), Zhang et al. (2016), and Coutinho et al. (2018) noted substantial improvements in goodnessof-fit statistics when compared to the traditional approach. In their studies, equations decreased standard errors and led to biological consistency regarding the components' additivity.
Another critical aspect addressed in the present study was ensuring the residuals' homoscedasticity through weighted regression. McNicol et al. (2015) also evaluated the use of weighted regression with the previous definition of weights for tree biomass data and observed that the residuals were not homoscedastic. This indicates limitations when defining the weights in advance and not considering the residuals from the unweighted fitting.

CONCLUSIONS
• Fitting weighted equations through the estimation of weights allowed the obtainment of stable variances; • The use of simultaneous equations guaranteed the biomass components' additivity, leading to realistic results when considering biological aspects.