TREEDETECTION: AUTOMATIC TREE DETECTION USING UAV- BASED DATA

In this study, we present a toolbox built in ArcGIS, by using ArcPy, and designed to automatically detect trees in high resolution data obtained from Unmanned Aerial Vehicles (UAV). The toolbox (TreeDetection) contains a tool called TreeDetect, which requires three parameters: a raster input, a conversion factor, and an output folder. Three other optional parameters can be changed to improve the detection according to the characteristics of the forest and the raster source. The TreeDetect tool was tested in three study sites: one young Eucalyptus plantation; adult Eucalyptus and Pinus stands; and one Mixed Hardwood natural forest. Distinct raster inputs were also evaluated according to the data availability in each site. The tool was considered efficient to detect trees in the three study areas. The detection accuracy was lower in the natural stand, as expected considering the complex structure of this forest type. All the raster input provided satisfactory results. However, in the homogeneous stand, the Digital Surface Model (DSM) was not effective as the spectral bands. Furthermore, research can be performed with emphasis in different sensors and band combinations, as well in the selection of parameters.


INTRODUCTION
The use of remote sensed data in forest management is a common practice (TANG; SHAO, 2015), since it allows the collection of a large amount of information about the environment with less human effort in field. This is even more important in large or inaccessible areas, where the field work is more exhausting and expensive (TOMPPO et al., 2008;KOCH, 2015). Besides that, the most common Remote Sensing sources have their own disadvantages, usually related to the price and concerning high resolution imagery and Lidar data (KE;QUACKENBUSH, 2011;BALDAUF;GARCIA, 2016;GOMES;MAILLARD, 2016), as well as limitations of temporal coverage and data availability, according to weather conditions for passive sensors (TANG; SHAO, 2015). Because of these limitations and the restricted knowledge on some Remote Sensing techniques, the traditional field forest inventories are still the most common practice in forest companies.
The development of Unmanned Aerial Vehicles (UAV) in the last years quickly started to be used for many forest applications, as the detection of individual trees (HUNG et al., 2012;WALLACE et al., 2016;MOHAN et al., 2017;NEVALAINEN et al., 2017). This technology is easily disseminated for many reasons, as for its low cost compared to similar technologies, possibility to obtain high resolution data, flexibility regardless to data collection, and the capacity to use many different sensors in the same vehicle according to the application (NEX; REMONDINO, 2014;SALAMÍ et al., 2014).
Besides that, it is necessary to consider the challenges to use this high-resolution data, since it presents great variance in color patterns, shape and texture for the same object, and this can cause confusion in the traditional classification methods (KE;QUACKENBUSH, 2011;GOMES;MAILLARD, 2016). In order to facilitate the acquisition of the targeted information, it is possible to apply image preprocessing techniques, including the enhancement of differences between distinct objects, selection of the most appropriate band, radiometric corrections, image smoothing, and resampling (KE; QUACKENBUSH, 2011).
There are many algorithms developed to detect trees on images or on Lidar point clouds, for instance: local maxima; region-based segmentation; boundary-following; region growth; watershed segmentation; voxel space; adaptive segmentation; and adaptive filtering (BRANDTBERG; WARNER, 2006;VAUHKONEN et al., 2012;FAVORSKAYA;JAIN, 2017). Many algorithms can be used in hybrid approaches. In some cases, fusions between imagery and Lidar point clouds are used to have spectral and 3D information (LECKIE et al., 2003;KOUKOULAS;BLACKBURN, 2005). This fusion process can be applied in data derived from UAVs, since those have the spectral information as well as the 3D terrain reconstruction in most of the cases. Besides that, there are few approaches created specifically for high resolution datasets, as the UAV imagery, as well as few automatic tools for tree detection, limiting the applicability of the developed methods to academic uses.
Considering the capacities presented for UAV data collection, there is indication that this data can be used in many forest applications. For that, the first task to be developed is the single tree detection. Therefore, this study had the objective to create a toolbox able to detect individual trees in datasets obtained from UAV. The toolbox was created in Python language for ArcMap.

Tool structure
Any python toolbox is composed by one or more tools. Each tool has its own script and is composed by some parts (called Methods). The python toolbox created for this research is called TreeDetection, and it contains a tool called TreeDetect. For this tool, we used three methods: init (defining tool name and description), getParameterInfo (defining the necessary parameters), and execute (primary execution code).

Parameters
Before the execution of the sequence of tools, it is necessary to input some parameters, as presented in Figure 1. The first input is the raster that will be used for the tree detection. Therefore, it is a required parameter. The tool was created to be used with an orthomosaic generated by UAV images, but it can be replaced by other files, as a DSM (Digital Surface Model), a band composition, a band index, or other high-resolution raster file.  FLORESTA,Curitiba,PR,v. 48,n. 3,jul The second parameter is the Cell size (or resolution) used in the processing. It has a default value of 0.5 m and it is optional to change the value. Smaller the value selected, more detailed is be the process. Thus, in some cases, this could be prejudicial because variations in the same canopy are treated as more than one tree. The third parameter is the Conversion value, a required parameter that can be either 1 or -1. If the value of -1 is selected, it inverts the structure of the raster. The fourth parameter is called Minimum size and it represents the minimum area that a tree crown has in the area. This parameter has a default value of 3 m 2 .
The last two parameters are the Smoothing rate and the Output folder. The Smoothing rate is optional and has a default value of 2 m 2 , which is adequate to most homogenous plantations. However, it should be modified in some cases, as in natural forests with much color variation in the crowns of the trees. The Output folder is required and should to be a geodatabase (gdb).

Execute
The TreeDetect tool was created based on the watershed segmentation. In this script, a group of several geoprocessing tools already implemented in ArcGIS and extensions were applied, as presented in Table 1. The first tool executed in the script is the Times tool, which multiplies the input raster by the Conversion value. The Conversion Value parameter can be either 1 (when it is not necessary to invert the raster values) or -1 (when it is necessary to invert the raster values). The inversion is necessary when the selected raster has higher digital values in the top of the trees and smaller values in the surroundings. It happens, for example, in a DSM, in which the trees have a higher value (altitude) in relationship to the surroundings. For the correct execution of this script, it is necessary that the tree tops have a smaller value than the surroundings, since tools based on the watershed segmentation will be applied. The digital value is highly variable according to different cameras, bands, and possible raster combinations, so the user can select between two values that can be used in the Times tool.
The second tool is the Focal statistics, used to highlight the lowest cell values in relation to the surroundings. The lowest cell values correspond to the tree tops. The tool searches the minimum value in a circle neighborhood, and the radius is defined by the Smoothing rate parameter. The lowest value in the search radius is applied to all cell in this same radius. This step is highly influenced by the Smoothing rate. We suggest that the default value (or a similar value) should be used in areas with homogeneous plantations, since the trees have homogeneous characteristics. In natural forests, it is suggested to use a higher value as Smoothing rate to smooth variations within the same tree crown. Besides that, there is not a formula to calculate the optimal parameter, and the user should try different values and analyze which one provides the best result.
The next tool applied is the Flow direction, which results in a raster that represents the flow direction out of each cell. In the Flow direction, each cell of the output raster is represented by a number that corresponds to one of the eight possible neighborhoods of each cell, considering that each direction has one value (ArcGIS, 2017). By using the Flow direction results, the tool Basin is applied. This tool delimitates the watersheds by identifying the ridge edges in the Flow direction raster. In this case, the result is approximately the tree crown area and it includes ground areas, if they are visible among the trees.
These crown areas are converted to polygons by using the tool Raster to polygon. Then, these polygons are selected based on their areas by the tool Select, and only polygons above a minimum value are utilized in the next steps. The minimum area is defined by the parameter 4 (minimum size), and the value of 3 m 2 is used if the user does not define one. This selection enables the exclusion of small fragments created close to the edges or in big gaps in the canopy, for example. From this selection, a shapefile that represents all possible crown areas above the minimum size is created.
The Zonal statistics tool is employed to locate the smallest value of the area for each crown area. For that, the minimum statistic of the areas delimited in the focal statistics is applied. By executing this tool, a raster that represents the lowest values (more than one if the same value repeats) of each tree area is obtained. The lowest values represent the areas of the top of the trees.
The Equal to tool is used to identify cells that have the same value on the raster resulted from the Focal statistics and the Focal raster. Only the minimum positions have the same value. These return 1, whereas the other cells return 0. The smallest cells, which return 1, are the tree tops. The ones with value of 0 are reclassified by using the Reclassify tool to NoData, whereas the one with value of 1 are converted to shapefile by using the Raster to point. The number of each crown area is included in the tree top points by the Intersect tool. The Delete identical tool is employed to delete possible extra points in the same tree area. These points occur if there is more than one cell with the same value in the same area. Concluding all the steps above, a shapefile that represents the top of all trees of the area is obtained.

Case Studies
Case study 1: young Eucalyptus tree plantations The study area is located in the municipality of Telêmaco Borba (state of Paraná, Brazil) and is property of the pulp and paper company Klabin S/A, which provided the data for this study. The study area is divided into three stands with an Eucalyptus grandis W. Hill and E. urophylla S.T. Blake hybrid, planted in February 2014 (18 months by the survey date) in a grid of 3.75 x 2.4 m.
The UAV used was an eBee-Ag (Sensefly) with RGB camera (Canon PowerShot ELPH 110 HS). This camera has a resolution of 16MP, a sensor size of 6.2 x 4.6 mm (4608 x 3456 pixels), pixel size of 1.33 μm, and focal length of 4.37 mm. Forty-one images were taken to cover an area of 117 ha, using approximately 170 m of flight height and 70% of overlap. The imagery processing was done by using Postflight Terra 3D (3.1.45). The options selected in the initial step were: full keypoints; image scale 1; automatic number of keypoints; and standard calibration. In the dense cloud step, we selected: multiscale; ½ image size; optional point density; and three minimum matches. The DSM was generated by the filter noise, and it was smoothed by a sharp surface. The DSM and orthomosaic archived a resolution of 16.21 cm/pixel. Ground control points were not available, but it was acceptable in order to obtain the total number of trees.
The parameters selected in the TreeDetect were: orthomosaic input raster; cell size of 0.5; conversion value of 1; minimum size of 5; and smoothing rate of 2. The algorithm was applied for one stand at a time after clipping the orthomosaic in the shape of the stands. TreeDetect algorithm used only the red band (first), since no calculation was performed and the algorithm uses only one band (or a mathematical combination). TreeDetect results were compared to the number of trees manually detected on the orthomosaic, as well as the estimated number from the plant spacing. To evaluate the results, Qui-square (χ2) was also applied. In order to apply the χ2 test, the number of trees observed in the images was considered as the expected value in three different stands. The values observed were obtained by the algorithm TreeDetect.
Case study 2: adult Eucalyptus and Pinus tree plantations In this case study, two large commercial stands of approximately 3 ha each were used. The stands are also located in the municipality of Telêmaco Borba (state of Paraná, Brazil) and are property of the pulp and paper company Klabin S/A. One stand is formed of an Eucalyptus grandis and E. urophylla hybrid tree species, planted in a 3.75 x 2.4 m spacing and of 5 years old. The other stand is formed of Pinus taeda L. trees of 7 years old, planted in a 2.5 x 2.5 m spacing. The trees are planted in lines, and the canopy is closed.
The imagery collection was performed by the UAV eBee-Ag, with different cameras for each. For the Pinus stand, a RGB (Red, Green and Blue) Sony DSC-WX220 and a Near Infrared (Red, Green and NIR) Canon S110 NIR were used. For the Eucalyptus stand, a Multispectral camera (Green, Red, Red Edge and NIR) Multispec 4C was used. The RGB camera has resolution of 18.8 MP, sensor size of 6.16 x 4.62 mm (4896 x 3672 pixels), focal length of 4 mm, and pixel size of 1.26 μm. The NIR camera has resolution of 12 MP, sensor size of 6.23 x 4.69 mm (4048 x 3048 pixels), focal length of 5 mm, and pixel size of 1.54 μm. The Multispectral camera has resolution of 1.2 MP, sensor size of 4.8 x 3.6 mm (1280 x 960 pixels), focal length of 4 mm, and pixel size of 3.75 μm.
Each camera required a different flight. The flight height and overlap were the same for all cameras, approximately 170 meters above ground and 85% of lateral and longitudinal overlap, as well as the double grid acquisition (to cover the same, it was twice in perpendicular flight directions). In the Pinus stand, an area of 161 ha was covered by 554 pictures of the RGB camera, and 128 ha by 648 pictures of the NIR camera. The multispectral The images were processed by the Pix4D Mapper (version 3.2.17). RGB and NIR have the same processing settings, and we selected: full keypoints; image scale 1; automatic number of keypoints; geometric verified matching and standard calibration in the initial step; multiscale; ½ image scale; high density; three minimum matches in the dense cloud step; and filter noise and no smooth surface in the DSM option. We obtained DSM and orthomosaic with resolution of 5.95 cm/pixel, RMS (root mean square) error of 0.039 m by the RGB camera, and resolution of 6.22 cm/pixel and RMS error of 0.041 m by the NIR camera.
For the Multispec camera, the Rapid keypoints and Alternative calibration in the initial step of processing were changed, and the optimal point cloud density in the dense cloud step was selected. The other parameters were the same as the ones from the RGB and NIR cameras. In the Multispec, the camera and sun irradiance were corrected by using the values of reflectance from a calibration target. The correction enables to generate a reflectance map of each band from the Multispec camera, instead of one orthomosaic. The resolution of the reflectance maps of each band from the Multispec was 12.09 cm/pixel, and the RMSE error was 0.067 m.
Different combinations of bands were tested for each stand, as well as different parameters, according to the raster input. For the Eucalyptus stand, three options were used: Red Edge band only; the sum of multispectral bands; and a Normalized Difference Vegetation Index (NDVI). For the TreeDetect, we selected for the three cases: cell size of 0.5; conversion value of -1; minimum area of 3 m 2 ; and smoothing rate of 2. For the Pinus stand, three options were also used: the sum of RGB bands; NIR band only; and DSM generated from the NIR camera. For the sum of RGB bands and the NIR band, we selected: cell size of 0.5; conversion value of -1; minimum area of 2 m 2 ; and smoothing rate of 2. For the DSM, we selected: cell size of 0.05; conversion value of -1; minimum area of 1; and smoothing rate of 1.
TreeDetect results were compared to field data inventory, in which all individuals were measured (height and DBH -Diameter at breast height) and accounted by the line and position in the line. The trees were plotted in the orthomosaic by using the field data. Some trees were not found (such as trees with broken trunks, or natural regeneration trees). Thus, the total number of trees measured in the field and observed in the orthomosaic were not the exact same number.
Case study 3: natural mixed hardwood forest The third case study was performed in a mixed hardwood forest, part of the West Virginia University Research Forest (WVURF), located in the city of Morgantown, state of West Virginia, USA. One stand of approximately 11 ha was selected. The UAV imagery collection occurred during the fall season of 2016. The UAV used was a Phantom 3 professional, equipped with a RGB camera (FC300X). The camera had resolution of 4 K, sensor size of 6.13 x 4.73 mm (4600 x 3000 pixels), focal length of 3.61 mm, and pixel size of 1.57 μm. The flight covered 30.2 ha in approximately 70 m above ground, using 85 x 80% of overlap, as well as double grid acquisition. 1571 images were taken.
Eight points were used as ground control. They were positioned in open areas (road and gaps in the canopy), and the positions were recorded by an iGage X900S-OPUS GNSS static receiver. The images were processed by the Agisoft Photoscan Professional Version 1.2.6. The alignment step was processed by using high accuracy, reference pair selection and no Key or Tie point limit. The dense cloud was generated by medium quality and moderate depth filtering. DSM with resolution of 9.97 cm/pix and orthomosaic of 2.49 cm/pix were obtained, and the RMS error was 0.491 m.
CHM (Canopy Height Model) was used to detect the trees. It was generated by subtracting the DSM values with a DTM (Digital Terrain Model) generated from Lidar data with bare earth classified, which was obtained from the West Virginia GIS Technical Center website (http://www.wvgis.wvu.edu/lidar). The DTM had original resolution of 1 m, and it was resampled to the DSM resolution. Because of the difference in resolution, temporal variations and other factors, there were errors in the CHM (negative values) Therefore, this is not suitable for many applications. However, it can be used to exclude large variations in the terrain in this case.
For the TreeDetect tool, the following values were applied: cell size of 0.1; conversion value of -1; minimum area of 10 m 2 ; and smoothing rate of 15. The detection results were compared to field inventory done in 2016, in 27 plots of 0.1 ha. All trees of 10.16 cm or more were identified by species and classified as Dominant (D), Co-dominant (CD), Intermediate (I), and Suppressed (S). Seven plots were excluded from the inventory since they intercepted regions with no forest, caused by the presence of a pipeline and an area with open field in this stand. 398 FLORESTA,Curitiba,PR,v. 48,n. 3,jul

Case study 1: young Eucalyptus tree plantations
The results for case study 1 are presented in Table 2 and Figure 2. Figure 2 also highlights areas with major errors of each plot. The results exhibited good capacity of the TreeDetect to detect small trees with open canopy, since the errors in the number of trees were 4.69%, in maximum.
The algorithm overestimated the number of trees in the plots. Major errors occurred in tan erosion area (T1) and spots with weeds growing (T2). In the stand T3, there are areas with many missing trees. It is possible that some of the trees are too small to be correctly visualized, or they can be weeds instead of Eucalyptus trees.
Despite the errors, the TreeDetect algorithm calculated the total number of trees closest to the true value instead of the estimate from the plant spacing. In the stands T1 and T3, the plant spacing overestimated the total number of trees by approximately 10%, whereas the TreeDetect overestimated by less than 5%. This difference can result in a significant improvement in the production estimative. Furthermore, these results are better than the ones for automatic detection from Artificial Neural Network in the same study site (RUZA et al., 2017). The number of trees detected by the TreeDetect presented non-significant difference in relation to the visual tree count.

Case study 2: adult Eucalyptus and Pinus trees plantations
The detection results are presented in Figure 3 and Table 3. They were compared to the number of trees plotted in the image. The results were considered as very good for the Eucalyptus stand for all raster inputs. The sum of all Multispec bands presented better results compared to only using the Red edge band and the NDVI.  The use of the DSM for the Pinus stand did not provide such satisfactory results, even with higher resolution. Since the number of trees was underestimated, we believe the DSM does not recognize small trees. The use of one band provided satisfactory results and can be used to expedite the processing. These results show similar accuracy to those observed for Eucalyptus tree plantations by Gebreslasie et al. (2011) with IKONOS images and to those observed by Oliveira et al. (2012) with Lidar.

Case study 3: natural mixed hardwood forest
The results for the natural mixed hardwood forest are presented in Figure 4 and Table 4. Because of the large variability of crown sizes and the overlap between crowns, the total number of trees per plot had a broad range. The plot 11 had a total of 31 trees measured at field, but 12 were suppressed, which 400 FLORESTA, Curitiba, PR, v. 48, n. 3, p. 393-402, jul/set.2018Hentz A. M. K. et.al. .ISSN eletrônico 1982 means that they may not be visible in the upper layer of the canopy even if some species have few remaining leaves in the fall season. The greatest errors were observed when comparing the upper layer trees (D, CD and I) in the plots 9 and 21 (details in Figure 4). It can be explained by the presence of suppressed trees, since both plots have a high total number of trees. Plot 3 also presented 60% of detection error when considering the number of trees in the upper layer, but the detection was 100% efficient for the total number of trees. Besides that, some trees may be still occluded, since there are many suppressed trees.
Despite the errors in detection, the results were considered as good, since the detection of single trees in deciduous forests is more difficult due to the complex geometry of the crowns (VAUHKONEN et al., 2012) and their overlap (GULBE et al., 2013). Comparable results were observed by Vauhkonen et al. (2012). They noticed that the success rate in detecting deciduous forest single tree ranged from 43 to 92%, considering different algorithms within two ALS datasets. Also, the results can be compared to the range on accuracy of 50 to 71% using combinations of Lidar and multispectral images in mixed forest, observed by Gulbe et al. (2013).

DISCUSSION
The TreeDetect tool showed to be efficient to detect single trees in the three experimental sites: young and well separated Eucalyptus plantations; adult Pinus and Eucalyptus tree plantations with closed canopy; and mixed hardwood natural forest. The results varied in accuracy depending on the areas. The largest errors were observed in the mixed hardwood natural forest (as expected), since this type of forest is extremely complex (VAUHKONEN et al., 2012;GULBE et al., 2013;TANHUANPÄÄ et al., 2016).
In the study sites, diverse raster sources were applied for the detection, and it was observed that many options can be applied with the tool. Overall, using one band can be as good as using a sum of many bands, as long as a band with good vegetation characterization is used, such as the NIR band (KE; QUACKENBUSH, 2011). In the future, the effect of each band in tree detections in different forest types should be evaluated, as well as the possibility of index and combinations with DSM or CHM. The raw point cloud derived from UAVs can also be evaluated in terms of detection. FLORESTA,Curitiba,PR,v. 48,n. 3,jul The use of CHM as an input source presented satisfactory results in the natural forest. However, using only DSM in the plantation site provided larger errors than using the spectral bands. The DSM may not model the crowns in dense plantations so well due to the overlap between some close crowns, but it can be a good source in areas as the mixed forest, since the upper layer of the canopy presents large crowns that can be delineated by using only the CHM, or even DSM if the area is relatively flat. The use of the CHM in mixed forest is also important, since the large variation on colors observed in the spectral bands can confuse the detection.
Because of these different scenarios, the tool was constructed with parameters that can adjust the detection based on the forest type and source data. For example, it was observed that the resolution needs to be set close to the maximum (considering the resolution of the input file) if DSM or CHM were used, whereas the resolution should be reduced to minimize variations and facilitate the process if the spectral bands were used. Another crucial factor is the selection of the smoothing rate (PANAGIOTIDIS et al., 2016;TANHUANPÄÄ et al., 2016;MOHAN et al., 2017). It needs to be set to higher values in areas of extreme variances, as in the natural mixed forest, whereas this value should be small in plantations in order to not merge neighboring trees.
In the future, more tools can be incorporate in the toolbox for crown delineation and possibly for tree height calculation, for example. These moves should be focused on using the tree detection points as seeds, both for tree height and crown delineation. However, further investigation is still necessary to create complete automated tools for these parameters. Another possible path is to create a complete database of parameters for forest types, possibly considering the most common UAV sensors used in forestry.

CONCLUSIONS
• The proposed tool can be applied to detect trees in UAV datasets with satisfactory results.
• The tool can be applied in different forest types with satisfactory results, but some changes in the parameters are necessary in some cases. • It is possible to use distinct input files, but the detection results vary according to the choice.