SpeedTree uses a grid/cell-based culling system that is efficient at culling very large forests. There are three broad steps involved in forest-level culling:
The SDK is designed so that the client application streams in the tree instances that are visible, and the SDK automatically flushes those instances that are no longer visible.
The SDK organizes the world as a series of cells. As the camera moves, cells go in and out of visibility. As cells become visible, the SDK will provide a list of those cells that need to have their populations streamed in. Hence, the SDK will perform most efficiently when the client has the tree instances already organized by cells so that the data might be passed into the SDK without further on-the-fly processing.
In the reference application, this is done using our example CMyInstancesContainer, defined in MyPopulate.h/cpp. In video games, this class would likely be used to store all of the tree instances for a given level. CMyInstancesContainer is initialized by passing all the base trees and instances in once, and then calling CMyInstancesContainer::SplitIntoCells(), which organizes the input data in a way that will efficiently feed into the SDK. Note that you are free to use any type of data structure here that best suits your needs. CMyInstancesContainer is merely an example that works well enough for the needs of the SDK's reference application (although not in as heap-friendly a manner as it could).
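As a rough illustration of the kind of one-time preprocessing described above, the following sketch buckets instances by grid cell. It is not the code in MyPopulate.h/cpp: the Instance struct, CellKey, CellMap, CellIndex(), and SplitInstancesIntoCells() are hypothetical names made up for this example, and a std::map is used only for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical app-side instance record (not the SDK's SInstance).
struct Instance
{
    float x, y, z;   // world position
    int   baseTree;  // index of the base tree this instance uses
};

using CellKey = std::pair<int, int>;                      // (cell column, cell row)
using CellMap = std::map<CellKey, std::vector<Instance>>; // instances bucketed by cell

// Map a world coordinate to a cell index. std::floor keeps negative
// coordinates in the correct cell (-50 with 1200-unit cells is cell -1, not 0).
inline int CellIndex(float coord, float cellSize)
{
    return static_cast<int>(std::floor(coord / cellSize));
}

// One-time preprocessing: bucket every instance by the cell that contains it.
inline CellMap SplitInstancesIntoCells(const std::vector<Instance>& instances, float cellSize)
{
    assert(cellSize > 0.0f);
    CellMap cells;
    for (const Instance& inst : instances)
    {
        const CellKey key(CellIndex(inst.x, cellSize), CellIndex(inst.y, cellSize));
        cells[key].push_back(inst);
    }
    return cells;
}
```

With the instances grouped this way ahead of time, handing a cell's population to the SDK becomes a lookup rather than a search through the whole level.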
There are several classes and structures you'll need to get started:
CView: Encapsulates many of the common values associated with a particular view, such as projection and modelview matrices and near and far clipping planes. It also includes code for deriving values like camera azimuth, pitch, and frustum data.
CTree: These are known as base trees. There is one of these objects for every .stsdk file loaded in a scene. Each contains a complete definition of a single tree model.
SInstance: Each base tree may have one or more instances, each with unique position, orientation, and scale. STreeInstance derives from SInstance, adding culling-specific data.
CCell: The forest is divided into a series of evenly spaced cells. Cells contain SInstances.
CVisibleInstances: The central class for instance culling and streaming. It contains the functions for determining which cells are visible, instance culling, and 3D instance level of detail (LOD) computations.
S3dTreeInstanceLod: When the SDK provides a list of 3D instances (3D means they're rendered as 3D geometry, not billboards), it will do so using this structure, which contains an SInstance pointer as well as important LOD data.
For more detail on these objects, see Culling and Population Structures. You'll need one CVisibleInstances object per view, and it should persist as long as the world does. The quantities of the other classes will be determined by how your world is populated.
As shown in CMyApplication::StreamTreePopulation(), the general procedure followed for each frame where the camera has changed is outlined below. The whole procedure happens very quickly, even for densely populated forests.
Per frame, the general procedure is as follows (please see CMyPopulate::StreamTrees() for details on our implementation).
1. Ensure your CView object has been updated with the current view. Note that normally one CView object is used per view (one for the main camera, another for each of the example cascaded shadow maps) and they commonly persist with the world. Sometimes there are challenges aligning SpeedTree's expected view matrices with the client application's.
2. Call CVisibleInstances::RoughCullCells() with the current CView. Because the SDK does not store the entire tree instance population for a level (it stores only the visible cells), it doesn't know beforehand the complete extents of the cells. It knows the horizontal (x/y) footprint of each cell because the cells are laid out in a grid, but it doesn't know their vertical extents on rolling terrain without input from the application. RoughCullCells() returns a long list of cells that would be visible if the grid were composed of infinitely tall cells.
3. The CMyInstancesContainer class holds complete extents for each cell (including height), and the reference application looks up these extents for the rough cull list and sets them for each cell. Use CCell::GetExtents() on each of the rough culled cells to get its x/y extents.
4. Call CVisibleInstances::FineCullCells() with the current CView. This gives the SDK a chance to use the updated cell extents to give an exact list of those cells that are within the view frustum.
5. Once CVisibleInstances::FineCullCells() has been called, a list of newly visible cells can be retrieved from the class by calling CVisibleInstances::NewlyVisibleCells(). Newly visible cells are those cells that were not in the frustum the last time FineCullCells() was called.
6. Populate the newly visible cells by calling CCell::SetTreeInstances() on each one. This function takes both a list of base trees and a list of instances, but there are a couple of notes in using this function: [...] CCell::SetTreeInstances().
7. Call CVisibleInstances::Cull3dTreesAndComputeLod(). When it executes, it will also determine the LOD states for the 3D trees in an array of S3dTreeInstanceLod objects.
8. The CVisibleInstancesRI class is derived from CVisibleInstances, allowing it to add rendering components, specifically instancing-based rendering code for 3D trees, grass, and billboards. As such, the client app can control when this derived class updates its instance list vertex buffers by calling CVisibleInstancesRI::Cull3dTreesAndUpdateGpuInstBuffers(). Note that if you call this function, it will automatically call CVisibleInstances::Cull3dTreesAndComputeLod(), so you can skip step 7 above.
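To illustrate why the cull is split into a rough pass and a fine pass, here is a minimal, self-contained sketch of the idea. It does not use the SDK and is not how the SDK implements its culling: Box, Plane, RoughVisible(), and FineVisible() are hypothetical names, z is assumed to be the up axis, and the view volume is assumed to be given as a set of inward-facing planes.

```cpp
#include <cassert>
#include <vector>

struct Box   { float min[3], max[3]; };  // axis-aligned box; index 2 is "up" here
struct Plane { float n[3]; float d; };   // a point p is inside when dot(n, p) + d >= 0

// True when the whole box lies outside the plane. Only the box corner
// furthest along the plane normal needs to be tested.
inline bool BoxOutsidePlane(const Box& box, const Plane& plane)
{
    float dist = plane.d;
    for (int i = 0; i < 3; ++i)
        dist += plane.n[i] * (plane.n[i] >= 0.0f ? box.max[i] : box.min[i]);
    return dist < 0.0f;
}

inline bool BoxVisible(const Box& box, const std::vector<Plane>& frustum)
{
    for (const Plane& plane : frustum)
        if (BoxOutsidePlane(box, plane))
            return false;
    return true;
}

// Rough pass: only the x/y footprint is known, so treat the cell as
// extremely tall and accept anything whose footprint might be in view.
inline bool RoughVisible(Box cell, const std::vector<Plane>& frustum)
{
    const float kHuge = 1.0e30f;
    cell.min[2] = -kHuge;
    cell.max[2] =  kHuge;
    return BoxVisible(cell, frustum);
}

// Fine pass: the application has supplied the real height extents.
inline bool FineVisible(const Box& cell, const std::vector<Plane>& frustum)
{
    return BoxVisible(cell, frustum);
}
```

A cell sitting on a hill far above the view volume passes the rough test because its footprint overlaps the view, and is then rejected by the fine test once its true height range is known.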
To update the instance buffers for the billboard geometry, invoke CVisibleInstancesRI::UpdateBillboardGpuInstBuffer() for each of the base trees in the scene. Likewise, invoke CVisibleInstancesRI::UpdateGrassGpuInstBuffer() for each base grass in the scene. Sometimes these instance VB updates will wait on the GPU because the buffer is busy. The SDK double buffers the instance buffers to help avoid this.
The reference application shows exactly how run-time culling and LOD computation work for 3D trees, billboards, and grass in the CMyPopulate class as called from CMyApplication::Cull().
The SpeedTree SDK is often integrated into world builders, where the forest population can change dynamically, often while the camera is not moving. Because a camera change is what normally triggers the SDK's streaming/culling code, a population change in this situation has to be triggered manually: call CVisibleInstances::NotifyOfPopulationChange() and then call your streaming/population function again (the 8 steps outlined above). NotifyOfPopulationChange() clears out any history of cell visibility, so the new cell list will contain all the visible cells and they can be populated again.
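The relationship between the visibility history and the "newly visible" list can be sketched as follows. This is a standalone illustration, not SDK code: CellVisibilityHistory, Update(), and Reset() are hypothetical names, with Reset() standing in for the effect the text attributes to NotifyOfPopulationChange().

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using CellKey = std::pair<int, int>;  // (cell column, cell row)

// Hypothetical tracker that stands in for the SDK's cell visibility history.
class CellVisibilityHistory
{
public:
    // Record this frame's visible cells and return those that were not
    // visible on the previous call.
    std::vector<CellKey> Update(const std::set<CellKey>& visibleNow)
    {
        std::vector<CellKey> newlyVisible;
        for (const CellKey& key : visibleNow)
            if (m_previous.count(key) == 0)
                newlyVisible.push_back(key);
        m_previous = visibleNow;
        return newlyVisible;
    }

    // Forget the history so the next Update() reports every visible cell as new.
    void Reset() { m_previous.clear(); }

private:
    std::set<CellKey> m_previous;
};
```

With a stationary camera, a second Update() with the same cells returns nothing, which is why an edited population would never be re-streamed without first clearing the history.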
While the list above outlines the general approach needed for efficient and accurate tree streaming and culling, there are several other important points to consider.
The entire streaming and culling system is based on organizing trees into evenly spaced cells. It's important for performance reasons that the application can quickly populate the cells in step 6 above. This mostly means not wasting cycles during a render loop determining which instances are going into which cells. We provide the example class CMyInstancesContainer, defined in the reference application in MyPopulate.h/cpp. It is not sophisticated in its dynamics or heap usage, but it does show how to quickly and easily organize an existing population of base trees and instances into cells so that they can be efficiently passed into the SDK.
Step 3 above explains that a cell's height extents must be provided. Specifically, these extents are from the bottom of the lowest tree in the cell to the top of the tallest. Again, it makes sense to have this data available ahead of time and not waste cycles determining it in the render loop.
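A minimal sketch of precomputing those height extents for one cell is shown below. The PlacedTree and HeightExtents structs and ComputeHeightExtents() are hypothetical, z is assumed to be the up axis, and each tree is assumed to extend straight up from its base position by its scaled model height.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical app-side record for one tree placed in a cell.
struct PlacedTree
{
    float z;            // height of the tree's base (z is up here)
    float scale;        // uniform instance scale
    float modelHeight;  // unscaled height of the base tree model
};

struct HeightExtents { float lowest, highest; };

// Bottom of the lowest tree to the top of the tallest tree in one cell.
inline HeightExtents ComputeHeightExtents(const std::vector<PlacedTree>& trees)
{
    assert(!trees.empty());
    HeightExtents e = { trees[0].z, trees[0].z + trees[0].scale * trees[0].modelHeight };
    for (const PlacedTree& t : trees)
    {
        e.lowest  = std::min(e.lowest, t.z);
        e.highest = std::max(e.highest, t.z + t.scale * t.modelHeight);
    }
    return e;
}
```

Running this once per cell when the level is loaded or edited lets the per-frame code look the values up instead of recomputing them.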
The cell size for 3D trees (set in the SDK by using CVisibleInstances::SetCellSize()) is an important parameter for performance. The SDK first determines the number of visible cells. The larger the cells, the more quickly this determination is made. Once the cells are determined, though, for any that intersect the frustum and aren't at a billboard distance, the SDK must loop through the 3D instances to determine their individual visibility. Having smaller cells reduces this time. The reference application, with its example world units in feet, uses a default cell size of 1200.0 feet (set in the SFC file with the world::3dtree_cell_size parameter), which we believe strikes a good balance for our example forest.
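The trade-off can be made concrete with some back-of-the-envelope arithmetic. The helpers below are hypothetical and say nothing about the SDK's actual cost model; they only estimate how many cells a square region can touch and how many instances an average cell holds under a uniform density.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the cells touched by a square region of side `extent`
// laid over a grid of `cellSize` cells (the region need not be grid-aligned).
inline int MaxCellsTouched(double extent, double cellSize)
{
    assert(extent > 0.0 && cellSize > 0.0);
    const int perAxis = static_cast<int>(std::ceil(extent / cellSize)) + 1;
    return perAxis * perAxis;
}

// Average number of instances per cell for a uniform density.
inline double AverageInstancesPerCell(double instancesPerSquareUnit, double cellSize)
{
    return instancesPerSquareUnit * cellSize * cellSize;
}
```

Using made-up numbers (a 12,000-foot square region and 0.001 trees per square foot), 1200-foot cells give at most 121 cells of about 1,440 instances each, while 600-foot cells give at most 441 cells of about 360 instances each: halving the cell size roughly quadruples the cell-level work and quarters the per-cell instance loop.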
To provide a little more detail on how the SDK handles converting the user's arbitrary instance arrays into lists of 3D trees and billboards (two completely different representations):
[...] S3dTreeInstanceLod object computed and added to a list.
The organization of multiple types of tree instances into an arbitrary collection of cells can easily lead to a great number of heap allocations if care isn't used. We leave it to the user to implement their own app-side storage data structure, though our reference application's example CMyInstancesContainer is not an ideal steward of the heap.
To control the SDK's heap behavior, be sure to read about the Reserves System.