There is a slight misconception about what a game engine actually is. Looking at programs like Unity, Unreal, or Godot, one could assume a game engine must have a full GUI editor for dragging and dropping object "prefabs", proper lighting with various types of light sources, or audio of any kind. These programs do fall under the category of "game engine", but they are very extended and versatile engines, built to handle a vast array of use cases and game types: first-person shooters, card games, board games, tower defence games, and so on. Now consider games produced by large studios. Their games also have to run on some kind of engine, so what do those guys use? They build their own: in-house engines designed for one game or game series. Most of these engines never see the light of day as an actual product on their own; they exist only to run the game(s) they were built for.
What a game engine really is, is a framework or set of tools that helps game developers create, test, and run games efficiently and easily. An engine's primary focus is on things like how physics is calculated, how objects collide, how the game transitions between game states, player states, or really anything in a game that requires a state machine. The engine is designed to handle these things for you.
Some creative liberties were taken when calling the blocks in this engine "voxels". Voxel is shorthand for "volume pixel": instead of a pixel on the screen at some (x, y) coordinate, a voxel is the three-dimensional counterpart, using an (x, y, z) coordinate to determine its position. That position has a colour attached to it, which gives it its pixel-ness. In this project, instead of colouring each voxel a single colour, I used textures.
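Conceptually, a voxel boils down to something like the following. This is just an illustrative sketch, not the engine's actual types; in this engine the position ends up implicit in the world array rather than stored per voxel.

```cpp
#include <cstdint>

// A voxel is a 3D grid position plus some data describing what occupies it.
// Here the "colour" is replaced by a type id used to look up a texture.
struct Voxel {
    int x, y, z;   // integer grid position, the 3D analogue of a pixel's (x, y)
    uint8_t type;  // what the voxel is (air, stone, dirt, ...), used to pick a texture
};
```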
Texturing the voxels is quite simple: when the triangle information is processed, we can squeeze some extra data in with bit shifting. This adds information such as which voxel id or type is associated with the triangle, which face of the voxel the triangle is on, shading and light values for implementing proper lighting, and which image to sample textures from.
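As a rough illustration only, packing per-vertex data into a single integer might look something like this. The bit layout below is made up for the example and is not the engine's real format.

```cpp
#include <cstdint>

// Hypothetical bit layout, purely illustrative:
//  bits  0..4  : local x position within a chunk (0..31)
//  bits  5..9  : local y position
//  bits 10..14 : local z position
//  bits 15..17 : face index (0..5, which side of the cube)
//  bits 18..25 : voxel id / block type
//  bits 26..28 : light level
uint32_t packVertex(uint32_t x, uint32_t y, uint32_t z,
                    uint32_t face, uint32_t voxelId, uint32_t light)
{
    return (x & 0x1F)
         | ((y & 0x1F) << 5)
         | ((z & 0x1F) << 10)
         | ((face & 0x7) << 15)
         | ((voxelId & 0xFF) << 18)
         | ((light & 0x7) << 26);
}
```

The vertex shader then unpacks the same fields with the reverse shifts and masks.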
Now that the GPU knows what kind of cube we are dealing with, and where on the cube we are, we can sample points of an image for colour and apply it to our pixel. We also need to perform some fancy matrix transformations to get the correct perspective. For more details on how shaders work, check out the GPU Shaders document.
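For reference, the usual way to get that perspective is a model-view-projection matrix. A minimal sketch, assuming a maths library like GLM; the camera values and field of view here are placeholders, not the engine's settings.

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Placeholder camera state for the sketch.
glm::vec3 cameraPos(0.0f, 80.0f, 0.0f);
glm::vec3 cameraFront(0.0f, 0.0f, -1.0f);
float aspectRatio = 16.0f / 9.0f;

glm::mat4 model(1.0f);                                                // world transform of the geometry
glm::mat4 view = glm::lookAt(cameraPos, cameraPos + cameraFront,      // camera position and direction
                             glm::vec3(0.0f, 1.0f, 0.0f));            // world up vector
glm::mat4 projection = glm::perspective(glm::radians(70.0f),          // vertical field of view
                                        aspectRatio, 0.1f, 1000.0f);  // near and far clip planes
glm::mat4 mvp = projection * view * model;                            // applied to every vertex in the shader
```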
Since in our code the world storing our voxels is really just an array of integers, we can start by assuming integer 0 means there is no voxel, and integer 1 means there is a voxel. We can make masses and shapes by picking and choosing which coordinates get assigned integer 1. You might quickly notice a problem if you try making a large, solid shape, such as a massive cube. Assuming so far you've only implemented rendering a 6-sided cube with image textures, and understandably haven't been thinking too hard about performance yet, there are hundreds of faces being calculated and rendered that we never see: when two blocks sit beside each other, the adjacent faces can never be seen, but we still render them! Why? For absolutely no reason! So we can implement face culling (a rough sketch follows the link below):
https://learnopengl.com/Advanced-OpenGL/Face-culling
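Here is a CPU-side sketch of skipping faces shared between solid neighbours. The world dimensions, indexing scheme, and the emitFace call are all illustrative, not the engine's real code; it assumes the flat array-of-integers world described above.

```cpp
#include <cstdint>
#include <vector>

constexpr int WORLD_X = 64, WORLD_Y = 128, WORLD_Z = 64;  // example sizes

int voxelAt(const std::vector<uint8_t>& world, int x, int y, int z)
{
    // Treat anything outside the world as empty so boundary faces still render.
    if (x < 0 || y < 0 || z < 0 || x >= WORLD_X || y >= WORLD_Y || z >= WORLD_Z)
        return 0;
    return world[(x * WORLD_Y + y) * WORLD_Z + z];
}

void emitVisibleFaces(const std::vector<uint8_t>& world, int x, int y, int z)
{
    if (voxelAt(world, x, y, z) == 0)
        return;  // empty cell, nothing to draw

    // Offsets to the six neighbouring cells: +x, -x, +y, -y, +z, -z.
    const int neighbours[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };

    for (int face = 0; face < 6; ++face) {
        const int* n = neighbours[face];
        // Only emit the face if the neighbour on that side is empty;
        // a face sandwiched between two solid voxels can never be seen.
        if (voxelAt(world, x + n[0], y + n[1], z + n[2]) == 0) {
            // emitFace(x, y, z, face);  // hypothetical mesh-building call
        }
    }
}
```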
In our code, the world storing our voxels is really just an array of integers indexable by some (x, y, z). So, if we assign specific integers to represent different block types, we can use math functions to determine which indexes get which integer values. To start off simple, let's say any voxel with pos.y <= 64 is assigned an integer representing a stone block; let's call this height the 'base height'. Next, let's change all stone blocks with three or fewer stone blocks above them into dirt blocks, and leave the very top block as a grass block.
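A minimal sketch of that layering rule, with made-up block ids (the engine's actual values will differ):

```cpp
#include <cstdint>

// Illustrative block ids, not the engine's real ones.
enum BlockId : uint8_t { AIR = 0, STONE = 1, DIRT = 2, GRASS = 3 };

constexpr int BASE_HEIGHT = 64;

// Stone up to the base height, dirt for the three blocks just below
// the surface, grass on the very top, air above.
uint8_t blockAt(int /*x*/, int y, int /*z*/)
{
    if (y > BASE_HEIGHT)      return AIR;
    if (y == BASE_HEIGHT)     return GRASS;
    if (y >= BASE_HEIGHT - 3) return DIRT;
    return STONE;
}
```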
This gives us a nice world that looks like Minecraft's superflat world preset...
But that's not very interesting; we want mountains, valleys, beaches, perhaps a few happy little trees. So we can look towards smooth noise functions, specifically functions that take (x, y) or (x, y, z) values and return a float, where values sampled from adjacent points change smoothly. For more details on smooth noise, check out Perlin or simplex noise and the concept of Fractional Brownian Motion. For simpler terrain, we can restrict ourselves to 2-dimensional noise: we sample from a generated noise texture and treat its float value, in the range [-1.0, 1.0], as a height. We multiply this height by a large-ish integer, let's say 48, and add it to our base height for every (x, z) in our world. Due to the nature of our noise function, the changes in height will be gradual, and with Fractional Brownian Motion the terrain can look interesting, even realistic.
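A sketch of that heightmap idea, replacing the flat blockAt from the previous sketch. The noise2D function here is a crude stand-in for real Perlin or simplex noise layered with Fractional Brownian Motion, and the sampling frequency is an arbitrary choice for illustration.

```cpp
#include <cmath>
#include <cstdint>

enum BlockId : uint8_t { AIR = 0, STONE = 1, DIRT = 2, GRASS = 3 };  // same illustrative ids as before

constexpr int   BASE_HEIGHT  = 64;
constexpr float HEIGHT_SCALE = 48.0f;  // the "large-ish" multiplier from the text

// Stand-in for a real smooth noise function; returns values in [-1, 1].
float noise2D(float x, float z)
{
    return std::sin(x) * std::cos(z);  // placeholder only, real noise looks much better
}

uint8_t blockAt(int x, int y, int z)
{
    // Surface height for this column: base height plus scaled noise.
    int surface = BASE_HEIGHT
                + static_cast<int>(std::round(noise2D(x * 0.01f, z * 0.01f) * HEIGHT_SCALE));

    if (y > surface)      return AIR;
    if (y == surface)     return GRASS;
    if (y >= surface - 3) return DIRT;
    return STONE;
}
```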
We can make the terrain even more fun: we can use our (x, y, z) noise function to cut caves into the lower, solid terrain. If you implement these ideas well, you can easily make more room for all kinds of caves just by increasing the base height. If you wish to throw realism out the window, maybe try using the (x, y, z) noise as an additive function rather than a subtractive one, or play with cellular noise, ridge noise, noise with turbulence, abuse your constants for Fractional Brownian Motion, or, if you're feeling particularly fancy, apply your own matrix transformations to your sampling to warp the final result.
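A sketch of the subtractive cave carving, reusing blockAt and the block ids from the heightmap sketch above. The threshold and sampling frequency are arbitrary, and noise3D is again a crude stand-in for real 3D noise.

```cpp
#include <cmath>
#include <cstdint>

// Stand-in for real smooth 3D noise; returns values in [-1, 1].
float noise3D(float x, float y, float z)
{
    return std::sin(x) * std::sin(y) * std::cos(z);  // placeholder only
}

// Carve a cell out whenever the 3D noise rises above some threshold;
// lower thresholds give bigger, more connected caves.
bool isCave(int x, int y, int z)
{
    return noise3D(x * 0.05f, y * 0.05f, z * 0.05f) > 0.6f;
}

uint8_t blockWithCaves(int x, int y, int z)
{
    uint8_t block = blockAt(x, y, z);  // terrain from the heightmap sketch above
    if (block != AIR && isCave(x, y, z))
        return AIR;                    // subtractive: cut the cave out of solid terrain
    return block;
}
```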