Humans only have two ears, but through a set of cues and calculations happening in our brain, we can locate sound in three dimensions: in range (distance), in elevation (direction above and below) and in azimuth (the horizontal plane: front, rear and either side). In traditional ambisonics we can only represent azimuth and elevation but not distance, which can be simulated using other techniques.
Our brains can estimate the location of a sound source by taking cues derived from one ear (monaural cues), and by comparing those received at both ears (binaural cues).
The Interaural Time Difference is the difference in arrival time of a sound between the two ears.
The Interaural Level Difference is the difference in amplitude of a sound between the two ears.
A sound arriving from one of our sides reaches that side's ear before it reaches the other. This difference, typically a fraction of a millisecond, lets us calculate the angle of the sound source relative to the head (azimuth). This angle is 0° (minimum time difference) when the sound is coming from our front, 90°/-90° (maximum time difference) when coming from either side and 180° (again minimum time difference) when coming from our back.
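The relation between time difference and azimuth can be sketched with a simple sine model for a distant source. The ear spacing, the speed of sound and the function names below are assumptions chosen for illustration, and the model ignores the way sound bends around the head.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C
EAR_SPACING = 0.18      # m, assumed distance between the two ears

def itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source (simple sine model)."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

def azimuth_from_itd(itd):
    """Invert the model; the result is always within -90°..90°."""
    ratio = max(-1.0, min(1.0, itd * SPEED_OF_SOUND / EAR_SPACING))
    return math.degrees(math.asin(ratio))

# The sign of the result tells which ear the sound reaches first.
for angle in (0, 45, 90, 180, -90):
    print(f"{angle:>4}°  ITD = {itd_seconds(angle) * 1000:+.3f} ms")
```

Because 0° and 180° both give a time difference of practically zero, the inverse function cannot tell front from back. This is one reason the time difference alone is not enough to locate a source.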
Likewise, a sound arriving from one of our sides will reach that side's ear with a higher amplitude than the other.
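A level difference like this is usually expressed in decibels. A minimal sketch, assuming we already know the linear amplitude at each ear (the function name and the example amplitudes are hypothetical):

```python
import math

def ild_db(amplitude_near, amplitude_far):
    """Interaural level difference in dB from two linear amplitudes."""
    return 20.0 * math.log10(amplitude_near / amplitude_far)

# Example: the far ear receives half the amplitude of the near ear.
print(f"ILD = {ild_db(1.0, 0.5):.2f} dB")
```

Equal amplitudes give 0 dB, which is what we would expect for a source straight ahead or straight behind.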
Approaching sound waves hit our body, and the Head-Related Transfer Function (HRTF) describes how the shape, density and size of our head, nasal and oral cavities, shoulders and chest transform the sound and affect its perception, through reflections, diffraction and filtering that boost some frequencies and attenuate others.
The HRTF differs significantly from person to person. Even so, generic HRTFs, based on standard calculations and approximations, are available online and are used for many purposes.
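In practice an HRTF is often applied in the time domain, by convolving a mono signal with a pair of head-related impulse responses (HRIRs), one per ear. The sketch below shows only that mechanism. The impulse responses are made-up toy values, not measured data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left and a right impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs (assumed values): the right ear hears the sound later and quieter,
# as it would for a source placed on the listener's left.
hrir_left = [1.0, 0.3]
hrir_right = [0.0, 0.0, 0.5, 0.2]

left, right = render_binaural([1.0, 0.5, 0.25], hrir_left, hrir_right)
print(left)
print(right)
```

A real renderer would load measured HRIRs for the desired direction and would normally use FFT-based convolution for speed.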
The first distance cue is the ratio between the direct sound and its reflections. This is especially true for sounds happening indoors, where two types of sound arrive at the listener: the direct sound, which arrives without being reflected off a surface, and the reflected sound, which has bounced off at least one surface before arriving. The ratio between the two gives an indication of the distance of the source. In addition, the time difference between the arrival of the direct wave and its first strong reflection also tells the listener about the distance from the source. Nearby sources create a relatively large Initial Time Delay Gap (ITDG), because the first reflections travel a much longer path than the direct sound. When the source is distant, the direct and reflected sound waves have similar path lengths, so the gap shrinks.
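The way the ITDG shrinks with distance can be illustrated with a single floor reflection, treated as a mirror image of the source below the floor. The heights, the speed of sound and the function name are assumptions for this sketch. A real room has many more reflecting surfaces.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def itdg_seconds(distance, source_height=1.5, listener_height=1.5):
    """Gap between the direct sound and a single floor reflection."""
    direct = math.hypot(distance, source_height - listener_height)
    # The reflected path equals the straight path to the mirrored source.
    reflected = math.hypot(distance, source_height + listener_height)
    return (reflected - direct) / SPEED_OF_SOUND

for d in (1.0, 5.0, 10.0, 50.0):
    print(f"source at {d:>4} m  ITDG = {itdg_seconds(d) * 1000:.2f} ms")
```

With these assumed heights the gap falls steadily as the source moves away, which matches the behaviour described above.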
Another cue is the spectrum of the perceived sound. Sound naturally loses power while travelling through a medium (which is why closer sounds generally appear louder), but this attenuation is stronger in the high part of the spectrum, because the medium absorbs high frequencies more readily than low frequencies. Therefore, a distant sound source will sound more muffled than a close one.
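The overall loss of loudness with distance can be sketched with the inverse-distance law for a point source in free field, which is an assumption here. This covers only the broadband level. The extra high-frequency loss depends on conditions such as temperature and humidity and is not modelled. The function name is hypothetical.

```python
import math

def level_drop_db(distance, reference_distance=1.0):
    """Level drop in dB relative to the level at reference_distance."""
    return 20.0 * math.log10(distance / reference_distance)

for d in (1.0, 2.0, 4.0, 8.0):
    print(f"{d:>4} m  -{level_drop_db(d):.1f} dB")
```

Under this model every doubling of the distance lowers the level by roughly 6 dB.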
Our last cue is the curvature of the wavefront. Sound waves propagate spherically from their source, but the curvature of the sphere is more noticeable in the near field than in the far field, where it is so slight that the sound wave approximates a plane wave. This affects the time difference with which the sound wave arrives at our ears, making it possible to detect the incoming direction of distant sounds, but difficult to estimate their exact position.
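To see how quickly a spherical wave starts to behave like a plane wave, we can compare the exact path difference between the two ears for a point source with the plane-wave approximation. This is a geometric sketch only. It treats the ears as two points in free space with an assumed spacing, ignores the head itself, and uses hypothetical function names.

```python
import math

EAR_SPACING = 0.18  # m, assumed distance between the two ears

def path_difference(distance, azimuth_deg):
    """Extra path to the far ear for a point source (spherical wavefront)."""
    theta = math.radians(azimuth_deg)
    # Source position with the head at the origin, facing along +y.
    sx, sy = distance * math.sin(theta), distance * math.cos(theta)
    half = EAR_SPACING / 2.0
    to_left = math.hypot(sx + half, sy)
    to_right = math.hypot(sx - half, sy)
    return to_left - to_right

def plane_wave_difference(azimuth_deg):
    """Path difference if the wavefront were perfectly flat."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg))

for r in (0.3, 1.0, 10.0, 100.0):
    exact = path_difference(r, 45)
    flat = plane_wave_difference(45)
    print(f"{r:>6} m  exact = {exact:.5f} m  plane wave = {flat:.5f} m")
```

At a fixed azimuth the two values converge as the source moves away, so beyond a few metres the path difference carries information about direction but almost none about distance.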