Researchers, scientists, and psychologists have used several terms to describe this sensory phenomenon, from the earliest research until today. Here are some examples of words that have been used to describe the same neurological phenomenon:

Crossmodal correspondences have been noticed in humans for a long time. Nonetheless, the topic only began to attract the attention of scientists and researchers around the beginning of the 20th century. In a study of phonetic symbolism, the American anthropologist and linguist Edward Sapir pointed out a crossmodal correspondence between speech sounds and object size. He examined the contrast between the vowels "a" and "i", using the meaningless syllables "mal" and "mil" to label objects of different sizes. The results suggested that people tend to associate "mal" with larger objects and "mil" with smaller ones (Sapir, 1929). Researchers at Harvard have also studied people's ability to match stimuli presented in different sensory modalities; the results show that both adults and children reliably match brightness with loudness.

The extensive body of research in the 20th century suggests that crossmodal correspondences exist between all possible pairings of sensory modalities. Researchers have also shown that crossmodal correspondences play an important role in human information processing. For example, people find it harder to classify the size of a visual stimulus when a simultaneously presented sound is incongruent in pitch (e.g., a high-pitched tone presented with a large target) than when the sound is congruent (e.g., a low-pitched tone presented with a large target). Multiple experiments of this kind show that crossmodal correspondences influence the speed and accuracy of human information processing.

Another study, by Professor Ray H. Simpson and colleagues, pointed out a crossmodal correspondence between hue and pitch in children: high-pitched tones were more likely to be matched with yellow than with blue (Simpson et al., 1956). More generally, people exhibit consistent crossmodal correspondences from a young age. For example, children as young as two years old match loud sounds with large shapes (L.B. Smith & Sera, 1992). There is also evidence that infants may be sensitive to certain crossmodal correspondences, such as the one between auditory pitch and visual elevation or sharpness.

Crossmodal correspondence is a sensory phenomenon that operates automatically (with or without conscious awareness) and is deeply rooted in our psychological and nervous systems. In general, each sense can work independently as a single unit (unisensory processing), but several senses are also often activated together at the same time. Humans consistently combine the unisensory signals that refer to the same object or event in order to obtain the most accurate and precise information about the attributes of their environment. Numerous studies show that people produce consistent crossmodal mappings between stimuli in different sensory systems. Stimuli can be matched, creating crossmodal correspondences, in several ways and at several levels; for instance:


  • Correspondences between the physical properties of two or more stimuli in sensory modalities that seem different or unrelated,
  • Correspondences at a more abstract level, e.g., pleasantness, cognitive meaning, or activity,
  • Correspondences based on the effect that the stimuli have on the observer's mental and emotional state; e.g., an observer may perceive a correspondence between two different stimuli because both heighten the observer's arousal or excitement.

Aside from the consequences of the phenomenon, researchers have identified several kinds of crossmodal correspondences:

  • correspondences between pairs of stimuli that are naturally correlated; for instance, the natural correlation between an object's size and its resonance frequency: the larger the object, the lower the frequency. This kind of correspondence arises automatically because the brain internalizes the regularities of the environment,
  • correspondences arising from the structure of the neural systems and connections, present at birth, that we use to process and encode sensory information,
  • correspondences that occur when the terms people use to describe objects or phenomena overlap between two different stimuli; for example, the words "low" and "high" describe sound pitch but also the vertical position of objects.

The cause of this neurological phenomenon is unclear, but several findings point toward an answer. Early studies showed that crossmodal correspondences are more likely to occur when two stimuli are spatially and temporally congruent; in other words, the closer in time the stimuli in different modalities are presented, the more likely a crossmodal correspondence is to occur. Later, researchers focused more on the semantic and synaesthetic aspects of crossmodal correspondence. Semantic congruency refers to situations in which two or more stimuli are congruent or incongruent in terms of their meaning and identity. Synaesthetic congruency refers to correspondences between stimulus properties that are shared by a large number of people. One prominent suggestion is that humans judge the similarity or dissimilarity of two or more stimuli statistically, by integrating prior knowledge with incoming sensory information. Some researchers have also suggested that "crossmodal correspondence is a weak form of synaesthesia". There is as yet no conclusive evidence for or against this claim, and current research methods cannot settle the question. Nevertheless, there are clear links between crossmodal correspondence and synaesthesia: many studies have found similar multisensory mappings in ordinary people and in synaesthetes. Reports on two rare forms of synaesthesia, "lexical-gustatory synaesthesia" and "sound-gustatory synaesthesia", will be presented in a later section.

These terms are generally used to describe uncontrolled associations between two or more stimuli, whether between physical properties or between sensory modalities. However, individual researchers interpret them differently, which has caused some confusion. A precise definition of the term "crossmodal correspondence" is given in an important review by Professor Charles Spence, a leading researcher on multisensory perception and Head of the Crossmodal Research Laboratory in the Department of Experimental Psychology at the University of Oxford. In that review (Spence, 2011), the term "crossmodal correspondence" refers to "a compatibility effect between attributes or dimensions of a stimulus (i.e., an object or event) in different sensory modalities (be they redundant or not)", and Spence adds that "A key feature of (or the assumption underlying) all such crossmodal correspondences is that they are shared by a large number of people (and some may, in fact, be universal)".