172A. Cognitive Psychology of Music (Introduction)
Undergraduate, non-major course
Week 7
Summary of lectures
Psychology of learning and memory
Three main schools of thought:
i) Behaviorism ii) Neo-Behaviorism iii) Cognitivism
These approaches attempt to provide models that explain behavior in general. We will concentrate on their implications for explaining musical behavior.
i) Behaviorism
This approach to psychology and what we have termed 'Cartesianism' developed in parallel and share similar ideas.
The human organism is understood as a machine, and behavior is broken down into Stimulus-Response chains (SR chains). These chains are supposed to comply with a presumably lawful relationship between stimulus and response, with responses related isomorphically to stimuli through either instinct (i.e. tissue needs) or what is called Classical Conditioning.
Early Behaviorism: Pavlov, Watson, Guthrie & Thorndike
Before conditioning:
Food (US) → Salivation (UR)
Buzzer (CS) → Orientation (ears perk up, etc.)
During conditioning:
Food (US) + Buzzer (CS) → Salivation
After conditioning:
Buzzer (CS) → Salivation (CR)
(US: Unconditioned Stimulus, CS: Conditioned Stimulus, UR: Unconditioned Response, CR: Conditioned Response)
Classical Conditioning: Food elicits salivation in a dog but a buzzer does not. After successive pairings of food and buzzer, the buzzer begins to elicit salivation.
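The notes describe conditioning qualitatively. As a toy numerical illustration (not part of the course material), the sketch below lets the buzzer-salivation association grow a little with each food+buzzer pairing. It borrows the error-correcting update of the Rescorla-Wagner model, which these notes do not discuss; the learning rate and response threshold are arbitrary assumptions.

```python
# Toy illustration of classical conditioning (not a model proposed in these notes).
# Assumptions: a Rescorla-Wagner-style update with a single conditioned stimulus,
# a maximum association strength of 1.0, an arbitrary learning rate of 0.3,
# and an arbitrary threshold of 0.5 above which the buzzer alone elicits salivation.

LEARNING_RATE = 0.3
RESPONSE_THRESHOLD = 0.5


def condition(pairings: int, rate: float = LEARNING_RATE) -> list[float]:
    """Return the buzzer->salivation association strength after each food+buzzer pairing."""
    strength = 0.0
    history = []
    for _ in range(pairings):
        # Each pairing (temporal contiguity of buzzer and food) closes part of the
        # remaining gap between the current strength and the maximum of 1.0.
        strength += rate * (1.0 - strength)
        history.append(strength)
    return history


def buzzer_elicits_salivation(strength: float) -> bool:
    """Conditioned response (CR): the buzzer alone elicits salivation above threshold."""
    return strength > RESPONSE_THRESHOLD


if __name__ == "__main__":
    for trial, s in enumerate(condition(5), start=1):
        print(f"pairing {trial}: strength={s:.2f}, CR={buzzer_elicits_salivation(s)}")
```

With these assumed values, a single pairing is not enough for the buzzer to elicit salivation, but repeated pairings are, which mirrors the "successive pairings" in the description above.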
The reason temporal contiguity results in the buzzer eliciting salivation, rather than the food eliciting perking-up of the ears, is that the stimulus connected to a tissue need usually takes over.
In the above example we essentially have an arbitrary connection between buzzer and salivation, resulting from repeated pairings of food (unconditioned stimulus) and buzzer (conditioned stimulus). Salivation (the instinctive response to food) becomes the conditioned response to the buzzer. The buzzer has thus become the conditioned stimulus for salivation because of its temporal contiguity with the original cause of salivation: food (the unconditioned stimulus).
Although the limits of such an approach to behavior are clear, this mechanistic Stimulus-Response-chain idea may explain cases of pure referentialism: cases, that is, where some symbol (lexical, notational, musical, etc.) is arbitrarily connected to a specific referent, not because of any intrinsic or inherent relationship between symbol and referent, but because of repetition, convention, or simple association (e.g. the word 'cat' and the animal it points to; the French national anthem and the notion of French ethnicity/culture it points to).
When something (a sign) simply points to something else (a referent) by arbitrary association, its relationship to the referent is called indexical. Referentialism therefore describes indexical relationships.
An example of referentialism is what Davies calls D.T.P.O.T. ("Darling, they're playing our tune"). What makes a tune 'ours' is simple indexical association; it indicates no inherent formal (or any other) relationship between us and our favorite tune. Rather, it indicates a type of knowing that may be understood in terms of classical conditioning.
Skinner's approach (1950s) is a simple extension of the ideas of behaviorism, one that emphasizes learning through positive reinforcement (out of a set of behaviors, rewarding only the desirable one so that it will be repeated) or negative punishment (removing something positive in response to an undesirable behavior so that it will be avoided) and, to a much lesser degree, negative reinforcement (removing something negative in response to a desirable behavior so that it will be repeated) or positive punishment (adding something unpleasant in response to an undesirable behavior so that it will be avoided). All of these methods rely on the same idea: forming behavior through temporal contiguity between a behavior and some sort of feedback to that behavior.
The interesting point here is that responses are not really linked isomorphically to any stimulus. Rather, they are "emitted" by us (in a non-prescribed manner) and are then selectively continued or discontinued based on the feedback we get.
Behavioral shaping therefore occurs over time, moving from simple SR connections to complex chains of SR connections. This approach represents a very 'serial' understanding of both time and behavior formation, one that can be, and has been, criticized. Genetic predisposition (Cartesianism) is simply replaced by environmental shaping, resulting in a model that is still machine-like.
Skinner expanded his ideas in a book called "Beyond Freedom and Dignity", where he essentially claims that no behavior is volitional, since our behaviors are shaped by the environment through reinforcement (an extreme and somewhat disturbing example of 'nurture' over 'nature' that denies the possibility of 'will' or 'initiative').
The table below illustrates, in a fishing example, the two types of conditioning operating during learning:
A. Operant conditioning: how one learns to use a royal coachman fly
Stimulus: desire to catch a fish
R1 (gray hackle) → no bites
R2 (royal coachman) → two trout
R3 (Idaho nymph) → one nibble
R4 (Lefrancois nymph) → nothing
B. Classical conditioning: how one learns to avoid a Lefrancois nymph
Unconditioned stimulus: fly in the ear
Unconditioned response: 'ouch' + swear words + withdrawal behavior
Conditioned stimulus: color of fly, its name (Lefrancois nymph), etc.
Conditioned response: avoidance behavior
A) Skinner's 'Operant Conditioning' (adopting an operant behavior that is associated with positive results).
Example
The stimulus (or better the issue/problem) is the desire to catch fish. The response (solution) is to use different types of bait. The result is repeated use of the bait that is most successful.
Explanation
1. The positive reinforcement of the trout, and the temporal contiguity (i.e. linked presentation) of the royal coachman and the trout, create an operant behavior (using the royal coachman) that results in a positive consequence (catching trout).
B) Pavlov's 'Classical Conditioning' (temporal contiguity of two stimuli associates a common response with both stimuli).
Example
The stimulus (problem) is the fly in the ear and the accompanying pain. The response (solution) is some expression of pain and removal of the fly. The result is avoidance of any fly that looks like, or has the same name as, the one that caused the pain.
Explanation
1. The temporal contiguity (i.e. linked presentation) of the Lefrancois nymph and the pain creates an association (Lefrancois nymph → pain), conditioning the stimulus 'Lefrancois nymph' with pain. As a result, the mere presence of the (now conditioned) stimulus, the Lefrancois nymph, elicits a response (avoidance) that is usually elicited by pain.
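Part A of the fishing example (operant conditioning) can be read as a simple trial-and-error loop: every response is emitted once, and the response followed by the best feedback is the one that keeps being used. The sketch below is a toy illustration under our own assumptions: outcomes are deterministic and scored as one point per trout, half a point for a nibble, and zero otherwise.

```python
# Toy sketch of operant conditioning in the fishing example (table A).
# Assumption (ours, not the notes'): outcomes are deterministic and scored as
# one point per trout, half a point for a nibble, zero otherwise.
OUTCOMES = {
    "gray hackle": 0.0,       # no bites
    "royal coachman": 2.0,    # two trout
    "Idaho nymph": 0.5,       # one nibble
    "Lefrancois nymph": 0.0,  # nothing
}


def shape_behavior(outcomes: dict[str, float], trials: int = 10) -> list[str]:
    """Return the sequence of flies used over `trials` fishing attempts."""
    totals = {fly: 0.0 for fly in outcomes}
    choices = []
    # Responses R1..R4 are first "emitted" without any prescribed link to the stimulus.
    for fly in outcomes:
        if len(choices) == trials:
            break
        totals[fly] += outcomes[fly]
        choices.append(fly)
    # Afterwards, the response that has received the most reinforcement is continued.
    while len(choices) < trials:
        best = max(totals, key=totals.get)
        totals[best] += outcomes[best]
        choices.append(best)
    return choices


if __name__ == "__main__":
    print(shape_behavior(OUTCOMES))
```

After the four exploratory casts, every remaining cast uses the royal coachman. Real reinforcement is noisy, so an actual angler would need more trials to settle on one fly; the deterministic scores are a simplification.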
ii) Neo-Behaviorism (Hull, etc.)
The main difference between this approach and behaviorism is that it inserts human perception as a parameter in the stimulus-response chain: Stimulus - Organism - Response. The parameter 'organism' is represented by some sort of function of variable complexity and flexibility depending on the researcher or broader theoretical framework. The insertion of perception between stimulus and response reflects the belief that behavior (response) cannot be explained simply in terms of the external world (stimulus) but must be understood as a response to our perception of the external world (perception defines reality).
At the same time, however, neo-behaviorism:
a) is still atomistic/reductionistic (S-O-R events form the 'molecules of behavior' with human cognition being reduced to functions), and
b) is still outlining isomorphic relationships (although not at the stimulus-response level) within each stimulus-organism-response chain. A stimulus may not always elicit the same response (the 'organism' parameter may be different), but the same stimulus-organism pair will, in general, elicit the same response.
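Points (a) and (b) can be made concrete by writing both models as plain functions. In the hypothetical sketch below, the behaviorist response depends on the stimulus alone, while the neo-behaviorist response also takes an 'organism' argument (here simply a record of what that organism has learned). The stimuli, responses, and organism records are illustrative assumptions, reusing the buzzer example.

```python
# Toy contrast between S-R (behaviorism) and S-O-R (neo-behaviorism).
# The stimuli, responses, and 'organism' dictionaries are hypothetical examples.

# Behaviorism: the response is a fixed function of the stimulus alone.
SR_LINKS = {"food": "salivation", "buzzer": "orientation"}


def sr_response(stimulus: str) -> str:
    return SR_LINKS[stimulus]


# Neo-behaviorism: an 'organism' term (here, a record of what this particular
# organism has learned) mediates between stimulus and response.
def sor_response(stimulus: str, organism: dict[str, str]) -> str:
    # The organism's learned interpretation takes precedence; otherwise fall back
    # to the unlearned S-R link.
    return organism.get(stimulus, SR_LINKS[stimulus])


naive_dog: dict[str, str] = {}
conditioned_dog = {"buzzer": "salivation"}
```

The same buzzer now yields different responses for different organisms, yet both models remain deterministic lookups: the isomorphism has only moved from the stimulus to the stimulus-organism pair, which is exactly the criticism in (b).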
iii) Cognitivism
Cognitivism represents a much more valid approach to the psychology of learning and memory. It works with models that do justice to the complexity, variability, and non-isomorphic relationships demonstrated in human behavior. As might be expected, however, this high degree of validity means that:
a) there can be no clear-cut, fixed, or simple operational definitions of variables, and
b) a large number of interrelated variables are involved, making it hard to attribute changes in a dependent variable to a specific independent variable.
In other words, this high degree of validity comes with a decrease in reliability.
In this class we link Tolman's (1930s) concept of cognitive maps with the Gestalt principles (understood as implicit rules). These rules are seen as connected to 'grouping', helping in categorization as well as in the recognition/imposition of patterns. Such recognition/imposition is based on finding boundaries that outline units which somehow match our schemata.
Gestalt Theory (Koffka, Kohler, Wertheimer, etc.)
Gestalt is the German word for "form," and as applied in gestalt psychology it means "unified whole" or "configuration". The essential point of gestalt is that in perception the whole is greater than the sum of its parts. Gestalt theorists adopt the Cartesian idea of law while recognizing that perception organizes the world. We therefore have laws of perception.
In our class, Gestalt laws are approached not as 'laws' but as 'rules': rules that identify basic implicit processes involved in the data reduction that allows us to collapse the complexity of the world into categories/schemata. It is thanks to such rules that we are able to predict.
Gestalt laws are therefore going to be seen as rules of prediction.
These rules are inferred from behavior; they do not prescribe behavior or define precise models that can be scientifically reliable.
The main question that gestalt theory points to (without explicitly answering it) is:
What are the units of perception (in language, music, etc.)?
A possible answer is that units of perception are flexible and depend on the frame of reference of an investigation. Units are not discrete things belonging to the outside 'objective' world. "Units" and "forms/structures" (i.e. gestalts) are schemata/categories and relationships between categories, created with the help of implicit rules and existing in the mind as implicit knowledge.
For example, during communication in language, the units are not the words per se. As we have discussed, communication involves an interpretive translation across frames of reference that relies on implicit rules. This is not an isomorphic translation: units are transformed when they 'move' from the cognitive frame of reference of the speaker/author to that of the listener/reader.
We must therefore differentiate between
a) explicit symbols that are invented (e.g. the word 'cat') and are often arbitrary, and
b) implicit metasymbols that exist in the mind in the form of schemata/categories and are closely related to general issues of knowing and experience (beyond recognizing lexical units made out of discrete letters, e.g. 'c-a-t').
1) Proximity.
Items placed in close (spatial) proximity tend to be grouped together as a unit.
On the left there appear to be three horizontal rows, while on the right the grouping appears to be in columns.
In music this principle works in terms of a combination of frequency and temporal proximity.
Miller & Heise have shown that rapid alternation between two sequential pitches (a trill) can cause the sequence to split apart (fission) if the musical interval is larger than approximately a minor third. When the interval is smaller or the alternation slower, the sequence is perceived as a unit (coherence). Van Noorden conducted extended studies of temporal coherence and fission. In general, the slower the trill, the larger the frequency difference necessary for fission to occur.
Example 1: Sonata for Alto Flute by Telemann (18th century). An example of fission (the flute's melodic line occasionally breaks into two separate melodies) resulting from rapid alternation between pitches separated by intervals larger than a minor third (violation of the proximity principle).
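The fission threshold described above can be expressed as a small calculation: convert two frequencies into an interval in equal-tempered semitones (12 × log2 of their ratio) and compare it with a minor third (3 semitones). This is a rough sketch that assumes a fixed, rapid trill; the dependence on trill rate found by Van Noorden is deliberately not modeled, and no formula for it is assumed.

```python
import math

# Approximate fission threshold for a *rapid* trill, per the notes: about a minor third.
# Trill rate is ignored here: slower trills need larger intervals before fission,
# and this sketch does not assume any rate-dependent formula.
MINOR_THIRD_SEMITONES = 3.0


def interval_semitones(f1: float, f2: float) -> float:
    """Size of the interval between two frequencies in equal-tempered semitones."""
    return abs(12.0 * math.log2(f2 / f1))


def rapid_trill_splits(f1: float, f2: float,
                       threshold: float = MINOR_THIRD_SEMITONES) -> bool:
    """Predict fission (two streams) for a rapid trill between f1 and f2."""
    return interval_semitones(f1, f2) > threshold


if __name__ == "__main__":
    print(rapid_trill_splits(440.0, 493.88))  # A4-B4, major second -> False (coherence)
    print(rapid_trill_splits(440.0, 659.26))  # A4-E5, perfect fifth -> True (fission)
```

Because the threshold is only approximate, intervals close to a minor third should be treated as ambiguous rather than decided by a strict comparison.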
Below is another such example from music by Handel.
2) Similarity.
Similar items tend to be grouped together as a unit.
In the above figure there seems to be a triangle inside the square.
It is interesting to note that similarity is very difficult to define operationally. This is reflected in the fact that similarity-groupings that appear simple to us may present an almost impossible task for a computer.
In music this principle is most often connected to timbre.
Sequences of pitches performed by different instruments will most likely be grouped according to instrument.
The piece "Loops" by R. Erickson is an example of competition between the proximity and similarity principles. Although the pitch sequence does not involve rapid alternation between widely separated pitches (it satisfies the proximity principle), each pitch is assigned to a different timbre (violating the similarity principle). Organizing the sequence by proximity therefore yields different pitch groupings than organizing it by similarity.
Listen to the Erickson example
3) Good continuation (& common direction)
We tend to continue contours whenever the elements of the pattern establish an implied direction. The related principle of "common direction" makes a similar claim in terms of motion: Elements sharing motion attributes (direction, speed, or both) are usually grouped together.
Here, for example, the grouping is done by direction rather than color.
Very similar to the principle of good continuation is interpolation, used for example in digital sound-editing software to correct, fill in, or recreate missing audio data based on what the existing data implies.
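The simplest form of such interpolation draws a straight line between the last known sample before a gap and the first known sample after it, continuing the contour the data implies. The sketch below is only that simplest case; sound-editing software generally uses more sophisticated methods, and the sample values are made up.

```python
from typing import Optional


def fill_gaps(samples: list[Optional[float]]) -> list[float]:
    """Fill missing samples (None) by straight-line interpolation between neighbors.

    Gaps at the very start or end, which have a known neighbor on one side only,
    are filled by repeating the nearest known value.
    """
    known = [i for i, s in enumerate(samples) if s is not None]
    if not known:
        raise ValueError("no known samples to interpolate from")
    filled = []
    for i, s in enumerate(samples):
        if s is not None:
            filled.append(s)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(samples[right])
        elif right is None:
            filled.append(samples[left])
        else:
            # Continue the line implied by the surrounding data across the gap.
            t = (i - left) / (right - left)
            filled.append(samples[left] + t * (samples[right] - samples[left]))
    return filled


if __name__ == "__main__":
    print(fill_gaps([0.0, None, None, 3.0, None]))
```

Like good continuation, the method assumes that whatever happened inside the gap followed the direction established on either side of it.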
In music this principle is reflected in that we tend to follow separate melodies even when they cross each other.
At the same time, however, if both melodies are performed by the same instrument in the same register, the similarity (timbre) and proximity (pitch) principles might override the good continuation principle (melodic motion).
In the above example, the 2 crossing melodic lines (A) are perceived as two melodic lines that stay within the range of a perfect fourth (B). Studies (Deutsch, 1975) indicate that the grouping in (B) will persist even if the panning (left-right) alternates for each note in (A).
In 1974 Dannenbring demonstrated the good continuation principle using stimuli that were interrupted at regular time intervals with noise filling the gaps. Subjects reported that the stimuli had no gaps, perceiving them as continuing under the noise.
Dannenbring (1974) Listening Examples
Example 1: Steady tone with deleted portions replaced by white noise
Example 2: Steady tone with deleted portions not replaced by white noise
Example 3: Tone glides with deleted portions replaced by white noise
Example 4: Tone glides with deleted portions not replaced by white noise
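The four listening examples differ in only two switches: steady tone versus glide, and gaps filled with noise versus left silent. The sketch below generates stimuli of this kind as a list of samples. All values (sample rate, tone and gap durations, frequencies, glide shape, loudness of the noise) are hypothetical choices for illustration, not Dannenbring's actual parameters.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (hypothetical; kept low so the sketch is small)


def make_stimulus(duration_s: float, glide: bool, fill_with_noise: bool,
                  tone_ms: float = 200.0, gap_ms: float = 50.0,
                  f_low: float = 500.0, f_high: float = 1000.0,
                  seed: int = 0) -> list[float]:
    """Return samples of a steady or gliding tone with periodically deleted portions.

    Deleted portions are either left silent or replaced by white noise.
    All timing and frequency values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = int(duration_s * SAMPLE_RATE)
    period = int((tone_ms + gap_ms) * SAMPLE_RATE / 1000)  # tone + gap, in samples
    tone_len = int(tone_ms * SAMPLE_RATE / 1000)           # audible part of each period
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / SAMPLE_RATE
        if glide:
            # Triangular glide: f_low up to f_high and back, once per second.
            frac = t % 1.0
            pos = 2 * frac if frac < 0.5 else 2 * (1 - frac)
            freq = f_low + (f_high - f_low) * pos
        else:
            freq = f_low
        # The phase keeps advancing during gaps, as if the tone continued underneath.
        phase += 2 * math.pi * freq / SAMPLE_RATE
        if i % period < tone_len:
            samples.append(0.5 * math.sin(phase))
        elif fill_with_noise:
            samples.append(rng.uniform(-1.0, 1.0))  # louder white noise fills the gap
        else:
            samples.append(0.0)                     # deleted portion left silent
    return samples
```

Examples 1 to 4 correspond to `(glide, fill_with_noise)` = `(False, True)`, `(False, False)`, `(True, True)` and `(True, False)`. The noise is deliberately louder than the tone, since noise too soft to have masked the tone would not make the continuation plausible.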
4) Closure
We tend to enclose a space by completing a contour and ignoring gaps in a figure.
In music this can be connected to the idea of tonality: The idea that certain pitches within a system of pitches (scales) serve as focal points for a piece (or section of a piece) of music.
The concept of 'closure' is closely connected to that of 'expectation'. A small number of events may imply a much larger structure (provided we have enough previous knowledge of the larger structure to be able to synthesize it).
5) Prägnanz (good figure)
A stimulus will be organized into as good a figure as possible. Here, good means symmetrical, simple, regular, but most importantly familiar.
The above figure appears to the eye as a square overlapping a triangle, not as a combination of several complicated shapes.
In music this is illustrated in the fact that no matter how complex or unpatterned a piece may be, some sort of pattern will be imposed by the listener based on his/her experience. Whenever that is not possible, the listener may perceive the piece of music as meaningless.
Often more than one principle operates simultaneously, with decisions based on multiple cues and/or selective attention. In the following figure, depending on where attention is directed, we might see a white vase or two black faces (figure-ground alternation).
In everyday life this is illustrated by the so-called 'cocktail party effect' (a term coined by Cherry): our ability to selectively pick out a single speaker (foreground) from a very active/complex sound environment (background).
In music this is illustrated in our ability to distinguish individual parts within a complicated orchestral piece, sometimes even despite frequency proximity or timbral similarity. We will return to this during our discussion on 'attention'.
When all these rules are violated and our cognitive limits are exceeded (as may be the case with pieces of music based entirely on the twelve-tone system), listeners face such an aesthetic and intellectual challenge that they often find it hard to impose any kind of organization, and therefore meaning (e.g. Webern's late dodecaphonic works).
Example: Opening of the Symphony, Op. 21 by Webern. Violation of the 7±2 rule and of most of the gestalt principles (widely separated pitches assigned to different timbres) makes it difficult for the listener to organize the piece.
Absolute Pitch (& Chromaesthesia).
Piaget's Genetic Epistemology.
The gestalt principles introduced above describe possible implicit rules that operate when we process information from our environment. All those principles are based on some sort of previous knowledge, indicating that implicit knowledge rules may, to a large extent, be extracted at some stage from the environment.
An example of musical knowledge based on implicit rules is what we call "Absolute Pitch":
Absolute Pitch:
The ability to identify and name a specific pitch in isolation; that is, in the absence of any reference pitch and independently of any other attribute of the sound (e.g. timbre). (Implicit process).
Relative Pitch:
The ability to identify a specific pitch based on its distance (interval) from a reference pitch. (Explicit process).
It is not clear whether the ability for absolute pitch is environmentally (nurture) or genetically (nature) based. If it is not genetic, can we develop it through training?
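The two definitions above differ mainly in where the reference comes from. In the computational sketch below (an analogy, not a model of how absolute pitch works in the brain), naming a single tone requires a stored internal standard, here assumed to be A4 = 440 Hz in 12-tone equal temperament, while naming an interval only requires the two tones that are actually heard.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0  # assumed internal tuning standard (Western equal temperament)

INTERVAL_NAMES = ["unison", "minor 2nd", "major 2nd", "minor 3rd", "major 3rd",
                  "perfect 4th", "tritone", "perfect 5th", "minor 6th",
                  "major 6th", "minor 7th", "major 7th", "octave"]


def name_absolute(freq_hz: float) -> str:
    """'Absolute pitch': name a single tone with no external reference tone.

    The stored standard A4_HZ plays the role of an internalized reference.
    """
    midi = round(69 + 12 * math.log2(freq_hz / A4_HZ))  # nearest equal-tempered note
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def name_relative(ref_hz: float, freq_hz: float) -> str:
    """'Relative pitch': name a tone only by its distance from a given reference tone."""
    semitones = round(abs(12 * math.log2(freq_hz / ref_hz)))
    return INTERVAL_NAMES[semitones] if semitones <= 12 else f"{semitones} semitones"


if __name__ == "__main__":
    print(name_absolute(261.63))          # a tone heard in isolation
    print(name_relative(440.0, 659.26))   # a tone heard against a reference
```

Note that `name_relative` gives the same answer if both tones are transposed together, whereas `name_absolute` is meaningless without its stored standard.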
A possible answer can be given through Piaget's Genetic Epistemology (Epistemology: theory of learning/knowing).
According to Piaget, the human mind can be understood metaphorically as a 'muscle' that develops through 'exercise' (exposure to the environment).
This development is epigenetic (i.e. it entails gradually unfolding capabilities, just like the gradually increasing specificity of cellular division) and has 4 basic stages:
1) Sensori-motor (~0-2yrs)
At that stage, for the developing child the world exists only to the extent that it can be sensed. All experience and thought is directly linked to sensori-motor capacities. Vocalizations of children at that age are best understood as practice for the performance of language.
2) Preoperational (~2-7 yrs)
Continued development of language, though egocentric in nature. Acquisition of motor skills. Both the sensori-motor and the preoperational stages are characterized by automatic acquisition of information from the environment, and the years ~0 to 5-7 are often referred to as the period when the automatic acquisition window of language is open. Here, automatic means non-formal, non-conceptual, and non-explicit: implicit and dependent only on interaction with and feedback from the environment.
3) Concrete operations (~7-11 yrs)
Organized, logical thought connected to concrete problem solving.
4) Formal operations (~11+ yrs)
At this stage, abstract thought processes become possible. Language learning moves to the explicit level (vocabulary lists, grammar rules, etc.). Ideation (mental imagery) can take the place of physical reality. Children develop the ability to assign feelings outside themselves.
Absolute pitch is believed to develop during the preoperational stage, when the automatic acquisition window of language is still open. Formal and consistent contact with sounds from the Western (or any other) tuning system during this stage mentally imprints absolute pitch relationships.
In other words, according to this approach, absolute pitch represents a capacity that can be developed through interaction with the environment, but only while the language window is still open. (For a different opinion, see the work of Miyazaki.)
The idea that, while the language window is open, implicit knowledge is extracted automatically from the environment is explored by:
Suzuki's approach to children's music learning. In this approach children do not learn music through symbols or formal rules. Rather, they learn, along with their parents, to express themselves musically through interaction and feedback. Suzuki's training method has proved very successful, and children trained with this method are quick to acquire the relevant formal knowledge later on.
Another example of a behavior based on implicit rules is the phenomenon of "Chromaesthesia":
Chromaesthesia:
The ability to see specific colors when specific pitches are heard. Although not all people with chromaesthesia associate the same colors with the same pitches, each individual is internally consistent. Chromaesthesia is a special case of
Synesthesia: Perceptual, cross-modal connection of the senses.
Chromaesthesia is a rare phenomenon and it has been studied extensively by Radocy. The way it operates remains largely a mystery.
Around the turn of the 20th century, a number of composers (e.g. Scriabin in his Poem of Ecstasy, Rimsky-Korsakov, etc.) were fascinated by this phenomenon and created musical compositions that were believed to have an intrinsic relationship to color compositions.
Analytic & Synthetic Listening.
Example 1) In Dannenbring's (1974) demonstration of the good continuation principle, when gaps in synthesized tones (with either fixed frequency or periodic frequency glides) are filled with noise (which creates ambiguity), listeners resolve the ambiguity by reporting that the tone, fragmented in reality, is heard as continuous under the noise.
In this example, the listeners actually synthesize the sensation using the implicit process that gestalt theory identifies as the rule of good continuation. This form of listening is often referred to as Synthetic or Holistic listening and is based largely on implicit rules. If listeners are alerted to the fact that there is an actual gap in the presented tone, they will be able to perceive it by directing their attention to the separate portions of the total stimulus. This form of listening is often referred to as Analytic listening and is based largely on explicit rules. The following is another example of analytic/synthetic listening:
Example 2) Two successive complex tones with 2 components (harmonics) each (experiment by Smoorenburg): (a) 800 Hz & 1000 Hz, (b) 750 Hz & 1000 Hz. (This is a return to an example from week 4, presented during our discussion of explicit and implicit rules.) When moving from (a) to (b), some listeners hear the pitch going down by following the motion of the first component in each tone (800 Hz → 750 Hz). Explicit rules are employed to track the physical attributes of the two tones and determine the pitch motion. This form of listening is called analytic.
Other listeners hear the pitch going up by reconstructing the motion of the (missing) fundamental implied by the two complex tones (200 Hz → 250 Hz): 800 Hz and 1000 Hz are the 4th and 5th harmonics of 200 Hz, while 750 Hz and 1000 Hz are the 3rd and 4th harmonics of 250 Hz. Implicit rules are employed to synthesize a physical attribute that, although it may not be physically present, is implied by the other attributes of each tone, and this helps determine pitch motion. This form of listening is called synthetic.
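The two ways of hearing Smoorenburg's tones can be contrasted in a few lines. In the sketch below, "analytic" tracks the lower component that is physically present, and "synthetic" reconstructs the implied fundamental as the greatest common divisor of the components. The GCD is a convenient stand-in that only works because these components are exact integer multiples of the fundamental; the ear does not literally compute a GCD and tolerates slightly mistuned harmonics.

```python
import math

TONE_A = (800, 1000)  # Hz, components of tone (a)
TONE_B = (750, 1000)  # Hz, components of tone (b)


def analytic_pitch(components: tuple[int, int]) -> int:
    """Analytic listening: track the lower component that is physically present."""
    return min(components)


def synthetic_pitch(components: tuple[int, int]) -> int:
    """Synthetic listening: reconstruct the implied (missing) fundamental, taken here
    as the greatest common divisor of the component frequencies."""
    return math.gcd(*components)


def direction(before: int, after: int) -> str:
    """Describe the pitch motion from one value to the next."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "same"


if __name__ == "__main__":
    print("analytic:", direction(analytic_pitch(TONE_A), analytic_pitch(TONE_B)))
    print("synthetic:", direction(synthetic_pitch(TONE_A), synthetic_pitch(TONE_B)))
```

Running it prints "down" for the analytic reading (800 Hz → 750 Hz) and "up" for the synthetic one (200 Hz → 250 Hz), reproducing the two listener groups.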
Listen to the Smoorenburg example
Example 3) Crossing-scales experiment by Deutsch: the power of the proximity principle.
In the above example (Deutsch's scale illusion), the 2 crossing melodic lines (A) are perceived as two melodies (B) that stay within the range of a perfect fourth. The crossing scales are therefore regrouped perceptually according to the proximity principle (pitches separated by small intervals are grouped together).
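The regrouping can be reproduced with a simple greedy rule. In the sketch below, an ascending and a descending C major scale (C4 to C5, written as MIDI note numbers) sound simultaneously, with successive tones of each scale switching ears. Each new pair of tones is then assigned to two streams so that the streams move by the smallest total pitch distance. The scale range, ear pattern, and greedy pairwise rule are simplifying assumptions, not Deutsch's exact procedure.

```python
# C major scale from C4 to C5 as MIDI note numbers (60 = middle C).
ASCENDING = [60, 62, 64, 65, 67, 69, 71, 72]
DESCENDING = ASCENDING[::-1]


def physical_ears(asc: list[int], desc: list[int]) -> tuple[list[int], list[int]]:
    """What each ear actually receives: successive tones of each scale switch ears."""
    left, right = [], []
    for i, (a, d) in enumerate(zip(asc, desc)):
        if i % 2 == 0:
            left.append(a)
            right.append(d)
        else:
            left.append(d)
            right.append(a)
    return left, right


def group_by_proximity(left: list[int], right: list[int]) -> tuple[list[int], list[int]]:
    """Split simultaneous tone pairs into two streams, assigning each new pair so
    that the streams move by the smallest total pitch distance (proximity)."""
    s1, s2 = [left[0]], [right[0]]
    for a, b in zip(left[1:], right[1:]):
        keep = abs(a - s1[-1]) + abs(b - s2[-1])
        swap = abs(b - s1[-1]) + abs(a - s2[-1])
        if swap < keep:
            a, b = b, a
        s1.append(a)
        s2.append(b)
    return s1, s2


if __name__ == "__main__":
    left, right = physical_ears(ASCENDING, DESCENDING)
    print("left ear: ", left)
    print("right ear:", right)
    low, high = group_by_proximity(left, right)
    print("lower melody:", low)
    print("upper melody:", high)
```

Each ear physically receives a jagged line of leaps, but the proximity rule recovers two smooth melodies (C4 up to F4 and back, C5 down to G4 and back), each spanning 5 semitones, i.e. a perfect fourth.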
Stravinsky's Violin Concerto in D and Tchaikovsky's 4th Symphony both exploit this principle. Studies indicate that the grouping demonstrated in (B) will persist even if
i) the panning (left-right) alternates for each note of each scale in (A). The panning will be reorganized perceptually so that each melodic line in (B) will be assigned to a single ear (left / right).
ii) the panning and the timbre alternate for each note in (A). In this case, the panning and timbres will be reorganized perceptually so that each melodic line in (B) will be assigned to a single ear and a single timbre.
The above two cases (i) & (ii) are examples of Synthetic listening. They illustrate the power of implicit rules to process incoming information and send a synthesized version of reality to conscious awareness (explicit knowledge). The same is illustrated by the so-called cocktail party effect.
If, however, the differences in timbre become extreme, the illusion breaks down and listeners cannot synthesize the conjunct melodies of (B). Instead, they hear what is actually happening: constant leaps of pitch and timbre in each ear. In other words, once a threshold of dissimilarity is crossed, the similarity principle overrides the proximity principle, and we shift from implicit to explicit rules.
Like almost everything in music cognition, whether there will be a shift from implicit to explicit rules depends strongly on context and existing knowledge. For example, although non-musicians and Western-trained musicians hear the conjunct melodies in (B), musicians skilled in the 12-tone system hear the leaps that actually take place. They listen analytically, since through training and experience they have developed explicit rules that can deal with the disjunct melodies being presented.
The recurring dichotomy (Analytic/Synthetic - Explicit/Implicit) may be explained (at least in theory) in terms of brain-area specialization (studies by Kimura).
(It is important to note here that brain-area specialization can be retrained and does not imply any sort of insulation.)
A) Analytic operations depend on the dominant brain hemisphere (the one contralateral to the dominant side of the body). Scientists are therefore thought of as left-brain oriented (provided they are right-handed). The dominant hemisphere is also thought to be the 'home' of language processing.
B) Synthetic operations depend on the right hemisphere of the brain (for right-handed people). Artists are therefore thought of as right-brain oriented (provided they are right-handed).
For example, trained musicians listen to music analytically (using mostly explicit rules, e.g. music theory, strong categories, etc.) and are thought to rely more on the left hemisphere, while non-musicians listen to music synthetically/holistically (using mostly implicit rules of organization, e.g. general patterns, gestalt rules, etc.) and are thought to rely more on the right hemisphere.
Ethnomusicology Department - UCLA©