The subjects were 21 right-handed healthy individuals (13 female, eight male; mean age=33 years, range=24–60) and 11 right-handed cocaine-dependent patients (three female, eight male; mean age=34 years, range=20–42). All gave written informed consent to participate. The patients all had recently been admitted (days before scan: mean=14.9 days, range=4–23) to an inpatient drug dependence treatment unit, met the DSM-III-R criteria for cocaine dependence, and reported regular use of cocaine up to the time of admission. Four patients also met the criteria for substance-induced mood disorder with depressive features, and a fifth met the criteria for generalized anxiety disorder. The healthy subjects denied any more than occasional social use of alcohol or illicit drugs, and they reported no use of psychoactive substances during the 72 hours before functional magnetic resonance imaging (fMRI). None of the subjects was taking prescription drugs with effects on the central nervous system, had a history of neurological illness or injury, had a history of psychiatric illness other than cocaine dependence, or had abnormalities shown by structural MRI.
Stimulus Videotapes
Each videotape consisted of an actress seated, looking at and talking directly to the camera (viewer). In the happy tape, the actress smiled frequently and spoke cheerfully about happy memories of growing up on a farm and then of a family reunion. In the sad tape, the actress spoke sadly about the deaths of close family members, crying throughout. In the cocaine tape, the actress spoke briefly about recent perceived injustices, expressed anger and frustration, described the desire to get high, removed drug paraphernalia from a paper bag, chopped a fake rock of “crack” cocaine, twice smoked some and pretended to get a rush, commented repeatedly how good the “shit” was, and invited or teased the viewer about getting high him- or herself. One actress made a happy, a sad, and a cocaine tape. A second actress made a sad and a cocaine tape. There were therefore a total of two sad tapes, two cocaine tapes, and one happy tape. The tapes were from 3.0 to 4.5 minutes in length and were preceded and followed by 30-second baseline periods of gray illumination. The subjects indicated the onset and progressive increases in emotional responses by button pushes during tape viewing. Immediately after each tape, the subjects described the nature of their emotional responses and rated the peak and average intensities of their responses from 0 to 10. The cocaine-dependent patients were asked whether or not they felt the urge to use cocaine immediately after each tape and to rate the peak and average intensities of this feeling from 0 to 10. The tapes were presented in one of two sequences, each to one-half of the subjects: 1) sad tape A, cocaine tape B, happy tape, sad tape B, cocaine tape A or 2) happy tape, sad tape B, cocaine tape A, sad tape A, cocaine tape B.
The subjects were told that the experiment involved measurement of brain responses related to emotions that might be triggered by watching videotapes and that they were to indicate when they first began to feel any emotional response and when that response became moderate or very strong. The cocaine-dependent patients were also told that some of the tapes might make them feel like using cocaine. The subjects were not told before or after a tape what emotion the actress was attempting to portray. There was a 2.5-minute rest between tapes.
fMRI
Each subject’s head was stabilized by a band across the forehead. Video goggles were placed over the subject’s eyes and headphones over the ears. By using a 1.5-T GE Signa MRI system with resonant gradients for echoplanar imaging, conventional T1-weighted spin echo sagittal anatomical images (TE=11 msec, TR=667 msec, field of view=24 cm, slice thickness=5 mm, gap=0, 256×128 data matrix) were acquired for slice localization. Next, eight T1-weighted oblique axial slices (TE=13 msec, TR=500 msec, field of view=40×40 cm, slice thickness=8 mm, gap=1 mm, 256×192 data matrix) were acquired parallel to the plane transecting the anterior and posterior commissures, covering the brain from the inferior temporal sulcus to the most superior portion of the cortex, to serve as underlays for functional images collected at the same locations. Functional images were obtained by using a single-shot echoplanar gradient-echo sequence (flip angle=60°, TE=60 msec, TR=2600 msec, field of view=40×20 cm, 128×128 data matrix, slice thickness=8 mm, skip=1 mm). Head movement was evaluated by measuring changes in the center of mass of the functional images over time. If the motion exceeded 1 voxel in the x or y direction from the beginning to the end of the study, the subject was dropped from the data set (one patient was dropped). If the motion exceeded 0.5 voxel from the beginning to the end of a particular tape, that tape was dropped from the data set (one sad tape and one cocaine tape from one patient and one cocaine tape from another patient were dropped). Otherwise, motion was corrected by using SPM 96 software. The corrected images were spatially filtered by using a Gaussian filter with a full width at half maximum of 6.5 mm. Data are presented for axial-oblique imaging slices 4 mm below and 4, 12, 24, and 32 mm above the plane of the anterior and posterior commissures (referred to as z levels).
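The motion screen described above can be illustrated with a minimal sketch. It assumes each functional image is a two-dimensional grid of intensities and that "center of mass" means the intensity-weighted centroid. The function names are hypothetical, and applying the 0.5-voxel tape criterion separately along x and y is an assumption (the text states the direction only for the 1-voxel study criterion). This is not the authors' software.

```python
STUDY_LIMIT = 1.0  # voxels, beginning to end of the study (from the text)
TAPE_LIMIT = 0.5   # voxels, beginning to end of one tape (from the text)


def center_of_mass(image):
    """Intensity-weighted centroid (x, y) of a 2-D image given as rows of numbers."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no signal")
    x = sum(col * v for row in image for col, v in enumerate(row)) / total
    y = sum(r * v for r, row in enumerate(image) for v in row) / total
    return x, y


def drift(first_image, last_image):
    """Absolute centroid shift, in voxels, along x and y between two images."""
    x0, y0 = center_of_mass(first_image)
    x1, y1 = center_of_mass(last_image)
    return abs(x1 - x0), abs(y1 - y0)


def exceeds(first_image, last_image, limit):
    """True if the shift along either axis is greater than the limit.

    Assumption: both criteria are checked per axis, not as a vector length.
    """
    dx, dy = drift(first_image, last_image)
    return dx > limit or dy > limit
```

Under these assumptions, a subject would be dropped when `exceeds(first, last, STUDY_LIMIT)` is true for the first and last images of the study, and a single tape would be dropped when `exceeds(first, last, TAPE_LIMIT)` is true for the first and last images of that tape.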
Changes in echoplanar imaging signal were evaluated in three pairs of successive epochs (Figure 1). The first comparison was between the 30-second pretape baseline (baseline 1) and the initial period of tape viewing before the self-report of emotional response (emotion 0). The second comparison was between the initial period of emotional response (emotion 1) and the immediately preceding period of tape viewing before the self-report of emotional response (emotion 0). The third comparison was between the final 45 seconds of tape viewing (independent of report of emotional response) (emotion 2) and the posttape baseline (baseline 2). The onset of emotional response varied from subject to subject, leading some subjects to have longer periods of tape viewing before the report of emotion (emotion 0) and shorter periods of watching the tape after reporting the onset of emotion (emotion 1). In order to maintain sufficient comparability among subjects in data sampling, comparisons were not made if either epoch to be compared was less than 13 seconds long (five images), and images acquired only during the first or last 45 seconds of long epochs were considered. Three healthy subjects reported the onset of emotion so quickly that there were not the required five images in the emotion 0 period (one during cocaine tape A and two others during cocaine tape B), as did one patient subject during sad tape B. Data from these subjects were not used in the contrast between emotion 0 and baseline 1 or the contrast between emotion 1 and emotion 0, but they were used in the analysis of the difference between emotion 2 and baseline 2. One healthy subject reported the onset of emotion too late to have enough images for the contrast of emotion 1 and emotion 0 for cocaine tape A, and one patient reported too late during the happy tape. Some healthy subjects did not report any emotional response to one or two tapes (five to the happy tape, two to sad tape A, two to sad tape B, and one to cocaine tape A). The same was true for some patients (three to the happy tape, four to sad tape A, three to sad tape B, three to cocaine tape A, three to cocaine tape B). These subjects therefore did not contribute to the contrast of emotion 1 and emotion 0. One healthy subject did not report any emotional response to any tape and was dropped from the data set (as one would drop a subject whose performance on a cognitive activation task was so low as to raise doubts about whether he or she was doing the task). Technical problems (ghosting) led to the loss of data for one healthy subject during cocaine tape B, one patient during cocaine tape A, and two patients during sad tape B.
Thus, in total, for the contrast of emotion 1 and emotion 0 there were 14 healthy subjects and seven patients for the happy tape, 17 healthy subjects and six patients for sad tape A, 17 healthy subjects and five patients for sad tape B, 17 healthy subjects and five patients for cocaine tape A, and 16 healthy subjects and seven patients for cocaine tape B. For the contrasts involving emotion 0 minus baseline 1 and emotion 2 minus baseline 2, there were 19 healthy subjects and 10 patients for the happy tape, 19 healthy subjects and 10 patients for sad tape A, 19 healthy subjects and seven patients for sad tape B, 18 healthy subjects and seven patients for cocaine tape A, and 18 healthy subjects and 10 patients for cocaine tape B. The mean/median durations of the emotion 0 and emotion 1 epochs were 38.7/41.6 seconds and 26.0/41.5 seconds for the happy tape, 39.3/39.0 seconds and 30.9/41.6 seconds for sad tape A, 30.2/36.4 seconds and 31.2/41.6 seconds for sad tape B, 29.9/39.0 seconds and 27.8/36.4 seconds for cocaine tape A, and 37.2/41.6 seconds and 34.2/41.6 seconds for cocaine tape B.
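The epoch inclusion rules (at least five images per epoch, and no more than 45 seconds taken from a long epoch) can be sketched as follows. Two points are assumptions rather than details given in the text: the 45-second limit is converted to whole images by rounding down, and the images kept from a long epoch are those nearest the boundary between the two epochs being compared. The function names are hypothetical.

```python
TR_SECONDS = 2.6    # repetition time of the functional sequence (from the text)
MIN_IMAGES = 5      # 13 seconds at TR = 2.6 s (from the text)
MAX_SECONDS = 45.0  # longest stretch of a long epoch that was used (from the text)
# Assumption: the 45-second limit is rounded down to a whole number of images.
MAX_IMAGES = int(MAX_SECONDS / TR_SECONDS)


def trim_epoch(images, keep):
    """Return at most MAX_IMAGES images from the start or the end of an epoch."""
    if keep == "first":
        return images[:MAX_IMAGES]
    if keep == "last":
        return images[-MAX_IMAGES:]
    raise ValueError("keep must be 'first' or 'last'")


def usable_pair(earlier, later):
    """Trimmed (earlier, later) epochs for one contrast, or None if too short.

    Assumption: the end of the earlier epoch and the start of the later
    epoch are kept, i.e., the images closest to the transition.
    """
    if len(earlier) < MIN_IMAGES or len(later) < MIN_IMAGES:
        return None
    return trim_epoch(earlier, "last"), trim_epoch(later, "first")
```

A `None` result corresponds to the cases described above in which a subject did not contribute to a particular contrast because one epoch held fewer than five images.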
The data analysis followed several steps. First, three t maps were created for each subject for each tape, 1) comparing the signals during emotion 0 and baseline 1, 2) comparing emotion 1 and emotion 0, and 3) comparing emotion 2 and baseline 2. These first-order statistical maps and the anatomic images from individual subjects were transformed into a proportional three-dimensional grid (12) in order to combine data across subjects. The t maps were used only to compute standard linear contrast measures (13). Under the null hypothesis of no effect, the expected value of the mean of this contrast across subjects is equal to zero. We then used a randomization procedure to generate a distribution of task-related t values in order to estimate the significance of the observed linear contrast at each voxel (14–16). This procedure creates the population distribution for each voxel by repeatedly calculating the value of the contrast when the t values of one-half the subjects, randomly chosen, have a reversed sign. This randomization was performed 1,000 times, generating a sampling distribution of the linear contrast measures. The observed linear contrast measure, calculated without sign reversal, was assigned a p value on the basis of its position in this distribution. All reported p values were derived from this procedure. A voxel was considered to show a significant difference between conditions (or groups, see next paragraph) only if it and two of the four voxels with which it shared a border met the significance criterion.
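The sign-reversal randomization for a single voxel can be sketched as follows. The sketch assumes that the linear contrast measure is the mean of the per-subject t values and that the p value is two-sided (the proportion of randomized contrasts at least as large in absolute value as the observed one). Neither detail is specified in the text, and the function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sign_flip_p_value(t_values, n_iter=1000, seed=0):
    """Randomization p value for one voxel from per-subject t values.

    On each iteration the t values of a randomly chosen half of the
    subjects have their sign reversed and the contrast is recomputed.
    Assumptions: contrast = mean across subjects; two-sided p value.
    """
    rng = random.Random(seed)
    n = len(t_values)
    observed = mean(t_values)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        flipped = set(rng.sample(range(n), n // 2))
        randomized = [-t if i in flipped else t for i, t in enumerate(t_values)]
        if abs(mean(randomized)) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_iter
```

Calling `sign_flip_p_value(subject_t_values)` once per voxel would yield a map of p values, to which the neighboring-voxel criterion could then be applied.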
This procedure was followed for the two sad tapes together, the two cocaine tapes together, and the happy tape, in the healthy and patient groups separately, in order to identify the activation topography of each emotional condition in each subject group. Next, the statistical significance of differences between the cocaine and sad tapes and between the cocaine and happy tapes for the cocaine addicts was evaluated. These randomizations were done by switching t values of the cocaine and sad tapes (or the cocaine and happy tapes) in randomly chosen subsets of one-half the subjects. Finally, the statistical significance of differences between healthy and patient groups for the cocaine, sad, and happy tapes was evaluated. These randomizations were done by switching the group assignments in randomly chosen subsets of one-half of the subjects, a procedure unaffected by differences in the size of the subject groups compared.
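The between-group randomization can be sketched in the same spirit for a single voxel. The sketch assumes that the group contrast is the difference between the patient and healthy means of the per-subject t values, that the p value is two-sided, and that a draw leaving one group empty is discarded and redrawn. None of these details is stated in the text, and the function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def group_difference(values, is_patient):
    """Patient mean minus healthy mean, or None if either group is empty."""
    patients = [v for v, p in zip(values, is_patient) if p]
    healthy = [v for v, p in zip(values, is_patient) if not p]
    if not patients or not healthy:
        return None
    return mean(patients) - mean(healthy)


def group_swap_p_value(healthy_t, patient_t, n_iter=1000, seed=0):
    """Randomization p value for a group difference at one voxel.

    On each iteration the group assignment of a randomly chosen half of
    all subjects is switched. Assumptions: two-sided p value; draws that
    empty one group are redrawn.
    """
    if not healthy_t or not patient_t or len(healthy_t) + len(patient_t) < 3:
        raise ValueError("need both groups and at least three subjects")
    rng = random.Random(seed)
    values = list(healthy_t) + list(patient_t)
    labels = [False] * len(healthy_t) + [True] * len(patient_t)
    observed = group_difference(values, labels)
    done = at_least_as_extreme = 0
    while done < n_iter:
        switched = set(rng.sample(range(len(values)), len(values) // 2))
        new_labels = [(not p) if i in switched else p for i, p in enumerate(labels)]
        diff = group_difference(values, new_labels)
        if diff is None:
            continue  # assumption: discard draws with an empty group
        done += 1
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_iter
```

Because the assignments of half of all pooled subjects are switched, the randomized groups need not keep the original group sizes.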
Because of differences between the patient and healthy groups in gender mix, we also compared male healthy subjects to male addicts separately. The results reported are for the comparisons of the mixed-gender groups, but only intergroup differences that were also significant for the comparisons of male subjects are labeled and discussed as significant.