I was interested in one of the points the authors mentioned about the generalizability of their study. They mention that the task they gave their subjects may have affected the way they processed the information, and thus the results of the study. Specifically, subjects were asked to compare a very perceptually linked property, object size, rather than something more abstractly tied to the object, like value. I did not understand how asking about an object's value could help study visual priming. It seems that keeping the task as low-level as possible would be more beneficial, rather than involving higher areas of the brain (where more goes on that we don't understand). Does anyone understand how a task like that could better serve the purpose of the study?

I am glad we got to read this paper; I think visual priming is a cool 'behavior' that we probably don't think much about in a computer vision context, but it seems to have been studied a lot in human perception. Do you think it is studied more explicitly in machine learning (understanding how gaining more training examples can prepare, or 'prime', our systems to perform better at test time)? Or is there an analogous area of study in computer vision?
I suppose this talk by Alyosha on big data at MIT might be helpful for gaining insight into the second question asked above. Also, the paper "Do We Need More Training Data or Better Models for Object Detection?" might be helpful. I think Abhinav can tell us more about different work in the community.
Talk - http://amps-web.amps.ms.mit.edu/csail/2012-2013/Big_Data/csail-lec-mit-kiva-2012oct24-1600.html
Paper - http://web.mit.edu/vondrick/largetrain.pdf
In answer to your first question, I think the point is this: if they ask a question about size, it may not require the subject to fully recognize the object; they may only use the part of the brain that determines size from visual appearance. Asking about value, on the other hand, requires the subject to fully recognize the object and then perform some other task related to its semantic meaning. I think the point is that using the more abstract concept requires a fuller understanding of what the object "is," so if you see some sort of priming effect, you can make the more general case that the priming is happening as part of recognition, and not just as part of estimating the object's size... maybe?
I don't know if priming in a Computer Vision context would just mean "more data." Human vision systems are "real-time." I wonder if priming serves as a prior in a temporal context; i.e., the fact that I saw an object two seconds ago increases the probability that I will see the same object two seconds in the future.
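To make that temporal-prior idea concrete, here is a toy sketch (entirely my own invention, not from the paper or any existing system) in which a detector's base probability for an object gets a bump that decays with the time since that object was last seen; all constants and the function name are arbitrary illustration choices.

```python
import math

def primed_probability(base_prob, seconds_since_seen, boost=0.3, tau=5.0):
    """Boost a detector's base probability for a recently seen object.

    The boost decays exponentially with the time since the last sighting,
    so an object seen two seconds ago raises the prior far more than one
    seen a minute ago. All constants are arbitrary illustration values.
    """
    return min(1.0, base_prob + boost * math.exp(-seconds_since_seen / tau))

# A sighting 2 s ago primes much more strongly than one 60 s ago.
recent = primed_probability(0.5, 2.0)    # ~0.70
stale = primed_probability(0.5, 60.0)    # ~0.50
```

The exponential decay is just one convenient choice; the point is only that recency, not total data volume, is what shifts the prior.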
To add to Jacob's point, priming can also be viewed as a kind of bias in the algorithm. We, as people, have several cognitive biases that turn up all the time in behavioral studies (see Dan Ariely's work in The Upside of Irrationality). Additionally, it might be that priming is more of a survival instinct in the human visual perception context.
This paper deals with an interesting problem: are different regions of the brain responsible for processing specific visual form information, as opposed to more generalized information? The reference to the work of Vaidya et al. on page 2 (second column, first paragraph) was not clear to me. The same work is brought up in the discussion section to show connections between the two studies (which seems to me an important connection), so some background on it should probably have been provided. Anyway, if you have trouble understanding this part (as I did), the abstract gives some intuition as to why the right occipital lobe is important for processing specific visual form. I also found the wiki page on priming (psychology) interesting, and it helped me understand the paper so far. (I have not finished the paper yet; I thought I'd put this info up in case someone is stuck.)

For convenience, I am pasting the abstract of Vaidya et al. here:

"Font-specificity in visual word-stem completion priming was examined in patients with global amnesia and Patient M.S., who had a right-occipital lobectomy. Word-stems appeared in the same or different font as study words. Amnesic patients showed normal font-specific priming (greater priming for words studied in the same than different font as test), despite impaired word-stem cued recall. Patient M.S. failed to exhibit font-specific priming, despite preserved declarative memory. Therefore, perceptual specificity in visual priming depends on visual processes mediated by the right-occipital lobe rather than medial temporal and diencephalic regions involved in declarative memory."
I am confused about what the results say about exemplars vs. categories. In Fig. 3, the images in the third column show the regions where there is greater activation for novel items than for repeated different items. In the discussion it was noted that there were more modest reductions in neural activity for different exemplars than for same exemplars (near the bottom right of page 194). Are these results favoring exemplars over categories?
If I'm reading it correctly (someone please feel free to disagree), I think all it's saying is that seeing a repeated example of the same exact object has more of a priming effect than seeing something from the same category. I don't think the paper is saying it's either exemplars or categories, since the intro argues it's both: you can recognize all varieties of cups as cups, and at the same time recognize your cup as your cup.
I think David is right. Once again, please correct me if I've also read it incorrectly.
I agree with the interpretation that it is both exemplar and categorical.
I especially liked the ideas they had for future experiments. For instance, right now, for their "different instance, same category" stimuli, they picked examples from semantic categories. They suggest shape similarity instead (lollipop and magnifying glass), but maybe also functional categories, etc. I'd be curious where they'd observe differences in priming effects. As a side note, I'd also be curious to see what happens with non-humans and these semantic categories (although picking ones that the animals would be familiar with might be problematic). Finally, if people have time, I think the Torralbo et al. paper is also worth reading or skimming. They have a really neat example, where they can decode "good" exemplars substantially better than "bad" ones. Unfortunately, when they compute the average images, they get canonical images of the category (e.g., the mountain category has a single, centered peak), which they suggest might account for much of this. I wish they had done something to futz with the data to factor this out, but they must have tried and (understandably) not found a way to do this convincingly.
I'm trying to use this study to understand the extent to which our brain is exemplar-based vs. categorical (assuming it is both). (a) If we are predominantly exemplar-based, then a "repeated different" stimulus would show negligible repetition suppression. (b) If we are predominantly categorical, then a "repeated different" stimulus would show lots of repetition suppression. The data suggest that different parts of the brain are more inclined toward one or the other. But it's definitely not as categorical as most computer vision image classification methods assume.
This is slightly off topic. Can we understand repetition suppression using simple Hebbian learning? After a stimulus A is shown, the synapses whose neurons on either side fire will strengthen their connections. As a result, when A is shown again, a smaller amount of activation in the neuron is enough to cause the next neuron to fire. This effect dies off with time because of, say, weight decay.
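As a sanity check on that intuition, here is a minimal Hebbian toy (my own sketch with made-up numbers, not a model from the paper): pairing pre- and postsynaptic firing strengthens the weight, so less presynaptic drive is needed to reach threshold on the second presentation, and slow weight decay eventually undoes the effect.

```python
def minimum_drive(w, threshold=0.5):
    """Presynaptic activity needed to push the postsynaptic neuron over threshold."""
    return threshold / w

def hebbian_update(w, pre, post, eta=0.3, decay=0.01):
    """Strengthen the synapse when pre and post fire together; decay it otherwise."""
    return w * (1 - decay) + eta * pre * post

w = 0.5
drive_first = minimum_drive(w)              # drive needed on the first presentation
w = hebbian_update(w, pre=1.0, post=1.0)    # stimulus A shown, both neurons fire
drive_second = minimum_drive(w)             # less drive needed the second time

# With no further pairing, weight decay gradually erases the effect.
for _ in range(500):
    w = hebbian_update(w, pre=0.0, post=0.0)
drive_later = minimum_drive(w)              # back above the original requirement
```

Whether real repetition suppression reduces to anything this simple is exactly the open question, but the sketch at least shows that "fire together, wire together" plus decay produces the right qualitative time course.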
I am not an expert in biology, but isn't repetition suppression just a higher-level instance of sensory adaptation, i.e., the way a novel smell or taste seems to disappear after prolonged exposure to the stimulus?
The effect of specificity and unique important instances of general categories is very interesting.

A semi-related tangent: I was at the airport coming home from Chicago this evening, walking out through the extended parking lot trying to find my car. I had written down the sub-section of the lot it was in, and finally hiked over there after walking through rows and rows of car after car after car. At first I didn't see my car, but when I did finally find it, I knew I was looking at MY car. What immediately struck me, though, is that it was at least 80% occluded by the car next to it, and all I could see was the profile of the back end and the spare tire. Plus it was dark. Not much information, but the certainty was overwhelming.

I know that cars are metal things with four wheels, headlights, windshields, etc., but I also know that MY car is an older green CRV. Furthermore, if necessary, I would be able to distinguish MY older green CRV from OTHER older green CRVs by looking for whatever junk I know is in my backseat, etc. It reminds me of Viola-Jones cascades: there are a lot of things that can quickly tell me this is not my car (not green, not a CRV, not old... and by the way, "old" in this case is not the generic "old," but has specific meaning in the context of cars, and even in the context of CRVs), but then if I see something that is an old green CRV, there are more expensive checks I can perform to gain more certainty.

But then I'm struck with a further question: I'm not sure if I used "older," "green," or "CRV" in my amazing feat of object recognition this evening. It was dark enough that "green" wouldn't have given me anything more than a weak expectation of "relatively dark." And again, it was occluded so much that I only got a piece of a silhouette. And was I really analyzing each of the thousands of other car instances (clutter) that I experienced on the way to finding MY car for "is this a CRV? is this older? is this green?"
Or was my brain again using the very quick and easy-to-verify test of "have I arrived in the appropriate section of the parking lot?" to avoid even considering what kind of cars I was looking at until the necessary moment? In this case, my "cascade" would look like:

"Are you in the right section of the parking lot?" If "No," stop; if "Yes," continue.
"Is this car an old green CRV?" If "No," stop; if "Yes," continue.
"If there is still some ambiguity, is your junk in the back seat?" ...

If so, it seems like this "cascade," or whatever you want to call it, has been generated ad hoc according to a lot of prior knowledge about my current goal, and events that took place several days ago (what part of the lot did I park this thing in on Thursday?). Which is interesting, because then you have to ask: when did I learn this cascade? It incorporates previous cascades (how to distinguish MY car from other cars) as a sub-process, but somehow wraps them up with a piece of information that has only become relevant recently.

So anyway, sorry I'm rambling; this is very introspective, not terribly scientific, and has nothing to do with the paper. I don't know what this means for the exemplars vs. categories debate, or what it contributes to the conversation at all, other than to say: "Vision is great! Good job eyes! Good job brain! Thanks for helping me find my car!"
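The ad-hoc cascade described above can be sketched directly. Everything here (the lot section, the field names, the backseat junk) is invented for illustration, but the structure is the Viola-Jones idea: cheap, high-rejection tests first, with the expensive exemplar check reserved for the few candidates that survive.

```python
def is_my_car(car, my_lot_section="E4"):
    # Stage 1: the cheapest test. Wrong section of the lot? Stop immediately.
    if car["lot_section"] != my_lot_section:
        return False
    # Stage 2: coarse category tests (model, age, color), still fairly cheap.
    if car["model"] != "CRV" or car["year"] > 2010 or car["color"] != "green":
        return False
    # Stage 3: the expensive exemplar check, run only on surviving candidates.
    return car["backseat_junk"] == {"umbrella", "gym bag"}

parking_lot = [
    {"lot_section": "B1", "model": "CRV", "year": 2006, "color": "green",
     "backseat_junk": {"umbrella", "gym bag"}},   # right features, wrong lot
    {"lot_section": "E4", "model": "Civic", "year": 2006, "color": "green",
     "backseat_junk": set()},                     # right lot, wrong model
    {"lot_section": "E4", "model": "CRV", "year": 2006, "color": "green",
     "backseat_junk": {"umbrella", "gym bag"}},   # MY car
]
matches = [c for c in parking_lot if is_my_car(c)]
```

Note that the first candidate is rejected by the lot-section test before any car features are examined at all, which is exactly the "avoid even considering what kind of cars I was looking at" shortcut in the story.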
One thing they mention that may go some way toward answering Aaron's question above is their citation of the paper by Habib and Lepage, where they claim the presence of greater activations for familiar stimuli than for new items. Maybe having seen the familiar green Honda CRV again and again has built a 'detector' of sorts in Aaron's brain that fires off whenever he sees that particular car. This could well be how we learn to represent most of our so-called categories: not necessarily by virtue of semantics, but by repeated familiarity with the object.
One follow-up point I'd like to make is that you often look at your car during a discrimination task (when you're trying to find it among other cars), so it's not necessarily having seen the object a lot in the past, but specifically having had to find it among other cars. The interesting follow-up question would be whether your performance would be as good if you had to find an object you were extremely familiar with (your couch, your blender, etc.) but never really have to perform a discrimination task on. I'd hypothesize that you'd find it much harder, even if you see your couch way more often than your car, for instance. This probably relates to the point I was asking about earlier: which task they were using for the experiment and how it might affect their results.
Adding on to Aaron's thoughts, I feel that visual context is often an important part of the "cascade": in this case maybe the surrounding areas, the cars, memory of the physical space and structure of the parking lot, et cetera. Most neuroscience experiments look at objects in isolation; however, I'm sure visual scene context and related cues form an important part of the discrimination process.
Like I mentioned earlier in class, I'm very interested in knowing more about the temporal effects of recall in these studies. They do mention looking for changes in responses in accordance with recall of object categories seen in the past, and seeing an actual increase in activity for certain frontal regions of the brain associated with memory retrieval. The question I'm interested in is this: how much of this phenomenon can be attributed to short-term memory? I remember Elissa saying that there have been instances of priming for objects seen years apart, but could that have been for strikingly dissimilar objects? What if they were very generic line drawings or photos that are not very memorable?

I also found how they determined the hemisphere-based differences interesting to look at. In addition, the claim that the hemispheres work differently fits in well with what we discussed in the last paper about the parietal lobes.
A semantic question regarding repetition suppression (RS) and priming: I'm a bit confused about how these terms are defined. This paper seems to say RS is defined for single units (neurons), while priming is the more general concept. An earlier paper we read seemed to use RS more the way priming is used in this paper, i.e., for BOLD responses instead of per-neuron responses.
I feel there are implications that this study involves not a simple viewing of objects but rather an "object classification decision." In other words, the task given to the subjects might bias the priming effect. Could the chosen task increase the likelihood of using memory to quickly retrieve an answer for a repeated object? This might explain priming effects in higher-level semantic/visual processing and the increased activation in the inferior parietal lobe.
I really liked the paper. They show differences in responses for the same instance, a different instance from the same category, and a different category. I also like the way they show that their current findings match previous results. But the best part of the paper is that it can be extended to further comparisons, like: similar shape but different semantics, objects with and without a category tag, and same functional properties but visually dissimilar. I am most interested in objects with and without a tag, as it would help us understand our bias toward categories. If we are given a category tag for each instance, do we try to group instances together according to their tag? And if the tag is not shown to us, do we treat each one as a completely different instance (exemplar), with a completely different response?

There are also a few things about their experiment design that I didn't understand, like the use of colored line drawings instead of real object images, and using only two instances of each category. Also, why was the participants' task just to judge the size of the object? In my opinion, they should have used a more memory-intensive task.

On a side note, I felt the sentences in the paper were too long. Multiple sentences were combined into one big sentence that could easily have been written separately.
I don't think this paper provides a very explicit conclusion about whether and how our brain processes visual information in an exemplar, categorical, or hybrid fashion. It demonstrates that we have both the ability to recognize the membership of varied exemplars and the ability to identify those exemplars themselves. Another thought is that exemplars should not be considered only within the same category; identifying certain exemplars across different categories might also be helpful. For example, the paper "J. Lim, R. Salakhutdinov and A. Torralba, Transfer Learning by Borrowing Examples for Multiclass Object Detection, NIPS, 2011" improves recognition performance by augmenting the training dataset and learning which categories to borrow from, and which examples to borrow.
I also believe there are a lot of unanswered questions that leave space for extensions of this paper. First of all, it is not clear whether the distinction is made based on visual properties only, or through incorporating semantic categorisation, as most of the objects are very different from each other. It would also be nice to see the effect of different properties, such as color, shape, etc. I agree that real photographs of the objects would be better than the clipart pictures.

Most importantly, I wonder what would happen if not just two examples, but several examples of the object were shown. In that case, one good experiment would be to show several instances with similar properties, plus one that is in the same category but has different visual properties. I wonder what would happen then. Would it be considered novel, or a different instance? This would help differentiate semantic versus visual similarity as well.