Visual Neurons and Machines: Attention: tasks under time constraints

Sunday, April 20, 2014

Attention: tasks under time constraints

34 comments:

Unknown, April 20, 2014 at 2:28 PM
Paper for reading: "Interacting Roles of Attention and Visual Salience in V4", Reynolds and Desimone, 2003
Unknown, April 20, 2014 at 3:43 PM
I am not sure that responses from a total of 80 neurons across both monkeys are sufficient for this experiment. Are they selecting neurons for a particular receptive field? If so, why did only 50 of the 80 neurons respond to the reference and probe stimuli? Any thoughts?
Unknown, April 20, 2014 at 3:52 PM
So far we have primarily talked about V1 and never looked beyond it, but this paper talks about attention and visual salience in V4. Why are we not talking about V1 (or V2, V3) here?
Ishan, April 20, 2014 at 6:57 PM
This paper explores the question of attentiveness. Because the subjects are monkeys, the authors can use probes, which gives an impressive time resolution of the response.

I like the result of the paper. It shows how "concentration" is reflected in the suppression of neural signals irrespective of the contrast of the stimulus (as shown in later experiments and discussed in "The possibility that distracters attracted attention" on Pg 9).
This fits well with the "task-based focus" of vision systems. In this light, can we say that task-independent "saliency of images" is a very fuzzy thing?
Unknown, April 20, 2014 at 7:14 PM
My question may not be very relevant to the main topic of the paper. It seems to me that, in this paper, visual salience is primarily evaluated by contrast. I’m not quite sure if my conclusion is fair. I think that from a computer vision perspective, visual salience involves more factors. For example, in addition to variation in image attributes (e.g. gradients, edges), people also consider visual novelty, uniqueness, etc.
Unknown, April 20, 2014 at 7:50 PM
This might be something stupid, but I tried a small experiment using the following images (with no task in mind, just to see what I look at in the images):

1. Image 1 (http://blog.gettyimages.com/wp-content/uploads/2013/08/Jennifer-Lawrence-Bradley-Cooper-Oscars-2013-Christopher-Polk-Getty-Images-162549463.jpg) - In this image there are a lot of people, and I look at faces within 3-4 seconds (mainly at the face of Jennifer Lawrence).

2. Image 2 (http://www.lonelyplanet.com/travel-blog/tip-article/wordpress_uploads/2013/05/india1_cs.jpg) - In this image there are some people posing in front of the Taj Mahal, but in the first 3 seconds I looked at the Taj Mahal. (Later on I looked more carefully to see what else is in the image.)

3. Image 3 (http://img.ibtimes.com/www/data/images/full/2013/08/19/400404-representational-image.jpg) - In this image there are a lot of people sitting on a train, but I am not able to fix my eyes on any one thing.

Interestingly, in images 1 and 2 the focus points (Jennifer Lawrence and the Taj Mahal) are at the center, and in image 3 things seem to be symmetric around the center. Although this is not a very rigorous experiment, can we say that whatever is at the center attracts attention more strongly (rather than saying the bias is toward faces, monuments, etc.)? The more I think about it, the more it seems that for bottom-up or task-independent saliency detection we start looking from the center and try to find something interesting near it. Does that make sense? Any thoughts?
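
The center-bias idea in this comment can be made concrete with a small sketch. It assumes some bottom-up saliency map is already available as a 2-D array and simply multiplies it by a Gaussian weight centered on the image. The function names `center_prior` and `apply_center_bias` and the `sigma_frac` parameter are hypothetical, chosen only for illustration; this is not a method taken from the paper under discussion.

```python
import numpy as np

def center_prior(height, width, sigma_frac=0.25):
    """Gaussian weight map that peaks at the image center (illustrative helper).

    sigma_frac is an assumed parameter: the Gaussian's standard deviation,
    expressed as a fraction of each image dimension.
    """
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    sigma_y = sigma_frac * height
    sigma_x = sigma_frac * width
    prior = np.exp(-0.5 * ((yy / sigma_y) ** 2 + (xx / sigma_x) ** 2))
    return prior / prior.max()

def apply_center_bias(saliency, sigma_frac=0.25):
    """Re-weight a 2-D saliency map toward the center and normalize it to sum to 1."""
    prior = center_prior(*saliency.shape, sigma_frac=sigma_frac)
    biased = saliency * prior
    total = biased.sum()
    return biased / total if total > 0 else biased

# With a flat (uninformative) saliency map, the predicted fixation is the center.
flat = np.ones((5, 5))
biased = apply_center_bias(flat)
peak = np.unravel_index(np.argmax(biased), biased.shape)
```

Comparing fixation predictions with and without such a prior would be one way to separate "people look at the center" from "people look at faces or monuments", since the informal three-image test confounds the two.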
Unknown, April 20, 2014 at 7:50 PM
Ok, this is a total can of worms, but... what's the take-home for mainstream computer vision?

Is there anything to learn here beyond the notion that a task can overrule looking for standard pop-out-type things? I think the fact that there's task-based overruling of standard visual search policies is a very interesting point, but unfortunately it is too early to make much use of it (we really don't have systems with multiple purposes yet).

The Judd paper mainly suggests either replacing eye trackers or graphics-like tasks where you need to know where a human will look (or not!). I'm hopeful that at least someone will be willing to defend saliency as something really important.
M Aravindh, April 20, 2014 at 7:56 PM
Computer vision research has primarily focused on attention as a way of knowing what to search for, either spatially or in terms of feature selection. In this context, we have several papers that try to mimic humans in predicting image saliency.

I'm, however, interested in exploiting attention in the algorithm design itself. For example, this paper (http://www.umiacs.umd.edu/~mishraka/activeSeg.html) decomposes the problem so that it sequentially fixates on one point at a time to segment out different objects in the scene. Unfortunately, I have not found any such work for other popular computer vision problems. What happens if we decompose the problem sequentially, and is a point seed fundamentally different from other seeding methods?
Priyam, April 21, 2014 at 12:06 AM
To me this was one of the more intriguing papers. I also think the ongoing discussions are pretty interesting.

But one thing this makes me wonder: what if the feature extractors used on a particular image have a receptive field (RF) of their own, and the problems that vision faces arise because this RF captures the wrong things from a scene or image?

What if the black box that processes inputs in CV isn't the right 'neuron' for the task? This might seem really stupid, but being pretty new to vision as a field, I have some apprehensions about even the basics.
Kumar Shaurya, April 21, 2014 at 4:35 AM
After reading the paper, the most interesting aspect to me was that the response to a stimulus is suppressed by the presence of a weak probe stimulus, and that the suppression increases the poorer the probe stimulus is for the particular neuron. Initially this seemed very counterintuitive to me: for any given selective neuron, most stimuli are not ones it is tuned for, so the net effect would be that the neuron hardly ever gives a strong response. Is it because we want the neuron to fire strongly only when the stimulus it is tuned for is at high contrast or is being attended to?
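
One way to reason about the question above is a toy weighted-average model, in which the response to a pair of stimuli in the same receptive field is the average of the two individual responses, weighted by contrast and by attention. This is only an illustrative assumption, not the model fitted in the paper, and the function name `pair_response` and all parameter names and values are hypothetical.

```python
def pair_response(r_ref, r_probe, c_ref, c_probe, attn_ref=1.0, attn_probe=1.0):
    """Toy weighted-average response to two stimuli sharing a receptive field.

    r_ref, r_probe: responses the neuron gives to each stimulus alone.
    c_ref, c_probe: stimulus contrasts in [0, 1], used here as weights.
    attn_ref, attn_probe: assumed multiplicative attention gains on the weights.
    """
    w_ref = c_ref * attn_ref
    w_probe = c_probe * attn_probe
    total = w_ref + w_probe
    if total == 0:
        return 0.0  # no visible stimulus, no driven response
    return (w_ref * r_ref + w_probe * r_probe) / total

# A poorer probe (lower response alone) pulls the pair response down further.
equal_contrast = pair_response(50.0, 10.0, 0.5, 0.5)
poorer_probe = pair_response(50.0, 0.0, 0.5, 0.5)
# Attending to the reference, or lowering the probe contrast, restores the response.
attended_ref = pair_response(50.0, 10.0, 0.5, 0.5, attn_ref=3.0)
faint_probe = pair_response(50.0, 10.0, 0.5, 0.1)
```

Under these assumptions the neuron is not permanently silenced by non-preferred stimuli: the suppression shrinks when the competing stimulus has low contrast or when attention is directed to the preferred one.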

The second thing the suppression led me to think about is that viewing a stimulus that evokes a very strong response in a given selective neuron (say, a very bright red colour for a red-specific neuron) would in effect 'devastate' the detection of other colours in the same receptive field. A simple test would be to display a big red circle with green dots in it versus the green dots in isolation. However, I feel that the colour contrast between the two should actually enhance the appearance of the green dots. Maybe a better test would be something like a very thin vertical stripe hidden between very prominent black and white horizontal stripes.
Gaurav, April 21, 2014 at 8:26 AM
I think this paper raises two points that may be relevant to designing human-like vision systems:
1. Higher-contrast regions are automatically more important in lower visual areas, and their response is preferentially transmitted to higher-level areas.
2. Attention can change the response of the low-level areas, and hence the high-level ones, by increasing the weight of the response to whatever is at the attended location.
These might be obvious in a way, but the paper does a statistical study of the effects of contrast and attention to establish both.

I think the "take home" might be to remove low contrast regions in a scene understanding setting.
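
The "remove low contrast regions" suggestion could be sketched as follows, using the local standard deviation of a grayscale image as a stand-in for contrast. The choice of contrast measure, the function names `local_contrast` and `mask_low_contrast`, and the default `window` and `threshold` values are all assumptions made for illustration; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_contrast(gray, window=3):
    """Local standard deviation in a (window x window) neighborhood; window must be odd."""
    pad = window // 2
    padded = np.pad(gray.astype(float), pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))
    return patches.std(axis=(-2, -1))

def mask_low_contrast(gray, window=3, threshold=0.05):
    """Zero out pixels whose local contrast is below threshold (an assumed cutoff).

    Returns the masked image and the boolean keep-mask.
    """
    contrast = local_contrast(gray, window)
    keep = contrast >= threshold
    return np.where(keep, gray, 0.0), keep

# A step edge: flat regions are dropped, pixels next to the edge are kept.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
masked, keep = mask_low_contrast(img)
```

Note that in this example the flat bright region is discarded along with the flat dark one, which shows a limitation of the heuristic: uniform but important regions would be removed as well.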
Unknown, April 23, 2014 at 1:08 AM
One important takeaway from the discussion was the overall impact of attention during visual processing. Is it just a filter that helps prune what to process, or does it actually help re-weight our estimates of what to expect or recognize? How much of a role does context play in bottom-up attention?