Visual Neurons and Machines: scenes objects based rep

Tuesday, April 1, 2014

scenes objects based rep

32 comments:

Unknown, April 1, 2014 at 5:39 PM
Paper for reading: Harel et al., 2013, Deconstructing Visual Scenes in Cortex: Gradients of Object and Spatial Layout Information. Cerebral Cortex. http://graphics.cs.cmu.edu/courses/16-899A/2014_spring/thevisualworld/HarelBaker2013.pdf
Unknown, April 1, 2014 at 8:29 PM
It is counterintuitive that knowing the identity, or absence, of an object gives more information than the activation values alone. For example, in Figure 2, the initial experiment compares neural activation and also shows absence and identity decoding. Overall, the information conveyed is that PPA has object information and RSC has little object information. How does this experiment shed light on what we already know?
Unknown, April 1, 2014 at 8:30 PM
The second paper [Stansbury 2013] is also a good read.
Unknown, April 1, 2014 at 9:19 PM
My first reaction is that I'm torn by this paper. In the end, I'm glad that I read it though.

Part of me loves that they try to make it as simple as possible to eliminate as many confounding factors as possible. This is really great and I think important to do.

However, part of me just has to say that I'm just fundamentally skeptical of their synthetic data. I believe that overly simplistic scenes can be misleading about computer vision, and that you can learn a lot of wrong things about the world with them. For instance, if you have scenes with only a few objects, you might think that it's important to label all the objects (i.e., identify all the chairs). However, in the real world, if you see a scene with tons of chairs, it might be ok to get away with just "chairness" representations (i.e., there are some chairs here and I'll never be bothered to count them, only maybe identify and segment out one nearby chair that I could sit on).

This doesn't mean that I don't buy any conclusion of the paper, but I just think that everything really needs to be taken with a large grain of salt.
Ishan, April 1, 2014 at 10:36 PM
This paper seems to reinforce the way vision is perceived in modern times: the object recognition subfield and the geometry (layout, single-image 3D) subfield have little cross talk.
I am highly disturbed by the fact that they use furniture objects (which themselves give strong cues of box-like 3D information), and nothing else (which may include more 2D/flat objects). The fact that we are using such objects might be a reason why the PPA gives a really high response for objects. It might just be latching on to the "perspectiveness" of these objects.
Unknown, April 1, 2014 at 10:47 PM
I liked the way in which the authors have shown that LOC is responsible for object processing, RSC for scene layout, and PPA for both. At the end they also show that RSC is correlated with PPA, which is in turn correlated with LOC. Is this correlation enough to imply an actual physical connection between these regions, or an exchange of information between them?
Unknown, April 1, 2014 at 10:59 PM
Interestingly, the PPA has also been shown to capture global statistics and textures. Following the discussion from the last class on dividing the scene recognition task into structure, content, and style, would it be fair to say that, according to this paper, PPA is sensitive to all three (structure, content, style)? It seems to suggest that PPA integrates information inherited from different regions of the brain as the culmination of the scene understanding process.
Unknown, April 1, 2014 at 11:44 PM
This paper investigated synthetic scenes with one object, and it seems to suggest that there are two streams of processing, one for spatial properties and the other for object properties. The relationship between different objects in the scene seems to be important. Object properties may be used as cues to spatial properties. For example, we might rely on the position of objects on a smooth surface to deduce where that surface is. It would be interesting to see where and how the two streams of processing merge to form a consistent representation.
Gaurav, April 2, 2014 at 1:40 AM
Reading this paper, I was immediately reminded of computer vision's GIST feature: http://cvcl.mit.edu/papers/IJCV01-Oliva-Torralba.pdf
PPA does "what and where" processing, and maybe this gives the inspiration for a general scene understanding feature like GIST?
RSC is more related to the "where" aspect, which might be useful for actual robotic applications of fast motion through environments. Read: replicating driving/flying on robots.
Gaurav, April 2, 2014 at 1:41 AM
Taking inspiration from PPA's use of both "what" and "where":
Maybe in computer vision we can use cues from object recognition to do spatial understanding, e.g., identifying part of an image as sky means it is always far away. This reminds me of a CVPR paper that used semantic labels to inform depth prediction (http://users.cecs.anu.edu.au/~sgould/papers/cvpr10-depth.pdf).
Priyam, April 2, 2014 at 2:23 AM
I wonder if there have been attempts in vision where people have used different methods on the same data and brought them together to add more to the overall semantics. For example, one analysis for high-level recognition, another method for fine-grained analysis, and finally a step that brings them together to give a consolidated result. Can anyone point to such experiments, if there are any?
Jacob Walker, April 2, 2014 at 6:08 AM
I think it's important to keep in mind that biological vision is a means to an end. Organisms want to know where they are, find food, etc. PPA, LOC, and RSC may all serve different cognitive ends. I agree that the data is contrived, but it may make sense for PPA to be sensitive to objects as well if it is indeed part of a larger pathway that aids in navigation. It may be a way of noting "landmarks": I know that this scene (or I) was at place X and it had an object Y.
M Aravindh, April 2, 2014 at 6:13 AM
The paper emphasizes the connections from LOC and RSC into PPA and how PPA receives information from several brain regions (hippocampus, V4, etc.).
What does PPA output to?

I'm interested in this question because then we may hypothesize what the brain does with the scene information.
Yuxiong Wang, April 2, 2014 at 7:55 AM
I still do not have a clear picture of the role the corresponding brain regions play in scene understanding, especially PPA. The different papers we read use different experimental setups and datasets, and hence sometimes reach conflicting conclusions. I think this is probably because of the different methodologies of computer vision and neuroscience. In computer vision, we are accustomed to supporting our assumptions with extensive experiments on different datasets, and even on different vision tasks. Without doing so, our papers would have a high chance of being rejected. Neuroscience papers, meanwhile, probably in part because comprehensive experiments are hard to conduct, are usually simplified, for example with simplified datasets and simplified parameter tuning procedures. Hence, vision people are usually skeptical of the conclusions.
Unknown, April 2, 2014 at 8:56 AM
I would be curious to know people's opinions on what this all means for the visual process in general. That is, if we forget about PPA/RSC/LOC for a minute and simply make the claim "Spatial information about scenes (open/closed) is processed in one area of the brain, object identity in another area, and object pose in another area," what does that actually tell us about the visual problem? What would be different if they were all processed as part of one big block of functionality? To me it makes obvious sense that they would be separate: the individual pieces of data seem to be relatively uncorrelated and seem to require different processes to evaluate, so it would make sense to have dedicated regions. Does anyone have any other thoughts on what we might get from this information?
Unknown, April 2, 2014 at 11:06 AM
One thing that struck me after class was that the PPA was shown to be more correlated with LOC than with RSC, which goes against the hypothesis that PPA is basically latching on to the 3D structure or spatial information from the furniture objects. PPA is encoding the object content as well. But again, all this depends on how fair it is to use the correlation metric for comparison between regions of the brain, how long it takes to process the images, etc.