Visual Neurons and Machines: Scenes: Structure + Content

Saturday, March 29, 2014

Kravitz, Real-World Scene Representations in High-Level Visual Cortex: It’s the Spaces More Than the Places, 2011

37 comments:

Unknown, March 30, 2014 at 2:53 PM
By the way -- for those who read / are going to read the Patterson and Hayes paper (which I'd encourage), there's a very recently added IJCV edition which does scene classification with the predicted attributes: http://cs.brown.edu/~gen/pub_papers/SUN_Attribute_Database-Patterson_et_al.pdf

I was very curious about this personally and was very surprised that the original CVPR paper did not do this.
Unknown, March 30, 2014 at 4:45 PM
1. Under the heading 'Representational structure within cortical regions' on page 7325, the third line states: "Given the limited acquisition volume possible at our high-resolution, our scene-selective regions included both TOS and PPA but not RSC." Does this mean that limitations of the scanning setup kept them from including RSC in the study?

2. Given that, as of 2014, some work shows that RSC carries more information about spatial layout (open vs. closed) than PPA does, how would the conclusions of this paper need to change?
Unknown, March 30, 2014 at 5:39 PM
In the discussion, it is mentioned that "low-level representations may be important in supporting quick discriminations of complex stimuli, whereas high-level representations are specialized to support more abstract or specialized actions (e.g., navigation)". I wonder how much the brain's representations will change depending on the task, and which parts of the processing will be invariant to the task being performed?

Personally it feels different when I am looking at a scene passively, vs when I am looking at a scene with the intent to identify it, vs when I am trying to navigate. In this paper, it was mentioned that the subjects were looking out for changes in the fixation cross. I wonder if this could push the subjects towards perceiving the scenes more passively, and as a result high level semantic properties of the scene (which may require more focus to pick out) are underrepresented.
Jacob Walker, March 30, 2014 at 6:07 PM
I notice that one of the areas studied (PPA) surrounds the hippocampus. Please correct me if I am wrong, but isn't the hippocampus home to "place cells," that is, cells that fire based on the perceived location of the viewer (think brain-based SLAM)? If the scene representation in the PPA is primarily spatial, could the PPA just be part of this SLAM machinery?
Kumar Shaurya, March 30, 2014 at 10:12 PM
Hm, interesting paper. Kind of blows the Walther paper out of the water. It was also very entertaining to see the Elo rating system used outside of competitive chess.

I didn't quite follow how they determined the Elo ranking. They say they ran 10,000 iterations of the ranking (since the order of appearance can influence ratings, which totally makes sense), but it wasn't clear to me what they used as the criterion to rank one trial image above another.
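
On the Elo question: this thread does not say what the paper used to decide the outcome of each pairwise comparison, so the following is only a generic sketch of the rating scheme, not the authors' procedure. It assumes the comparisons have already been reduced to (winner, loser) pairs, uses the conventional chess defaults (starting rating 1500, K = 32) as assumptions, and averages ratings over random shuffles of the comparison order to reduce the order dependence. The function name `elo_ratings` and all of its parameters are hypothetical.

```python
import random


def elo_ratings(comparisons, n_items, n_orders=100, k=32.0, start=1500.0, seed=0):
    """Average Elo ratings over random orderings of pairwise outcomes.

    comparisons: list of (winner, loser) index pairs (assumed input format).
    n_orders:    number of random orderings to average over.
    k, start:    conventional chess defaults, used here as assumptions.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_items
    for _ in range(n_orders):
        ratings = [start] * n_items
        order = list(comparisons)
        rng.shuffle(order)  # a different presentation order on each pass
        for winner, loser in order:
            # Expected score of the winner under the standard logistic Elo model.
            expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
            delta = k * (1.0 - expected)  # surprising wins move ratings more
            ratings[winner] += delta
            ratings[loser] -= delta
        for i in range(n_items):
            totals[i] += ratings[i]
    # Mean rating per item across all orderings.
    return [total / n_orders for total in totals]
```

Setting `n_orders=10000` would match the number of iterations mentioned above. Averaging over shuffles matters because a single pass of Elo updates depends on which comparisons happen to come first.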

I also liked how they made sure to remove the retinotopic voxels, and how they compared against the control (chequerboards). I suspect most of the other papers do that too, but the way they presented it made a lot of sense.

Finally, I also liked the MDS visualization; I think I've seen it somewhere else as well.
Unknown, March 31, 2014 at 12:34 AM
The paper was interesting and easy to follow. I liked that the Discussion explains in detail the study-design bias in previous studies that showed PPA distinguishing between high-level perceptual categories of scenes, contrary to their findings. Given the findings of this paper, it would be interesting to know where scene identification primarily occurs in the brain. Is there strong evidence that it occurs elsewhere? Or is it not a region-specific task, but rather the inheritance of features along the ventral stream that ultimately enables scene understanding? The authors claim, based on their findings, that weaker categorization by relative distance in PPA means the PPA inherits aspects of scene categorization from pEVC.
Gaurav, March 31, 2014 at 3:24 AM
I'm still a bit hazy on the separation between the ventral and dorsal streams, and would be glad for some insight and corrections on two things related to the paper:
1. We have highly detailed 1024x768 images here vs. line drawings in some other papers. Since there is depth and a *where* question to be answered, do we expect the dorsal stream to light up on fMRI too for the scenes used?
2. Does the paper blow a big hole in the two-stream theory by saying PPA does spatial processing, which was supposed to be the domain of the dorsal stream? Please correct me if I'm wrong here :)
Unknown, March 31, 2014 at 5:10 AM
I'm still left thinking that this and previous papers that claim brain region X is responsible for Y may be misleading. For example, their data shows that PPA has selectivity for expansive vs. enclosed spaces, but couldn't you also use this data to claim that PPA is characterized by parts which detect ceilings and walls, and other parts which detect horizons?
M Aravindh, March 31, 2014 at 6:04 AM
To incorporate some of the findings of this paper, we could change the vision pipeline to do the following:
1. First evaluate whether a scene is open/closed and near/far (relative distance).
2. Based on the result of 1, pick one of four pretrained models (each trained for one of these specific situations) and use it instead of a single model that operates in all conditions.
Could we then get away with searching a smaller range of scales, chosen based on the relative distance?
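
The two steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tested vision pipeline: the layout predicates, the four "experts", and the dict-based image stand-in are all hypothetical placeholders, and nothing here comes from the paper or from a real library.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: a model maps an image (here just a dict) to a label.
Model = Callable[[dict], str]


def route_by_layout(
    image: dict,
    is_open: Callable[[dict], bool],
    is_far: Callable[[dict], bool],
    experts: Dict[Tuple[bool, bool], Model],
) -> str:
    """Step 1: estimate the layout. Step 2: dispatch to the matching expert."""
    key = (is_open(image), is_far(image))
    return experts[key](image)


# Toy stand-ins: an "image" is a dict of precomputed layout scores in [0, 1].
def toy_is_open(image: dict) -> bool:
    return image["openness"] > 0.5


def toy_is_far(image: dict) -> bool:
    return image["distance"] > 0.5


def make_expert(name: str) -> Model:
    """Build a dummy expert that just reports which model was selected."""
    def expert(image: dict) -> str:
        return name
    return expert


# Keys are (is_open, is_far); one placeholder expert per layout condition.
toy_experts = {
    (False, False): make_expert("closed-near model"),
    (False, True): make_expert("closed-far model"),
    (True, False): make_expert("open-near model"),
    (True, True): make_expert("open-far model"),
}
```

For example, `route_by_layout({"openness": 0.9, "distance": 0.1}, toy_is_open, toy_is_far, toy_experts)` selects the open-near expert. One caveat of a hard switch like this is that a wrong layout decision in step 1 sends the image to the wrong expert; weighting the experts by layout confidence would be a softer alternative.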
Yuxiong Wang, March 31, 2014 at 7:52 AM
All of the experiments were conducted with the authors already holding some assumption or prior (and probably bias) about which regions of the brain to analyze, such as PPA. So I think the results do not easily generalize to other regions of the brain. The conclusions of these experiments are also limited. For instance, there might be explanations for the function of PPA other than expanse (open vs. closed), and there might be other regions responsible for the semantic, rather than spatial, aspects of scene understanding. This means that in most cases we are forced to look at these problems from a local perspective rather than a global one.