Comments on Visual Neurons and Machines: "scenes objects based rep"

One thing that struck me after class was that the PPA was shown to be more correlated with LOC than with RSC, which goes against the hypothesis that the PPA is simply latching on to the 3D structure or spatial information of the furniture objects; the PPA is encoding the object content as well. But again, all of this depends on how fair it is to use the correlation metric to compare brain regions, how long the images take to process, etc.
-- Anonymous, 2014-04-02 11:06

I would be curious to know people's opinions on what this all means for the visual process in general. That is, if we forget about PPA/RSC/LOC for a minute and simply make the claim "spatial information about scenes (open/closed) is processed in one area of the brain, object identity in another area, and object pose in another," what does that actually tell us about the visual problem? What would be different if they were all processed as part of one big block of functionality? To me it makes obvious sense that they would be separate: the individual pieces of data seem relatively uncorrelated, and seem to require different processes to evaluate, so it would make sense to have dedicated regions.
Does anyone have any other thoughts on what we might get from this information?
-- Anonymous, 2014-04-02 08:56

I personally like the use of synthetic data. It's important not to extrapolate the conclusions too far, but there is something to separating out the important things we want to test for... removing clutter, if you will. It might have been more compelling, though, if they had simply constructed very simple real physical scenes and photographed them, rather than using CG models. Honestly, if you're going through all the trouble of running fMRI on 20 patients (at hundreds of dollars an hour), you'd think they could have found a bare room and an old couch somewhere and simply taken a bunch of photos. I also agree that they probably should have done more comparisons against complex scenes.
-- Anonymous, 2014-04-02 08:45

I'm actually not sure how much the PPA is able to decode. Last class I defended this correlation approach as being better than an SVM. But one problem I now see is that I really don't know what a 0.02 decoding effect here means. Is this strong decoding? Yes, it's statistically significant (i.e., there's something going on), but my question is whether it can be explained by other activity in the brain that isn't really "semantic-object" tuned.
Basically: is this decoding accuracy good enough that it's not just low-level statistics or some other signal floating around the PPA?
-- Anonymous, 2014-04-02 08:05

Even now I still don't have a clear picture of the role the corresponding brain regions play in scene understanding, especially the PPA. The different papers we read use different experimental setups and datasets, and hence sometimes reach conflicting conclusions. I think this is probably due to differences in methodology between computer vision and neuroscience. In computer vision, we are accustomed to validating our assumptions with extensive experiments on different datasets, and even on different vision tasks; without doing so, our paper would have a high chance of being rejected. Neuroscience papers, meanwhile, probably partly due to the difficulty of conducting comprehensive experiments, are usually simplified: simplified datasets, simplified parameter-tuning procedures. Hence the conclusions often leave vision people skeptical.
-- Yuxiong Wang, 2014-04-02 07:55

I agree with their philosophy of disentangling and controlling different factors by using simplified synthetic data, given the genuine complexity of the issue. However, I think conclusions drawn from synthetic data alone are not guaranteed to hold. Simply combining synthetic and natural data, i.e., repeating the experiments with both, is also not enough. A better approach would be a gradual evolution of the data.
For example, start from simplified synthetic data, and then gradually add other objects, backgrounds, and geometric layout to make the scene more complex. Alternatively, start from a complex scene image and gradually remove objects and background to simplify the scene. In both scenarios, test whether and how the responses of the corresponding brain areas change.
-- Yuxiong Wang, 2014-04-02 07:50

The paper emphasizes the connections from LOC and RSC into the PPA, and how the PPA receives information from several brain regions (hippocampus, V4, etc.). What does the PPA output to? I'm interested in this question because it would let us hypothesize about what the brain does with the scene information.
-- M Aravindh, 2014-04-02 06:13

I agree that synthetic data has its flaws, but don't you think it's important to understand which changes in a scene matter the most? By making the data as simple as possible, they can see the effect of the slightest change (desk vs. bed). I think this is more difficult with messy real-world data.
-- Jacob Walker, 2014-04-02 06:11

I think it's important to remember that biological vision is a means to an end. Organisms want to know where they are, find food, etc. PPA, LOC, and RSC may all serve different cognitive ends.
I agree that the data is contrived, but it may make sense for the PPA to be sensitive to objects as well if it is indeed part of a larger pathway that aids navigation. It may be a way of noting "landmarks": I know that this scene (or I) was at place X, and it had an object Y.
-- Jacob Walker, 2014-04-02 06:08

If the PPA is able to decode between semi-fine-grained furniture categories, then it's definitely not latching onto simple cues like perspective.
-- M Aravindh, 2014-04-02 06:07

I think we need results from both kinds of data, synthetic and natural. The authors argue in the introduction that naturalistic photographic stimuli have led to conflicting results; in such a situation, a synthetic-data experiment seems very useful. I also feel that the brain areas discussed here are not so abstract as to have different pathways for synthetic vs. real scenes, but that's just a guess.
-- M Aravindh, 2014-04-02 05:05

I wonder if there have been attempts in vision where people have used different methods on the same data and brought them together to add more to the overall semantics. For example, one method for high-level recognition, another for fine-grained analysis, and finally bringing them together into a consolidated result. Can anyone mention some such experiments?
If any?
-- Priyam, 2014-04-02 02:23

Taking inspiration from the PPA's use of both "what" and "where": maybe in computer vision we can use cues from object recognition to do spatial understanding, e.g., identifying part of an image as sky means it is always far away. This reminds me of a CVPR paper that used semantic labels to inform depth prediction: http://users.cecs.anu.edu.au/~sgould/papers/cvpr10-depth.pdf
-- Gaurav, 2014-04-02 01:41

Reading this paper, I was immediately reminded of computer vision's GIST feature: http://cvcl.mit.edu/papers/IJCV01-Oliva-Torralba.pdf. The PPA does both "what" and "where" processing, so maybe it provides inspiration for a general scene-understanding feature like GIST? RSC is more related to the "where" aspect, which might be useful for robotic applications involving fast motion through environments, e.g., replicating driving/flying on robots.
-- Gaurav, 2014-04-02 01:40

This paper investigated synthetic scenes with one object, and it seems to suggest that there are two streams of processing: one for spatial properties and the other for object properties. The relationships between different objects in the scene seem important, and object properties may be used as cues to spatial properties.
For example, we might rely on the positions of objects on a smooth surface to deduce where that surface is. It would be interesting to see where and how the two processing streams merge to form a consistent representation.
-- Anonymous, 2014-04-01 23:44

Keeping with the theme of simmering discontent in this thread: in the discussion section, where they describe their general approach, they claim to use their formulated framework to "inform the design of complementary studies that use both *naturalistic* and artificial scenes to understand the nature of information being represented in the scene network." Are they referring to the monkey paper? Isn't that too big a jump? Or am I just parsing this incorrectly?
-- Kumar Shaurya, 2014-04-01 23:42

To add to the skepticism, their take-away message is much too broad for the experiment they actually did. I'm particularly peeved by the idea that object information is represented by invariance to backgrounds; simple low-level features could get you the same thing.
-- Allie, 2014-04-01 23:17

I am similarly skeptical about the synthetic images.
They used gray backgrounds but colored objects, and it is imaginable how this might not be a good idea, particularly in the case of "open" scenes.
-- Anonymous, 2014-04-01 23:14

Interestingly, the PPA has also been shown to capture global statistics and textures. Following the discussion from last class on dividing the scene-recognition task into structure, content, and style, would it be fair to say that, according to this paper, the PPA is sensitive to all three (structure, content, style)? It seems to suggest that the PPA integrates information inherited from different brain regions to culminate the process of scene understanding.
-- Anonymous, 2014-04-01 22:59

The authors mention that they are using furniture because it can be realistically embedded in a spatial environment. But many other flat objects, such as carpets or portraits, can also be easily embedded in a spatial environment and could have been used in their experiment.
-- Anonymous, 2014-04-01 22:57

I liked the way the authors have shown that LOC is responsible for object processing, RSC for scene layout, and the PPA for both. They have also shown that RSC is correlated with the PPA, which is in turn correlated with LOC.
So is this correlation enough to imply an actual physical connection between these regions, or an exchange of information between them?
-- Anonymous, 2014-04-01 22:47

The authors also mention in the 'Discussion' section that the objects they use have navigational affordances, which provide diagnostic details about the spatial environment. So ultimately the PPA may be sensitive to spatial information alone, just drawn from different sources; other objects, such as manipulable tools, or complex backgrounds, may not lead to similar results.
-- Anonymous, 2014-04-01 22:46

On the other hand, these big objects also make the case for the LOC results stronger.
-- Ishan, 2014-04-01 22:39

This paper seems to reinforce the way vision is approached in modern times: an object-recognition sub-field and a geometry (layout, single-image 3D) sub-field, without much cross-talk between them. I am highly disturbed by the fact that they use furniture objects (which themselves give strong cues of box-like 3D structure) and nothing else (such as more 2D/flat objects). Using such objects might be a reason why the PPA gives a really high response to objects.
It might just be latching on to the "perspectiveness" of these objects.
-- Ishan, 2014-04-01 22:36

To add to your second point, the simplicity of the scenes might have some role to play in the PPA processing objects in these experiments...
-- Abhinav Shrivastava, 2014-04-01 22:18
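[Editor's note] Several comments above ask what a correlation-based decoding index of 0.02 actually measures, and whether it could be driven by structure that is not condition-specific. As a concrete reference point, here is a minimal sketch in the spirit of split-half correlation decoding (Haxby-style), with a label-permutation test. The function names, the toy data, and the specific index (within-condition minus between-condition correlation) are illustrative assumptions, not the paper's actual analysis pipeline.

```python
# Sketch of split-half correlation decoding with a permutation test.
# All names and the toy data are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def correlation_decoding_index(patterns_a, patterns_b):
    """patterns_a, patterns_b: (n_conditions, n_voxels) mean response
    patterns from two independent halves of the data. Returns mean
    within-condition correlation minus mean between-condition
    correlation; > 0 suggests the region carries condition information."""
    n = patterns_a.shape[0]
    # np.corrcoef stacks the rows of both inputs; the [:n, n:] block
    # holds correlations of every half-A pattern with every half-B pattern.
    c = np.corrcoef(patterns_a, patterns_b)[:n, n:]
    within = np.mean(np.diag(c))
    between = (np.sum(c) - np.trace(c)) / (n * (n - 1))
    return within - between

def permutation_p_value(patterns_a, patterns_b, n_perm=1000):
    """Shuffle condition labels in one half to ask whether the observed
    index could arise from label-independent structure alone."""
    observed = correlation_decoding_index(patterns_a, patterns_b)
    null = np.array([
        correlation_decoding_index(patterns_a, rng.permutation(patterns_b))
        for _ in range(n_perm)])
    return observed, np.mean(null >= observed)

# Toy data: 4 conditions x 50 voxels, weak condition signal shared
# across halves, buried in stronger independent noise.
signal = rng.standard_normal((4, 50))
half_a = signal + 2.0 * rng.standard_normal((4, 50))
half_b = signal + 2.0 * rng.standard_normal((4, 50))
obs, p = permutation_p_value(half_a, half_b)
print(f"decoding index = {obs:.3f}, permutation p = {p:.3f}")
```

The point of the sketch is the one raised in the thread: a tiny but significant index only says the within/between difference is unlikely under shuffled labels; it says nothing by itself about whether low-level image statistics, rather than "semantic-object" tuning, produce that difference.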