What I liked about the paper is the transition from the readings so far. Specifically, the authors show that the PPA, which is selective for scenes, can also be activated by navigationally relevant objects, which marks a big change in the ideas we've come across about the PPA.
I think my favorite part was that they compare forgotten and not forgotten objects. I think this is an important point to consider for vision - should we reward systems that act more like a human by ignoring things that are irrelevant? We usually try to classify everything possible, but maybe being more selective to things relevant to a task like navigation would be a good trait in a vision system.
I thought the fact that objects reported as forgotten, but located at important junctions in the maze, still yielded a strong response was very interesting. Clearly the object was not actually forgotten (part of the brain is still responding strongly to it), but somehow it's not making it up to the conscious brain. I wonder if this would change if the images were presented for longer than 500ms.
I really like this question: "should we reward systems that act more like a human by ignoring things that are irrelevant".
Frequently, I think there's an emphasis in computer vision on explaining everything in a particular scene (even in cases where a human given huge amounts of time would have trouble getting everything), when in reality we might be better off concentrating on understanding 30% of the scene really well (and knowing which 30% we understand). Object detection does this fairly well by having "difficult" objects in Pascal VOC, as far as I remember.
I find it fascinating that the brain still responded to objects at important junctions that the subjects couldn't recall, and I wonder what that means for navigation purposes. I like Ishan's analogy with corner detection - it does seem enticing that the brain learns to associate these 'decision objects' as a sort of feature to fire for automatic retrieval. As always, I'm interested in knowing how long this effect lasted. Additionally, I wonder how different their responses would have been if at the end they had been dropped off at a point in the maze and had to start a toy tour from there by actually moving in the museum rather than just passively watching a video.
With regard to what Allie and David have to say, I completely agree. Humans are extremely efficient at 'zoning out' the endless stream of data that our senses provide us, and it makes a lot of sense to run some sort of anomaly detector that fires only when things are out of the ordinary. Slightly off-topic, but similar reasoning has led to a pretty fancy camera developed at UZurich that only detects changes in the visual field, much like a retina, allowing effectively super high speed vision at very low latency. Look up Dynamic Vision Sensor if you're interested.
I think it is important to think of biological vision as a means to an end; organisms do not need a "complete" understanding of a scene; they just need something that accomplishes their goal of survival at the end of the day. Instead of scene "understanding," it looks like in this case the PPA is doing scene understanding for the sake of navigation.
I think there are two issues. One is whether a human really ignores things that are irrelevant. I personally think that at the early stages of information processing, humans encode and maintain as much information as possible. It is at the later decision-related stages that certain information is highlighted while the rest is strongly suppressed. This is task-dependent and happens unconsciously. That is, even if a certain object is forgotten in the navigation task, some other scenario would probably remind us that we have seen it before. The other issue is whether we should build machine vision systems modeled on the human visual system, given that the two differ quite a bit in, for example, computational ability and memory capacity.
One question I am left with is whether or not a similar signal would be apparent for objects associated with non-spatial decision making. It seems that they showed that objects associated with a clear choice between paths in a maze produce a stronger signal, but could this be repeated if the object were associated with some other decision? For example, if the users were trained to press certain buttons to generate various actions, would objects presented in conjunction with that elicit similar results? I guess the question is whether or not this is related to spatial paths or decision making in general.
I am also curious about this. In a previous paper we read about how manipulable objects and non-manipulable objects are treated differently. I wonder if there is any similarity in the mechanisms behind these two?
On one hand, navigation and manipulation seem to be such different tasks that it seems natural that the underlying mechanisms would be different. But then again, both are cases where extra attention is paid to objects due to their relation with actions.
I'm puzzled by one thing, and I was hoping someone might be able to fill me in. From what I can tell, their analysis is based on these beta variables, which are per-voxel. Their methods say they fit a GLM. I get what the X might be in y = X \beta + \epsilon. Can someone please explain what y is, and whether y changes per experiment?
This is my guess, which may be totally wrong, but let me just put it out there. It's based on this sentence from the paper (pg 5): "Event-related responses .. response function". I am guessing they are doing some sort of least-squares curve fitting. So we can treat the hemodynamic response as "delta functions" -> points through which we fit a curve (a line for GLM), getting the residual "epsilon". The "betas" tell you which "delta functions" contribute and how much.
Then the y is just how much response you get? But didn't some earlier paper (not sure which one) argue that the amount of signal wasn't actually important? Oh well, I mean, assuming the y is a reasonable response variable, this paper makes sense. And I'm not an expert, so I'm going to have to defer to the authors and assume it's a reasonable response variable...
Y is the BOLD (fMRI signal) response. This was an earlier paper, from a time when the majority of fMRI studies were only looking at overall amplitude/strength of activity using GLM stats. Nowadays, the field is much more centered on MVPA. The two analyses ask different questions and make different assumptions about the underlying encoding and its role in the cognitive process. So they don't always line up, but sometimes they do. I wouldn't say one is important and the other isn't --- just two different ways of doing analysis.
Ah ok -- as I said, I totally believed that it made sense, but I was just curious, especially given that most of the papers we've read haven't used this approach. I'm actually now quite curious what prompted the shift since then to the MVPA paradigm.
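For anyone else puzzling over the GLM in this thread, here is a minimal sketch of what fitting y = X \beta + \epsilon to a single voxel might look like. Everything in it is an assumption for illustration: the HRF shape, the TR, the event onsets, the "true" betas and the simulated y are all made up, and this is not the authors' actual pipeline. The point is only that y is one voxel's time series, each column of X is a stick ("delta") function convolved with a response function, and the betas are the per-condition amplitudes.

```python
import math
import numpy as np

def hrf(t):
    """Illustrative double-gamma-like response (assumed shape, not from the paper)."""
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

rng = np.random.default_rng(0)
n_scans, tr = 200, 2.0                 # hypothetical run length and TR (seconds)
h = hrf(np.arange(0, 32, tr))
h = h / h.max()                        # normalise so one event peaks at 1

def regressor(onset_scans):
    """Stick function at the event onsets, convolved with the HRF."""
    stick = np.zeros(n_scans)
    stick[onset_scans] = 1.0
    return np.convolve(stick, h)[:n_scans]

# Hypothetical onsets (in scans) for the two conditions of interest.
onsets_decision = np.arange(5, 190, 20)
onsets_nondecision = np.arange(15, 190, 20)

# Design matrix X: one column per condition plus a constant baseline column.
X = np.column_stack([
    regressor(onsets_decision),
    regressor(onsets_nondecision),
    np.ones(n_scans),
])

# Simulated y for one voxel: made-up amplitudes plus Gaussian noise.
true_beta = np.array([2.0, 1.0, 100.0])
y = X @ true_beta + rng.normal(0.0, 0.2, n_scans)

# Ordinary least squares gives the betas; the leftover is epsilon.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta_hat
```

The column of ones absorbs the baseline signal. Comparing the first two entries of `beta_hat` is the kind of amplitude contrast the GLM approach is after, and in a real analysis the same fit would be repeated for every voxel.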
I liked reading this paper and their result. To me this recall of objects at "decision points" seems very intuitive because of this nice "corner like" 3D structure of the decision points. I am not entirely convinced it is because of the "navigational" issue and I think the authors are overselling it. I bet the same would hold in an experiment where you put 2D objects in rectangles: people would better remember the ones closer to the corners. It is just because visually you have the corner to latch on to, and thus create an association. Is it because you want to navigate? I do not know.
I agree. The primary task given to the subjects was to remember the route and a secondary task was to focus on toy objects. It seems intuitive that while remembering the routes one would also be focusing on things like "where can I walk in this scene?" and "where are the walls?"; essentially looking at the 3D spatial layout of the scene along with landmarks. In the last discussion it was concluded that the PPA encodes both spatial and object information. During the recognition of objects, it might be the spatial association with which the objects were seen earlier that helps in recalling whether an object has been seen before. The task of navigation may be biasing the subjects to latch on to the spatial associations, though.
I guess they tried to control (somewhat) for this by having turns as the non decision points, which is at least better than just placing objects along a straight corridor.
It was interesting to see that the paper was organized to look at the information encoded in the brain specific to the task of navigation. The point that the fMRI signals we analyze might be task-dependent has been brought up a number of times in class. However, during the fMRI experiment objects were shown in isolation and not in the context of the scenes seen during the virtual tour. I wonder if anybody had any concerns about this?
I think their aim was to measure the PPA response to the object. As we know, the PPA is also responsive to scenes, so if the subjects were shown images with a scene, it would be difficult to know whether the response is due to the scene or the object.
I feel that the authors should take the task one step further. Some subjects (50% of them) should be instructed to actively explore the maze sequence from inside the scanner. The rest of them should do passive exploration. If navigation related objects are treated differently in the PPA, then I feel that active exploration subjects would show an even higher increase when compared to passive exploration subjects.
Attention cannot be controlled in the active exploration setup, but the passive exploration setup shows that attention in fact leads to a decrease in activation.
The above suggestion is based on our day-to-day experience that people routing themselves know routes much better than people sitting in the backseat.
I was also curious about this active navigation setup (although then it's so much harder to control things) for the exact same reason -- in my experience, you don't actually learn to navigate well in the backseat. Certain family drives that were totally familiar for years became totally unfamiliar when I actually had to do it.
I really liked the paper. In the paper they have shown that the brain automatically distinguishes between objects at navigationally relevant and irrelevant locations. But I feel that it might be more than just navigation; maybe our brain distinguishes objects on the basis of task. They should have added one more task, unrelated to navigation, performed while seeing the objects, and compared its response with the navigation task and with no task at all.
This paper reminds me of work in robot path following. The researchers represented a path for the robot to follow as a lookup table of feature points and movement directions associated with them. At test time, the robot searches for feature points in the image and, upon matching one from the lookup table, simply moves in the direction stored for that feature point. (I don't remember the details of how they made feature matching robust.) This method was able to travel significantly large distances (~1km) with very small errors (~1cm) on a quadcopter. Does anyone have a link to this work?
Is navigational relevance a principle for learning routes on the fly, or is it a more general categorization mechanism for grouping objects? I think this question is interesting and can be answered by studying the PPA in the same subjects a week after the maze experiment. If the increased activity for objects at decision points is still seen, then it favors the hypothesis that navigational relevance is a general categorization mechanism used by the PPA.
I like that this study involves looking at a series of images or video and is more applicable to robots. Personal liking aside, I would like to make a point. Correct me if I'm wrong, but is it wrong to extrapolate from this finding that neuroscience vision studies strongly reflect human life experience in preference to semantics? To take this statement to its most extreme conclusion: does the response in the PPA reflect only how individual humans encountered scenes and objects in the past, with similar responses grouped by similar real-world experience (objects encountered while navigating, objects encountered while using hands)? It would follow that a computer vision system replicating human vision must have a few basic functions and then must be flexible enough to respond to influences due to experience. I'm interested in seeing a study where a *normal* person is compared with someone having a higher level of expertise or interest in the field, for example a toy enthusiast in this experiment. Will he/she have a different interest in the study and remember toys for being toys and not navigational waypoints?
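To make the lookup-table idea concrete, here is a toy sketch. It is not the method from the work being asked about (the details of that work are not given here); the descriptors, headings, distance threshold and the nearest-neighbour matching rule are all hypothetical choices made purely for illustration.

```python
import numpy as np

# Hypothetical route table: each row pairs a feature descriptor
# with the heading (in radians) stored for that feature point.
route_descriptors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
route_headings = np.array([0.0, np.pi / 2, np.pi])

def lookup_heading(observed, max_dist=0.5):
    """Return the stored heading of the nearest table entry, or None if no entry is close enough."""
    dists = np.linalg.norm(route_descriptors - observed, axis=1)
    best = int(np.argmin(dists))
    if dists[best] > max_dist:
        return None  # nothing in the table matches this observation
    return float(route_headings[best])

# An observation close to the second stored feature, and one far from all of them.
heading = lookup_heading(np.array([0.1, 0.9, 0.0]))
no_match = lookup_heading(np.array([5.0, 5.0, 5.0]))
```

The parallel with the paper is that only the table entries (the "decision objects") need to be recognized; everything else in the image can be ignored.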
I think Elissa mentioned that they did something like this with London cab drivers for navigation; they had different hippocampi compared to the average person.
Jacob, but let's think from an average human perspective. Suppose someone is new to a place and exploring the city --
1. For the first k times (until things are firmly engraved in his/her memory), (s)he will look at 'scene texts' very carefully to navigate around. The visual cues of the surroundings begin to add up as k increases.
2. After some time, the same person no longer looks around and, with a slight visual cue, can navigate very easily in that same region. Probably it is memory that takes over.
Some questions -- 1. Can we consider scene texts as objects? Or, in general, can we consider text as an object, or are there different places in the brain responsible for recognizing text?
2. Isn't it the case that, while navigating, visual cues develop on top of linguistic ones? (Probably because it is easier to navigate and describe places starting from text than to say there exists such-and-such a building which looks so-and-so.)
3. But what if there is no text to begin with? Suppose I go to the remotest place in, say, India. The only way people can describe a route in such a setting is through visual cues. (Go so and so, you will see this and this. After this thing, take a turn.) The interesting thing is that our brain processes this information as well.
4. It seems the brain is very flexible about the information given to it. It tries to pick whatever works best in the situation. But where is it learning to process all this information? It is said that vision takes up a lot of brain processing. But it seems to me that vision is one costly sensor that is helping to make ends meet. Is there something else going on that actually governs what to look at?
One thing I like about this paper is the elegant and simple design, in particular the way they ruled out the alternative explanation that the reported effects could be due to more attention being paid to objects at the decision points. Attention is a serious confounding variable in many fMRI studies and this seems a neat way to control for it. One issue I have trouble following is that the paper mentioned allocentric spatial representation at the very beginning, so I assume their objects are in allocentric representations (object-to-object relations). However, watching the virtual museum film seems to me to tap egocentric representations (self-to-object relations). Maybe I am heavily influenced by studies showing evidence from both normal subjects and patients (i.e., hemispatial neglect) that there are multiple spatial reference frames (allocentric and egocentric representations) in the brain. So I am not sure if I took the term allocentric here too seriously. Is the PPA sensitive to different kinds of spatial reference frames (egocentric vs. allocentric)?
From another perspective, I think what PPA does for navigation is quite similar to sparse coding in lots of related tasks in computer vision. That is, rule out unimportant or unnecessary information while highlighting certain factors that are crucial to the current tasks.
One takeaway for me from the discussion in the class was how context serves as a glue for object and scene responses in the PPA. The open-ended discussion about solving vision 100% versus discarding data to optimize the time for completing a task, which is most likely what biological systems do to make decisions, was also interesting. I also think that although as biological entities we are always optimizing time with respect to a task, given more time we are capable of recognizing more from a computer vision perspective and maybe even making better decisions.
The second paper for today addresses this question by looking not only at navigational context but at more general contextual associations.