This may have been mentioned in class, but what's the proper way to visualize dissimilarity matrices? At first I thought it was imagesc, but then I realized that it maps colors based on the min and max of the data, whereas it seems it may be better to have everything mapped between zero and one... but I'm not entirely sure about that either. If you always map between zero and one (assuming your feature vector is normalized), two matrices may look similar simply because they have similar average similarity. If you don't map between zero and one, it seems hard to compare similarity matrices: two regions that are actually similar can look different because some outlier in one matrix has drastically shifted the overall range in a certain direction.
Anyway, I'm not sure if that made any sense, but I'd be interested to know what other people have been using.
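One way to make the color mapping comparable across matrices is to pass explicit color limits to imagesc instead of relying on its default min/max scaling. A minimal sketch, assuming dissimilarities normalized to [0, 1]; D1 and D2 are hypothetical placeholder matrices, and D2 carries an artificial outlier to show the range-shifting problem:

```matlab
% Hypothetical placeholder data: a random symmetric dissimilarity matrix.
n  = 8;
A  = rand(n);
D1 = (A + A') / 2;            % symmetrize
D1(logical(eye(n))) = 0;      % zero self-dissimilarity on the diagonal
D2 = D1;
D2(1, 2) = 3;  D2(2, 1) = 3;  % artificial outlier outside [0, 1]

clims = [0 1];                % fixed limits (assumes normalized data)

figure;
subplot(1, 3, 1); imagesc(D1, clims); axis square; colorbar; title('D1, fixed [0 1]');
subplot(1, 3, 2); imagesc(D2, clims); axis square; colorbar; title('D2, fixed [0 1]');
subplot(1, 3, 3); imagesc(D2);        axis square; colorbar; title('D2, default scaling');
```

With fixed limits the outlier simply saturates and the rest of D2 looks like D1; with default scaling the outlier compresses everything else into the low end of the colormap. If the data are not in [0, 1], the same idea works with limits computed once over all matrices being compared, though those limits are then sensitive to outliers again.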
Also, what are good metrics to compare between two similarity matrices? I came across Canonical correlation and Gromov–Hausdorff convergence. Any other standard approaches for this?
Is it fair to simply compute norm(M1, M2), where M1 and M2 are two normalized matrices?
Sorry, I mean, in MATLAB, norm(M1-M2).
My only worry with norm would be that it doesn't capture order and isn't invariant to a monotonic transformation of the distances. For instance, if you have a similarity matrix:
>> T = [1, 0.7, 0.4; 0.7, 1, 0.6; 0.4, 0.6, 1];
(which has specific preferences for each element)
and two matrices
>> T2 = [1, 0.45, 0.55; 0.45, 1, 0.6; 0.55, 0.6, 1];
>> T3 = [1, 0.9, 0.2; 0.9, 1, 0.7; 0.2, 0.7, 1];
T2 flips the order of data point 1's preferences but keeps the similarities at a similar magnitude. T3 has the same order as T, but transformed. I'd personally think T3 captures the structure of T better, but its difference from T has a higher norm than T2's (at least for the L2 and Frobenius norms).
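The norm comparison for these three matrices can be checked directly. A minimal sketch using only the built-in norm function; the variable names fro2, fro3, two2 and two3 are just illustrative:

```matlab
T  = [1, 0.7, 0.4; 0.7, 1, 0.6; 0.4, 0.6, 1];
T2 = [1, 0.45, 0.55; 0.45, 1, 0.6; 0.55, 0.6, 1];
T3 = [1, 0.9, 0.2; 0.9, 1, 0.7; 0.2, 0.7, 1];

fro2 = norm(T - T2, 'fro');   % Frobenius norm, approx. 0.4123
fro3 = norm(T - T3, 'fro');   % Frobenius norm, approx. 0.4243
two2 = norm(T - T2);          % matrix 2-norm,  approx. 0.2915
two3 = norm(T - T3);          % matrix 2-norm,  approx. 0.3372

fprintf('Frobenius: T2 %.4f, T3 %.4f\n', fro2, fro3);
fprintf('2-norm:    T2 %.4f, T3 %.4f\n', two2, two3);
```

Under both norms T3 comes out farther from T than T2 does, even though T3 preserves the ordering of T and T2 does not.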
My only concern with this is that whatever representation you use will end up being fairly high dimensional, and the dimensionality of the data (i.e., numel(x)) will do something funny to the angles / distances, which are the only way you can compare representations (since this should be supervised). So basically a representation might look better not because it induces a better ordering but because its distribution has the right dimension.
Not sure if that makes sense.
Yeah, I had similar concerns too... it seems like you run the risk of overfitting to something that captures the correlation of all the data.
What I've been doing so far is subtracting my similarity matrix from the LHS similarity matrix and looking for something that is close to zero. I'm also using LHS-RHS as a baseline, assuming I probably won't be able to come up with a representation that matches the LHS better than the RHS does.
But again, we run into the issue of how to compare two methods quantitatively rather than saying "oh, this one looks more better".
I too have been struggling with which distance metric to use. One question I had is whether both matrices should be normalized ahead of time. If you normalize, I believe this would be imposing equal amounts of available information to both systems (brain and algorithm) and comparing the way the information is distributed (where similarities between images indicate similar kinds of information being encoded). I could be thinking of this the wrong way, though.
I think the big question I'm asking is what we're trying to compare, and I'm assuming it's not simply correlations in raw activity -- it's something deeper than that. Exactly what that is, I'm not sure -- but I think that would decide the distance metric to use.
I think one way to present the results could be similar to the one used by Harrel et al., where they take similarity matrices from PPA, LOC and RSC and compute the correlation score between them. Using different correlation scores, we can compare which 'feature' is nearest to the fMRI score.
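A correlation score between similarity matrices can be sketched by vectorizing the off-diagonal entries of each matrix and correlating the resulting vectors. This is only an illustration of the idea, not the procedure from the cited paper; T, T2 and T3 from earlier in the thread stand in for the region or feature matrices, and the helper name upper is made up here:

```matlab
T  = [1, 0.7, 0.4; 0.7, 1, 0.6; 0.4, 0.6, 1];
T2 = [1, 0.45, 0.55; 0.45, 1, 0.6; 0.55, 0.6, 1];
T3 = [1, 0.9, 0.2; 0.9, 1, 0.7; 0.2, 0.7, 1];

% Upper-triangle entries as a column vector (the diagonal and the
% mirrored lower triangle carry no extra information).
upper = @(M) M(triu(true(size(M)), 1));

V = [upper(T), upper(T2), upper(T3)];  % one column per matrix
R = corrcoef(V);                       % R(i,j): Pearson correlation of columns i and j

disp(R);
```

Here R(1,3) is close to 1 while R(1,2) is negative, so by this score T3 is the nearer match to T. With only three off-diagonal entries per matrix this is a toy case; real matrices give far more pairs to correlate.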
@David - to show T3 is better than T2 in your example, may I suggest Kendall's rank correlation on the rankings of the similarity matrices?
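A minimal sketch of that suggestion, computing Kendall's tau-a by hand on the upper-triangle entries so that no toolbox is needed. The helper names upper and kendall are made up for this example, and ties are simply counted as contributing zero:

```matlab
T  = [1, 0.7, 0.4; 0.7, 1, 0.6; 0.4, 0.6, 1];
T2 = [1, 0.45, 0.55; 0.45, 1, 0.6; 0.55, 0.6, 1];
T3 = [1, 0.9, 0.2; 0.9, 1, 0.7; 0.2, 0.7, 1];

% Upper-triangle entries as a column vector.
upper = @(M) M(triu(true(size(M)), 1));

% Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.
% sign(x_i - x_j) .* sign(y_i - y_j) is +1 for a concordant pair,
% -1 for a discordant pair and 0 for a tie; triu(..., 1) counts each pair once.
kendall = @(x, y) sum(sum(triu( ...
    sign(bsxfun(@minus, x, x')) .* sign(bsxfun(@minus, y, y')), 1))) ...
    / (numel(x) * (numel(x) - 1) / 2);

tau2 = kendall(upper(T), upper(T2));   % -1/3: ordering mostly reversed
tau3 = kendall(upper(T), upper(T3));   %  1:   ordering fully preserved

fprintf('tau(T, T2) = %.4f, tau(T, T3) = %.4f\n', tau2, tau3);
```

If the Statistics and Machine Learning Toolbox is available, corr(upper(T), upper(T3), 'Type', 'Kendall') should be usable in place of the hand-written helper.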