Nca temp #1
Conversation
bellet left a comment:
Here are some comments!
sklearn/neighbors/nca.py
Outdated
Neighborhood Component Analysis (NCA) is a machine learning algorithm for
metric learning. It learns a linear transformation in a supervised fashion
to improve the classification accuracy of a stochastic nearest neighbors
rule in the new space.
transformed space
Done
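For readers skimming this thread, here is a minimal usage sketch of the estimator under review; the class name, import path and the n_features_out parameter are taken from this PR's diff and may not match the final merged API.

```python
# Minimal usage sketch; names follow this PR's diff, not necessarily the
# merged API. NCA is fit as a supervised transformer, then a k-NN
# classifier is run in the transformed space.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors.nca import NeighborhoodComponentsAnalysis  # path as in this PR

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

nca = NeighborhoodComponentsAnalysis(n_features_out=2)
nca.fit(X_train, y_train)           # learns the linear transformation

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(nca.transform(X_train), y_train)
print(knn.score(nca.transform(X_test), y_test))
```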
sklearn/neighbors/nca.py
Outdated
As NCA is optimizing a non-convex objective function, it will
likely end up in a local optimum. Several runs with independent random
init might be necessary to get a good convergence.
Is this a standard warning used in other sklearn classes? LMNN does not have it (and is also non-convex when solving for the linear transformation).
You are right. I had just seen it as a note for k-means and thought it could be useful, but a warning is surely too much, and even a note may not be necessary, for consistency with LMNN.
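Related to the non-convexity point: if restarts do turn out to be useful, a pattern like the following would do it (continuing the sketch above; init='random' and random_state are assumed parameter names, not confirmed by this PR).

```python
# Hedged sketch: several fits with independent random initializations,
# keeping the transformation that gives the best k-NN score.
# init='random' and random_state are assumptions, not confirmed API.
best_score, best_nca = -float('inf'), None
for seed in range(5):
    candidate = NeighborhoodComponentsAnalysis(init='random', random_state=seed)
    candidate.fit(X_train, y_train)
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(candidate.transform(X_train), y_train)
    score = knn.score(candidate.transform(X_test), y_test)  # use a validation split in practice
    if score > best_score:
        best_score, best_nca = score, candidate
```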
| "Neighbourhood Components Analysis". Advances in Neural Information | ||
| Processing Systems. 17, 513-520, 2005. | ||
| http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf | ||
| """ |
You can also add a reference to the Wikipedia page:
https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
Done
sklearn/neighbors/nca.py
Outdated
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""

def __init__(self, n_features_out=None, init='identity', max_iter=50,
The default init for LMNN is 'pca'; it is probably better to have the same behavior for consistency.
Done
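The implied change is just the default value, sketched below; the remaining parameters are truncated in the quoted diff and intentionally left out here.

```python
# Sketch of the suggested default change only, matching LMNN's init='pca';
# the rest of the signature is truncated in the diff above and omitted.
def __init__(self, n_features_out=None, init='pca', max_iter=50):
    ...
```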
sklearn/neighbors/nca.py
Outdated
from ..externals.six import integer_types


class NeighborhoodComponentAnalysis(BaseEstimator, TransformerMixin):
I think the original name in the paper (also used on the Wikipedia page) has an 's' at the end of 'Component'. The class name and other occurrences should probably be changed.
Done
sklearn/neighbors/nca.py
Outdated
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
ci = masks[:, y[i]]
p_i_j = soft[ci]
not_ci = np.logical_not(ci)
Can't we just use ~ci and avoid defining this variable?
Yes, you are right; I changed it.
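For anyone unfamiliar with the idiom: ~ on a boolean NumPy array is elementwise negation, so it can replace np.logical_not directly, as this toy check shows.

```python
import numpy as np

ci = np.array([True, False, True, False])   # same-class mask for sample i
soft = np.array([0.1, 0.4, 0.3, 0.2])       # softmax weights over neighbors

assert np.array_equal(~ci, np.logical_not(ci))  # ~ negates a boolean mask
p_same = soft[ci].sum()    # probability mass on same-class neighbors
p_other = soft[~ci].sum()  # probability mass on the other classes
```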
not_ci = np.logical_not(ci)
diff_ci = diffs[i, ci, :]
diff_not_ci = diffs[i, not_ci, :]
sum_ci = diff_ci.T.dot(
Again: sum_ci and sum_not_ci are distance-based quantities, so the same naming remark applies.
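For context, a self-contained sketch of what these quantities are: weighted sums of outer products of differences, the building blocks of the per-sample gradient term in Goldberger et al. (2005). Variable names here are illustrative and need not match the PR's.

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 6, 3
diffs_i = rng.randn(n, d)                 # x_i - x_j for all j
soft = rng.dirichlet(np.ones(n))          # softmax weights p_ij, summing to 1
ci = np.array([True, False, True, False, True, False])  # same-class mask

# Weighted (d x d) sums of outer products, per Eq. (5) of the NCA paper
sum_all = (soft[:, None] * diffs_i).T.dot(diffs_i)            # over all j
sum_ci = (soft[ci, None] * diffs_i[ci]).T.dot(diffs_i[ci])    # same-class j only
p_i = soft[ci].sum()                      # P(sample i classified correctly)
grad_i = p_i * sum_all - sum_ci           # sample i's gradient contribution
```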
sklearn/neighbors/nca.py
Outdated
# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
    diff_embedded = X_embedded[i] - X_embedded
    sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
This variable contains distances (in the embedded space); maybe it's good to have this in the variable name for clarity.
Done: I called them dist_embedded
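As a side note on that einsum: 'ij,ij->i' multiplies elementwise and sums over the second axis, giving the row-wise squared norms in one pass.

```python
import numpy as np

rng = np.random.RandomState(0)
diff_embedded = rng.randn(5, 3)

# row-wise squared Euclidean norms, equivalent to (diff ** 2).sum(axis=1)
dist_embedded = np.einsum('ij,ij->i', diff_embedded, diff_embedded)
assert np.allclose(dist_embedded, (diff_embedded ** 2).sum(axis=1))
```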
sklearn/neighbors/nca.py
Outdated
sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
                           diff_embedded)
sum_of_squares[i] = np.inf
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
Again, maybe find a name that makes it clear these are exponentiated distances.
Done: I called them exp_dist_embedded
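For the record, the expression is a numerically stable softmax over negative squared distances; the np.inf entry set just above cleanly zeroes out the sample's own weight.

```python
import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older SciPy

dist_embedded = np.array([0.5, 2.0, np.inf, 1.0])  # inf = the sample itself

# softmax over -dist, computed in log space to avoid under/overflow
exp_dist_embedded = np.exp(-dist_embedded - logsumexp(-dist_embedded))
assert np.isclose(exp_dist_embedded.sum(), 1.0)
assert exp_dist_embedded[2] == 0.0  # a point never picks itself as neighbor
```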
X_embedded = transformation.dot(X.T).T

# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
Maybe a few more comments describing the steps below would be useful, since this is the heart of the algorithm.
I added some, and I will continue to do so, maybe with big-O notation, etc.
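In that spirit, here is a hedged, commented sketch of the whole loop. Variable names follow this thread (dist_embedded, exp_dist_embedded); the gradient expression follows Goldberger et al. (2005) and need not match the PR line for line.

```python
import numpy as np
from scipy.special import logsumexp

def nca_loss_and_gradient(transformation, X, y, masks):
    """Hedged sketch, not the PR's code. Returns the NCA objective (to be
    maximized) and its gradient w.r.t. the linear transformation.

    masks[j, c] is True iff y[j] == c.
    """
    X_embedded = transformation.dot(X.T).T           # (n_samples, n_features_out)
    diffs = X[:, None, :] - X[None, :, :]            # pairwise x_i - x_j, input space
    loss = 0.0
    gradient = np.zeros((X.shape[1], X.shape[1]))    # accumulated in input space
    for i in range(X.shape[0]):
        # squared distances from sample i to every sample, in embedded space
        diff_embedded = X_embedded[i] - X_embedded
        dist_embedded = np.einsum('ij,ij->i', diff_embedded, diff_embedded)
        dist_embedded[i] = np.inf                    # i is never its own neighbor
        # stochastic neighbor weights: stable softmax over negative distances
        exp_dist_embedded = np.exp(-dist_embedded - logsumexp(-dist_embedded))
        ci = masks[:, y[i]]                          # samples sharing i's class
        p_i = exp_dist_embedded[ci].sum()            # P(i classified correctly)
        loss += p_i
        # gradient contribution, Goldberger et al. (2005), Eq. (5)
        sum_all = (exp_dist_embedded[:, None] * diffs[i]).T.dot(diffs[i])
        sum_ci = (exp_dist_embedded[ci, None] * diffs[i, ci]).T.dot(diffs[i, ci])
        gradient += p_i * sum_all - sum_ci
    return loss, 2 * transformation.dot(gradient)

# Toy shape check on random data (an optimizer would minimize -loss, -grad)
rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = rng.randint(0, 3, 20)
masks = y[:, None] == np.arange(3)[None, :]          # masks[j, c] == (y[j] == c)
loss, grad = nca_loss_and_gradient(rng.randn(2, 5), X, y, masks)
```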
sklearn/neighbors/tests/test_nca.py
Outdated
y = random_state.randint(0, n_labels, (n_samples))
point = random_state.randn(num_dims, n_features)
X = random_state.randn(n_samples, n_features)
nca = NeighborhoodComponentsAnalysis(None, init=point)
What is this first None parameter for?
Ah yes, indeed, it shouldn't be there... I think it appeared during a refactoring.
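Presumably the fix is just to drop the stray positional argument (which was being read as n_features_out):

```python
# Sketch of the corrected call: the stray positional None is removed and
# init is the only argument passed.
nca = NeighborhoodComponentsAnalysis(init=point)
```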
Commits
* Add averaging option to AMI and NMI (leave current behavior unchanged)
* Flake8 fixes
* Incorporate tests of means for AMI and NMI
* Add note about `average_method` in NMI
* Update docs from AMI, NMI changes (#1)
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes: V-measure, Homogeneity, Completeness, NMI, AMI
* Update documentation and remove nose tests (#2)
* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst
* Fix multiple spaces after operator
* Rename all arguments
* No more arbitrary values!
* Improve handling of floating-point imprecision
* Clearly state when the change occurs
* Update AMI/NMI docs
* Update v0.20.rst
* Catch FutureWarnings in AMI and NMI