class sklearn.neighbors.NearestCentroid(metric='euclidean', shrink_threshold=None) [source]
Nearest centroid classifier.
Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.
Read more in the User Guide.
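The rule described above can be sketched in plain Python. This is only an illustrative sketch of the idea, not scikit-learn's implementation; the helper names `fit_centroids` and `predict_one` are hypothetical, and Euclidean distance is assumed.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(X, y):
    """Return a dict mapping each class label to the mean of its samples."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_one(centroids, x):
    """Return the label whose centroid is closest to the sample x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
y = [1, 1, 1, 2, 2, 2]

centroids = fit_centroids(X, y)
prediction = predict_one(centroids, [-0.8, -1])
print(prediction)
```

Each class is reduced to a single point, so prediction cost grows with the number of classes rather than with the number of training samples.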
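| Parameters: | metric : string or callable, default 'euclidean'. The metric used to compute the distance between a sample and each class centroid. shrink_threshold : float or None, default None. Threshold for shrinking centroids to remove features. |
|---|---|
| Attributes: | centroids_ : array, shape (n_classes, n_features). Centroid of each class. |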
See also
sklearn.neighbors.KNeighborsClassifier
When used for text classification with tf-idf vectors, this classifier is also known as the Rocchio classifier.
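As a rough illustration of the tf-idf use case mentioned above, the sketch below chains `TfidfVectorizer` and `NearestCentroid` with `make_pipeline`. The four-document corpus and its labels are invented for the example and are far too small to be meaningful.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "cat dog pet",
    "dog pet animal",
    "stock market price",
    "market price trade",
]
labels = ["pets", "pets", "finance", "finance"]

# tf-idf vectors feed directly into the nearest centroid classifier.
model = make_pipeline(TfidfVectorizer(), NearestCentroid())
model.fit(docs, labels)

pet_prediction = model.predict(["dog animal pet"])[0]
finance_prediction = model.predict(["stock price trade"])[0]
print(pet_prediction, finance_prediction)
```

Each query shares vocabulary with only one class, so its tf-idf vector lies closer to that class's centroid.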
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.
>>> from sklearn.neighbors import NearestCentroid
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = NearestCentroid()
>>> clf.fit(X, y)
NearestCentroid(metric='euclidean', shrink_threshold=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]
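The shrink_threshold parameter from the signature corresponds to the shrunken centroids idea in the Tibshirani et al. reference. The sketch below fits the same toy data with and without shrinkage so the two sets of centroids can be compared. The threshold value 0.5 is an arbitrary choice for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Baseline: plain per-class means.
plain = NearestCentroid().fit(X, y)

# With shrinkage: class centroids are pulled toward the overall centroid.
shrunk = NearestCentroid(shrink_threshold=0.5).fit(X, y)

print(plain.centroids_)
print(shrunk.centroids_)

prediction = shrunk.predict([[-0.8, -1]])
print(prediction)
```

In this data set the overall centroid is the origin, so shrinkage shows up as centroid coordinates with smaller absolute values.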
| Method | Description |
|---|---|
| fit(X, y) | Fit the NearestCentroid model according to the given training data. |
| get_params([deep]) | Get parameters for this estimator. |
| predict(X) | Perform classification on an array of test vectors X. |
| score(X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels. |
| set_params(**params) | Set the parameters of this estimator. |
__init__(metric='euclidean', shrink_threshold=None) [source]
fit(X, y) [source]
Fit the NearestCentroid model according to the given training data.
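| Parameters: | X : array-like, shape (n_samples, n_features). Training vectors. y : array, shape (n_samples,). Target values. |
|---|---|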
get_params(deep=True) [source]
Get parameters for this estimator.
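| Parameters: | deep : boolean, default True. If True, also return the parameters of contained sub-objects that are estimators. |
|---|---|
| Returns: | params : mapping of parameter names to their values. |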
predict(X) [source]
Perform classification on an array of test vectors X.
The predicted class C for each sample in X is returned.
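| Parameters: | X : array-like, shape (n_samples, n_features). Test vectors. |
|---|---|
| Returns: | C : array, shape (n_samples,). Predicted class for each sample. |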
If the metric constructor parameter is "precomputed", X is assumed to be the distance matrix between the data to be predicted and self.centroids_.
score(X, y, sample_weight=None) [source]
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.
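| Parameters: | X : array-like, shape (n_samples, n_features). Test samples. y : array-like, shape (n_samples,). True labels for X. sample_weight : array-like, shape (n_samples,), default None. Sample weights. |
|---|---|
| Returns: | score : float. Mean accuracy of self.predict(X) with respect to y. |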
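To make "mean accuracy" concrete, the sketch below checks that score matches the fraction of correct predictions, and that sample_weight turns this into a weighted average. The test points, labels, and weights are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid().fit(X, y)

# Hypothetical test set; the third point is deliberately mislabeled.
X_test = np.array([[-0.8, -1], [2.5, 1.5], [-0.1, -0.1], [0.5, 0.5]])
y_test = np.array([1, 2, 2, 2])
weights = np.array([1.0, 1.0, 2.0, 1.0])

correct = clf.predict(X_test) == y_test

accuracy = clf.score(X_test, y_test)
manual_accuracy = correct.mean()

weighted_accuracy = clf.score(X_test, y_test, sample_weight=weights)
manual_weighted = np.average(correct, weights=weights)

print(accuracy, weighted_accuracy)
```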
set_params(**params) [source]
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
| Returns: | self |
|---|---|
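The `<component>__<parameter>` form can be illustrated with a small pipeline. In this sketch the step names "scale" and "clf" are arbitrary labels chosen for the example, and the threshold values are illustrative only.

```python
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by their plain names.
clf = NearestCentroid()
clf.set_params(shrink_threshold=0.2)
simple_value = clf.get_params()["shrink_threshold"]

# Nested object: <step name>__<parameter> reaches into the pipeline step.
pipe = Pipeline([("scale", StandardScaler()), ("clf", NearestCentroid())])
returned = pipe.set_params(clf__shrink_threshold=0.5)
nested_value = pipe.get_params()["clf__shrink_threshold"]

print(simple_value, nested_value)
```

Because set_params returns the estimator itself, calls can be chained, for example `pipe.set_params(...).fit(X, y)`.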
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestCentroid.html