class sklearn.neighbors.LocalOutlierFactor(n_neighbors=20, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, contamination='legacy', novelty=False, n_jobs=None)
Unsupervised Outlier Detection using Local Outlier Factor (LOF)
The anomaly score of each sample is called Local Outlier Factor. It measures the local deviation of density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers.
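The density comparison described above can be sketched by hand and checked against the estimator. This is a minimal illustration, not the library's internal code: the data, the choice `k = 5`, and the variable names are assumptions made for the example, and `contamination=0.1` is set explicitly only to avoid relying on a version-specific default.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 2))  # illustrative data, not from the documentation
k = 5                         # neighborhood size chosen for the example

# Distances to and indices of each sample's k nearest neighbors.
# Calling kneighbors() without X leaves each sample out of its own neighborhood.
dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()

k_dist = dist[:, -1]                        # distance to the k-th neighbor
reach_dist = np.maximum(dist, k_dist[idx])  # reachability distance to each neighbor
lrd = 1.0 / reach_dist.mean(axis=1)         # local reachability density

# LOF: average ratio of the neighbors' density to the sample's own density.
lof_manual = (lrd[idx] / lrd[:, np.newaxis]).mean(axis=1)

# The estimator stores the opposite of the LOF for the training samples.
clf = LocalOutlierFactor(n_neighbors=k, contamination=0.1).fit(X)
lof_sklearn = -clf.negative_outlier_factor_

print(np.allclose(lof_manual, lof_sklearn))
```

Values close to 1 mean a sample is about as dense as its neighbors; values well above 1 indicate a sample in a sparser region than its neighbors.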
| Parameters: | 
 | 
|---|---|
| Attributes: | 
 | 
| [1] | Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM SIGMOD Record. | 
| fit(X[, y]) | Fit the model using X as training data. | 
| get_params([deep]) | Get parameters for this estimator. | 
| kneighbors([X, n_neighbors, return_distance]) | Finds the K-neighbors of a point. | 
| kneighbors_graph([X, n_neighbors, mode]) | Computes the (weighted) graph of k-Neighbors for points in X | 
| set_params(**params) | Set the parameters of this estimator. | 
__init__(n_neighbors=20, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, contamination='legacy', novelty=False, n_jobs=None)
decision_function
Shifted opposite of the Local Outlier Factor of X.
Bigger is better, i.e. large values correspond to inliers.
The shift offset allows a zero threshold for being an outlier. Only available for novelty detection (when novelty is set to True). The argument X is supposed to contain new data: if X contains a point from the training set, that point is included in its own neighborhood. Also, the samples in X are not considered in the neighborhood of any point.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
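As a rough illustration of the zero threshold, the sketch below fits a novelty detector and scores two new points. The training data, the query points, and `contamination=0.05` are assumptions chosen for the example; the relation to `score_samples` and the fitted `offset_` attribute is shown as a check rather than as a description of the internals.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X_train = rng.normal(size=(200, 2))         # illustrative training data
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])  # one central point, one distant point

clf = LocalOutlierFactor(n_neighbors=20, novelty=True, contamination=0.05)
clf.fit(X_train)

scores = clf.decision_function(X_new)

# The decision function is the raw score shifted by the fitted offset,
# so that 0 separates inliers (positive) from outliers (negative).
shifted = clf.score_samples(X_new) - clf.offset_

print(scores)
print(np.allclose(scores, shifted))
```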
fit(X, y=None) [source]
Fit the model using X as training data.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
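A small sketch of what fitting leaves behind, assuming the fitted attributes `n_neighbors_` and `negative_outlier_factor_`. The ten-sample data set is hypothetical and deliberately smaller than `n_neighbors`, to show that the number of neighbors actually used is capped by the number of training samples.

```python
import warnings

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X_small = rng.normal(size=(10, 2))  # fewer samples than n_neighbors=20

clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
with warnings.catch_warnings():
    # scikit-learn may warn that n_neighbors exceeds the number of samples.
    warnings.simplefilter("ignore")
    clf.fit(X_small)

print(clf.n_neighbors_)                    # neighbors actually used
print(clf.negative_outlier_factor_.shape)  # one score per training sample
```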
fit_predict
Fits the model to the training set X and returns the labels.
Label is 1 for an inlier and -1 for an outlier according to the LOF score and the contamination parameter.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
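A minimal outlier-detection sketch using fit_predict. The tight cluster, the five distant points, and `contamination=0.1` are assumptions made for illustration; in practice the contamination level depends on the data.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
inliers = 0.3 * rng.normal(size=(100, 2))  # tight cluster around the origin
outliers = np.array([[4.0, 4.0], [-4.0, 4.0], [4.0, -4.0], [-4.0, -4.0], [0.0, 5.0]])
X = np.vstack([inliers, outliers])         # the last five rows are the distant points

clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = clf.fit_predict(X)                # 1 for inliers, -1 for outliers

print(labels[-5:])                         # labels of the five distant points
print(int((labels == -1).sum()))           # number of samples flagged as outliers
```

Because contamination sets the threshold on the scores, some points at the edge of the cluster are flagged along with the distant ones.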
get_params(deep=True) [source]
Get parameters for this estimator.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
kneighbors(X=None, n_neighbors=None, return_distance=True) [source]
Finds the K-neighbors of a point. Returns indices of and distances to the neighbors of each point.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
In the following example, we construct a NearestNeighbors instance from an array representing our data set and ask which point is closest to [1, 1, 1]:
>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples)
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]]))
(array([[0.5]]), array([[2]]))
As you can see, it returns [[0.5]] and [[2]], which means that the nearest element is at distance 0.5 and is the third element of samples (indexes start at 0). You can also query for multiple points:
>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False) 
array([[1],
       [2]]...)
kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')
Computes the (weighted) graph of k-Neighbors for points in X.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
>>> X = [[0], [3], [1]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=2)
>>> neigh.fit(X) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> A = neigh.kneighbors_graph(X)
>>> A.toarray()
array([[1., 0., 1.],
       [0., 1., 1.],
       [1., 0., 1.]])
predict
Predict the labels (1 inlier, -1 outlier) of X according to LOF.
This method generalizes prediction to new observations (not in the training set). Only available for novelty detection (when novelty is set to True).
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
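A short novelty-detection sketch, assuming clean training data and `contamination="auto"`. The query points are hypothetical. The last lines check that predict is not exposed on an estimator created with novelty=False.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X_train = rng.normal(size=(200, 2))         # assumed to contain only inliers
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])  # unseen observations

clf = LocalOutlierFactor(n_neighbors=20, novelty=True, contamination="auto")
clf.fit(X_train)
new_labels = clf.predict(X_new)             # 1 for inliers, -1 for outliers
print(new_labels)

# Without novelty=True the method is not available on the estimator.
has_predict = hasattr(LocalOutlierFactor(novelty=False), "predict")
print(has_predict)
```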
score_samples
Opposite of the Local Outlier Factor of X.
The opposite is used because bigger is better, i.e. large values correspond to inliers.
Only available for novelty detection (when novelty is set to True). The argument X is supposed to contain new data: if X contains a point from the training set, that point is included in its own neighborhood. Also, the samples in X are not considered in the neighborhood of any point. The scores of the training data are available through the negative_outlier_factor_ attribute.
| Parameters: | 
 | 
|---|---|
| Returns: | 
 | 
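The distinction between scoring new data and reading the training scores can be sketched as follows. The data and parameter values are assumptions for illustration. Passing the training set back to score_samples treats each point as new, so each point finds itself among its own neighbors and the result generally differs from negative_outlier_factor_.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(1)
X_train = rng.normal(size=(100, 2))  # illustrative training data

clf = LocalOutlierFactor(n_neighbors=10, novelty=True, contamination="auto")
clf.fit(X_train)

train_scores = clf.negative_outlier_factor_  # scores of the training samples
requeried = clf.score_samples(X_train)       # training points treated as new data

print(train_scores[:3])
print(requeried[:3])
print(np.allclose(train_scores, requeried))
```

For the training samples, the attribute is the value to use.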
set_params(**params) [source]
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
| Returns: | 
 | 
|---|
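The `<component>__<parameter>` form can be illustrated with a two-step pipeline. The step names "scale" and "lof" and the parameter values are hypothetical, chosen only for this sketch.

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step names are arbitrary labels; they become the prefix of nested parameters.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("lof", LocalOutlierFactor(novelty=True)),
])

# Update parameters of the nested LocalOutlierFactor through the pipeline.
pipe.set_params(lof__n_neighbors=10, lof__contamination=0.05)

lof_params = pipe.named_steps["lof"].get_params()
print(lof_params["n_neighbors"], lof_params["contamination"])
```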
sklearn.neighbors.LocalOutlierFactor
    © 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
    http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html