class sklearn.decomposition.LatentDirichletAllocation(n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method='batch', learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=None, verbose=0, random_state=None, n_topics=None)
Latent Dirichlet Allocation with the online variational Bayes algorithm.
New in version 0.17.
Read more in the User Guide.
>>> from sklearn.decomposition import LatentDirichletAllocation
>>> from sklearn.datasets import make_multilabel_classification
>>> # This produces a feature matrix of token counts, similar to what
>>> # CountVectorizer would produce on text.
>>> X, _ = make_multilabel_classification(random_state=0)
>>> lda = LatentDirichletAllocation(n_components=5,
...     random_state=0)
>>> lda.fit(X)
LatentDirichletAllocation(...)
>>> # get topics for some given samples:
>>> lda.transform(X[-2:])
array([[0.00360392, 0.25499205, 0.0036211 , 0.64236448, 0.09541846],
       [0.15297572, 0.00362644, 0.44412786, 0.39568399, 0.003586  ]])
| Method | Description |
|---|---|
| fit(X[, y]) | Learn model for the data X with variational Bayes method. |
| fit_transform(X[, y]) | Fit to data, then transform it. |
| get_params([deep]) | Get parameters for this estimator. |
| partial_fit(X[, y]) | Online VB with Mini-Batch update. |
| perplexity(X[, doc_topic_distr, sub_sampling]) | Calculate approximate perplexity for data X. |
| score(X[, y]) | Calculate approximate log-likelihood as score. |
| set_params(**params) | Set the parameters of this estimator. |
| transform(X) | Transform data X according to the fitted model. |
__init__(n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method='batch', learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=None, verbose=0, random_state=None, n_topics=None)
fit(X, y=None)
Learn model for the data X with variational Bayes method.
When learning_method is 'online', the model is updated with mini-batches. Otherwise, it is updated with the full batch.
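As a rough illustration of the two update modes that fit selects between, the sketch below trains one model per learning_method on the same data. The synthetic count matrix and the batch_size=32 value are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic count matrix standing in for CountVectorizer output (assumption).
X, _ = make_multilabel_classification(random_state=0)

# Batch variational Bayes: each update uses the full data set.
lda_batch = LatentDirichletAllocation(
    n_components=5, learning_method="batch", random_state=0
).fit(X)

# Online variational Bayes: updates are driven by mini-batches of rows.
# batch_size=32 is an arbitrary illustrative value.
lda_online = LatentDirichletAllocation(
    n_components=5, learning_method="online", batch_size=32, random_state=0
).fit(X)

# Both fitted models expose a (n_components, n_features) topic-word matrix.
print(lda_batch.components_.shape, lda_online.components_.shape)
```

The two models generally end up with different topic-word values, so they should be compared by score or perplexity rather than entry by entry.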
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
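A minimal sketch of what "fit, then transform" means in practice: with a fixed random_state, the one-step call is expected to agree with calling fit and transform separately. The data set is a synthetic stand-in, not part of the original documentation.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic count matrix standing in for real token counts (assumption).
X, _ = make_multilabel_classification(random_state=0)

# One step: fit the model and return the document-topic matrix.
doc_topic = LatentDirichletAllocation(
    n_components=5, random_state=0
).fit_transform(X)

# Two steps on a fresh estimator with the same seed.
two_step = LatentDirichletAllocation(
    n_components=5, random_state=0
).fit(X).transform(X)

# One row per document, one column per topic.
print(doc_topic.shape)
# With identical seeds the two routes should give matching results.
print(np.allclose(doc_topic, two_step))
```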
get_params(deep=True)
Get parameters for this estimator.
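For illustration, get_params returns the constructor arguments as a dictionary, which can then be used to build an unfitted copy with the same configuration. The variable names below are hypothetical.

```python
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(
    n_components=5, learning_method="online", random_state=0
)

# Dictionary mapping constructor argument names to their current values.
params = lda.get_params()
print(params["n_components"], params["learning_method"])

# The dictionary can seed a new, unfitted estimator with the same settings.
lda_copy = LatentDirichletAllocation(**params)
```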
partial_fit(X, y=None)
Online VB with Mini-Batch update.
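One way partial_fit might be used is to stream a data set through the model in chunks. The sketch below assumes a chunk size of 25 rows and sets total_samples to the number of rows in the synthetic data; both choices are illustrative.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic count matrix standing in for a stream of documents (assumption).
X, _ = make_multilabel_classification(random_state=0)

# total_samples tells the online update how many documents to expect overall.
lda = LatentDirichletAllocation(
    n_components=5, total_samples=X.shape[0], random_state=0
)

# Feed the data in chunks of 25 rows (arbitrary illustrative size).
chunk_size = 25
for start in range(0, X.shape[0], chunk_size):
    lda.partial_fit(X[start:start + chunk_size])

# After the first call the model can already transform new documents.
print(lda.components_.shape)
print(lda.transform(X[:2]).shape)
```

Every chunk must have the same number of feature columns as the first one.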
perplexity(X, doc_topic_distr='deprecated', sub_sampling=False)
Calculate approximate perplexity for data X.
Perplexity is defined as exp(-1. * log-likelihood per word).
Changed in version 0.19: The doc_topic_distr argument is deprecated and ignored, because the user no longer has access to the unnormalized distribution.
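The definition of perplexity can be checked by hand. The sketch below assumes that score returns the total approximate log-likelihood of X and that "per word" means dividing by the total token count X.sum(); under those assumptions the manual value should match perplexity(X).

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic count matrix standing in for real token counts (assumption).
X, _ = make_multilabel_classification(random_state=0)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)

# Approximate log-likelihood of the whole matrix.
log_likelihood = lda.score(X)

# Total number of tokens, used as the "per word" normaliser (assumption).
n_words = X.sum()

# exp(-1 * log-likelihood per word), computed manually.
manual_perplexity = np.exp(-log_likelihood / n_words)

print(manual_perplexity, lda.perplexity(X))
```

Lower perplexity indicates a better fit to the data being evaluated.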
score(X, y=None)
Calculate approximate log-likelihood as score.
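Because score is a log-likelihood, higher values are better, so it can serve as a rough criterion for choosing the number of topics. The sketch below compares a few hypothetical candidate values on a held-out slice; the 80/20 split and the candidate list are assumptions.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic count matrix standing in for real token counts (assumption).
X, _ = make_multilabel_classification(random_state=0)

# Simple hold-out split: first 80 rows for fitting, the rest for scoring.
X_train, X_test = X[:80], X[80:]

scores = {}
for k in (2, 5, 10):  # candidate topic counts, chosen arbitrarily
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(X_train)
    # Approximate log-likelihood of the held-out rows.
    scores[k] = float(lda.score(X_test))

# Pick the candidate with the highest held-out log-likelihood.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```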
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
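The <component>__<parameter> form can be illustrated with a small pipeline. The step names "vect" and "lda" below are hypothetical labels chosen for this sketch.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Simple estimator: parameters are addressed by their plain names.
lda = LatentDirichletAllocation(random_state=0)
lda.set_params(n_components=7)

# Nested object: "vect" and "lda" are hypothetical step names.
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("lda", LatentDirichletAllocation(random_state=0)),
])

# <component>__<parameter> reaches into the named step.
pipe.set_params(lda__n_components=3, vect__stop_words="english")

print(lda.n_components, pipe.named_steps["lda"].n_components)
```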
transform(X)
Transform data X according to the fitted model.
Changed in version 0.18: doc_topic_distr is now normalized.
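Since the returned document-topic distribution is normalized, each row should sum to 1, and the largest entry in a row can be read as that document's dominant topic. The sketch below shows one possible way to extract it; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic count matrix standing in for real token counts (assumption).
X, _ = make_multilabel_classification(random_state=0)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)

# Topic distribution for the last two samples.
doc_topic = lda.transform(X[-2:])

# Each row is a normalized distribution over the 5 topics.
row_sums = doc_topic.sum(axis=1)

# Index of the highest-weight topic for each document.
dominant_topic = doc_topic.argmax(axis=1)

print(row_sums, dominant_topic)
```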
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html