LogisticRegressionModel(weights, intercept, …)
Classification model trained using Multinomial/Binary Logistic Regression.
LogisticRegressionWithSGD
Train a classification model for Binary Logistic Regression using Stochastic Gradient Descent.
LogisticRegressionWithLBFGS
Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS.
SVMModel(weights, intercept)
Model for Support Vector Machines (SVMs).
SVMWithSGD
Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
NaiveBayesModel(labels, pi, theta)
Model for Naive Bayes classifiers.
NaiveBayes
Train a Multinomial Naive Bayes model.
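A rough sketch of multinomial Naive Bayes training with additive smoothing, written in plain Python rather than against the PySpark API. It produces per-label log priors and per-feature log likelihoods, which we assume correspond to the pi and theta fields of NaiveBayesModel; the smoothing parameter `lam` and both function names are hypothetical.

```python
import math
from collections import defaultdict


def train_multinomial_nb(data, lam=1.0):
    """data: list of (label, feature_counts); counts must be non-negative."""
    num_features = len(data[0][1])
    doc_count = defaultdict(int)
    term_sums = defaultdict(lambda: [0.0] * num_features)
    for label, counts in data:
        doc_count[label] += 1
        sums = term_sums[label]
        for j, v in enumerate(counts):
            sums[j] += v
    labels = sorted(doc_count)
    n, k = len(data), len(labels)
    # Smoothed log prior for each label.
    pi = {c: math.log(doc_count[c] + lam) - math.log(n + k * lam) for c in labels}
    # Smoothed log likelihood of each feature given the label.
    theta = {}
    for c in labels:
        denom = math.log(sum(term_sums[c]) + num_features * lam)
        theta[c] = [math.log(s + lam) - denom for s in term_sums[c]]
    return labels, pi, theta


def predict_nb(model, counts):
    labels, pi, theta = model
    # Pick the label with the highest log posterior (up to a shared constant).
    return max(labels, key=lambda c: pi[c] + sum(t * v for t, v in zip(theta[c], counts)))
```

Working in log space keeps the products of many small probabilities from underflowing, and prediction becomes a dot product plus a bias per label.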
StreamingLogisticRegressionWithSGD([…])
Train or predict a logistic regression model on streaming data.
BisectingKMeansModel(java_model)
A clustering model derived from the bisecting k-means method.
BisectingKMeans
A bisecting k-means algorithm based on the paper “A comparison of document clustering techniques” by Steinbach, Karypis, and Kumar, with modification to fit Spark.
KMeansModel(centers)
A clustering model derived from the k-means method.
KMeans
K-means clustering.
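For intuition, the core of k-means is Lloyd's iteration: assign each point to its nearest center, then move each center to the mean of its assigned points. The sketch below is plain Python and takes explicit starting centers (Spark's KMeans uses its own initialization, k-means|| by default); the function names and the stopping rule are illustrative assumptions.

```python
import math


def closest_center(point, centers):
    """Index of the center nearest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def lloyd_kmeans(points, initial_centers, max_iterations=20, epsilon=1e-4):
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Assignment step: group points by their nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[closest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center where it was
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < epsilon:  # stop once no center moves noticeably
            break
    return centers
```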
GaussianMixtureModel(java_model)
A clustering model derived from the Gaussian Mixture Model method.
GaussianMixture
Learning algorithm for Gaussian Mixtures using the expectation-maximization algorithm.
PowerIterationClusteringModel(java_model)
Model produced by PowerIterationClustering.
PowerIterationClustering
Power Iteration Clustering (PIC), a scalable graph clustering algorithm.
StreamingKMeans([k, decayFactor, timeUnit])
Provides methods to set k, decayFactor, and timeUnit to configure the KMeans algorithm for fitting and predicting on incoming DStreams.
StreamingKMeansModel(clusterCenters, …)
Clustering model which can perform an online update of the centroids.
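A minimal sketch of one online centroid update, assuming decay is applied once per batch (timeUnit set to batches): every cluster's accumulated weight is multiplied by the decay factor, and each center that received new points moves to the weighted average of its old position and the mean of those points. This follows our reading of the update rule in the Spark MLlib guide; the function is hypothetical plain Python, not StreamingKMeansModel.update.

```python
import math


def streaming_kmeans_update(centers, weights, batch, decay_factor):
    """Apply one mini-batch to (centers, weights), assuming timeUnit='batches'."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign each new point to its nearest current center.
    for x in batch:
        i = min(range(k), key=lambda c: math.dist(x, centers[c]))
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += x[d]
    new_centers, new_weights = [], []
    for center, weight, s, m in zip(centers, weights, sums, counts):
        discounted = weight * decay_factor  # shrink the influence of past batches
        if m == 0:
            # No new points: the center stays put and only its weight decays.
            new_centers.append(list(center))
            new_weights.append(discounted)
            continue
        total = discounted + m
        batch_mean = [v / m for v in s]
        # Weighted average of the old center and this batch's mean.
        new_centers.append(
            [(c * discounted + b * m) / total for c, b in zip(center, batch_mean)]
        )
        new_weights.append(total)
    return new_centers, new_weights
```

A decay factor of 1.0 weighs all past data equally, while 0.0 makes each center jump to the mean of the latest batch.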
LDA
Train a Latent Dirichlet Allocation (LDA) model.
LDAModel(java_model)
A clustering model derived from the LDA method.
BinaryClassificationMetrics(scoreAndLabels)
Evaluator for binary classification.
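One quantity this evaluator reports is the area under the ROC curve. The plain-Python sketch below computes that quantity by pairwise comparison of positive and negative scores (ties count one half); it is quadratic, meant only for small illustrative inputs, and does not reproduce how BinaryClassificationMetrics builds its curve. Inputs are (score, label) pairs with labels 0.0 or 1.0, and the function name is hypothetical.

```python
def area_under_roc(score_and_labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in score_and_labels if y == 1.0]
    neg = [s for s, y in score_and_labels if y == 0.0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```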
RegressionMetrics(predictionAndObservations)
Evaluator for regression.
MulticlassMetrics(predictionAndLabels)
Evaluator for multiclass classification.
RankingMetrics(predictionAndLabels)
Evaluator for ranking algorithms.
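As an example of a ranking metric, mean precision at k averages, over queries, the fraction of the first k predicted items that appear in the relevant set. This plain-Python sketch assumes each input is a (predicted ranking, relevant items) pair, the shape suggested by predictionAndLabels, and divides by k even when fewer than k items are predicted; treat it as an illustration, not a copy of RankingMetrics.

```python
def mean_precision_at(prediction_and_labels, k):
    """prediction_and_labels: list of (ranked predictions, relevant items) pairs."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, relevant in prediction_and_labels:
        relevant_set = set(relevant)
        hits = sum(1 for item in predicted[:k] if item in relevant_set)
        total += hits / k  # divide by k even if fewer than k items were predicted
    return total / len(prediction_and_labels)
```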
Normalizer([p])
Normalizes samples individually to unit Lp norm.
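Lp normalization divides a vector by (Σ|v_i|^p)^(1/p), or by max|v_i| when p is infinite. This is a plain-Python sketch with an illustrative function name; returning a zero vector unchanged reflects our understanding of Normalizer's behavior and is an assumption.

```python
def normalize(vector, p=2.0):
    """Scale `vector` to unit Lp norm; p may be float('inf')."""
    if p == float("inf"):
        norm = max(abs(v) for v in vector)
    else:
        norm = sum(abs(v) ** p for v in vector) ** (1.0 / p)
    if norm == 0.0:
        return list(vector)  # nothing to scale; return the zero vector as-is
    return [v / norm for v in vector]
```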
StandardScalerModel(java_model)
Represents a StandardScaler model that can transform vectors.
StandardScaler([withMean, withStd])
Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.
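Per column, the transformation is (x - mean) / std, with each step switchable. This sketch uses the standard library's statistics module; the sample (n - 1) standard deviation, the default flags (centering off, scaling on), and mapping constant columns to 0.0 are our assumptions about StandardScaler, and the function name is hypothetical.

```python
import statistics


def fit_standard_scaler(rows, with_mean=False, with_std=True):
    """Return a transform function built from column statistics of `rows`."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Sample standard deviation (n - 1 denominator); needs at least two rows.
    stds = [statistics.stdev(col) for col in columns]

    def transform(row):
        out = []
        for value, mean, std in zip(row, means, stds):
            if with_mean:
                value -= mean
            if with_std:
                # A constant column has std 0; map it to 0.0 instead of dividing.
                value = value / std if std != 0.0 else 0.0
            out.append(value)
        return out

    return transform
```

Leaving centering off by default keeps sparse inputs sparse, since subtracting a mean would turn most zeros into nonzeros.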
HashingTF([numFeatures])
Maps a sequence of terms to their term frequencies using the hashing trick.
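The hashing trick maps each term to an index by hashing it modulo the number of features and then counts occurrences, so no vocabulary has to be stored and distinct terms may collide. In this sketch zlib.crc32 stands in for whatever hash PySpark actually uses (chosen only because it is stable across processes), the 2**20 default size is an assumption, and the result is a plain dict from index to count.

```python
import zlib


def hashing_tf(terms, num_features=1 << 20):
    """Map an iterable of terms to a sparse {index: count} term-frequency dict."""
    counts = {}
    for term in terms:
        # crc32 is a stand-in hash; distinct terms can land on the same index.
        index = zlib.crc32(str(term).encode("utf-8")) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts
```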
IDFModel(java_model)
Represents an IDF model that can transform term frequency vectors.
IDF([minDocFreq])
Inverse document frequency (IDF).
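A sketch of IDF weighting, assuming the smoothed formula idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) is the number of documents containing term t, with terms whose document frequency is below minDocFreq given weight 0. Documents are plain dicts from hashed index to term frequency, as a HashingTF-style step might produce; the helper names are hypothetical.

```python
import math


def fit_idf(docs, min_doc_freq=0):
    """docs: list of {index: term_frequency} dicts. Returns {index: idf}."""
    m = len(docs)
    df = {}
    for doc in docs:
        for index, tf in doc.items():
            if tf > 0:
                df[index] = df.get(index, 0) + 1
    return {
        index: math.log((m + 1) / (d + 1)) if d >= min_doc_freq else 0.0
        for index, d in df.items()
    }


def tf_idf(doc, idf):
    # Simplification: indices never seen while fitting get weight 0 here.
    return {i: tf * idf.get(i, 0.0) for i, tf in doc.items()}
```

A term present in every document gets idf log(1) = 0, so it contributes nothing after weighting.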
Word2Vec()
Word2Vec creates vector representations of words in a text corpus.
Word2VecModel(java_model)
Model fitted by Word2Vec.
ChiSqSelector([numTopFeatures, …])
Creates a chi-squared feature selector.
ChiSqSelectorModel(java_model)
Represents a chi-squared selector model.
ElementwiseProduct(scalingVector)
Scales each element (column) of the vector by the corresponding entry of the supplied weight vector.
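This is a Hadamard (element-wise) product with a fixed scaling vector. A plain-Python sketch with a hypothetical function name:

```python
def elementwise_product(scaling_vector, vector):
    """Multiply each entry of `vector` by the matching entry of `scaling_vector`."""
    if len(scaling_vector) != len(vector):
        raise ValueError("vector sizes must match")
    return [w * v for w, v in zip(scaling_vector, vector)]
```

A zero in the scaling vector effectively drops that feature, and a constant vector reduces to ordinary scalar scaling.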
FPGrowth
A parallel FP-growth algorithm to mine frequent itemsets.
FPGrowthModel(java_model)
An FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
PrefixSpan
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
PrefixSpanModel(java_model)
Model fitted by PrefixSpan.
Vector
DenseVector(ar)
A dense vector represented by a value array.
SparseVector(size, *args)
A simple sparse vector class for passing data to MLlib.
Vectors
Factory methods for working with vectors.
Matrix(numRows, numCols[, isTransposed])
DenseMatrix(numRows, numCols, values[, …])
Column-major dense matrix.
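Column-major storage means the values array lists the first column top to bottom, then the second column, and so on, so entry (i, j) sits at values[i + j * numRows]. The helper names below are hypothetical; they only illustrate the indexing and ignore the isTransposed flag.

```python
def column_major_get(values, num_rows, i, j):
    """Entry (i, j) of a column-major matrix with num_rows rows."""
    return values[i + j * num_rows]


def to_row_lists(values, num_rows, num_cols):
    """Expand column-major values into a list of rows for display."""
    return [
        [column_major_get(values, num_rows, i, j) for j in range(num_cols)]
        for i in range(num_rows)
    ]
```

For example, a 2 x 3 matrix built from the values 1 through 6 has first row 1, 3, 5 and second row 2, 4, 6.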
SparseMatrix(numRows, numCols, colPtrs, …)
Sparse matrix stored in compressed sparse column (CSC) format.
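In CSC format, the range colPtrs[j] to colPtrs[j + 1] delimits the slice of the row-index and value arrays that belongs to column j. The sketch expands such arrays into a dense list of rows; `row_indices` and `values` are our own names for the two remaining CSC arrays, not a reading of the elided part of the SparseMatrix signature, and the function name is hypothetical.

```python
def csc_to_dense(num_rows, num_cols, col_ptrs, row_indices, values):
    """Expand CSC arrays into a dense list of rows (zeros where nothing is stored)."""
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for j in range(num_cols):
        # Entries of column j live at positions col_ptrs[j] .. col_ptrs[j + 1] - 1.
        for k in range(col_ptrs[j], col_ptrs[j + 1]):
            dense[row_indices[k]][j] = values[k]
    return dense
```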
Matrices
QRDecomposition(Q, R)
Represents QR factors.
BlockMatrix(blocks, rowsPerBlock, colsPerBlock)
Represents a distributed matrix in blocks of local matrices.
CoordinateMatrix(entries[, numRows, numCols])
Represents a matrix in coordinate format.
DistributedMatrix
Represents a distributed matrix backed by one or more RDDs.
IndexedRow(index, vector)
Represents a row of an IndexedRowMatrix.
IndexedRowMatrix(rows[, numRows, numCols])
Represents a row-oriented distributed matrix with indexed rows.
MatrixEntry(i, j, value)
Represents an entry of a CoordinateMatrix.
RowMatrix(rows[, numRows, numCols])
Represents a row-oriented distributed matrix with no meaningful row indices.
SingularValueDecomposition(java_model)
Represents singular value decomposition (SVD) factors.
RandomRDDs
Generator methods for creating RDDs composed of i.i.d. samples from a given distribution.
MatrixFactorizationModel(java_model)
A matrix factorization model trained by regularized alternating least squares.
ALS
Alternating Least Squares (ALS) matrix factorization.
Rating
Represents a (user, product, rating) tuple.
LabeledPoint(label, features)
Class that represents the features and labels of a data point.
LinearModel(weights, intercept)
A linear model that has a vector of coefficients and an intercept.
LinearRegressionModel(weights, intercept)
A linear regression model derived from a least-squares fit.
LinearRegressionWithSGD
Train a linear regression model with no regularization using Stochastic Gradient Descent.
RidgeRegressionModel(weights, intercept)
A linear regression model derived from a least-squares fit with an L2 penalty term.
RidgeRegressionWithSGD
Train a regression model with L2-regularization using Stochastic Gradient Descent.
LassoModel(weights, intercept)
A linear regression model derived from a least-squares fit with an L1 penalty term.
LassoWithSGD
Train a regression model with L1-regularization using Stochastic Gradient Descent.
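The three SGD-trained regressors above differ only in the regularizer added to an averaged squared loss. Assuming MLlib's usual convention (squared loss with a factor of 1/2, regularization parameter λ, intercept omitted), their objectives can be written as:

$$
f(\mathbf{w}) = \lambda\, R(\mathbf{w}) + \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^{\top}\mathbf{x}_i - y_i\right)^2,
\qquad
R(\mathbf{w}) =
\begin{cases}
0 & \text{LinearRegressionWithSGD} \\
\tfrac{1}{2}\lVert \mathbf{w} \rVert_2^2 & \text{RidgeRegressionWithSGD} \\
\lVert \mathbf{w} \rVert_1 & \text{LassoWithSGD}
\end{cases}
$$

The L2 term shrinks all weights smoothly, while the L1 term tends to drive some weights exactly to zero.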
IsotonicRegressionModel(boundaries, …)
Regression model for isotonic regression.
IsotonicRegression
Isotonic regression.
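Isotonic regression is commonly solved with the pool adjacent violators algorithm (PAVA): scan the responses in feature order and, whenever a block's mean exceeds the next block's, merge the two into their weighted mean. The sketch fits a non-decreasing sequence to responses already sorted by feature; it is illustrative plain Python with a hypothetical function name and does not handle the decreasing (antitonic) case or prediction between boundaries.

```python
def isotonic_fit(ys, weights=None):
    """Non-decreasing least-squares fit to `ys` (already ordered by feature) via PAVA."""
    if weights is None:
        weights = [1.0] * len(ys)
    blocks = []  # each block: [mean, total_weight, count]
    for y, w in zip(ys, weights):
        blocks.append([y, w, 1])
        # Merge backwards while the newest block violates monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```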
StreamingLinearAlgorithm(model)
Base class that any streaming linear algorithm must inherit from.
StreamingLinearRegressionWithSGD([stepSize, …])
Train or predict a linear regression model on streaming data.
Statistics
MultivariateStatisticalSummary(java_model)
Trait for multivariate statistical summary of a data matrix.
ChiSqTestResult(java_model)
Contains test results for the chi-squared hypothesis test.
MultivariateGaussian
Represents a multivariate Gaussian distribution as a (mu, sigma) tuple of mean vector and covariance matrix.
KernelDensity()
Estimate probability density at required points given an RDD of samples from the population.
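With a Gaussian kernel of bandwidth h, the estimate at x is (1 / (n h sqrt(2π))) Σ exp(-((x - x_i) / h)² / 2). This plain-Python sketch evaluates that formula directly over a list of samples; the function name is hypothetical, and we assume a Gaussian kernel because that is what we understand KernelDensity to use.

```python
import math


def gaussian_kde(samples, bandwidth, points):
    """Gaussian kernel density estimate of `samples`, evaluated at each of `points`."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in points
    ]
```

A smaller bandwidth follows individual samples more closely; a larger one gives a smoother curve.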
KolmogorovSmirnovTestResult(java_model)
Contains test results for the Kolmogorov-Smirnov test.
DecisionTreeModel(java_model)
A decision tree model for classification or regression.
DecisionTree
Learning algorithm for a decision tree model for classification or regression.
RandomForestModel(java_model)
Represents a random forest model.
RandomForest
Learning algorithm for a random forest model for classification or regression.
GradientBoostedTreesModel(java_model)
Represents a gradient-boosted tree model.
GradientBoostedTrees
Learning algorithm for a gradient-boosted trees model for classification or regression.
JavaLoader
Mixin for classes which can load saved models using their Scala implementation.
JavaSaveable
Mixin for models that provide save() through their Scala implementation.
LinearDataGenerator
Utilities for generating linear data.
Loader
Mixin for classes which can load saved models from files.
MLUtils
Helper methods to load, save and pre-process data used in MLlib.
Saveable
Mixin for models and transformers which may be saved as files.