Java library and command-line application for converting Scikit-Learn pipelines to PMML.
- Functionality:
  - Three times more supported Python packages, transformers and estimators than all the competitors combined!
  - Thorough collection, analysis and encoding of feature information:
    - Names.
    - Data and operational types.
    - Valid, invalid and missing value spaces.
    - Descriptive statistics.
  - Pipeline extensions:
    - Pruning.
    - Decision engineering (prediction post-processing).
    - Model verification.
  - Conversion options.
- Extensibility:
  - Rich Java APIs for developing custom converters.
  - Automatic discovery and registration of custom converters based on META-INF/sklearn2pmml.properties resource files.
  - Direct interfacing with other JPMML conversion libraries such as JPMML-H2O, JPMML-LightGBM, JPMML-StatsModels and JPMML-XGBoost.
- Production quality:
  - Complete test coverage.
  - Fully compliant with the JPMML-Evaluator library.
For a full list of supported transformer and estimator classes see the features.md file.
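To make the feature information listed above more concrete: PMML stores each input and target field as a `DataField` element inside `DataDictionary`, carrying its name, operational type and data type. The sketch below reads these attributes with the Python standard library. The PMML fragment is hand-written for illustration (it is neither schema-complete nor actual JPMML-SkLearn output), and `list_data_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; real converter output is much larger.
PMML_SNIPPET = """\
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="species" optype="categorical" dataType="integer">
      <Value value="0"/>
      <Value value="1"/>
      <Value value="2"/>
    </DataField>
    <DataField name="sepal length" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>
"""

def list_data_fields(pmml_text):
    """Return name, optype, dataType and listed values of every DataField."""
    root = ET.fromstring(pmml_text)
    # Reuse whatever namespace the root element declares, so the helper
    # is not tied to a single PMML schema version.
    prefix = root.tag[:root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    fields = []
    for field in root.iter(prefix + "DataField"):
        fields.append({
            "name": field.get("name"),
            "optype": field.get("optype"),
            "dataType": field.get("dataType"),
            "values": [value.get("value") for value in field.findall(prefix + "Value")],
        })
    return fields

for field in list_data_fields(PMML_SNIPPET):
    print(field)
```

The same helper could be pointed at a converted file by passing it the file's text content.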
- Python 2.7, 3.4 or newer.
- Scikit-Learn 0.16.0 or newer. This is not a typo: all Scikit-Learn versions from the past 10 years (2015 or newer) should work equally well.
- Java 11 or newer.
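Part of this list can be checked from Python, as in the rough sketch below. It is an illustration under stated assumptions: `python_ok`, `sklearn_version` and `check_prerequisites` are hypothetical helper names, the script itself needs Python 3 (`shutil.which` is not available in 2.7), and it only tests whether `java` and `mvn` executables are on the PATH without verifying the Java version.

```python
import importlib
import shutil
import sys

def python_ok(version_info=sys.version_info):
    """True for Python 2.7, or 3.4 and newer."""
    major_minor = (version_info[0], version_info[1])
    return major_minor == (2, 7) or major_minor >= (3, 4)

def sklearn_version():
    """Return the installed Scikit-Learn version string, or None if absent."""
    try:
        return importlib.import_module("sklearn").__version__
    except ImportError:
        return None

def check_prerequisites():
    """Collect a coarse report; the Java version itself is NOT checked."""
    return {
        "python": python_ok(),
        "sklearn": sklearn_version(),
        "java_on_path": shutil.which("java") is not None,
        "mvn_on_path": shutil.which("mvn") is not None,
    }

print(check_prerequisites())
```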
Enter the project root directory and build using Apache Maven:
mvn clean install
The build produces a library JAR file pmml-sklearn/target/pmml-sklearn-1.9-SNAPSHOT.jar, and an executable uber-JAR file pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar.
A typical workflow can be summarized as follows:
- Use Scikit-Learn to assemble and fit a pipeline.
- Serialize this pipeline in pickle data format to a file in a local filesystem.
- Use the JPMML-SkLearn command-line application to convert this pickle file to a PMML file.
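The final step can also be driven from the same Python script that fits the pipeline. The sketch below only wraps the command-line invocation in `subprocess`: the JAR path and the `--pkl-input` / `--pmml-output` options are the ones used in the conversion command further down, while `build_convert_command` and `convert` are hypothetical helper names. It assumes Python 3 and a `java` executable on the PATH.

```python
import subprocess

# Executable uber-JAR produced by the Maven build (path relative to the project root)
DEFAULT_JAR = "pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar"

def build_convert_command(pkl_path, pmml_path, jar_path=DEFAULT_JAR):
    """Assemble the argument list for the command-line application."""
    return [
        "java", "-jar", jar_path,
        "--pkl-input", pkl_path,
        "--pmml-output", pmml_path,
    ]

def convert(pkl_path, pmml_path, jar_path=DEFAULT_JAR):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_convert_command(pkl_path, pmml_path, jar_path), check=True)

# Show the command that would be executed, without running it
print(" ".join(build_convert_command("pipeline.pkl", "pipeline.pmml")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.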
Assembling and fitting a pipeline:
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
#from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
iris_X, iris_y = load_iris(return_X_y = True, as_frame = True)
# Drop the " (cm)" unit suffix; str.rstrip() strips a set of characters, not a suffix
iris_X.columns = [col.replace(" (cm)", "") for col in iris_X.columns]
pipeline = Pipeline([
    # Column-oriented feature engineering
    ("transformer", ColumnTransformer([
        ("scaler", StandardScaler(), [0, 1, 2, 3])
    ], remainder = "drop")),
    # Table-oriented feature engineering
    #("pca", PCA(n_components = 3)),
    # Final model
    ("classifier", LogisticRegression())
])
pipeline.fit(iris_X, iris_y)
Serializing the pipeline in Joblib-flavoured pickle data format:
import joblib
joblib.dump(pipeline, "pipeline.pkl")
Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.
Converting a pickle file to a PMML file:
java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar --pkl-input pipeline.pkl --pmml-output pipeline.pmml
Getting help:
java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar --help
Integrations:
- Training Scikit-Learn GridSearchCV StatsModels pipelines
- Converting Scikit-Learn H2O.ai pipelines to PMML
- Converting customized Scikit-Learn estimators to PMML
- Training Scikit-Learn StatsModels pipelines
- Upgrading Scikit-Learn XGBoost pipelines
- Training Python-based XGBoost accelerated failure time models
- Converting Scikit-Learn PyCaret 3 pipelines to PMML
- Training Scikit-Learn H2O.ai pipelines
- One-hot encoding categorical features in Scikit-Learn XGBoost pipelines
- Training Scikit-Learn TF(-IDF) plus XGBoost pipelines
- Converting Scikit-Learn TF(-IDF) pipelines to PMML
- Converting Scikit-Learn Imbalanced-Learn pipelines to PMML
- Converting logistic regression models to PMML
- Stacking Scikit-Learn, LightGBM and XGBoost models
- Converting Scikit-Learn GridSearchCV pipelines to PMML
- Converting Scikit-Learn TPOT pipelines to PMML
- Converting Scikit-Learn LightGBM pipelines to PMML
Extensions:
- Extending Scikit-Learn with feature cross-references
- Extending Scikit-Learn with UDF expression transformer
- Extending Scikit-Learn with CHAID models
- Extending Scikit-Learn with prediction post-processing
- Extending Scikit-Learn with outlier detector transformer
- Extending Scikit-Learn with date and datetime features
- Extending Scikit-Learn with feature specifications
- Extending Scikit-Learn with GBDT+LR ensemble models
- Extending Scikit-Learn with business rules model
Miscellaneous:
- Upgrading Scikit-Learn decision tree models
- Measuring the memory consumption of Scikit-Learn models
- Benchmarking Scikit-Learn against JPMML-Evaluator
- Analyzing Scikit-Learn feature importances via PMML
JPMML-SkLearn is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.
If you would like to use JPMML-SkLearn in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-SkLearn available under the terms and conditions of the BSD 3-Clause License instead.
JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact [email protected]