
Java library and command-line application for converting Scikit-Learn pipelines to PMML


JPMML-SkLearn



Features

Overview

  • Functionality:
    • Three times more supported Python packages, transformers and estimators than all the competitors combined!
    • Thorough collection, analysis and encoding of feature information:
      • Names.
      • Data and operational types.
      • Valid, invalid and missing value spaces.
      • Descriptive statistics.
    • Pipeline extensions:
      • Pruning.
      • Decision engineering (prediction post-processing).
      • Model verification.
    • Conversion options.
  • Extensibility:
    • Rich Java APIs for developing custom converters.
    • Automatic discovery and registration of custom converters based on META-INF/sklearn2pmml.properties resource files.
    • Direct interfacing with other JPMML conversion libraries such as JPMML-H2O, JPMML-LightGBM, JPMML-StatsModels and JPMML-XGBoost.
  • Production quality:
    • Complete test coverage.
    • Fully compliant with the JPMML-Evaluator library.
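The idea behind valid, invalid and missing value spaces can be pictured with a small stdlib-only sketch. This is a hypothetical illustration, not JPMML-SkLearn code: the helpers `classify_value` and `summarize` are invented here, and treating `None` and NaN as missing values is an assumption that mirrors common pandas conventions.

```python
import math
from collections import Counter


def classify_value(value, valid_values):
    """Assign one raw input value to the missing, valid or invalid value space.

    Hypothetical helper: None and float NaN are assumed to denote missing values.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    return "valid" if value in valid_values else "invalid"


def summarize(values, valid_values):
    """Count how many input values fall into each value space."""
    return Counter(classify_value(value, valid_values) for value in values)


if __name__ == "__main__":
    species = ["setosa", "virginica", None, "unknown", float("nan"), "setosa"]
    print(dict(summarize(species, {"setosa", "versicolor", "virginica"})))
```

In a real conversion, the valid value space is derived from the training data and the pipeline, and the PMML document decides how invalid and missing values are treated at scoring time.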

Supported packages

For a full list of supported transformer and estimator classes, see the features.md file.

Prerequisites

The Python side of operations

  • Python 2.7, or Python 3.4 or newer.
  • Scikit-Learn 0.16.0 or newer. This is not a typo: all Scikit-Learn versions from the past 10 years (2015 or newer) should work equally well.
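A minimal sketch for checking the Python-side prerequisites from a script. The helper names are hypothetical, and the version parsing is deliberately simplistic (pre-release tags such as `rc1` or `dev0` are truncated), so treat it as a quick sanity check rather than a replacement for a proper version library.

```python
import sys


def parse_version(text):
    """Turn a version string such as "1.3.2" into a comparable 3-tuple of ints.

    Each component is truncated at its first non-digit character, and missing
    components are padded with zeros, so "0.16" compares equal to "0.16.0".
    """
    parts = []
    for component in text.split(".")[:3]:
        digits = ""
        for char in component:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def python_supported(version_info):
    """Python 2.7, or Python 3.4 and newer."""
    major_minor = tuple(version_info[:2])
    return major_minor == (2, 7) or major_minor >= (3, 4)


def sklearn_supported(version_text):
    """Scikit-Learn 0.16.0 or newer."""
    return parse_version(version_text) >= (0, 16, 0)


if __name__ == "__main__":
    print("Python OK: {}".format(python_supported(sys.version_info)))
    try:
        import sklearn
        print("Scikit-Learn OK: {}".format(sklearn_supported(sklearn.__version__)))
    except ImportError:
        print("Scikit-Learn is not installed")
```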

The JPMML-SkLearn side of operations

  • Java 11 or newer.

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces a library JAR file pmml-sklearn/target/pmml-sklearn-1.9-SNAPSHOT.jar, and an executable uber-JAR file pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar.
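Because the JAR file names carry the project version (1.9-SNAPSHOT in this README), scripts that drive the command-line application may prefer to locate the uber-JAR with a wildcard instead of hard-coding it. A minimal sketch, assuming the `pmml-sklearn-example/target` layout described above; `find_executable_jar` is a hypothetical helper name.

```python
import glob
import os


def find_executable_jar(project_root="."):
    """Locate the executable uber-JAR produced by `mvn clean install`.

    Assumes the pmml-sklearn-example/target layout of the build. If several
    versions are present, the lexicographically last file name is returned,
    which is a simplification rather than true version ordering.
    """
    pattern = os.path.join(
        project_root,
        "pmml-sklearn-example",
        "target",
        "pmml-sklearn-example-executable-*.jar",
    )
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("No executable JAR matches " + pattern)
    return matches[-1]
```

Calling `find_executable_jar()` from the project root after a successful build should return the uber-JAR path; before the build it raises `FileNotFoundError`.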

Usage

A typical workflow can be summarized as follows:

  1. Use Scikit-Learn to assemble and fit a pipeline.
  2. Serialize this pipeline in pickle data format to a file on the local filesystem.
  3. Use the JPMML-SkLearn command-line application to convert this pickle file to a PMML file.
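The three steps can also be chained from a single Python script by shelling out to the command-line application for step 3. This is a minimal sketch, assuming `java` is on the `PATH` and the uber-JAR has been built as described under Installation; `build_convert_command` and `convert` are hypothetical helper names, and only the `--pkl-input` and `--pmml-output` options documented in this README are used.

```python
import subprocess


def build_convert_command(jar_path, pkl_path, pmml_path):
    """Assemble the command line that converts a pickle file to a PMML file."""
    return [
        "java", "-jar", jar_path,
        "--pkl-input", pkl_path,
        "--pmml-output", pmml_path,
    ]


def convert(jar_path, pkl_path, pmml_path):
    """Run the JPMML-SkLearn command-line application.

    subprocess.run(check=True) raises CalledProcessError on a non-zero exit code,
    so a failed conversion does not go unnoticed.
    """
    subprocess.run(build_convert_command(jar_path, pkl_path, pmml_path), check=True)
```

For example, `convert("pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar", "pipeline.pkl", "pipeline.pmml")` is equivalent to the `java -jar` conversion command in the JPMML-SkLearn section. Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.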

The Python side of operations

Assembling and fitting a pipeline:

from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
#from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris_X, iris_y = load_iris(return_X_y = True, as_frame = True)
# Remove the " (cm)" unit suffix from the column names
iris_X.columns = [col.replace(" (cm)", "") for col in iris_X.columns]

pipeline = Pipeline([
    # Column-oriented feature engineering
    ("transformer", ColumnTransformer([
        ("scaler", StandardScaler(), [0, 1, 2, 3])
    ], remainder = "drop")),
    # Table-oriented feature engineering
    #("pca", PCA(n_components = 3)),
    # Final model
    ("classifier", LogisticRegression())
])
pipeline.fit(iris_X, iris_y)
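When adapting the column renaming to other datasets, keep in mind that `str.rstrip(" (cm)")` removes any trailing characters from the set `{' ', '(', 'c', 'm', ')'}` rather than the literal suffix, so it can silently truncate names. A stdlib-only sketch of literal suffix removal, applied to the four Iris feature names as Scikit-Learn reports them; `strip_suffix` is a hypothetical helper (on Python 3.9+, `str.removesuffix()` does the same).

```python
def strip_suffix(name, suffix):
    """Remove `suffix` from the end of `name` if present, otherwise return `name`."""
    # Guard against an empty suffix: name[:-0] would return an empty string
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


iris_feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

if __name__ == "__main__":
    print([strip_suffix(name, " (cm)") for name in iris_feature_names])
    # str.rstrip() works on a character set, not a suffix:
    print("arm (cm)".rstrip(" (cm)"))  # prints ar
```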

Serializing the pipeline in Joblib-flavoured pickle data format:

import joblib

joblib.dump(pipeline, "pipeline.pkl")

Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting a pickle file to a PMML file:

java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar --pkl-input pipeline.pkl --pmml-output pipeline.pmml
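To sanity-check the conversion result, the generated pipeline.pmml can be inspected with Python's standard XML parser. This sketch is not part of JPMML-SkLearn: it lists the `DataField` elements of the `DataDictionary` and the names of the top-level model elements, and it strips the XML namespace instead of hard-coding a particular PMML schema version, since the version written may differ between releases. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level PMML children that are not model elements
NON_MODEL_ELEMENTS = {"Header", "MiningBuildTask", "DataDictionary", "TransformationDictionary", "Extension"}


def local_name(tag):
    """Drop the XML namespace, e.g. '{http://www.dmg.org/PMML-4_4}DataField' -> 'DataField'."""
    return tag.rsplit("}", 1)[-1]


def summarize_root(root):
    """Return ([(name, optype, dataType), ...], [model element names]) for a PMML root element."""
    fields, models = [], []
    for child in root:
        name = local_name(child.tag)
        if name == "DataDictionary":
            for field in child:
                if local_name(field.tag) == "DataField":
                    fields.append((field.get("name"), field.get("optype"), field.get("dataType")))
        elif name not in NON_MODEL_ELEMENTS:
            models.append(name)
    return fields, models


def summarize_pmml(path):
    """Parse a PMML file from disk and summarize it."""
    return summarize_root(ET.parse(path).getroot())
```

For example, `fields, models = summarize_pmml("pipeline.pmml")` should list the four renamed Iris input columns and the target field, together with whatever model element the converter chose to emit.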

Getting help:

java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar --help

Documentation

Integrations:

Extensions:

Miscellaneous:

Archived:

License

JPMML-SkLearn is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-SkLearn in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-SkLearn available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]
