whisk is an ML project framework that makes collaboration, reproducibility, and deployment “just work”. Let’s go through a few concepts that are important for understanding how whisk makes this happen.
The directory is also a Git repository, pre-initialized with a first commit that contains the directory structure. This means you can keep track of changes over time and/or push this code to a version control hosting system such as GitHub.
To check out a few examples of projects that were created with whisk, head over to the whisk GitHub org.
A Python 3 virtual environment is also created when you run
whisk create <project_name>. This is an isolated Python environment that contains specific (explicit) versions of the packages your project uses. All of these packages and their versions are listed in
This environment will follow your project wherever it goes. If someone else decides to build off of your project and make changes, the environment will be identical for them as well.
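As a quick sanity check, the sketch below reports whether the current interpreter is running inside a virtual environment and which package versions it can see. It relies only on the Python standard library; the helper names `in_virtual_environment` and `installed_versions` are illustrative and are not part of whisk.

```python
import sys
from importlib import metadata


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_versions():
    """Map each installed distribution name to its pinned version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with incomplete metadata
            versions[name] = dist.version
    return versions


print("virtual environment active:", in_virtual_environment())
for name, version in sorted(installed_versions().items()):
    print(f"{name}=={version}")
```

Running this from inside the project's environment and from a collaborator's copy should list the same versions if both environments were built from the same package list.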
Your whisk project will contain a variety of different sources of code. You may start your analysis or modeling inside a Jupyter Notebook within the
notebooks/ directory. When you are ready to collaborate or deploy your model, the needed code should be moved into the
src/<project_name>/ directory. This directory contains all of the code that will be packaged alongside your model or deployed as a web app.
Raw training data should be version-controlled alongside the project code to ensure your experiments are reproducible. Place training data within the
data/ directory of your project. You can access the location of this directory via <project_name>.data_dir.
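A minimal sketch of reading training data from the data directory might look like the following. Because `<project_name>` is a placeholder, the example uses a temporary directory as a stand-in for the project's data directory, and the file name `training.csv` and its columns are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the project's data directory (assumption: in a real project
# this path would come from the project package rather than a temp dir).
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir(parents=True)

# Write a tiny sample file so the sketch runs on its own.
# The file name and columns are hypothetical.
sample_path = data_dir / "training.csv"
sample_path.write_text("feature,label\n1.0,0\n2.5,1\n", encoding="utf-8")


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


rows = load_rows(sample_path)
print(f"loaded {len(rows)} rows from {sample_path.name}")
```

Building paths from a single directory variable, rather than hard-coding absolute paths, keeps notebooks and package code working when the project is cloned elsewhere.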
Once you land on a version of your model that performs well, you’ll want to save the model to disk with a library like pickle.
These saved models (artifacts) should be stored in
src/<project_name>/artifacts/ and their location can be referenced throughout your project with
<project_name>.artifact_dir. These artifacts are automatically included in your model’s Python package.
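Saving and restoring an artifact with pickle can be sketched as follows. The helpers `save_artifact` and `load_artifact` are illustrative names, not whisk functions, and a temporary directory stands in for the project's artifacts directory; a plain dict stands in for a trained model.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, artifact_dir, name):
    """Pickle obj to artifact_dir/name and return the path written."""
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / name
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path


def load_artifact(artifact_dir, name):
    """Load a pickled object previously written by save_artifact."""
    with (Path(artifact_dir) / name).open("rb") as f:
        return pickle.load(f)


# Stand-in for the project's artifacts directory (assumption for this sketch).
artifact_dir = Path(tempfile.mkdtemp()) / "artifacts"

# Stand-in for a trained model: any picklable object works.
trained_model = {"weights": [2.0, 1.0], "bias": 0.5}

saved_path = save_artifact(trained_model, artifact_dir, "model.pkl")
restored_model = load_artifact(artifact_dir, "model.pkl")
print("artifact written to", saved_path)
```

Note that unpickling can execute arbitrary code, so only load artifacts from sources you trust.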
Your model can be released as a Python package and deployed to Heroku as a web service.
As you are developing your project and placing code in the
src/<project_name>/ directory, whisk automatically converts your project into a functioning Python package. In the same way that you might
import pandas at the beginning of an analysis, you can
import <project_name> within your project and access functionality that was developed with your model. For example, your model is accessible with
from <project_name>.models.model import Model.
The purpose of keeping the code packaged and available is to encourage easy collaboration. If you are ready to share your project with others, it's easy to run
whisk package dist so that others can also import and use the functionality from your project.
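To make the idea of importable model code concrete, here is a hypothetical sketch of what a `Model` class in the package could look like. The constructor arguments, the `predict` method, and the linear scoring rule are all assumptions chosen for illustration; the class whisk generates for a project may look different.

```python
class Model:
    """Hypothetical model wrapper; not the class whisk generates."""

    def __init__(self, weights=(2.0, 1.0), bias=0.5):
        # In a real project these values would typically be loaded
        # from a saved artifact instead of being hard-coded.
        self.weights = list(weights)
        self.bias = bias

    def predict(self, rows):
        """Return one linear score per input row."""
        return [
            sum(w * x for w, x in zip(self.weights, row)) + self.bias
            for row in rows
        ]


model_object = Model()
predictions = model_object.predict([[1.0, 3.0], [0.0, 0.0]])
print(predictions)
```

Keeping prediction logic behind a small class like this means collaborators who install the package only need to import it and call one method.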
The model that you developed may be helpful to others as a standalone web service. This allows you or other developers to access your model as a service through an API endpoint. To use this API, a developer sends a set of model inputs and receives your model’s predictions in return.
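From the client side, calling such a service could look roughly like the sketch below. The endpoint URL, the `/predict` path, and the `{"inputs": ...}` JSON shape are assumptions for illustration only; check your deployed app for the actual route and payload format.

```python
import json
import urllib.request


def build_prediction_request(url, inputs):
    """Build a JSON POST request carrying the model inputs."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_predictions(url, inputs, timeout=10):
    """Send the inputs to the service and return the decoded JSON reply."""
    request = build_prediction_request(url, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint; the request is only built here, not sent.
example_request = build_prediction_request(
    "https://example.invalid/predict", [[1.0, 3.0]]
)
print(example_request.get_method(), example_request.full_url)
```

Calling `request_predictions` with the URL of a deployed service would perform the actual round trip.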
To make project development easier, whisk provides a few helpers that give quick access to commonly used project locations and speed up development of your model. For example:
    import <project_name>

    <project_name>.data_dir      # location of your stored data at `data/`
    <project_name>.artifact_dir  # location of the artifacts (i.e. trained models saved to disk) at `src/<project_name>/artifacts`
These helpers are not to be confused with the code that sits alongside your packaged model. That code is referenced through your project name. For example:
    from <project_name>.models.model import Model

    model_object = Model()