In this section, we introduce how to use carefree-learn to solve computer vision tasks.

What distinguishes computer vision tasks most from other deep learning tasks is that most datasets can be interpreted as an 'Image Folder Dataset' (abbr: IFD). In this case, the source images (the inputs of our models) are stored in a certain folder structure, and the labels (the targets of our models) can be represented by the hierarchy of each image file.

Therefore, carefree-learn introduces ImageFolderData as the unified data API for computer vision tasks. We will first introduce how different tasks can be represented as IFD, and then introduce how to construct ImageFolderData in the next section.


Since generation tasks (usually) don't require labels, any image folder is an IFD by itself.


There are many ways to construct classification tasks as IFD:

  • Specify labels with the sub-folders' names.
  • Specify labels with a .csv file, in which each row contains a file name and its corresponding label.

|--- data
    |-- dog
        |-- 0.png
        |-- 1.png
    |-- cat
        |-- 0.png
        |-- 1.png
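As a rough illustration of the sub-folder approach, the sketch below derives a label for each image from the name of its parent folder. It uses only the standard library; `labels_from_subfolders` is a hypothetical helper written for this example and is not part of carefree-learn.

```python
import tempfile
from pathlib import Path


def labels_from_subfolders(root: Path) -> dict:
    """Map each image path (relative to `root`) to its parent folder's name."""
    labels = {}
    for path in sorted(root.rglob("*.png")):
        labels[path.relative_to(root).as_posix()] = path.parent.name
    return labels


# build a tiny dog / cat folder (with empty placeholder files) to try it out
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    for label in ("dog", "cat"):
        (root / label).mkdir(parents=True)
        for i in range(2):
            (root / label / f"{i}.png").touch()
    labels = labels_from_subfolders(root)

print(labels)
```

Here every image under `dog` is labelled "dog" and every image under `cat` is labelled "cat".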


The simplest way to construct segmentation tasks as IFD is to mimic the image folder structure and replace .png (image file) with .npy (mask file).

|--- data
    |-- images
        |-- 0.png
        |-- 1.png
    |-- masks
        |-- 0.npy
        |-- 1.npy
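The pairing implied by this layout can be sketched as follows: each image is matched with the mask that shares its file name. `pair_images_with_masks` is a hypothetical helper for illustration only, and the files created here are empty placeholders rather than real images or arrays.

```python
import tempfile
from pathlib import Path


def pair_images_with_masks(root: Path) -> dict:
    """Pair images/<name>.png with masks/<name>.npy, skipping images without a mask."""
    pairs = {}
    for image in sorted((root / "images").glob("*.png")):
        mask = root / "masks" / f"{image.stem}.npy"
        if mask.is_file():
            pairs[image.name] = mask.name
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    (root / "images").mkdir(parents=True)
    (root / "masks").mkdir()
    for i in range(2):
        (root / "images" / f"{i}.png").touch()
        (root / "masks" / f"{i}.npy").touch()
    pairs = pair_images_with_masks(root)

print(pairs)
```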


carefree-learn provides a convenient API (see prepare_image_folder_data) to construct ImageFolderData. But before we dive into the details, it's necessary to know how carefree-learn organizes its IFD and how it converts an IFD to the final ImageFolderData.

Design Principles

Since every task may have its own image folder structure, it is very difficult to design a unified API that covers all situations. carefree-learn therefore designs its own IFD pattern, and implements prepare_image_folder to convert other image folder structures to this pattern.


Unified IFD

In carefree-learn, we will expect the unified IFD to be as follows:

|--- data
    |-- train
        |-- xxx.png
        |-- xxx.png
        |-- labels.json
        |-- path_mapping.json
    |-- valid
        |-- xxx.png
        |-- xxx.png
        |-- labels.json
        |-- path_mapping.json
    |-- idx2labels.json
    |-- labels2idx.json

  • The train folder represents all data used in training.

    • We will call it the 'train split' from now on.
  • The valid folder represents all data used in validation.

    • We will call it the 'valid split' from now on.
  • The labels.json in each split represents the label information:

    {
      "/full/path/to/first.png": xxx,
      "/full/path/to/second.png": xxx,
    }

    The keys will be the absolute paths of the images, and the values will be the corresponding labels.

    • If a label is a string that ends with .npy, we will load the corresponding np.ndarray.
    • Otherwise, the labels should be integers / floats and will be kept as-is.
  • The path_mapping.json in each split represents the path mapping:

    {
      "/full/path/to/first.png": "relative/path/to/original/first.png",
      "/full/path/to/second.png": "relative/path/to/original/second.png",
    }

    This means that the IFD in carefree-learn should be a copy of the original IFD, because we want to keep an individual version of each IFD.

  • The idx2labels.json represents the mapping from indices to original labels.

    • This is useful iff we are solving classification tasks and the original labels are strings.
  • The labels2idx.json represents the mapping from original labels to indices.

    • This is useful iff we are solving classification tasks and the original labels are strings.
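To make the two mapping files more concrete, here is a minimal sketch of how such mappings could be built from string labels. The helper `build_label_mappings` and the choice of assigning indices in sorted order are assumptions made for this example; carefree-learn may assign indices differently.

```python
import json


def build_label_mappings(labels):
    """Return (labels2idx, idx2labels) for a list of string labels.

    Assumption: indices follow the sorted order of the unique labels.
    """
    unique = sorted(set(labels))
    labels2idx = {label: i for i, label in enumerate(unique)}
    idx2labels = {i: label for label, i in labels2idx.items()}
    return labels2idx, idx2labels


labels2idx, idx2labels = build_label_mappings(["dog", "cat", "dog"])

# JSON object keys are always strings, so integer keys must be converted back after loading
restored = {int(k): v for k, v in json.loads(json.dumps(idx2labels)).items()}
print(labels2idx, restored)
```

One practical consequence is that anything read back from an idx2labels-style JSON file will have string keys until they are converted explicitly.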


def prepare_image_folder(
    src_folder: str,
    tgt_folder: str,
    to_index: bool,
    prefix: Optional[str] = None,
    preparation: _PreparationProtocol = DefaultPreparation(),
    force_rerun: bool = False,
    extensions: Optional[Set[str]] = None,
    make_labels_in_parallel: bool = False,
    num_jobs: int = 8,
    train_all_data: bool = False,
    valid_split: Union[int, float] = 0.1,
    max_num_valid: int = 10000,
    lmdb_config: Optional[Dict[str, Any]] = None,
    use_tqdm: bool = True,
) -> str:

  • src_folder
    • Path of the original IFD.
  • tgt_folder
    • Specify the path where we want to place our unified IFD.
  • to_index
    • Specify whether we should turn the original labels into indices.
    • This is useful iff we are solving classification tasks and the original labels are strings.
  • prefix [default = None]
    • Specify the prefix of src_folder.
    • Sometimes this is useful when we need to customize our own _PreparationProtocol.
    • See hierarchy section for more details.
  • preparation [default = DefaultPreparation()]
    • Specify the core logic of how to convert the original IFD to our unified IFD.
    • See _PreparationProtocol section for more details.
  • force_rerun [default = False]
    • Specify whether we should force rerunning the whole preparation procedure.
    • If False and caches are available, prepare_image_folder will be a no-op.
  • extensions [default = {".jpg", ".png"}]
    • Specify the extensions of our target image files.
  • make_labels_in_parallel [default = False]
    • Specify whether we should make labels in parallel.
    • This is very useful if making labels from the original IFD is time-consuming.
  • num_jobs [default = 8]
    • Specify the number of jobs when we are:
      • making labels in parallel.
      • making a copy of the original IFD to construct the unified IFD.
    • If 0, then no parallelism will be used.
  • train_all_data [default = False]
    • Specify whether we should use all available data as the train split.
    • Basically, this means we will use the train set + validation set to train our model, while the validation set remains the same.
  • valid_split [default = 0.1]
    • Specify the number of samples in the validation set.
    • If float, it will represent the portion.
    • If int, it will represent the exact number.
    • Notice that the outcome of this argument will be affected by max_num_valid.
  • max_num_valid [default = 10000]
    • Specify the maximum number of samples in the validation set.
  • lmdb_config [default = None]
    • Specify the configurations for lmdb.
    • If not provided, lmdb will not be utilized.
  • use_tqdm [default = True]
    • Specify whether we should use a tqdm progress bar to monitor the preparation progress.
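The interplay between valid_split and max_num_valid can be sketched as below. `resolve_num_valid` is a hypothetical helper, and the rounding rule used for the float case is an assumption; only the "portion vs. exact number" distinction and the cap come from the descriptions above.

```python
from typing import Union


def resolve_num_valid(
    num_samples: int,
    valid_split: Union[int, float],
    max_num_valid: int,
) -> int:
    # a float is read as a portion of the data, an int as an exact count
    if isinstance(valid_split, float):
        num_valid = int(round(num_samples * valid_split))  # rounding is an assumption
    else:
        num_valid = valid_split
    # the result is capped by `max_num_valid`
    return min(num_valid, max_num_valid)


small = resolve_num_valid(1000, 0.1, 10000)
large = resolve_num_valid(1_000_000, 0.1, 10000)
exact = resolve_num_valid(50, 20, 10000)
print(small, large, exact)
```

With the default settings, a dataset of one million images would therefore end up with a validation set capped at 10000 samples rather than 100000.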


In order to provide a convenient API for implementing the core logic of converting the original IFD to our unified IFD, carefree-learn implements _PreparationProtocol and exposes some methods for users to override. By default, carefree-learn will use DefaultPreparation, which can handle some general cases:

class DefaultPreparation(_PreparationProtocol):
    @property
    def extra_labels(self) -> Optional[List[str]]:
        return None

    def filter(self, hierarchy: List[str]) -> bool:
        return True

    def get_label(self, hierarchy: List[str]) -> Any:
        return 0

    def get_extra_label(self, label_name: str, hierarchy: List[str]) -> Any:
        pass

    def copy(self, src_path: str, tgt_path: str) -> None:
        shutil.copyfile(src_path, tgt_path)

For specific cases, we can override one or more of the methods shown above to customize the behaviours. We will first introduce these methods in detail, and then provide some examples of how to use them.

  • extra_labels
    • This property specifies the extra labels required by the current task.
    • Usually we can safely leave it as None, unless we need to use multiple labels in one sample.
    • See the extra_labels example for more details.
  • filter
    • This method is used to decide which images we want to copy from the original IFD.
    • It is useful when:
      • the original IFD contains some 'dirty' images (truncated, broken, etc.).
      • we only want to play with a specific portion of the original IFD.
    • See the hierarchy section for the detailed definition of the hierarchy argument.
    • See the filter example for more details.
  • get_label
    • This method is used to define the label of each input image.
    • If it returns a string that ends with .npy, the string should be the path of an np.ndarray, which will be loaded when constructing the input sample.
    • If it returns any other string, the string will be converted to an index based on labels2idx.json.
    • If it returns an integer / float, the value will be kept as-is.
    • See the hierarchy section for the detailed definition of the hierarchy argument.
    • See the get_label example for more details.
  • get_extra_label
    • This method is used to define the extra label(s) of each input image.
    • If it returns a string that ends with .npy, the string should be the path of an np.ndarray, which will be loaded when constructing the input sample.
    • If it returns any other string, the string will be converted to an index based on the corresponding mapping file (e.g. sub_class2idx.json in the extra_labels example).
    • If it returns an integer / float, the value will be kept as-is.
    • See the hierarchy section for the detailed definition of the hierarchy argument.
    • See the extra_labels example for more details.
  • copy
    • This method is used to copy images from the original IFD to our unified IFD.
    • It is useful if we want to pre-check the quality of each image, because if this method raises an error, the corresponding image will be filtered out from the unified IFD.
    • See the copy example for more details.
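To show how these methods fit together, the sketch below drives a stand-in preparation over a small folder: it builds the hierarchy of each file, applies filter, tries copy, and records the label of every image that was copied. `SketchPreparation`, `run_preparation` and the flattened target file names are all hypothetical; this only mirrors the behaviour described above and is not carefree-learn's actual implementation.

```python
import os
import shutil
import tempfile
from typing import Any, Dict, List


class SketchPreparation:
    """Stand-in for a preparation; only mirrors the method names described above."""

    def filter(self, hierarchy: List[str]) -> bool:
        return not hierarchy[-1].startswith("skip")

    def get_label(self, hierarchy: List[str]) -> Any:
        return hierarchy[-2]  # the parent folder's name

    def copy(self, src_path: str, tgt_path: str) -> None:
        if os.path.getsize(src_path) == 0:
            raise ValueError(f"'{src_path}' is empty")
        shutil.copyfile(src_path, tgt_path)


def run_preparation(src_folder: str, tgt_folder: str, preparation: SketchPreparation) -> Dict[str, Any]:
    """Hypothetical driver: filter -> copy -> get_label, dropping images whose copy fails."""
    os.makedirs(tgt_folder, exist_ok=True)
    labels = {}
    parent = os.path.dirname(src_folder)
    for folder, _, files in sorted(os.walk(src_folder)):
        for file in sorted(files):
            src_path = os.path.join(folder, file)
            hierarchy = os.path.relpath(src_path, parent).split(os.sep)
            if not preparation.filter(hierarchy):
                continue
            tgt_path = os.path.join(tgt_folder, "_".join(hierarchy[1:]))
            try:
                preparation.copy(src_path, tgt_path)
            except Exception:
                continue  # a failing copy filters the image out
            labels[tgt_path] = preparation.get_label(hierarchy)
    return labels


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data")
    for name, content in [("dog/0.png", b"x"), ("dog/skip.png", b"x"), ("cat/0.png", b"")]:
        path = os.path.join(src, *name.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)
    labels = run_preparation(src, os.path.join(tmp, "prepared"), SketchPreparation())
    result = {os.path.basename(path): label for path, label in labels.items()}

print(result)
```

In this run, `dog/skip.png` is rejected by filter and the empty `cat/0.png` is dropped because copy raises, so only `dog/0.png` survives.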


A hierarchy in _PreparationProtocol is a list of strings representing the file hierarchy. For example, if the original IFD looks as follows:

|--- data
    |-- dog
        |-- 0.png
        |-- 1.png
    |-- cat
        |-- 0.png
        |-- 1.png

Then the hierarchy of data/dog/0.png will be:

["data", "dog", "0.png"]

However, sometimes the original IFD may be located in a shared space, which means it will be difficult to get the relative path:

|--- home
    |-- shared
        |-- data
            |-- dog
                |-- 0.png
                |-- 1.png
            |-- cat
                |-- 0.png
                |-- 1.png
    |-- you
        |-- codes

In this case, we can specify the prefix argument in prepare_image_folder.

If src_folder is set to "data" and prefix is set to "/home/shared", then:

  • We will use "/home/shared/data" as the final src_folder.
  • The hierarchy will strip out "/home/shared", which means the hierarchy of /home/shared/data/dog/0.png will still be:

    ["data", "dog", "0.png"]

This mechanism guarantees that the same _PreparationProtocol can be used across different environments (with only prefix modified), as long as the original IFD has not changed.
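The prefix stripping described here can be sketched in a few lines. `to_hierarchy` is a hypothetical helper written for this example (it is not a carefree-learn function), and it assumes POSIX-style paths.

```python
import posixpath
from typing import List, Optional


def to_hierarchy(path: str, prefix: Optional[str] = None) -> List[str]:
    """Split a POSIX-style path into its hierarchy, relative to `prefix` if given."""
    if prefix is not None:
        path = posixpath.relpath(path, prefix)
    return [part for part in path.split("/") if part]


same_machine = to_hierarchy("data/dog/0.png")
shared_space = to_hierarchy("/home/shared/data/dog/0.png", prefix="/home/shared")
print(same_machine, shared_space)
```

Both calls yield the same hierarchy, which is why a preparation written against one environment keeps working in another once prefix is adjusted.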



This section focuses on how to construct a unified IFD. For how to construct a DLDataModule from a unified IFD, please refer to the ImageFolderData section.

extra_labels example

Suppose the original IFD looks as follows:

|--- data
    |-- labels.csv
    |-- images
        |-- 0.png
        |-- 1.png
        |-- 2.png
        |-- 3.png



Then the _PreparationProtocol could be defined as:

import cflearn


class MultiLabelsPreparation(cflearn.DefaultPreparation):
    def __init__(self):
        # labels.csv: a header row, then one `file,main_class,sub_class` row per image
        with open("data/labels.csv", "r") as f:
            header = f.readline().strip().split(",")
            self.sub_class_name = header[2]
            self.classes, self.sub_classes = {}, {}
            for line in f:
                file, main_class, sub_class = line.strip().split(",")
                self.classes[file] = main_class
                self.sub_classes[file] = sub_class

    @property
    def extra_labels(self):
        return [self.sub_class_name]

    def get_label(self, hierarchy):
        return self.classes[hierarchy[-1]]

    def get_extra_label(self, label_name, hierarchy):
        if label_name == self.sub_class_name:
            return self.sub_classes[hierarchy[-1]]
        raise NotImplementedError(f"'{label_name}' is not recognized")

After passing the following preparation to prepare_image_folder:

preparation = MultiLabelsPreparation()

We will get the following unified IFD:

|--- data
|--- prepared
    |-- train
        |-- 0.png
        |-- 3.png
        |-- labels.json
        |-- sub_class_labels.json
    |-- valid
        |-- 1.png
        |-- 2.png
        |-- labels.json
        |-- sub_class_labels.json
    |-- idx2labels.json
    |-- labels2idx.json
    |-- idx2sub_class.json
    |-- sub_class2idx.json

The sub_class_labels.json files, together with idx2sub_class.json and sub_class2idx.json, are the main differences when the extra_labels mechanism is applied.

filter example

Suppose the original IFD looks as follows:

|--- home
    |-- shared
        |-- data
            |-- dog
                |-- 0.png
                |-- 0_dummy.png
                |-- 1.png
                |-- 1_dummy.png
            |-- cat
                |-- 0.png
                |-- 1.png
                |-- 1_dummy.png

And we don't want the image files whose names end with _dummy to be in our unified IFD. Then the _PreparationProtocol could be defined as:

import os
import cflearn


class FilterPreparation(cflearn.DefaultPreparation):
    def filter(self, hierarchy):
        # drop the extension before checking the suffix of the file name
        name = os.path.splitext(hierarchy[-1])[0]
        return not name.endswith("_dummy")

get_label example

Suppose the original IFD looks as follows:

|--- home
    |-- shared
        |-- data
            |-- 0.png
            |-- 1.png
            |-- 2.png
            |-- 3.png

And the images are RGBA images, whose alpha channel will be our segmentation mask (label). Then the _PreparationProtocol could be defined as:

import os
import cflearn
import numpy as np
from PIL import Image


class ExtractAlphaPreparation(cflearn.DefaultPreparation):
    def __init__(self, prefix, labels_folder):
        self.prefix = prefix
        if prefix is None:
            self.labels_folder = labels_folder
        else:
            self.labels_folder = os.path.join(prefix, labels_folder)
        os.makedirs(self.labels_folder, exist_ok=True)

    def get_label(self, hierarchy):
        # restore the full path, since `prefix` is stripped from the hierarchy
        if self.prefix is not None:
            hierarchy = [self.prefix] + hierarchy
        img = Image.open(os.path.join(*hierarchy))
        # the alpha channel is the last channel of an RGBA image
        alpha = np.array(img)[..., -1]
        name = os.path.splitext(hierarchy[-1])[0]
        alpha_path = os.path.join(self.labels_folder, f"{name}.npy")
        np.save(alpha_path, alpha)
        return alpha_path

    def copy(self, src_path, tgt_path):
        # keep only the RGB channels in the unified IFD
        img = Image.open(src_path).convert("RGB")
        img.save(tgt_path)

After running prepare_image_folder with the following prefix and preparation:

prefix = "/home/shared"
preparation = ExtractAlphaPreparation(prefix, "labels")

We will get the following unified IFD:

|--- data
|--- labels
    |-- 0.npy
    |-- 1.npy
    |-- 2.npy
    |-- 3.npy
|--- prepared
    |-- train
        |-- 1.png
        |-- 2.png
        |-- 3.png
        |-- labels.json
    |-- valid
        |-- 0.png
        |-- labels.json

  • A labels folder will be created to store the extracted alpha masks.
  • Neither idx2labels.json nor labels2idx.json will be generated, because all labels are .npy files.

copy example

The most common use case of overriding the copy method is to pre-verify the original images:

import shutil
import cflearn
from PIL import Image


class VerifyPreparation(cflearn.DefaultPreparation):
    def copy(self, src_path, tgt_path):
        # `verify` raises on broken files, which filters the image out
        Image.open(src_path).verify()
        shutil.copyfile(src_path, tgt_path)
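The "raise to filter out" idea can also be tried without PIL. The sketch below is a dependency-free stand-in rather than carefree-learn code: it swaps image verification for a much weaker check of the 8-byte PNG signature, and `copy_if_png` and `copy_valid` are hypothetical helpers introduced only for this example.

```python
import os
import shutil
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def copy_if_png(src_path: str, tgt_path: str) -> None:
    """Raise if `src_path` does not start with the PNG signature, else copy it."""
    with open(src_path, "rb") as f:
        if f.read(len(PNG_SIGNATURE)) != PNG_SIGNATURE:
            raise ValueError(f"'{src_path}' is not a valid PNG file")
    shutil.copyfile(src_path, tgt_path)


def copy_valid(src_paths, tgt_folder):
    """Copy every file that passes the check and return the names that were kept."""
    kept = []
    for src_path in src_paths:
        name = os.path.basename(src_path)
        try:
            copy_if_png(src_path, os.path.join(tgt_folder, name))
        except ValueError:
            continue  # the failing file is simply left out
        kept.append(name)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    good, bad = os.path.join(tmp, "good.png"), os.path.join(tmp, "bad.png")
    with open(good, "wb") as f:
        f.write(PNG_SIGNATURE + b"rest-of-file")
    with open(bad, "wb") as f:
        f.write(b"not an image")
    tgt = os.path.join(tmp, "prepared")
    os.makedirs(tgt)
    kept = copy_valid([good, bad], tgt)

print(kept)
```

A signature check only catches files that are not PNGs at all; truncated or corrupted images still need a real decoder such as the PIL-based check above.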



In this section, we will use loader to represent DataLoader from PyTorch.

After the unified IFD is ready, constructing ImageFolderData will be fairly straightforward:

class CVDataModule(DLDataModule, metaclass=ABCMeta):
    test_transform: Optional[Transforms]


class ImageFolderData(CVDataModule):
    def __init__(
        self,
        folder: str,
        batch_size: int,
        num_workers: int = 0,
        shuffle: bool = True,
        drop_train_last: bool = True,
        prefetch_device: Optional[Union[int, str]] = None,
        pin_memory_device: Optional[Union[int, str]] = None,
        extra_label_names: Optional[List[str]] = None,
        transform: Optional[Union[str, List[str], Transforms, Callable]] = None,
        transform_config: Optional[Dict[str, Any]] = None,
        test_shuffle: Optional[bool] = None,
        test_transform: Optional[Union[str, List[str], Transforms, Callable]] = None,
        test_transform_config: Optional[Dict[str, Any]] = None,
        lmdb_config: Optional[Dict[str, Any]] = None,
    ):
        ...

  • folder
    • Specify the path to the unified IFD.
  • batch_size
    • Specify the number of samples in each batch.
  • num_workers [default = 0]
    • Argument used in loader.
  • shuffle [default = True]
    • Argument used in loader.
  • drop_train_last [default = True]
    • Specify whether we should apply drop_last in loader for the training set.
    • Notice that for the validation set, drop_last will always be False.
  • prefetch_device [default = None]
    • If not specified, the prefetch mechanism will not be applied.
    • If specified, carefree-learn will 'prefetch' each batch to the corresponding device.
  • pin_memory_device [default = None]
    • If not specified, the pin_memory mechanism will not be applied.
    • If specified, carefree-learn will use pin_memory in loader with the corresponding device.
  • extra_label_names [default = None]
  • transform [default = None]
    • Specify the transform we would like to apply to the original batch.
    • See Transforms section for more details.
  • transform_config [default = None]
    • Specify the configuration of transform.
  • test_shuffle [default = None]
    • Argument used in loader for the test set.
    • If not specified, it will be the same as shuffle.
  • test_transform [default = None]
    • Specify the transform we would like to apply to the original batch in the test set.
    • If not specified, it will be the same as transform.
    • See Transforms section for more details.
  • test_transform_config [default = None]
    • Specify the configuration of test_transform.
    • If not specified, it will be the same as transform_config.
  • lmdb_config [default = None]
    • Specify the configurations for lmdb.
    • If not provided, lmdb will not be utilized.
    • Should be the same as lmdb_config used in prepare_image_folder.
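The fallback rules for the test_* arguments can be summarized with a small sketch. `resolve_test_settings` is a hypothetical helper that only restates the "if not specified, it will be the same as ..." rules listed above; it does not reproduce how ImageFolderData handles them internally.

```python
from typing import Any, Dict, Optional


def resolve_test_settings(
    shuffle: bool = True,
    transform: Optional[str] = None,
    transform_config: Optional[Dict[str, Any]] = None,
    test_shuffle: Optional[bool] = None,
    test_transform: Optional[str] = None,
    test_transform_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Fill every unspecified test_* setting with its training counterpart."""
    return {
        "test_shuffle": shuffle if test_shuffle is None else test_shuffle,
        "test_transform": transform if test_transform is None else test_transform,
        "test_transform_config": (
            transform_config if test_transform_config is None else test_transform_config
        ),
    }


defaults = resolve_test_settings(shuffle=True, transform="to_tensor")
overridden = resolve_test_settings(shuffle=True, transform="to_tensor", test_shuffle=False)
print(defaults, overridden)
```

Note that with these rules, leaving test_shuffle unspecified while shuffle is True means the test loader is shuffled as well, so test_shuffle=False has to be passed explicitly if a fixed order is wanted.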


To make things easier, carefree-learn provides the prepare_image_folder_data API to directly construct an ImageFolderData from the original IFD:

def prepare_image_folder_data(
    src_folder: str,
    tgt_folder: str,
    to_index: bool,
    batch_size: int,
    prefix: Optional[str] = None,
    preparation: _PreparationProtocol = DefaultPreparation(),
    num_workers: int = 0,
    shuffle: bool = True,
    drop_train_last: bool = True,
    prefetch_device: Optional[Union[int, str]] = None,
    pin_memory_device: Optional[Union[int, str]] = None,
    transform: Optional[Union[str, List[str], Transforms, Callable]] = None,
    transform_config: Optional[Dict[str, Any]] = None,
    test_shuffle: Optional[bool] = None,
    test_transform: Optional[Union[str, List[str], Transforms, Callable]] = None,
    test_transform_config: Optional[Dict[str, Any]] = None,
    train_all_data: bool = False,
    force_rerun: bool = False,
    extensions: Optional[Set[str]] = None,
    make_labels_in_parallel: bool = False,
    num_jobs: int = 8,
    valid_split: Union[int, float] = 0.1,
    max_num_valid: int = 10000,
    lmdb_config: Optional[Dict[str, Any]] = None,
    use_tqdm: bool = True,
) -> PrepareResults:
    # convert the original IFD to the unified IFD (arguments omitted here)
    tgt_folder = prepare_image_folder(...)
    # construct the data module from the unified IFD (arguments omitted here)
    data = ImageFolderData(...)
    # `PrepareResults` is a `NamedTuple`
    return PrepareResults(data, tgt_folder)
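Since the result is a NamedTuple, it can be unpacked like a tuple or accessed by attribute. The stand-in below illustrates both styles; the class name `PrepareResultsSketch` and its field names are assumptions, because the text above only shows the positional call `PrepareResults(data, tgt_folder)`.

```python
from typing import Any, NamedTuple


class PrepareResultsSketch(NamedTuple):
    # field names are assumptions made for this illustration
    data: Any
    tgt_folder: str


results = PrepareResultsSketch(data="<ImageFolderData>", tgt_folder="prepared")

# tuple-style unpacking
data, tgt_folder = results
# attribute-style access
same_folder = results.tgt_folder
print(data, tgt_folder, same_folder)
```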


Source code: transforms

Data augmentation plays an important role in computer vision. carefree-learn provides three kinds of transforms to apply data augmentations.

These transforms are managed under the register mechanism, so we can access them by their names:

import cflearn
data = cflearn.prepare_image_folder_data(..., transform="to_tensor")

where the to_tensor transform is defined as follows:

class ToTensor(NoBatchTransforms):
    fn = transforms.ToTensor()
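For readers unfamiliar with register mechanisms, the sketch below shows the general pattern of looking up classes by name. It is a generic illustration under our own assumptions: `register_transform`, `make_transform` and the `Scale` class are hypothetical, and carefree-learn's actual registry may be organized differently.

```python
from typing import Callable, Dict, Type

# hypothetical registry that maps names to transform classes
_TRANSFORMS: Dict[str, Type] = {}


def register_transform(name: str) -> Callable[[Type], Type]:
    """Class decorator that stores the decorated class under `name`."""

    def decorator(cls: Type) -> Type:
        if name in _TRANSFORMS:
            raise ValueError(f"'{name}' is already registered")
        _TRANSFORMS[name] = cls
        return cls

    return decorator


def make_transform(name: str, **config):
    """Instantiate a registered transform by name, passing `config` to its constructor."""
    return _TRANSFORMS[name](**config)


@register_transform("scale")
class Scale:
    def __init__(self, factor: float = 1.0):
        self.factor = factor

    def __call__(self, values):
        return [v * self.factor for v in values]


transform = make_transform("scale", factor=2.0)
scaled = transform([1, 2, 3])
print(scaled)
```

The benefit of this pattern is that a configuration file or a function argument can refer to a transform with a plain string, in the same spirit as transform="to_tensor" above.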
