Unsupervised skeletonization
In this notebook we use the script skeletons/main_unsupervised_skeleton_estimation.py to automatically estimate the body length of the organisms from their image clips. This module is unsupervised, so no pre-trained model is necessary.
First, we set the run parameters that tell the script where to find the input images and where to write its outputs.
[1]:
import yaml
from pathlib import Path
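# Raw string so that the backslash in the Windows path is not read as an escape sequence
ROOT_DIR = Path(r"D:\mzb-workflow")  # alternative: Path("/home/jovyan/work/mzb-workflow")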
arguments = {
    "config_file": ROOT_DIR / "configs/mzb_example_config.yaml",
    "input_dir": ROOT_DIR / "data/mzb_example_data/derived/blobs",
    "errors": ROOT_DIR / "data/mzb_example_data/derived/classification",
    "output_dir": ROOT_DIR / "results/bgb/skeletons/skeletons_unsupervised",
    "save_masks": ROOT_DIR / "data/bgb/skeletons/skeletons_unsupervised",
    "list_of_files": None
}
with open(str(arguments["config_file"]), "r") as f:
    cfg = yaml.load(f, Loader=yaml.FullLoader)
cfg["trcl_gpu_ids"] = None # this sets the number of available GPUs to zero, since this script doesn't benefit from GPU compute.
Convert both dictionaries to argparse-style namespaces, as the Python script expects, using the custom function cfg_to_arguments:
[2]:
from mzbsuite.utils import cfg_to_arguments
# Transforms configurations dicts to argparse arguments
args = cfg_to_arguments(arguments)
cfg = cfg_to_arguments(cfg)
print(str(cfg))
{'glob_random_seed': 222, 'glob_root_folder': '/home/jovyan/work/mzb-workflow/', 'glob_blobs_folder': '/home/jovyan/work/mzb-workflow/data/derived/blobs/', 'glob_local_format': 'pdf', 'model_logger': 'wandb', 'impa_image_format': 'jpg', 'impa_clip_areas': [2700, 4700, -1, -1], 'impa_area_threshold': 5000, 'impa_gaussian_blur': [21, 21], 'impa_gaussian_blur_passes': 3, 'impa_adaptive_threshold_block_size': 351, 'impa_mask_postprocess_kernel': [11, 11], 'impa_mask_postprocess_passes': 5, 'impa_bounding_box_buffer': 200, 'impa_save_clips_plus_features': True, 'lset_class_cut': 'order', 'lset_val_size': 0.1, 'trcl_learning_rate': 0.0001, 'trcl_batch_size': 8, 'trcl_weight_decay': 0, 'trcl_step_size_decay': 5, 'trcl_number_epochs': 75, 'trcl_save_topk': 1, 'trcl_num_classes': 8, 'trcl_model_pretrarch': 'convnext-small', 'trcl_num_workers': 16, 'trcl_wandb_project_name': 'mzb-classifiers', 'trcl_logger': 'wandb', 'trsk_learning_rate': 0.001, 'trsk_batch_size': 32, 'trsk_weight_decay': 0, 'trsk_step_size_decay': 25, 'trsk_number_epochs': 400, 'trsk_save_topk': 1, 'trsk_num_classes': 2, 'trsk_model_pretrarch': 'mit_b2', 'trsk_num_workers': 16, 'trsk_wandb_project_name': 'mzb-skeletons', 'trsk_logger': 'wandb', 'infe_model_ckpt': 'last', 'infe_num_classes': 8, 'infe_image_glob': '*_rgb.jpg', 'skel_class_exclude': 'errors', 'skel_conv_rate': 131.6625, 'skel_label_thickness': 3, 'skel_label_buffer_on_preds': 25, 'skel_label_clip_with_mask': False, 'trcl_gpu_ids': None}
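For readers unfamiliar with the pattern, the dictionary-to-namespace conversion can be illustrated with the standard library alone. This is only a sketch: `dict_to_namespace` is a hypothetical stand-in, not the implementation of `cfg_to_arguments` in mzbsuite, and the two configuration values are copied from the output above.

```python
from argparse import Namespace


def dict_to_namespace(d):
    """Hypothetical stand-in for cfg_to_arguments: wrap a dict in a Namespace."""
    return Namespace(**d)


# Two entries taken from the configuration printed above.
cfg_dict = {"skel_conv_rate": 131.6625, "skel_class_exclude": "errors"}
cfg_ns = dict_to_namespace(cfg_dict)

# Values are now read as attributes rather than as dictionary keys.
print(cfg_ns.skel_conv_rate)

# vars() recovers the plain dictionary if it is needed again.
print(vars(cfg_ns) == cfg_dict)
```

Attribute access is what lets the same `main(args, cfg)` function be called either from the command line, where argparse builds the namespace, or from a notebook like this one.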
[3]:
from scripts.skeletons.main_unsupervised_skeleton_estimation import main as unsupervised_skeletonization
?unsupervised_skeletonization
Signature: unsupervised_skeletonization(args, cfg)
Docstring:
Main function for skeleton estimation (body size) in the unsupervised setting.
Parameters
----------
args : argparse.Namespace
Arguments parsed from command line. Namely:
- config_file: path to the configuration file
- input_dir: path to the directory containing the masks
- output_dir: path to the directory where to save the results
- save_masks: path to the directory where to save the masks as jpg
- list_of_files: path to the csv file containing the classification predictions
- v (verbose): whether to print more info
cfg : argparse.Namespace
Arguments parsed from the configuration file.
Returns
-------
None. All is saved to disk at specified locations.
File: d:\mzb-workflow\scripts\skeletons\main_unsupervised_skeleton_estimation.py
Type: function
The script loads the clips, excluding those that the deep learning (DL) classification model predicted to be errors.
🐛 BUG: the path to the folder with the clips classified as errors is currently hardcoded in the script!
In a nutshell, the script uses the configuration parameters provided above to apply a series of morphological operations to the binary mask of each organism's clip. It then thins the mask into one or more segments, connects them, and calculates the longest path through them. This path is the skeleton, and its length should approximate the length of the organism well.
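The final step described above, finding the longest path through an already thinned mask, can be sketched in plain Python. This is a minimal illustration and not the code used by the script: the function names and the toy skeleton are hypothetical, the double sweep is only guaranteed to be exact when the skeleton has no loops, and treating `skel_conv_rate` as a pixels-per-millimetre factor is an assumption.

```python
import heapq
from math import sqrt

# 8-connected neighbourhood: straight steps cost 1 px, diagonal steps sqrt(2) px.
STEPS = [(dr, dc, sqrt(dr * dr + dc * dc))
         for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def distances_from(start, pixels):
    """Dijkstra over skeleton pixels: shortest distance to every reachable pixel."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, (r, c) = heapq.heappop(queue)
        if d > dist[(r, c)]:
            continue  # stale queue entry, a shorter route was already found
        for dr, dc, w in STEPS:
            nxt = (r + dr, c + dc)
            if nxt in pixels and d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(queue, (d + w, nxt))
    return dist


def longest_path_length(pixels):
    """Double sweep: farthest pixel from an arbitrary start, then farthest from that."""
    first = distances_from(min(pixels), pixels)
    end_a = max(first, key=first.get)
    second = distances_from(end_a, pixels)
    return max(second.values())


# Toy skeleton: a 7-pixel horizontal body with a short 2-pixel side branch.
skeleton = {(0, c) for c in range(7)} | {(1, 3), (2, 3)}
length_px = longest_path_length(skeleton)

PX_PER_MM = 131.6625  # assumption: skel_conv_rate is given in pixels per millimetre
length_mm = length_px / PX_PER_MM
print(f"{length_px:.1f} px = {length_mm:.4f} mm")
```

The side branch is ignored because the path between the two ends of the horizontal body is longer than any path that ends on the branch, which is the behaviour one wants when appendages stick out from the body.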
[ ]:
unsupervised_skeletonization(args, cfg)