Using FDS, an open-source tool, to version control your machine learning project fast & easy

FDS, Fast Data Science, is an open-source tool that makes version control for machine learning fast & easy. It combines Git and DVC under one roof, and takes care of code, data, and model versioning.

FDS will help you:

  • Avoid mistakes, by recommending where each file should be tracked, using a smart version control wizard 🧙‍♂️.
  • Automate repetitive tasks, by unifying commands (e.g. git status + dvc status = fds status)
  • Make version control faster, easier to use, and more friendly, by providing a human-centric UX — want to save a new version and push it to your shared remote…
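To make the "unifying commands" idea concrete, here is a minimal Python sketch of running `git status` and `dvc status` together and printing one combined report. It is not how FDS is implemented; the names `run_status` and `run_in_shell` are hypothetical, chosen only for illustration, and the sketch assumes both `git` and `dvc` are installed and on your PATH.

```python
import subprocess
from typing import Callable, List, Sequence

# The two commands the text says `fds status` unifies.
STATUS_COMMANDS: List[List[str]] = [["git", "status"], ["dvc", "status"]]


def run_status(run: Callable[[Sequence[str]], str]) -> str:
    """Run each status command with `run` and join the labelled outputs."""
    sections = []
    for command in STATUS_COMMANDS:
        header = "== " + " ".join(command) + " =="
        sections.append(header + "\n" + run(command).rstrip())
    return "\n\n".join(sections)


def run_in_shell(command: Sequence[str]) -> str:
    """Hypothetical runner: execute the command, return stdout plus stderr."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Requires git and dvc to be installed; run inside a repository.
    print(run_status(run_in_shell))
```

Passing the runner in as an argument keeps the combining logic separate from the shell calls, so it can be tried out with a stand-in function before touching a real repository.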


Exploring data science hypotheses using Git Flow

Data science is a research-driven field, and exploring many solutions to a problem is a core principle. As a project evolves and grows in complexity, we need to compare results and see which approaches are more promising than others. In this process, we need to ensure we don’t lose track of the project’s components or miss critical information. Moreover, we need to be able to reproduce results and manage past experiments so we don't waste time exploring the same hypothesis twice. For this reason, it's necessary to use a structured workflow to explore new experiments.

In…


How many times have you received a raw dataset and performed the same actions to preprocess it? Copied and pasted code from different projects and reused it? For this purpose, the ‘NBprocessing’ Python package was created. It provides many methods, under a variety of classes, that enable the user to explore a dataset, preprocess it, and finally plot insights.

This blog post will go over the following subjects:

  • What is the importance of preprocessing?
  • Package installation
  • Package libraries and utilities
  • Selected usage examples

To see the full usage documentation, click here

What is the importance of preprocessing?


In this blog post, I will explore how to perform transfer learning on a CNN image recognition (VGG-19) model using ‘Google Colab’. The model includes binary classification and multi-class classification of leaf images.

In parts I and II, I showed how to connect ‘Google Colab’ to Google Drive, clone the GitHub repository to it, and load the database to the runtime. We performed preprocessing actions on the database, created a model based on the VGG-19 model, trained it, and fine-tuned the weights. Finally, we saved the model, loaded it, and performed a prediction.

In this section, I will show how…


In this blog post, I will explore how to perform transfer learning on a CNN image recognition (VGG-19) model using ‘Google Colab’. The model includes binary classification and multi-class classification of leaf images.

In part I, I showed how to connect a ‘Google Colab’ notebook to ‘Google Drive’, clone a Git repository to it, load the database to the runtime, and preprocess the database.

In part II, I will show how to create a binary model using VGG-19 as the base model, then train, save, and load the new model, and finally make a prediction. …


In this blog post, I will explore how to perform transfer learning on a CNN image recognition (VGG-19) model using ‘Google Colab’. The model includes binary classification and multi-class classification of leaf images.

What Is ‘Google Colab’?

‘Google Colab’ is a research project created to help with machine learning education and research. It’s a Jupyter notebook environment that requires no setup and runs entirely in the cloud. The main advantage of ‘Google Colab’ is that it provides free GPU usage, which allows the user to train very complex models quickly and easily.

NirBarazida

Data Scientist at DAGsHub leading the advocacy and outreach activity worldwide. We are building the next GitHub for Data Science.
