A Deep Dive Into the Machine Learning Life Cycle

Machine Learning (ML) is changing the way we perceive and interact with data. Conceptualizing and implementing an ML model is a complex process with several distinct phases. The following table describes the stages of the ML life cycle.

ML Phase	Description
Planning	The planning phase defines the scope, success metrics, cost, and feasibility of the application. It includes a cost-benefit analysis and a clear, measurable definition of business success, including model performance metrics such as F1 score or AUC. Once the scope and other parameters are defined, the availability and sources of data and the legal implications of using it are assessed. The expected scalability and robustness of the application are also estimated for a stated period of time.
Data Preparation & Feature Engineering	Once the scope is determined, the data sources are identified and the data is extracted from internal and verified external sources. The extracted data is cleaned by filling in missing values and standardizing formats, and the cleaned data is then checked for quality. Next, the data is transformed into a form suitable for machine learning models (feature engineering), including data augmentation and normalization. After that, data storage, metadata storage, and data versioning are set up. An ETL pipeline is created to ensure a constant stream of data for training the model.
Model Engineering	Once the data pipeline is in place, an appropriate algorithm is selected based on the machine learning approach (supervised or unsupervised learning). The model is then developed, and development-level testing is done to check its results.
Model Evaluation	The developed model is trained, which includes hyperparameter tuning. The trained model is tested with a backtest data set or real-world data to ensure it meets the business success criteria before it is signed off for production. The evaluation results are recorded and versioned to maintain reproducibility. Once the business user signs off on the model, it is packaged.
Model Deployment	A deployment strategy (container-based, in-house app-based, or Jupyter Notebook with Amazon SageMaker) is chosen, and the packaged model is deployed on cloud infrastructure with edge locations or on local infrastructure. APIs are used to access the model's predictions. The model's performance is evaluated in the production environment. The infrastructure must have enough RAM, computing power, and storage for the model to scale.
Model Monitoring	Once the model is deployed, it is continuously monitored for performance and accuracy over time. The data flowing into the model changes continuously, which can cause model degradation. When degradation is detected, the model has to be retrained with a new set of data and redeployed.
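
The success metric set during planning, the sign-off check in evaluation, and the retraining trigger in monitoring are closely linked, and that link can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a prescribed implementation: the function names `f1_score` and `needs_retraining`, the 0.80 threshold, and the sample labels are all hypothetical and chosen for the example.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score (harmonic mean of precision and recall); 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: treat F1 as 0 instead of dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def needs_retraining(
    y_true: Sequence[int], y_pred: Sequence[int], min_f1: float = 0.80
) -> bool:
    """Return True when F1 falls below the threshold agreed during planning.

    The 0.80 default is an assumed business success criterion, not a standard value.
    """
    return f1_score(y_true, y_pred) < min_f1


# Hypothetical labels and predictions, invented for illustration only.
backtest_true = [1, 0, 1, 1, 0, 1, 0, 0]
backtest_pred = [1, 0, 1, 1, 0, 1, 0, 1]      # evaluation before sign-off
production_true = [1, 0, 1, 1, 0, 1, 0, 0]
production_pred = [0, 0, 1, 0, 0, 1, 1, 1]    # later batch after the data has shifted

print("Backtest F1:", round(f1_score(backtest_true, backtest_pred), 2))
print("Sign-off blocked:", needs_retraining(backtest_true, backtest_pred))
print("Production F1:", round(f1_score(production_true, production_pred), 2))
print("Retrain needed:", needs_retraining(production_true, production_pred))
```

Using the same threshold for sign-off and for monitoring keeps the production check tied to the business criterion defined during planning. In practice a library implementation of the metric and a rolling window of recent predictions would likely replace the hand-written pieces here.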

Conclusion

The Machine Learning life cycle turns data into actionable insights, with a trained ML model as its end result. It demands a deep understanding of both the domain and the data involved at every stage. The true potential of ML can be harnessed by mastering the ML life cycle, the business context, and the data, which paves the way for innovation and growth.