Once, during a hackathon, my teammates and I developed a machine learning Python program that predicts the age of crustaceans based on a set of features. It's a fairly basic regression problem.
We started by building the program following a typical workflow:
Feature engineering, data cleaning, data splitting, model stacking, model evaluation and comparison, model selection, and finally hyperparameter tuning.
Since a picture is worth a thousand words, here's the UML (Unified Modeling Language) workflow diagram to summarize it.
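The workflow above can be sketched with scikit-learn. This is a minimal illustration on a synthetic stand-in dataset, since the actual crustacean data and feature names are not shown here; every column name and model choice below is a hypothetical example.

```python
# Sketch of the workflow: splitting, stacking, tuning, evaluation.
# Assumes scikit-learn; the data and column names are made up for illustration.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

# Toy stand-in for the crustacean dataset (features + age target).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "length": rng.uniform(0.1, 1.0, 300),
    "weight": rng.uniform(0.5, 3.0, 300),
})
y = 10 * X["length"] + 2 * X["weight"] + rng.normal(0, 0.5, 300)

# Data splitting: hold out a test set for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model stacking: base learners feed their predictions to a ridge meta-model.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(random_state=0)),
        ("ridge", Ridge()),
    ],
    final_estimator=Ridge(),
)

# Hyperparameter tuning with cross-validation on the training split.
search = GridSearchCV(stack, {"final_estimator__alpha": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)

# Model evaluation on the held-out data.
mae = mean_absolute_error(y_test, search.predict(X_test))
print(f"test MAE: {mae:.3f}")
```

Note that the hyperparameter search only ever sees the training split; the test set is used exactly once, at the end.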
All night long, our score on the public leaderboard was rising step by step, and we were very happy, imagining ourselves securing first place.
At 6 am, the private scores were revealed, our score dropped sharply, and we sadly finished 19th out of 37.
It was a real disappointment.
The phenomenon that occurred is simply called overfitting. Our model was unable to generalize (extrapolate) to new data that may be different.
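Overfitting is easy to reproduce on synthetic data (this is an illustrative example, not the actual hackathon model): give a model enough capacity to memorize the noise in the training set, and its training score looks excellent while its score on held-out data collapses.

```python
# Demonstration of overfitting on synthetic data (assumption: made-up data,
# not the hackathon dataset). A high-degree polynomial memorizes the
# training noise and fails to generalize.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Degree-15 polynomial: far more capacity than 20 training points warrant.
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2: {train_r2:.3f}")
print(f"test  R^2: {test_r2:.3f}")
```

The large gap between the training and test scores is exactly what a high public-leaderboard score followed by a low private score reveals.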
The point is that my teammates and I didn't take care of the Exploratory Data Analysis part. I learned how important this step is, and started reading Medium articles and books, and watching Coursera courses on this extremely important step.
This article is a summary of all the resources that I found very useful.
Coursera:
Exploratory Data Analysis for Machine Learning by IBM
Structuring Machine Learning Projects by DeepLearning.AI
Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization by DeepLearning.AI
Kaggle:
Feature Engineering course
Data Cleaning course