Here is a common real-time inference architecture you can use to deploy your model. For this example I've used AWS SageMaker, but you can use any other service to deploy the endpoint.
There are many possible choices for the asynchronous part of the system. Just for our understanding, I've chosen AWS SNS, SQS and Lambda.
When you deploy an endpoint for a low-latency, high-throughput app, this architecture is quite useful and will help you achieve that goal.
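To make the asynchronous path a bit more concrete, here is a minimal sketch of a Lambda handler that reads messages from SQS and forwards them to a SageMaker endpoint. It assumes the SQS queue is subscribed to the SNS topic, that the payload is JSON text, and that the endpoint is called `my-model-endpoint` (a made-up name). The extra `runtime` argument is only there so the handler can be tried locally with a fake client; it is not something Lambda passes in.

```python
import json

ENDPOINT_NAME = "my-model-endpoint"  # hypothetical endpoint name, replace with yours


def extract_payload(record):
    """Return the original message text from one SQS record.

    If the queue is fed by SNS (without raw message delivery), the SQS body
    is a JSON envelope and the real payload sits in its "Message" field.
    """
    body = record["body"]
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return body
    if isinstance(parsed, dict) and parsed.get("Type") == "Notification" and "Message" in parsed:
        return parsed["Message"]
    return body


def handler(event, context, runtime=None):
    """Lambda entry point: invoke the endpoint once per SQS record."""
    if runtime is None:
        # Imported lazily so the sketch can be exercised without AWS access.
        import boto3
        runtime = boto3.client("sagemaker-runtime")

    predictions = []
    for record in event.get("Records", []):
        payload = extract_payload(record)
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",  # assumes the model accepts JSON
            Body=payload.encode("utf-8"),
        )
        predictions.append(response["Body"].read().decode("utf-8"))
    return {"predictions": predictions}
```

In a real setup you would probably write the predictions somewhere (a database, another queue) instead of returning them, and handle failed records so SQS can retry them.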
Additional features to think about before deploying the model
- auto scaling and rollback strategy
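For the auto scaling part, one possible approach on SageMaker is a target-tracking policy registered through Application Auto Scaling. The sketch below only builds the two request payloads and then sends them with a client you pass in (for example `boto3.client("application-autoscaling")`). The endpoint name, variant name, capacity limits, cooldowns and the target of 100 invocations per instance are all placeholder assumptions, not recommendations.

```python
def build_autoscaling_requests(endpoint_name, variant_name,
                               min_instances=1, max_instances=4,
                               target_invocations_per_instance=100.0):
    """Build the payloads for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds, placeholder value
            "ScaleOutCooldown": 60,   # seconds, placeholder value
        },
    }
    return target, policy


def apply_autoscaling(client, target, policy):
    """Send both requests using an Application Auto Scaling client."""
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

Usage would look like `apply_autoscaling(boto3.client("application-autoscaling"), *build_autoscaling_requests("my-model-endpoint", "AllTraffic"))`, where both names are again made up for illustration.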
There are four different deployment strategies to choose from before you deploy the model:
1. A/B testing (Used to compare the performance of two models and check whether the difference is statistically significant)
2. Blue Green Deployment (Gradually shift traffic to Green (new model) if it is performing better than Blue (current model in production))
3. Canary (Test the new model with only a few users to catch big problems before a full-fledged deployment)
4. Shadow testing (Send the same production traffic to the new model without returning its responses to users, so the user experience is not affected)
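The traffic-splitting idea behind these strategies can be sketched in plain Python, without any AWS calls. In the toy example below, `weights` decides which model answers the user (a 90/10 split for a canary or A/B test, or a gradually changing split for the blue/green shift described above), and `shadow` names a model that receives the same request but whose output is only logged. All function and variable names are made up for illustration, and the shadow call is synchronous here just to keep the sketch short.

```python
import random


def pick_variant(weights, rng=random):
    """Pick a variant name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def serve(request, models, weights, shadow=None, shadow_log=None, rng=random):
    """Answer the request with one weighted variant, optionally mirroring it.

    models:     dict of variant name -> callable that returns a prediction
    weights:    dict of variant name -> traffic weight for user-facing traffic
    shadow:     optional variant name that also receives the request
    shadow_log: optional list collecting (request, shadow output) pairs
    """
    variant = pick_variant(weights, rng)
    response = models[variant](request)

    if shadow is not None:
        # The shadow model sees the same request, but its output is only
        # recorded for later comparison and never returned to the user.
        shadow_output = models[shadow](request)
        if shadow_log is not None:
            shadow_log.append((request, shadow_output))

    return variant, response


# Stand-in models; in practice these would call the deployed endpoints.
models = {
    "blue": lambda r: f"blue:{r}",
    "green": lambda r: f"green:{r}",
}

# Canary: send roughly 10% of user-facing traffic to the new model.
canary_weights = {"blue": 0.9, "green": 0.1}

# Shadow: users only ever see blue, while green receives a copy of each request.
shadow_weights = {"blue": 1.0, "green": 0.0}
```

For a gradual shift you would call `serve` with weights that move step by step from mostly blue to mostly green, and go back to the previous weights if the new model misbehaves.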
Let me know if you need sample code for AWS SageMaker for any of these strategies and I'll share it with you.
Like, share and subscribe to keep me motivated to write more articles like this.
https://x.com/punitvara