AI/ML Classification and Prediction
Data collection is the foundation of the pyramid. At this stage you identify what data you need and what data is available. If the goal is a user-facing product, are all relevant interactions logged? If the data comes from a sensor, what data is coming through, and how? Without data, no machine learning or AI solution can learn or predict outcomes.
Next, identify how data flows through the system. Is there a reliable stream or an established extract, transform, and load (ETL) process? Where is the data stored, and how easy is it to access and analyze?
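The following is a minimal ETL sketch in Python using only the standard library. It assumes a hypothetical CSV extract of sensor readings (the `device_id`, `timestamp`, and `temperature_c` columns are illustrative), drops incomplete rows during the transform step, and loads the result into an in-memory SQLite table that stands in for whatever store your system actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this might come from a log file, an API, or a sensor feed.
RAW_CSV = """device_id,timestamp,temperature_c
sensor-1,2024-01-01T00:00:00,21.5
sensor-1,2024-01-01T00:01:00,
sensor-2,2024-01-01T00:00:00,19.8
"""

def extract(raw_text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: convert types and drop rows with a missing reading."""
    clean = []
    for row in rows:
        if not row["temperature_c"]:
            continue  # skip incomplete records instead of loading bad values
        clean.append((row["device_id"], row["timestamp"], float(row["temperature_c"])))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(device_id TEXT, timestamp TEXT, temperature_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data store
load(transform(extract(RAW_CSV)), conn)
count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # 2
```

Keeping the three steps as separate functions makes it easier to check each one on its own, for example to see how many rows the transform step rejects.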
Exploring and cleaning the data is a time-consuming and often underestimated stage of the data science project life cycle. This is where you typically discover that data is missing, that your machine sensors are unreliable, or that you are not tracking relevant information about customers. You may need to return to data collection and make sure the foundation is solid before moving forward.
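As a rough illustration of this exploration step, the sketch below profiles a small hypothetical set of sensor readings. It counts missing values per field and flags devices whose readings are further apart than an assumed one-minute reporting interval. The field names, the interval, and the choice of checks are assumptions for illustration, not a complete data-quality framework.

```python
from datetime import datetime, timedelta

# Hypothetical readings; None marks a value the sensor failed to report.
readings = [
    {"device_id": "sensor-1", "timestamp": "2024-01-01T00:00:00", "temperature_c": 21.5},
    {"device_id": "sensor-1", "timestamp": "2024-01-01T00:01:00", "temperature_c": None},
    {"device_id": "sensor-1", "timestamp": "2024-01-01T00:05:00", "temperature_c": 21.9},
    {"device_id": "sensor-2", "timestamp": "2024-01-01T00:00:00", "temperature_c": 19.8},
    {"device_id": "sensor-2", "timestamp": "2024-01-01T00:01:00", "temperature_c": 19.9},
]

def missing_counts(rows):
    """Count how many rows are missing a value for each field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                counts[field] = counts.get(field, 0) + 1
    return counts

def devices_with_gaps(rows, expected_interval=timedelta(minutes=1)):
    """Return devices whose consecutive readings are further apart than expected (assumed interval)."""
    by_device = {}
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        by_device.setdefault(row["device_id"], []).append(ts)
    flagged = set()
    for device, times in by_device.items():
        times.sort()
        for earlier, later in zip(times, times[1:]):
            if later - earlier > expected_interval:
                flagged.add(device)
                break
    return flagged

print(missing_counts(readings))     # {'temperature_c': 1}
print(devices_with_gaps(readings))  # {'sensor-1'}
```

Findings like these are the signal to go back to data collection: a sensor that skips readings or a field that is often empty has to be fixed at the source before any model can rely on it.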
After you can reliably explore and clean data, you can start building what is traditionally thought of as business intelligence or analytics: defining key metrics to track, identifying how seasonality affects product sales and operations, segmenting users by demographic factors, and so on.
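A minimal sketch of two of these analyses, using hypothetical order records: total revenue per month (a first look at seasonality) and revenue per age segment. The record layout and the age cut-offs are illustrative assumptions; in practice you would likely do this kind of aggregation in SQL or a dataframe library.

```python
from collections import defaultdict

# Hypothetical order records: (order_date "YYYY-MM-DD", customer_age, amount).
orders = [
    ("2024-01-15", 23, 40.0),
    ("2024-01-20", 41, 60.0),
    ("2024-06-02", 35, 25.0),
    ("2024-12-10", 29, 120.0),
    ("2024-12-18", 52, 80.0),
]

def age_segment(age):
    """Bucket customers into coarse demographic segments (cut-offs are illustrative)."""
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"

def revenue_by_month(rows):
    """Key metric: total revenue per calendar month, a first look at seasonality."""
    totals = defaultdict(float)
    for date, _, amount in rows:
        totals[date[:7]] += amount  # "YYYY-MM"
    return dict(sorted(totals.items()))

def revenue_by_segment(rows):
    """Total revenue per age segment."""
    totals = defaultdict(float)
    for _, age, amount in rows:
        totals[age_segment(age)] += amount
    return dict(totals)

print(revenue_by_month(orders))
# {'2024-01': 100.0, '2024-06': 25.0, '2024-12': 200.0}
print(revenue_by_segment(orders))
# {'<30': 160.0, '30-49': 85.0, '50+': 80.0}
```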
Now is the time to determine:
To avoid real-world disasters, create a framework for A/B testing or experimentation and deploy models incrementally before you use models built on sample data to make real-world predictions. Model validation and experimentation can give a rough estimate of the effects of changes before you implement them. Establish a simple baseline or benchmark to track performance against. For example, if you are building a credit card fraud detection system, assemble test data from known fraudulent (and known legitimate) credit card transactions, then compare your model's results against those known outcomes to verify that it detects fraud accurately.
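The fraud example can be sketched as follows, assuming a small hypothetical labeled test set (1 = known fraud, 0 = legitimate) and hypothetical model predictions. The sketch compares the model with a trivial baseline that labels every transaction as legitimate and reports accuracy, precision, and recall for both.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives; 1 = fraud, 0 = legitimate."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(labels, predictions):
    """Accuracy, precision, and recall, guarding against division by zero."""
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical labeled test set: known fraudulent (1) and legitimate (0) transactions.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical model output for the same transactions.
model_predictions = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
# Simple baseline: predict "legitimate" for every transaction.
baseline_predictions = [0] * len(labels)

print("baseline", evaluate(labels, baseline_predictions))
# baseline {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0}
print("model", evaluate(labels, model_predictions))
# model {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5}
```

With these made-up numbers both approaches reach 0.8 accuracy because fraud is rare in the test set; only recall shows that the model catches some known fraud while the baseline catches none. This is one reason a simple baseline is worth establishing: it shows whether a metric actually reflects the behavior you care about.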
After you reach this stage, you can keep improving processes, predictions, outcomes, and insights by expanding your knowledge of, and experience with, new machine learning and deep learning methods and techniques.