The Real World as a Mathematical Space in Machine Learning
- Sarthak Niwate
- Jan 3, 2020
Updated: Oct 1, 2020
In data science and machine learning, a mathematical space is a virtual, imaginary concept, but it makes machine learning a good deal easier to understand. The data set shown below is clean and pristine.

Each row represents one person, called an entity, together with all of that person's attributes; each entity is stored as one record. The data describes patients' heart health: some people are healthy and some have heart problems. It comes from a single clinic or lab that these people visited. Note that this is historical data.
Your job as a data scientist is to build a model that uses the given parameters to predict whether a person is healthy or not. Essentially, you are trying to find out whether there is a relationship between the attributes (features) in the data and the person's condition, and then use that relationship to predict the condition.
Just by looking at the data, you cannot tell whether such a hidden pattern exists, and you cannot simply write a query against the data set to find it. In such cases, we use machine learning.
One way to understand this is to picture the data as points in a 3-dimensional space. The human mind can visualise three dimensions but not more. Machine learning algorithms, however, work just as easily in 3-D, in 300-D or in even more dimensions, and whatever concepts we discuss in 3-D extend naturally to n dimensions.
The three axes of this cube, sugar, age and BP level, are columns of the given data set. Take any record: for example, the last record (diagram 2 below), marked by the yellow arrow, is a 15-year-old male and shows up as a green ball, whereas the 42-year-old male shows up as a red ball. In the same way, every data point in the data set sits somewhere in this mathematical space.
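To make the idea concrete, here is a minimal Python sketch of how records could be turned into coloured points. It assumes each record is a dictionary with hypothetical field names (`sugar`, `age`, `bp`, `heart_patient`), and the values are invented for illustration rather than taken from the article's data set. Adding names to `FEATURES` adds dimensions, which is how the same code would extend from 3-D to n-D.

```python
# Hypothetical records: the field names and values are made up for illustration,
# not taken from the article's data set.
records = [
    {"sex": "M", "age": 15, "sugar": 90.0, "bp": 110.0, "heart_patient": False},
    {"sex": "M", "age": 42, "sugar": 160.0, "bp": 150.0, "heart_patient": True},
]

# The three axes of the cube; adding more names here adds more dimensions.
FEATURES = ["sugar", "age", "bp"]


def to_point(record, features=FEATURES):
    """Turn one record (entity) into a coordinate tuple in feature space."""
    return tuple(float(record[name]) for name in features)


def to_colour(record):
    """Colour the ball: red for a heart patient, green for a healthy person."""
    return "red" if record["heart_patient"] else "green"


points = [(to_point(r), to_colour(r)) for r in records]
for point, colour in points:
    print(point, colour)
```

Each printed line is one ball: its position in the (sugar, age, BP) space followed by its colour.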

Now, the objective is to find out whether there is any relationship between the three dimensions (sugar, age and BP level) and the colour of the ball. If such a relationship exists and you find it, that relationship is your model: the algorithm finds it, or emits it, for you.
When you run the algorithm (throughout this article I use a linear model as the example), its objective is to find, in this mathematical space, a linear surface that separates the red balls from the green ones. This linear surface is the model. Any data point lying above the surface is likely to be red, because the majority of data points above the surface (the plane) are red, and a red ball represents a heart patient. Any data point lying below the surface is likely to be green, which represents a healthy person.
Keep in mind that this data set is historical: we already know whether those people are red or green. The real question is what happens when a new patient comes to the clinic or lab. Given the values of the three parameters (sugar, BP level and age), which colour is that person likely to be?
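The article does not name the algorithm behind its linear model. As one possible illustration, the sketch below uses a perceptron, a classic algorithm that finds a separating plane when one exists; the four training records are hypothetical. It returns one weight per axis (`w`) and a threshold `d`, and a ball counts as "above" the plane when the weighted sum of its coordinates exceeds `d`. If the balls cannot be separated by a plane, the loop simply stops after `max_epochs`.

```python
# Hypothetical training data: (sugar, age, bp) and the ball's colour,
# +1 = red (heart patient), -1 = green (healthy). Values are invented.
DATA = [
    ((90.0, 15.0, 110.0), -1),
    ((100.0, 25.0, 120.0), -1),
    ((160.0, 42.0, 150.0), 1),
    ((170.0, 55.0, 160.0), 1),
]


def dot(u, v):
    """Weighted sum of coordinates."""
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(data, max_epochs=100):
    """Learn w and d so that w . x > d for red balls and w . x < d for green."""
    n = len(data[0][0])
    # Centre each feature so the plane can pass near the origin, which
    # typically lets the perceptron converge in far fewer updates.
    mean = [sum(x[i] for x, _ in data) / len(data) for i in range(n)]
    centred = [([xi - mi for xi, mi in zip(x, mean)], y) for x, y in data]

    w, d = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in centred:
            if y * (dot(w, x) - d) <= 0:  # wrong side of the plane, or on it
                w = [wi + y * xi for wi, xi in zip(w, x)]
                d -= y
                mistakes += 1
        if mistakes == 0:  # every ball is on its correct side
            break
    # Express the plane in the original units:
    # w . (x - mean) = d  <=>  w . x = d + w . mean
    return w, d + dot(w, mean)


w, d = train_perceptron(DATA)
print("weights:", w, "threshold:", d)
```

Centring the data is a design choice, not a requirement: the learned plane is the same kind of object either way, and the last line of `train_perceptron` shifts it back into raw sugar, age and BP units.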

Prediction: if the ball representing a person lies above the surface, the person is likely to be a heart patient; if it lies below the surface, the person is likely to be healthy.
This particular plane, shown in the diagram, can be expressed as an equation:
ax + by + cz = d (a linear equation)
Here x, y and z stand for the three features (sugar, age and BP level), and a, b, c and d are the coefficients the algorithm learns. The algorithm shares this equation with you, and this linear equation is your model. Now, for the data in diagram 3, should the grey ball be green or red? Is it red because it lies above the plane?
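A minimal sketch of the prediction step, assuming the algorithm has already produced the plane. The coefficients `A`, `B`, `C`, `D` are hypothetical placeholders, and "above" is taken to mean ax + by + cz > d; that sign convention is an assumption, since multiplying every coefficient by -1 describes the same plane with the sides swapped.

```python
# Hypothetical plane coefficients: in practice the training algorithm
# emits these; the numbers here are invented for illustration.
A, B, C, D = 0.5, 0.2, 0.3, 120.0


def side_of_plane(sugar, age, bp):
    """Return a*x + b*y + c*z - d: positive above the plane, negative below."""
    return A * sugar + B * age + C * bp - D


def predict(sugar, age, bp):
    """Colour a new ball by the side of the plane it falls on."""
    score = side_of_plane(sugar, age, bp)
    if score > 0:
        return "red"  # above the plane: likely a heart patient
    if score < 0:
        return "green"  # below the plane: likely healthy
    return "on the plane"  # exactly on the boundary: the model cannot decide


# A hypothetical grey ball (a new patient):
print(predict(150, 50, 140))
```

For this invented patient the score is 75 + 10 + 42 - 120 = 7, which is positive, so the sketch predicts red.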

Suppose the algorithm analyses the data and the ball comes out red. If the person is indeed a heart patient, your algorithm has made the right prediction. If a medical test shows that the person is healthy, so the ball should have been green, your model has made a mistake. Likewise, once the plane is in place, any green balls above it and any red balls below it are mistakes made by the algorithm.
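Counting those mistakes is straightforward once the plane is known. This sketch reuses the hypothetical plane form ax + by + cz = d with invented coefficients and invented labelled points; a ball lying exactly on the plane is treated as below it, which is an arbitrary choice.

```python
# Invented labelled points: ((sugar, age, bp), true colour).
POINTS = [
    ((90.0, 15.0, 110.0), "green"),
    ((160.0, 42.0, 150.0), "red"),
    ((150.0, 60.0, 140.0), "green"),  # healthy, but above the plane
    ((100.0, 30.0, 120.0), "red"),  # heart patient, but below the plane
    ((95.0, 20.0, 115.0), "green"),
]


def count_mistakes(points, a, b, c, d):
    """Count green balls above the plane and red balls below it."""
    mistakes = 0
    for (x, y, z), colour in points:
        above = a * x + b * y + c * z > d  # on the plane counts as below
        if (above and colour == "green") or (not above and colour == "red"):
            mistakes += 1
    return mistakes


errors = count_mistakes(POINTS, 0.5, 0.2, 0.3, 120.0)
accuracy = 1 - errors / len(POINTS)
print(f"{errors} mistakes, accuracy {accuracy:.0%}")
```

With these made-up values, two of the five balls fall on the wrong side, giving 60% accuracy.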
What we need to understand is that when we work with models in the real world, we don't expect them to be 100% accurate. There is always a tolerance level within which we allow the model to make errors. How large an error rate we are willing to tolerate, however, is a difficult question.
All models in the real world make mistakes in prediction, but those errors should stay within an acceptable range, and the tolerance level varies with the domain you are working in. Generally, we look for about 95% accuracy when we are not working in a life-critical domain, and 99.99% or more when we are.
This is the concept behind the mathematical space: it is a virtual concept, but it has a lot to do with machine learning. Once you understand it, you can easily learn the basics of machine learning.
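The rule of thumb above can be written as a small check. The thresholds are the article's rough guide (95% outside life-critical domains, 99.99% inside them), not a standard, and `meets_tolerance` is a hypothetical helper name.

```python
# The article's rough thresholds; real projects set their own.
GENERAL_THRESHOLD = 0.95
LIFE_CRITICAL_THRESHOLD = 0.9999


def meets_tolerance(correct, total, life_critical=False):
    """Return True if accuracy (correct / total) reaches the domain's threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy = correct / total
    threshold = LIFE_CRITICAL_THRESHOLD if life_critical else GENERAL_THRESHOLD
    return accuracy >= threshold


# A model that gets 96 of 100 hypothetical patients right:
print(meets_tolerance(96, 100))  # True
print(meets_tolerance(96, 100, life_critical=True))  # False
```

The same 96% model passes in a general setting but falls short of the life-critical bar.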
- Sarthak Niwate
Sources: www.medium.com
www.greatlearning.in (Diagrams)