Tuesday, April 29, 2014

Mathematics and Machine Learning

In my previous post I spoke about the skills that one needs to have or acquire before stepping into machine learning. One of the foremost skills is mathematics. Math is not difficult. Let me explain this with an analogy. Suppose we go to a good restaurant and order an item from the menu. The food arrives, it looks wonderful, we taste it, and we enjoy it. After that we begin to wonder: how do I recreate this dish with the same taste? For that we need to know the ingredients of the dish and the recipe.
Now machine learning is like that good food, math is the ingredient, and the person who makes the dish is the scientist. Of course, even to become a good critic of different machine learning techniques one needs some expertise, but let me ignore that for the time being.
In the meantime, let me continue with the math that is necessary to understand the language of machines. As I mentioned in my previous post, a sincere student of machine learning should possess at least elementary knowledge of probability and statistics, calculus, and linear algebra. I'll talk about each of these topics in turn.

Calculus
The moment we hear the name calculus, our mind immediately relates it to differentiation and integration. I'll speak briefly about why this is important. Calculus is the field of mathematics that studies functions and how they change: differentiation measures rates of change, and integration measures accumulation. A good understanding gives one the power of intuition. After listening to Andrew Ng's linear regression lecture, I began to understand how important calculus is, because fitting the model comes down to minimizing a cost function, and that is done using derivatives. The following are some of the courses which I followed and loved:

  1. Calculus: Single Variable by R. Ghrist, University of Pennsylvania, on Coursera (clear and lucid explanations, builds a solid foundation)
  2. Calculus One by The Ohio State University, on Coursera
There are many other resources on the web. YouTube, for example, has many playlists, and some of them are really good. I have linked to these Coursera courses because Coursera is like a one-stop shop.
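
To make the link between calculus and linear regression a little more concrete, here is a minimal sketch in Python. It is my own illustration, not code from any of the courses above. The function name fit_line, the parameters lr and steps, and the toy data are all assumptions chosen for the example. It fits a straight line by gradient descent, where each update uses the partial derivatives of the mean squared error.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b (approximately) by gradient descent on mean squared error.

    fit_line, lr and steps are hypothetical names chosen for this sketch.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)
```

On this toy data the fitted slope and intercept should come out very close to 2 and 1. The point is only that the "learning" step is nothing more than differentiation applied repeatedly.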

Probability and Statistics
This is one of the most important topics that every student from any stream of engineering needs to learn. It is not optional at all. Unless one understands probability, the concepts of machine learning will remain a mystery. Here I must also speak about statistics. Probability and statistics are not very different; humans and computers alike understand only statistics (discrete probability). There are some very good resources on the internet, and the following are some of my favourites:
  1. Introduction to Probability by John Tsitsiklis
  2. OpenIntro Statistics
  3. Elements of Statistical Learning
The first link leads to a series of lectures; the second and third links are books that complement the lectures. Again, these are some of the finest materials available online. If someone has good links, kindly post them in a comment and I'll add them as well.

Linear Algebra
Last but not least comes the monk of modern mathematics! This subject is so simple and subtle, and yet so very powerful. I have no words to sing its praise. It is again very useful for anyone who is serious about pursuing a career in data science and machine learning. There may be many materials out there, but this particular series by Prof. Gilbert Strang is simply mind-blowing. I can't remember the last movie that I watched this many times. His lectures are so simple, conveyed so innocently, that you cannot help but fall in love with the subject.
  1. Linear Algebra by Gilbert Strang
I conclude my post here. More details regarding the course plan of action will follow over the next couple of weeks.

2 comments:

  1. Found it good reading mat, but would wait for more detailed write up on the subject. Wishing you all the best in your next assignment. kvr
