Tuesday, April 29, 2014

Mathematics and Machine Learning

In my previous post I spoke about the skills one needs to know or acquire before stepping into machine learning. One of the foremost skills is mathematics. Math is not difficult. Let me explain this with an analogy. Suppose we go to a good restaurant and order an item from the menu. The food arrives, it looks wonderful, we taste it and we enjoy it. After that we begin to think: how do I recreate this dish with the same taste? For that we need to know the ingredients of the dish and the recipe.
Now, machine learning is like that good food, math is the ingredients, and the person who makes the dish is the scientist. Of course, even to become a good critic of different machine learning techniques one needs some expertise, but let me ignore that for the time being.
In the meantime, let me continue with the math that is necessary to understand the language of machines. As I mentioned in my previous post, a sincere student of machine learning should possess at least elementary knowledge of probability and statistics, calculus and linear algebra. I'll talk about each of these topics in detail.

Calculus
The moment we hear the name calculus, our mind immediately relates it to differentiation and integration. I'll speak briefly about why this is important. Calculus is the field of mathematics that deals with functions and their properties: how they change (differentiation) and how they accumulate (integration). A good understanding of it gives one the power of intuition. After listening to Andrew Ng's lecture on linear regression, I began to understand how important calculus is. The following are some of the courses which I followed and loved:

  1. Calculus: Single Variable by R. Ghrist from the University of Pennsylvania (clear and lucid explanations, builds a solid foundation)
  2. Calculus One from The Ohio State University, on Coursera
There are many other resources on the web. YouTube, for example, has lots of playlists, and some of them are really good. I have linked to these Coursera courses because Coursera is like a one-stop shop.
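To make the link between calculus and linear regression concrete, here is a minimal sketch of my own (it is not taken from any of the courses above). It fits a straight line to a few made-up points by gradient descent: the partial derivatives of the mean squared error tell us which way to nudge the slope and the intercept. The data, the function names, the learning rate and the number of steps are all assumptions chosen just for illustration.

```python
# Made-up data that lies exactly on the line y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_gradients(w, b, xs, ys):
    """Partial derivatives of the mean squared error with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Start from w = b = 0 and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = mse_gradients(w, b, xs, ys)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w, b = fit_line(xs, ys)
print("slope:", round(w, 3), "intercept:", round(b, 3))
```

With this data the slope and intercept should end up very close to 2 and 1. The two derivative lines are the only place where calculus appears, and they are what drives the whole fit.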

Probability and Statistics
This is one of the most important topics that every student from any stream of engineering needs to learn. It is not optional. Unless one understands probability, the concepts of machine learning will remain a mystery. Here I must also speak about statistics. Probability and statistics are not very different: humans and computers alike understand only statistics (discrete probability). There are some very good resources on the internet; the following are some of my favourites:
  1. Introduction to Probability by John Tsitsiklis
  2. OpenIntro Statistics
  3. Elements of Statistical Learning
The first link leads to a series of lectures; the second and third links are books that complement the lectures. Again, these are among the finest materials available online. If someone has good links, do kindly post them in a comment and I'll add them as well.
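To illustrate how probability and statistics sit close together, here is a small sketch of my own (not from the materials above). Probability says a fair die shows a six with chance 1/6; statistics estimates that number from observed rolls. The function name, the fixed seed and the trial counts are assumptions made only for this example.

```python
import random


def estimate_probability(event, trials, rng):
    """Estimate P(event) as the fraction of simulated die rolls where it happens."""
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials


rng = random.Random(0)  # fixed seed so that repeated runs give the same rolls
for trials in (100, 10_000, 100_000):
    estimate = estimate_probability(lambda roll: roll == 6, trials, rng)
    print(trials, "rolls -> estimated P(six) =", round(estimate, 4))

print("exact P(six) =", round(1 / 6, 4))
```

As the number of rolls grows, the estimated frequency should settle near the exact value of 1/6.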

Linear Algebra
Last but not least comes the monk of modern mathematics! This subject is so simple and subtle, and yet so very powerful. I have no words to sing its praise. It is again very useful for anyone who is serious about pursuing a career in data science and machine learning. There may be many materials out there, but this particular series by Prof. Gilbert Strang is simply mind-blowing. I can't remember the last movie that I watched this many times. His lectures are so simple, and conveyed so innocently, that you cannot help but fall in love with this subject.
  1. Linear Algebra by Gilbert Strang
I conclude my post here. I will continue with more details on the course plan over the next couple of weeks.

Thursday, April 24, 2014

Logical Progression into Computer Science

I was an undergraduate student of electronics. I had little or, honestly speaking, no idea what it really was that I wanted to do. I was romanticizing the idea that I would be good at anything I chose to do. It was a time when studying electronics was considered glamorous, so I did it. I saw how loads and loads of people were being recruited by some of the IT giants, so obviously I did not want to be a part of the herd! But ironically, I ended up joining one of them as well. My job was good and I was actually training with some of the best minds in my company (CTS), but I was simply too snobbish and blind to see this (I got the proper perspective much later!). I blindly believed I was destined for greater (more sophisticated) jobs. To cut a long story short, after many pitfalls I realized that what I loved doing the most was writing computer programs!

At this point, I knew that I did not have the necessary math skills to excel in the field of computer science. As soon as I realized what it was that I really wanted to do, I started finding out more about how and where to begin. Here I cannot help but express my gratitude to institutes like the IITs, MIT, Berkeley and Stanford for opening their courses online. I remember I was one of the first to register for the courses offered by Coursera. I was so excited when I watched the materials that, for the first time in my life, I regretted not putting in the effort to get into premier institutes like the IITs and MIT.

Even though I thought it was too late, I realized there is no better time than now. I am here to learn. I am going to pursue that which piques my interest. One particular course, Machine Learning, taught by Prof. Andrew Ng, was so fascinating to me that I decided then and there to pursue my career in this field. I had finally found my passion: Machine Learning and Data Analysis. This is a vast field and there can be no single expert; it is all left to the creativity of the student. The following picture, composed by Drew Conway, explains my previous statement.

Essentials of Data Science

Based on my findings, I made a list of things which I realized I needed to learn to become a Data Science expert. Any suggestion contrary to, or on top of, this list is quite welcome in the comments section.
  1. Fundamental math
    • Probability and Statistics (more Statistics, since computers can perform only numerical calculations)
    • Calculus (for understanding and designing algorithms, with applications in computer vision, data mining and numerous other areas)
    • Linear Algebra (this is quite essential for anyone desirous of making serious inroads in the field of machine learning)
  2. Programming language
    • Mastery over a scripting language such as R or MATLAB and an object-oriented language such as Java or C++ is a must. There are many libraries in these languages to perform a lot of machine learning tasks (I'll be discussing more in my upcoming posts).
  3. Computer Science Engineering
    • Analysis and design of algorithms (how to program to perfection)
One very important lesson that my professor, Prof. Soman, once taught me was that not every computer science programmer is a computer science engineer. It had a very deep impact on my perspective on engineering as I had known it. I realized that to have substantive expertise I needed to put in substantial effort in the right direction. In the following post I'll give the links and resources to some of the best materials (in my humble opinion) which I found very useful for learning the above-mentioned topics. The objective of my blog is to present those materials, which I found after considerable effort, more easily to those in search of them.