
Language, Learning and Quizzes - The Power of Guessing

Humans go from guessing to logic through correction. This is how ML learns.

Machine Learning can seem like a very mystical subject. It is hard to understand how a machine can seemingly simulate what is, in some ways, our greatest ability as living beings - learning. The best way to build that understanding is by analogy - and that's what this blog is for. Welcome! This is our first blog, on the philosophy of human learning and how machine learning embodies some part of it.
DISCLAIMER: I’m not an expert on Neuroscience. This article is based on my perceptions as a human and a learner. 


Learning to talk was probably a pain. I definitely don't remember it, but watching children around me makes that pretty clear. Relearning a language for school has only reinforced the impression.

 But how it happens is very interesting. Here’s how it starts:


Someone tells you a couple of rules for the language. Some words are verbs and others are nouns, and if you don’t want to be boring, please use pronouns.
OK, great, now that you know the rules, you’re ready to speak the language, right?


Well, you know it's not that simple. Here's what actually happens next: when your parents next bring you some Similac, you babble a word at them. "Papa" and "Mama" seem to be favorites, but to each their own. Regardless, once your excited parents get over the shock, the first thing they'll do is pronounce the word back at you. In other words, they correct you.

As you watch and hear your parents, you see them pronounce the word in one way. 

At this point, you notice the slight lisp you're adding to the word, or the syllable you're missing. And you respond - better. Not perfect, but definitely better. With repeated exposure, your pronunciation improves dramatically, and eventually you can speak the language - you've finally figured it out! Your teachers will refine your speaking ability over the course of school, but usually without any major changes.


Ok. So now that we’ve had a quick recap of some of your childhood, what does this have to do with Machine Learning?


It's quite like how Supervised Learning works. Supervised Learning is a branch of Machine Learning in which the programmer gives the algorithm both the inputs and the correct answers in the training set. Don't worry if you didn't catch a word of that - you don't need to yet. We'll get to all of it.

Let's first define what you were doing as a kid. You were learning a language - in other words, a system in which, if you provided an input (in this case, the idea you wish to express), the system would produce an output (the words you were going to speak and how to speak them). In Machine Learning, this kind of system is known as a model.
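To make "model" concrete, here's a minimal sketch - the linear form, the parameter names, and the numbers are all illustrative assumptions, not anything from this post. A model is just a function that maps an input to an output:

```python
# A model maps inputs to outputs. Here, a toy linear model:
# predict(x) = w * x + b, with arbitrary placeholder parameters.
def predict(x, w=2.0, b=1.0):
    return w * x + b

print(predict(3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```

The "learning" part, which we'll get to below, is the process of finding good values for parameters like `w` and `b`.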


But how did you come up with this 'model'? You had what engineers call a training dataset. This refers to two separate items: the inputs, and the correct outputs. Your inputs, as we mentioned earlier, are the things to be expressed - but here, they're not yours. Instead, they're your parents'. They'll point to objects (input) around you and say the words (correct output) that express or describe them. Then you guessed. In the case of a human, this tends to be an informed guess, based on the impressions you've already absorbed from your parents or on the innate tendencies of the human body; in Machine Learning, this first guess is truly random - a shot in the dark. This is analogous to your attempt to say the word. Then your parents say the word back to you.
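The two halves of a training dataset, plus the machine's truly random first guess, can be sketched like this (the numbers and the hidden rule behind them are made up purely for illustration):

```python
import random

# Training dataset: paired inputs and correct outputs.
# In the language analogy: objects pointed at (inputs) and
# the correct words for them (correct outputs).
inputs = [1.0, 2.0, 3.0, 4.0]
correct_outputs = [3.0, 5.0, 7.0, 9.0]  # hidden rule: y = 2x + 1

# The machine's first guess at its parameters is random -
# a shot in the dark, unlike a child's informed guess.
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)
print(w, b)
```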


At this point, you notice how different your guess was from theirs. In particular, you notice that they pronounce some things in ways that you hadn't been. Based on how far your guess - and therefore your model as a whole - was from theirs, you put in a proportional effort to fix your pronunciation.

 

This is analogous to how a supervised learning algorithm works. It's all based on the idea of a cost function. This is an incredibly important idea in Machine Learning - it's a standard requirement for this kind of learning algorithm. A cost function basically looks at your training data and compares it to your model. How well is your model doing? How closely do its predictions match the correct answers given in your training dataset?
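As a hedged sketch of what a cost function computes, here's mean squared error for a toy linear model. MSE is just one common choice - the post hasn't committed to any particular cost function yet:

```python
# A cost function compares the model's predictions against the
# correct answers. Mean squared error: the average of
# (prediction - correct answer) squared over the training set.
def cost(w, b, inputs, correct_outputs):
    total = 0.0
    for x, y in zip(inputs, correct_outputs):
        prediction = w * x + b  # the model's guess for this input
        total += (prediction - y) ** 2
    return total / len(inputs)

inputs = [1.0, 2.0, 3.0]
correct_outputs = [3.0, 5.0, 7.0]  # generated by y = 2x + 1

print(cost(2.0, 1.0, inputs, correct_outputs))  # perfect model: 0.0
print(cost(0.0, 0.0, inputs, correct_outputs))  # bad model: large cost
```

A perfect model scores zero; the further the predictions drift from the answers, the larger the number the cost function reports.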

The cost function, called J, condenses the answers to these questions into numbers. In the future, we'll analyze the nuts and bolts of various cost functions, but for now let's leave it as an abstract idea. The key use of J is that by taking measures to reduce its value, we reduce the error of the model. This in turn gives us better and better models, until our models are almost always right - much like how, by bringing your mental model of the language closer to what your parents are saying, you make your speech closer and closer to perfect. This is the basis of most learning algorithms.
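"Taking measures to reduce J" can be sketched as a loop that repeatedly nudges the parameters in the direction that lowers the cost - the error-proportional correction from the parenting analogy. The gradient formulas below assume the mean-squared-error cost and a linear model, and the learning rate and step count are illustrative choices, not prescribed values:

```python
# A minimal sketch of reducing J by gradient steps.
def train(inputs, correct_outputs, w=0.0, b=0.0, lr=0.05, steps=2000):
    n = len(inputs)
    for _ in range(steps):
        dw = db = 0.0
        for x, y in zip(inputs, correct_outputs):
            error = (w * x + b) - y   # how far off this prediction is
            dw += 2 * error * x / n   # gradient of MSE with respect to w
            db += 2 * error / n       # gradient of MSE with respect to b
        w -= lr * dw                  # step downhill: J shrinks
        b -= lr * db
    return w, b

inputs = [1.0, 2.0, 3.0, 4.0]
correct_outputs = [3.0, 5.0, 7.0, 9.0]  # hidden rule: y = 2x + 1
w, b = train(inputs, correct_outputs)
print(round(w, 2), round(b, 2))  # converges near w = 2, b = 1
```

Each pass makes the model's predictions a little less wrong, and the correction applied is proportional to the error - just like the child's proportional effort to fix a pronunciation.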


I was initially disappointed when I found this out. When you hear the glamorous name "Machine Learning", or its broader parent field, "Artificial Intelligence", you expect some kind of brilliant intelligence, some black box, to manifest in the algorithm - not the Calculus and Linear Algebra you actually use to do things like reduce the cost function to its minimum value. All the same, these are still remarkable breakthroughs: setting aside the question of whether AI can ever really simulate an intelligence, Machine Learning displays a lot of the useful effects of having one.
