Machine Learning can seem like a very mystical subject. How a machine can simulate what is, in some ways, our greatest ability as living beings - learning - is difficult to understand. The best way to demystify it is by analogy, and that's what this blog is for. Welcome! This is our first post, on the philosophy of human learning and how machine learning embodies some part of it.
DISCLAIMER: I’m not an expert on Neuroscience. This article is based on my perceptions as a human and a learner.
Learning to talk was probably a pain. I definitely don’t remember it, but watching children around me makes it pretty clear. Relearning a language for school has only accentuated that experience.
But how it happens is very interesting. Here’s how it starts:
Someone tells you a couple of rules for the language. Some words are verbs and others are nouns, and if you don’t want to be boring, please use pronouns.
OK, great, now that you know the rules, you’re ready to speak the language, right?
Well, you know it’s not that simple. Here’s what happens next: when your parents bring you some Similac, you babble a word at them. “Papa” and “Mama” seem to be favorites, but to each his or her own. Regardless, once your excited parents get over the shock, the first thing they’ll do is pronounce the word back at you. In other words, they correct you.
As you watch and hear your parents, you see them pronounce the word in one way.
At this point, you notice the slight lisp you’re adding to the word, or the syllable you’re missing. And you respond, better - not perfect, but definitely better. With repeated exposure, though, your pronunciation improves dramatically. And eventually, you can speak the language - you’ve finally figured it out! Your teachers will refine your speaking ability over the course of school, but usually without any major changes.
Ok. So now that we’ve had a quick recap of some of your childhood, what does this have to do with Machine Learning?
It’s quite like how Supervised Learning works. Supervised Learning is a branch of Machine Learning in which the programmer gives the algorithm both the input information and the right answers in the training set. Don’t worry if you didn’t get a word of that - you really don’t need to yet. We’ll get to all of it.
Let’s first define what you were doing as a kid. You were learning a language - in other words, a system in which, if you provided an input (in this case, the idea that you wish to express), the system would provide an output (the words you were going to speak and how to speak them). In Machine Learning, this kind of system is known as a model.
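If you like seeing ideas in code, here’s a minimal sketch of what a “model” means - just a function that turns an input into an output. The linear form and the numbers here are purely illustrative, not anything your brain literally computes:

```python
# A "model" is just a function: an input goes in, a prediction comes out.
# This hypothetical model is linear, with two parameters: a weight w and a bias b.
def model(x, w, b):
    """Predict an output for input x using parameters w and b."""
    return w * x + b

# With w = 3 and b = 1, the input 2.0 maps to the output 7.0.
print(model(2.0, 3.0, 1.0))  # 7.0
```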
But how did you come up with this ‘model’? You had what engineers call a training dataset. This refers to two separate items: the inputs, and the correct outputs. Your inputs, as we mentioned earlier, are the things to be expressed - but here, they’re not yours. Instead, they’re your parents’. They’ll point to objects (inputs) around you, and say the words (correct outputs) that correctly express or describe them. Then you guessed. In the case of a human, this tends to be an informed guess, based on the impressions you’ve already learned from your parents or the innate tendencies of the human body, but with Machine Learning, this first guess is truly random - a shot in the dark. This is analogous to your attempt to say the word. Then your parents say the word back to you.
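The same idea in code: a training dataset is just inputs paired with their correct outputs, and the machine’s first guess at its model is random. The “true rule” below is made up purely for illustration:

```python
import random

# Hypothetical training set: inputs paired with their correct outputs.
# The hidden "true rule" here happens to be y = 2x, but the learner doesn't know that.
inputs = [1.0, 2.0, 3.0, 4.0]
correct_outputs = [2.0, 4.0, 6.0, 8.0]

# Unlike a child's informed guess, the machine's first guess at its
# parameter is a shot in the dark: a random number.
w = random.uniform(-1.0, 1.0)
first_predictions = [w * x for x in inputs]  # almost certainly wrong at first
```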
At this point, you notice how different your guess was from theirs. In particular, you notice that they pronounce some things in ways that you hadn’t been. Based on how far your guess - and therefore your model as a whole - was from theirs, you put a proportional effort into fixing your pronunciation.
This is analogous to how a supervised learning algorithm works. It’s all based on the idea of a cost function. This is an incredibly important idea in Machine Learning - it’s a standard requirement for this kind of learning algorithm. A cost function basically just looks at your training data and compares it to your model. How well is your model doing? How closely do its predictions match the correct answers given in your training dataset?
The cost function, called J, solidifies the answers to these questions into numbers. In the future, we’ll analyze the nuts and bolts of various cost functions, but for now let’s leave it as an abstract idea. The key use of J is that by taking measures to reduce its value, we reduce the error of the model. This in turn gives us better and better models, until our models are almost always right - much like how, by bringing your mental model of the language closer and closer to what your parents are saying, you make your speech closer and closer to perfect. This is the basis of most learning algorithms.
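As a concrete (if simplified) example, one common choice of J is the mean squared error - the average of the squared gaps between the model’s predictions and the correct answers. The numbers below are invented just to show that a worse model gets a bigger cost:

```python
def cost_J(predictions, correct_outputs):
    """Mean squared error: the average squared gap between model and truth."""
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, correct_outputs)) / n

# A model whose predictions are far from the correct answers has a high cost...
print(cost_J([0.0, 0.0], [2.0, 4.0]))  # 10.0
# ...while a model whose predictions are close has a low cost.
print(cost_J([1.9, 4.1], [2.0, 4.0]))  # roughly 0.01
```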
I was initially disappointed when I found this out. When you hear the glamorous name “Machine Learning”, or its parent field, “Artificial Intelligence”, you expect some kind of brilliant intelligence, some kind of black box, to manifest in the algorithm - not Calculus and Linear Algebra, which are what you actually use to reduce the cost function to its minimum value. All the same, these are still remarkable breakthroughs, because ignoring the question of whether AI can ever really simulate an intelligence, Machine Learning displays a lot of the useful effects of having one.
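To make that Calculus remark concrete, here’s a tiny sketch of gradient descent - the standard way to nudge a model’s parameters downhill on the cost function. Everything here (the one-parameter model, the data, the learning rate) is illustrative:

```python
# Gradient descent on a one-parameter linear model y = w * x.
inputs = [1.0, 2.0, 3.0]
correct_outputs = [2.0, 4.0, 6.0]  # the hidden rule is y = 2x

w = 0.0              # start from a deliberately bad guess
learning_rate = 0.05

for step in range(200):
    # Derivative of the mean squared error with respect to w (the Calculus part).
    grad = sum(2 * (w * x - t) * x for x, t in zip(inputs, correct_outputs)) / len(inputs)
    w -= learning_rate * grad  # step downhill on the cost surface

print(round(w, 3))  # converges toward 2.0
```

Each pass, the gradient tells the algorithm which direction (and roughly how far) to move w to shrink the cost - much like putting effort into your pronunciation in proportion to how far off you were.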