The Rabbit Hole Podcast

Welcome to The Rabbit Hole, the definitive developers podcast. If you are a software developer or technology leader looking to stay on top of the latest news in the software development world, or just want to learn actionable tactics to improve your day-to-day job performance, this podcast is for you.

60. Machine Learning II With Tom Benham

On today’s episode of The Rabbit Hole we welcome back Tom Benham to continue our exploration of machine learning. After the broad introduction of part one, we’ll be getting into a bit more of the specifics and addressing some examples to help explain the concepts. First we categorize some of the differences between machine learning and deep learning, and between supervised and unsupervised algorithms. Tom helps us picture some of the current and possible applications for these systems. From there we compare the skillsets of humans and machines and get to grips with which tasks suit which intelligence. We also look at the short-term future and imagine more self-driving cars, which do not seem far away. Tom also unpacks some of the hindrances and weaknesses facing the field, most notably data biases and how these can be approached. Lastly we look at the recent scandal involving Facebook and Cambridge Analytica as an example of the dark side of AI and data.

Key Points From This Episode:

  • Defining the differences between machine learning and deep learning.
  • What are the fundamental reasons to pursue machine learning?
  • Applications for machine learning from insurance to photographs.
  • The two different groups of algorithms within machine learning.
  • Understanding supervised learning and what it can be used for.
  • Some examples of the areas in which humans are currently superior to machines.
  • Imagining the next five years of developments in machine learning.
  • Data biases and other negatives facing machine learning models.
  • Looking at the example of Cambridge Analytica and Facebook.
  • Understanding unsupervised machine learning algorithms.
  • And much more!

Transcript for Episode 60. Machine Learning II with Tom Benham

[0:00:01.9] MN: Hello and welcome to The Rabbit Hole, the definitive developer’s podcast in fantabulous Chelsea Manhattan. I’m your host, Michael Nunez. Our co-host today.

[0:00:09.8] DA: Dave Anderson.

[0:00:10.8] MN: Our producer.

[0:00:12.0] WJ: William Jeffries.

[0:00:13.7] MN: Today, it’s Machine Learning 2, the electric boogaloo.

[0:00:17.7] DA: My gosh, what does that mean?

[0:00:19.4] MN: It’s going down.

[0:00:21.0] DA: It’s the sequel.

[0:00:20.7] MN: You went with it.

[0:00:22.0] DA: Yeah, I really went with it guys.

[0:00:24.3] WJ: We’ve been listening to too much vinyl.

[0:00:27.1] MN: We had a previous conversation on machine learning with Tom Benham and I’d like to introduce once again, Tom Benham, welcome to the show.

[0:00:34.9] TB: Thanks for having me back.

[0:00:36.6] MN: Awesome, glad to have you here and joining us in the electric boogaloo of machine learning.

We’re going to dive in a little deeper on the conversation of machine learning, it seems that everyone in the community loves machine learning and we have an expert here in the room. We can just jump into that, does anyone have any starting conversations or questions that they want to ask Tom now that he’s here?

[0:01:04.3] DA: Yeah, I thought last time we like kind of really scratched the surface pretty well, but it was just a scratch, so I guess we go a little bit deeper into some specific kinds of machine learning maybe.

[0:01:15.7] TB: Yeah. We were talking a little bit before we started, but I think in terms of scope, in the interest of getting into areas of a little more depth, we’ll try and stay focused on what we’ll define as machine learning versus, say, deep learning, which is just part of the broader AI set of algorithms. We can make some distinction there first and then maybe dive into the different areas of what machine learning is. Maybe put it within a more practical context.

[0:01:47.4] DA: Right, you got to save something for the end of the trilogy.

[0:01:50.5] MN: Yeah, exactly.

[0:01:52.9] WJ: Do you want to define those terms for us? Machine learning versus deep learning?

[0:01:57.7] TB: I’ll do my best. Machine learning has more potential transparency if you like and it’s more of a hands on user driven set of algorithms versus deep learning which is more self-learning algorithms using neural networks and algorithms within that space.

There’s much less transparency and more self-learning within the deep learning neural net space. With machine learning, it’s more hands-on, defined algorithms that the modeler or coder or whoever is iterating on in a more transparent way.

[0:02:35.0] DA: Right. I guess what we are kind of driving at is like, how does one define features in a neural net versus a deep learning?

Sorry, like a neural net, like with deep learning, versus a more traditional machine learning thing like a regression.

[0:02:55.1] TB: Yeah, that’s exactly right, that’s the appropriate terminology. Feature extraction, definition, tuning is more of an engineering exercise, whereas one of the reasons deep learning I think now is much more popular is both because it has a higher degree of accuracy and because it requires less hands-on tuning.

The flip side of that though is that the features within deep learning tend to be extremely high volume and extremely low transparency.

[0:03:27.1] DA: Yeah, hard to interpret what it is.

[0:03:29.3] TB: Impossible, yeah. So, you know, all you know is that it’s sort of performing and working, and we don’t know why it’s accurate or not.

[0:03:35.4] DA: Yeah, it kind of turns things on their head a little bit with deep learning. I’ve seen some really good explanations of how a neural net is working and how it breaks down, especially with some of Google’s demonstrations of, like, TensorFlow, how the sandbox works.

But with more traditional machine learning methods, the expertise is really upfront, where you need the expertise in the domain in order to understand it, instead of expertise in the neural net to explain why it’s working.

[0:04:11.1] TB: Yeah, I think it may make sense if we stay within the machine learning space and the subcategories within that and then as you said, I can – gives me an excuse to come back again and we can do the trilogy like you said.

[0:04:25.2] DA: Yeah, cool, yeah. What are some examples of some traditional machine learning?

[0:04:31.4] TB: Algorithms.

[0:04:32.6] DA: Yeah, algorithms.

[0:04:33.9] TB: Do you want to start maybe in talking about like what’s the motivation around machine learning? And even, you know, this is true for the deep learning as well but yeah, sure.

To give people context on why you're doing this. I mean, obviously there’s sort of technical aspects to this which are fun and geeky and interesting to engage with, but at the end of the day, machine learning is about enabling decision making through machines as opposed to human beings, right?

We take in information all the time, visual, audio, sensory information and we respond to that in some way based on aspects of things that we’re not going to get into today.

[0:05:12.0] DA: Right. Kind of like the CIA title of like an analyst where they’re just like, taking in all the information and synthesizing it in their head.

[0:05:20.6] TB: Well, I’m just thinking more about us generally as human beings walking around the street: we take in information and we process it and we react to it. Machine learning is about doing that same thing, but we’re doing that through machines. If you think about examples, they’re going to be things that we deal with at work, for example.

What’s the probability of someone defaulting on a loan? Or, I don’t know if you deposit checks into an ATM at all, but the machine is interpreting the text on that to translate it into a number, for example. Or passing photos over to Google, and Google categorizing which person a photo belongs to, which people are in that photo.

They’re all examples of what you talked about, definitely. Whether it’s a dog or a blueberry muffin.

[0:06:13.5] DA: Yeah, really challenging problems.

[0:06:15.5] TB: Yeah, there’s millions of people who deposit these checks and upload their photos to Google Photos, I’m definitely one of them and being able to take that information and then learning how to sort it to an individual. Google then hones its algorithm much better so that it can identify sooner or later, the difference between a blueberry muffin and a Chihuahua.

[0:06:41.9] DA: Yeah.

[0:06:43.4] TB: Yeah, machine learning is about enabling machines to make decisions using probabilistic algorithms and statistics and training machines to make its best judgment on, is that a two written on that paper or is that an eight or a five or is that Nuñez in that photo or is it William or David.

It's enabling that type of decision making using algorithms. I mean, again, that’s true of deep learning as well as machine learning. That’s the sort of motivation that maybe makes more sense to put the discussion in context, and then we can get into the different categories.

[0:07:21.1] DA: Yeah. I guess you brought up one example of machine learning, like determining someone’s creditworthiness or likeliness to default on a loan. That sounds like something that you would want a lot of transparency into. You’d want to understand exactly why a person may not be considered a safe bet in case you need to explain it to someone later.

[0:07:46.6] TB: Yeah, a lot of financial decisions can be based on those types of information, and the better you are at interpreting what drives someone to be a good or bad creditor, you know, the less bad debt you're going to have on your balance sheet, basically.

And better decisions you’re going to make about lending money.

[0:08:06.0] MN: Yeah, I imagine that like, there’s like money implication, right? Because you want, banks want to be able to give out a loan at a particular interest rate that will best interest the bank and the – but also the individual to pay that back.

If Google messes up on a particular identification of a photo one time, it will learn from that afterwards, but if there was a big loan and then that person defaulted, there could potentially be money lost.

There’s a lot more to lose in a loan, that gets defaulted than like a photo that gets misidentified.

[0:08:39.3] DA: Right, even like that person you know, on the other side of the money equation, it’s like that person’s life like, maybe if they didn’t have that loan, then you know, they would have like had to find some other way or not spend the money in that way.

[0:08:53.4] TB: Right.

[0:08:54.3] DA: They wouldn’t have to go through that process.

[0:08:56.1] TB: Right. We’ve kind of set up some of the motivations behind why you might use machine learning. I mean, lots of other examples are out there: making decisions about what ads to push to which people, and what’s the probability of them liking a given product if you put it in front of them, and all that kind of stuff.

[0:09:14.4] DA: What kind of algorithm might be good to use with, you know, one of the applications we were talking about earlier? Like the credit default application?

[0:09:27.7] TB: Generally, in that case, we – you know, more complex models, you’re going to use a classification model so maybe that’s a good segue into some of the different types of models within machine learning.

Maybe I’ll start up high and we can work our way into specific model for that particular use case.

Generally, within machine learning, got two different algorithm groups if you like, supervised learning and unsupervised learning. Maybe we’ll start with this supervised learning.

Supervised then splits into regression models or –

[0:10:01.4] DA: Just to take a step back maybe, what’s the difference between supervised and unsupervised?

[0:10:05.6] TB: Yeah, I was going to get to the supervised. But the fundamental difference is that with supervised models, you have a dependent variable. What that means is that you have a set of data where you know the inputs and you know the result related to those inputs. To pick an example, facial recognition could be one where you have a set of photos where you know someone is angry, okay?

The dependent variable is an angry expression. Maybe that’s not the best example, but you have a set of photos where you know the label that you're trying to predict.

[0:10:43.7] DA: Or I think a common one is maybe like with house prices, where you have some features of the house, the market it’s in and whatever, and then the predicted, the dependent variable is the price of the house.

[0:11:00.7] TB: I mean, supervised learning is any data set where you effectively know, you have a data set where the inputs, you can connect the inputs with the result or the outcome, the dependent variable and the supervised – I’m sorry an unsupervised data set is one where you have a data set and you’re just trying to understand latent structure, you don’t have a known dependent variable that you can then tune your algorithm on or model your algorithm on.

[0:11:27.4] DA: You just have to accept whatever it comes out with?

[0:11:30.6] TB: Yeah, that’s more what we call clustering, you’ll have – you’re clustering the data set into groups but you don’t know what the label of those groups are until you’ve actually run through the algorithm.

One example of that is natural language processing, topic modeling, I think we’ve touched on that last time.

You might take a set of like 5,000 articles from New York Times or something and you can process them through certain algorithms and it will start to automatically group them into different clusters based on the features inherent in that data set.

[0:12:03.5] DA: It’s up to some user to inspect and be like, okay, this is the topic about, you know, the stock market.

[0:12:10.4] TB: This is about sports or this is about real estate or this is about economics or whatever the topic’s going to be.
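To make the clustering idea concrete, here is a minimal sketch of grouping unlabeled text by word overlap. It is a greedy stand-in chosen for brevity, not a real topic model, and the headlines, the `threshold` value and the function names are all made up for illustration.

```python
def tokens(text):
    """Lowercase word set for a headline, ignoring very short words."""
    return {w for w in text.lower().split() if len(w) > 3}


def jaccard(a, b):
    """Overlap between two word sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_overlap(docs, threshold=0.2):
    """Greedily place each doc in the first cluster whose seed doc it resembles."""
    clusters = []  # each cluster is a list of doc indices; the first is the seed
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(tokens(doc), tokens(docs[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


# Hypothetical headlines; note that no topic labels are supplied anywhere.
headlines = [
    "stock market rallies as tech shares climb",
    "home team wins championship game in overtime",
    "tech shares lead stock market higher",
    "championship game goes to overtime again",
]
clusters = group_by_overlap(headlines)
print(clusters)
```

The algorithm only returns groups of indices. Deciding that one group is "the stock market" and the other is "sports" is still left to a person inspecting the result.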

[0:12:16.8] DA: Cool. Do you want to drill a little bit more into supervised learning?

[0:12:21.7] TB: Yes, with supervised learning, there’s two groups: regression, which is really looking at sort of numeric predictors, and classification, which is sort of categorizing the data into different groups.

The regression is your typical sort of intercept-gradient equation. If you think of a very two-dimensional data set, you know, you’ve got the X and Y axes and then try to work out what is the curve that best predicts the outcome. For a given X, what’s the best curve that predicts Y, basically.

[0:12:52.8] DA: Right, your classic, like, Y equals MX plus B, slope-intercept, in the simplest case.
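As a rough illustration of that simplest case, the sketch below fits a slope and intercept to a handful of points with ordinary least squares. The data points are invented and chosen to lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Made-up points that lie exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
print(f"slope={m:.2f} intercept={b:.2f}")
```

Real data will not sit exactly on a line, in which case the same formulas give the line that minimises the squared vertical distances to the points.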

[0:12:59.6] TB: That can be broken out to be much more multidimensional, and you can add in sort of terms to the power of some factor within that to create curvature within the curve, and you can put different penalties on those models to try and restrict your bias, various tradeoffs that you make building those out.
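The "penalties" mentioned here are commonly called regularization. As one possible reading, the sketch below assumes a ridge (L2) penalty on the slope of a one-feature line and shows how a larger penalty pulls the slope toward zero. The `penalty` values and data points are invented for illustration.

```python
def fit_ridge_line(xs, ys, penalty):
    """Fit y = m*x + b, minimising squared error plus penalty * m**2.

    The intercept b is left unpenalised, which is a common convention.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / (sxx + penalty)  # the penalty inflates the denominator
    b = mean_y - m * mean_x
    return m, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1

plain_m, plain_b = fit_ridge_line(xs, ys, penalty=0.0)
shrunk_m, shrunk_b = fit_ridge_line(xs, ys, penalty=5.0)
print(plain_m, shrunk_m)
```

With no penalty the fit recovers the true slope. With the penalty the model gives up some accuracy on this data in exchange for a flatter, less reactive line, which is the kind of tradeoff being described.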

That’s the numeric, and then with classification, you’re trying to categorize the results into different groups. The classic data set is the iris dataset. If you go online, you can go to Kaggle, or you can get it in R and in certain scikit libraries you can download, and that’s basically a data set for categorizing different types of iris flowers, I think there’s three different types, into which type they are based on petal length and width and the stem length and width or whatever within the flower. That’s a very –

[0:13:56.6] DA: From some, like, numerical values, you end up getting some qualitative thing about that flower?

[0:14:04.3] TB: In that classification example, yeah.

[0:14:06.8] DA: Yeah, cool.
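A minimal sketch of that kind of classification is a one-nearest-neighbor rule: a new flower gets the species of the most similar labeled flower. The handful of petal measurements below are illustrative values in the style of the iris data, not the dataset itself (which records sepal and petal length and width for three species).

```python
import math

# (petal_length_cm, petal_width_cm) paired with a species label.
# Illustrative values only, not the full dataset.
labeled = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.8, 2.2), "virginica"),
]


def classify(sample, examples):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    nearest = min(examples, key=lambda ex: math.dist(sample, ex[0]))
    return nearest[1]


prediction = classify((4.4, 1.3), labeled)
print(prediction)
```

The inputs are numbers and the output is a category, which is the distinction between regression and classification being drawn in the conversation.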

[0:14:07.6] TB: The other thing I was going to touch on actually, the interesting – or the connection between humans making the difference I should say between humans making decisions and machines making decisions is that machine learning can consume all of these different sensory inputs, you know, can be visual, it can be audio, it can be raw data and sensory, less so at this stage.

The route is always the same. If you take in audio information, you’re still going to convert that to some regular or standardized format, generally sort of two-dimensional matrices, then you’re going to run your algorithms over that. That’s the same with photos or other visual input and the same with raw data coming in.

[0:14:56.3] DA: Right, you got to do the extra leg work of building the feature out of your picture.

[0:15:03.8] TB: You’re going to take some information like a photo, you’re going to deconstruct that and convert it to some sort of matrix view, and then you’re going to consume it within your –

[0:15:14.4] DA: There’s the classic MNIST handwriting samples, which are just like perfect 64x64 pixel images of different sloppy handwritings, and in order to work with it, you need to put it all in a row and, you know, turn it into greyscale and add some kind of a number that the computer can do some math with.

[0:15:36.8] TB: You got thousands and thousands of sixes and you start to iterate on exactly what a six looks like when it’s converted into a, what is it, 26x26 pixel view, whatever that turns out to be.

[0:15:50.7] DA: Right, yeah. That’s like another classification where it’s supervised, because we have a label on the data set: we know that this is a six, and we have a whole bunch of them, so we can learn what a six looks like, and we’re processing the data like we were talking about. Feature engineering.

[0:16:11.1] TB: Exactly.
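The "put it all in a row" step can be sketched in a few lines. The 3x3 image below is a made-up stand-in for a real handwriting sample (the published MNIST images are 28x28), and scaling pixel values into the range 0 to 1 is an assumed but common choice.

```python
def to_feature_row(image, max_value=255):
    """Flatten a 2D grayscale image (a list of rows) into one row of floats in [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]


# A tiny made-up grayscale image of a vertical stroke: 0 is black, 255 is white.
tiny_image = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]
row = to_feature_row(tiny_image)
print(len(row), row)
```

Each image becomes one row of numbers, and stacking thousands of such rows next to their known labels gives the labeled matrix that a supervised algorithm is trained on.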

[0:16:12.2] MN: I have a question. Studying machine learning, what is something that humans can do better than machine learning if that’s even possible?

[0:16:21.8] TB: You know, I think there’s lots of things right now where that’s totally true. The discussion right now around machine learning or AI is that AI could be extremely powerful, superhuman, better than humans, at very specific things. That might be –

[0:16:40.9] DA: Like autism spectrum, good, that’s one thing.

[0:16:46.2] MN: Yeah.

[0:16:48.5] TB: Yeah. That could be playing Go, it could be playing chess or it could be –

[0:16:53.8] DA: Or identifying handwriting. Really good, finding sloppy handwriting?

[0:16:59.5] TB: Yeah, definitely machines are reaching human-level error in those sort of basic functions now as well, like classifying photos. They can do much higher volume tasks, even if the accuracy or performance is not better than humans, like classifying photos into different groups and whatever.

Generally, they can be trained to be really good at one thing, as opposed to the broader – what’s called general intelligence, which is taking some level of intelligence like we have as human beings and being able to apply it in multiple domains. Machines can be trained to be extremely good in one domain but not really traverse.

That same single AI can’t then traverse multiple domains. And the other part of that is they’re clearly not better at everything: good in isolation but not as good in general. I mean, in terms of human emotion and empathy, they just don’t have that capability, and I think just navigating the actual three-dimensional environment, robots just don’t have that capability yet, and I am sure there’s millions of other examples where humans are far better than machines still.

[0:18:14.2] DA: Yeah, there are a lot of hilarious videos where people are messing about, kicking over robots there.

[0:18:18.8] MN: Oh don’t do it to us, that’s how it starts. We’ll pay for that in the future I guess.

Where do you see machine learning affecting humans within the next five years? Do you think that self-driving cars, there will be 50% more self-driving cars in five years or what are some tasks that we’ll see a huge boom in machine learning in the next five years?

[0:18:45.8] TB: I mean yeah, that’s probably one.

[0:18:47.5] MN: Yeah, self-driving cars would be pretty amazing.

[0:18:50.4] TB: I mean, they’ve already had a few deaths around that, and one recently, but it doesn’t seem to be slowing things down much at all.

There’s a lot of automation going on in the financial services industry. People don’t really hear a lot about that yet, but I mean it is already true within high-speed trading on the stock exchange and in other markets. I mean, machines have basically taken over the stock market.

[0:19:16.1] MN: Just like applications like Betterment and Smart Investments and stuff like that that automatically does trading for you on a day to day basis, something like that.

[0:19:24.4] TB: Yeah, so it is already I think true within the health care industry as well that machines, machine learning, deep learning can be used in ways that is classifying medical diagnoses in a more accurate way than doctors are in certain areas based both from a visual perspective and also just sort of more multi-dimensional inputs but –

[0:19:44.4] DA: Yeah. I like the idea that although you really want to have the final say on something like if you have cancer or not to be deferred to a human because we’ve had the most experience with that and we have a lot of trust in humans. I like the idea about machine learning making people’s lives a lot easier. I had a buddy who actually did that. He looked at skin cancer slides and that was his job. He’d spend nine hours a day looking at these slides and it’s like, “Oh not cancer, not cancer, not cancer.”

[0:20:19.6] MN: Yeah, even after a machine looked at it or?

[0:20:22.2] DA: No, that was his job, yeah.

[0:20:25.1] MN: Oh okay.

[0:20:25.8] DA: Yeah, I like the idea that maybe some of those obvious not-cancers could be screened out, and then if there is like a maybe, I think you ought to get a dude in there, or a woman, to check it out.

[0:20:38.6] TB: Yeah, and I think it is going to be across industries, but I think definitely cars are going to be a big thing, with the autonomous driving. But it’s prevalent now already and it’s just going to continue to be. I mean, certainly, like I said, financial services, health care or whatever industry.

[0:20:54.9] DA: Or even like subtle ways, I think that’s going to be the most interesting thing where maybe you won’t have a car that will drive itself but you could have a car that has braking assist feature or it has like a traffic driving feature and you won’t even think about that as being like self-driving but like say, “Oh this is a really nice and cool thing I have.”

[0:21:18.5] TB: Well that’s already there as well.

[0:21:20.8] DA: Yeah, exactly.

[0:21:21.7] MN: Yeah, I am waiting for the self-driving car that I own, that picks me up, takes me wherever I go and then it parks itself like I have to not worry about my car at all. I think that is the ideal use of automation in the car, for me at least.

[0:21:40.3] DA: Yeah, save life.

[0:21:42.6] WJ: At that point you don’t really need to own the car. It’s just Uber.

[0:21:45.2] TB: Exactly.

[0:21:46.3] MN: Yeah but then they’ve got to throw a fee on it and I don’t know, I just feel like if I owned it then it would probably be cheaper, I am not sure. I’m not there yet, that’s future Mike’s problem.

[0:21:56.4] WJ: But if it’s just your car and no one else is using it then you have nobody to share the cost of the car.

[0:22:01.3] MN: That’s true.

[0:22:02.1] WJ: Whereas if it’s crowdsourced, there are lots of people using that car whenever you don’t need it.

[0:22:07.4] MN: Or I can start a company that does that and then make money. I don’t know this is again –

[0:22:11.8] DA: I will call it Uber.

[0:22:15.0] MN: Yeah, Mal-Uber, it’s the machine learning Uber there you go. I think machine learning is definitely a concept that will continue to grow within the next five years. I don’t want to say I am nervous but I am really anxious to know what are some of the things that will happen because a lot of good things will happen out of this like you know, being able to analyze things that doctors may not be able to see because they’ve had to deal with a very long procedure or surgery or whatnot that we can hand it off to machines that could then read charts and dig in to some of the maybe’s like Dave mentioned to find out whether it is skin cancer or not.

And I mean I don’t know, it’s just really exciting. I can see why people are excited about it.

[0:23:01.8] DA: I think we can talk a little bit about some of the potential pitfalls with machine learning too especially with classification. You always have to worry about your error rates if you are doing a regression like you are predicting the price of a house then you have to worry about how far off you are and know what kind of errors are prevalent in the system that you’ve trained but it is especially true with something in like cancer classification. You want to know what quadrants you have of true positives and true negatives versus false positives and false negatives.
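The four quadrants mentioned here are usually tabulated as a confusion matrix. A minimal sketch, using invented screening results where True stands for "cancer":

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1  # flagged, and really positive
        elif not a and not p:
            counts["tn"] += 1  # cleared, and really negative
        elif p:
            counts["fp"] += 1  # flagged, but actually negative
        else:
            counts["fn"] += 1  # cleared, but actually positive (a missed case)
    return counts


# Made-up ground truth and model output for six slides.
actual = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]

counts = confusion_counts(actual, predicted)
recall = counts["tp"] / (counts["tp"] + counts["fn"])
precision = counts["tp"] / (counts["tp"] + counts["fp"])
print(counts, recall, precision)
```

In a screening setting the false negatives are typically the costly quadrant, so recall often matters more than overall accuracy.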

[0:23:39.6] TB: Yeah, there’s lots of different issues in relation to just the model development. I think we touched on this in the last podcast. I mean, from a model development perspective, you’re always making these tradeoffs between minimizing bias but also minimizing the model’s sensitivity. Say if you over-tune your model, it is going to be a super high predictor on the data you’ve used to tune the model, but if there’s a piece of data or an outcome that sits in between the variables that you used to train the model, then the model may be highly erratic and highly variant in terms of its ability to make those predictions.

So you have to strike this balance between how sensitive the model is to new data versus how accurate it is on the dataset that you used to train it, but that is sort of an inherent issue that any model is going to deal with.
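One way to see this tradeoff is to compare a model that memorises its training points with a plain straight-line fit. The sketch below uses invented data: noisy training points around y = 2x, and new points that sit in between the training inputs. The memoriser is a deliberately extreme stand-in for an over-tuned model.

```python
def mse(predict, xs, ys):
    """Mean squared error of a prediction function over a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_memorizer(xs, ys):
    """Over-tuned model: answer with the y of the nearest training x."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def make_line(xs, ys):
    """Simpler model: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Made-up training data: y = 2x plus alternating noise of +0.5 and -0.5.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.5, 1.5, 4.5, 5.5, 8.5]
# New points that fall between the training inputs, with noise-free targets.
new_x = [0.4, 1.4, 2.4, 3.4]
new_y = [0.8, 2.8, 4.8, 6.8]

memorizer = make_memorizer(train_x, train_y)
line = make_line(train_x, train_y)

print("train error:", mse(memorizer, train_x, train_y), mse(line, train_x, train_y))
print("new-data error:", mse(memorizer, new_x, new_y), mse(line, new_x, new_y))
```

The memoriser scores perfectly on the data it was tuned on and noticeably worse on the in-between points, while the line is slightly wrong everywhere but far more stable on new data.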

But then you’ve got issues like depending on what new data you use to train the model then your model is effectively as biased as the data that’s been used, that you’ve trained it on and it’s like that.

[0:24:43.7] DA: That’s true, yeah I figure that if your dataset doesn’t cover all of the cases which is like definitely going to happen. You definitely don’t have all the data.

[0:24:51.4] TB: Yeah, or you have it trained on a particular portion of the population, and therefore it is going to be more sensitive to, or maybe more punitive in a given scenario to, a certain subset of the population versus another. And so I think data bias is going to be a big issue, and then transparency is going to be an issue as well, and I think we have already seen pitfalls in the recent election around the use of AI and algorithms to auto-disseminate information.

Based on sort of psychometric analysis of different individuals to manipulate their thinking and behavior. The whole Cambridge Analytica example with Facebook is a great example of AI being used for nefarious purposes, and that is only going to get more sophisticated and harder to detect.

[0:25:42.3] DA: Yeah, it’s tricky too because you can see how people might think, “Okay this is actually not a big deal. I am actually making the most efficient use of the advertising revenue that I am trying to do. I am trying to target the very specific person that I want to talk to and convince them to take some action” but then –

[0:26:03.1] TB: That’s not what Cambridge Analytica was doing.

[0:26:06.2] MN: Yeah.

[0:26:06.4] WJ: What was Cambridge Analytica doing?

[0:26:08.8] TB: Well, if you read one of the guys that used to work for it, they’d built this capability to sort of create a psychometric profile of individuals based on their activity on Facebook. So they had certain inclinations one way or another, biases one way or another, and therefore they would be more susceptible to certain information versus other information in terms of increasing their likelihood of forwarding something, responding in a certain way.

And so they were building algorithms that were essentially trying to make a prediction about what type of personality someone had and therefore feeding them specific information, similar to an ad. Based on your psychometrics, you get different ads on your phone versus Dave, and so they are doing the same thing, but they are feeding you information, and in this case fake information, that they think you will respond to in a way that is positive from their perspective.

[0:27:04.2] MN: And knowing your profile and the profiles and your behaviors on Facebook, you would be less likely to then see things if it wasn’t targeting for you in the first place whether it was real news or fake, depending on the people who have shared it and the people who would be interested in that and that is an interest with quotes. You may not see me do that but if you are interested in that then you would see it and then your response would feed into your profile that then they could use even more deeper so you’re constantly getting –

[0:27:37.1] TB: So if you were susceptible to the idea that Muslims were laughing on Jersey City as the tower came down which is totally like not true. The stories were written using AI around that and then sent - targeted to people on Facebook who would be more –

[0:27:53.4] MN: Who have shared that yeah.

[0:27:54.4] TB: Who would be more susceptible to saying, “Yeah that’s true,” and passing it onto other people in their group -

[0:27:59.2] WJ: So what algorithms would be used to do that like what type of AI? Because we talked about different models, is that supervised learning? Is that unsupervised learning? What model under these categories?

[0:28:10.2] TB: So you could use classification algorithms and so you would be putting people into different groups based on certain information, geography, age, gender, work -

[0:28:24.2] MN: Location.

[0:28:25.1] TB: Yeah.

[0:28:26.1] WJ: So that’s your feature set?

[0:28:29.0] TB: That’s your feature set, but the results there, you may be able to label them in some way based on other information, whether they’re alt-right, conservative, super left, whatever it is that you think that they are. But again, you may not have that. You may not be able to label that dataset, and so you are using more unsupervised models or deep learning models to understand the latent structure of whatever dataset you’ve pulled out. In Cambridge Analytica’s case, I suppose that was like 50 million, 80 million profiles or something like that.

[0:29:04.7] WJ: So I guess you could go through, find a bunch of articles that you would say are of the type that you want more people to share, and then anybody who actually shared those articles would be considered a positive case, and then an equal-size group of people who didn’t would be considered the negative case, and then that would be your labels?

[0:29:28.3] TB: Yeah, so you could A/B test it like that. So you send one group a set of stories and another group a set of stories that you think are part of that same cluster and look at how they respond, and then flip that switch maybe the next day and see how they respond. But yeah, I think the way you described it makes sense.
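In general terms, an A/B test like this comes down to comparing two response rates. The sketch below assumes a standard two-proportion z-statistic as the comparison; the counts are invented and the function name is hypothetical.

```python
import math


def share_rate_z(shares_a, shown_a, shares_b, shown_b):
    """Share rates for two groups and the two-proportion z-statistic of their difference."""
    rate_a = shares_a / shown_a
    rate_b = shares_b / shown_b
    # Pooled rate under the assumption that both groups behave the same.
    pooled = (shares_a + shares_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    return rate_a, rate_b, (rate_a - rate_b) / std_err


# Made-up counts: each variant shown to 1,000 people.
rate_a, rate_b, z = share_rate_z(120, 1000, 90, 1000)
print(f"A={rate_a:.3f} B={rate_b:.3f} z={z:.2f}")
```

A z-value around 2 or above is conventionally read as a difference unlikely to be chance alone, though swapping the groups on another day, as suggested in the conversation, is a sensible extra check.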

[0:29:47.2] WJ: What about using a reinforcement algorithm?

[0:29:49.3] TB: Well see now you are getting into deep learning again, you know?

[0:29:51.8] WJ: All right, I got this part of me that is there.

[0:29:55.9] DA: But I guess like maybe we could use that as a segue to talk about like declassification or I’m sorry, the unsupervised clustering and something like that.

Like how an algorithm like Cambridge Analytica as a nefarious thing might work as an unsupervised?

[0:30:14.4] TB: Yeah, I don’t know if I can make a coherent connection directly back to them or not. So there’s different categories within the unsupervised as well. There’s areas of what they call dimensionality reduction. Topic modeling is another one, and then there’s algorithms like [inaudible] which is just classifying data based on its proximity to other data around it. It probably sounds a bit abstract and ridiculous.

So dimensionality reduction is looking at a dataset and then saying, “Well, what are the features that are really important in terms of being predictive of my response, or of the relationship between the inputs and the result?”
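One very simple way to ask "which features are really important for predicting the response" is to rank features by how strongly they correlate with it and keep only the top few. The sketch below does exactly that; it is correlation-based feature selection used as a stand-in, not a full dimensionality reduction method such as PCA, and the column names and numbers are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def top_features(columns, response, k):
    """Keep the k feature names with the largest absolute correlation to the response."""
    scores = {name: abs(pearson(values, response)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical feature columns and response values.
columns = {
    "age": [20, 30, 40, 50, 60],
    "shoe_size": [9, 7, 10, 8, 9],
    "hours_online": [5, 4, 3, 2, 1],
}
response = [1, 2, 3, 4, 5]
print(top_features(columns, response, k=2))
```

Taking the absolute value matters here: a feature that moves in the opposite direction to the response is just as predictive as one that moves with it.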

[0:30:56.7] DA: Yeah, I can see that like as being something like if in your Facebook profile like in the Cambridge Analytica example, if you like something that’s more right leaning like trucker hats and country music, then maybe you will cluster more with people who are similarly likeminded than people who like opera and skinny jeans.

[0:31:23.1] MN: And hipster coffee shops or something. I don’t know.

[0:31:26.4] DA: Right although there is an overlap maybe with the trucker hats and where with hipsters but.

[0:31:31.5] TB: Yeah, so I think that’s probably a good way to think about it. You’re going to look at, given their profile type, if you throw some information at them, what’s the probability that they are going to consume that and then pass that on, and then start to build some relationships and predictive power around what type of information they’re going to consume and how they’re going to react to it.

So if you throw an article in their feed related to the latest symphony at the New York Philharmonic Orchestra, they are probably going to skim straight through that versus maybe if you are staying with the trucker people.

[0:32:11.3] DA: Yeah, that’s interesting.

[0:32:13.8] TB: Well I don’t want to be biased, yeah.

[0:32:17.0] DA: Right, I guess kinda that ties into stacking different kinds of models together because you might try to cluster people together in a way that you don’t really understand but then maybe you could throw those links to the Philharmonic or Fox News at them and see how they react to that as you can measure it from your click throughs and then get an understanding about what things are going to hit or miss with different types of people.

[0:32:45.8] TB: Yeah, exactly.
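The stacking idea Dave outlines, cluster people first and then measure how each cluster reacts to a link, can be sketched as follows. This is an illustration under assumptions: a tiny hand-rolled k-means seeded with the first k points, hypothetical "likes" counts per user, and a made-up click log for a single link shown to everyone.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Tiny k-means: the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment

def click_rate_by_cluster(assignment, clicked):
    """Fraction of users in each cluster who clicked the link."""
    shown, clicks = {}, {}
    for cluster, did_click in zip(assignment, clicked):
        shown[cluster] = shown.get(cluster, 0) + 1
        clicks[cluster] = clicks.get(cluster, 0) + int(did_click)
    return {c: clicks[c] / shown[c] for c in shown}

# Hypothetical likes per user: (country_music_likes, opera_likes).
likes = [(9, 0), (0, 8), (8, 1), (1, 9), (7, 0), (0, 7)]
# Hypothetical click log for one link shown to all six users.
clicked = [True, False, True, False, False, False]

clusters = kmeans(likes, k=2)
rates = click_rate_by_cluster(clusters, clicked)
print(rates)
```

The clustering step never sees the clicks, so the groups are formed "in a way that you don’t really understand"; the click-through rates are what attach a measurable meaning to each group afterwards.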

[0:32:46.8] MN: Well, that was a very, very insightful conversation on machine learning and how machines, even right now, our social media profiles and the applications that we use, are constantly collecting data on us to make our lives better.

I am going to say better for the future. I just want to keep thinking that.

[0:33:05.4] DA: Yeah, I think so.

[0:33:06.5] MN: I think it should be a lot more positive.

[0:33:07.8] TB: I think machine learning is like money and guns, I mean it is as good or as bad as people make it so.

[0:33:16.2] MN: Yeah, exactly.

[0:33:18.1] DA: So I guess we’ll get to see some self-driving cars, we’d do some skeet shooting and make a lot of money.

[0:33:23.8] MN: Oh yeah.

[0:33:24.8] DA: For charity.

[0:33:25.2] MN: For charity, Tom how can people reach you online?

[0:33:28.1] TB: The easiest way is probably at LinkedIn, type Thomas Benham.

[0:33:32.0] DA: Spell the last name?

[0:33:33.1] TB: Benham.

[0:33:34.8] MN: Awesome. Last time Tom was here he spoke about Jujitsu and I am curious, how’s that going for you?

[0:33:41.0] TB: Yeah it’s good. I try to go three times a week. I am slowly racking up the stripes on my belt.

[0:33:46.3] MN: Oh nice, where are you now?

[0:33:47.6] TB: I’m only two stripes on my white belt at this stage. I’m still getting it and I’ve cracked a rib and broke my thumb.

[0:33:57.3] MN: For two stripes? You’ve got to crack your skull to get a new belt color? How does that work?

[0:34:05.3] TB: Yeah, my hands are constantly swollen it feels like.

[0:34:08.6] DA: It’s from punching people all the time, that’s brutal.

[0:34:11.3] MN: There’s no punching in Jujitsu.

[0:34:12.8] TB: Yeah that’s why I am liking Jujitsu, there’s no punching.

[0:34:14.0] DA: Oh that’s fair.

[0:34:15.4] TB: It’s like grabbing and grappling, you’re holding people all the time.

[0:34:19.8] MN: I don’t want to see you in a dark alley when you are angry so I’d make sure I am far away.

[0:34:24.7] TB: I don’t really feel that lethal to be honest.

[0:34:27.4] DA: Well I mean regardless, it’s wonderful having you at the podcast.

[0:34:31.4] TB: Thanks again for having me.

[0:34:32.8] MN: Thanks for coming on, Tom. Thank you so much, you might as well come back. We could have a part three on machine learning. Whenever you’re ready, come on down.

Let’s keep the conversation going on Twitter, follow us now @radiofreerabbit. Like what you hear? Give us a five star review, it helps developers just like you find their way into The Rabbit Hole. And never miss an episode, subscribe now however you listen to your favorite podcast.

On behalf of our producer extraordinaire, William Jeffries and my amazing co-host, Dave Anderson and me, your host, Michael Nunez, thanks for listening to The Rabbit Hole.

Links and Resources:

The Rabbit Hole on Twitter

Tom Benham on Linkedin

Google Sandbox

New York Times

Kaggle

Betterment

Uber

Cambridge Analytica Facebook
