Artificial Intelligence

Democratizing Machine Intelligence

July 1, 2016

It is easy to make the mistake of focusing on the UI, and “what you can see”, when you think about the service that you are offering to people.

The UI is the last mile of a great user experience. You could have an amazing backend, but if the last mile is unusable then it is all for naught, thus the focus on great frontends is vital. However, you can also have the most amazing frontend sitting on top of an awful backend, and that is not going to do the job either.

When the most recent mobile explosion occurred it unleashed new opportunities thanks to the capabilities of a mobile device in your hand with sensors galore. The form factor required businesses to adapt, and new models became possible. This last mile opened the mind to come up with a service such as Uber, where you could call a car to you wherever you are. Beforehand, it would have been useful as a desktop service, but nowhere near as useful. In fact it could have been very painful:

“hmm I am waiting by the curb and the car is 5 mins late…. do I run inside to my laptop to see what is happening?”

The last mile also heavily relied on the backend system that solves the travelling salesman problem to optimize the entire ecosystem so a user gets a car quickly, and a driver doesn’t have to wait too long between rides.

When I think about the type of college course that I would want to send my kids through, it wouldn’t focus solely on the front-end, but would instead walk them through the entire process, opening their eyes to what it takes to deliver a truly fantastic experience for everyone in the ecosystem. Today, this means that they need to understand what is possible not only with the capabilities on a range of devices, but also on backend clusters.

We are well on the way to democratizing front-end app development, and a good mobile UX has become table stakes for a modern service, so what is next?

I believe it is time to democratize machine intelligence. If you look at the largest category winners on the Internet, they all tend to have fantastic machine intelligence capabilities:

Google Search: search relevance
Gmail: filtering (e.g. spam), and smart actions
Google Maps: smart POIs, real time directions
Netflix: recommendations
Amazon: recommendations
Facebook: feed filtering

Of course, these are just the tip of the iceberg. Each of those companies uses machine intelligence all over the map. When Google Photos came out, it had a nice UI, but what gave it the world class UX was giving me results for [my son wearing yellow on his birthday].

Many could build a UI that equals Google Photos, but not many could deliver the search magic.

We have commoditized much on the server side. We have abstracted away compute, storage, and a myriad of other services that used to be hard to deliver, let alone scale. We got used to building simple CRUD based apps, and our “search” features started out as wrappers around SQL queries. Then we got great software such as Lucene, which gave us new primitives to customize our search experience.

We have many other primitives for machine intelligence now. We have the big data side with Hadoop and friends, and nice abstractions such as TensorFlow, but we are also seeing higher level solutions.

Going back to Google Photos, you actually can deliver that magic by using the Cloud Vision API.
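To give a feel for how small that primitive is, here is a minimal sketch of the JSON body for a label detection call against the Cloud Vision REST endpoint (`images:annotate`). It only builds the request and does not send it. The helper name `label_request` and the placeholder bytes are my own assumptions for illustration, and labels alone are just one ingredient of a Photos-style search.

```python
import base64
import json

# Public REST endpoint for the Cloud Vision API's batch annotate call.
# Sending the request also needs credentials, which are left out here.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def label_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """Build (but do not send) a label detection request body.

    `label_request` is a hypothetical helper name, not part of any SDK.
    """
    return {
        "requests": [
            {
                # Image content is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real photo (assumption for illustration).
body = label_request(b"placeholder-bytes-not-a-real-photo")
print(json.dumps(body, indent=2))
```

You would POST this body to `ANNOTATE_URL` and index the returned labels yourself, which is where the search work begins.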

Machine intelligence is becoming table stakes, and we need the primitives and higher level abstractions to democratize it. This is one reason why I am excited to be at Google right now, as we are getting to the point where we can reach the vision that external developers should be able to build experiences that are as powerful as the ones that we can build internally.

I love this vision partly because I think it is vital to spread this out as much as possible to enable thriving competition.

The Beyonce Law of Machine Learning

May 24, 2016

AI/ML and the important role of forgiveness in technology evolution

There is a lot of chatter on AI and Machine Learning, and recently on specific user experiences they open up, e.g. chat bots, as shown by Benedict Evans.

Others are discussing the overall role of machine learning and how it is becoming a vital component in all great experiences.

So, it is fascinating to hear the perceptions of how close we are to a particular breakthrough, and when an application of ML can actually be used in production.

For some examples you see real world use cases right now, and for others it feels like we are so far away. Why is that? It is partly due to the complexity of the solution, but I also think there is another important element… the element of forgiveness and resilience that the solution can have. If you can provide forgiveness, it is easier to ship ML today and iterate on it.

Beyonce’s Law is the forgiveness curve of a solution.

Let’s look at some examples.

Search

A Google search has a lot of forgiveness built in. When it launched it gave back a series of blue links, but a human could help select the result from there (that human being: you). Compare this to a search that only has the “I’m Feeling Lucky” feature, where you don’t have that option. Over time a search engine can improve, and there is a high likelihood that you will tap on the top result or two, but nothing is perfect and you can play an important role.

Chances are that you will find a result that works for you, so you will feel pretty satisfied if you tap and get some kind of answer to your query. This satisfaction will occur even if there is actually a “better” answer elsewhere. If you are looking to define a word and you get to dictionary.com, chances are you are OK with that, even if Wordnik had something more compelling.

One thing you really care about is performance, and top search engines always respond quickly. This gives you the confidence to come back for more. Google has always known this, and Jeff Dean has talked about all of the layers of resilience that Google has to make sure that it gets good results back to you.

Ilya brought this up in the context of modern web applications just last week at Google I/O and reiterated that:

Search of 99.9% of docs in 200ms is better than 100% in 1000ms

Perfect is very much the enemy here, and it is important to realize that. Now, search has evolved a lot over the years and we see more and more one box answers that try to solve your issue there and then, and that bar is much higher. Get that wrong and you lose trust, but all in all search has a high forgiveness factor.
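The 99.9%-in-200ms trade-off quoted above can be sketched as a fan-out query with a deadline: ask every shard, wait only so long, and return whatever came back. This is a toy illustration and not how Google does it. The shard names, latencies, and the `search` function are all made up for the example.

```python
import concurrent.futures as cf
import time

# Hypothetical shards: (name, simulated latency in seconds, matching docs).
SHARDS = [
    ("shard-a", 0.01, ["doc1", "doc2"]),
    ("shard-b", 0.02, ["doc3"]),
    ("shard-c", 0.50, ["doc4"]),  # the slow straggler
]


def query_shard(latency, docs):
    """Stand-in for a network call to one index shard."""
    time.sleep(latency)
    return docs


def search(shards, deadline_s):
    """Query all shards in parallel; keep only answers that beat the deadline."""
    pool = cf.ThreadPoolExecutor(max_workers=len(shards))
    futures = [pool.submit(query_shard, lat, docs) for _, lat, docs in shards]
    # wait() returns once everything finishes or the deadline passes.
    done, not_done = cf.wait(futures, timeout=deadline_s)
    results = sorted(doc for f in done for doc in f.result())
    # Do not block on stragglers; their answers are simply dropped.
    pool.shutdown(wait=False)
    return results, len(not_done)


results, missed = search(SHARDS, deadline_s=0.2)
print(f"{len(results)} docs returned, {missed} shard(s) skipped")
```

The user gets a slightly incomplete answer quickly, and the human picking from the list supplies the rest of the forgiveness.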

Hints on Maps

Another example of something with high forgiveness is hints on Google Maps. I am sure you have seen an area where a few landmarks are highlighted for you.

Sometimes the selection feels magical (“wow, it put my kids’ school and the lil coffee shop I always go to!”), yet there may also be other landmarks that don’t make sense to you, but chances are you just don’t notice them.

In the map above, I see areas that make sense to me (Googleplex, Shoreline movie theater), but I also see Costco, a store I have never been to in my life. Some of these may be personalized, but many may just be popular enough to show almost anyone.

The Maps team can slowly tweak things and get better and better at showing me what I need for my current context. A great forgiving experience.

Go and DeepMind

Winning at Go is an epic achievement in computing, and it was very exciting to watch. It has also been fascinating to see how Lee Sedol has been playing so well after that match, winning his last series of matches and talking about how he feels his game has changed.

As amazing as DeepMind is, it got to a level of “good enough” to beat the world’s best player over time, but who knows how much further it can go. You can never tell if certain moves are “the best move” as the game is so complex.

Conversational UI

So, what else is on the low side, maybe even lower? I think that conversational UI (and thus chat bots, voice UI, and the like) are right up there.

When a human interacts with a bot they are looking for the mistake and the minute one comes back they often throw up their hands with “a ha! see! I can’t trust it. I am done!”.

You see a lot of demos of chat bots and they truly seem magical, but have you gotten your hands on one and fully gotten there yet? I was using one recently and even on a happy path it quickly got confused and started telling me about restaurants on “Pork Lane”, instead of restaurants that have good pork sandwiches.

It is just bloody hard to nail it time and time again, with each interaction. How can you hack something to be a lil more forgiving? Is it enough to have humans behind the bots, able to jump in from time to time to take control? It feels like for plenty of use cases this would allow the bar of the bot to be a bit lower, but that obviously comes with its own issues.
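One way to picture the human-behind-the-bot idea is a simple confidence gate: the bot answers when it is sure, and hands off when it is not. This is only a sketch. The `BotReply` type, the `confidence` score, and the 0.8 threshold are assumptions for illustration, not something a particular bot platform provides.

```python
from dataclasses import dataclass


@dataclass
class BotReply:
    text: str
    confidence: float  # 0.0 to 1.0, hypothetical score from the bot's model


def route(reply: BotReply, threshold: float = 0.8) -> str:
    """Decide who answers: the bot, or a human operator who takes over."""
    if reply.confidence >= threshold:
        return "bot"
    return "human"


# A confused answer gets escalated, a confident one goes straight out.
print(route(BotReply("Here are restaurants on Pork Lane", 0.35)))
print(route(BotReply("Here are places with good pork sandwiches", 0.93)))
```

Lowering the threshold ships more bot answers and more mistakes; raising it costs more human time. That is the trade-off in one number.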

Having other UI that works alongside voice is also a useful technique. It allows you to get deep into an action with voice, and then go through some selection process that could be more suited to tactile feedback and the UI of a screen.

We saw the pain of voice systems in telephony. How many people jumped to hit “0” to speak to a human rather than go through the phone tree? How many of you still do that? At least it gave these companies a line of defense and they can keep improving their offerings to a point where they can actually be much *better* than humans, since they can interface into multiple systems easier than the human on the other end.

We can see where this is going, and we see glimpses of this with voice tech such as Alexa. Going from simplish commands to rich back and forth interaction is going to be fun to see, as is solving the current discovery problem. How do you know what capabilities you can tap into with voice? Trial and error.

The a-ha! Moment

For use cases where it is hard to have forgiveness, you often just need to keep grinding and making progress.

With the iPhone, hard work and engineering over generations got us to the point where the electronics were small enough to fit the form factor, screens were powerful enough to display, and touch sensitivity had the right level of resistance. And all of this could be produced at industrial scale with systems that could handle daily usage.

We will always see “it won’t happen for ten years!” and sometimes get surprised. Those surprises will happen most often for scenarios where the forgiveness factor is low, and thus it is hard to get into market.

A Pool Cleaner Bot…. for our code?

April 13, 2016

There is a lot of talk of bots these days. The ones I tend to enjoy are the ones that do a lot of things for me, and I often don’t interact with them.

When I dream about the type of bots that I want to see, they normally aren’t of the conversational variety. They are more like a Roomba, or one of those cool pool cleaning bots. You set them off and they clean up for you.

When you combine bots with machine learning, things can get really interesting and I can’t wait to see what happens.

One of the bots that is high on my list is one that fixes my typoz and grammar, and generally takes my writing from meh to yeah!

The bot (or bots, really) that I want to see unleashed are coding bots. You wake up in the morning and a kind one has done a pull request updating you to a new version of a library, running all of your tests and making sure coverage and performance are AOK. This way it is easier for you to keep your project up to date, as you make the small tweaks to get from 1.2.3 to 1.2.4 vs. waiting for the 1.2.3 to 2.4, which makes you scream and rewrite the darn thing from scratch on a different platform.
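The first decision the update bot described above has to make is how big a jump it is looking at. Here is a rough sketch, assuming plain three-part `major.minor.patch` version strings. The function names and the policy of only auto-opening PRs for patch and minor bumps are my assumptions, not how any existing bot works, and I write the big jump as 2.4.0 so that it parses.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into integers (no pre-release tags handled)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(current: str, candidate: str) -> str:
    """Classify the jump from current to candidate."""
    cur, new = parse(current), parse(candidate)
    if new <= cur:
        return "none"  # same version or a downgrade, nothing to do
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


def should_auto_pr(current: str, candidate: str) -> bool:
    """Assumed policy: small bumps get an automatic PR, major ones need a human."""
    return bump_kind(current, candidate) in {"patch", "minor"}


print(bump_kind("1.2.3", "1.2.4"), should_auto_pr("1.2.3", "1.2.4"))
print(bump_kind("1.2.3", "2.4.0"), should_auto_pr("1.2.3", "2.4.0"))
```

The point of the bot is to keep you on the first line of output so that you rarely see the second.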

A security minded bot is looking out for you in that regard. Tools such as snyk can let you know about certain vulnerabilities, and can even already walk you through a fix, and in the future they will get more advanced.

You can have the bots compete…. “Bot1 and Bot2…. make this code run faster!”. Before you know it your code won’t be readable so you will need the “make it readable too, and have it code like I would code it” bots.

With GitHub and the like, the PR system is perfect for this, added to the fact that we are getting much better at putting continuous integration tests in place that tell you much about the side effects of a PR.

For open source, you can have outside people run bots. For non open source, it would be nice to tie into a /robots.txt type system to be able to ping the owner?