NVIDIA and Deep Learning Research with Bryan Catanzaro

Bryan Catanzaro, the VP of Applied Deep Learning Research at NVIDIA, joins Mark and Melanie this week to discuss how his team uses applied deep learning to make NVIDIA products and processes better. We talk about parallel processing and computing with GPUs, as well as his team's research in graphics, text, and audio, and how deep learning is changing the way these forms of communication are created and rendered.

This week we are also joined by a special co-host, Sherol Chen, a developer advocate on GCP and a machine learning researcher on Magenta at Google. Listen to the end of the podcast, where Mark and Sherol chat about all things GDC.

Bryan Catanzaro

Bryan Catanzaro is VP of Applied Deep Learning Research at NVIDIA, where he leads a team solving problems in domains ranging from video games to chip design using deep learning. Bryan earned his PhD from Berkeley, where he focused on parallel computing, machine learning, and programming models. He earned his MS and BS from Brigham Young University, where he worked on higher radix floating-point representations for FPGAs. Bryan worked at Baidu to create next-generation systems for training and deploying deep learning models for speech recognition. Before that, he was a researcher at NVIDIA, where he worked on programming models for parallel processors, as well as libraries for deep learning, which culminated in the creation of the widely used cuDNN library.

Cool things of the week
  • NVIDIA Tesla V100s coming to Google Cloud site
  • Automatic Serverless Deployment with Cloud Source Repositories blog
  • Magenta site
    • NSynth Super site
    • MusicVAE site
    • Making music using new sounds generated with machine learning blog
  • Building Blocks of Interpretability blog
Interview
  • NVIDIA site
  • NVIDIA GPU Technology Conference (GTC) site
  • CUDA site
  • cuDNN site
  • NVIDIA Volta site
  • NVIDIA Tesla P4 docs
  • NVIDIA Tesla V100s site
  • Silicon Valley AI Lab Baidu Research site
  • ICML: International Conference on Machine Learning site
  • CVPR: Computer Vision and Pattern Recognition Conference site

Referenced Papers & Research:

  • Deep Learning with COTS HPC Systems paper
  • Building High-level Features Using Large Scale Unsupervised Learning paper
  • OpenAI Learning to Generate Reviews and Discovering Sentiment paper
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation paper and CelebA dataset
  • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs paper
  • Deep Image Prior site
  • How a Japanese cucumber farmer is using deep learning and TensorFlow blog

Sample Talks:

  • Future of AI Hardware Panel video
  • High Performance Computing is Supercharging AI blog/video
  • AI Podcast: Where is Deep Learning Going Next? blog/video

Sample Resources:

  • Coursera How Google does Machine Learning site
  • NVIDIA Deep Learning Institute site
  • Udacity AI Nanodegree site
  • Kaggle site
  • TensorFlow site
  • PyTorch site
  • Keras site
Question of the week

What to watch out for and get involved in at the Game Developers Conference (GDC) this year and in the future?

  • International Game Developers Association (IGDA) site
  • Fellowship of GDC Parties site
  • alt.ctrl.GDC site
  • Experimental Gameplay Workshop site
  • Women in Games International (WIGI) site
  • Blacks in Gaming (BIG) site
  • Serious Games SIG site
  • What’s New in Firebase and Google Cloud Platform for Games site
  • Summits to Check Out:
    • AI Game Developers Summit site
    • Game Narrative Summit site
    • Independent Games Summit site
  • Additional Advice:
    • The first two days are summits, which are great because they're topic-focused
    • The expo floor takes a good hour to get through
    • The WIGI, BIG, and SIG events (Google and Microsoft) have the best food
    • GDC is composed of various communities
    • Bring business cards
    • Check out post-mortems
  • Favorite Games:
    • Mass Effect site
    • Final Fantasy site
  • Games Mark & Sherol are currently playing:
    • Hearthstone site
    • Dragon Age Origins wiki
Where can you find us next?

Mark and Sherol are at the Game Developers Conference (GDC). You can find them via the Google at GDC 2018 site.

Sherol will be at the TensorFlow Dev Summit next week, speaking about machine learning research and creativity.

MARK: Hi, and welcome to episode 119 of the weekly "Google Cloud Platform Podcast." I'm Mark Mandel, and I'm here with my colleague, as always, Melanie Warrick. Melanie, how are you doing today?

MELANIE: I'm doing good, Mark. How are you? And we have a special co-host with us today.

MARK: We do have a very special co-host for this day. Sherol, how are you doing today?

SHEROL: I'm doing great. Thanks for having me.

MARK: Excellent. Thank you so much for joining us. We are talking to Nvidia today, and it's actually a really interesting discussion in our interview.

MELANIE: Definitely, we're going to be getting into more of the details around GPUs and AI in our interview later on with Bryan. But before we do that, as always, we go into the Cool Things of the Week. And we have a Question of the Week. In this week's question-- and this is part of the reason why Sherol's joining us today-- we want to talk about GDC, the Game Developers Conference, that is going on as we speak and that you and Sherol will be attending. So we want to talk about what you guys find exciting. And as veterans of GDC, what do you all recommend to those who might be new or exploring it for the first time?

But first, as always, we like to get into our Cool Things of the Week. And one of the cool things of the week is that we have the V100s, which is the Nvidia GPU. And that is coming to Google Cloud. That will be available in all the clouds, all the Google Clouds.

MARK: All the Google Clouds.

MELANIE: All the Google Clouds, because there's a lot of Google Clouds.

MARK: Yeah. That's a lot of clouds. [LAUGHING]

MELANIE: [LAUGHING] There's a lot of clouds.

MARK: We have a lot of regions now. [LAUGHING] Awesome. I want to give a quick plug for a blog post written by one of our fellow Dev Rel members, Chris Broadfoot here in San Francisco, also a fellow Australian and a guest on the podcast previously.

MELANIE: Nice.

MARK: There we go. He wrote a blog post talking about "Automatic Serverless Deployments with Cloud Source Repositories and Container Builder." He has a great blog post talking through how you can build CI and CD pipelines using Cloud Functions, Cloud Source Repositories, and Container Builder. It's actually really easy to do. I've done it for the CI pipeline for Agones. It's actually kind of cool how much you can get done with just a few bits of JavaScript and some Container Builder scripts.

MELANIE: Nice. And Sherol, did you have something you wanted to share?

SHEROL: Oh, yeah. Two of the projects on the Magenta Team have had a launch this past week. It's been really exciting, a lot of buzz. It's interesting because one is on the research side of machine learning and creativity. And then the second one is on the productization of machine learning for creativity.

So both of these, for the research side, we have MusicVAE. And on the product side, we have NSynth Super, which is an open source piece of hardware that runs machine learning models to generate sounds for music.

MELANIE: That's pretty cool.

MARK: Awesome.

MELANIE: We'll have to dive into more detail in a later, I think, podcast.

MARK: I think we might have to do a Magenta episode.

MELANIE: Yeah, I think we will.

MARK: I think that might be something we need to do.

MELANIE: Well, so the last thing we want to mention for this week, in terms of cool things of the week, is that there's this great article out there on the building blocks of interpretability. And we'll share a link in the show notes, as always. What this article goes over is the different techniques and approaches that have been used and explored for interpretability for machine learning, especially neural nets, like feature visualization, attribution, dimensionality reduction.

But it's looking at it and trying to take those approaches as building blocks and combine them. And that's what's unique in this article, in particular. And it's trying to do it in a way that's "human scale," as it says, something that's understandable by humans. So we'll share that. It's interesting. And, yeah, I think that covers us for cool things of the week.

MARK: Awesome. Well, why don't we go have a chat with our buddy over in Nvidia?

MELANIE: Sounds good.

MARK: Let's do that.

MELANIE: So on this week's podcast, we are excited to have join us Bryan Catanzaro, who is the VP of Applied Deep Learning Research at Nvidia.

BRYAN: Hey.

MELANIE: Good afternoon. And so Bryan, can you tell us a little about yourself and the work that you're doing now and the work you've done in the past?

BRYAN: Sure. So I lead a team of researchers here at Nvidia that's focused on using deep learning to make Nvidia's products better and Nvidia's processes better. And so we're looking at a bunch of different ways of using AI in lots of different domains, ranging from video games to chip design and all sorts of stuff in between. So it's an exciting time to be working in AI applications. There's a lot of diamonds on the beach waiting for people to pick them up and change the way that we do things. And so it's a pretty exciting project.

MELANIE: Definitely. And in terms of Nvidia, I know Nvidia's led the way when it comes to GPUs. And GPUs have played such a significant role in the deep learning space, as well as the gaming space. We were talking about this actually right before we started the podcast, about how impressive it is that Nvidia's strategy around GPUs, in particular, has shifted over the last few years alone. And can you speak to that a little bit?

BRYAN: I think Nvidia has always had a strong belief that GPUs as a throughput-oriented processor were going to change the world beyond gaming. And more than 10 years ago, Nvidia started investing in CUDA, which is our language and, sort of, ecosystem for programming throughput-oriented computers. And at the beginning, we were a little bit less certain about what applications were going to need this power and what was actually going to change in the world because we had this new way of thinking about algorithms and computation.

But I think the belief has been consistent, and the focus has been there for a long time. Deep learning happened to be the most important of all the applications that need high throughput computation. And so once Nvidia saw that, it was basically instant.

The whole company latched onto it because they realized this is our chance to make a difference. And I happened to be involved in that a little bit. I was one of the very first people working on deep learning at Nvidia Research, back in 2012. And we were having fun and doing this research project. And like many research projects, it wasn't clear to me at the time whether this was going to be something fun or whether this was going to be something that really changed the world.

But the amount of interest in our work was really surprising. The results were good. People started paying a lot of attention to it. And I started meeting with a lot of different people at Nvidia. And they saw-- the whole company kind of saw it like, wow, this could be really big for us. And so everybody latched onto it.

MELANIE: Do you remember a specific time when you were doing that research that stood out to you that made you go, oh, wow, this is much more impactful than I expected, or a specific moment that comes to mind?

BRYAN: Yeah. So one of the things that attracted some attention, we published a paper at ICML in 2013. And this was me collaborating with a group of researchers at Stanford, including Adam Coates and Andrew Ng, where we were able to take an approach to unsupervised learning that was pioneered actually on the Google Brain team by Quoc Le. And we were able to replicate that using far, far fewer computers.

So Quoc had published this paper in 2012, and he needed something like 1,000 nodes in order to train this model. And we were able to train the model with three nodes using GPUs and more efficient software that was a little bit more HPC-focused. And so we published this paper at ICML. And people were like, wow, this is really exciting because it democratizes the work. If it makes deep learning a thing that you can do on three nodes instead of 1,000, then the number of people that can start applying this technology to new problems just goes up dramatically. And that was something that caught a lot of attention.

MELANIE: And just to confirm, when you're talking about what you were working on, this is cuDNN.

BRYAN: So this was actually before cuDNN existed.

MELANIE: Ah, OK.

BRYAN: Yeah, so I did this with Adam and Andrew. And that was really exciting. And I started talking to more deep learning researchers. And actually the person that inspired me to make cuDNN was Rob Fergus, who was a professor at New York University, one of Yann LeCun's collaborators. And I think Rob is now at Facebook. And Rob was like, look, we are doing all these convolutional models all the time, and we've written our own convolutions. But it's a lot of work to make them fast. And Nvidia GPUs change all the time, because that's the primary way Nvidia innovates: by making the architecture better.

But then that means that the software needs to adapt to the new architecture. And so Rob was telling me, we're spending too much time working on these kernels; we think Nvidia should make a library for it. And I thought, yeah, that's a great idea. So then I set about convincing the rest of Nvidia that they should make this product. And it took a little bit of work.

So while I was convincing them, I just started writing the code for a research prototype. And then as the conversations progressed, things got more concrete. And then finally, people kind of caught the vision and decided, yes, we should productize this, and that became cuDNN.

MARK: So this sounds really cool. I know, as we said before on the podcast, Melanie's the ML expert, and I'm sort of the person who--

MELANIE: You're the gaming expert.

MARK: --who barely understands math. For those people who aren't as familiar with this ecosystem and maybe are more like, doesn't Nvidia make game cards and stuff--

BRYAN: We do, yeah.

MARK: Yeah. But, like, how does that translate to doing-- like what is that pathway for like, I have a graphics card. I thought it was just for graphics. Why is it also applicable, or how is it also applicable to doing what seems to be like intense mathematical calculations?

BRYAN: When you draw a picture in a 3D game, you're essentially running a lot of programs that figure out what every pixel looks like. And over the years, those programs have gotten more complex, as you might expect, and also a lot more flexible. It used to be that GPUs were very fixed-function machines, where you kind of just gave it a list of triangles and it rasterized them.

But nowadays, to do the complex lighting and texturing and reflections and so forth, all that is done by writing programs-- but very parallel programs. And so it turns out actually that in the workload in modern games, the majority of the time is often spent running compute programs rather than traditional graphics workloads. And so there's been an evolution in graphics towards more flexibility and more parallel computing and less fixed-function graphics. And so GPUs have had to evolve to support that. At the same time, if you have a really high throughput parallel processor that can be used for arbitrary computation, it turns out there's a lot of things that you can do with it, including AI.
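(To make the per-pixel idea concrete, here is a minimal sketch-- not from the episode-- of a shader-like computation written as a data-parallel NumPy operation. Each output pixel depends only on its own inputs, which is exactly the property that lets a GPU run the same small program across millions of pixels at once.)

```python
import numpy as np

# A toy "shader": per-pixel diffuse lighting, N dot L, clamped to [0, 1].
# Each output pixel depends only on that pixel's normal, so every pixel
# can be computed independently -- the property GPUs exploit.
height, width = 1080, 1920
rng = np.random.default_rng(0)

# Per-pixel surface normals (H x W x 3), normalized.
normals = rng.normal(size=(height, width, 3))
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)

light_dir = np.array([0.0, 0.707, 0.707])  # a single directional light

# One vectorized expression evaluates the "program" for all ~2M pixels.
brightness = np.clip(normals @ light_dir, 0.0, 1.0)
print(brightness.shape)  # (1080, 1920)
```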

MELANIE: Well, and I'm curious, how did you even get involved with parallel processing compute? What inspired you to go down this path to begin with?

BRYAN: Yeah. When I was an undergrad, I did some internships at Intel, and I was working on circuit design. And I made some really, really fast circuits that were going to run at like 10 gigahertz. And this was for transistors that hadn't even been built yet.

So we were kind of dreaming about the future and trying to build it. And we kept figuring out that the power consumption was going to be impossible. There was just no way that we were going to be able to make a single-threaded computer that held onto the performance scaling trends that everyone expected.

So I saw at Intel this realization that the world of computing in the future must become parallel. And so when I went to graduate school, I decided to focus on parallel computing: frameworks for programming parallel computers and applications of parallel computers. And the one that really struck me as most promising was machine learning. And so that's how I ended up spending my PhD on programming frameworks for parallel processors and machine learning applications, because I felt like that was the most important thing you could do with a parallel computer.

MELANIE: Nice. And then I know in terms of your background, here you were working on cuDNN. Then you made a shift and joined Baidu, where you were working on machine learning. And then you've come back.

BRYAN: That's right.

MELANIE: What's led to that trajectory?

BRYAN: Well, so I mentioned before that my kind of introduction to the world of deep learning happened with Andrew Ng and Adam Coates. They founded this lab at Baidu, the Silicon Valley AI Lab, and asked me to come join them. And I felt like it was a great opportunity for me to get closer to applications.

I really loved working on cuDNN. I'm very proud of it. And I felt like I wanted to be a little bit closer to how we take AI and make it into something that solves a problem that people really need solved. And so I felt like working with the Silicon Valley AI Lab at Baidu gave me a great chance to get closer to applications. And then I'd been there for about 2 and 1/2 years, and I felt like I had accomplished my mission and was looking around to figure out what my next project should be. And Jensen Huang, Nvidia's CEO, called me up and said, hey, we'd really like you to come back to Nvidia and found a lab focused on research figuring out how to apply AI to problems that we have at Nvidia. And I thought, well, that's great, because I love Nvidia.

I've always been sort of an ally of Nvidia because I believe in the technology. And a chance to come back and focus on the applications that I really wanted to do, it seemed like a great opportunity.

MELANIE: What kinds of applications do you work on?

BRYAN: So my team right now is mostly focused on three areas. One of them is graphics. One of them is text. And the other is audio.

MELANIE: Nice.

BRYAN: So with graphics, we're really excited about the opportunities to change the way that graphics is created and rendered by using deep learning. We have a paper actually in this year's CVPR that is about a generative model for creating graphics-- photorealistic, high-resolution graphics-- given very simplistic input images. And so the idea would be that we could create a new kind of graphics engine, where the traditional graphics is actually doing very little work at all. And the majority of the graphics rendering is being done by a generative model. And the advantage of doing that would be that we believe it will be far cheaper to create virtual worlds.

Because in order to create a new world, we just need to train a new model rather than spend a lot of money on an army of artists to manually create all the textures and lighting that go into that environment. So we're pretty excited about that. And we feel like there are a lot of applications for AI in graphics. And so that's one of the things going on in my group.

We're also working on some text problems. We're really inspired by this OpenAI paper doing unsupervised learning for text classification, specifically sentiment analysis, where you train a model to reconstruct a large text database character by character. And it turns out that in so doing, your model has to understand the meaning of the text, not just the characters in the text. And you can use that with just a little bit of labels to solve some really applied problems, like sentiment analysis. And so we're working on stuff like that. And then in the audio space right now, we're working on some things related to speech and speech synthesis.
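(A rough sketch of the two-stage recipe Bryan describes-- not NVIDIA's or OpenAI's actual code: a character-level language model's final hidden state is reused as a feature vector, and a small labeled set trains a tiny classifier on top. Here the language model weights are random placeholders so the sketch stays self-contained; in the real recipe they would come from pretraining on a large corpus.)

```python
import torch
from torch import nn

# Stage 1 stand-in: a character-level LSTM. In the real recipe this would be
# pretrained to predict the next character over a large text corpus; here the
# weights are random placeholders so the sketch stays self-contained.
vocab, hidden = 256, 128
embed = nn.Embedding(vocab, 32)
lstm = nn.LSTM(32, hidden, batch_first=True)

def text_features(s: str) -> torch.Tensor:
    """Run the LM over the bytes of s and return its final hidden state."""
    ids = torch.tensor([list(s.encode("utf-8"))])
    with torch.no_grad():
        _, (h, _) = lstm(embed(ids))
    return h[-1, 0]  # the hidden state after reading the whole string

# Stage 2: with features in hand, "just a little bit of labels" trains a
# tiny linear classifier for sentiment.
texts = ["loved it", "terrible", "great fun", "awful"]
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
feats = torch.stack([text_features(t) for t in texts])

clf = nn.Linear(hidden, 1)
opt = torch.optim.Adam(clf.parameters(), lr=0.01)
for _ in range(200):
    loss = nn.functional.binary_cross_entropy_with_logits(
        clf(feats).squeeze(-1), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```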

MELANIE: So in terms of doing this type of work, you're trying to also set it up to make it accessible to users of the GPUs, I assume.

BRYAN: Absolutely, yeah. One of the great things about working at Nvidia is that we get to open source a lot of the work that we're doing. In fact, all the projects that I'm mentioning, we have either open sourced or are in the process of open sourcing. And one of the reasons that we can do that is because the more we inspire other people to do AI, the better it is for Nvidia, because so much of AI happens on GPUs. And so we can make the world a better place by giving people new technology that they can use to solve problems. And it also helps Nvidia's business. And so it's a great thing for us to open source our work.

MELANIE: Right. And then I know you've done the work in the past on cuDNN, really driving software to help people optimize working on GPUs. But then there are GPUs that have been built specifically to help with deep learning-- namely Volta. Can you speak a little bit to Volta and how it works and helps people improve their deep learning processing?

BRYAN: Yeah. So Volta is the first GPU that Nvidia made after it woke up to the possibilities of deep learning. And Nvidia believes, I think, more strongly than any other organization that I've seen, in the power of specialized architectures. That's really what has made Nvidia successful from the beginning: the company builds chips to solve important problems, chips that are tailored to those particular problems.

So when Nvidia decided deep learning is the most important thing that we're going to be using our GPUs for, the next step was to customize the architecture for deep learning. And Volta's the first fruits of that effort. And so the computational capabilities of Volta are really astounding. One Volta GPU can run at over 120 teraflops when you're training a model. And that's about 10 times faster than the speed that you would get on our prior generation GPU. And so that's a really exciting capability.

MARK: How does the architecture for something like Volta differ from something like a graphics GPU?

BRYAN: Volta is different in a lot of ways from our previous GPUs. It's just a lot more focused on compute workloads. So especially the memory subsystem has been beefed up to handle the kinds of data patterns that are required in a lot of compute workloads. It also has a lot faster integer arithmetic, which is really useful for addressing math and, I would say, more complicated workloads, where you need to do a lot of indexing math to figure out what data you're going to load and operate on.

So the whole architecture has been revamped. And then in addition to all those other features, we added tensor cores, which are essentially matrix multiply units that can do small four-by-four-by-four matrix multiplications. And the thing that's really great about tensor cores is that they dramatically reduce the energy cost of reading and writing the operands from the register file, because the majority of the work happens inside of this bigger, chunkier unit, and you don't have to be constantly reading and writing all the intermediate results to a register file. And so that dramatically reduces the energy that's required to do a matrix multiplication, which then makes the processor dramatically more efficient.
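(A rough illustration of what a single tensor core operation computes-- a NumPy sketch, not how you would actually program the hardware: the fused multiply-accumulate D = A x B + C on 4x4 matrices, with half-precision inputs and single-precision accumulation. In practice you reach tensor cores through cuDNN, cuBLAS, or a framework.)

```python
import numpy as np

# What a Volta tensor core computes in one fused operation:
# D = A @ B + C on 4x4 matrices, with FP16 inputs and FP32 accumulation.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)
B = rng.standard_normal((4, 4)).astype(np.float16)
C = rng.standard_normal((4, 4)).astype(np.float32)

# Accumulate in float32, as the hardware does, to limit rounding error.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.shape)  # (4, 4)
```

Because the whole multiply-accumulate happens inside one unit, the intermediate products never round-trip through the register file, which is where the energy savings come from.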

MELANIE: Nice. So you mentioned teraflops. For those who might not understand teraflops and exaflops, can you explain a little bit, in layman's terms, what that means and how they can conceptualize it in terms of speed?

BRYAN: Yeah. Well, it's hard actually for humans to understand these numbers because they're really big numbers. So tera is 10 to the 12. So 120 teraflops, that's 120 trillion math operations per second, which is a lot.

MELANIE: Yeah, that is a lot.

BRYAN: And training a deep learning model can often require exaflops. So after tera comes peta, so that's 10 to the 15. And after peta comes exa, which is 10 to the 18. And so training a deep learning model routinely takes tens of exaflops.

I like to point out that people often estimate the number of grains of sand on planet Earth at about 10 exa-grains of sand. So if you're doing 10 exaflops, it's kind of like a multiply and add for every single grain of sand. And you do that every time you train a model. So it's extremely computationally intense.
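(Putting Bryan's numbers together-- a back-of-the-envelope sketch, not a benchmark: at roughly 120 teraflops, a training run needing 10 exaflops takes on the order of a day.)

```python
# Back-of-the-envelope: how long does an exaflop-scale training run take?
tera = 10**12
exa = 10**18

volta_flops_per_sec = 120 * tera   # one V100, as quoted in the episode
training_work = 10 * exa           # "10 exaflops" per training run

seconds = training_work / volta_flops_per_sec
print(f"{seconds:.0f} s ~= {seconds / 3600:.1f} hours")  # ~83333 s ~= 23.1 hours
```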

MELANIE: And it turns it around-- I mean, the speed at which it can turn around some of these models, which are becoming larger and larger and have more data to process, has its impact. It's very important at the end of the day.

BRYAN: It is, yeah. I mean, when you're trying to solve a problem with AI, you have to iterate because it's a search. You have to find the right model. You have to find the right hyperparameters. You have to find the right data. And that requires a lot of search. And so as you create a model and you train it, you learn things about it.

You learn when it is doing well. You learn when it's not doing well. You find bugs. You fix hyperparameters. This requires training a model many, many times. And the quicker you can do it, the faster a developer can iterate. And that leads to better results. And so people find that having access to this kind of compute capability not only makes the work go faster, but actually changes the kind of problems that they can attempt, because there are problems that they can try that they just never would have been able to try without it.

MELANIE: And when you talk about training, there's training, and there's also inference. And so Volta is a wonderful GPU to be using from a training perspective. Is that also architected for inference, or do you have other options specifically? And I know the answer, but I wanted to give you the chance to--

BRYAN: Yeah. So let me explain a little bit of the difference between training and inference first.

MELANIE: That would be great.

BRYAN: So training is a search, like I said. You're trying to find a model. So the model may have hundreds of millions or billions of parameters, and you need to fine-tune every one of those to find the right number. And so it's a search, and you're searching for it, guided by your data. And so we often talk about back propagation.

What that means is you do forward propagation, where you use the model to make a guess about what the output should be. Say, if we're doing speech recognition, the input is a WAV file, and the output is text. So we run a bunch of WAV files through, and we get the guesses for what those WAV files contain.

But then we know what the right answer should be. And we have what's called labels. We can use those labels to figure out what the error was. Where was the model wrong? And then we use that to update the weights-- just tweak them just a little bit. And we do that a million times or a billion times, right? And so through this process-- it's kind of evolutionary-- we search through this space to find a good model. So that's the training process.
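(A minimal sketch of the loop Bryan describes, written in PyTorch, one of the frameworks mentioned later in the episode; the model, data, and hyperparameters are placeholders, not from the episode.)

```python
import torch
from torch import nn

# Toy setup: a tiny model, random inputs, random labels.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 64)           # a batch of inputs
labels = torch.randint(0, 10, (8,))   # the known right answers

for step in range(100):               # "do that a million times" -- here, 100
    logits = model(inputs)            # forward propagation: make a guess
    loss = loss_fn(logits, labels)    # compare the guess against the labels
    optimizer.zero_grad()
    loss.backward()                   # backward propagation: find the error
    optimizer.step()                  # tweak the weights just a little bit
```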

The serving process, or inference process, is just performing the forward propagation: given an input, what is the output of that model? So we're just pushing the data forward through the model.

MELANIE: Just the prediction or the classification.

BRYAN: Just the prediction or the classification. So that computationally is simpler, because it's only doing the forward pass. It's not doing the backward pass, and it's not doing the learning. But it has different characteristics algorithmically. And so it makes sense to have software and hardware that is tailored for both of them in different ways. And so it turns out Volta is actually a great GPU for inference.
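(And the matching sketch of inference, again an illustration rather than anyone's production code: the forward pass only, with gradient tracking turned off because no learning happens.)

```python
import torch
from torch import nn

# Inference is just the forward pass: no labels, no backward pass, no update.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()                       # freezes dropout/batch-norm behavior
with torch.no_grad():              # no gradients needed when only predicting
    new_input = torch.randn(1, 64)             # one new input
    prediction = model(new_input).argmax(-1)   # the model's guess
print(prediction.item())
```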

I'm actually very excited about using Voltas for inference. The biggest problem, I think, right now with Voltas is that there's just not enough Voltas in the world. And all the Voltas that we make get sucked up to go into training because people need that so much.

Whereas inference workloads are often smaller, and so people can get by with smaller processors. But I do believe in the future we're going to see big inference workloads that require even a Volta to deploy. So I am excited about both training and inference for Volta.

But Nvidia also has other products specifically designed for the inference market-- for example, the Tesla P4. It's a small, I think 50- to 75-watt GPU. And it has some low-precision arithmetic instructions, because in inference, you don't need nearly as many bits to get a good result as you do during training. And so the Tesla P4 is a really good ops-per-second-per-watt processor for the inference market.

MARK: You said something interesting just before, where you were talking about how with the newer GPUs, you're able to do things you weren't able to do before. What are the things that you're able to do now that you weren't actually able to do before?

BRYAN: Yeah. Well, I'll speak a little bit to graphics because we're really excited about that at Nvidia. But recently there have been some incredible generative models for graphics, including the progressive GAN, which generates very high-resolution, very high-quality images. They trained it on the CelebA face data set, and the faces that it generates are pretty indistinguishable from real pictures of people, even though they're completely invented by this model. And I think the leap in fidelity for GANs, when you are able to train them on larger data sets for longer periods of time, is quite incredible and changes the kinds of applications you can do with GANs.

The paper that I mentioned that we've been working on about synthesizing photorealistic images from semantic labels, that work also, I think, is pretty surprising. Like, I often look at the output of that model, and it takes me quite a long time to figure out is this a fake image, or is this a real image? And that just would not have been possible without this processing power.

MELANIE: When I first met you, you were doing a presentation on parallelizing data for models versus parallelizing a model-- data parallelism versus model parallelism-- and there are different methodologies in terms of how you split things out, especially if you're trying to do massive processing of a deep learning model in particular. How do you break that out if you want to try to do distributed computing?

What are your thoughts now in terms of distributed computing for deep learning? Do you see that as really a thing to do? Or do you see just making these chips better?

BRYAN: We need both. So the great thing about making the chip better is that it plugs into the software environment without disruption, or the disruption is much less. And so that allows people to smoothly move from one generation to the next and get performance improvements without having to do big changes to their code. And it also-- there's an accessibility argument as well.

A lot of people doing AI research are operating on a smaller scale. And so if we can make the small scale processing better and better, then that really helps accessibility and sort of democratizing this technology. And then for people that have the resources to use more processors to train a model, then being able to have the software scale to using many GPUs to train one model provides really compelling benefits. And so we need that as well. And we focus on both those things at Nvidia.
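(A toy sketch of the data parallelism discussed here; real systems synchronize gradients with collective-communication libraries under frameworks like PyTorch's DistributedDataParallel, but the arithmetic is the same: each worker computes gradients on its shard of the batch, and the averaged gradient equals the full-batch gradient.)

```python
import numpy as np

# Data parallelism in miniature: linear regression, mean-squared-error loss.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 16))      # full batch
y = rng.standard_normal(128)
w = np.zeros(16)

def gradient(Xs, ys, w):
    # d/dw of mean((Xs @ w - ys)^2) = 2/N * Xs.T @ (Xs @ w - ys)
    return 2.0 / len(ys) * Xs.T @ (Xs @ w - ys)

# Each of 4 "GPUs" gets a shard of the batch and computes a local gradient.
shards = zip(np.split(X, 4), np.split(y, 4))
local_grads = [gradient(Xs, ys, w) for Xs, ys in shards]

# All-reduce step: average the local gradients across workers.
avg_grad = np.mean(local_grads, axis=0)

# Equivalent to the single-machine gradient on the full batch.
assert np.allclose(avg_grad, gradient(X, y, w))
w -= 0.01 * avg_grad  # one synchronized update
```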

MELANIE: Nice. You mentioned a number of different groups in terms of their work and what's going on out there in the deep learning and machine learning space. How much are you collaborating with other outside groups? How do you collaborate with, like, OpenAI, Facebook, Google?

BRYAN: Yeah. Nvidia collaborates with pretty much everyone. I think we work pretty hard on our relationships with research labs and academic labs and industrial labs. The way we collaborate, we like to basically help people solve problems.

So people often come to us and say, we're having problems getting good efficiency with this particular model-- can you help us? And we'll look into it and realize, oh, cuDNN is choosing the wrong implementation for this particular convolution in this particular case. So we can change the heuristic and make that more efficient, or we can maybe add a new implementation for one of those primitives that turns out to be a lot better fitted to the architecture. And so we're constantly iterating with researchers from all over the world about what their needs are and how we can make our products better.

MELANIE: What are you most excited about in terms of where things are going with AI and machine learning?

BRYAN: I'm really excited about applications. That's what I'm doing, right, for a living. I feel like there are just so many new ways of applying this technology to pretty much every field inside the world economy. We talk a lot about self-driving cars, and I'm very excited about self-driving cars.

I think self-driving cars are one of the applications that are kind of on the vanguard of really changing the world. I'm really excited about never having to find a parking spot again and, like, building houses where all those parking lots are. That would be really great here in the Bay Area, where there's not enough space.

MELANIE: Yeah.

BRYAN: So I think everybody's excited about self-driving cars. But I think that's just the beginning. I think every industry, from agriculture to education to art, there's just going to be an explosion of applications of AI. And it's exciting to be part of that.

MELANIE: And for those who are starting out wanting to get into it-- even just doing chip design or trying to get into parallel processing and deep learning-- do you have any recommendations, anything you'd say to spend some time working on, or any resources that come to mind?

BRYAN: Yeah. Well, let's see, there's a couple of things on my mind right now. One of them-- so I saw Google put out a class recently, sort of an introduction to machine learning. I looked through that briefly. It looked pretty good. So that seems like an interesting thing to look at.

Nvidia also has something called the Deep Learning Institute, where we give tutorials to help people get hands-on experience training models and using deep learning frameworks. I think that that has been pretty interesting for some people. I'm also a fan of some of these online education programs. For example, I work with Udacity.

Udacity has some great Nanodegrees in AI that I think people can get a lot of value from going through. And then, I guess, the two other things that also come to mind are open source. There's just so much out there. If you search on GitHub, you can find open source implementations of all sorts of really amazing deep learning models, and you can download them for free and just start playing around with them.

You don't even need to build a machine that has a GPU in it. You can use a cloud service, like the Google Cloud. And [INAUDIBLE].

MARK: Nice. Thank you.

BRYAN: And you can get started really easily, right? You don't have to invest a whole bunch to jump in. And then, I guess, the last thing that I wanted to mention-- I feel like I'm advertising for Google here, but it is actually a really good thing-- is Kaggle.

So Kaggle is this online competition platform, where people can try to solve real problems using AI. And as somebody that's always looking for talent to come help us apply AI to new problems, when I see a resume from somebody that's been competitive in Kaggle, I feel like, wow, this is a person who is excited about jumping in to solve problems and is very hands-on. And I think Kaggle is a great way to learn practical skills about applying AI.

MARK: Awesome. I also have a question before we end up wrapping up. As someone who's been involved, especially on, like, the computational side of things for such a long period and seen the history, I'm curious to hear about what's been the most surprising thing for you or most surprising application you've seen come out of this stuff, or maybe even the most interesting or weird or wacky?

BRYAN: Well, one of the most surprising and weird and wacky papers that I saw recently was this one called "The Deep Image Prior." Basically what they're doing is instead of training a deep learning model on, like, a million images, they train a deep learning model on one image. And this is a super crazy idea. But the results are actually kind of compelling.

They're able to do some tasks, like image completion. Like imagine that you have, like, a part of the image that's damaged and you want to repair it. They're able to fill in that image by training a deep learning model just on that very single image all by itself. I thought that was pretty crazy.
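(A miniature of the idea, as a hedged sketch-- the actual paper uses a much larger encoder-decoder network and real images, but the principle is the same: fit a network to the undamaged pixels of a single image, and its structural bias fills in the rest.)

```python
import torch
from torch import nn

# Deep-image-prior-style inpainting in miniature: fit a small conv net to
# reproduce ONLY the known pixels of one damaged image; the network's
# structural bias fills in the missing region.
H, W = 64, 64
target = torch.rand(1, 3, H, W)                # stand-in for the damaged image
mask = (torch.rand(1, 1, H, W) > 0.3).float()  # 1 = pixel known, 0 = damaged

net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
z = torch.randn(1, 32, H, W)                   # fixed random input code
opt = torch.optim.Adam(net.parameters(), lr=0.01)

for _ in range(500):                           # far fewer steps than the paper
    out = net(z)
    loss = ((out - target) ** 2 * mask).mean() # loss only on known pixels
    opt.zero_grad(); loss.backward(); opt.step()

restored = net(z)  # the network's output includes a guess for the damaged region
```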

I think there's a ton of interesting applications in agriculture. I remember this blog post from Google, actually, about the cucumber sorting. Do you guys remember that? I think that--

MELANIE: One of our colleagues put that one together actually, yeah.

BRYAN: I love that blog post. I always reference it. It was very surprising to me-- I don't know too many cucumber farmers, so my image of them is maybe a little wrong. But I just wouldn't have imagined a cucumber farmer picking up TensorFlow and training a model to solve a problem that they had on their farm. And I thought that was really cool.

MELANIE: Agreed. That was definitely a fun one. Well, as Mark was mentioning, I know we're getting short on time. But was there anything else that you wanted to touch on or cover before we let you go?

BRYAN: Yeah. I guess one of the things that I think is really important in the advancement of AI is the software environment. Maybe that's obvious to say. But we still have a lot of work to do to make it better. And the environments that we have now represent a huge amount of investment. And I think sometimes we take them for granted.

But I just wanted to highlight that as we progress forward, there's a lot of different ways of increasing compute capacity for deep learning and for AI. And I think that it's important for all of us to remember that it has to, sort of, connect all the way to the developers. And the way that happens is through the frameworks. And so I wanted to, kind of, give a shout out to all the people that are building TensorFlow and PyTorch and all the frameworks that people use to train and deploy these models, because I think that that software stack is incredibly important and it's changing rapidly. And I think we're going to see a lot of advancements there that make deep learning available for even more applications.

MELANIE: Nice. Well, Bryan, thank you, again, for joining us. We really appreciate it.

BRYAN: Great. It's been fun.

MARK: Thanks.

MELANIE: All right, well, thank you, again, Bryan. That was great to talk to you about what's going on at Nvidia and what kind of research your group is doing. And as we mentioned in the cool things of the week, we have the Nvidia Tesla V100, the world's most advanced data center GPU, coming to Google Cloud.

Now I think it's time for us to dive into the Question of the Week. And the Question of the Week roughly is: Sherol, Mark, you both have been to GDC a lot, and you know a lot about the conference and the community. So what can you tell us as experts of the space?

MARK: I think Sherol should go first. She's been going for way longer than I have.

SHEROL: Yeah. I think at some point in my computer science undergrad, I was like, the reason I'm in this is to make video games, ever since I was five years old or something. And I was like, where do I go to find out? So I went to my first GDC and was, like, amazed. Because back then, when I was an undergrad, people were like, making games? You can't do that for a living. And then I found this huge event with tens of thousands of people, where that's what everybody did. And I've been going to GDC for 10 years.

MELANIE: Nice. And how long have you been going for, Mark?

MARK: Like two years, three years. [LAUGHS]. I feel like I'm still taking baby steps compared to Sherol.

MELANIE: Well, what's your favorite thing about going to GDC?

SHEROL: I think this is probably similar to Mark. We both love games. We love playing games. We like the idea of that creative space of making games. We love new experiences that come through games. And GDC is where you kind of find the people that are like us, people who are out there creating and innovating in this space. And after a few years, it just becomes about connecting with developers.

MARK: Yeah. I would echo that 100%, what Sherol said. Definitely, it's the people, the diversity of people and the backgrounds that they have is wonderful. You meet so many people who come from academic backgrounds with a variety of research, people who come from creative backgrounds, from writing backgrounds, from UX, from programming. It's always an interesting mix of ideas and people coming together to produce some often very interesting things.

MELANIE: Granted, we're releasing this on a Wednesday, as always. It's in the middle of the week, in the middle of the conference. But what are some of the activities where you're like, you must go to this, or you must check this out when you go to GDC, especially if you're new to GDC?

SHEROL: So for me, I think one of the greatest gateways to GDC is the parties. It's great because after going once, you immediately notice that it's divided up into communities, as well as being a whole gathering of game developers. And honestly, if you're there, the IGDA group, the International Game Developers Association, is always putting out events to kind of cultivate a great community within developers in games. And finding a community to plug into is actually not hard. It's just meeting one friend, then the next friend, then the next friend. And eventually you'll end up triple booked because you're part of so many different communities.

MARK: So I'll tell you the story here, since Sherol's here and I think it's [INAUDIBLE]. So the first year I went to GDC, I was there for Google. I staffed the booth, did that. It was all fine. I had a good time. It was great.

Next year, Sherol joins Google. And--

SHEROL: We actually-- wait, wait, wait. We actually met at that booth, right?

MARK: We did meet at a booth that year, yeah. It was before you joined.

SHEROL: Right before I joined, I was at GDC. And I decided, I was like, oh, I'm going to go work for Google. I had already been in that process. I had met with the hiring manager. Things were looking good. And then I was like, I'm going to work on Dev Rel for Google. And I got to meet Mark for the first time. But I think that was the first time we met.

MARK: That's true.

SHEROL: Yeah.

MARK: That is true. That is true. Anyway, yeah. So the second year that I go to GDC, Sherol's on the team. And I'm like, we're going to GDC. And she's like, cool, I'm coming too and doing stuff as well. And she was like, but let me show you how to GDC. [LAUGHS] And it was at that point she introduced me to the Fellowship of GDC Parties. It's a group on Facebook.

Every year they get together and do this crowdsourcing of all the parties at GDC. She introduced me to all the CAs, the conference associates that run-- they're not volunteers actually, they're employees-- they run all of the events. You see them wandering around, helping people out. But, yeah, Sherol was amazing in really, sort of, taking me to that next level of GDC. She took me under her wing and really kind of showed me the ropes.

MELANIE: She was your sage or your--

MARK: Yeah. She was my mentor.

MELANIE: Your mentor through the process, that's fantastic. Well, that's great. And anything else you'd say, like, make sure to plug for GDC for this year that you're like hey, don't forget about?

MARK: Always, always, always, always go to alt.ctrl.GDC. That's the experimental area. I love that place. You always see some of the weirdest stuff. I think last year I saw a game that involved people having to blow into a box. And that was your interaction. And there were two people on either side, and it was just weird. There's a lot of weird stuff. But it's also wonderful when you see some really creative stuff there.

MELANIE: That sounds fun.

SHEROL: That also reminds me of the experimental game design challenge that Robin Hunicke does with a group of people. And I think the experimental stuff is definitely very cool because it's the most forward-thinking stuff that you'll see. You also see really cool postmortems of some of the biggest hits of the last year, which are really interesting to dissect as well.

MELANIE: That's fun. So in terms of where people will find you guys next, where will people find you?

MARK: So at GDC itself, we're sponsoring the Women in Games International party, which was yesterday. So that doesn't help anybody. On Wednesday itself, I will be talking at 3:30. There's a sponsor session we're doing with Firebase and Google Cloud. So we're doing a little what's-new session, where we're going to talk about a bunch of stuff there. Thursday, I will also be talking at the Google booth, doing a little talk on [INAUDIBLE] at 12:30 and then doing an Ask the Expert session after that for about an hour as well. Sherol, I know you're up to a bunch of stuff. What are you up to?

SHEROL: Yeah, actually, we're in the middle of GDC right now. There have been really amazing events that have happened. At Google, we sponsored the Blacks in Gaming awards ceremony. That happened yesterday. Also, we're really happy to have been sponsors of the Serious Games awards. These are both IGDA events. IGDA is an amazing organization that does a lot in the community, of course.

My talk was actually yesterday. I talked about machine learning in games, just looking forward. I mean, Wednesday's kind of the big day. This is where the majority of the people show up.

So Wednesday, tonight, it's going to be-- it's going to be a lot of parties. It's going to be a lot of parties tomorrow night. And I'll just be kind of hanging around catching up with people. A lot of this is going to be lunches with different groups and different collaborators that I've had in the past.

MELANIE: And you both were just saying, before we started recording, how this year is very heavy on machine learning. Is that right?

MARK: I don't know about heavy, but I've definitely seen that influence there. There are so many sessions at GDC for a variety of stuff. But I'm seeing some roundtables. I'm seeing a bunch of talks about neural networks. I think there's going to be a definite crossover between AI from a game sense and AI from a machine learning sense. What's your take on that? You definitely have much more insight into this area than I do.

SHEROL: Yeah. Well, also one of the things that we're sponsoring is the AI Game Developers Summit. They have a reception. So we're not sponsoring the summit itself. But, yeah, it's interesting because I think there have been apprehensions about deep learning in games, because in games it's really easy to create cognitive dissonance in an experience. It's interesting that now we're at a point where deep learning's effectiveness has gone up by so much that we're trying to figure out how to leverage this technology to create better, deeper, more immersive games.

MELANIE: So I will ask one last question for both of you: favorite game?

MARK: Oh, that's easy. So for me, I still point to the "Mass Effect" series. I'm just a huge fan of that whole world. I love the writing, love the direction. Just a huge fan.

SHEROL: Yeah, for me, "Final Fantasy." My Twitter handle is FFPaladin. And "Final Fantasy" has just-- it's the game that made me want to make games. So it's got to be a favorite. But I have to say I've been playing a lot of "Hearthstone." That dungeon crawl is so good. I don't know if you've played it, Mark, but it's really good.

MARK: Not in a while, actually. I'm in the middle of a big Bioware binge, talking about "Mass Effect." I've been playing "Dragon Age Origins." So I go back to the beginning.

SHEROL: Oh, my gosh.

MARK: I've gone back to the beginning. And I'm just going to play all the way through all of it. That's just-- and I'm just having so much fun.

SHEROL: Yeah. I've been wanting to play that series. I've heard a lot. I haven't actually played that many Western RPGs.

MARK: It's good.

SHEROL: But "Hearthstone," you have to try it out. Like, just log in and do the dungeon crawl. It's, like, excellent game design and really fun. They did a really good job with that.

MELANIE: Nice.

MARK: Cool. I will check it out.

MELANIE: Well, cool. Well, anything else you both wanted to talk about in regards to GDC that we didn't already cover?

SHEROL: My last advice is just bring a lot of business cards because you end up meeting so many people, and you'll leave with a huge stack of contacts.

MARK: Definitely.

MELANIE: Well, great. Thanks again, Sherol. I'm so glad you were able to join us. And I'll do one little last plug that has nothing to do with GDC and instead mention GTC, Nvidia's GPU Technology Conference, which is coming up next week. So just a little shout out to that. And then Sherol, you had one other thing?

SHEROL: The TensorFlow Dev Summit-- it's next week. The TensorFlow Dev Summit is coming up. And that's going to have livestreamed talks. And the talks will definitely be available afterwards. I'm actually speaking about machine learning research and creativity. So definitely check that out as well.

MELANIE: Thanks for covering it. Anything else, Mark, that we wanted to mention before we close out this week?

MARK: I plan on sleeping a lot after GDC.

MELANIE: So no one can find you.

MARK: Yeah.

MELANIE: Don't look for you.

MARK: I'm taking some time off.

MELANIE: You're finally [INAUDIBLE] GDC.

MARK: I'm done.

MELANIE: You are done for the rest of the year.

MARK: Awesome. Well, Melanie, Sherol, thank you so much for joining us for yet another episode of the podcast.

SHEROL: Thanks so much for having me.

MELANIE: Thank you.

SHEROL: Take care.

MARK: Wonderful. And thank you all for listening. And we'll see you all next week.

[MUSIC PLAYING]

Hosts

Mark Mandel and Melanie Warrick

Continue the conversation

Leave us a comment on Reddit