Friday, May 25, 2018

Obligatory ML/AI blog

I was told by my NVIDIA rep that I am on the right path and at the very beginning of it.  I'm trying to set up some basic AI dev environments for people to screw around on and start thinking about what they want to focus on if they decide to create a project.  Enterprise-grade AI dev environments are really big compared to what you might have at home.  This isn't a single user with a graphics card running TensorFlow.  Nope, it will be about 5 users with graphics cards running TensorFlow.  I am using it as a way to gauge demand before I go and grab whatever purpose-built chipsets I can get my hands on.  Since I am looking into new ways to support potential customers, I figured I should share my very basic research into AI.

First off, I'm using nvidia-docker.  This was an easy decision to make, because it provides the portability and resource configuration capabilities that I need.  Also, I am using a 1070 in my system at home.  Second, I am using TensorFlow.  The other major offerings come from companies like Facebook, which don't deserve the leniency that Google has earned after years of not being offensively obvious about its data mining.  There are more blogs about setting it up than I can link, but I will share a few things about it here so I can complain on a public forum.  Not many instruction sets cover merging your docker config file, and the project hasn't seen many updates recently (as of this writing).  Let's get to it: an nvidia-docker install, with a merge to make it work!

Aggregate 1:
The basic install instructions

Aggregate 2:
Adding the NVIDIA runtime to docker
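
The merge itself is not complicated: nvidia-docker2 wants a "nvidia" runtime entry in /etc/docker/daemon.json, and if you already have settings in that file you need to fold the new key in rather than overwrite them.  Here is a minimal sketch of that merge as a Python script; the file location and runtime path follow the nvidia-docker2 defaults, so adjust to your own setup.

#!/usr/bin/env python
# Minimal sketch: fold the NVIDIA runtime into an existing /etc/docker/daemon.json
# without clobbering whatever is already configured. Run with sudo, then restart docker.
import json
import os

DAEMON_JSON = "/etc/docker/daemon.json"

# Load the existing config if there is one, otherwise start from scratch.
config = {}
if os.path.exists(DAEMON_JSON):
    with open(DAEMON_JSON) as f:
        config = json.load(f)

# Add the nvidia runtime alongside any runtimes that are already defined.
runtimes = config.setdefault("runtimes", {})
runtimes["nvidia"] = {
    "path": "nvidia-container-runtime",
    "runtimeArgs": []
}

with open(DAEMON_JSON, "w") as f:
    json.dump(config, f, indent=4)

print("Merged nvidia runtime into %s -- restart docker to pick it up." % DAEMON_JSON)

After restarting the docker daemon, the usual smoke test is something along the lines of docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi; if nvidia-smi sees your card from inside the container, the merge took.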

After following these steps, in a bit of an aggregate 1-2-1 kind of way to make nvidia-docker work, you should have a working copy of TensorFlow.  The annoyance after that was Chrome becoming unresponsive.  I have netdata running on my box, and there was absolutely nothing out of the ordinary the first few times I ran TensorFlow and had it crash my browser.  I then opened a new Chrome window dedicated to the Jupyter notebook and everything ran fine.  One thing stuck out, and it was mildly hilarious: TensorFlow complained when the amount of available RAM dropped below 4GB.  The warning only showed up in the shell I had launched from, but it left me wondering how often it would show up in the logs once this gets placed into a dev environment.  Not to sound like a crusty old admin, but 4GB is a lot of RAM.
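
If you want to keep an eye on that from inside the notebook instead of waiting for TensorFlow to grumble at the shell, something along these lines works.  This is just a sketch, and psutil is my own addition, not something the TensorFlow image necessarily ships with.

# Quick sanity check of available memory from inside the notebook.
# psutil is an assumption on my part -- pip install psutil if the image doesn't have it.
import psutil

mem = psutil.virtual_memory()
available_gb = mem.available / (1024.0 ** 3)
print("Available RAM: %.1f GB" % available_gb)

if available_gb < 4:
    # Mirror TensorFlow's complaint so it shows up in the notebook, not just the shell.
    print("Warning: less than 4GB of RAM available -- expect TensorFlow to whine.")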


Once the Jupyter notebook was accessible per the instructions, I went ahead and ran all of the sample notebooks.  Running them in their own window let me complete every notebook without having Chrome lock up.  At the end of notebook 3, they mention that you can run one of the last sets multiple times to increase accuracy.  It does do that, but it will take some time.  My graphics card chews through each run in under 5 seconds, but validation error only drops by about 0.1% every 5-10 runs once you hit the 1.5% test error threshold.  Next step: figuring out how to export what the system learned.  That is not a well-documented problem to have, considering that many users can't even get their systems to complete the sample notebooks.  Past that point the available documentation drops off by about 90%, and I may not have much to share that isn't behind a paywall.
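
I have not settled on the export piece yet, but the obvious starting point in this era of TensorFlow looks to be tf.train.Saver, which writes the learned variables out as a checkpoint that can be restored later.  Here is a rough sketch of the idea with a toy variable standing in for the real network; the checkpoint path and variable are mine, not the notebooks'.

import os

import tensorflow as tf

CKPT_DIR = "/notebooks/checkpoints"  # hypothetical path inside the container
CKPT_PATH = os.path.join(CKPT_DIR, "model.ckpt")

# Toy stand-in for a trained model: one variable that the fake "training" below updates.
# In the sample notebooks this would be the weights and biases of the actual network.
weights = tf.Variable(tf.truncated_normal([784, 10]), name="weights")
train_step = weights.assign(weights * 0.99)  # pretend training step

saver = tf.train.Saver()  # by default this saves every variable in the graph

if not os.path.exists(CKPT_DIR):
    os.makedirs(CKPT_DIR)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step)
    # Write the learned variables to a checkpoint inside the container.
    save_path = saver.save(sess, CKPT_PATH)
    print("Model saved to", save_path)

# Later, or in another process that builds the same graph:
with tf.Session() as sess:
    saver.restore(sess, CKPT_PATH)
    print("Weights restored, shape:", sess.run(weights).shape)

Getting that checkpoint out of the container would then just be a matter of saving it to a path that is bind-mounted as a docker volume.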

What I have learned from the obligatory AI test is that there is still a whole branch of computing that requires support and security, and that it may be overlooked for the time being.  But it is coming, and we should all push to make it a priority with the admins.  Typical AI use cases revolve around big data, so we should also focus on data security on both input and output to make sure we are not processing bad data or presenting bad data.  Hardware requirements and messages/errors need to be tunable so that we can capture how the data is processed, configure it, and spot problems.  These issues are form-fitted to the environment, so it will take some effort to determine what is appropriate.
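
On the "messages/errors need to be tunable" point, TensorFlow already exposes a couple of knobs for its own noise; how far they go in a shared environment is another question.  A quick sketch of the two I know about -- the log file path here is just an example, not anything standard.

import os

# Knob 1: TF_CPP_MIN_LOG_LEVEL controls the C++ side's console output
# (0 = everything, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = errors only).
# It has to be set before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

import logging

import tensorflow as tf

# Knob 2: the Python-side logger verbosity.
tf.logging.set_verbosity(tf.logging.WARN)

# The Python side sits on top of the standard logging module, so its output can
# also be captured to a file for later review -- /tmp/tf-run.log is just an example.
logging.getLogger("tensorflow").addHandler(logging.FileHandler("/tmp/tf-run.log"))

tf.logging.warn("This shows up on the console and in /tmp/tf-run.log")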

After roughly 20 tests, I have a 0.8% error rate in determining what a hand-drawn number is.  I will blog again soon about how to push that knowledge outside of the AI container.
