StarTrekBots.com: Part 1- The Introduction

Welcome to the blog series where I talk about how I made my latest toy, startrekbots.com. It was a fun little project and I have already pissed away too much of my life watching it babble on (literally) mindlessly.

In this series of three blog posts I’m going to discuss at you (1) a high-level view of the math (with links if you care to dive deeper) (you are here), (2) how I made the three iterations of “bot”, and (3) the technology required for training the Star Trek Bots, as well as for web hosting them. Odds on you will primarily care about one of these facets and give the other two a passing read at best. Good News Reader! I wrote them all to be high-level overviews with lots of links to get you started on a dig-in if you care to.  But first I’m going to lay out some context.

Varieties of Chat Bots.

Retrieval Bots. Odds on you have interacted with lots of Retrieval Bots, online and in some cases even on the phone (my bank uses a decision tree bot when I call in).  A decision tree bot may have menu buttons or it may try to infer which path on the tree to take by asking you questions and looking for key words in your response.  These are handy for common chat bot applications such as support chat bots, Alexa / Google Home, etc.

Generative Bots. Generative bots you likely do not have as much experience with. They are harder to program, and like small children, you never know what they are going to say next. We will delve into the differences in part two, but suffice to say here, a generative chat bot is a bot that “reads” the chat up to the current point and “generates” the next response (as opposed to retrieving it from a repository).

In practice that means, after an initial few lines of text to “seed” the document, you’re always going to get new content, because the algorithm is “making it up” as it goes. Specifically, this bot takes the last 100 “words” of “script” and generates the next 15 “words”.  More on this in a bit.
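
If pseudocode helps, the generation loop looks roughly like this (a hand-wavy Python sketch; model.predict_next_word is a stand-in for the actual trained model, which we’ll get to in part two):

def generate_reply(script_so_far, model, context_len=100, reply_len=15):
    # "Read" the script so far, then generate the next reply_len words,
    # always conditioning on the last context_len words of script + reply.
    words = script_so_far.split()
    reply = []
    for _ in range(reply_len):
        context = (words + reply)[-context_len:]          # last 100 "words" of "script"
        next_word = model.predict_next_word(context)      # hypothetical call into the trained model
        reply.append(next_word)
    return " ".join(reply)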

Haters Haters Everywhere.

There’s always haters everywhere. And my dear little bots have gotten plenty of hate for not developing complex plot structures. OK, that’s fair. But let’s take a bigger step back and look at how humans come up with stories.   In general we “scaffold”, that is, we have an idea- maybe a one-liner. Then we come up with an outline of what will happen in each act.  Then we outline those acts into scenes.  Then we fill in some actions and dialogue for each scene (I know I’ve oversimplified all this, but bear with me).  What we don’t usually do is tell the entire story on the first pass, making it up as we go.  That is, UNLESS we are doing improv comedy.  I do some improv (at least I take classes, because Chicago is the greatest improv city in the world and it would be a waste not to).

In some long form improv, more specifically in The Harold, an entire story in three acts with three to five scenes per act is invented on the fly. (I’m also butchering the finer points of a Harold, but whatever, it works here; sharpshoot me in the comments.)

That said, to all my hater friends, it is probably more appropriate to compare startrekbots.com to something like The Improvised Star Trek than to the professionally written story arcs of The Next Generation or Deep Space Nine.  I think we can all agree, however, that startrekbots.com has far superior writing to whatever interns wrote the later seasons of Voyager.

But in all seriousness, The Improvised Star Trek is one of my favorite podcasts, and considering you’re already this far in, you might as well start listening to it.

Emergent Properties

There is, ironically, a Star Trek episode that deals with this explicitly. In the episode Emergence, the Enterprise is trying to express itself through Holodeck characters from various people’s programs on an old-timey train ride to New Vertiform City.  It’s a fine episode and I recommend you watch it rather than take my hacky summation.  A key point though is that Data succinctly describes emergent properties:

Complex systems can sometimes behave in ways that are entirely unpredictable. The Human brain for example, might be described in terms of cellular functions and neurochemical interactions, but that description does not explain Human consciousness, a capacity that far exceeds simple neural functions. Consciousness is an emergent property.

– Lt. Cmdr. Data

You can play with the chat bots and argue they have a very short attention span, and seem to just sort of babble from one topic to the next.

But here’s the thing- they have a short attention span. They reference other characters from their universe. For the most part, Data doesn’t use contractions.  If you try to do a “crossover” episode, they will quickly start using characters who span both series, and after a bit, you’ll be in the new series’s universe (sometimes this even happens by accident).  When you get into a battle, you will be in a battle for some time (though they will do weird things like “lowering shields” just as they’re about to be fired upon).

This chat bot may be babbly nonsense, but it is coherent babbly nonsense.

I don’t want to give the thing credit where none is due, but considering it learned English from Star Trek, and that at the nuts-and-bolts level it is nothing more than addition and multiplication of vectors and tensors trained with a small amount of calculus, I think all of this is a really amazing emergent property.

Conclusion

The point of this post was to whet your appetite for things to come in future installments of this series.  All of the math will be in one post, and it will be intentionally high level and easy to understand.  All of the tech will be in another (also high level), and there will be an entire post on the differences between bots and the various bots I fielded before startrekbots.com.

I think generative chatbots stand to be a really exciting disruptive technology, though probably not for another decade or so (maybe less).  Something akin to The Diamond Age maybe? Thanks for reading and be sure to follow @rawkintrevo on twitter for updates on future posts in this series, as well as whatever fun new toy I invent next.

PS: Two last little points I want to make before you run off to play with my toy:

  1. The algorithm needs a few rounds of dialogue to “warm up”.  Don’t talk to it twice and then get miffed that it’s not doing anything fun.
  2. There is a “Let it Ride” button in the top right. Hit that and just let it go for a while (especially good for warming up).

 

Behind the Scenes Pt. 4: The K8s Bugaloo.

Let me start off by saying how glad I am to be done with this post series.

I knew when I finished the project, I should have just written all four posts and then timed them out for delayed release.  But I said, “nah, writing blog posts is future-rawkintrevo’s problem and Fuuuuuuug that guy.”  So here I am again.  Trying to remember the important parts of a thing I did over a month ago, when what I really care about at the moment is Star Trek Bots. But unfortunately I won’t get to write a blog post on that until I haven’t been working on it for a month too (jk, hopefully next week, though I trained that algorithm over a year ago I think).

OK. So let’s do this quick.

Setting up K8s on IBM Cloud

Since we were using OpenWhisk earlier- I’m just going to assume you have an IBM Cloud account.  The bummer is you will now have to give them some money for a K8s cluster. I know it sucks.  I had to give them money too (actually I might have done this on a work account, I forget).  Anyway, you need to give them money for a “real” three-node cluster, because the free ones will not allow Istio ingresses, and we are going to be using those like crazy.

Service Installation Script

If you do anything on computers in life, you should really make a script so next time you can do it in a single command line.  Following that theme, here’s my (ugly) script.  The short outline is:

  1. Install Flink
  2. Install / Expose Elasticsearch
  3. Install / Expose Kibana
  4. Chill out for a while.
  5. Install / Expose my cheap front end from a prior section.
  6. Setup Ingresses.
  7. Upload the big fat jar file.

Flink / Elasticsearch / Kibana

The Tao of Flink on K8s has long been talked about (like since at least the last Flink Forward Berlin) and is outlined nicely here.  The observant reader will notice I even left a little note to myself in the script.  All in all, the Flink + K8s experience was quite pleasant.  There is one little kink I did have to hack around, which I will show you now.

Check out this line.  The long and short of it was, the jar we made is a verrrry fat boy, and blew out the upload size limit. So we are tweaking this one setting to allow jars of any size to be uploaded.  The “right way” to do this in Flink is to leave the jars in the local lib/ folder, but for <reasons> on K8s, that’s a bad idea.

Elasticsearch I only deployed as a single node. I don’t think multi-node is supposed to be that much harder, but for this demo I didn’t need it and was busy focusing on my trashy front end design.

Kibana works fine IF ES is running smoothly. If Kibana is giving you a hard time, go check ES.

I’d like to have a moment of silence for all the hard work that went into making this such an easy thing to do.

kubectl apply -f ...
kubectl expose deployment ...

That’s life now.

My cheap front end and establishing Ingresses.

A little kubectl apply/expose was also all it took to expose my bootleggy website.  There’s probably an entire blog post on just doing that, but again, we’re keeping this one high level. If you’re really interested, check out:

  • Make a simple static website, then Docker it up. (Example)
  • Make a yaml that runs the Dockerfile you just made (Example)
  • Make an ingress that points to your exposed service. (Example)

Which is actually a really nice segue into talking about Ingresses.  The idea is your K8s cluster is hidden away from the world, operating in its own little universe.  We want to poke a few holes and expose that universe to the outside.

Because I ran out of time, I ended up just using the prepackaged Flink WebUI and Kibana as iFrames on my “website”.  As such, I poked several holes and you can see how I did it here:

Those were hand rolled and have minimal nonsense, so I think they are pretty self explanatory. You give it a service, a port, and a domain host. Then it just sort of works, bc computers are magic.

Conclusions

So literally as I was finishing the last paragraph I got word that my little project has been awarded 3rd place, but there were a lot of people in the competition so it’s not like I was 3rd of 3 (I have a lot of friends who read this blog (only my friends read this blog?), and we tend to cut at each other a lot).

More conclusively though, a lot of times when you’re tinkering like me, it’s easy to go off on one little thing and not build full end-to-end systems. Even if you suck at building parts, it helps illustrate the vision.  Imagine you’ve never seen a horse. Then imagine I draw the back of one, and tell you to just imagine what the front is like. You’re going to be like, “WTF?”.  So to tie this back in to Brian Holt’s “Full Stack Developer” tweet, this image is still better than “close your eyes and make believe”.

 

fullstack - brian holt
Brian Holt, Twitter

 

I take this even further in my next post.  I made the Star Trek Bot Algorithm over a year ago and had it (poorly) hooked up to Twitter. I finally learned some React over the summer and now I have a great way to kill hours and hours of time; welcome to the new Facebook: startrekbots.com

At any rate, thanks for playing. Don’t sue me.


Behind the Scenes of “…”: Part 3- Making it Pretty

At my first job, I was hired to do something (for 40 hours a week) that I was able to write a script to take care of in my first week.   I went to my boss and showed them how efficient I was (like an idiot).  Luckily, my bosses were cool, and the industry was structured such that my company charged our clients what they were paying me times 3. So my bosses wanted me to stick around, and gave me free rein to do things that would “wow” the client.

I learned in my second through 16th week, in general, no one cares how good your code/data science/blah is, if you can’t make it pretty (I was working in marketing, however I feel that the lesson carries to just about everything beyond analytics and code monkeying).

Becoming one of the beautiful people

I started monkeying around with React and Javascript a year or two ago over the Chicago winter, but it turned to spring before I got too far with it.  This summer, I got back into it. My first project was a little ditty I like to call https://www.carbon-canary.com (which actually started out at http://carbon.exposed). Carbon Canary started off with me hacking on the Core-UI for React framework, but fairly quickly outgrowing it and implementing many of my own hacks. It does web login with Gmail and Outlook, and will read your email for flight receipts and convert those into your carbon footprint. In these days of flight shaming, etc., a handy tool to have.  It will also chart your footprint over time, and let you add a few things “by hand”.

Carbon-Canary.com was a good first step into playing with React.

I also made http://www.2638belden.com which is a little experiment for a property I own and am (currently) trying to rent out and soon will use as a portal for filing maintenance requests.

So why am I telling you about these? Obviously to boost my google search rank.

As a corollary though, it can also be pointed out that I’m sort of getting the hang of writing React, and I recommend all “back end devs” learn enough of this so you can tell the front end people how to do their job.

Becoming ugly again…

So this whole streaming IoT engine thing was for an IBM hackathon. I figured it would be in bad taste to not make the front end with IBM’s favorite design framework, Carbon.

The hardest part about learning Carbon-react was that all of the examples seem to be for some functional version of React, which I have not really seen out in the wild (I’m using object oriented).  Beyond that, it gets real weird with CSS, and other stuff.

I’m looking at my github code now, and I see it’s been a month since I messed with this.

The main things that stand out in my mind are:

  1. Dealing with Carbon was a bad experience.
  2. There are a lot of poorly documented “features” run amok.
  3. If you never knew anything else, this framework might seem passable, but then you would have a hard time learning any other frameworks (as is common in IBM products bc <design choices>).

My solution after a week of fighting with Carbon to make the simplest things work ended up being an accordion menu and some iFrames of the Flink WebUI and Kibana board.

If I were doing this project for real, or if I had an infinite amount of time to learn Carbon, I would like to create charts using React (instead of Kibana), which could make it look nice and pro. Having continued on my React journey for another month, I think making a nice looking frontend (sans Carbon) could be fairly easily done, as well as the components for submitting new jobs, etc. (e.g. replacing the Flink WebUI iFrame).

Conclusion / Next Time:

I know this was a short one where I mainly just plugged my other sites and dumped on Carbon, but it’s hard to write some deep cuts about a thing you’re just learning yourself.  That said, I don’t really want to be a front end person, I just want to be able to hack some stuff together well enough to get a Series A and pay for one.  If you know anyone who wants to give me a Series A please have them contact me.

Next time, we’ll return to the land of Trevor having some passing idea of what he’s talking about as we discuss laying out the K8s ingresses to make all of this work on IBM Cloud.

Photo Credit: Trevor Grant; https://www.flickr.com/photos/182053726@N05/48632202606/in/dateposted-public/  ;                Subject “Robin Williams Mural” by Jerkface and Owen Dippie- Logan Square, Chicago IL

Behind the Scenes of “RT’s HoRT IoT-A, An AIoT-P MVP Demo”: Part 2- The Streaming Engine

Welcome back from Part 1. On our last episode we did an introduction, spoke briefly on the overall structure of the program, discussed our demo data source, and did some foreshadowing of this week. This week we’ll do a bit of what we foreshadowed.

The Streaming Engine

Or How to Do Weird Stuff with Apache Flink

Let’s start off by looking at some pictures. Let me show you the DAG of the Flink Streaming Job to provide some motivation, and then we can go through piece by piece and look at the implementations.

Stream DAG

From the “architecture diagram” in Episode 1, this is what the Flink Job is doing.

  1. We pick up the data from MQTT
  2. We apply a sliding window doing some simple analytics on the data (Count, Min, Max, Average, Sum Squared Deviations, Standard Deviation)
  3. We join each record to the last analytics emitted from the sliding window
  4. We update the list of REST endpoints which serve the models every 30 seconds
  5. We send each Record + Last Emitted Analytics pair as a REST request to each model, and then sink the Record, Analytics, and Model Result (for each model) to Elasticsearch.

Pretty easy peasy.  If you’re not interested in Apache Flink, you can basically stop here and know/understand that Record from Divvy Bike Station + Analytics + Model Results are sunk (written) into Elasticsearch, and that by continuously polling an endpoint that returns other active endpoints we can dynamically add/delete/update the models being run on this stream of data.

Getting Data from MQTT

Initially I was using a copy of luckyyuyong’s flink-mqtt-connector in my implementation, which required some hacks and updates.

The most important call out is that MQTT will disconnect brokers with the same clientID because it thinks they are old and stale, so we have to make the clientID random. This was a particularly nasty and inconsistent bug to track down, but it turns out many others have had this problem.  The solution here was just to add a string of milliseconds since the epoch.  It would probably need something more for production, but this is an MVP.

String clientBase = String.format("a:%s:%s", Org_ID, App_Id).substring(0, 10);
String clientID = clientBase + System.currentTimeMillis();
mqttProperties.setProperty(MQTTSource.CLIENT_ID, clientID );

Aside from that little quirk, this all works basically like an Apache Kafka or any other connector.

Apply the Sliding Window

This was a bit of a trick because I didn’t want to hard code the data structure into the engine. I would like to eventually auto determine the schema of the json, but because of time constraints I set it up to be passed as a command line argument (but ended up hard coding what the CLI argument would be – see here).

This is important because we don’t want the window to be trying to compute analytics on text fields, and the data coming in from MQTT is all going to look like a string.

If you look at the code in ProcessWindowAnalyticsFunction you can see that the function expects a schema to come in with the records, and we will attempt to compute analytics on any field in that schema that is listed as ‘Numeric’.  Admittedly here, we are trading off performance for a single engine that will handle any data source.
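
For the non-Flink readers, the gist of that window function looks something like this Python sketch (the real thing is a Flink window function in Java; the field handling here is illustrative, not copied from the repo):

import math

def window_analytics(records, schema):
    # records: list of dicts for one window; schema: field name -> type string.
    # Only fields marked 'Numeric' get analytics; everything else is ignored,
    # since MQTT hands us strings for everything.
    analytics = {}
    for field, ftype in schema.items():
        if ftype != "Numeric":
            continue
        values = [float(r[field]) for r in records if field in r]
        if not values:
            continue
        n = len(values)
        mean = sum(values) / n
        ssd = sum((v - mean) ** 2 for v in values)   # sum of squared deviations
        analytics[field] = {
            "count": n,
            "min": min(values),
            "max": max(values),
            "avg": mean,
            "sum_sq_dev": ssd,
            "std_dev": math.sqrt(ssd / n),
        }
    return analytics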

Joining the Records and Last Reported Analytics

At this point, I had been doing A LOT of Java, which to be candid, I really don’t care for- so I switched over to Scala.  It’s really not a particularly interesting function. It simply joins all records with the last reported analytics from the sliding window.  You’re free to check it out here and make comments if you have questions. I realize there’s a lot of magic hand-waving and me telling the reader to “go look for yourself” for this blog / tutorial, but I am assuming you are fairly familiar with all of the tools I am using and I’m trying to give the reader a high level view of how this all fits together. If you have specific questions, please ask in the comments or email me (you can find me around the interwebs pretty easily).

The “Other” Stream

Now let’s shift and consider our “other” stream: the stream that simply polls an endpoint which serves us other endpoints every thirty seconds.  This is accomplished effectively by doing a time windowed stream on all of the events coming from the records source- throwing them all away, and then once every 30 seconds (but you could configure that), sending an async REST request to a preordained URL that holds the REST endpoints for the models.
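
Stripped of the Flink plumbing, the polling side amounts to something like this sketch (plain Python with requests; the directory URL is a placeholder, not my actual OpenWhisk action):

import time
import requests

ENDPOINT_DIRECTORY = "https://example.com/api/model-endpoints"  # placeholder URL

def poll_model_endpoints(interval_s=30):
    # Every interval_s seconds, fetch the current list of model REST endpoints.
    while True:
        try:
            resp = requests.get(ENDPOINT_DIRECTORY, timeout=10)
            endpoints = resp.json().get("endpoints", [])
            # each entry looks like { modelOneName: "http://the-url" }
            yield [url for entry in endpoints for url in entry.values()]
        except requests.RequestException:
            yield []  # keep the stream alive even if the directory is down
        time.sleep(interval_s)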

You can see this in my github, lines approx 140-159.  The endpoint that this system is hitting is being served by Apache OpenWhisk (which I absolutely love, if you haven’t been able to glean that from my other blog posts; it’s like AWS / Google Cloud Functions except not proprietary vendor lock-in garbage).

You can see the response this gives here.  Obviously, a “next step” would be for this to hit some sort of database where you could add/delete entries. (If you’re too lazy to click, it basically just returns a json of { endpoints: [ { modelOneName: “http://the-url” }, …] }.)

Merging it All Together and Making Async Calls

Now we bring everything together.  From one stream we have Device Event Records and analytics on those records; from the other we have a list of URLs which will serve models.  Now- it’s worth pointing out here that, while not implemented in a “real” version, an easy add would be to have another field in the model name that specifies which devices it applies to- since that model is expecting certain input fields and different devices will have different fields. Again- a MINIMUM viable product is presented.

The rest is pretty simple conceptually- for each record/analytics item- it goes through the list of all applicable URLs (in this case all of the URLs), and pings each with the record and analytics as the payload. The code is here and may be more illuminating.  The magic happens in the main program right here.

The async model pinging is nice because as different requests come back at different speeds, they don’t hold up the rest of the stream.  A bug/feature can be introduced though, because you don’t want the entire program to go flying off the rails over a single bad REST request.  To handle that you must set the “timeout” of the async function; my choice was to “ignore” timeouts, but you could in theory re-request, allow up to X fails in a Y time, etc.
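
For a rough feel of the async-with-ignored-timeouts idea in plain Python (the real code is Scala on Flink’s async I/O; aiohttp here is just to illustrate the pattern):

import asyncio
import aiohttp

async def score_record(session, url, payload, timeout_s=2.0):
    # POST one record+analytics payload to one model endpoint.
    # Timeouts and errors are simply ignored, mirroring the "ignore" strategy above;
    # you could retry or count failures instead.
    try:
        async with session.post(url, json=payload,
                                timeout=aiohttp.ClientTimeout(total=timeout_s)) as resp:
            return await resp.json()
    except (asyncio.TimeoutError, aiohttp.ClientError):
        return None

async def score_against_all_models(payload, model_urls):
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(score_record(session, url, payload) for url in model_urls))
    # only keep the models that actually answered in time
    return {url: r for url, r in zip(model_urls, results) if r is not None}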

Conclusion

I want to state one more time- that this was a lot of wave-my-hands magic, and “go look for yourself”dom.  I probably could have made a 5 part blog just out of this post- but 1. I’m trying to write a book on something else already, and 2. the point of this blog series is an overview of how I built a “full stack” IoT Analytics solution from scratch, part time, in a couple of weeks.

Next Time

We’ll follow our hero into his misadventures in React, especially with the Dread Design System: Carbon.

See you, Space Cowboy.

Behind the Scenes of “Rawkintrevo’s House of Real Time IoT Analytics, An AIoT platform MVP Demo”

Woo, that’s a title- amiright!?

It’s got everything- buzzwords, a corresponding YouTube video, a Twitter handle conjugated as a proper noun.

Introduction

Just go watch the video– I’m not trying to push traffic to YouTube, but it’s a sort of complicated thing and I don’t do a horrible job of explaining it in the video. You know what, I’m just gonna put it in line.

Ok, so now you’ve seen that.  And you’re wondering: how in the heck?!  Well, good news- because you’ve stumbled onto the behind the scenes portion where I explain how the magic happened.

There’s a lot of magic going on in there, and some you probably already know and some you’ve got no idea. But this is the story of my journey to becoming a full stack programmer.  As it is said in the Tao of Programming:

There once was a Master Programmer who wrote unstructured programs. A novice programmer, seeking to imitate him, also began to write unstructured programs. When the novice asked the Master to evaluate his progress, the Master criticized him for writing unstructured programs, saying, “What is appropriate for the Master is not appropriate for the novice. You must understand Tao before transcending structure.”

I’m not sure if I’m the Master or the Novice- but this program is definitely unstructured AF. So here is a companion guide that maybe you can learn a thing or two / Fork my repo and tell your boss you did all of this yourself.

Table of Contents

Here’s my rough outline of how I’m going to proceed through the various silliness of this project and the code contained in my github repo .

  1. YOU ARE HERE. A sarcastic introduction, including my dataset, WatsonIoT Platform (MQTT). Also we’ll talk about our data source- and how we shimmed it to push into MQTT, but obviously could (should?) do the same thing with Apache Kafka (instead). I’ll also introduce the chart- we might use that as a map as we move along.
  2. In the second post, I’ll talk about my Apache Flink streaming engine- how it picks up a list of REST endpoints and then hits each one of them.  In the comments of this section you will find people telling me why my way was wrong and what I should have done instead.
  3. In this post I’ll talk about my meandering adventures with React.js, and how little I like the Carbon Design System. In my hack-a-thon submission,  I just iFramed up the Flink WebUI and Kibana, but here’s where I would talk about all the cool things I would have made if I had more time / Carbon-React was a usable system.
  4. In the last post I’ll push this all on IBM’s K8s. I work for IBM, and this was a work thing. I don’t have enough experience on anyone else’s K8s (aside from microK8s, which doesn’t really count) to bad mouth IBM. They do pay me to tell people I work there, so anything too rude in the comments about them will most likely get moderated out. F.u.

Data Source

See README.md and scroll down to Data Source. I’m happy with that description.

As the program currently stands, right about here the schema is passed as a string. My plan was to make that an argument so you could submit jobs from the UI.  Suffice to say, if you have some other interesting data source- either update that to be a command line parameter (PRs are accepted) or just change the string to match your data.  I was also going to do something with schema inference, but my Scala is rusty and I never was great at Java, and tick-tock.

Watson IoT Platform

I work for IBM, specifically Watson IoT, so I can’t say anything bad about WatsonIoT.  It is basically based on MQTT, which is a pub-sub thing IBM wrote in 1999 (which was before Kafka by about 10 years, to be fair).

If you want to see my hack to push data from the Divvy API into Watson IoT Platform, you can see it here. You will probably notice a couple of oddities.  Most notably, that only 3 stations are picked up to transmit data.  This is because the Free account gets shut down after 200MB of data and you have to upgrade to a $2500/mo plan bc IBM doesn’t really understand linear scaling. /shrug. Obviously this could be easily hacked to just use Kafka and update the Flink Source here.
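
If you would rather skip Watson IoT entirely, the whole shim boils down to something in this spirit (a sketch using paho-mqtt; the feed URL, station IDs, broker, and topic are placeholders rather than the values in my repo):

import json
import time
import requests
import paho.mqtt.client as mqtt

STATION_FEED = "https://gbfs.divvybikes.com/gbfs/en/station_status.json"  # placeholder feed URL
STATION_IDS = {"35", "100", "192"}   # only a few stations, to stay under the free-tier data cap

client = mqtt.Client("divvy-shim")   # paho-mqtt 1.x style constructor
# client.username_pw_set(...)        # broker credentials go here
client.connect("localhost", 1883)    # swap in your MQTT / Watson IoT broker
client.loop_start()

while True:
    stations = requests.get(STATION_FEED, timeout=10).json()["data"]["stations"]
    for s in stations:
        if s["station_id"] in STATION_IDS:
            client.publish("iot/divvy/station-status", json.dumps(s))  # placeholder topic
    time.sleep(60)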

The Architecture Chart

That’s also in the github, so I’m going to say just look at it on README.md.

Coming Up Next:

Well, this was just about the easiest blog post I’ve ever written.  Up next, I may do some real work and get to talking about my Flink program, which picks up a list of API endpoints every 30 seconds, does some sliding window analytics, and then sends each record and the most recent analytics to each of the endpoints that were picked up, and how in its way this gives us dynamic model calling. Also- I’ll talk about the other cool things that could/should be done there that I just didn’t get to. /shrug.

See you, Space Cowboy.


Building a License Plate Recognizer For Bike Lane Uprising

UPDATE 6/20/2019: Christina of Bike Lane Uprising doesn’t work in software sales, and is therefore terrified of “forward-selling”, and wants to make sure it’s perfectly clear that automatic plate recognition, while on the road map, will not be implemented for some time to come.

I was recently at the Apache Roadshow Chicago where I met the founder of Bike Lane Uprising, a project whose goal is to make bicycling safer by reducing the occurrence of bike lane obstruction.

I actually met Christina when I was running for Alderman last fall (which is why I’ve been radio-silent on the blogging for the last 8 months- before you go Google it, I’ll let you know I dropped out of the race).  As a biker, I identified with what she was doing.  Chicago is one of the bike-y-est cities in North America, but it can still be a very… exciting… adventure commuting via bike.  It also seemed like there was probably something I could do to help out.   I promised after the election was done I’d be in touch.

Of course, I forgot.

Actually, I didn’t forget- I dropped out of the election because I got a book deal with O’Reilly Media to write on Kubeflow AND because I had already committed to producing the Apache Roadshow Chicago.  So I “punted” and promised to help out after the Roadshow.

Christina was still nice enough to come out and speak at the roadshow, and was very well received.  We were talking there, and I, remembering my oath to help out, finally got a hold of her a week or two later.

Her plan was to do license plate recognition.  The current model of Bike Lane Uprising uses user-submitted photos, which are manually tagged and entered into a database with things like: License Plate Number, Company, City/State Vehicles, etc.  She had found an open source tool called OpenALPR and wondered if BLU could use it somehow.

Now obviously this is going to have to be served somewhere, and if you hadn’t heard, I have fallen deeply in love with Apache OpenWhisk-incubating over the last 18 months.  And I’m not saying that with my IBM hat on, I genuinely think it is an amazing and horribly underrated product. Also it’s really cheap to run (note- I am running it as IBM Cloud Functions, which is just a very thin veil over OpenWhisk).

OK so OpenALPR has a Python API.  Good news and bad news- good news because this project will take 20 minutes and I’ll be done, bad news because it’s too quick and easy to make a blog post out of.  Considering you’re reading the blog post- obviously that didn’t work.  If you look at OpenALPR, you’ll see it’s been over a year since any work has gone on with it. It’s basically a derelict, but a functional one…ish.  The Python is broken- some people said they could make the Python work if they built from scratch- I could not. Gonna have to CLI this one.

As a spoiler- here’s the code.

Well, that’s exciting, because I’ve never built a Docker function before (only Python, and some pretty abusive Python tricks that use os.system(...) to build the environment at run time).

For this trick however, we do a multi stage build.  This is because the Alpine Linux repos are unstable as hell, and if you want to version lock something you basically have to build from source.

OpenALPR depends on OpenCV and Tesseract. OpenWhisk expects its Docker functions to start off from the dockerSkelaton image.**  So if you go into the docker/ directory- I’d like to first direct your attention to opencv, which uses openwhisk/dockerSkelaton for a base image and builds OpenCV. Was kind of a trick. Not horrible.  Then we have the tesseract folder, which builds an image using rawkintrevo/opencv-whisk as a base. Finally, openalpr/, which builds an image using rawkintrevo/tesseract-whisk as a base.  Now we have an (extremely overweight, bc I was lazy about my liposuction) environment with openalpr installed.  Great.

Finally, let me direct your attention to where the magic happens. plate-recog-server-whisk/ has a number of interesting files. First there is an executable called exec, a silly little bash file with only one job- to call

python3 /action/openalpr-wrapper.py ${1}

You see- the dockerSkelaton has a bunch of plumbing, but OpenWhisk expects there to be a file /action/exec that it will execute and the last line of stdout from that executable to be a json (which OpenWhisk will return).

So let’s look at the code of openalpr-wrapper.py, elegant in its simplicity.  This is a program that takes a single command line arg, a json (that’s how OpenWhisk passes in parameters); that json may have two keys, a required image url and an optional top n license plates.  subprocess.call() calls alpr with the image and the top plates, and prints the response (in json form, which is what the -j flag is for).  And that’s it. So simple, so elegant.
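
For reference, a stripped-down sketch of what that wrapper does (the real file is in the repo; the parameter names and image-fetching details here are illustrative and may differ):

#!/usr/bin/env python3
# Rough sketch of openalpr-wrapper.py; see the repo for the real thing.
import json
import subprocess
import sys
import urllib.request

def main():
    params = json.loads(sys.argv[1])          # OpenWhisk passes params as one json arg
    image_url = params["imageUrl"]            # key name is illustrative
    topn = str(params.get("topn", 5))

    local_path = "/tmp/plate.jpg"
    urllib.request.urlretrieve(image_url, local_path)

    # -j: json output, -n: number of top plate guesses to return
    out = subprocess.check_output(["alpr", "-j", "-n", topn, local_path])
    print(out.decode().strip())               # last line of stdout is the json OpenWhisk returns

if __name__ == "__main__":
    main()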

I’m trying to follow this design pattern more and more as I get older- there usually exists some open source package that does what I want, and I just need a little Python glue code up top.  In this case- I wanted

  • License Plate Recognition
  • As-a-Service (e.g. done via API call and scalable).

OpenALPR + Apache OpenWhisk-incubating.

 

I’m hoping to write more now that life has slowed down a bit. Stop back by soon.

** UPDATE 2: The Apache OpenWhisk folks wanted me to provide some clarity around this statement.  Specifically they said about this sentence, that it:

 isn’t quite right. You can use any image as your base as long as you implement the lifecycle/protocol. What you get from the skeleton is a built in solution. You could have started with our python image or otherwise.

 

Photo Credit: Ash Kyd

 

Pavlov’s Sandman pt 3: Hacking a Shock Collar and the benefits of Proper Cargo Culting

In the last post I laid out a super simple algorithm for detecting snores.  You might have read that and thought, “dude, this barely counts as data science”. I would agree with you.  I did that in January; the talk was in May. I figured I’d go hack the shock collar, and then work on fine tuning the algorithm with some more advanced magic.  Turns out I had no idea what I was doing with bluetooth hacking in Python, and it was only by luck that, in early March, I was able to make any communication with the shock collar / begin testing.

I had done some device hacking before (see Cylons).  I thought this would be easy…

_Author makes 1000-yard stare out of the window at restaurant where he is writing blog_

… it wasn’t.

The idea was, I would be able to see what messages my phone was sending the device, then I would send the same messages and voila! I would have control. So step 1 is to snoop the communication between the phone and the device.

To do this, open your Android (if you’re using an iPhone, you’ll first need to smash it with a hammer and get a real cell phone), go to settings, open developer options (varies by phone; look up a tutorial for your model), click on “Enable HCI bluetooth snoop log”, and then go into the app to pair with the collar / send commands.

Screen Shot 2018-05-22 at 2.37.58 PM

Too easy!  The next step is to pull the log off your phone. For this step you’ll need Linux (actually you can theoretically do this without Linux, but you will need Linux for the Python program that controls the audio of the app, or significantly refactor it to run on your trash OS)- if you’re not using Linux you’ll need to format your hard drive and install a grownup’s operating system. You may also need to install Android Debug Bridge.  Open the terminal and type in:

adb pull /sdcard/btsnoop_hci.log

Now here’s a picture.

Screen Shot 2018-05-23 at 3.24.01 PM
A screen shot of a screen shot, but you get the idea.

Ok, that’s going to pull the snoop file onto your computer.  Next step, open up Wireshark or whatever your favorite log analyzer is and read through.

Screen Shot 2018-05-23 at 3.26.02 PM.png

What we see here is what (I think) was a “buzz” command from my phone to the collar.

Cool- so now we see the commands my phone was sending to my device (shock collar); should be easy to mimic, right?  Well, no, it wasn’t. Not because it was intrinsically difficult, but because I had no idea what I was doing.  What happened next in my story was about two months of me (off and on) trying to make contact from my computer to my device.  The first problem was there was no way to pair my device with the computer; some of the Linux bluetooth utils would see the device, others would not.

I was stabbing in the dark. Do you remember being a child, and seeing your parents order pizza?  And then one day you want to order pizza, and you’ve seen your parents do it plenty of times and it seems easy enough, so you walk to the phone, press buttons, say “pizza please” and then wait forty-five minutes for pizza to show up. You only know if you did it right because pizza either shows up or it doesn’t.  That was basically my experience trying to hack this S.O.B.

I was cargo culting, and doing so with more concern as the deadline approached.  Finally, the cargo gods smiled on me.  I learned about Bluetooth Low Energy devices (which my device is).  I found a somewhat obscure Python library for dealing with them.  But I still wasn’t getting signals through (that is to say, the device was not reacting to commands I sent).   From there, I literally cargo culted backwards.  There are a lot of commands the phone sends to the device and I started going backwards through them all. In not much time- SUCCESS! The collar buzzed.  It turns out there is a command which I roughly translate to mean “Hello device, I am a controller that will be sending you commands now, please do whatever you hear from me”, and then it does.
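
If you want to try something similar at home, the happy path is just a couple of BLE writes. Here is a sketch using bluepy (one of the Python BLE libraries for Linux); the MAC address, handle, and byte strings are placeholders, not the actual collar protocol:

from bluepy import btle

COLLAR_MAC = "AA:BB:CC:DD:EE:FF"       # placeholder device address
CMD_HANDLE = 0x0010                     # placeholder characteristic handle (found via snooping)
HELLO_CMD = bytes.fromhex("deadbeef")   # placeholder "I'm your controller now" message
BUZZ_CMD = bytes.fromhex("cafebabe")    # placeholder "buzz" command

collar = btle.Peripheral(COLLAR_MAC)
# Replay the handshake the phone app sends first, then the actual command.
collar.writeCharacteristic(CMD_HANDLE, HELLO_CMD, withResponse=False)
collar.writeCharacteristic(CMD_HANDLE, BUZZ_CMD, withResponse=False)
collar.disconnect()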

cargo-cult-plane_497x375

The code that does all of this is so simple and straightforward it’s hardly worth giving a line-by-line.

You can find the controller class here.

Pavlov’s Sandman Pt 2: Background and Detecting Snores

Welcome to the long-anticipated follow-on post to Pavlov’s Sandman: Pt 1.  In that post I admitted that I had a problem with snoring, and laid out a basic strategy and app to stop snoring.  They say that admitting you have a problem is half of the battle.  In this case that is untrue- hacking the shock collar was half of the battle, and the other half of the battle was detecting snores.

I had a deadline for completing this project: ODSC East in Boston, at the beginning of May.  What the talk ended up being about (and in turn what this blog post will now be about) is how in “data science” we usually start with a problem that, based on some passing knowledge, seems very solvable, and then the million little compromises we make with ourselves on our way to solving the problem / completing a product.

This blog post and the one or two that follow will be an exposition / journal of my run-and-gun data science as I start by assuming a problem is easy, have some trouble but assume I have plenty of time, realize time is running out, panic, get sloppy with experiments, make a critical breakthrough in the 11th hour, and then deliver an OK product / talk / blog post series.

A brief history of the author as a snorer.

As best as I can tell, I began snoring in Afghanistan.  This isn’t surprising; the air in Kabul was so bad the army gave me a note saying that if I ever had respiratory issues, it was probably their fault (in spite of the fact that everyone was smoking a pack per day of Luckys / Camel Fulls / Marlboro Reds).  This is to say nothing of the burn pits I sat next to to keep warm while on gate duty from December until March, or the five thousand years of sheep-poo-turned-to-moon-dust always blowing around in the countryside west of Kabul City.

As a brief aside, do you know what’s really fun about trying to “research” when you started snoring? Making a list of exes and then contacting each of them out of the blue with a “hey, weird question but…”; it’s like a much more fun version of the “Who gave me the STD?” game.

After Afghanistan, girls I was sleeping with would occasionally complain of my snoring.  This came to a head with my ex though.  She would complain, elbow me, sleep on the couch, etc. But I was more concerned when the issue came up with my new girlfriend (a very light sleeper, or I’ve gotten much worse about snoring).  This was especially concerning, because I had tried every snoring “remedy” on the internet and had no success.

Break throughs on other fronts.

I have a puppy named Apache, and was at the trainer’s.  They convinced me that I should start using a wireless collar (a shock collar).  The guy who trained me taught me that you don’t want to hurt the dog; you want to deliver the lightest shock they can feel and just keep tapping them with that until they stop doing what they shouldn’t be doing.  The shock should be uncomfortable, not painful.

One of the “remedies” I had tried for my snoring before was an app called Sleep as Android. In this app there was an “anti-snoring” function where the phone would buzz or make a noise when you were snoring- this had no effect, but I had always wished I could rig it up to a shock collar.

Finally, in November of last year- I discovered that you can buy a shock collar which is controlled via Bluetooth on Amazon for about $50. (PetSafe).

I have done some device hacking, so I figured I could figure out the shock collar easily enough. Detecting snores also seemed easy enough.  So I wrote a paper proposal for ODSC and started working on the issue. (And wrote the last blog post, which recorded me snoring.)

Snore Detection and the Data Science Process of the Author

Traditionally when I start on a project, I try to come up with an exceptionally simple, explain-it-to-a-5-year-old type of algorithm just so I have a baseline to test against. The benefits to this are 1) having a baseline to train against, and 2) becoming “familiar with the data”.

My first attempt at a “simple snore detector” was to attempt to fit a sine curve to the volume of the recorded noises.  This got me used to working with Python audio controls and sound files.  I also learned right away this wasn’t going to work, because the “loud” part of the snore happens and then there is a much longer quiet portion. That is to say, we don’t breathe evenly.  I don’t have sleep apnea (that is to say I don’t stop breathing), so the snores are relatively evenly spaced apart, but there are also “other noises”, and for various other reasons the sine wave curve fitting just wasn’t ideal.

At this point I went back and read some academic literature on snore detection. There isn’t a lot, but there is a bit.

Automatic Detection of Snoring Events Using Gaussian Mixture Models by Dafna et. al.

Automatic detection, segmentation and assessment of snoring from ambient acoustic data by Duckitt et. al.

An efficient method for snore/nonsnore classification of sleep sounds Cavusoglu et. al.

My Synopsis

Dafna reconstructed when the patient was snoring by looking at the entire night of data and looking at how volume compared to the average.  Following his method and converting it to “real-time” detection however, was going to be problematic.

Duckitt created a Hidden Markov Model (mid-2000s speak for LSTM) (yes, I know they’re not the same) with the states snoring, not-snoring, other-noises, breathing, and duvet-noise.  An interesting idea, one I might visit for a “real version”.

Cavusoglu looked at subband energy distributions, inter- and intra-individual spectral energy distributions, some principal component analysis; in other words- MATH.  I liked this guy’s approach and decided to mimic it next.

PyAudioAnalysis

pyAudioAnalysis is a package created and maintained(ish) by Theodoros Giannakopoulos.  It will break audio files down into 32 features, including the ones used by Cavusoglu.  From there I tried some simple logistic regression, random forest classification, and K-nearest-neighbors classification.
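
The model bake-off itself was bog-standard scikit-learn; roughly this, assuming the pyAudioAnalysis features have already been dumped into a matrix X with labels y:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X: one row of audio features per clip, y: 1 = snore, 0 = not-snore
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))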

The results weren’t bad, but I was VERY opposed to false positives (e.g. getting shocked when I didn’t snore).  The numbers I was getting on validation just didn’t inspire me (though looking back I think I could probably have been OK).

Screen Shot 2018-05-22 at 1.59.18 PM

Back to Basics

A quick note on the “equipment” I am using to record: it is a laptop mic, which is basically trash.  Lots of background noise.  At this point, I had been playing with audio files for a while.  I decided to see if I could isolate the frequency bands of my snoring.

In short I found that I normally snore at 1500-2000Hz, 5khz-7khz, and occasionally at 7khz-15khz.  I decided to revisit the original loud noises idea, but this time I would filter the original recording (the last 5 seconds) into 1500-2khz and 5-7khz bands. If there was a period which was over the average + 1 standard deviation for the clip, lasting longer than 0.4 seconds but less than 1.2 seconds, followed by a pause in which the intensity (volume at that frequency band) was below the threshold (mean + 1.5 stdevs) for 2.2-4.4 seconds, and then another period where the intensity was above the threshold for 0.4 to 1.2 seconds, then we would be in a state of snoring.
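
That paragraph is a mouthful, so here is roughly the same logic in numpy/scipy (band edges and durations as described above; the prose mentions both 1 and 1.5 standard deviations, the sketch just uses one threshold):

import numpy as np
from scipy.signal import butter, sosfilt

def band_intensity(audio, fs, low_hz, high_hz, frame_s=0.1):
    # Bandpass the clip, then return per-frame RMS intensity.
    nyq = fs / 2.0
    sos = butter(4, [low_hz / nyq, high_hz / nyq], btype="bandpass", output="sos")
    filtered = sosfilt(sos, audio)
    frame = int(frame_s * fs)
    n_frames = len(filtered) // frame
    return np.array([np.sqrt(np.mean(filtered[i * frame:(i + 1) * frame] ** 2))
                     for i in range(n_frames)])

def looks_like_snore(intensity, frame_s=0.1):
    # Loud 0.4-1.2s, quiet 2.2-4.4s, loud 0.4-1.2s  =>  snore.
    threshold = intensity.mean() + 1.5 * intensity.std()
    loud = intensity > threshold
    # collect runs of (is_loud, duration in seconds)
    runs, start = [], 0
    for i in range(1, len(loud) + 1):
        if i == len(loud) or loud[i] != loud[start]:
            runs.append((loud[start], (i - start) * frame_s))
            start = i
    for (a, da), (b, db), (c, dc) in zip(runs, runs[1:], runs[2:]):
        if a and c and not b and 0.4 <= da <= 1.2 and 2.2 <= db <= 4.4 and 0.4 <= dc <= 1.2:
            return True
    return False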

This worked exceptionally well, except when I deployed it, I accidentally set the bands at 1500-2khz and 5khz-7khz, which missed a lot of snores.  I will be updating shortly.

Screen Shot 2018-05-22 at 2.13.08 PM.png

In the upper image above, we see the original audio file intensity (blue) over time, and in orange the intensity on the 5-7khz band. The black line is the threshold. This would have been classified as a snore (except it wouldn’t have been, because I wasn’t watching 5-7khz, by accident).

Conclusions

So those are the basics of detecting snores in real time.  Record a tape of the last 5 seconds of audio, analyze it- and if the thresholds are surpassed, then we have a “snore”. Fire the shock. But oh, firing the shock and hacking the shock collar- that was a whole other adventure.  To be continued…

 

Pavlov’s Sandman pt. 1

I snore. I’m perfect in many (most?) ways, but this is the one major defect I am aware of. Like any good engineer, upon learning of a defect I set out to correct or at least patch it. The name of this little project stems from Ivan Pavlov’s experiments with conditioning as well as Operation Sandman, a CIA program for sleep deprivation torture of detainees at Gitmo.

The naming convention is evident when one looks at the strategy I am taking to correct my unpleasant snoring habit.  I have an app on my phone (Sleep as Android) that tracks my sleep, is a cool alarm, and among other things, tracks my snoring.   From this app I know:

  1. It is possible to “detect” snoring.
  2. I don’t rip logs all night, but in bursts.

I have tried a number of things to correct this throughout the last year, including a mouth piece, a jaw strap, a shirt with a tennis ball sewn in the back, video recording myself sleeping to see if I can detect a position where snoring occurs, essential oils / other alternative medicines, etc.  Failing all of these, I now begrudgingly turn to “Data Science”, the form of mysticism reserved for the exceptionally desperate.

The plan of attack on this endeavor is as follows:

  1. Create a program that detects loud noises (preprocessing).
  2. Differentiate between snoring and other nighttime noises (dog, furnace, coughs, etc).
  3. When snoring is detected administer a small shock via a Bluetooth controlled shock collar for dogs which I will be wearing as an arm band.
  4. Video record results of me electrocuting myself while trying to sleep and post to YouTube, elevating me to stardom.
  5. Possibly train myself to stop snoring.

The title of this project should now be apparent, as I am hoping to “train” myself to not snore, and if I were developing this commercially, I’m fairly confident that any beta-testing I did would (rightly) classify me as a war criminal.

In part one of this series (I promise that all the time and have a bad habit of not following through), I present the code and methods I have developed for detecting loud noises / building my dataset.

In future parts, I hope to do some cool things with respect to signal processing and Bluetooth device hacking with Python.

GitHub Repo

Loud Noises

The first step is to record sounds. The code presented is fairly elegant and easy to follow (for now).

We have a class AudioHandler which contains some variables and a few methods.

In addition to the alsaaudio handler, we have:

  • rawData, a list for caching the recorded audio
  • volume, which will be used to create a csv of (timestamp, volume) data
  • and various thresholds to prevent getting multiple shocks in quick succession, a warm-up period, a volume threshold for recording, etc.

The methods are:

setThreshold

This action listens to the mic for a number of seconds and attempts to dynamically set the threshold. It’s not great- for my first night of use I ended up doing it manually, by observation: lying in bed, making some breathing / fake snoring sounds, and seeing where it hit.

dumpData

This method writes the csv and audio to disk

executeAction

This is a placeholder, and later will be used to call the “shock”.

run

This is where all the fun happens. In short it

  1. Attempts to set a baseline threshold considering mic sensitivity, background noise, etc.
  2. In a while loop it then
    1. Listens to the mic
    2. If the volume is above the threshold:
      1. Set a recordingActive flag to True if it wasn’t already
      2. If it wasn’t already, timestamp when this recording started
      3. Determine how long the current recording has been going on.
      4. If it has been going on for some duration, call the executeAction method and dump the recording to disk.
    3. If recordingActive is true, add the raw audio and volume levels to rawData and volume

And that’s about it. Again, look at the code for specifics, but all in all pretty straightforward.
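
If you’d rather read it here than in the repo, the shape of that loop is roughly this (a simplified sketch; read_chunk and get_volume stand in for the alsaaudio plumbing in the real AudioHandler):

import time

def run(handler, min_duration_s=5.0):
    # Simplified version of AudioHandler.run: record while it's loud, act after a while.
    handler.setThreshold()                   # baseline for mic sensitivity / background noise
    recording_active = False
    recording_started = 0.0
    while True:
        chunk = handler.read_chunk()         # placeholder for the alsaaudio read
        volume = handler.get_volume(chunk)   # placeholder volume measure
        if volume > handler.threshold:
            if not recording_active:
                recording_active = True
                recording_started = time.time()
            if time.time() - recording_started > min_duration_s:
                handler.executeAction()      # later: fire the shock
                handler.dumpData()           # write csv + audio to disk
                recording_active = False
        if recording_active:
            handler.rawData.append(chunk)
            handler.volume.append((time.time(), volume))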

Last night I recorded.

Building a Training Set

As a priest of the mystic art of data science, the first part of any ceremonial ritual is to create a training / testing data set.

This was a very tedious part of my day.  I went through the recordings and separated them into two folders, “snore” and “non-snore”.  Well, I did this for about 30 minutes, and got approx 80 samples of each. Then I moved the rest into an “unlabeled” folder… you know, for testing purposes, not because I was super bored.  Perhaps if I had an intern, this would have been a more robust set.

Finally I wrote a little Python script that will copy the csvs over to match the wav files you sorted out into the proper directories.
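
That script is only a few lines; something in this spirit (the directory names are whatever you used when sorting):

import glob
import os
import shutil

RAW_DIR = "recordings"   # placeholder: wherever the original wav/csv dumps live

for label_dir in ("snore", "non-snore"):
    for wav_path in glob.glob(os.path.join(label_dir, "*.wav")):
        base = os.path.splitext(os.path.basename(wav_path))[0]
        csv_path = os.path.join(RAW_DIR, base + ".csv")
        if os.path.exists(csv_path):
            shutil.copy(csv_path, label_dir)   # keep the volume csv next to its wav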

Stay tuned for part 2, where we’ll do some signal processing to differentiate the snores from the noises!

 

Borg System Architecture

or “how I accidentally enslaved humanity to the Machine Overlords”.

The Borg are a fictional group from the Sci-Fi classic, Star Trek, who among other things have a collective consciousness.  This creates a number of problems for the poor humans (and other species) that attempt to resist the Borg, as they are extremely adaptive. When a single Borg Drone learns something, its knowledge is very quickly propagated through the collective, presumably subject to network connectivity issues, and latency.

Here we create a system for an arbitrary number of edge devices to report sensor data, a central processor to use that data to understand the environment the edge devices are participating in, and finally to make decisions / give instructions back to the edge devices.  This is in essence what the Borg are doing.  Yes, there are some interesting biological / cybernetic integrations, however as far as the “hive mind” aspect is concerned, these are the basic principles in play.

I originally built this toy to illustrate that “A.I.” has three principal components: real time data going into a system, an understanding of the environment is reached, and a decision is made. (1) Real-time: artificial intelligence, like the “actual” intelligence it supposedly mimics, is not making E.O.D. batch decisions. (2) In real time the system is aware of what is happening around it- understanding its environment and then using that understanding to (3) make some sort of decision about how to manipulate that environment. Read up on definitions of intelligence, a murky subject itself.

Another sweet bonus, I wanted to show that sophisticated A.I. can be produced with off-the-shelf components and a little creativity, despite what vendors want to tell you. Vendors have their place. It’s one thing to make something cool, another to productionalize it- and maybe you just don’t care enough. However, since you’re reading this- I hope you at least care a little.

Artificial Intelligence is by no means synonymous with Deep Learning, though Deep Learning can be a very useful tool for building A.I. systems.  This case does real time image recognition, and you’ll note does not invoke Deep Learning or even the less buzz-worthy “neural nets” at any point.  Those can be easily introduced to the solution, but you don’t need them.

Like the Great and Powerful Oz, once you pull back the curtain on A.I. you realize it’s just some old man who got lost and creatively used the resources he had lying around to create a couple of interesting magic tricks.

oz.gif

System Architecture

OpenCV is the Occipital Lobe; this is where faces are identified in the video stream.

Apache Kafka is the nervous system, how messages are passed around the collective. (If we later need to defeat the Borg, this is probably the best place to attack- presuming, of course, we aren’t able to make the drones self-aware.)

hugh.gif

Apache Flink is the collective consciousness of our Borg Collective, where thoughts of the Hive Mind are achieved.  This is probably intuitive if you are familiar with Apache Flink.

Apache Solr is the store of the “memories” of the collective consciousness.

The Apache Mahout library is the “higher order brain functions” for understanding. It is an ideal choice as it is well integrated with Apache Flink and Apache Spark

Apache Spark with Apache Mahout gives our creation a sense of context, e.g. how do I recognize faces? It quickly allows us to bootstrap millions of years of evolutionary biological processes.

A Walk Through

(1) Spark + Mahout used to calculate eigenfaces (see previous blog post).

(2) Flink is started, it loads the calculated eigenfaces from (1)

(3) A video feed is processed with OpenCV .

(4) OpenCV uses Haar Cascade Filters to detect faces.

(5) Detected faces are turned into Java BufferedImages, greyscaled and size-scaled to the size used for Eigenface calculations, and binarized (inefficient). The binary arrays are passed as messages to Kafka.

(6) Flink picks up the images and converts them back to buffered images. The buffered image is then decomposed into a linear combination of the Eigenfaces calculated in (1) (see the sketch after this walk-through).

(7) Solr is queried for matching linear combinations. Names associated with the best N matches are assigned to each face. I.e. the face is “identified”… poorly. See the next comments.

(8) If the face is of a “new” person, the linear combinations are written to Solr as a new potential match for future queries.

(9) Instructions for the edge device are written back to the Kafka messaging queue as appropriate.
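
For the curious, step (6), decomposing the face into a linear combination of Eigenfaces, is just a projection. In numpy terms it is roughly this (the real version lives in the Mahout/Flink code; this sketch assumes the Eigenfaces form an orthonormal basis):

import numpy as np

def face_to_coefficients(face, mean_face, eigenfaces):
    # face:       flattened pixel vector of the detected face
    # mean_face:  mean pixel vector from the training set (step 1)
    # eigenfaces: matrix with one flattened Eigenface per row
    # Returns the coefficient vector that gets matched against Solr.
    return eigenfaces @ (face - mean_face)

def coefficients_to_face(coeffs, mean_face, eigenfaces):
    # Reconstruct an approximate face from its coefficients (useful for sanity checks).
    return mean_face + eigenfaces.T @ coeffs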

Problems

A major problem we instantly encountered was that sometimes OpenCV will “see” faces that do not exist, in patterns in clothing, shadows, etc. To overcome this we use Flink’s sliding time window and Mahout’s Canopy clustering.  Intuitively, faces will not momentarily appear and disappear within a frame; cases where this happens are likely errors on the part of OpenCV. We create a short sliding time window and cluster all faces in the window based on their X, Y coordinates.  Canopy clustering is used because it is able to cluster all faces in one pass, reducing the amount of introduced latency.  This step happens between steps (6) and (7).

In the resulting clusters there are either lots of faces (a true face) or a very few faces (a ghost or shadow, which we do not want).  Images belonging to the former are further processed for matches in step (7).

Another challenge is that certain frames of a face may look like someone else, even though we have been correctly identifying the face in question in nearby frames.  We use the clusters generated in the previous hack, and decide that people do not spontaneously become other people for an instant and then return. We take our queries from step (7) and determine who the person is based on the cluster, not the individual frames.

Finally, as our Solr index of faces grows, our searches in Solr will become less and less efficient.  Hierarchical clustering is believed to speed up these results and be akin to how people actually recognize each other.  In the naive form, each Solr query will scan the entire index of faces looking for a match.  However, we can cluster the eigenface combinations such that each query will first only scan the cluster centroids, and then only consider eigenfaces in that cluster. This can potentially speed up results greatly.

Usecases

Borg

This is how the Borg were able to recognize Locutus of Borg.

Cylons

This type of system also was imperative for Cylon Raiders and Centurions to recognize (and subsequently not inadvertently kill) the Final Five.

Shorter Term

This toy was originally designed to work with the Petrone Battle Drones; however, as we see the rise of Sophia and Atlas, this technology could be employed to help multiple subjects with similar tasks learn and adapt more quickly.  Additionally there are numerous applications in security (think networks of CCTV cameras, remote locks, alarms, fire control, etc.).

Do you want Cylons? Because that’s how you get Cylons.

Alas, there is no great and powerful Oz. Or- there is, and …

oz2.gif

 

References

Flink Forward, Berlin 2017
Slides Video (warning I was sick this day. Not my best work).

Lucene Revolution, Las Vegas 2017
Slides Video

My Git Repo

PR Donating this to Apache Mahout
If you’re interested in contributing, please start here.