Nate Harada — Machine Learning in Real Life

Tensorflow I Love You, But You're Bringing Me Down

Tensorflow’s meteoric rise to the top of the deep learning world is, while unsurprising, pretty damn impressive. With almost 60k stars on Github (the only reasonable measure of software popularity), Tensorflow is far out in front of its nearest competitor, Caffe, and its paltry 18k. The framework has a lot going for it: Python, great tools like Tensorboard, Python, Google’s knowledge of distributed systems, Python, and popularity that all but guarantees future relevance.

But while Tensorflow is a wonderful framework, the decisions (or lack thereof) being made by the Tensorflow product team are making it increasingly difficult for external developers to adopt. In my eyes, Tensorflow’s public face has grown without proper direction, and is threatening to alienate developers and allow competing frameworks to take over.

Fragmented high level library support

My main gripe strikes me as a weird and totally avoidable issue: there are too damn many Google supported libraries for Tensorflow. Good software engineers know that reinventing the wheel is a bad thing, and so when the prospect of writing yet another training and batching loop rears its ugly head, we look to high level libraries to ease the pain. Apparently, Google employees were aware this would happen, and in a mad scramble to curry organizational favor managed to release no less than five(!) Google developed high level libraries. There’s tf.learn (which is of course different than 3rd party tool TFLearn), tf.slim, DeepMind’s Sonnet, something called prettytensor, and Keras, who if this were a high school drama would be rapidly trying to distance herself from her less cool friend Theano.

I appreciate the work that has gone into these tools, and certainly it’s a benefit to have options. However, these are first-party, Google-supported tools. There’s no clear preferred library, even internally, and while the champions of each library claim they are nothing alike, it’s difficult for an external developer or researcher to pick one. When “new” == “risky” for most companies, developers want a toolkit they can commit to deploying internally that will still be considered “best practice” in a few months. By offering a whole slew of somewhat supported options, Google is hindering adoption of the Tensorflow framework in general. Avoiding rewriting boilerplate code for each new experiment is a must-have for most devs, but having to learn a new “hot” framework because the previous one is no longer feature competitive severely limits research output, and is an unreasonable problem to have when all of these libraries are controlled by the same company.
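
To make the fragmentation concrete, here is roughly the same fully connected layer written against three of those first-party options. This is only a sketch: the shapes and variable names are arbitrary, the snippets target the TF 1.x era described in this post, and none of them is presented here as the “blessed” way.

import tensorflow as tf
import tensorflow.contrib.slim as slim
from keras.models import Sequential
from keras.layers import Dense

x = tf.placeholder(tf.float32, [None, 784])

# Raw Tensorflow: declare the variables and wire up the math yourself.
W = tf.get_variable("W", [784, 10])
b = tf.get_variable("b", [10], initializer=tf.zeros_initializer())
raw_logits = tf.matmul(x, W) + b

# tf.slim: one call hides the variable creation.
slim_logits = slim.fully_connected(x, 10, activation_fn=None)

# Keras: a whole model object, with its own conventions for shapes and training.
model = Sequential([Dense(10, input_shape=(784,))])

Same layer, three idioms, all backed by Google, and a new team has to guess which one will still be maintained next year.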

Build-mania

One of the best things a software product can have is a strong community. For most of us, learning a new library means reading examples on blogs and Github, and consulting forums or documentation for help on specifics. Unfortunately for the average developer, Google’s desire to build features and exciting new pieces of the ecosystem has left those resources in the dust. Every week it seems a new Tensorflow product is announced – XLA, TFDBG, a graph operation to turn on your toaster, etc. No doubt these features are beneficial, but it also means that any resource about Tensorflow is immediately out of date. Documentation tends to be the most up to date, but often provides no context or example usage. Example code is often stale, sometimes presenting old functions or workflows that aren’t used anymore. Stack Overflow questions tend to be only half-useful, since at least some of each answer is probably outdated.

This problem should fade as time stabilizes the APIs and features, but to me it seems like something that should have been planned for ahead of time. Tensorflow has been out for almost 2 years now (an eternity in deep learning time), but the Python API didn’t stabilize until March 2017. The other language bindings are still not stable. For a framework touting its production-ready capabilities, you’d expect the C++ API not to be shifting under your feet.

Everything is a tensor

This one is hard to complain about, because I totally understand why the architecture was built this way. In fact, Derek Murray explicitly states that Google considers this a feature and not a bug in his Tensorflow dev summit talk. Hear me out anyway, though – making everything a Tensor renders a ton of knowledge about how to work with data in Python irrelevant, and negates many of the great tools the Python ecosystem has built up around that knowledge.

In Tensorflow, more and more of the tools built around the project operate as graph operations or nodes themselves. This means that the whole pipeline, from data loading to transformation to training, is all one giant GraphDef. This is highly efficient for Google: by making everything a graph operation, Google can optimize each little piece for whatever hardware the operation will run on (including exotic architectures like TPUs). However, it steepens the learning curve significantly. In this brave new tensor-fied world, I need to learn not only how the deep learning operations work (which are mostly math and therefore language agnostic), but also how the data loading operations work, and the checkpointing operations, and the distributed training operations, and so on. Many of the tools that Python relies on, such as the Python debugger, are no longer useful, and IDEs designed to visualize Python data have no clue how to interact with these strange new language constructs. Developers outside of Google don’t want to have to essentially learn a new language to use Tensorflow, and Google lock-in throws up a serious hurdle for organizations looking to de-risk new technology integration.
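
As a concrete illustration of how far the graph reaches, here is a sketch of the TF 1.x queue-based input pipeline. It assumes a TFRecord file named data.tfrecords and feature names invented for the example; the point is that reading, decoding, and batching are all graph nodes, not ordinary Python.

import tensorflow as tf

# Every line below only adds nodes to the graph; no data is touched yet.
filename_queue = tf.train.string_input_producer(["data.tfrecords"])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)
features = tf.parse_single_example(
    serialized,
    features={"x": tf.FixedLenFeature([784], tf.float32),
              "y": tf.FixedLenFeature([], tf.int64)})
x_batch, y_batch = tf.train.shuffle_batch(
    [features["x"], features["y"]],
    batch_size=32, capacity=1000, min_after_dequeue=500)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    xs, ys = sess.run([x_batch, y_batch])  # data only materializes here
    coord.request_stop()
    coord.join(threads)

Stepping through this with pdb shows symbolic tensors and queue handles rather than numbers, which is exactly the “learn a new language” problem.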

A cry for help

Tensorflow is trying to be everything to everyone, but does not present a developer friendly product to the greater deep learning community. Google is known for creating complex but effective internal tools, and taking these tools public is great for developers at large. However, when you’re on a team at a company with minimal deep learning experience, trying to build out production level systems, it’s almost impossible to learn how to do things correctly. Unlike the Google employees who use the framework day to day, most of us have nobody to chat with when we have questions. To the Tensorflow team: we want to use your product, but at the end of the day it comes down to whatever lets us ship products most effectively. Please don’t make us go back to writing Lua.

Lecture Notes -- Ben Marlin

It’s been a while, but yesterday I attended a great lecture by UMass’ Ben Marlin. Ben works on problems very similar to my own research, and his paper on conditional random fields for morphological analysis of wireless ECG signals is a great example of how advances in machine learning can address long-standing problems in healthcare. The notes aren’t perfect, but I’ve tried to fix them up from their raw form. Unfortunately, I was unable to find the slides.

Segmenting and Labeling On-Body Sensor Data Streams with CRFs and Factor Graphs

Two big spaces in this research

  1. Clinical data analytics (ICU EHRs).
  2. mHealth data analytics — what we’ll talk about today. This is a broad space, including the app and device space (fitness wearables, iPhone apps, wireless sensors, etc.). The interesting thing here is that the signals coming in are the same ones you’ll find in an ICU.

With wearables, we want sensors that are accurate, real-time, energy efficient, and non-intrusive. We work with addiction in our lab, for example smoking or cocaine use. We also look at eating detection, etc. That may seem silly, but these things tie into ICU monitoring, e.g. pulmonary edema recovery.

For mobile health, we start with detection and move to prediction, understanding, and finally intervention.

Current problem framework

Let’s look at the pipeline for these tasks.

At the raw data level: we are looking at quasi-periodic time series data.

[Slide: respiration data, one channel]

Then comes segmentation of some sort. This should be unsupervised and adaptive.

[Slide: segmentations overlaid on raw data, segments are heterogeneous]

Next is labeling these segments. This is basically the state of the art, especially making independent predictions for each segmented datum.

[Slide: each segment has a color corresponding to a class]

From these labeled segments we want to come up with higher-level activity segments, where each segment represents one action like eating a sandwich or smoking a cigarette.

[Slide: higher level colored segments, bigger than the individual segments]

Challenges in Mobile Health

We need these things to be low: cost, power usage, noise, drift, dropout. Obviously we can’t have all of them at once.

  1. Labeled data is very high cost. Not only that, but it has limited ecological validity.
  2. Self reporting results in a lack of temporal precision and low accuracy.
  3. The “n=me” problem. Big data doesn’t really solve problems in this space because people are so different. With low data volumes, everyone looks different, and we end up with covariate shift or transfer learning problems.
  4. For black-box models, the need in medicine for meaningful, interpretable results is hard to satisfy. Model distillation is needed for something like deep learning. Doctors and patients don’t trust a black box.
  5. All of this needs to be real time! Model compression is coming back for something like this.

Case Study — CRFs for Labeling and Segmenting ECG

The motivating factor is detecting cocaine usage. For cocaine users, there are morphological changes in the ECG beyond just a rate increase; for example, the QRS complex and QT interval are prolonged. These detailed changes are specific to the drug, and thus allow us to filter out false positives that would result from using something like heart rate alone. Detecting each part of the heartbeat waveform is very important but difficult.

The basic idea behind this technique is to use a CRF over the segments, given a window of features around each potential peak. Sparse coding is used for feature extraction (and feature learning): a dictionary is learned from the windows, and each window’s sparse coding coefficients over that dictionary form its feature representation.
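
As a rough sketch of the feature-extraction half only (not a reconstruction of Ben’s pipeline), here is what sparse coding windows with scikit-learn might look like. The window length, dictionary size, solver, and the random placeholder data are all arbitrary choices, and the CRF itself is omitted.

import numpy as np
from sklearn.decomposition import DictionaryLearning

# One fixed-length ECG window around each candidate peak,
# stacked into an (n_windows, window_length) matrix.
rng = np.random.RandomState(0)
windows = rng.randn(200, 64)  # placeholder data standing in for real ECG windows

# Learn a dictionary from the windows themselves, then represent each window
# by its sparse coefficients over that dictionary.
dico = DictionaryLearning(n_components=32, transform_algorithm="lasso_lars",
                          transform_alpha=0.5, random_state=0)
codes = dico.fit_transform(windows)

# `codes` (n_windows, 32) would then serve as the per-window features that,
# together with neighboring windows, feed the CRF labeling each peak.
print(codes.shape)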

[Slides: many results slides. accuracy is high, amount of train data required is low, CRF does not have differential recall]

Running out of time, but quick bit about hierarchical segmentation where we jointly label and segment.

I spoke to Ben after the lecture about transfer learning from various datasets of ECGs. He claims that the sparse coding dictionaries are fairly stable and consistent, and doesn’t believe that training sparse coding on more complete or noise-free datasets would yield a large benefit. We also talked about using the sparse coding coefficients as sequence learning inputs for far-off targets such as disease or outcome prediction. This is something I am considering applying in my own work. He has not tried this, but admits it is an idea worth pursuing.

List of Gotchas for Matlab/Python, In Order of Annoyance

While the SciPy project already maintains a complete list of the differences between NumPy and Matlab, that list is big and random and this list is small and somewhat ordered. My research is written in both Matlab and Python and, like the musician who yells the wrong city at their show, these are the mistakes I make most commonly when switching back and forth.

  • Matlab indexes beginning with 1, Python with 0. This is well known, but can still trip you up when you frequently switch back and forth. This applies to all indexed values, such as the axis to apply a function to (the first axis is 1 in Matlab, 0 in Python).
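
For example (A here stands for any 2d array; the comment notes the Matlab counterpart):

Python:

np.sum(A, axis=0)  # Sum over the first axis; the Matlab equivalent is sum(A, 1)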

  • Numpy arrays are by default element-wise for multiplication and division. To perform traditional matrix multiplication you will need to use np.dot, because both * and np.multiply are element-wise. Python 3.5 will be introducing the @ symbol for infix matrix multiplication, which will hopefully resolve some of the confusion. Similarly, Numpy offers matrix as an alternative to ndarray, but if you value your sanity you should stick with the arrays. The matrix class makes traditional matrix multiplication the default operator for the * symbol, at the expense of adding restrictions and caveats to literally everything else.

Python:

A*B            # Element-wise
np.dot(A, B)   # Matrix multiplication
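A @ B          # Matrix multiplication via the infix operator (Python 3.5+)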

Matlab:

A*B            % Matrix multiplication
A.*B           % Element-wise
  • In Numpy, many functions take a single tuple where Matlab takes separate arguments. This bites in functions like np.concatenate and np.reshape (the ndarray.reshape method happens to accept separate integers, but the function forms do not):

Python:

np.concatenate((A, B))  # This works
np.concatenate(A, B)    # This doesn't: B is interpreted as the axis argument

np.reshape(A, (5, 5))   # This works
np.reshape(A, 5, 5)     # This doesn't: the extra 5 is interpreted as the order argument

Matlab:

cat(1, A, B)
reshape(A, 5, 5)
  • In Numpy, arrays are not inherently multidimensional. Building an array from a flat range or list gives a true 1d array, which does not even have a second dimension. Compare this to Matlab, where vectors are 1xN or Nx1 2d arrays. This small difference is a common source of pain, especially because it isn’t caught by static checkers and will inevitably end up crashing the very end of your long script, right after you finish training a huge model and right before you display the results.

Python:

a = np.arange(10)
a.shape                 # => (10,), a true 1d array with no second dimension
b = a.reshape((10, 1))  # reshape returns a new array; a itself is unchanged
b.shape                 # => (10, 1)

Does Your Music Festival Give Hipster Cred?

I haven’t really had time to write new blog posts with both class and research in full swing, but I did have some leftover code for scraping music festival data, so I decided to do something with it. The festival season is starting soon, with the insanely early SXSW already wrapped up and the massive juggernaut that is Coachella starting this weekend.

Since the hipsters among us know that going to a popular music festival (especially to see only the headliners) is akin to renouncing “On Avery Island” or writing music reviews for People Magazine, I’ve put together a handy chart to help you choose which festival to go to based on how mainstream, on average, the bands are. Using the ever-handy Echonest API, I averaged out the familiarity and “hotttnesss” of the bands at each festival. The most mainstream festivals are towards the top right:

[Chart: festivals plotted by average familiarity vs. “hotttnesss” of their lineups; the most mainstream festivals appear toward the top right]

In this case, familiarity can be viewed as long-term brand recognition, and “hotttnesss” can be viewed as hype at this moment. So a band like The Rolling Stones has strong name-recognition but might not be very hyped, while The Weeknd may be very hot but probably isn’t familiar to most people over 40. It’s rare for a band to be popular and not familiar, so most of the festivals lie along the same line. However, there are clear deviations. For example, Lollapalooza seems to have less brand recognition than Summerfest for the same amount of popularity. This makes sense given Summerfest’s more family friendly demographic.

Many trends can be explained by the proportion of headliners to smaller acts. South By Southwest is quite hipster in this interpretation, while Boston Calling is very mainstream. This corresponds to SXSW’s relatively large, small-timer lineup and Boston Calling’s compact, headliner-heavy weekend. Hipsters will note that Pitchfork has stayed true to form and is deep in “you’ve probably never heard of them” territory.
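
For the curious, the aggregation behind the chart is nothing fancy. Here is a minimal sketch, assuming the per-band familiarity and “hotttnesss” scores have already been pulled from the Echonest API into plain Python structures; the band counts and numbers below are placeholders, not real data.

# festival name -> list of (familiarity, hotttnesss) pairs, one per band,
# both already normalized to [0, 1] by the Echonest API.
lineups = {
    "Pitchfork": [(0.55, 0.48), (0.42, 0.51), (0.38, 0.60)],
    "Coachella": [(0.81, 0.77), (0.74, 0.69), (0.68, 0.72)],
}

def festival_point(bands):
    """Average the per-band scores into one (familiarity, hotttnesss) point."""
    fam = sum(f for f, _ in bands) / len(bands)
    hot = sum(h for _, h in bands) / len(bands)
    return fam, hot

for name, bands in lineups.items():
    print(name, festival_point(bands))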

A few points of criticism to address preemptively:

  • This graph relies on Echonest’s ranking system, which is hush-hush. They claim the numbers are based on activity across crawled webpages, but who knows how accurate they actually are.
  • There are no axis values because both parameters are normalized, dimensionless numbers (i.e. values in [0, 1]), so only relative values matter.
  • I don’t work for or represent Echonest, even though I know I’ve used them twice now in my blog posts. I do really appreciate their product though.

Notes Archive is Live

As an electrical engineering major, I found that taking notes on a computer doesn’t really work. Getting math into OneNote is even worse than getting math into Word, so for most of undergrad I stuck to paper and pencil. In my last semester I decided to try using Markdown Notes, both to organize notes in my machine learning class and to practice my LaTeX. Unfortunately the website was not maintained well, and after emailing the author several times offering to help contribute (the bug fixes and features I needed weren’t being worked on) and getting no response, I decided that this year I would try something different.

I set up an instance of Hyde, a Poole theme that runs on the Jekyll platform (the same platform that runs this blog). To separate posts into classes I used the generate categories plugin which allows me to create a “categories” page, as well as display categories in my sidebar menu. However, Github (wisely) does not allow arbitrary code execution on their site via plugins, so I had to create a separate branch for the public site. Using instructions at the sorry app blog, I created a rake file to publish the blog for me. Now when I get to class, I just create a post in the master branch and run “rake blog:publish” to push to the server.

If you’re interested in any of my notes feel free to check them out at notes.nateharada.com.