Understanding Sign Language in Video with AI

Google Meet now offers real-time closed captions for speakers, which is very helpful for deaf users who use the platform to communicate with hearing and speaking users. In this, Google is ahead of the competition - thanks Google!

But that’s helpful only in one direction - the deaf person can read the captions to understand the hearing person… but to make themselves understood, the deaf person has to type what they want to say. This is far from ideal.

Could a deaf person sign into the camera and have software create closed captions of what they are signing in real-time?

Yes. I hacked the above demo of software understanding American Sign Language (ASL) together in about a day’s worth of work using Google’s MediaPipe Holistic framework for face and hand pose prediction and TensorFlow for machine learning.

There are plenty of issues with it: the vocabulary is limited, I taught it all the signs myself (and it might therefore struggle with signing accents), it doesn’t use temporal information (and therefore could not tell the difference between some pairs of signs, like “mother” and “grandmother”) and so on.
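To make the "no temporal information" limitation concrete, here is a minimal sketch of per-frame classification. It is not the demo's actual code: the three-point "hands", the template labels, and the helpers `normalize` and `classify_frame` are all hypothetical, and a simple nearest-template match stands in for the TensorFlow model. In a real pipeline the points would come from a pose estimator such as MediaPipe Holistic.

```python
import math

def normalize(landmarks):
    """Shift so the first point (assumed to be the wrist) is the origin,
    scale so the farthest point is at distance 1, and flatten to a list."""
    x0, y0 = landmarks[0]
    shifted = [(x - x0, y - y0) for x, y in landmarks]
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [c / scale for point in shifted for c in point]

def classify_frame(landmarks, templates):
    """Label ONE frame by its nearest template. No history is kept,
    so movement across frames is invisible to this function."""
    features = normalize(landmarks)
    return min(templates, key=lambda label: math.dist(features, templates[label]))

# Toy handshapes: (wrist, thumb tip, index tip). Purely illustrative values.
FLAT = [(0.0, 0.0), (-0.5, 0.5), (0.0, 1.0)]
POINT = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.0)]

templates = {"flat": normalize(FLAT), "point": normalize(POINT)}

# The same handshape, somewhere else in the image and twice as large,
# as it might appear in a later frame of a moving sign.
moved = [(x * 2 + 3, y * 2 + 1) for x, y in FLAT]

print(classify_frame(FLAT, templates))
print(classify_frame(moved, templates))
```

Both frames get the same label, because each one is judged on its own. Two signs that share a handshape and differ only in movement would need features computed over a window of frames to be told apart.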

But those are all solvable problems - just a matter of solid engineering work. There are a number of large datasets of labeled training data - Dr. Bill Vicars’s Lifeprint, or Handspeak, or various sources on Giphy.

In fact, there are a lot of demos similar to mine online.

Why hasn’t someone built a production version of it? I don’t buy the argument that it’s a matter of technology anymore. Is it a matter of there not being enough money to be made here? Is it just not considered important enough?

Please ping me if you know any companies building ASL-recognition from video into a commercial product - or if you’re thinking about starting one!

UPDATE: The Economist has published a good overview of the “race to teach sign language to computers” (paywall). The article cites data collection and annotation as a major issue, as well as some technological challenges, including capturing people’s facial expressions (I’m calling BS on that - take another look at the video above to see why). On the bright side, there are a number of teams working on the problem… but on the darker side, none of those seem to be at one of the software giants.

read more

Livewired by David Eagleman - Book Notes

Livewired: The Inside Story of the Ever-Changing Brain
by David Eagleman
Goodreads | Wikipedia

When the first draft of the Human Genome Project came to completion at the turn of the millennium, one of the great surprises was that humans have only about twenty thousand genes. This number came as a surprise to biologists: given the complexity of the brain and the body, it had been assumed that hundreds of thousands of genes would be required. So how does the massively complicated brain, with its eighty-six billion neurons, get built from such a small recipe book? The answer pivots on a clever strategy implemented by the genome: build incompletely and let world experience refine. Thus, for humans at birth, the brain is remarkably unfinished, and interaction with the world is necessary to complete it.

One very exciting part of this livewiring is the brain’s ability to process whatever information it receives… and that we don’t need to limit ourselves to our nature-given senses. How about drone pilots intuitively feeling the drone’s pan, tilt, yaw, and acceleration? How about feeling the direction of magnetic north? How about being able to sense infrared light? David talks about this in his excellent TED talk:


How the brain processes sensory information, and the idea of giving people new senses, is only one part of the book. David sees seven key principles; in his words:

read more

Range by David Epstein - Book Notes

Range: Why Generalists Triumph in a Specialized World
by David Epstein
Goodreads | Wikipedia

Malcolm Gladwell popularized the notion of the “10,000 hours rule” in his very readable book Outliers. It goes roughly like this: practice a specific skill deliberately for about 10,000 hours, and you’ll get to world-class level in that thing. The takeaway from that seems to be: pick what you want to do early, and stick with it.

David Epstein respectfully disagrees, and he lays out his arguments and copious supporting data in his book Range: Why Generalists Triumph in a Specialized World. You can find discussions between the two on this topic on YouTube - and in one from February 2020, Gladwell says that Epstein has convinced him.

A great introduction to the big idea is Epstein’s TEDx talk on this topic:

Kind and Wicked Domains

Epstein, borrowing terminology from other authors, makes a distinction between “kind” and “wicked” learning environments.

read more

Contact Managers Suck: The Micelf Whitepaper

Contact apps haven’t changed since smartphones came out more than ten years ago. It’s time to give them a makeover.

I quit my job in August 2019 to go vagabonding - solo traveling all over the world. Before Covid-19 cut the trip short in February 2020, I visited 14 countries in Europe, Asia, and Oceania.

I met dozens of great people that I wanted to stay in touch with - and I found it surprisingly hard to do so. It felt like everyone had their own favorite way to stay in touch - some Instagram, some WhatsApp, some old-school email, and some just a phone number. Sometimes I’d open three different apps before I finally found a way to contact the person I wanted to talk to.

It felt crazy, and I asked everyone how they did it - and people just shrugged. No one had a good way, other than picking one specific app (generally either WhatsApp or Instagram) and trying to stick with it as much as possible.

It felt crazy because it is crazy. Can we do better? Should we even care?

Happiness is Love

We should care because “Happiness is love.”

That’s the principal finding of The Grant Study, a 75-year longitudinal study of hundreds of people who grew up in Boston neighborhoods between 1940 and 1945, and which continues to this day.

read more

Programmatically Creating Ubuntu Images with systemd's mkosi

mkosi is a great tool to programmatically create reproducible images of operating systems. This has a lot of applications in IoT, security, automated testing, managing servers etc. I like it a lot.

mkosi can make images of Fedora, Debian, Ubuntu, Arch Linux, and openSUSE. There are some differences between those distributions, though, and probably because of that, some things that are supposed to work… don’t, for some distros.

This post is about the quirks of making Ubuntu images with mkosi. In some cases, the documentation for how to make mkosi do something for an Ubuntu image is just plain wrong (though presumably it works for other distros). In other cases, I had a hard time finding information. Hopefully this post helps you get mkosi working for Ubuntu images in less time than it took me.

read more