So, What Am I Doing at CMU?

This post is for all my friends and acquaintances who might be wondering what on earth I’m doing at Carnegie Mellon for my PhD. The truth is, I’m doing a variety of things here (I always have a few side projects going to keep me motivated, and you usually hear about them in some form or another); however, my main research is about probabilistic input. DISCLAIMER: This is NOT a thesis proposal, and there is a chance I might do something entirely different for my PhD thesis. Right now, however, probabilistic input seems like the most likely candidate.

what is probabilistic input?

In a nutshell, the goal of my work is to make user interfaces account for more information when deciding what you’re trying to do. I am designing, building, and evaluating a new method for modeling and dispatching input that treats user input as uncertain. Modern input systems always assume input is certain, that is, that it occurred exactly as the sensors saw it. When a developer handles an event (say, a mouse event), that event has exactly one x and one y coordinate. This works well for keyboards and mice, less well for touch, and even more poorly for free-space interactions such as those enabled by the Kinect. After all, your finger is not a point! The work I’m doing will allow input systems to be far more intelligent about interpreting user actions, especially for newer input techniques such as touch, voice, and free-space interaction with the Kinect.

In addition to enabling computers to better understand users, I’m interested in how we can use this probabilistic approach to design feedback that helps users better understand how computers are interpreting their actions. For example, what’s the best way for a computer to tell you that it isn’t sure whether you’re doing a horizontal swipe or a panning gesture on the Kinect? If you think about it, a lot of your interactions can be interpreted in multiple ways. The problem I’m trying to solve is how to communicate this ambiguity so that you understand it without becoming confused or working too hard.

Finally, I would like to evaluate how easily developers can adopt this probabilistic approach in real applications, since the ultimate goal of this work is for it to eventually be adopted into all input handling systems.

what have I done so far?

Most of my work so far has been in designing (and validating through implementation) an architecture for dispatching uncertain input. In other words, assume that mouse events no longer occur at a single location, but instead have a probability distribution over possible locations. I designed a system that figures out which buttons these new mouse events should go to. This system was published at UIST 2010; you can see the paper here. I then published a refinement of this system (with a few extra bits) that made it much easier for developers to write user controls (buttons, sliders, etc.) for my system. This was published at UIST 2011; you can see the paper here.
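To make the idea of a mouse event with a probability distribution over locations a bit more concrete, here is a minimal C# sketch. It is not the dispatch architecture from the UIST papers; it simply assumes the touch location follows an independent 2D Gaussian (centered on the reported point, with a hypothetical standard deviation `sigma` in pixels) and computes, for each button rectangle, the probability that the true location falls inside it. The `Button` type, the `HitProbabilities` method, and the erf approximation (Abramowitz and Stegun 7.1.26) are illustrative choices, not part of any real input toolkit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical button description: an axis-aligned rectangle in screen pixels.
public sealed class Button
{
    public string Name { get; }
    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }

    public Button(string name, double left, double top, double width, double height)
    {
        Name = name; Left = left; Top = top; Width = width; Height = height;
    }
}

public static class UncertainDispatch
{
    // Abramowitz & Stegun 7.1.26 approximation of erf(x); absolute error below 1.5e-7.
    private static double Erf(double x)
    {
        double sign = x < 0 ? -1.0 : 1.0;
        x = Math.Abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                        - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.Exp(-x * x));
    }

    // P(X <= x) for X ~ Normal(mean, sigma^2).
    private static double NormalCdf(double x, double mean, double sigma) =>
        0.5 * (1.0 + Erf((x - mean) / (sigma * Math.Sqrt(2.0))));

    // Probability that a touch reported at (x, y), with isotropic Gaussian noise
    // of standard deviation sigma, actually landed inside each button.
    public static Dictionary<string, double> HitProbabilities(
        double x, double y, double sigma, IEnumerable<Button> buttons)
    {
        var result = new Dictionary<string, double>();
        foreach (var b in buttons)
        {
            // x and y noise are independent, so the rectangle probability factorises.
            double px = NormalCdf(b.Left + b.Width, x, sigma) - NormalCdf(b.Left, x, sigma);
            double py = NormalCdf(b.Top + b.Height, y, sigma) - NormalCdf(b.Top, y, sigma);
            result[b.Name] = px * py;
        }
        return result;
    }
}
```

For a touch reported exactly on the border between two adjacent buttons, each button gets roughly half of the probability mass; a dispatcher built on this idea could then send the event to both candidates, or hold off until it becomes clearer which one was meant.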

what is left?

Right now I’m working on designing better feedback techniques for when input is uncertain. After that, I’m going to tackle mediation. What’s mediation? It’s basically what should happen when you do something (like a gesture) and the computer can’t decide between two interpretations, so it asks you what you meant. If it simply had you pick from a list, that would feel unnatural, because it breaks your workflow. So I’m looking for better ways to mediate between alternative actions. The last piece of my thesis is perhaps the most important and the most difficult: evaluating my work with real developers. This is still an unsolved and mostly unexplored area for me, though I know I should be working on it.
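One way to picture the moment that triggers mediation is a simple margin rule: if the most likely interpretation is not clearly ahead of the runner-up, stop and ask instead of guessing. The C# sketch below assumes the recognizer already produces a probability for each candidate action; the `Mediation.NeedsMediation` name and the `minMargin` threshold are hypothetical. It only illustrates the decision point, not the mediation techniques themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Mediation
{
    // Returns true when the top two interpretations are closer than minMargin,
    // i.e. the system should ask the user rather than act on its own.
    // best is set to the most likely interpretation either way.
    public static bool NeedsMediation(
        IReadOnlyDictionary<string, double> probabilities, double minMargin, out string best)
    {
        if (probabilities.Count == 0)
            throw new ArgumentException("At least one candidate action is required.");

        // Rank candidate actions from most to least likely.
        var ranked = probabilities.OrderByDescending(kv => kv.Value).ToList();
        best = ranked[0].Key;

        // A single candidate is never ambiguous.
        if (ranked.Count == 1)
            return false;

        return ranked[0].Value - ranked[1].Value < minMargin;
    }
}
```

With a margin of 0.2, a 0.55 versus 0.45 split between "swipe" and "pan" would trigger mediation, while a 0.9 versus 0.1 split would simply execute the swipe. Choosing the margin is itself a design trade-off between interrupting users too often and guessing wrong too often.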

what is the best possible outcome for my thesis?

I would be thrilled if, at some point in my life, I saw mainstream input systems such as those in Microsoft and Apple products turning probabilistic, and if those systems used some of the ideas outlined in my previous papers or in papers to come. Given the popularity of natural user interfaces, I think this is a very real possibility, which is quite exciting.

Finally, Face Detection for Windows Phone!

I wrote a C# port of OpenCV’s face detection library for Windows Phone. If you just want the link to the library and don’t want to hear my story, it’s http://facedetectwp7.codeplex.com/. If you’re curious, read on:

For the final project of my computational photography class, I developed an app that helps people take pictures of themselves (this was actually a fleshed-out version of an earlier prototype I’d written while an intern at Microsoft). The app is currently in beta testing, and I hope to release it on the Marketplace before the new year! Post a comment if you’re interested in hearing about the release.

The biggest challenge for this project was face detection. Why? Because there are no good face detection libraries for Windows Phone. I couldn’t use the popular (and, as I later found out, not-as-great-as-it-could-be) OpenCV library because Windows Phone 7 apps can only run managed code (i.e. no native C++). So I wrote a C# port of the OpenCV Viola-Jones detector that uses OpenCV’s model files to do face detection. I must thank the folks at http://code.google.com/p/jviolajones/ for their helpful code; I used it as a guide for my port.
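At the heart of a Viola-Jones detector is the integral image, which lets the detector sum any rectangle of pixels in constant time and therefore evaluate thousands of Haar-like features per window cheaply. The C# sketch below shows just that building block on a grayscale `byte[,]` image (indexed `[y, x]`). It is an illustration under that assumption, not code from the facedetectwp7 library, and it leaves out the trained cascade that OpenCV’s model files provide; the `IntegralImage` class and its method names are hypothetical.

```csharp
using System;

public sealed class IntegralImage
{
    // sum[y, x] holds the total of all pixels above and to the left of (x, y), exclusive,
    // so the array is one larger than the image in each dimension.
    private readonly long[,] sum;

    public IntegralImage(byte[,] gray)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        sum = new long[h + 1, w + 1];
        for (int y = 0; y < h; y++)
        {
            long rowSum = 0; // running sum of row y up to column x
            for (int x = 0; x < w; x++)
            {
                rowSum += gray[y, x];
                sum[y + 1, x + 1] = sum[y, x + 1] + rowSum;
            }
        }
    }

    // Sum of the w-by-h rectangle whose top-left corner is (x, y), using four lookups.
    public long RectSum(int x, int y, int w, int h) =>
        sum[y + h, x + w] - sum[y, x + w] - sum[y + h, x] + sum[y, x];

    // A two-rectangle (edge) Haar-like feature: left half minus right half.
    public long TwoRectHorizontal(int x, int y, int w, int h)
    {
        int half = w / 2;
        return RectSum(x, y, half, h) - RectSum(x + half, y, half, h);
    }
}
```

A real detector slides a window across the image at several scales, evaluates many such features per window, and passes the results through the cascade stages stored in the model file, rejecting most windows after only a few cheap checks.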

The library is available at http://facedetectwp7.codeplex.com/. If you want to build a Windows Phone application that uses face detection, I’d highly recommend looking into it!

 

UIST 2011 Recap

Just got back from my favorite conference and wanted to share some of my favorite papers, in case anybody wonders what the bleeding edge of HCI looks like. Here are my favorites:

FingerFlux (http://hci.rwth-aachen.de/fingerflux)
In a nutshell: Electromagnets on a table + magnet on your finger = touchscreen with magnetic force feedback.

Why I liked it: I had this idea when I first got into grad school and always wondered what magnetic feedback on a touchscreen would feel like. It was really great to see this idea published; great job, Malte et al.!


Real-Time Collaborative Coding in a Web IDE (http://groups.csail.mit.edu/uid/collabode/)

In a nutshell: A web-based IDE that allows multiple people to edit the same file safely.
Why I liked it: I liked the idea of this one because it might provide a better way to do pair programming with David.


SideBySide: Multi-user Interaction with Handheld Projectors (http://www.karlddwillis.com/projects/projection/sidebyside/)

In a nutshell: Play boxing games with your projectors.
Why I liked it: I also had a similar idea (and am trying to publish a paper with Chris and Chloe on it), and think that this is awesome. Great to see it published. Also really well implemented.


No More Bricolage! Methods and Tools to Characterize, Replicate, and Compare Pointing Transfer Functions (http://libpointing.org/)

In a nutshell: Try this: move your finger across a trackpad slowly and watch where your cursor goes. Now do the same thing, but move your finger quickly. See how the distance your cursor moves changes? The authors wrote 10,000 lines of code to reverse engineer how Windows and Mac turn device movement into cursor movement.
Why I liked it: I think about mouse gain much more than most people, and to me it’s always been a mystery what functions different OSes use. It’s great to finally see these curves. The fact that they are so different makes me wonder why the OSes haven’t converged on an optimal transfer function yet.
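A pointer transfer function maps how far and how fast the device moved to how far the cursor moves. The C# sketch below uses a made-up piecewise-linear gain curve (constant gain for slow movement, a higher constant gain for fast movement, and a linear ramp in between). The thresholds, gains, and the `PointerTransfer` class are hypothetical; they are not the Windows or Mac curves measured in the paper.

```csharp
using System;

public static class PointerTransfer
{
    // Hypothetical curve parameters (device speed in mm/s).
    private const double SlowSpeed = 20.0;   // at or below this speed, use SlowGain
    private const double FastSpeed = 200.0;  // at or above this speed, use FastGain
    private const double SlowGain = 1.0;
    private const double FastGain = 4.0;

    // Gain as a function of device speed: flat, then a linear ramp, then flat.
    public static double Gain(double speedMmPerSec)
    {
        if (speedMmPerSec <= SlowSpeed) return SlowGain;
        if (speedMmPerSec >= FastSpeed) return FastGain;
        double t = (speedMmPerSec - SlowSpeed) / (FastSpeed - SlowSpeed);
        return SlowGain + t * (FastGain - SlowGain);
    }

    // Cursor displacement for one device report: dxMm millimetres over dtSeconds.
    // The result is in "gained millimetres"; a real system would also convert to pixels.
    public static double CursorDelta(double dxMm, double dtSeconds)
    {
        if (dtSeconds <= 0) throw new ArgumentOutOfRangeException(nameof(dtSeconds));
        double speed = Math.Abs(dxMm) / dtSeconds;
        return Gain(speed) * dxMm;
    }
}
```

With these made-up numbers, moving 10 mm over one second moves the cursor 10 units, while moving the same 10 mm in 10 ms moves it 40 units, which is the slow-versus-fast effect described above.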


KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera (http://www.youtube.com/watch?v=quGhaggn3cQ)


In a nutshell: Make high-resolution 3D models using a Kinect. Objects work best when still.
Why I liked it: I was impressed with the high resolution of the created 3D models.

Some Interesting Image Blending Results

Here are some interesting images I generated for my computational photography class this semester. The images were generated using Poisson blending and mixed-gradient blending. The idea is to copy in the changes in pixel values (the image gradients), not the pixels themselves. You can learn more about the project here (here is my project submission with more detail).
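The same idea can be shown in one dimension. In the C# sketch below (a simplification, not the code from my project submission), the interior of a row of pixels is solved so that its neighbouring differences match a guidance gradient while the two end pixels keep the target image’s values. With `mixedGradients` off this is Poisson blending (copy the source gradient); with it on, each step keeps whichever of the source or target gradient is stronger. The `GradientBlend1D` class, the Gauss-Seidel solver, and the iteration count are illustrative choices; 2D implementations typically use a sparse linear solver over the whole pasted region.

```csharp
using System;

public static class GradientBlend1D
{
    // Blend source into target over interior indices 1..n-2.
    // The two endpoints keep the target's values (the boundary condition).
    public static double[] Blend(double[] source, double[] target,
                                 bool mixedGradients, int iterations = 10000)
    {
        if (source.Length != target.Length || source.Length < 3)
            throw new ArgumentException("source and target must have the same length (at least 3).");
        int n = source.Length;

        // guide[i] is the desired difference between pixel i+1 and pixel i.
        var guide = new double[n - 1];
        for (int i = 0; i < n - 1; i++)
        {
            double gs = source[i + 1] - source[i];
            double gt = target[i + 1] - target[i];
            // Poisson: copy the source gradient. Mixed: keep the stronger gradient.
            guide[i] = (mixedGradients && Math.Abs(gt) > Math.Abs(gs)) ? gt : gs;
        }

        // Start from the target; endpoints are never updated.
        var f = (double[])target.Clone();
        for (int it = 0; it < iterations; it++)
        {
            // Gauss-Seidel sweep for 2 f[i] - f[i-1] - f[i+1] = guide[i-1] - guide[i].
            for (int i = 1; i < n - 1; i++)
                f[i] = (f[i - 1] + f[i + 1] + guide[i - 1] - guide[i]) / 2.0;
        }
        return f;
    }
}
```

Blending the steadily increasing source row {0, 1, 2, 3, 4} into a target whose ends are 10 and 20 gives {10, 12.5, 15, 17.5, 20}: the result is pinned to the target at both ends and, like the source, is a straight ramp in between.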


Elephant in New York City.


Name on a Wall


Walking on Water.

Dealing With Noisy Skeletal Data on Windows Kinect SDK

As you all know, the Windows Kinect SDK is finally out! This post relates to the Windows Kinect SDK, not the OpenNI C# SDK that I’ve given tutorials on in previous posts. It’s pretty easy to get the depth data, RGB data, and skeletal data from the Windows SDK (though obtaining infrared data is not supported). I recommend looking at the samples and tutorials provided by Microsoft for information on how to get started.

This post is about a simple trick I learned recently to smooth out the noisy skeletal data provided by the Kinect. Using skeletal data is a great way to track things like hand position. As anybody who has tried tracking hands (or anything really) on the Kinect knows, this skeletal data is quite noisy.

One easy way to reduce this noise is to call the NuiTransformSmooth function if you’re using the C++ API or, if you’re using C#, to set the TransformSmooth property of your NUI runtime’s SkeletonEngine to true. For example, in C#, if you have a Nui object called nui in your class, just do nui.SkeletonEngine.TransformSmooth = true; and the SDK will apply some basic filtering to your skeleton.

This doesn’t remove the sudden joint-popping noise you’ll see, but it does remove the regular noise (joints bouncing around) pretty well. At any rate, this is an easy way to reduce some of the noise in your skeletal tracking system.
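If you want to see what basic smoothing looks like, or need it somewhere the SDK’s filter isn’t available, an exponential moving average per joint is about the simplest option. The C# sketch below is a generic filter written for illustration; it is not the algorithm the Kinect SDK uses for TransformSmooth, and the `JointPosition` and `JointSmoother` types and the `alpha` parameter are hypothetical. It damps jitter at the cost of some lag (smaller `alpha` means smoother but laggier), and it does nothing special about sudden joint popping.

```csharp
using System;

// Hypothetical 3D position type (for example, camera-space metres).
public readonly struct JointPosition
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }
    public JointPosition(double x, double y, double z) { X = x; Y = y; Z = z; }
}

// Exponential moving average over successive positions of a single joint.
public sealed class JointSmoother
{
    private readonly double alpha;   // 0 < alpha <= 1; smaller = smoother but laggier
    private JointPosition smoothed;
    private bool hasValue;

    public JointSmoother(double alpha)
    {
        if (alpha <= 0 || alpha > 1) throw new ArgumentOutOfRangeException(nameof(alpha));
        this.alpha = alpha;
    }

    // Feed one raw sample per frame; returns the smoothed position.
    public JointPosition Update(JointPosition raw)
    {
        if (!hasValue)
        {
            // First sample: nothing to average with yet.
            smoothed = raw;
            hasValue = true;
        }
        else
        {
            // Move a fraction alpha of the way from the old estimate toward the new sample.
            smoothed = new JointPosition(
                smoothed.X + alpha * (raw.X - smoothed.X),
                smoothed.Y + alpha * (raw.Y - smoothed.Y),
                smoothed.Z + alpha * (raw.Z - smoothed.Z));
        }
        return smoothed;
    }
}
```

You would keep one smoother per tracked joint (for example, one for the right hand) and call Update with that joint’s raw position every skeleton frame.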

Windows Phone 7 Silverlight Review App Button

I was trying to figure out how to create a ‘review my app’ button on the About page of a new app I was building (see below). This button takes your users to a page where they can review your app (quite useful for getting reviews, which are really important for keeping your app high in the Marketplace).

The review button (outlined in green) on my about page.

After a bit of Googling, it didn’t seem like anybody had posted the answer, perhaps because it’s so easy to do. Here is the code I added to my button’s click event handler:

 

using Microsoft.Phone.Tasks; // add at the top of the file

private void button2_Click(object sender, RoutedEventArgs e)
{
    // Open the Marketplace page where the user can rate and review this app
    MarketplaceReviewTask review = new MarketplaceReviewTask();
    review.Show();
}

Basically, all you need to do is show a new MarketplaceReviewTask. Dead easy. Thank you, Windows Phone 7!

 

Where do you click?

Screenshot of Google's in-page analytics

If you have a blog or website, you know how much fun it is to track your daily visitors, to see where people are coming from, and what they click on. Recently David showed me the Google in-page analytics feature. This is a really useful feature because it creates an overlay on your site showing you what percentage of visitors click each of your links. Using this feature, I noticed that a fair number of people were clicking the top bar, but nobody clicked on my blogroll. So I decided to move the link to my personal website to the top bar in the hope that more people will visit it. What’s more, when I went back to my blog, I noticed that the Google overlay persisted! Really great!

If you have Google analytics, here’s how to get to the overlay:

  1. Select the site you are interested in.
  2. Go to Content in the left-hand panel.
  3. Click In-Page Analytics.

That’s it! Another great feature from Google.

Hacking Kinect with C# using OpenNI: Basic Depth Viewer

Update 11/21/2011: If you’re using the newest version of OpenNI (not the version that was out when this was originally written, around April 2011), then I believe (according to comments) that you don’t need to write “using xn” at the top of your file; you might need a different line instead (such as “using OpenNi.net” or whatever the new namespace is). Right now I don’t have the time to set up OpenNI again (I have since switched to the Kinect for Windows SDK), but if anybody wants to comment on which using statement they used to include the OpenNI namespace, I’d really appreciate it (and will update this message accordingly). Otherwise I’ll try to take care of this sometime during winter break.

So you want to hack a Kinect, but don’t feel comfortable with C++ or don’t want to bother with it? Try using C#! This post walks through how to get a basic C# program (using Windows Presentation Foundation) working with the Kinect. By the end of this post, you will know how to access the Kinect’s depth map and have a program that displays the depth view as a semi-aesthetic image (OK, it’s kind of ugly… but you can change it to make it look better!).

Prerequisites

  1. You should be comfortable with programming C# or Java.
  2. You should have Visual Studio installed. If you’re a student, you can get Visual Studio for free from DreamSpark; otherwise, you can get Visual Studio Express here. I’m also going to assume that you’re somewhat familiar with Visual Studio. Make sure your version of Visual Studio supports Windows Presentation Foundation; I’ll be using it for this tutorial (though you don’t need to use it in general).
  3. Make sure you’ve installed the Kinect libraries and drivers and can get a sample app working [instructions].

Useful References

The OpenNI documentation is a good starting point. Also take a look at the sample code in the Samples directory wherever OpenNI and NITE are installed (for me this is C:\Program Files\OpenNI\Samples and C:\Program Files\Prime Sense\NITE\Samples). The NITE documentation is also useful; it can be found under Documentation in your NITE install directory (for me, C:\Program Files\Prime Sense\NITE). Finally, another good place to look is sample code: look at the samples under [NITE directory]\Samples, as well as open-source code such as kinemote (link; NOTE: I can’t access the kinemote project right now, so for now you can download the kinemote source here).

Quick Start

If you’re comfortable with C# and did the C++ tutorials, all you need to do is include these DLLs and use your OpenNI config file in [Open NI install directory]\Samples\samplesconfig.xml to get your code running. The interface is like the C++ interface; make sure to include the xn namespace.

Creating and Configuring Your Project

After you’ve installed the necessary drivers and frameworks, you’ll want to create a new C# project. If you want to follow the tutorial all the way, create a new Windows Presentation Foundation (WPF) project; otherwise, you can just create a regular C# project and figure things out yourself. From now on I’ll assume you created a WPF project and that the main page in your WPF app is called MainWindow.xaml. If you’re familiar with C#, it isn’t hard to adapt this tutorial to a non-WPF project.

After creating a project, do the following.

  1. Download OpenNI config files [link], and save these files in your project directory (I saved them under [project directory]/data).
  2. Download these two DLLs [link], and save them in your project directory (I saved them under [project directory]/lib).
  3. Add the OpenNI DLL to your references (in Solution Explorer, right-click References -> Add Reference, go to the Browse tab, and browse for it).

You should be ready to write some code now!

Adding an Image to your MainWindow Area

We want to display the depth data on a video screen of sorts, which in practice means displaying it in an image that we update frequently. So, let’s add an image to our app!

  1. Open MainWindow.xaml; you should see the WPF visual editor.
  2. Add an image (from the toolbox on the left).
  3. Make the image 640 pixels wide and 480 pixels tall, and the containing window a bit bigger.
  4. Make sure to name your image “image1”.

We’re also going to want to initialize the OpenNI framework and set other stuff up when the app loads, so let’s add an event handler which executes when the window loads.

  1. Click on the main window (you can just click the Window tag in the XAML view of the designer; click the XAML button if you don’t see any XAML).
  2. Go to Properties -> Events and double-click the empty box next to the Loaded event; it should automatically create an event handler called “Window_Loaded”.

That’s all the set up you need, at the end your MainWindow.xaml file should look something like this:

 

The Code

Paste the code below into your MainWindow.xaml.cs file (if you can’t open MainWindow.xaml.cs and only see the design view, right-click on the design view and choose View Code). To use the depth image, you need to initialize OpenNI and pull depth data. Below I explain how each part of my code does this.

Declarations at the Top and Member Variables

To use OpenNI, include “using xn;” at the top of your file. The member variables I used are also below.

Initializing OpenNI, Depth Viewers

Here’s how to initialize OpenNI:

Getting the Depth Data

The following code updates the depth data and turns it into the image we display.
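The conversion step itself can be sketched in plain C#, independent of OpenNI: turn an array of raw depth values into 8-bit grayscale pixels (nearer is brighter) that can be copied into the image. This sketch assumes depth arrives as one `ushort` per pixel in millimetres with 0 meaning "no reading"; the `DepthToGray.ToGrayscale` name and the 4000 mm cutoff are hypothetical choices, and the code does not call the OpenNI API at all.

```csharp
using System;

public static class DepthToGray
{
    // Map raw depth values (assumed millimetres, 0 = no reading) to 8-bit gray,
    // with nearer surfaces brighter. maxDepthMm is a hypothetical far cutoff.
    public static byte[] ToGrayscale(ushort[] depth, ushort maxDepthMm = 4000)
    {
        var pixels = new byte[depth.Length];
        for (int i = 0; i < depth.Length; i++)
        {
            ushort d = depth[i];
            if (d == 0 || d >= maxDepthMm)
            {
                pixels[i] = 0; // no reading, or too far away: draw black
            }
            else
            {
                // Linear ramp: depth near 0 -> 255 (white), depth near maxDepthMm -> 0.
                pixels[i] = (byte)(255 - d * 255 / maxDepthMm);
            }
        }
        return pixels;
    }
}
```

In WPF, one way to show the result is a WriteableBitmap created with PixelFormats.Gray8 at 640 by 480 (to match image1), updated each frame with `bitmap.WritePixels(new Int32Rect(0, 0, 640, 480), pixels, 640, 0)` and assigned to `image1.Source`. A linear ramp is the simplest mapping; histogram-based mappings usually spread the gray levels more evenly across the scene.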

Sample Code on Git

If you’re having trouble getting this to work or just want to see how I did it, I’ve posted sample code and the project I used for this tutorial on GitHub; you can access it here. In general, all the code for my tutorials (as well as any other utilities I make available) will be there, so check back often!