Medical Imaging & Computer Vision: March 2014

This blog post aims to provide an overview of the main trends in object detection I encountered in the literature.

Object detection is the process of automatically finding instances of a class, such as cars or pedestrians, in an image. Object localisation is often considered a synonym of detection. The main difference I see is that, in medical imaging, an MRI scan will contain one and only one heart or brain, so a localisation task assumes the presence of the object in the image. In a typical computer vision task, however, an image could contain several cars or none at all, so the detection task must decide whether the object is present before finding its location in the image. In the following, I will use detection and localisation interchangeably.

The simplest approach to finding a given object in an image is template matching. This is a very limited approach with no ability to generalise: it is principally aimed at finding the position of a cropped image in the original version of the image, but it can also be used for objects that have very little variation within a class. An example application in object detection is Anquez et al. (2009), who used template matching to detect the eyes of the fetus in motion-free MRI scans, as a starting point for a brain detection pipeline.
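
To make the idea concrete, here is a minimal sketch of template matching in plain Python, assuming grey-level images stored as nested lists and a sum of squared differences as the matching score. The function name `match_template` and the toy data are my own illustration, not the method of Anquez et al.; in practice you would more likely call OpenCV's `cv2.matchTemplate`.

```python
def match_template(image, template):
    """Return (row, col, score) of the best match, scored by sum of squared differences.

    `image` and `template` are 2D lists of numbers (rows of pixel intensities).
    A score of 0 means the template was found exactly.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    if tpl_h > img_h or tpl_w > img_w:
        raise ValueError("template must not be larger than the image")

    best = None
    # Slide the template over every position where it fits entirely.
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            # Keep the first position with the lowest score seen so far.
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Toy example: the template is an exact crop of the image.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
row, col, score = match_template(image, template)
print(row, col, score)
```

Because the score compares raw intensities, any change in appearance (lighting, scale, rotation) degrades the match, which is why this only works for objects with very little variation.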

In order to take into account the variable appearance of objects within a class, a machine learning framework is usually adopted. An algorithm is trained on training data, using validation data to tune its various parameters. Testing data, which the detector has not seen during training or validation, is then used to assess the accuracy of the trained detector (Bradski et al., 2008). In order to make decisions about images, the algorithm extracts features, which can be as simple as the difference of mean intensities over two rectangular areas (Criminisi et al., 2011), or more complex, such as histograms of SIFT features matched to their nearest neighbour in a “vocabulary” of image patches (Csurka et al., 2004). These features are then passed to a machine learning method such as Boosting, Random Forest or SVM, to learn the appearance of the object during training, or to make a decision at testing time.

If you look at the common interface for feature detectors in OpenCV and the generic API for classifiers in scikit-learn, you will notice that image features and machine learning methods are building blocks that can easily be interchanged. Switching between SIFT and SURF, or between SVM and Random Forest, can be as easy as changing a line of code. Among other things, the choice depends on a trade-off between speed and accuracy, whether your features need to be rotation invariant, the size of your training dataset, whether you have multi-channel images, your hardware limitations such as memory, and of course the implementations you have at hand.

Independently of the choice of image features or machine learning method, the questions I am most interested in for this blog post are the following:

At which positions in the image do you want to run your classifier? At every pixel, every superpixel or only on salient regions?
When your classifier is positioned in the image, is it voting for the current location or for an offset location (see Hough transform)?
Are you running only one detector, or a cascade of detectors? How do you then define “coarse to fine”?
If you need to detect several parts of an object, how do you take the spatial configuration into account instead of running independent detectors?
How do you summarize image information and combine different features?
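
As an illustration of the first question, the sketch below runs a classifier at every window position (a sliding window), using the difference of mean intensities over two rectangles as the only feature. It is a toy example under my own assumptions: the function names are hypothetical, the window size is assumed to be even, and the `classify` callable is a stand-in for a trained Boosting, Random Forest or SVM model. The integral image makes each rectangle mean a constant-time lookup.

```python
def integral_image(image):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(image), len(image[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_mean(table, top, left, height, width):
    """Mean intensity of a rectangle, read from four entries of the table."""
    bottom, right = top + height, left + width
    total = (table[bottom][right] - table[top][right]
             - table[bottom][left] + table[top][left])
    return total / (height * width)


def two_box_feature(table, top, left, size):
    """Mean of the left half of a window minus the mean of its right half."""
    half = size // 2
    return (box_mean(table, top, left, size, half)
            - box_mean(table, top, left + half, size, half))


def sliding_window_detect(image, size, classify, step=1):
    """Evaluate `classify` on every window; return the accepted (top, left) corners."""
    table = integral_image(image)
    h, w = len(image), len(image[0])
    hits = []
    for top in range(0, h - size + 1, step):
        for left in range(0, w - size + 1, step):
            if classify(two_box_feature(table, top, left, size)):
                hits.append((top, left))
    return hits


# Toy image with a bright block; the stand-in classifier looks for a
# strong bright-to-dark transition from left to right.
image = [
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
detections = sliding_window_detect(image, size=2, classify=lambda f: f > 5)
print(detections)
```

Increasing `step`, or restricting the loop to superpixels or salient regions, trades localisation precision for speed, which is exactly the choice the first question is about.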

Last December, I had to give a short and fairly informal presentation, so I decided to have a go at the different XKCDify scripts lying around the web to make a more original LaTeX beamer presentation.

It seems that a version of the code is now shipped with Matplotlib, with a basic example here, and the whole gallery of Matplotlib examples turned into XKCD plots there.

I did not notice any official code back in December. From what I understood, the idea originated on the Matplotlib users mailing list with Damon McDougall, followed by a blog post by Jake Vanderplas. The code I finally used, however, is from Johannes Buchner, with some hard-coded parameterisation. To simulate hand-drawn lines with TikZ, I used the LaTeX code from percusse.

For the font, I recommend the LaTeX font catalogue to avoid any installation trouble, but in the end, I used the script from Holger Widmann to install Humor-Sans.ttf, which can be downloaded from several places.

Lastly, some LaTeX code to use the font:

%% use xkcd font
%% Note: that's a font with no bold or italic...
\usepackage[T1]{fontenc}
\definecolor{xkcd_color}{rgb}{.376,.435,.533}
\usecolortheme[named=xkcd_color]{structure}
\renewcommand{\familydefault}{xkcd}

and some page layout changes for the Boadilla theme:

%% do not use any \useoutertheme{}

%% disable navigation symbols
\setbeamertemplate{navigation symbols}{}

%% center slide titles
\setbeamertemplate{frametitle}[default][center]

%% customise the footline: only slide numbers
\setbeamertemplate{footline}{
  \begin{beamercolorbox}[ht=2.5ex,dp=1.125ex,
      leftskip=.3cm,rightskip=.3cm plus1fil]{}
    \hfill \small \insertframenumber/\inserttotalframenumber%
  \end{beamercolorbox}%
}

and here are the slides:

Automated segmentation and motion correction of the fetal brain

Regarding the content of the slides, a paper is under review, so more information about it will be presented in due time.

Thursday, 27 March 2014

Object detection

Sunday, 2 March 2014

XKCD-style beamer presentation (LaTeX)