Archive for category Blog

I’ve mentioned in a previous post that I’ve hacked up some quick versions of Tetris to pass time while travelling (see this post).

I’ve had this other version kicking around, which I wrote on the train from San Francisco to San Jose after flying back from Europe. It’s in an iframe below. Click on it and use (a, s, d, w).

Or open in a new window


Soko solve

Programming is a fun way to pass time.

A few years ago, I was playing lots of sokoban (or box pusher). Around the same time, I was also conducting a number of technical interviews, so I was often thinking about self-contained programming problems. Back as a grad student, in a single-agent search class, we learned about searching with pattern databases–basically, you solve an abstraction of the problem and use the solution length in that space to build up a heuristic (see the paper Additive Pattern Database Heuristics).

I’m sure there is plenty of research on this exact topic for sokoban, but I was interested in poking around the problem while we were driving to Canada from California for a holiday.

The source code is up on github here:

https://github.com/nbirkbeck/soko-solve

 

And you can check out the demo (included in iframe below):

http://neilbirkbeck.com/soko-solve/

You can play the levels with ‘a’, ‘s’, ‘w’, ‘d’, or you can click “solve” and “move” through the optimal path. “Solve all” will benchmark a bunch of different methods. There is a much more detailed description of my analysis in these notes:

https://raw.githubusercontent.com/nbirkbeck/soko-solve/master/NOTES
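For flavor, here is a minimal sketch of the kind of state-space search involved (this is not the code from the repo, just a plain breadth-first search over (player, boxes) states; the heuristics and benchmarking live in the repo and the NOTES):

#include <algorithm>
#include <iostream>
#include <queue>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Minimal sokoban search sketch (not the repo code): breadth-first search over
// (player, boxes) states. '#' wall, '.' goal, '$' box, '@' player, '*' box on
// goal, '+' player on goal. Returns the number of moves in a shortest solution.
struct State {
  int player;              // player cell index
  std::vector<int> boxes;  // sorted box cell indices
  bool operator<(const State& o) const {
    return std::tie(player, boxes) < std::tie(o.player, o.boxes);
  }
};

int solve(const std::vector<std::string>& level) {
  const int w = level[0].size();
  const int h = level.size();
  auto at = [&](int i) { return level[i / w][i % w]; };
  State start;
  std::vector<int> goals;
  for (int i = 0; i < w * h; ++i) {
    const char c = at(i);
    if (c == '@' || c == '+') start.player = i;
    if (c == '$' || c == '*') start.boxes.push_back(i);
    if (c == '.' || c == '*' || c == '+') goals.push_back(i);
  }
  const int dirs[4] = {-1, 1, -w, w};
  std::set<State> seen = {start};
  std::queue<std::pair<State, int>> q;
  q.push({start, 0});
  while (!q.empty()) {
    State s = q.front().first;
    int depth = q.front().second;
    q.pop();
    if (s.boxes == goals) return depth;  // both lists are kept sorted
    for (int d : dirs) {
      State n = s;
      const int next = s.player + d;
      if (at(next) == '#') continue;
      auto box = std::find(n.boxes.begin(), n.boxes.end(), next);
      if (box != n.boxes.end()) {
        const int pushed = next + d;  // the cell behind the box
        if (at(pushed) == '#' ||
            std::count(n.boxes.begin(), n.boxes.end(), pushed))
          continue;
        *box = pushed;
        std::sort(n.boxes.begin(), n.boxes.end());
      }
      n.player = next;
      if (seen.insert(n).second) q.push({n, depth + 1});
    }
  }
  return -1;  // unsolvable
}

int main() {
  // Tiny hypothetical level: push the single box one cell onto the goal.
  std::vector<std::string> level = {"#####", "#@$.#", "#####"};
  std::cout << "solution length: " << solve(level) << "\n";  // prints 1
}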


Shape stylized images

It’s been a very long time…

I realize that there are a bunch of side projects that are worth a short blog post. About 1.5 years ago, I was working on some coding tutorials to help a friend learn to program. One of the starter projects we worked on was to generate a stylized image by populating an image with a bunch of dots (or other shapes). This was inspired by colorblindness test patterns, as I really wanted to generate a bunch of test patterns for testing video color characteristics that would easily let you determine whether color metadata (e.g., primaries, matrix coefficients, transfer functions) was being interpreted correctly by a video processing system or display.
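The core of the idea is tiny. A rough sketch (not the actual tutorial code; it assumes a packed W x H x 3 RGB buffer and writes a binary PPM): throw random non-overlapping discs at a blank canvas and fill each one with the source image’s color under the disc center.

#include <cstdio>
#include <random>
#include <vector>

// Rough sketch of the dot stylization (not the tutorial code): scatter random
// non-overlapping discs and fill each with the source image's color at the
// disc center. 'src' is a packed W*H*3 RGB buffer; output is a binary PPM.
void stylize(const std::vector<unsigned char>& src, int W, int H,
             const char* out_path, int num_dots = 4000) {
  std::vector<unsigned char> out(src.size(), 255);  // start with a white canvas
  std::vector<float> placed;                        // x, y, r triples
  std::mt19937 rng(42);
  std::uniform_real_distribution<float> ux(0, W - 1), uy(0, H - 1), ur(2, 8);
  for (int n = 0; n < num_dots; ++n) {
    const float x = ux(rng), y = uy(rng), r = ur(rng);
    bool clear = true;  // reject discs that overlap one already placed
    for (size_t i = 0; i < placed.size(); i += 3) {
      const float dx = x - placed[i], dy = y - placed[i + 1];
      const float rr = r + placed[i + 2];
      if (dx * dx + dy * dy < rr * rr) { clear = false; break; }
    }
    if (!clear) continue;
    placed.insert(placed.end(), {x, y, r});
    const unsigned char* c = &src[3 * (int(y) * W + int(x))];
    for (int py = int(y - r); py <= int(y + r); ++py)    // rasterize the disc
      for (int px = int(x - r); px <= int(x + r); ++px) {
        if (px < 0 || py < 0 || px >= W || py >= H) continue;
        if ((px - x) * (px - x) + (py - y) * (py - y) > r * r) continue;
        for (int k = 0; k < 3; ++k) out[3 * (py * W + px) + k] = c[k];
      }
  }
  FILE* f = fopen(out_path, "wb");
  fprintf(f, "P6\n%d %d\n255\n", W, H);
  fwrite(out.data(), 1, out.size(), f);
  fclose(f);
}

Swapping the disc for other shapes, or varying the radius, gives the different looks in the images below.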

Well, I never did generate those test patterns with this code. But we did generate a couple of interesting images. Below is an example of Andre the Giant:

And here is another one that is a portrait of me:

Source code and sample input images on the github project: https://github.com/nbirkbeck/oneonetute

And a video:
 

 


JavaScript depth mesh renderer

Was playing around with some old code for generating depth maps and decided to create a demo that renders the depth maps in WebGL. The color video is stacked vertically on the depth texture so that the two will always be in sync. Looked into packing the depth into the 24-bit RGB channels, but as the video codec is using YUV there was significant loss and the depth looked horrible. A better approach would be to pack the data into the YUV channels, but I didn’t try. For this example, the depth is only 8-bit.
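The bookkeeping for the stacked layout is simple. Here is a sketch (not the demo’s actual code; the unprojection is a toy pinhole purely for illustration) of how each grid vertex pulls its depth from the bottom half of the frame and its color coordinate from the top half:

// Sketch of the UV bookkeeping for a vertically stacked color+depth frame
// (color assumed in the top half, 8-bit depth in the bottom half).
struct Vertex {
  float px, py, pz;  // position after displacing by the sampled depth
  float cu, cv;      // texture coordinate into the color half of the frame
};

// u, v in [0,1] index a regular grid; sampleDepth() reads the bottom half of
// the video texture and returns the 8-bit depth normalized to [0,1].
Vertex depthMeshVertex(float u, float v,
                       float (*sampleDepth)(float u, float v),
                       float znear, float zfar) {
  Vertex out;
  out.cu = u;
  out.cv = v * 0.5f;                                // color lives in the top half
  const float d = sampleDepth(u, 0.5f + v * 0.5f);  // depth lives in the bottom half
  const float z = znear + d * (zfar - znear);       // linear remap; the demo may differ
  out.px = (u - 0.5f) * z;                          // toy pinhole unprojection,
  out.py = (v - 0.5f) * z;                          // purely for illustration
  out.pz = -z;
  return out;
}

Since both halves come out of the same decoded frame, the color and depth cannot drift apart, which is the point of the stacking.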

You can see the one video here:
http://js-depthmesh.appspot.com/

Stacked color and depth


Updated source files for the random shape generator

Updated some source files for the random shape generator:
http://www.neilbirkbeck.com/source/meldshape-0.1.zip
http://www.neilbirkbeck.com/source/tiling-0.1.zip
http://www.neilbirkbeck.com/source/rshape-0.2.zip


A voice basis

Not too long ago, Leslie and I were wondering about pronunciations of names.   We had found a site that had some audio samples of the pronunciations, and we had started playing several of them over and over again.  It sounded pretty funny listening to them, and I thought it would be neat to hear a real audio track represented with a bunch of people just saying names.  Then you keep adding more people saying more different names until you get something that “sounds” like the input.

The set of audio samples of someone saying names becomes a basis for your audio signal.  I hacked together a quick program and some scripts to use a set of voices to represent an audio track.  The algorithm simply tries to fit the audio samples to the input signal and keeps layering audio to fit the residual.  It’s a greedy approach, where the best fit from the database is chosen first.  Each layer positions as many database samples as possible over the input signal (or residual signal) in a non-overlapping way (see the code for more details).  There are probably much faster ways to do this, perhaps using spectral techniques, but I wanted something quick and dirty (e.g., a few hours of work).
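In pseudocode-ish C++, a single layer of that greedy fit looks something like the following (a sketch of the idea, not the program I actually used); each subsequent layer just runs again on the residual:

#include <cstddef>
#include <vector>

// One greedy layer of the voice-basis fit (sketch only). For each
// non-overlapping slot, pick the database sample and gain that explain the
// most residual energy there, then subtract it so later layers fit what's left.
void fitOneLayer(std::vector<float>& residual,
                 const std::vector<std::vector<float>>& samples) {
  size_t pos = 0;
  while (pos < residual.size()) {
    int best = -1;
    float bestGain = 0, bestScore = 0;
    for (size_t s = 0; s < samples.size(); ++s) {
      const std::vector<float>& v = samples[s];
      if (pos + v.size() > residual.size()) continue;
      float num = 0, den = 0;  // least-squares gain of v against the residual
      for (size_t i = 0; i < v.size(); ++i) {
        num += v[i] * residual[pos + i];
        den += v[i] * v[i];
      }
      const float gain = den > 0 ? num / den : 0;
      const float score = gain * num;  // energy explained by this placement
      if (score > bestScore) { best = int(s); bestGain = gain; bestScore = score; }
    }
    if (best < 0) break;  // nothing in the database helps here
    const std::vector<float>& v = samples[best];
    for (size_t i = 0; i < v.size(); ++i)
      residual[pos + i] -= bestGain * v[i];
    pos += v.size();  // non-overlapping placement within a layer
  }
  // The output track is the running sum of all the (gain * sample) placements
  // across layers; the residual shrinks as more layers are added.
}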

The result doesn’t sound nearly as compelling as I had imagined.  To emphasize that it is just people speaking names that are used to produce the target audio track, I’ve broken the target audio track into 5 second segments.  For the i-th segment, 5*pow(2, i) speakers are used to represent the signal, so the approximation gets better as you listen longer.

The input audio track is a 60s sample from “Am I a Good Man”.

In the last segment, 10240 speakers are used to represent the signal.  Results:

five_kmw.mp3: the best example, as it has the most names and the most speakers (names that start with K, M, V)

five.mp3: uses fewer names (only those that start with K)

three.mp3: uses only 3*pow(2, i) voices at each segment, so the approximation is not as good.

 

Code (with scripts, in a zip file): audio_fit.zip


Updates, resume, cs-site, etc.

I guess this site will be undergoing an overhaul in the next little bit.  Also, fixed up some issues with my CS pages (http://www.cs.ualberta.ca/~birkbeck)

I’m trying to get better at updating my resume (and the associated page on this site).  In this regard, I’m grateful to my committee and graduate supervisors for my latest award (I’m just really slow posting this):

https://www.cs.ualberta.ca/research/awards-accolades/graduate

I’m honored to share this title with Amir-Massoud Farahmand.


View-based texture transfer

I recently worked on a project where it was necessary to transfer a texture from an existing 3D source model to another similar target model.  The target model shape is not exactly the same as the source model, and the topology and uv coordinates are different.  I am sure there are specific methods for doing this (i.e., similar to how normal maps of higher-resolution geometry are baked into a lower-resolution texture), but in this example the geometry can be quite a bit different.  In favor of using some existing tools that I have to solve this problem, I have a solution that is based on some of the work in my PhD on estimating the texture of a 3D computer vision (CV) generated object when given calibrated images of the object (see section 4.4, or this component).  In CV, since the object geometry is estimated from images, it is only an approximation of the true geometry.  For the problem of transferring a texture, we can use the same approach.

However, we are not given images of the source object, but we do have a texture, so we can generate these synthetically.  In this way, we can even give important views more weight (e.g., by having more views of, say, the front of the object). For the sake of illustration, I will demonstrate this on the teddy example (downloaded from www.blender-models.com).  The source model has an accurate 3D geometry, and in this case the source model doesn’t have uv coordinates (the texture coordinates use the world coordinates for a volume-like texture, and the geometry encodes some detail).   I removed some of the detail on the eyes and nose, and then the object was decimated and smoothed so that its silhouette is quite a bit different from the original input geometry.  The target object also has a set of non-optimal uv coordinates.  The differences in the target object may make it difficult to simply find the corresponding point on the source object in order to transfer the texture (similar to what I’m guessing would be used for the baking of normal maps).

Teddy geometry (source) | Teddy wireframe (source) | Decimated (target) | Decimated wireframe (target)

In order to transfer the texture from the source object to the target a number of synthetic images can be generated around the source object.

Camera placement around the source and the synthetic input images

These images and the camera poses can be used as input to a texture reconstruction. In my PhD, I explored several alternatives for this problem.  Among these are simply taking a weighted average (avg), computing an optimal view for each texel with some regularization (opt), and a multi-band weighting (multi).  The last two can also be combined, so that the multi-band weight is used for the low frequencies and the high-frequency info is grabbed from the optimal view. For the Teddy, I applied these methods to two configurations: a set of input images generated from the Teddy with illumination, and a set of images from the Teddy without illumination.  For basic texture transfer the latter configuration would be used. After applying these to the Teddy, you can see that a weighted average is blurred due to the difference of the target from the source.
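To give a flavor of the simplest of these, the weighted average per texel is basically the following (a sketch; the real implementation works in texture space with proper visibility tests and more careful weights):

#include <array>
#include <cmath>
#include <vector>

// Sketch of the weighted-average texture reconstruction. For every texel we
// know its 3D surface point and normal; each synthetic view contributes its
// color weighted by how well it sees that point (here just the cosine between
// the normal and the view direction).
struct View {
  std::array<float, 3> center;  // camera center
  // sample() would look up the rendered image; 'visible' would come from a
  // depth/visibility test in the real implementation.
  std::array<float, 3> (*sample)(const std::array<float, 3>& X, bool* visible);
};

std::array<float, 3> averageTexel(const std::array<float, 3>& X,
                                  const std::array<float, 3>& N,
                                  const std::vector<View>& views) {
  std::array<float, 3> color = {0, 0, 0};
  float wsum = 0;
  for (const View& v : views) {
    const float d[3] = {v.center[0] - X[0], v.center[1] - X[1], v.center[2] - X[2]};
    const float len = std::sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
    const float w = (d[0]*N[0] + d[1]*N[1] + d[2]*N[2]) / (len > 0 ? len : 1);
    if (w <= 0) continue;  // this view faces away from the surface point
    bool visible = false;
    const std::array<float, 3> c = v.sample(X, &visible);
    if (!visible) continue;
    for (int k = 0; k < 3; ++k) color[k] += w * c[k];
    wsum += w;
  }
  if (wsum > 0)
    for (int k = 0; k < 3; ++k) color[k] /= wsum;
  return color;
}

The opt and multi variants replace this straight average with a per-texel view selection and a frequency-dependent blend, respectively.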

Lit images

Teddy input | Average | Opt | Multi | MultiOpt

Unlit images

Teddy input | Average | Opt | Multi | MultiOpt

The opt and multi methods both give excellent results despite the changes in the geometry, with some small artifacts (e.g., near the eyes for the multi method).  The combination of the two methods gives a good overall texture, fixing up some of the small artifacts.  The opt method has some trouble with places that are not visible (e.g., under the chin).  In my prototype implementation, I had some scripts to statically place the cameras and perform the rendering of the images in blender, but the cameras could be placed so as to get full coverage.  The models and images are all available in the TeddyBlog.zip file.


Publications

Turns out I wasn’t very active in updating the publications page. It is now up to date (it has things after 2008).

http://www.neilbirkbeck.com/wp/nb-pubs.php


It’s been a long time coming…

…not only for this post, but also the cleaning that I just started on the old filing cabinet. It feels nice to get rid of old paper, like receipts & visa statements, from over 10 years ago. The best part is going through the auto expenses, and basically discarding the entire folder–this last year with only one shared vehicle has been great.

One of the most painful folders to go through is the computer receipts. It’s hard to swallow how much was spent on computer things. Back when I was just starting out as an undergrad, I was playing a lot of Quake 3, and I got caught up in the “trying to have the latest hardware” cycle. This was also at a time when I was working a lot during the summer and still living with my folks, so maybe it was okay to spend $500 on the latest graphics card, if only to get a few more FPS out of Quake.

Just out of curiosity, I am going to tabulate all of the major expenses during that time:

Surprisingly, I only found receipts for a few mice and keyboards; they probably weren’t kept (Keyboards: 3, Mice: 4).

It all started somewhere around 1999, when I had just started university the year before.  I remember upgrading our home computer for about $500, and was psyched that I could play DVDs on the TV with it.  At about this time was when we first started to use the internet.

1999 ($253)

Quake 3 Arena Boxing Day ’99, $50
FaxModem $203

2000 ($3000!)

This was about the time when I really started to waste money upgrading and spending too much time playing quake.  I think there was a time when I was upgrading our old computer from 133MHZ processors to slightly faster ones that I could find on the edm.forsale newsgroup.   I was working two jobs over the summer and I had hurt myself skateboarding, so all of my efforts went into Q3.  I can’t believe how much was spent, and I’m sure that I bought a huge 19 inch monitor for over $300 during this time.  And these are only the ones I have receipts for.  Many other things were purchased from individuals (and a few sold).

In my defense, I was helping build computers for my G/F at the time, and also for my parents, and brother (I think).  The main problem is that I was caught up in the benchmarking scene.  It’s impossible to keep up with it.  The other major cost associated with this was trying to overclock my Athlon.   I had penciled in some marks on the chip to unlock the clock multiplier, and when putting the heatsink back down, I partially crushed the casing.  A little later things weren’t working well (random blue screens), during quake, god forbid.  I had to take it in to two places: one charged me for more expensive ram, and the other found out that it was just that one of the cards wasn’t seated all that well.  This chipped die went on to be a decent computer for years later.

Intel Pentium MMX-233 MHZ CPU $80
3DFX Voodoo3 2000 16MB $140
VooDoo3 300 16MB $115
13GB HDD $185
Thunderbird 800, $320
Asus A7V $250
Raid Controller $160
Tower $60
Samsung FDD $20
Power Surge $35
ASUS AGP V7700 GEFORCE 2 GTS DELUXE 32MB $495
Gamepad: $55
Tower and 300 Watt PSU $80 + $45
Another diagnostic: $75
Diagnostic + 128 VCRAM: $295
CD writer: $339
Epson stylus: $250

2001 ($861)

Maybe I learnt my lesson from the previous year.  2001 didn’t seem so bad.

Lexmark Printer: $246
HIS ATI Video Card $45
Speakers + SB Card $80 + $32
A7V133 Socket A, $188
MSI Starforce $145
Duron 750 $90
Duron Heatsink $35

2003 ($300)
ATI 9800 AGP128MB $300, April

2004 ($593):

Pyro video camera $80

Wireless router and card $183, 2004
Printer Ink and DVD writer: $330

2005 ($1024):
iPod Nano: $250
Samsung 17 inch monitor $387 * 2

2006 ($525):
LG Monitor $250,
160GB HD $90,
MSI 7600GS $185

Somewhere in between I bought an Athlon 64 and kept that up until about 1.5 years ago.

It turns out that some of these things were actually good investments.  Obviously graphics cards and printers are not.  But the two monitors I bought in 2005 are still going strong.   Same with the surge protector: don’t have to worry about the current T-storm we are having.

In the end, I’m glad I no longer try to keep up on the latest hardware, but at the time I guess it was exciting.  It also seemed like things were changing faster back then.


Region-based tracking

I recently had to revisit some tracking from last year. In the process, I uncovered some other relevant works that were doing region-based tracking by essentially segmenting the image at the same time as registering a 3D object. The formulation is similar to Chan-Vese for image segmentation; however, instead of parameterizing the curve or region directly with a level set, the segmentation is parameterized by the 3D pose parameters of a known 3D object. The pose parameters are then found by gradient descent.

Surprisingly, the formulation is pretty simple. There are a few similar derivations:
1) PWP3D: Real-time segmentation and tracking of 3D objects, Prisacariu & Reid
2) S. Dambreville, R. Sandhu, A. Yezzi, and A. Tannenbaum. Robust 3D Pose Estimation and Efficient 2D Region-Based Segmentation from a 3D Shape Prior. In Proc. European Conf. on Computer Vision (ECCV), volume 5303, pages 169-182, 2008.
3) There is another one from 2007 by Schmaltz et al.

This is similar to aligning a shape to a known silhouette, but in this case the silhouette is estimated at the same time.
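Very roughly, the energy being minimized looks something like the following (my paraphrase of the general form, not the exact equation from either paper), where Phi(x; pose) is the signed-distance-like projection of the model under the current pose, H is a smoothed Heaviside, and P_f, P_b are foreground/background appearance models:

E(pose) = - \sum_{x} [ H(Phi(x; pose)) log P_f(I(x)) + (1 - H(Phi(x; pose))) log P_b(I(x)) ]

The gradient with respect to the pose is then a sum over pixels of (log P_f(I(x)) - log P_b(I(x))) delta(Phi(x; pose)) dPhi/dpose, which is why all of the useful signal concentrates near the projected contour.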

The versions are really similar, and I think the gradient in the two versions can be brought into agreement if you use the projected normal as an approximation to the gradient of the signed distance function (you then have to use the fact that = 0). This should actually benefit implementation 1) because you don’t need to compute the signed distance of the projected mesh.

I implemented version (2), in what took maybe 1 hour late at night, and then a couple more hours of fiddling and testing it the next day. Version 1) claims real-time; my implementation isn’t real-time, but is probably on the order of a frame or two per second. The gradient descent seems quite dependent on the step size parameter, which IMO needs to change based on the scene context (size of the object, contrast between foreground and background).

Here are some of the results. In most of the examples, I used a 3D model of a duck. The beginning of the video illustrates that the method can track with a poor starting point. In fact, the geometry is also inaccurate (it comes from shape-from-silhouette, and has some artifacts on the bottom). In spite of this, the tracking is still pretty solid, although it seems more sensitive to rotation (not sure if this is just due to the parameterization).

Here are 4 videos illustrating the tracking (mostly on the same object).  The last one is of a skinned geometry (probably could have gotten a better example, but it was late, and this was just for illustration anyhow).

http://www.youtube.com/watch?v=ffxXYGXEPOQ
http://www.youtube.com/watch?v=ygn5aY8L-wQ
http://www.youtube.com/watch?v=__3-QB_7jhM
http://www.youtube.com/watch?v=4lgemcZR87E


Linked-flow for long range correspondences

This last week, I thought I could use linked optic flow to obtain long range correspondences for part of my project. In fact, this simple method of linking pairwise flow works pretty well. The idea is to compute pairwise optic flow and link the tracks over time. Tracks that have too much forward-backward error are discarded (the method is similar to what is used in this paper, “Object segmentation by long term analysis of point trajectories”, Brox and Malik).

I didn’t end up using it for anything, but I wanted to at least put it up on the web. In my implementation I used a grid to initialize the tracks and only kept tracks that had pairwise frame motion greater than a threshold, that were backward-forward consistent (up to a threshold), and that were consistent for enough frames (increasing this threshold gives fewer tracks). New tracks can be added at any frame when there are no existing tracks close enough. The nice thing about this is that tracks can be added in texture-less regions. In the end, the results are similar to what you would expect from the “Particle Video” method.
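The forward-backward check at the heart of it is only a few lines (a sketch, assuming dense per-pixel flow fields of the same size):

#include <cmath>
#include <vector>

// Forward-backward consistency test used when linking pairwise flow (sketch).
// fwd maps frame t -> t+1 and bwd maps t+1 -> t; both are dense (u, v) fields.
struct Flow {
  int w, h;
  std::vector<float> u, v;  // size w*h each
  float U(float x, float y) const { return u[int(y) * w + int(x)]; }  // nearest-
  float V(float x, float y) const { return v[int(y) * w + int(x)]; }  // neighbor lookup
};

bool consistent(const Flow& fwd, const Flow& bwd, float x, float y, float tol) {
  const float x1 = x + fwd.U(x, y), y1 = y + fwd.V(x, y);        // follow the flow forward
  if (x1 < 0 || y1 < 0 || x1 >= fwd.w || y1 >= fwd.h) return false;
  const float x0 = x1 + bwd.U(x1, y1), y0 = y1 + bwd.V(x1, y1);  // and back again
  return std::hypot(x0 - x, y0 - y) < tol;  // a good track lands near where it started
}

A track is only extended to the next frame while this test passes; otherwise it is terminated and a new track may be started nearby.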

Here are a few screenshots from two scenes (rigid and non-rigid):

But it is better to look at the video:


Qt UiLoader runtime errors when cross-compiling

I was recently trying to build a windows version of the level-set shape generation (initial results; it is really a derivative of JShape), which is now titled MeldShape. The Mac version has come through several revisions now, and I figured if I was going to put anything up here, it might as well include everything (a document, some binaries, as well as a post).

Anyhow, I usually use a cross-compiler to build windows applications, and in the past I haven’t had any trouble getting a working build. However, this time was different.

I have working binaries for most of the libraries that MeldShape depends on, so building MeldShape was just a matter of updating these libraries and fixing any non-portable aspects within MeldShape. There were a few of these little things, like usleep and drand48, and checking validity of a pthread with (pthread_t*) != null (win32 pthread_t is a struct with a pointer inside). These things are obviously straightforward, and wouldn’t have even been an issue had I been thinking of a portable application in the first place. The real problem came within the Qt components.
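(As an aside, the shims I mean are only a few lines each; a sketch, with the exact guards depending on the toolchain:)

// Minimal portability shims of the kind mentioned above (sketch only).
#ifdef _WIN32
#include <windows.h>
#include <cstdlib>

// usleep takes microseconds; win32 Sleep takes milliseconds.
inline void usleep(unsigned int usec) { Sleep(usec / 1000); }

// drand48 returns a double in [0, 1).
inline double drand48() { return rand() / (RAND_MAX + 1.0); }
#endif

// And rather than testing a pthread_t against 0 (win32 pthread_t is a struct),
// keep a separate "thread started" flag alongside the handle.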

When cross-compiling on linux for win32 targets, it is relatively straightforward to set up Qt. You use wine to install the windows libraries of Qt (the mingw build) and use your linux qmake with the appropriate spec file. Typically once you get the program to link you are out of the woods. But with MeldShape, linking was fine but running always gave a symbol error in the Qt DLLs. It didn’t work in either Wine or windows XP.

This was super frustrating, as I don’t have a working setup in windows using MinGW to build Qt applications. And in my experience, building in MinGW is really slow compared to the cross-compiler, so I really didn’t want to have to set up my environment from scratch (building from source). So I suffered through this problem, trying to figure out why, on execution, windows was complaining about missing symbols in the DLLs (mostly in QtCore4.dll). I have seen similar problems when trying to run an executable with mismatched DLLs (especially between the mingw and msvc builds of Qt), so I figured it had to be something with the versions of the DLLs. I was using the same version as I had built with, so that was out.

I then tried an older version of Qt (since I had been using an older version in the past), and again no luck. With no other option, I started to strip my app to a barebones sample application to see if even that would work. And sure enough it was working fine (although it wasn’t referencing much else other than QApplication). The problem seemed to be something to do with one of the other libraries I was using.

I struggled with this for a while, and finally came up with the hypothesis that it was maybe due to loading parts of the UI with QUiLoader (from UiTools). After commenting out the few parts that use forms, it actually started to work. This was at the point when I was ready to say, “screw the windows build”. I mean, the application is pretty simple, and at this point it is not even worth the effort. Anyway, I’m sure I am using forms in my other applications, so I have no idea at this point why using forms was causing problems with Qt on windows. I decided to try QFormBuilder from the QtDesigner components instead. Luckily the API is pretty much the same, so almost no code (except for the declaration of the loader) had to change. Strangely enough, QFormBuilder worked fine.
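The change really was just the include and the declaration (a sketch; both classes expose an equivalent load(QIODevice*, QWidget*)):

#include <QtDesigner/QFormBuilder>  // was: #include <QtUiTools/QUiLoader>
#include <QFile>
#include <QWidget>

QWidget* loadForm(const QString& path, QWidget* parent) {
  QFile file(path);
  if (!file.open(QFile::ReadOnly)) return 0;
  QFormBuilder builder;             // was: QUiLoader builder;
  QWidget* widget = builder.load(&file, parent);
  file.close();
  return widget;
}

(The project also has to link against the QtDesigner library instead of QtUiTools.)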

I have no idea why QUiLoader was causing problems and QFormBuilder was not. I’m happy I found the problem, but at the same time I think the only reason I found it was due to luck. In the end it took almost 6 hours to find the problem and port the rest of the code…something I figured would take maybe 2 hours.

In the next little bit, I will try and upload the binaries and the technical document (as well as create a new project page for it)…all of the things that could have been done in the time it took to track down a nonsense bug.


1D peak detection with dynamic programming

Not too long ago, I was working on something where it was necessary to extract the correct locations of peaks in a 1D signal.  The 1D input signal comes from some noisy measurement process, so some local method (e.g., thresholding) may not always produce the best results.

In our case, the underlying generator of the signal came from a process with a strong prior model.  For simplicity, assume that the spacing between the peaks can be modeled as Gaussian with mean, mu, and standard deviation, sigma.  Also, assuming that the input signal is the probability of a peak (e.g., the signal is real valued, in [0, 1]), the problem can be formulated in a more global way.

For example, in a Bayes approach, the problem would be formulated as maximizing p(model | signal), where the model = {x_1, x_2, …, x_k} is the set of points where the peaks occur in the signal.  The MAP estimate is obtained by maximizing p(model | signal) ~= p(signal | model) p(model), or minimizing -log(p(signal | model)) - log(p(model)).  Assuming a simple Gaussian distribution on the spacing of the peaks, the negative log prior, -log(p(model)), is (up to a constant) \sum_{i=2}^N ((x_i - x_(i-1)) - mean)^2/(2 sigma^2).

For the negative log-likelihood term, we can use something as simple as -log(p(signal | model)) = \sum_{x} signal(x) * (1 - f(x; model)) + (1 - signal(x)) * f(x; model), where f(x; model) = 1 if x == x_j for some j, and 0 otherwise (so the cost is low when the chosen peaks line up with high values of the signal).

On first encounter with this problem, it seemed like dynamic programming would be ideal to solve it.  In order to use dynamic programming, we relax the unknowns x_i (which represent the ordered locations of our peaks) and allow them to take on any value in the input domain.  The dynamic programming solution is then straightforward.  In the first pass, the total cost matrix for all the peak locations is created.  For each variable x_i, we store, for each possible input location, the total cost for that location and all previous variables.  In order to compute the total cost for variable x_{i+1}, each possible input location finds the best combination of x_i’s total cost, the likelihood, and the prior cost between x_i and x_{i+1}.

The only difference is that at each peak location, the likelihood for the region after that peak location is not considered (until later) and the peak location is penalized for appearing before x_{i-1} (ensures ordering).

Once the total cost matrix, C, has been computed (e.g., an NxM matrix with N peak locations and an input signal of length M), the likelihood beyond x_i can be considered.   For each of the possible peak locations, we accumulate the likelihood beyond the candidate peak location (e.g., C(i, j) += sum(neg_log_likelihood(S(j+1:M)))).  If done this way, each row of the cost matrix now gives the total cost if we were to consider only i peaks in the input signal.  That is, we get the best locations of the peaks for all possible numbers of peaks.
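A compact version of that recurrence looks roughly like the following (in C++ here; the source linked below is octave). cost[i][j] is the best total cost with the (i+1)-th peak at location j, and the model selection at the end picks the number of peaks with the lowest total cost:

#include <limits>
#include <vector>

// Sketch of the dynamic program described above (not the linked octave code).
// signal[j] in [0,1] is the peak probability at location j; the prior says
// consecutive peaks are spaced mean +/- sigma apart.
std::vector<int> detectPeaks(const std::vector<float>& signal,
                             float mean, float sigma, int maxPeaks) {
  const int M = signal.size();
  const float INF = std::numeric_limits<float>::infinity();
  // Placing a peak at j costs (1 - 2*signal[j]) under the simple data term
  // above; the spacing prior between peaks at k < j is ((j-k) - mean)^2 / (2 sigma^2).
  auto unary = [&](int j) { return 1.0f - 2.0f * signal[j]; };
  auto spacing = [&](int gap) {
    const float d = gap - mean;
    return d * d / (2.0f * sigma * sigma);
  };
  std::vector<std::vector<float>> cost(maxPeaks, std::vector<float>(M, INF));
  std::vector<std::vector<int>> from(maxPeaks, std::vector<int>(M, -1));
  for (int j = 0; j < M; ++j) cost[0][j] = unary(j);
  for (int i = 1; i < maxPeaks; ++i)
    for (int j = 0; j < M; ++j)
      for (int k = 0; k < j; ++k) {  // previous peak strictly before j
        const float c = cost[i - 1][k] + spacing(j - k) + unary(j);
        if (c < cost[i][j]) { cost[i][j] = c; from[i][j] = k; }
      }
  // Model selection: the number of peaks (and last location) with lowest cost.
  int bestI = 0, bestJ = 0;
  float best = INF;
  for (int i = 0; i < maxPeaks; ++i)
    for (int j = 0; j < M; ++j)
      if (cost[i][j] < best) { best = cost[i][j]; bestI = i; bestJ = j; }
  std::vector<int> peaks;
  for (int i = bestI, j = bestJ; j >= 0; j = from[i][j], --i)
    peaks.insert(peaks.begin(), j);
  return peaks;
}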

In the following synthetic examples, the global dynamic programming method was used to extract the peaks in the input signal, as well as the number of peaks present.  The global method searched for up to 5 more peaks than were present in the ground truth.  For the top examples, the input peaks had a mean spacing of 14 and a standard deviation of 2.  The dynamic programming method used the ground-truth parameters from the underlying model during its prediction.

Simple example, where a non global method would work

Another simple example, with more peaks, number automatically selected

A more difficult example with more noise (4 peaks)

A more difficult example with more noise (10 peaks)

A harder example, notice that the last peak is detected correctly, even though the input signal is almost homogeneous near it.

A harder example where it would be harder to select the ideal threshold.

The success of the global approach depends on the prior model providing some information in the presence of noise. With a small standard deviation, significant noise can be overcome (notice the high peak at around 70 in the input is not a true peak in the ground truth signal).

When the prior model has a large standard deviation, the global method can not as easily resolve ambiguities. In this case, the wrong number of peaks is selected and the peak near time 50 is missed.

The above figures were generated in octave (dyprog1D source code).


Canadian getting a J1 visa

I wanted to share some of my experiences as a Canadian student obtaining J1 status.  Originally, I found the information on the state websites to be somewhat contradictory.   Maybe contradictory is the wrong word; it is just that the majority of the information is obviously directed to the international community (minus Canada).  There are some specifics for Canadians, but the sheer presence of the data directed at internationals makes it easy to become uncertain about what is actually required.

When crossing the border yesterday, I had all of the information that I thought was required (see the 5 points below). The Canadian-specific site (http://www.consular.canada.usembassy.gov/student_exchange_visa_usa.asp) claims you need:

Canadian Citizens do not need visas to study in the U.S. You do need to obtain an I-20 (or DS-2019) Certificate of Eligibility from the university that you plan to attend. At the time you receive the I-20 (or DS-2019) you will be registered with SEVIS, the student tracking system. You will be assigned a SEVIS number, and be required to pay a registration fee.

When you cross the border to study you will need to provide the Officer at the port of entry:

  • Proof of identity and citizenship (a Canadian passport for example)
  • The original I-20 (or DS-2019) certificate
  • Proof that you have paid your SEVIS fee
  • Proof that you have the funds to pay for the school that you plan to attend
  • Proof of your ties to Canada

After investing more than $500-600 in the process of paying the SEVIS fee and paying for insurance,  I wanted to make sure that this was adequate (i.e., no appointment was necessary at a consulate, nor were any extra forms required).  For more information, I called the pay-line to get more details;  I actually called twice, and both times they confirmed the above.  I was still a bit tense, up until crossing the border this morning.  After standing in the customs line, the first officer turned me back because I didn’t have an I-94 form filled out.  Luckily this is just like the customs sheet (available near customs).  After filling it out, I tried again.  The officer looked over my things, stamped my passport and I-94, and I was on my way.  The next customs officer did ping me into a separate holding area, where I was pong-ed immediately back out (as it was not necessary).  I still wanted to make double sure, so I asked the ping-ing officer if this stamp on the I-94 was my visa.  His reply: “Canadian citizens don’t get visas”.  I had heard this somewhere else, and it is confusing, but I think this is the equivalent of visa status.

So as far as  I know everything is all good.

More general information (exceptions).
http://www.consular.canada.usembassy.gov/exceptions.asp

Specifics:
http://www.consular.canada.usembassy.gov/usa_visa.asp#exchange


Total recall

About a month ago, I picked up some random books from the Wee Book Inn.  I decided to take one, Total Recall, with me on my flight yesterday.  When the airline announced a likely 4 hour delay due to landing gear problems, I decided to give it a go.   I know what you are thinking: Total Recall, with Arnold Schwarzenegger, was based on a book?  Well, no, this book is actually about memory performance–the full title is “Total Recall: How to Boost your Memory Power.”   As I said, when I purchased this book, it was a random buy;  I am not too worried about my memory yet.  Anyhow, I got hooked on this book.

It starts out with the types of blocks that affect memory: emotional, mechanical, and physical.  In this first part, Joan Minninger gives several real examples of people that have trouble remembering.  These examples have the same impact as those in Dale Carnegie’s seminal books, which is why I enjoyed reading it.  Take one of the examples for an emotional block, where a woman cannot remember her recent vacation.  She wants to impress her friends, but the reason she cannot recall her vacation is because her friends are better talkers than she is, and they don’t really want to listen to her stories (at least she feels this way).  There are plenty of interesting stories like this, and some of them include people with photographic memories and who experience synesthesia (like Daniel Tammet).

The book then has chapters on the kinds of memory, the three r’s (registration, retention, and retrieval), and theories of how the brain works.  The latter part of the book is about improving your memory.  Many of the things you probably already know about, like association, mnemonics, and taking in the information in different forms.  Some of these are specific to remembering faces/names, numbers, studying, repetition, etc.  The methods for remembering information from books and lectures were presented in a way that is similar to software design patterns.  The author presents several patterns of how reading material is often organized: problem pattern, opinion pattern, thesis pattern, information pattern, and instruction pattern.   Most of these are probably apparent if you thought about it long enough, but having just read a software design pattern book, I was amused at the similarity between how these patterns were presented in her writing and the software patterns.


Game Developer’s Open Source Handbook

I was recently thumbing through some books at the library, and came across the Game Developer’s Open Source Handbook by Steven Goodwin.  As a longtime Open Source user, I had to look into this book to open my eyes to some other projects.

The book has a pretty good intro to the origins of the free and open software movements, as well as notes on why and when it is appropriate to use such code in games.  There are also some summaries of open source licenses and notes on when components using different licenses can be mixed (and also what is required of you when you are shipping a title).

For the most part, however, I was more concerned with the tools that I hadn’t known about.  The book covers development in a GNU/Linux environment.  For graphics, there is info on some 2D (e.g., SDL) and 3D engines (CrystalSpace, Irrlicht, and Ogre).   I was interested to find out that there were libraries mentioned in there for interactive consoles (OGLCONSOLE) and font handling. I have a small wrapper library for rendering ttf fonts in GL using freetype2, but FTGL seems like a very worthy alternative.

There is a chapter on audio, something that I have not been too concerned with in the past.  I have barely used OpenAL (in my GT racer demo), and have written to the /dev/dsp files in Linux before (NES emulator).  I have also used FMOD for mp3 playback in an old bomberman implementation (I believe they have a non-commercial license).  The book details some tools for sound conversion, although I pretty much always rely on mplayer, ffmpeg, or gstreamer for video and sound needs.

For physics there are also several choices.  ODE is a reasonable solution (discussed in the book), which I have played around with before.  I think that Bullet and Box2D were probably too recent to include in the book.   The book mentions some of the other libraries useful for collision detection (e.g., OPCODE).

There are also several libraries listed for networking, including SDL_net, torque network library, and clanNetwork.  I have personally never used any of these, but I figure they are probably worth looking into (although this is one of the easier areas to write your own).

Scripting was something that I was actually a bit more interested in.  The book covers some of the details of using Lua, and possibly Guile (with Python, Java, and some other dynamically typed languages coming in as mentionables).  I was a bit disappointed that there wasn’t more detail in these sections, but I guess that is because it was something that I wanted to know more about.

There were a bunch of other useful utility libraries mixed in, including some for parsing xml (e.g., expat), and several libraries for GUI controls (CEGUI, SDLtk, GG, GUIChan, ParaGUI).  After taking a brief look at some of these, I ranked them in this order:  GG, GUIchan, paragui, SDLtk.    It was interesting to find out about the generic game tree library (GGTL), and internationalization with gettext (something that I haven’t used ever, but the book provided a good enough overview of its capabilities).

Then for tools and production, some of the well known apps were mentioned (e.g., Blender, gimp, audacity, ImageMagick, ffmpeg, mplayer).  Other tools included JPatch for modeling, and Kino and Cinepaint for film editing.

For the most part, the book brought my attention to a bunch of other middleware-like components that I either wasn’t aware of, or had forgotten about.  The above list isn’t exhaustive, and I’m sure there are new libraries for each of the components.


Some links

I was browsing google projects today, and I came across some things that I probably have seen before, but forgot.

Another boost feature that I just learnt about, that could come in pretty handy: http://www.boost.org/doc/libs/1_40_0/libs/conversion/lexical_cast.htm


Variational Displacement Map Recovery

Shortly after working on some variational scene flow (from a single moving camera), I thought it might be a good idea to implement the same ideas to reconstruct both a displacement map and a flow map on top of a base mesh.  The variational formulation for displacement map estimation is more or less the same.  I parameterized the displacement as displacement along the normal (something that we have done before), so the objective is to find the displacements on the mesh such that the image score is minimized (in this case, pairwise SSD scores), while having a regularization constraint over the displacements (and flow vectors) in the uv-coordinates.
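Roughly, the objective is something like the following (my shorthand, not the exact weighting or regularizer used), where X(u,v) is a point on the base mesh, n(u,v) its normal, d(u,v) the scalar displacement, and pi_i the projection into image i:

E(d) = \sum_{(i,j) pairs} \int ( I_i(pi_i(X + d n)) - I_j(pi_j(X + d n)) )^2 du dv + lambda \int |grad d|^2 du dv

The flow case just adds a per-texel 3D flow vector as extra unknowns with the same kind of regularization in the uv domain.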

I had implemented this idea, and barely tested it on anything.  This last week, I figured that I could use parts of the project to generate some data.  So I wanted to share my results.  Below is a sample of three input images from a synthetic sequence.  The images were lit from several lights to ensure the best conditions for the shape estimation (e.g., the SSD score wouldn’t get confused).  The results look pretty good. And they should.  This is a pretty good situation for stereo.

Input images for the displacement map estimation

Base mesh, recovered displaced mesh, and recovered displacement map

The idea of solving for flow required that there were multiple images of the object deforming over time.  Again, I tested this on a similar sequence, where now the object had some texture (to enable the flow recovery), and I also introduced some motion.  The idea is now to recover both the displacement map (that ensures stereo consistency at time t=0), and also the 3D flow map that warps this image forward in time (t > 0).  Ideally, there would also be some temporal consistency between flow maps at (t>0), but for now I simply solved for the displacement and flow simultaneously for pairs (t=0, t=1), (t=0, t=2), etc

In this case the input sequences look something like the sequence below:

Again, the reconstruction, for the most part, was okay.  There is one exception: the displaced meshes sometimes overlap/intersect, which means that they are not as useful in the application that I wanted to use them in (that is, without post-processing).  Notice that there is flow roughly in the regions of the eyes and near the mouth, which agrees with the input sequence.  The displacement looked similar to the non-flowed case.

The u, v, and w components of the flow for the last image.

The resulting recovered mesh appears beside the input rendering in the following video.  I could have probably chosen the regularization parameters better.  If the video doesn’t load, try this link: flowed_result.


Spectral regularization on computer vision problems?

I recently attended a talk by Robert Tibshirani, who mostly presented his work on the Lasso (and related developments). The talk was very interesting, but I found some of his most recent work on spectral regularization for matrix completion particularly interesting. There are several problems in vision that suffer from unknown data, and this tool seemed like it could help out. I quickly prototyped some solutions using his/their approach and it does seem to have benefits in these applications, although I’m not sure of the impact.

I have written a draft document that details the problems that I was thinking about: spectral-reg-vision.pdf

And some quick and dirty matlab scripts that were used in the testing: spectreg
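The core of that matrix completion approach, as I understand it, is just an iterated soft-thresholding of singular values. A sketch (my paraphrase in C++ with Eigen, not their code; the matlab scripts above are what I actually used):

#include <algorithm>
#include <Eigen/Dense>

// Soft-thresholded SVD matrix completion (a "soft-impute" style sketch).
// X holds the observed entries and mask is 1 where an entry is known.
Eigen::MatrixXd completeMatrix(const Eigen::MatrixXd& X,
                               const Eigen::MatrixXd& mask,
                               double lambda, int iters = 100) {
  Eigen::MatrixXd M = Eigen::MatrixXd::Zero(X.rows(), X.cols());
  for (int it = 0; it < iters; ++it) {
    // Keep the observed entries, fill the missing ones with the current estimate.
    Eigen::MatrixXd filled = (mask.array() > 0.5).select(X, M);
    // Shrink the singular values: this is the spectral (nuclear norm) regularization.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(
        filled, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd s = svd.singularValues();
    for (int i = 0; i < s.size(); ++i) s(i) = std::max(0.0, s(i) - lambda);
    M = svd.matrixU() * s.asDiagonal() * svd.matrixV().transpose();
  }
  return M;
}

The draft linked above describes the vision problems where I tried this out.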


Using GPC

The polygon growing is coming along. Found this nice tool for polygon clipping: GPC. It is a C library that is all in one file. Nice and lightweight, efficient, and it has a simple API (there are only 6 or so functions, only two of which you need to use). And it does polygon union, difference, XOR, and something else I think. I’ve been using it to convert the meldshape (level-set generated shapes) non-overlapping images (with spaces) into complete tilings of the plane:

Input (want to get rid of spaces)

Output (complete tiling, shapes do not overlap)

Maybe in the near future I will have the upload (it is a complement to JShape: the random shape generator). And also some details on how the GPC library was used to generate these.
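In the meantime, usage really is minimal. From memory (so treat the details as approximate), unioning two quads looks something like this:

#include <cstdio>
extern "C" {
#include "gpc.h"
}

// From-memory sketch of the GPC API: build two polygons, union them,
// and read back the resulting contours.
int main() {
  gpc_vertex a[] = {{0, 0}, {2, 0}, {2, 2}, {0, 2}};
  gpc_vertex b[] = {{1, 1}, {3, 1}, {3, 3}, {1, 3}};
  gpc_vertex_list la = {4, a}, lb = {4, b};
  gpc_polygon pa = {0, 0, 0}, pb = {0, 0, 0}, result;

  gpc_add_contour(&pa, &la, 0);  // 0 = not a hole
  gpc_add_contour(&pb, &lb, 0);
  gpc_polygon_clip(GPC_UNION, &pa, &pb, &result);  // also GPC_DIFF, GPC_INT, GPC_XOR

  for (int c = 0; c < result.num_contours; ++c)
    for (int v = 0; v < result.contour[c].num_vertices; ++v)
      printf("%g %g\n", result.contour[c].vertex[v].x, result.contour[c].vertex[v].y);

  gpc_free_polygon(&pa);
  gpc_free_polygon(&pb);
  gpc_free_polygon(&result);
  return 0;
}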


GPU image registration code

Last year, at about this time, I went through an itch to implement various optic-flow algorithms. One that was brought to my attention by a classmate was the grid-powered non-linear registration from the medical community:

http://www.mendeley.com/research/grid-powered-nonlinear-image-registration-with-locally-adaptive-regularization/

Shortly after implementing the method, I was driven to write a GPU optimized version. First, because I wanted to use Cuda. After I got started, I realized it was just as easy to implement the idea using shaders alone. I tested out using Gauss-Seidel to solve, on the GPU, the sparse system of equations needed for the diffusion component of the algorithm (they actually use additive operator splitting and a fast tri-diagonal solver in the paper, I believe). I remember not being all that impressed by the speedup that I attained (somewhat less than 2 times for this solver including read back, maybe 4 times without reading back). I ended up using an explicit filtering approach for the diffusion.

I wanted to post the code (it has some internal dependencies for creating windows and initializing shaders; if anyone is interested I can supply them): gpu-gauss-0.0.tar.gz

I didn’t actually compare the timings to the CPU implementation, but here is an example input sequence (same image perspectively warped):


Below is a cross-faded animation of the warped gifs. I didn’t really tweak the parameters (there are a couple for # iterations), but one-way flow took about 0.9 seconds on a geforce 8800 (including loading the images and saving, etc). This was run using the command: ./warp /tmp/w0.png /tmp/w1.png --seq=0 --its=20 --upd_its=20 --glob_its=40. There are a couple of artifacts around the boundary, but for the most part it looks pretty accurate. I have a couple other implementations of optic flow to post sometime.


Smooth depth proxies

Recently tested out an idea for using a view-dependent Laplacian smoothed depth for rendering. Given a set of 3D points and a camera viewpoint, the objective is to generate a dense surface for rendering. Could either do triangulation in 3D, in the image, or some other interpolation (or surface contouring).


This was a quick test of partitioning the image into bins and using the z-values of the projected points as constraints on a linear system with the Laplacian as smoothness. The result is a view-dependent depth that looks pretty good.
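A sketch of that linear system solved with a few Gauss-Seidel sweeps (not the code used for the videos; bins with projected points get a soft depth constraint, the rest are filled in by the smoothness term):

#include <vector>

// View-dependent depth proxy sketch. The image is split into a gw x gh grid of
// bins; hasData marks bins that received projected points, z holds their depth
// constraint, and d is the smoothed depth being solved for. Gauss-Seidel sweeps
// minimize  w * hasData * (d - z)^2 + lambda * sum over 4-neighbor edges (d_i - d_j)^2.
void smoothDepth(std::vector<float>& d, const std::vector<float>& z,
                 const std::vector<char>& hasData, int gw, int gh,
                 float w = 10.0f, float lambda = 1.0f, int sweeps = 200) {
  for (int it = 0; it < sweeps; ++it)
    for (int y = 0; y < gh; ++y)
      for (int x = 0; x < gw; ++x) {
        const int i = y * gw + x;
        float num = hasData[i] ? w * z[i] : 0.0f;
        float den = hasData[i] ? w : 0.0f;
        const int nx[4] = {x - 1, x + 1, x, x};
        const int ny[4] = {y, y, y - 1, y + 1};
        for (int k = 0; k < 4; ++k) {
          if (nx[k] < 0 || ny[k] < 0 || nx[k] >= gw || ny[k] >= gh) continue;
          num += lambda * d[ny[k] * gw + nx[k]];
          den += lambda;
        }
        if (den > 0) d[i] = num / den;
      }
}

The per-bin depths then become the vertices of the rendered proxy surface.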

Check out the videos below. The videos show screenshots of two alternatives used for view-dependent texturing in a predictive display setting. Basically, an operator controls a robot (which has a camera on it). There is some delay in the robot’s response to the operator’s commands (mostly due to the velocity and acceleration constraints on the real robot). The operator, however, is presented with real-time feedback of the expected position of the camera. This is done by using the most recent geometry and projective texture mapping it with the most recent image. The videos show the real robot view (from PTAM, in the foreground) and also the predicted view (that uses the geometric proxy) in the background. Notice the lag in the foreground image. The smooth surface has a wireframe overlay of the bins used to compute the surface. It works pretty well.

Currently, the constraints are obtained from the average depth, although it may be more appropriate to use the min depth for the constraints. A slightly more detailed document is available (although it is pretty coarse): lapproxy.pdf

Another improvement would be to introduce some constraints over time (this could ensure that the motion of the surface over time is slow).


Level set shape generation

Not much to say about this yet.  Trying to work on something for an artist.  This is a first attempt.  Starting from the same starting position, each of the other configurations has random stiffnesses, which means different images get generated.


Some movies:


Space-time texture used!

Again, we were recently working on a project involving a robot and a colleague’s (David Lovi’s) free space carving to acquire a coarse scene model. The model is fine as a proxy for view-dependent texturing (e.g., lab-flythrough):

But to generate a figure it is nice to have a single texture map. Since the geometry is coarse, a simple averaging is out of the question:

After some minor updates to my initial implementation (to take into account the fact that all images were inside the model), the space-time texture map (the idea, anyhow) seems to work pretty well. Remember that this assigns a single image to each triangle, but it does so in a manner that tries to assign neighboring triangles values that agree with one another while ensuring that the triangle projects to a large area in the chosen image.
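In other words, it is a labeling problem over triangles, roughly of the form (my shorthand, not an exact reproduction of the original formulation):

E(labels) = \sum_{t in triangles} D_t(l_t) + \sum_{(t, t') adjacent} V(l_t, l_{t'})

where the data term D_t(l_t) is small when triangle t projects to a large, well-observed area in image l_t, and the smoothness term V penalizes adjacent triangles choosing images that disagree along their shared edge.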

Of course the texture coordinates in the result are not the greatest. The texture was generated from some 60 keyframes. The vantage points below give you an idea of the quality of the texture:


Again, the model is coarse (obtained from key-points and carving), but the space-time texture map idea works pretty well for getting a single texture.

The data for the model was obtained with the prototype system (described in this video http://www.neilbirkbeck.com/wp/wp-content/uploads/2010/03/IROS_predisp.mp4)
