INTERVIEW by EVA RECINOS | PHOTOGRAPHY courtesy of KEVIN KARSCH
Reality has never been so shaky. With the advent of current technologies, you can copy and paste yourself into a photo with Angelina Jolie, fix your friend’s red, flash-stricken eyes, and easily morph your photos to seem like they’re from another century. But while the best-known photo-tweaking tool might be Photoshop, there are countless other projects underway making it easier to twist reality in a picture. And one of the brilliant thinkers behind their development is Kevin Karsch, a professed video game lover and part of the group developing a way to insert synthetic objects into any photograph. What’s so special about it? The objects interact with their surroundings in a new way — seamlessly, from lighting to shadows. Whether in advertising or real estate, Karsch’s work is seriously changing the way photos are altered.
Tell me a little about yourself.
I went to the University of Missouri, and ever since high school I wanted to make video games, so that’s how I got into computer science. Now I’m at the University of Illinois, and we’ve been working on this project for the past year or so.
What exactly is a synthetic object and a legacy photograph?
A synthetic object is a 3-D model, so it’s a representation of a real-world surface you can input into a computer and play around with. So when you watch a Pixar or DreamWorks movie, all the 3-D objects are the synthetic objects, and that’s what we are referring to. These are the objects you see inserted in our paper and in the video that’s online. When we talk about inserting objects, we mean we have a 3-D model of that object to be inserted. We’re not taking a clipping from one photograph and inserting it into another (as you might do with Photoshop), at least not yet.
By a legacy photograph, we mean a picture taken previously that you want to be able to insert things into. There are existing methods to do this kind of thing if you can go back and get measurements of the scene or multiple photos, but the novelty [of the project] is that you can do this without going back to the original scene and without additional scene measurements.
So you mentioned Photoshop… can you talk a little about how your project is different from Photoshop?
With Photoshop, you’re directly manipulating the pixels you’re working with. There’s a bunch of tools that help automate that process, but the underlying thing is that you’re manipulating the 2-D grid of pixels in front of you. The end result is a 2-D picture like ours, but instead of being able to interact with a 3-D scene, as in our method, you have to manipulate the 2-D pixels on the canvas. With our technique, you take a picture and from that picture you can interact with it as if it were a 3-D scene. The geometry, lighting, and perspective will be calculated automatically. In Photoshop, you’d have to do this without any aids, which makes it much more tedious and difficult to get right. It usually takes an expert and a great deal of time to get good results. Our goal is a system that allows novice users to achieve these complex edits, but much more quickly.
So how does the process begin? What types of files can the technology be used on?
All you need is an image and our software. You import the image and then you follow the instructions in the interface. There’s a method of interaction that requires you to mark the scene boundaries — for example, the floors, walls, ceiling, and maybe any objects — and also indicate light sources by clicking them in the picture. Then we automatically refine these annotations and produce a 3-D scene that we can insert objects into; this can be done in five or ten minutes. It’s not quite ready yet, but we’re working on an online demo so that everyone can try out our system.
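The workflow described above (import a picture, mark the floors, walls, and ceiling, click the light sources, then let the software refine those annotations into a 3-D scene) can be sketched as a small piece of code. This is a minimal illustrative sketch in Python: the function name, data layout, and validation logic are all assumptions made for the example, not the actual software.

```python
# Hypothetical sketch of the annotation step: the user supplies surface
# boundaries and light-source clicks, and we bundle them into a coarse
# scene description that an insertion system could later refine.
# None of these names come from the real tool; they are illustrative only.

def build_scene(image_size, boundary_polygons, light_clicks):
    """Turn user annotations into a coarse scene description.

    image_size        -- (width, height) of the legacy photograph in pixels
    boundary_polygons -- dict mapping surface names ("floor", "wall", ...)
                         to lists of (x, y) pixel vertices marked by the user
    light_clicks      -- list of (x, y) pixel positions the user clicked
                         to indicate visible light sources
    """
    w, h = image_size
    # Reject annotations that fall outside the photograph.
    for name, poly in boundary_polygons.items():
        for x, y in poly:
            if not (0 <= x < w and 0 <= y < h):
                raise ValueError(f"{name} vertex {(x, y)} is outside the image")
    return {
        "surfaces": {name: list(poly) for name, poly in boundary_polygons.items()},
        "lights": [{"pixel": (x, y)} for x, y in light_clicks],
    }

# A user marks one floor polygon and clicks one ceiling light.
scene = build_scene(
    (640, 480),
    {"floor": [(0, 300), (639, 300), (639, 479), (0, 479)]},
    [(320, 40)],
)
print(len(scene["lights"]))  # prints 1
```

In the real system these rough annotations are then refined automatically, and the geometry, lighting, and perspective of the scene are estimated from them; the sketch stops at the data-collection step.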
How did you first get involved in this project?
In undergrad, I started doing computer graphics and computer vision research. Computer graphics can mean a lot of things, but I’ve been working on making synthetic pictures look photorealistic. Computer vision, at a high level, is how to make a computer interpret pictures and videos the same way people do. This means being able to tell automatically how far away something is, or what material an object is made out of, or where the light is coming from — things like that. When I went to grad school, I told my advisors this was my area of interest. They were working on an idea they pitched to me, along the lines of “wouldn’t it be cool if you could tweak a picture as if you were in the scene?” This is the initial step towards that goal and we have a long way to go to realize the rest of it.
What kind of projects are you hoping people will use this technology for?
We’ve gotten a bunch of interest, even from places we didn’t expect. The obvious ones are for visual effects. Hollywood already does this kind of thing, but they use the processes I described before, where they go back to the scene and take measurements and insert objects realistically. For more expensive productions, they can go back and do this, but our method might be helpful in certain circumstances. For lower budget films that can’t do this or even legacy footage or pictures, our method could be used.
More interestingly, I think furniture companies or real estate agents could get a lot of use out of this: someone could take a picture of their home or a room in their home, insert realistic models into the picture, and have it look as if the furniture were already there. This would be a good way to redecorate your room quickly and without measuring anything.
Also, for historical documentaries, you might have things you want to insert for educational purposes into old footage.
How can Daily BR!NK readers contribute to your success?
The biggest one would be feedback. We’ll be releasing this demo soon, and hopefully all the readers can play with it. Feedback would be the best way for us to learn what kinds of things they envision this being useful for and what else they want to see from it. This will probably be the focus of my thesis research, and I have another two or three years to keep going in this direction, at least until the end of my PhD. If there are things they’d like to see, maybe we can put some work into that. If there’s a strong consensus for one feature or a certain request, that’ll definitely give it some priority.