This report provides an overview of a system designed for animating and calibrating 3D head models in compliance with the rules recently standardized by MPEG-4. Starting from a geometrical representation of a 3D model and the associated configuration file, the system can automatically build the animation rules, which translate the stream of Facial Animation Parameters (FAPs) into the appropriate facial movements. The generic face can also be reshaped according to a subset of the Facial Definition Parameters (FDPs) in order to match the appearance of a specific face. It is possible to create a realistic MPEG-4 head model by texture mapping the face (nose, mouth, chin, cheeks and ears), the hair, the eyes, and the environment. The Face Styler is an easy-to-use tool that can create such realistic heads.
It imports images of a person's head acquired from a variety of sources, such as digital cameras, scanned books and photographs, or public archives on the Internet. Using these images, an MPEG-4-compliant head described by FDPs can be interactively created in 15 minutes. The Face Styler generates data that can be directly imported by the Facial Animation Engine (FAE), which is an implementation of an MPEG-4 facial animation system. The FAE cannot display realistic hair or an environment by itself. However, a loosely coupled integration layer provides this functionality without modifying the internals of the FAE. The results, presented at the end of the paper, demonstrate the effectiveness of the implementation in both the animation and the calibration process. A number of applications are foreseen in multimedia products, such as virtual kiosks, virtual text readers and games.
Realistic facial animation is one of the most fundamental problems in computer graphics. The applications of facial animation are very diverse, and range from the purely recreational to the life-enhancing. Perhaps the best-known application of facial animation is in the film industry. Film production systems are traditionally based on keyframe animation, with many parameters that influence the appearance of the face. Another application of facial animation is computer games, where titles such as Full Throttle and The Curse of Monkey Island used facial animation for their 2D cartoon characters. This trend continued into 3D titles, where games such as Tomb Raider and Grim Fandango used facial animation as the key tool to communicate the story to the player. Facial animation is also being applied in medical fields, for example in planning facial surgery and previewing the effects of dental surgery.
However, these pre-operative applications would require a very accurate anatomical model of the patient's face. This is not very practical, because faces vary enormously from one person to the next, and acquiring face data can be tedious. Facial animation can also be used as a teaching aid. Talking Tiles is an application of Hyper Animation that aids with the teaching of language skills. Facial animation could also be used to teach the hearing impaired. A face model could demonstrate how certain words are pronounced, while cut-away views show where the tongue needs to be positioned to create the desired sounds. The important point to remember is that all these varied applications (film, computer games, medicine and teaching) use facial animation as a communication medium. That is, they utilise a computer simulation of a human face in order to reach the audience more convincingly.
FACIAL ANIMATION IN MPEG-4
MPEG-4 goes beyond the conventional concept of the audio/visual scene being composed of a sequence of rectangular video frames and an associated audio track. Instead, the scene is composed of a set of Audio-Visual Objects (AVOs). The sender encodes these AVOs into elementary streams, and transmits them via a single multiplexed communications channel. The decoder is responsible for extracting the elementary streams and compositing the decoded AVOs to form the scene. MPEG-4 specifies two AVOs that represent synthetic faces: a Simple Face Object and a Calibration Face Object.
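The sender and decoder roles described above can be illustrated with a minimal sketch. It is not the MPEG-4 systems layer: the `(stream_id, payload)` packet framing, the round-robin interleaving, and the names `multiplex` and `demultiplex` are assumptions made only for illustration.

```python
from collections import defaultdict


def multiplex(streams):
    """Interleave several elementary streams into one channel.

    `streams` maps a stream id to a list of payloads. Each channel packet
    is a (stream_id, payload) tuple; this framing is hypothetical.
    """
    channel = []
    longest = max((len(payloads) for payloads in streams.values()), default=0)
    for i in range(longest):
        for stream_id, payloads in streams.items():
            if i < len(payloads):
                channel.append((stream_id, payloads[i]))
    return channel


def demultiplex(channel):
    """Recover the elementary streams from the multiplexed channel."""
    streams = defaultdict(list)
    for stream_id, payload in channel:
        streams[stream_id].append(payload)
    return dict(streams)


# Sender side: one face stream and one audio stream (placeholder payloads).
sent = {"face": ["fap0", "fap1", "fap2"], "audio": ["a0", "a1"]}
channel = multiplex(sent)

# Decoder side: extract the elementary streams before compositing the scene.
received = demultiplex(channel)
```

Because each packet carries its stream identifier, the decoder can rebuild every elementary stream in order, regardless of how the sender interleaved them.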
The following parameters are used by these objects:
· FDP (Facial Definition Parameter): FDPs are responsible for defining the appearance of the face. The sender simply transmits a series of FDPs that describe the intended face. The receiver has its own generic face model, which is then deformed using the FDPs. This recreates an approximation of the intended face at the receiver's side. This stage is called the calibration of the facial model. Calibration is typically only performed at the start of a new animation session.
· FAP (Facial Animation Parameter): The FAPs describe the movements of the face. They can either describe low-level animation (displacing single points on the face), or high-level animation (reproducing facial expressions). Unlike FDPs, the FAP stream is a continuous transmission.
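The division of labour between the two parameter types can be sketched as follows: FDPs are applied once to calibrate the receiver's generic model, after which each frame of the FAP stream poses the calibrated face. This is a deliberately reduced illustration, not the FAE's method. The `FaceModel` class, the feature point names, and the `(axis, displacement)` encoding of a low-level FAP are all hypothetical, and a real calibration deforms the whole mesh rather than only its feature points.

```python
from dataclasses import dataclass, field


@dataclass
class FaceModel:
    """A generic face reduced to named feature points (x, y, z)."""
    neutral: dict = field(default_factory=dict)

    def calibrate(self, fdps):
        """Move known feature points to the positions carried by the FDPs."""
        for name, position in fdps.items():
            if name in self.neutral:
                self.neutral[name] = tuple(position)

    def animate(self, fap_frame):
        """Return the posed feature points for one frame of low-level FAPs.

        Each entry maps a feature point to (axis, displacement); the
        displacement is applied to the calibrated neutral position.
        """
        posed = dict(self.neutral)
        for name, (axis, displacement) in fap_frame.items():
            point = list(posed[name])
            point[axis] += displacement
            posed[name] = tuple(point)
        return posed


# Receiver's generic model (illustrative coordinates).
model = FaceModel({"chin": (0.0, -1.0, 0.0), "nose_tip": (0.0, 0.0, 0.5)})

# Calibration: performed once, at the start of the session.
model.calibrate({"chin": (0.0, -1.5, 0.25)})

# Animation: a continuous stream of frames, here lowering the chin along y.
fap_stream = [{"chin": (1, -0.25)}, {"chin": (1, 0.0)}]
frames = [model.animate(frame) for frame in fap_stream]
```

In this sketch `animate` never modifies the neutral face, so every frame is posed from the same calibrated starting point and displacements do not accumulate from one frame to the next.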
An object profile describes the syntax and the decoding tools for a given object. An MPEG-4 decoder compliant with a given object profile must necessarily be able to decode and use all the syntactic and semantic information included in that profile. This mechanism allows one to classify the MPEG-4 terminals by grouping object profiles into composition profiles that define the terminals’ performance.
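The compliance rule stated above amounts to a subset test, sketched below. The tool names and the sets assigned to each object profile are placeholders invented for this example; they are not taken from the MPEG-4 specification. Only the rule comes from the text: a terminal must support everything an object profile includes, and a composition profile groups several object profiles.

```python
# Hypothetical tool sets for the two face object profiles.
OBJECT_PROFILES = {
    "simple_face": {"fap_decoding"},
    "calibration_face": {"fap_decoding", "fdp_decoding"},
}


def supports_object_profile(terminal_tools, profile_name):
    """A terminal complies only if it implements every tool of the profile."""
    return OBJECT_PROFILES[profile_name] <= set(terminal_tools)


def supports_composition_profile(terminal_tools, object_profile_names):
    """A composition profile is treated here as a group of object profiles."""
    return all(
        supports_object_profile(terminal_tools, name)
        for name in object_profile_names
    )


basic_terminal = {"fap_decoding"}
full_terminal = {"fap_decoding", "fdp_decoding"}

basic_ok = supports_composition_profile(basic_terminal, ["simple_face"])
basic_full = supports_composition_profile(
    basic_terminal, ["simple_face", "calibration_face"]
)
full_ok = supports_composition_profile(
    full_terminal, ["simple_face", "calibration_face"]
)
```

Under these assumed tool sets, a terminal that only decodes FAPs satisfies the first composition profile but not the second, which is how the profile mechanism separates terminals by capability.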