Paper-Learned-Motion-Matching

Keywords: Neural Network, Generative Models, Memory, 2020

Learned Motion Matching, SIGGRAPH 2020

DANIEL HOLDEN, Ubisoft La Forge, Ubisoft, Canada

OUSSAMA KANOUN, Ubisoft La Forge, Ubisoft, Canada

MAKSYM PEREPICHKA, Concordia University, Canada

TIBERIU POPA, Concordia University, Canada

Code: https://github.com/orangeduck/Motion-Matching

Workflow

Overview

Problem

The primary limitation of Motion Matching is that its memory usage (and, to some degree, its runtime performance) scales linearly with the amount of data provided and the number of features to be matched.

Inspiration

Recent methods have shown that neural-network-based models can be effectively applied to generate realistic motion in a number of difficult cases including navigation over rough terrain [Holden et al. 2017], quadruped motion [Zhang et al. 2018], and interactions with the environment or other characters [Lee et al. 2018; Starke et al. 2019].

Yet such models are often difficult to control, have unpredictable behaviour and long training times, and can produce animation of lower quality than the original training set.

Methods

Basic Motion Matching

  1. A feature vector $\pmb{x}$, which describes the features we wish to match at each frame.
    $$
    \pmb{x} = \lbrace \pmb{t^t}, \pmb{t^d}, \pmb{f^t}, \pmb{\dot{f}^t}, \pmb{\dot{h}^t} \rbrace \in R^{27}
    $$

$\pmb{t^t} \in R^6$, the 2D future trajectory positions projected onto the ground, 20, 40, and 60 frames in the future (at 60 Hz), local to the character.

$\pmb{t^d} \in R^6$, the 2D future trajectory facing directions 20, 40, and 60 frames in the future, local to the character.

$\pmb{f^t} \in R^6$, the two foot joint positions, local to the character.

$\pmb{\dot{f}^t} \in R^6$, the two foot joint velocities, local to the character.

$\pmb{\dot{h}^t} \in R^3$, the hip joint velocity, local to the character.

```cpp
void database_build_matching_features(
    database& db,
    const float feature_weight_foot_position,
    const float feature_weight_foot_velocity,
    const float feature_weight_hip_velocity,
    const float feature_weight_trajectory_positions,
    const float feature_weight_trajectory_directions)
{
    int nfeatures =
        3 + // Left Foot Position
        3 + // Right Foot Position
        3 + // Left Foot Velocity
        3 + // Right Foot Velocity
        3 + // Hip Velocity
        6 + // Trajectory Positions 2D
        6 ; // Trajectory Directions 2D

    db.features.resize(db.nframes(), nfeatures);
    db.features_offset.resize(nfeatures);
    db.features_scale.resize(nfeatures);

    int offset = 0;
    compute_bone_position_feature(db, offset, Bone_LeftFoot, feature_weight_foot_position);
    compute_bone_position_feature(db, offset, Bone_RightFoot, feature_weight_foot_position);
    compute_bone_velocity_feature(db, offset, Bone_LeftFoot, feature_weight_foot_velocity);
    compute_bone_velocity_feature(db, offset, Bone_RightFoot, feature_weight_foot_velocity);
    compute_bone_velocity_feature(db, offset, Bone_Hips, feature_weight_hip_velocity);
    compute_trajectory_position_feature(db, offset, feature_weight_trajectory_positions);
    compute_trajectory_direction_feature(db, offset, feature_weight_trajectory_directions);

    assert(offset == nfeatures);

    database_build_bounds(db);
}
```

Normalizing the features: subtract the mean of each feature dimension, then scale each feature group (e.g. the left foot position) by its average standard deviation divided by a user-specified weight, so that more important features contribute more to the matching distance.

```cpp
// Z-score normalization: (x - mu) / std
void normalize_feature(
    slice2d<float> features,
    slice1d<float> features_offset,
    slice1d<float> features_scale,
    const int offset,
    const int size,
    const float weight = 1.0f)
{
    // First compute what is essentially the mean
    // value for each feature dimension
    for (int j = 0; j < size; j++)
    {
        features_offset(offset + j) = 0.0f;
    }

    for (int i = 0; i < features.rows; i++)
    {
        for (int j = 0; j < size; j++)
        {
            features_offset(offset + j) += features(i, offset + j) / features.rows;
        }
    }

    // Now compute the variance of each feature dimension
    array1d<float> vars(size);
    vars.zero();

    for (int i = 0; i < features.rows; i++)
    {
        for (int j = 0; j < size; j++)
        {
            vars(j) += squaref(features(i, offset + j) - features_offset(offset + j)) / features.rows;
        }
    }

    // We compute the overall std of the feature as the average
    // std across all dimensions
    float std = 0.0f;
    for (int j = 0; j < size; j++)
    {
        std += sqrtf(vars(j)) / size;
    }

    // Features with no variation can have zero std which is
    // almost always a bug.
    assert(std > 0.0);

    // The scale of a feature is just the std divided by the weight
    for (int j = 0; j < size; j++)
    {
        features_scale(offset + j) = std / weight;
    }

    // Using the offset and scale we can then normalize the features
    for (int i = 0; i < features.rows; i++)
    {
        for (int j = 0; j < size; j++)
        {
            features(i, offset + j) = (features(i, offset + j) - features_offset(offset + j)) / features_scale(offset + j);
        }
    }
}
```

  2. A pose vector $\pmb{y}$, which contains all the pose information for a single frame of animation.

$$
\pmb{y} = \lbrace \pmb{y^t}, \pmb{y^r}, \pmb{\dot{y}^t}, \pmb{\dot{y}^r}, \pmb{\dot{r}^t}, \pmb{\dot{r}^r}, \pmb{o^{\ast}} \rbrace
$$

$\pmb{y^t}, \pmb{y^r}$ are the joint local translations and rotations, local to the character forward facing direction.

$\pmb{\dot{y}^t}, \pmb{\dot{y}^r}$ are the joint local translational and rotational velocities, local to the character forward facing direction.

$\pmb{\dot{r}^t}, \pmb{\dot{r}^r}$ are the character root translational and rotational velocities, local to the character forward facing direction.

$\pmb{o^{\ast}}$ are any other additional task-specific outputs, such as foot contact information, the position or trajectory of other characters in the scene, or the future positions of some joints of the character.
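
For intuition, the per-frame pose data can be thought of as a plain struct of joint-local quantities. The sketch below is only illustrative; the field names and the stand-in `vec3`/`quat` types are assumptions, not the layout used in the paper or repository.

```cpp
#include <vector>

// Minimal stand-in value types (the repository has its own vec3/quat).
struct vec3 { float x, y, z; };
struct quat { float w, x, y, z; };

// Hypothetical per-frame pose record mirroring the components of y.
struct Pose
{
    std::vector<vec3>  joint_positions;           // y^t: joint local translations
    std::vector<quat>  joint_rotations;           // y^r: joint local rotations
    std::vector<vec3>  joint_velocities;          // translational velocities of the joints
    std::vector<vec3>  joint_angular_velocities;  // rotational velocities of the joints
    vec3               root_velocity;             // root translational velocity
    vec3               root_angular_velocity;     // root rotational velocity
    std::vector<float> extras;                    // o*: other task specific outputs (e.g. contacts)
};
```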

  3. Matching Database: $\pmb{X} = [\pmb{x_0}, \pmb{x_1}, \cdots, \pmb{x_{n-1}}]$, built by stacking the feature vector of every frame.

Animation Database: $\pmb{Y} = [\pmb{y_0}, \pmb{y_1}, \cdots, \pmb{y_{n-1}}]$, built by stacking the corresponding pose vectors.
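
Both databases are indexed by the same frame number, so once a matching frame $k$ is found, reading the corresponding pose is a constant-time row lookup. Below is a minimal sketch of this layout, assuming simple row-major 2D storage; the struct and member names are illustrative, not the repository's types.

```cpp
#include <vector>
#include <cassert>

// Hypothetical row-major storage: one row per animation frame.
struct FrameDatabase
{
    int nframes = 0;
    int nfeatures = 0;       // e.g. 27 matching features per frame
    int pose_size = 0;       // size of the flattened pose vector y
    std::vector<float> X;    // Matching Database, nframes * nfeatures
    std::vector<float> Y;    // Animation Database, nframes * pose_size

    // Row k of X: the feature vector x_k used for matching.
    const float* features(int k) const { assert(k < nframes); return &X[k * nfeatures]; }

    // Row k of Y: the full pose y_k played back once frame k is matched.
    const float* pose(int k) const { assert(k < nframes); return &Y[k * pose_size]; }
};
```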

  4. At runtime, every $N$ frames, or when the user input changes significantly, a query vector $\pmb{\hat{x}}$ is constructed containing the desired feature values. To control the features corresponding to the future trajectory position and direction, we use a spring-damper based system which generates a desired velocity and direction from the current state of the joystick.
```cpp
/*
 * Convert the gamepad stick direction into a desired target velocity in the world space,
 * allowing for separate speeds for forward, backward, and sideways movement
 */
vec3 desired_velocity_update(
    const vec3 gamepadstick_left,
    const float camera_azimuth,
    const quat simulation_rotation,
    const float fwrd_speed,
    const float side_speed,
    const float back_speed)
{
    // Find stick position in world space by rotating using camera azimuth
    vec3 global_stick_direction = quat_mul_vec3(
        quat_from_angle_axis(camera_azimuth, vec3(0, 1, 0)), gamepadstick_left);

    // Find stick position local to current facing direction
    vec3 local_stick_direction = quat_inv_mul_vec3(
        simulation_rotation, global_stick_direction);

    // Scale stick by forward, sideways and backwards speeds
    vec3 local_desired_velocity = local_stick_direction.z > 0.0 ?
        vec3(side_speed, 0.0f, fwrd_speed) * local_stick_direction :
        vec3(side_speed, 0.0f, back_speed) * local_stick_direction;

    // Re-orientate into the world space
    return quat_mul_vec3(simulation_rotation, local_desired_velocity);
}

quat desired_rotation_update(
    const quat desired_rotation,
    const vec3 gamepadstick_left,
    const vec3 gamepadstick_right,
    const float camera_azimuth,
    const bool desired_strafe,
    const vec3 desired_velocity)
{
    quat desired_rotation_curr = desired_rotation;

    // If strafe is active then desired direction is coming from right
    // stick as long as that stick is being used, otherwise we assume
    // forward facing
    if (desired_strafe)
    {
        vec3 desired_direction = quat_mul_vec3(quat_from_angle_axis(camera_azimuth, vec3(0, 1, 0)), vec3(0, 0, -1));

        if (length(gamepadstick_right) > 0.01f)
        {
            desired_direction = quat_mul_vec3(quat_from_angle_axis(camera_azimuth, vec3(0, 1, 0)), normalize(gamepadstick_right));
        }

        return quat_from_angle_axis(atan2f(desired_direction.x, desired_direction.z), vec3(0, 1, 0));
    }

    // If strafe is not active the desired direction comes from the left
    // stick as long as that stick is being used
    else if (length(gamepadstick_left) > 0.01f)
    {
        vec3 desired_direction = normalize(desired_velocity);
        return quat_from_angle_axis(atan2f(desired_direction.x, desired_direction.z), vec3(0, 1, 0));
    }

    // Otherwise desired direction remains the same
    else
    {
        return desired_rotation_curr;
    }
}
```
  5. Find the entry in the Matching Database which minimizes the squared Euclidean distance to the query vector.
    $$
    k^{\ast} = \underset{k}{\operatorname{argmin}} \; ||\pmb{\hat{x}} - \pmb{x_k}||^2
    $$
```cpp
// Motion Matching search function essentially consists
// of comparing every feature vector in the database,
// against the query feature vector, first checking the
// query distance to the axis aligned bounding boxes used
// for the acceleration structure.
void motion_matching_search(
    int& __restrict__ best_index,
    float& __restrict__ best_cost,
    const slice1d<int> range_starts,
    const slice1d<int> range_stops,
    const slice2d<float> features,
    const slice1d<float> features_offset,
    const slice1d<float> features_scale,
    const slice2d<float> bound_sm_min,
    const slice2d<float> bound_sm_max,
    const slice2d<float> bound_lr_min,
    const slice2d<float> bound_lr_max,
    const slice1d<float> query_normalized,
    const float transition_cost,
    const int ignore_range_end,
    const int ignore_surrounding)
{
    int nfeatures = query_normalized.size;
    int nranges = range_starts.size;

    int curr_index = best_index;

    // Find cost for current frame
    if (best_index != -1)
    {
        best_cost = 0.0;
        for (int i = 0; i < nfeatures; i++)
        {
            best_cost += squaref(query_normalized(i) - features(best_index, i));
        }
    }

    float curr_cost = 0.0f;

    // Search rest of database
    for (int r = 0; r < nranges; r++)
    {
        // Exclude end of ranges from search
        int i = range_starts(r);
        int range_end = range_stops(r) - ignore_range_end;

        while (i < range_end)
        {
            // Find index of current and next large box
            int i_lr = i / BOUND_LR_SIZE;
            int i_lr_next = (i_lr + 1) * BOUND_LR_SIZE;

            // Find distance to box
            curr_cost = transition_cost;
            for (int j = 0; j < nfeatures; j++)
            {
                curr_cost += squaref(query_normalized(j) - clampf(query_normalized(j),
                    bound_lr_min(i_lr, j), bound_lr_max(i_lr, j)));

                if (curr_cost >= best_cost)
                {
                    break;
                }
            }

            // If distance is greater than current best jump to next box
            if (curr_cost >= best_cost)
            {
                i = i_lr_next;
                continue;
            }

            // Check against small box
            while (i < i_lr_next && i < range_end)
            {
                // Find index of current and next small box
                int i_sm = i / BOUND_SM_SIZE;
                int i_sm_next = (i_sm + 1) * BOUND_SM_SIZE;

                // Find distance to box
                curr_cost = transition_cost;
                for (int j = 0; j < nfeatures; j++)
                {
                    curr_cost += squaref(query_normalized(j) - clampf(query_normalized(j),
                        bound_sm_min(i_sm, j), bound_sm_max(i_sm, j)));

                    if (curr_cost >= best_cost)
                    {
                        break;
                    }
                }

                // If distance is greater than current best jump to next box
                if (curr_cost >= best_cost)
                {
                    i = i_sm_next;
                    continue;
                }

                // Search inside small box
                while (i < i_sm_next && i < range_end)
                {
                    // Skip surrounding frames
                    if (curr_index != -1 && abs(i - curr_index) < ignore_surrounding)
                    {
                        i++;
                        continue;
                    }

                    // Check against each frame inside small box
                    curr_cost = transition_cost;
                    for (int j = 0; j < nfeatures; j++)
                    {
                        curr_cost += squaref(query_normalized(j) - features(i, j));
                        if (curr_cost >= best_cost)
                        {
                            break;
                        }
                    }

                    // If cost is lower than current best then update best
                    if (curr_cost < best_cost)
                    {
                        best_index = i;
                        best_cost = curr_cost;
                    }

                    i++;
                }
            }
        }
    }
}
```
  6. Once found, if the nearest frame $k^{\ast}$ is not the current frame $i$, animation playback continues from that point, $i := k^{\ast}$, and a transition is inserted using inertialization blending [Bollo 2016, 2017].
```cpp
void inertialize_pose_transition(
    slice1d<vec3> bone_offset_positions,
    slice1d<vec3> bone_offset_velocities,
    slice1d<quat> bone_offset_rotations,
    slice1d<vec3> bone_offset_angular_velocities,
    vec3& transition_src_position,
    quat& transition_src_rotation,
    vec3& transition_dst_position,
    quat& transition_dst_rotation,
    const vec3 root_position,
    const vec3 root_velocity,
    const quat root_rotation,
    const vec3 root_angular_velocity,
    const slice1d<vec3> bone_src_positions,
    const slice1d<vec3> bone_src_velocities,
    const slice1d<quat> bone_src_rotations,
    const slice1d<vec3> bone_src_angular_velocities,
    const slice1d<vec3> bone_dst_positions,
    const slice1d<vec3> bone_dst_velocities,
    const slice1d<quat> bone_dst_rotations,
    const slice1d<vec3> bone_dst_angular_velocities)
{
    // First we record the root position and rotation
    // in the animation data for the source and destination
    // animation
    transition_dst_position = root_position;
    transition_dst_rotation = root_rotation;
    transition_src_position = bone_dst_positions(0);
    transition_src_rotation = bone_dst_rotations(0);

    // We then find the velocities so we can transition the
    // root inertializers
    vec3 world_space_dst_velocity = quat_mul_vec3(transition_dst_rotation,
        quat_inv_mul_vec3(transition_src_rotation, bone_dst_velocities(0)));

    vec3 world_space_dst_angular_velocity = quat_mul_vec3(transition_dst_rotation,
        quat_inv_mul_vec3(transition_src_rotation, bone_dst_angular_velocities(0)));

    // Transition inertializers recording the offsets for
    // the root joint
    inertialize_transition(
        bone_offset_positions(0),
        bone_offset_velocities(0),
        root_position,
        root_velocity,
        root_position,
        world_space_dst_velocity);

    inertialize_transition(
        bone_offset_rotations(0),
        bone_offset_angular_velocities(0),
        root_rotation,
        root_angular_velocity,
        root_rotation,
        world_space_dst_angular_velocity);

    // Transition all the inertializers for each other bone
    for (int i = 1; i < bone_offset_positions.size; i++)
    {
        inertialize_transition(
            bone_offset_positions(i),
            bone_offset_velocities(i),
            bone_src_positions(i),
            bone_src_velocities(i),
            bone_dst_positions(i),
            bone_dst_velocities(i));

        inertialize_transition(
            bone_offset_rotations(i),
            bone_offset_angular_velocities(i),
            bone_src_rotations(i),
            bone_src_angular_velocities(i),
            bone_dst_rotations(i),
            bone_dst_angular_velocities(i));
    }
}
```
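
The code above only records the offsets at the moment of transition; on each subsequent frame those offsets are decayed toward zero and added on top of the newly playing animation, which is what hides the discontinuity. Below is a minimal sketch of that idea for a single scalar channel using a simple exponential decay; the repository uses spring-damper based decay instead, and the function and parameter names here are illustrative only.

```cpp
#include <cmath>

// Minimal illustration of inertialization-style blending for one channel:
// at the transition we record the difference between the old and new pose,
// then decay that offset smoothly toward zero and add it back each frame.
float blend_channel(
    const float new_animation_value,  // value from the newly matched animation
    float& offset,                    // recorded at transition: old value - new value
    const float halflife,             // how quickly the discontinuity dies away
    const float dt)                   // frame delta time in seconds
{
    // Exponential decay of the offset, parameterized by a halflife
    offset *= expf(-(0.69314718f / halflife) * dt);

    // The displayed value is the new animation plus the decaying offset
    return new_animation_value + offset;
}
```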

Learned Motion Matching

The Motion Matching algorithm consists of three key stages:

  1. Projection: where a nearest neighbor search is used to find the feature vector in the Matching Database which best matches the query vector.

  2. Stepping: where we advance the index in the Matching Database forward to the next frame.

  3. Decompression: where we look up the associated pose in the Animation Database which corresponds to our current index in the Matching Database.
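
Learned Motion Matching replaces each of these three stages with a small neural network, so runtime memory depends on the network weights rather than on the amount of animation data. Below is a minimal sketch of the resulting runtime loop; `project`, `step`, and `decompress` are hypothetical placeholders standing in for evaluations of the three networks described next, not the repository's actual API.

```cpp
#include <vector>

// Hypothetical state carried between frames: the current matching
// features x and the extra learned latent features z.
struct LmmState
{
    std::vector<float> features;  // x
    std::vector<float> latent;    // z
};

// Placeholder declarations for the three network evaluations.
void project(std::vector<float>& x, std::vector<float>& z, const std::vector<float>& query);
void step(std::vector<float>& x, std::vector<float>& z);
void decompress(std::vector<float>& pose, const std::vector<float>& x, const std::vector<float>& z);

// One frame of the (sketched) Learned Motion Matching update.
void lmm_update(
    LmmState& state,
    std::vector<float>& pose,          // output pose y for this frame
    const std::vector<float>& query,   // query vector built from user input
    const bool search_this_frame)      // true every N frames or on input change
{
    if (search_this_frame)
    {
        // Projection: map the query directly to the best matching (x, z),
        // replacing the nearest neighbor search over the Matching Database.
        project(state.features, state.latent, query);
    }
    else
    {
        // Stepping: advance (x, z) one frame forward in time,
        // replacing incrementing the frame index.
        step(state.features, state.latent);
    }

    // Decompression: map (x, z) to the full pose y,
    // replacing the lookup into the Animation Database.
    decompress(pose, state.features, state.latent);
}
```

As in basic Motion Matching, a transition blend such as inertialization can still be applied whenever the Projection step causes a jump in the features.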

Decompressor

The Decompressor network replaces the look-up into the Animation Database: it takes the matching features $\pmb{x}$ together with additional learned latent features $\pmb{z}$ and outputs the full pose vector $\pmb{y}$.

Stepper

The Stepper network replaces advancing the index in the Matching Database: given the current features $[\pmb{x}, \pmb{z}]$ it predicts their values at the next frame, so playback can continue without reading from the database.

Projector

The Projector network replaces the nearest neighbor search: given the query vector $\pmb{\hat{x}}$ it directly outputs the best matching features $[\pmb{x}, \pmb{z}]$, so the Matching Database does not need to be kept in memory at runtime.
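
All three networks are small feed-forward networks, so each stage reduces to a handful of matrix multiplications whose cost and memory footprint do not grow with the number of animation frames. The sketch below shows what such an evaluation might look like; the layer sizes, activation choice, and names are illustrative assumptions, not the paper's exact architecture.

```cpp
#include <vector>
#include <cassert>

// Minimal dense layer: out = activation(W * in + b), weights stored row-major.
struct DenseLayer
{
    int in_size = 0;
    int out_size = 0;
    std::vector<float> weights;  // out_size * in_size
    std::vector<float> bias;     // out_size
};

// Evaluate a stack of dense layers with ReLU on all but the last layer.
// The memory used is just the weights: it does not scale with the size
// of the animation database the network was trained on.
std::vector<float> evaluate_network(
    const std::vector<DenseLayer>& layers,
    const std::vector<float>& input)
{
    std::vector<float> current = input;

    for (size_t l = 0; l < layers.size(); l++)
    {
        const DenseLayer& layer = layers[l];
        assert((int)current.size() == layer.in_size);

        std::vector<float> next(layer.out_size, 0.0f);
        for (int i = 0; i < layer.out_size; i++)
        {
            float sum = layer.bias[i];
            for (int j = 0; j < layer.in_size; j++)
            {
                sum += layer.weights[i * layer.in_size + j] * current[j];
            }
            // ReLU on hidden layers, identity on the output layer
            next[i] = (l + 1 < layers.size() && sum < 0.0f) ? 0.0f : sum;
        }
        current = next;
    }

    return current;
}
```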
