
tessl/npm-tensorflow-models--posenet

Pretrained PoseNet model in TensorFlow.js for real-time human pose estimation from images and video streams


Multiple Person Pose Estimation

Robust pose detection algorithm for images containing multiple people. Uses non-maximum suppression to avoid duplicate detections and sophisticated decoding to handle overlapping poses.

Capabilities

Estimate Multiple Poses

Detects and estimates the poses of multiple people in an input image using greedy decoding with part-based non-maximum suppression.

/**
 * Estimate multiple person poses from input image
 * @param input - Input image (various formats supported)
 * @param config - Configuration options for multi-person inference
 * @returns Promise resolving to array of detected poses
 */
estimateMultiplePoses(
  input: PosenetInput,
  config?: MultiPersonInferenceConfig
): Promise<Pose[]>;

Usage Examples:

import * as posenet from '@tensorflow-models/posenet';

// Load model
const net = await posenet.load();

// Basic multiple pose estimation
const groupPhoto = document.getElementById('group-image') as HTMLImageElement;
const poses = await net.estimateMultiplePoses(groupPhoto);

console.log(`Detected ${poses.length} people`);
poses.forEach((pose, index) => {
  console.log(`Person ${index + 1} confidence:`, pose.score);
});

// With custom detection parameters
const poses2 = await net.estimateMultiplePoses(groupPhoto, {
  flipHorizontal: false,
  maxDetections: 10,      // Detect up to 10 people
  scoreThreshold: 0.6,    // Higher confidence threshold
  nmsRadius: 25           // Larger suppression radius
});

// Filter high-quality poses
const goodPoses = poses2.filter(pose => pose.score > 0.7);
console.log(`Found ${goodPoses.length} high-quality poses`);

// Process each person's keypoints
poses.forEach((pose, personIndex) => {
  const visibleKeypoints = pose.keypoints.filter(kp => kp.score > 0.5);
  console.log(`Person ${personIndex}: ${visibleKeypoints.length} visible keypoints`);
  
  // Find pose center (average of visible keypoints)
  if (visibleKeypoints.length > 0) {
    const center = visibleKeypoints.reduce(
      (acc, kp) => ({ x: acc.x + kp.position.x, y: acc.y + kp.position.y }),
      { x: 0, y: 0 }
    );
    center.x /= visibleKeypoints.length;
    center.y /= visibleKeypoints.length;
    console.log(`Person ${personIndex} center:`, center);
  }
});

// Real-time video processing
const video = document.getElementById('webcam') as HTMLVideoElement;
async function processVideoFrame() {
  const poses = await net.estimateMultiplePoses(video, {
    flipHorizontal: true,
    maxDetections: 5,
    scoreThreshold: 0.5,
    nmsRadius: 20
  });
  
  // Draw or otherwise consume the poses (drawPoses is an
  // application-defined rendering helper, not part of the library)
  drawPoses(poses);
  
  requestAnimationFrame(processVideoFrame);
}

Multiple Person Configuration

Configuration options controlling multi-person detection behavior.

/**
 * Configuration interface for multiple person pose estimation
 */
interface MultiPersonInferenceConfig {
  /** Whether to flip poses horizontally (useful for webcam feeds) */
  flipHorizontal: boolean;
  /** Maximum number of poses to detect in the image */
  maxDetections?: number;
  /** Minimum root part confidence score for pose detection */
  scoreThreshold?: number;
  /** Non-maximum suppression radius in pixels */
  nmsRadius?: number;
}

Default Configuration

const MULTI_PERSON_INFERENCE_CONFIG: MultiPersonInferenceConfig = {
  flipHorizontal: false,
  maxDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20
};

Configuration Parameters

maxDetections (default: 5):

  • Maximum number of people to detect in the image
  • Higher values detect more people but increase processing time
  • Typical range: 1-20 depending on use case

scoreThreshold (default: 0.5):

  • Minimum confidence score for a pose to be returned
  • Range: 0.0 to 1.0
  • Higher values = fewer but more confident detections
  • Lower values = more detections but potentially false positives

nmsRadius (default: 20):

  • Non-maximum suppression radius in pixels
  • Prevents duplicate detections of the same person
  • Larger values = more aggressive suppression
  • Must be strictly positive

flipHorizontal (default: false):

  • Whether to mirror poses horizontally
  • Set to true for webcam feeds that are horizontally flipped
  • Affects final pose coordinates
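
The defaulting and range rules above can be sketched as a small helper. This is illustrative only, not part of the PoseNet API; `resolveConfig` and `DEFAULTS` are hypothetical names.

```typescript
// Hypothetical helper (not part of the PoseNet API): fills in the documented
// defaults and range-checks the multi-person parameters described above.
interface MultiPersonInferenceConfig {
  flipHorizontal: boolean;
  maxDetections?: number;
  scoreThreshold?: number;
  nmsRadius?: number;
}

const DEFAULTS: Required<MultiPersonInferenceConfig> = {
  flipHorizontal: false,
  maxDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20,
};

function resolveConfig(
  config: Partial<MultiPersonInferenceConfig> = {}
): Required<MultiPersonInferenceConfig> {
  const resolved = { ...DEFAULTS, ...config };
  if (resolved.nmsRadius <= 0) {
    throw new Error("nmsRadius must be strictly positive");
  }
  if (resolved.scoreThreshold < 0 || resolved.scoreThreshold > 1) {
    throw new Error("scoreThreshold must be in [0, 1]");
  }
  return resolved;
}
```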

Input Types

Multiple pose estimation supports the same input formats as single pose:

type PosenetInput = 
  | ImageData        // Canvas ImageData object
  | HTMLImageElement // HTML img element  
  | HTMLCanvasElement // HTML canvas element
  | HTMLVideoElement // HTML video element (for real-time processing)
  | tf.Tensor3D;     // TensorFlow.js 3D tensor
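
One practical difference between these variants is how the image dimensions are read. The sketch below shows one way a caller might extract `[height, width]` from each kind of input; the structural interfaces are stand-ins for the real DOM and TensorFlow.js types so the example stays self-contained, and `getInputSize` is a hypothetical helper, not a library function.

```typescript
// Stand-ins for the real DOM/TensorFlow.js types:
interface SizedElement { width: number; height: number }        // ImageData, img, canvas
interface VideoLike { videoWidth: number; videoHeight: number } // HTMLVideoElement
interface TensorLike { shape: [number, number, number] }        // tf.Tensor3D: [height, width, channels]

type InputLike = SizedElement | VideoLike | TensorLike;

// Hypothetical helper: read [height, width] from any supported input variant.
function getInputSize(input: InputLike): [number, number] {
  if ("shape" in input) return [input.shape[0], input.shape[1]];
  if ("videoWidth" in input) return [input.videoHeight, input.videoWidth];
  return [input.height, input.width];
}
```

Note that video elements expose their intrinsic frame size via `videoWidth`/`videoHeight`, which can differ from the element's CSS `width`/`height`.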

Return Value

Multiple pose estimation returns a Promise that resolves to an array of Pose objects:

/**
 * Array of detected poses, each with keypoints and confidence score
 */
Promise<Pose[]>

/**
 * Individual detected pose
 */
interface Pose {
  /** Array of 17 keypoints representing body parts */
  keypoints: Keypoint[];
  /** Overall pose confidence score (0-1) */
  score: number;
}

/**
 * Individual body part keypoint with position and confidence
 */
interface Keypoint {
  /** Confidence score for this keypoint (0-1) */
  score: number;
  /** 2D position in image coordinates */
  position: Vector2D;
  /** Body part name (e.g., 'nose', 'leftWrist') */
  part: string;
}
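
Working from the interfaces above, a couple of small convenience helpers are easy to build. These are sketches, not part of the PoseNet API (the library's own pose utilities cover similar ground): `keypointsByPart` indexes keypoints by part name, and `boundingBox` computes a box over sufficiently confident keypoints.

```typescript
// Local copies of the documented shapes, so the sketch is self-contained.
interface Vector2D { x: number; y: number }
interface Keypoint { score: number; position: Vector2D; part: string }
interface Pose { keypoints: Keypoint[]; score: number }

// Index keypoints by part name for O(1) lookup (e.g. "nose", "leftWrist").
function keypointsByPart(pose: Pose): Map<string, Keypoint> {
  return new Map(pose.keypoints.map(kp => [kp.part, kp]));
}

// Bounding box over keypoints at or above minScore; null if none qualify.
function boundingBox(
  pose: Pose,
  minScore = 0.5
): { minX: number; minY: number; maxX: number; maxY: number } | null {
  const pts = pose.keypoints.filter(kp => kp.score >= minScore);
  if (pts.length === 0) return null;
  const xs = pts.map(kp => kp.position.x);
  const ys = pts.map(kp => kp.position.y);
  return {
    minX: Math.min(...xs),
    minY: Math.min(...ys),
    maxX: Math.max(...xs),
    maxY: Math.max(...ys),
  };
}
```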

Algorithm Details

The multi-person pose estimation algorithm uses a sophisticated "Fast Greedy Decoding" approach:

  1. Part Detection: Identifies potential body part locations across the entire image
  2. Priority Queue: Creates a queue of candidate parts sorted by confidence score
  3. Root Selection: Selects highest-confidence parts as potential pose roots
  4. Pose Assembly: Follows displacement vectors to assemble complete poses
  5. Non-Maximum Suppression: Removes duplicate detections using configurable radius
  6. Score Calculation: Computes pose scores based on non-overlapping keypoints
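
Steps 2–5 can be illustrated with a greatly simplified sketch: accept root candidates in descending score order, rejecting any candidate that falls within `nmsRadius` of an already-accepted root. The real decoder works per body part and follows displacement vectors; this version suppresses on whole-root positions only, purely to show the greedy NMS idea.

```typescript
interface Candidate { score: number; x: number; y: number }

// Simplified greedy non-maximum suppression over root candidates.
function greedyNms(
  candidates: Candidate[],
  maxDetections: number,
  scoreThreshold: number,
  nmsRadius: number
): Candidate[] {
  const squaredRadius = nmsRadius * nmsRadius;
  // Stand-in for the priority queue: sort by descending confidence.
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  const accepted: Candidate[] = [];
  for (const c of sorted) {
    if (accepted.length >= maxDetections) break;
    if (c.score < scoreThreshold) break; // sorted, so all remaining are lower
    const suppressed = accepted.some(
      a => (a.x - c.x) ** 2 + (a.y - c.y) ** 2 <= squaredRadius
    );
    if (!suppressed) accepted.push(c);
  }
  return accepted;
}
```

Comparing squared distances against the squared radius avoids a square root per candidate pair, a common micro-optimization in NMS loops.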

Performance Characteristics

  • Speed: Slower than single pose but handles multiple people robustly
  • Accuracy: Higher accuracy when multiple people are present
  • Scalability: Processing time increases with maxDetections parameter
  • Memory: Higher memory usage due to complex decoding process
  • Robustness: Handles overlapping and partially occluded poses

Use Cases

Ideal for:

  • Group photos and videos
  • Crowded scenes
  • Multi-person fitness applications
  • Social interaction analysis
  • Surveillance and monitoring

Not ideal for:

  • Single person scenarios (use single pose for better performance)
  • Extremely crowded scenes (>20 people)
  • Real-time applications on low-end hardware

Error Handling

The algorithm gracefully handles various challenging scenarios:

  • Partial Occlusion: Detects visible keypoints when people overlap
  • Edge Cases: Handles people at image boundaries
  • Low Confidence: Returns poses only above scoreThreshold
  • Empty Results: Returns empty array when no poses meet criteria
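
Calling code should mirror this tolerance. The sketch below (hypothetical helper, not a library function) summarizes a result array while handling the empty case and partially occluded poses with few visible keypoints.

```typescript
interface Keypoint { score: number; part: string }
interface Pose { score: number; keypoints: Keypoint[] }

// Summarize detection results, tolerating an empty array and occluded poses.
function summarize(poses: Pose[], minKeypointScore = 0.5): string {
  if (poses.length === 0) return "no poses detected";
  return poses
    .map((p, i) => {
      const visible = p.keypoints.filter(kp => kp.score >= minKeypointScore).length;
      return `person ${i + 1}: score=${p.score.toFixed(2)}, ${visible}/${p.keypoints.length} keypoints`;
    })
    .join("; ");
}
```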

Install with Tessl CLI

npx tessl i tessl/npm-tensorflow-models--posenet
