
tessl/npm-tensorflow-models--posenet

Pretrained PoseNet model in TensorFlow.js for real-time human pose estimation from images and video streams


Multiple Person Pose Estimation

Robust pose detection algorithm for images containing multiple people. Uses non-maximum suppression to avoid duplicate detections and sophisticated decoding to handle overlapping poses.

Capabilities

Estimate Multiple Poses

Detects and estimates the poses of multiple people in an input image using greedy decoding with part-based non-maximum suppression.

/**
 * Estimate multiple person poses from input image
 * @param input - Input image (various formats supported)
 * @param config - Configuration options for multi-person inference
 * @returns Promise resolving to array of detected poses
 */
estimateMultiplePoses(
  input: PosenetInput,
  config?: MultiPersonInferenceConfig
): Promise<Pose[]>;

Usage Examples:

import * as posenet from '@tensorflow-models/posenet';

// Load model
const net = await posenet.load();

// Basic multiple pose estimation
const groupPhoto = document.getElementById('group-image') as HTMLImageElement;
const poses = await net.estimateMultiplePoses(groupPhoto);

console.log(`Detected ${poses.length} people`);
poses.forEach((pose, index) => {
  console.log(`Person ${index + 1} confidence:`, pose.score);
});

// With custom detection parameters
const poses2 = await net.estimateMultiplePoses(groupPhoto, {
  flipHorizontal: false,
  maxDetections: 10,      // Detect up to 10 people
  scoreThreshold: 0.6,    // Higher confidence threshold
  nmsRadius: 25           // Larger suppression radius
});

// Filter high-quality poses
const goodPoses = poses2.filter(pose => pose.score > 0.7);
console.log(`Found ${goodPoses.length} high-quality poses`);

// Process each person's keypoints
poses.forEach((pose, personIndex) => {
  const visibleKeypoints = pose.keypoints.filter(kp => kp.score > 0.5);
  console.log(`Person ${personIndex}: ${visibleKeypoints.length} visible keypoints`);
  
  // Find pose center (average of visible keypoints)
  if (visibleKeypoints.length > 0) {
    const center = visibleKeypoints.reduce(
      (acc, kp) => ({ x: acc.x + kp.position.x, y: acc.y + kp.position.y }),
      { x: 0, y: 0 }
    );
    center.x /= visibleKeypoints.length;
    center.y /= visibleKeypoints.length;
    console.log(`Person ${personIndex} center:`, center);
  }
});

// Real-time video processing
const video = document.getElementById('webcam') as HTMLVideoElement;
async function processVideoFrame() {
  const poses = await net.estimateMultiplePoses(video, {
    flipHorizontal: true,
    maxDetections: 5,
    scoreThreshold: 0.5,
    nmsRadius: 20
  });
  
  // Draw or otherwise consume the poses (drawPoses is an
  // application-defined rendering helper, not part of the library)
  drawPoses(poses);
  
  requestAnimationFrame(processVideoFrame);
}

Multiple Person Configuration

Configuration options controlling multi-person detection behavior.

/**
 * Configuration interface for multiple person pose estimation
 */
interface MultiPersonInferenceConfig {
  /** Whether to flip poses horizontally (useful for webcam feeds) */
  flipHorizontal: boolean;
  /** Maximum number of poses to detect in the image */
  maxDetections?: number;
  /** Minimum root part confidence score for pose detection */
  scoreThreshold?: number;
  /** Non-maximum suppression radius in pixels */
  nmsRadius?: number;
}

Default Configuration

const MULTI_PERSON_INFERENCE_CONFIG: MultiPersonInferenceConfig = {
  flipHorizontal: false,
  maxDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20
};

Configuration Parameters

maxDetections (default: 5):

  • Maximum number of people to detect in the image
  • Higher values detect more people but increase processing time
  • Typical range: 1-20 depending on use case

scoreThreshold (default: 0.5):

  • Minimum confidence score for a pose to be returned
  • Range: 0.0 to 1.0
  • Higher values = fewer but more confident detections
  • Lower values = more detections but potentially false positives

nmsRadius (default: 20):

  • Non-maximum suppression radius in pixels
  • Prevents duplicate detections of the same person
  • Larger values = more aggressive suppression
  • Must be strictly positive

flipHorizontal (default: false):

  • Whether to mirror poses horizontally
  • Set to true for webcam feeds that are horizontally flipped
  • Affects final pose coordinates
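
The defaulting and range rules above can be sketched as a small helper. This is illustrative only, not part of the PoseNet API; `resolveConfig` and `DEFAULTS` are hypothetical names.

```typescript
// Hypothetical helper (not part of the PoseNet API): fills in the documented
// defaults and range-checks the multi-person parameters described above.
interface MultiPersonInferenceConfig {
  flipHorizontal: boolean;
  maxDetections?: number;
  scoreThreshold?: number;
  nmsRadius?: number;
}

const DEFAULTS: Required<MultiPersonInferenceConfig> = {
  flipHorizontal: false,
  maxDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20,
};

function resolveConfig(
  config: Partial<MultiPersonInferenceConfig> = {}
): Required<MultiPersonInferenceConfig> {
  const resolved = { ...DEFAULTS, ...config };
  if (resolved.nmsRadius <= 0) {
    throw new Error("nmsRadius must be strictly positive");
  }
  if (resolved.scoreThreshold < 0 || resolved.scoreThreshold > 1) {
    throw new Error("scoreThreshold must be in [0, 1]");
  }
  return resolved;
}
```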

Input Types

Multiple pose estimation supports the same input formats as single pose:

type PosenetInput = 
  | ImageData        // Canvas ImageData object
  | HTMLImageElement // HTML img element  
  | HTMLCanvasElement // HTML canvas element
  | HTMLVideoElement // HTML video element (for real-time processing)
  | tf.Tensor3D;     // TensorFlow.js 3D tensor
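
One practical difference between these variants is how the image dimensions are read. The sketch below shows one way a caller might extract `[height, width]` from each kind of input; the structural interfaces are stand-ins for the real DOM and TensorFlow.js types so the example stays self-contained, and `getInputSize` is a hypothetical helper, not a library function.

```typescript
// Stand-ins for the real DOM/TensorFlow.js types:
interface SizedElement { width: number; height: number }        // ImageData, img, canvas
interface VideoLike { videoWidth: number; videoHeight: number } // HTMLVideoElement
interface TensorLike { shape: [number, number, number] }        // tf.Tensor3D: [height, width, channels]

type InputLike = SizedElement | VideoLike | TensorLike;

// Hypothetical helper: read [height, width] from any supported input variant.
function getInputSize(input: InputLike): [number, number] {
  if ("shape" in input) return [input.shape[0], input.shape[1]];
  if ("videoWidth" in input) return [input.videoHeight, input.videoWidth];
  return [input.height, input.width];
}
```

Note that video elements expose their intrinsic frame size via `videoWidth`/`videoHeight`, which can differ from the element's CSS `width`/`height`.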

Return Value

Multiple pose estimation returns a Promise that resolves to an array of Pose objects:

/**
 * Array of detected poses, each with keypoints and confidence score
 */
Promise<Pose[]>

/**
 * Individual detected pose
 */
interface Pose {
  /** Array of 17 keypoints representing body parts */
  keypoints: Keypoint[];
  /** Overall pose confidence score (0-1) */
  score: number;
}

/**
 * Individual body part keypoint with position and confidence
 */
interface Keypoint {
  /** Confidence score for this keypoint (0-1) */
  score: number;
  /** 2D position in image coordinates */
  position: Vector2D;
  /** Body part name (e.g., 'nose', 'leftWrist') */
  part: string;
}
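
Working from the interfaces above, a couple of small convenience helpers are easy to build. These are sketches, not part of the PoseNet API (the library's own pose utilities cover similar ground): `keypointsByPart` indexes keypoints by part name, and `boundingBox` computes a box over sufficiently confident keypoints.

```typescript
// Local copies of the documented shapes, so the sketch is self-contained.
interface Vector2D { x: number; y: number }
interface Keypoint { score: number; position: Vector2D; part: string }
interface Pose { keypoints: Keypoint[]; score: number }

// Index keypoints by part name for O(1) lookup (e.g. "nose", "leftWrist").
function keypointsByPart(pose: Pose): Map<string, Keypoint> {
  return new Map(pose.keypoints.map(kp => [kp.part, kp]));
}

// Bounding box over keypoints at or above minScore; null if none qualify.
function boundingBox(
  pose: Pose,
  minScore = 0.5
): { minX: number; minY: number; maxX: number; maxY: number } | null {
  const pts = pose.keypoints.filter(kp => kp.score >= minScore);
  if (pts.length === 0) return null;
  const xs = pts.map(kp => kp.position.x);
  const ys = pts.map(kp => kp.position.y);
  return {
    minX: Math.min(...xs),
    minY: Math.min(...ys),
    maxX: Math.max(...xs),
    maxY: Math.max(...ys),
  };
}
```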

Algorithm Details

The multi-person pose estimation algorithm uses a sophisticated "Fast Greedy Decoding" approach:

  1. Part Detection: Identifies potential body part locations across the entire image
  2. Priority Queue: Creates a queue of candidate parts sorted by confidence score
  3. Root Selection: Selects highest-confidence parts as potential pose roots
  4. Pose Assembly: Follows displacement vectors to assemble complete poses
  5. Non-Maximum Suppression: Removes duplicate detections using configurable radius
  6. Score Calculation: Computes pose scores based on non-overlapping keypoints
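
Steps 2–5 can be illustrated with a greatly simplified sketch: accept root candidates in descending score order, rejecting any candidate that falls within `nmsRadius` of an already-accepted root. The real decoder works per body part and follows displacement vectors; this version suppresses on whole-root positions only, purely to show the greedy NMS idea.

```typescript
interface Candidate { score: number; x: number; y: number }

// Simplified greedy non-maximum suppression over root candidates.
function greedyNms(
  candidates: Candidate[],
  maxDetections: number,
  scoreThreshold: number,
  nmsRadius: number
): Candidate[] {
  const squaredRadius = nmsRadius * nmsRadius;
  // Stand-in for the priority queue: sort by descending confidence.
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  const accepted: Candidate[] = [];
  for (const c of sorted) {
    if (accepted.length >= maxDetections) break;
    if (c.score < scoreThreshold) break; // sorted, so all remaining are lower
    const suppressed = accepted.some(
      a => (a.x - c.x) ** 2 + (a.y - c.y) ** 2 <= squaredRadius
    );
    if (!suppressed) accepted.push(c);
  }
  return accepted;
}
```

Comparing squared distances against the squared radius avoids a square root per candidate pair, a common micro-optimization in NMS loops.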

Performance Characteristics

  • Speed: Slower than single pose but handles multiple people robustly
  • Accuracy: Higher accuracy when multiple people are present
  • Scalability: Processing time increases with maxDetections parameter
  • Memory: Higher memory usage due to complex decoding process
  • Robustness: Handles overlapping and partially occluded poses

Use Cases

Ideal for:

  • Group photos and videos
  • Crowded scenes
  • Multi-person fitness applications
  • Social interaction analysis
  • Surveillance and monitoring

Not ideal for:

  • Single person scenarios (use single pose for better performance)
  • Extremely crowded scenes (>20 people)
  • Real-time applications on low-end hardware

Error Handling

The algorithm gracefully handles various challenging scenarios:

  • Partial Occlusion: Detects visible keypoints when people overlap
  • Edge Cases: Handles people at image boundaries
  • Low Confidence: Returns poses only above scoreThreshold
  • Empty Results: Returns empty array when no poses meet criteria
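
Calling code should mirror this tolerance. The sketch below (hypothetical helper, not a library function) summarizes a result array while handling the empty case and partially occluded poses with few visible keypoints.

```typescript
interface Keypoint { score: number; part: string }
interface Pose { score: number; keypoints: Keypoint[] }

// Summarize detection results, tolerating an empty array and occluded poses.
function summarize(poses: Pose[], minKeypointScore = 0.5): string {
  if (poses.length === 0) return "no poses detected";
  return poses
    .map((p, i) => {
      const visible = p.keypoints.filter(kp => kp.score >= minKeypointScore).length;
      return `person ${i + 1}: score=${p.score.toFixed(2)}, ${visible}/${p.keypoints.length} keypoints`;
    })
    .join("; ");
}
```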

Install with Tessl CLI

npx tessl i tessl/npm-tensorflow-models--posenet
