# Multiple Person Pose Estimation

Robust pose detection for images containing multiple people. Uses non-maximum suppression to avoid duplicate detections and a greedy decoding scheme to handle overlapping poses.

## Capabilities

### Estimate Multiple Poses

Detects and estimates multiple poses from an input image using greedy decoding with part-based non-maximum suppression.

```typescript { .api }
/**
 * Estimate multiple person poses from input image
 * @param input - Input image (various formats supported)
 * @param config - Configuration options for multi-person inference
 * @returns Promise resolving to array of detected poses
 */
estimateMultiplePoses(
  input: PosenetInput,
  config?: MultiPersonInferenceConfig
): Promise<Pose[]>;
```

**Usage Examples:**

```typescript
import * as posenet from '@tensorflow-models/posenet';

// Load model
const net = await posenet.load();

// Basic multiple pose estimation
const groupPhoto = document.getElementById('group-image') as HTMLImageElement;
const poses = await net.estimateMultiplePoses(groupPhoto);

console.log(`Detected ${poses.length} people`);
poses.forEach((pose, index) => {
  console.log(`Person ${index + 1} confidence:`, pose.score);
});

// With custom detection parameters
const poses2 = await net.estimateMultiplePoses(groupPhoto, {
  flipHorizontal: false,
  maxDetections: 10,   // Detect up to 10 people
  scoreThreshold: 0.6, // Higher confidence threshold
  nmsRadius: 25        // Larger suppression radius
});

// Filter high-quality poses
const goodPoses = poses2.filter(pose => pose.score > 0.7);
console.log(`Found ${goodPoses.length} high-quality poses`);

// Process each person's keypoints
poses.forEach((pose, personIndex) => {
  const visibleKeypoints = pose.keypoints.filter(kp => kp.score > 0.5);
  console.log(`Person ${personIndex}: ${visibleKeypoints.length} visible keypoints`);

  // Find pose center (average of visible keypoints)
  if (visibleKeypoints.length > 0) {
    const center = visibleKeypoints.reduce(
      (acc, kp) => ({ x: acc.x + kp.position.x, y: acc.y + kp.position.y }),
      { x: 0, y: 0 }
    );
    center.x /= visibleKeypoints.length;
    center.y /= visibleKeypoints.length;
    console.log(`Person ${personIndex} center:`, center);
  }
});

// Real-time video processing
const video = document.getElementById('webcam') as HTMLVideoElement;
async function processVideoFrame() {
  const poses = await net.estimateMultiplePoses(video, {
    flipHorizontal: true,
    maxDetections: 5,
    scoreThreshold: 0.5,
    nmsRadius: 20
  });

  // Draw poses on a canvas or process the data
  // (drawPoses is an app-specific rendering helper, not part of the library)
  drawPoses(poses);

  requestAnimationFrame(processVideoFrame);
}
```

### Multiple Person Configuration

Configuration options for multi-person pose estimation.

```typescript { .api }
/**
 * Configuration interface for multiple person pose estimation
 */
interface MultiPersonInferenceConfig {
  /** Whether to flip poses horizontally (useful for webcam feeds) */
  flipHorizontal: boolean;
  /** Maximum number of poses to detect in the image */
  maxDetections?: number;
  /** Minimum root part confidence score for pose detection */
  scoreThreshold?: number;
  /** Non-maximum suppression radius in pixels */
  nmsRadius?: number;
}
```

### Default Configuration

```typescript { .api }
const MULTI_PERSON_INFERENCE_CONFIG: MultiPersonInferenceConfig = {
  flipHorizontal: false,
  maxDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20
};
```

### Configuration Parameters

**maxDetections** (default: 5):
- Maximum number of people to detect in the image
- Higher values detect more people but increase processing time
- Typical range: 1-20 depending on use case

**scoreThreshold** (default: 0.5):
- Minimum confidence score for a pose to be returned
- Range: 0.0 to 1.0
- Higher values = fewer but more confident detections
- Lower values = more detections but potentially false positives

**nmsRadius** (default: 20):
- Non-maximum suppression radius in pixels
- Prevents duplicate detections of the same person
- Larger values = more aggressive suppression
- Must be strictly positive

**flipHorizontal** (default: false):
- Whether to mirror poses horizontally
- Set to true for webcam feeds that are horizontally flipped
- Affects final pose coordinates
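
To make `scoreThreshold` and `nmsRadius` concrete, here is a minimal self-contained sketch (a toy `selectRoots` helper, not the library's internal code) of how root candidates are filtered by score and then suppressed by distance:

```typescript
interface Candidate {
  score: number;
  position: { x: number; y: number };
}

function selectRoots(
  candidates: Candidate[],
  scoreThreshold: number,
  nmsRadius: number
): Candidate[] {
  const accepted: Candidate[] = [];
  // Visit candidates from highest to lowest score (the "priority queue").
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  for (const c of sorted) {
    if (c.score < scoreThreshold) continue; // below scoreThreshold -> dropped
    const tooClose = accepted.some(a =>
      Math.hypot(a.position.x - c.position.x, a.position.y - c.position.y) <= nmsRadius
    );
    if (!tooClose) accepted.push(c); // within nmsRadius of a kept root -> suppressed
  }
  return accepted;
}

const candidates: Candidate[] = [
  { score: 0.9, position: { x: 100, y: 100 } },
  { score: 0.8, position: { x: 110, y: 105 } }, // ~11 px from the first -> suppressed
  { score: 0.7, position: { x: 300, y: 120 } },
  { score: 0.3, position: { x: 500, y: 200 } }  // below the 0.5 threshold -> dropped
];

console.log(selectRoots(candidates, 0.5, 20).length); // 2
```

Raising `scoreThreshold` prunes weak candidates; raising `nmsRadius` merges nearby ones, which is why a larger radius can undercount people standing close together.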

### Input Types

Multiple pose estimation supports the same input formats as single pose:

```typescript { .api }
type PosenetInput =
  | ImageData          // Canvas ImageData object
  | HTMLImageElement   // HTML img element
  | HTMLCanvasElement  // HTML canvas element
  | HTMLVideoElement   // HTML video element (for real-time processing)
  | tf.Tensor3D;       // TensorFlow.js 3D tensor
```

### Return Value

Multiple pose estimation returns a Promise that resolves to an array of Pose objects:

```typescript { .api }
/**
 * Array of detected poses, each with keypoints and confidence score
 */
Promise<Pose[]>

/**
 * Individual detected pose
 */
interface Pose {
  /** Array of 17 keypoints representing body parts */
  keypoints: Keypoint[];
  /** Overall pose confidence score (0-1) */
  score: number;
}

/**
 * Individual body part keypoint with position and confidence
 */
interface Keypoint {
  /** Confidence score for this keypoint (0-1) */
  score: number;
  /** 2D position in image coordinates */
  position: Vector2D;
  /** Body part name (e.g., 'nose', 'leftWrist') */
  part: string;
}
```
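
As a worked illustration of these shapes (`getKeypoint` and `meanKeypointScore` are hypothetical helpers, not part of the library API), small utilities over `Pose` and `Keypoint` might look like:

```typescript
interface Vector2D { x: number; y: number; }
interface Keypoint { score: number; position: Vector2D; part: string; }
interface Pose { keypoints: Keypoint[]; score: number; }

// Look up a keypoint by part name; undefined if the part is missing.
function getKeypoint(pose: Pose, part: string): Keypoint | undefined {
  return pose.keypoints.find(kp => kp.part === part);
}

// Mean keypoint confidence as a rough per-pose quality measure.
function meanKeypointScore(pose: Pose): number {
  if (pose.keypoints.length === 0) return 0;
  return pose.keypoints.reduce((sum, kp) => sum + kp.score, 0) / pose.keypoints.length;
}

const pose: Pose = {
  score: 0.85,
  keypoints: [
    { score: 0.9, position: { x: 120, y: 80 }, part: 'nose' },
    { score: 0.7, position: { x: 100, y: 140 }, part: 'leftWrist' }
  ]
};

console.log(getKeypoint(pose, 'nose')?.position); // { x: 120, y: 80 }
```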

### Algorithm Details

The multi-person pose estimation algorithm uses a "Fast Greedy Decoding" approach:

1. **Part Detection**: Identifies potential body part locations across the entire image
2. **Priority Queue**: Creates a queue of candidate parts sorted by confidence score
3. **Root Selection**: Selects the highest-confidence parts as potential pose roots
4. **Pose Assembly**: Follows displacement vectors to assemble complete poses
5. **Non-Maximum Suppression**: Removes duplicate detections using the configurable radius
6. **Score Calculation**: Computes pose scores from non-overlapping keypoints
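
Step 6 can be sketched roughly as follows (an illustrative `instanceScore` helper under assumed shapes, not the library's actual source): keypoints that fall within the NMS radius of the corresponding keypoint in an already-decoded pose contribute nothing to the new pose's score.

```typescript
interface Point { x: number; y: number; }
interface Part { score: number; position: Point; }

function instanceScore(
  existingPoses: Part[][],
  candidate: Part[],
  nmsRadius: number
): number {
  const squaredNmsRadius = nmsRadius * nmsRadius;
  const sum = candidate.reduce((acc, kp, partId) => {
    // A keypoint "overlaps" if it lies within nmsRadius of the
    // same-index keypoint of any already-decoded pose.
    const overlapping = existingPoses.some(pose => {
      const dx = pose[partId].position.x - kp.position.x;
      const dy = pose[partId].position.y - kp.position.y;
      return dx * dx + dy * dy <= squaredNmsRadius;
    });
    return overlapping ? acc : acc + kp.score;
  }, 0);
  return sum / candidate.length;
}

const existing: Part[][] = [[
  { score: 1.0, position: { x: 0, y: 0 } },
  { score: 1.0, position: { x: 0, y: 0 } }
]];
const candidate: Part[] = [
  { score: 0.8, position: { x: 5, y: 5 } },    // overlaps the existing pose -> ignored
  { score: 0.4, position: { x: 100, y: 100 } } // far away -> still counts
];

console.log(instanceScore(existing, candidate, 20)); // 0.2
```

This is why a pose that largely duplicates an already-detected person receives a low score and tends to fall under `scoreThreshold`.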

### Performance Characteristics

- **Speed**: Slower than single-pose estimation, but handles multiple people robustly
- **Accuracy**: Higher accuracy when multiple people are present
- **Scalability**: Processing time increases with the maxDetections parameter
- **Memory**: Higher memory usage due to the more complex decoding process
- **Robustness**: Handles overlapping and partially occluded poses

### Use Cases

**Ideal for:**
- Group photos and videos
- Crowded scenes
- Multi-person fitness applications
- Social interaction analysis
- Surveillance and monitoring

**Not ideal for:**
- Single-person scenarios (use single-pose estimation for better performance)
- Extremely crowded scenes (>20 people)
- Real-time applications on low-end hardware

### Error Handling

The algorithm gracefully handles various challenging scenarios:

- **Partial Occlusion**: Detects visible keypoints when people overlap
- **Edge Cases**: Handles people at image boundaries
- **Low Confidence**: Returns only poses scoring above scoreThreshold
- **Empty Results**: Returns an empty array when no poses meet the criteria
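
Because "nothing found" is signaled by an empty array rather than an exception, callers can simply branch on length. A minimal sketch (the `summarizeDetections` helper is hypothetical) using a pared-down `Pose` shape:

```typescript
interface Pose { keypoints: unknown[]; score: number; }

// Summarize a detection result; an empty array means no pose cleared
// scoreThreshold, which is a normal outcome, not an error.
function summarizeDetections(poses: Pose[]): string {
  if (poses.length === 0) {
    return 'No poses detected - try lowering scoreThreshold';
  }
  const best = Math.max(...poses.map(p => p.score));
  return `${poses.length} pose(s), best score ${best.toFixed(2)}`;
}

console.log(summarizeDetections([]));
console.log(summarizeDetections([{ keypoints: [], score: 0.82 }]));
```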