# Person Segmentation

Person segmentation capabilities for detecting and isolating human figures in images and videos. Provides both semantic segmentation (a single combined mask covering all people) and instance segmentation (an individual mask per person), each with associated pose keypoints.

## Capabilities

### Semantic Person Segmentation

Segments all people in the image into a single combined mask, along with all detected poses.
```typescript { .api }
/**
 * Segments all people in the image, returns combined segmentation mask
 * @param input - Image input (ImageData, HTMLImageElement, HTMLCanvasElement, HTMLVideoElement, OffscreenCanvas, tf.Tensor3D)
 * @param config - Optional inference configuration
 * @returns Promise resolving to semantic person segmentation result
 */
segmentPerson(
  input: BodyPixInput,
  config?: PersonInferenceConfig
): Promise<SemanticPersonSegmentation>;

interface SemanticPersonSegmentation {
  /** Binary segmentation mask (0 = background, 1 = person) for all people */
  data: Uint8Array;
  /** Mask width in pixels */
  width: number;
  /** Mask height in pixels */
  height: number;
  /** Array of all detected poses */
  allPoses: Pose[];
}

type BodyPixInput =
  | ImageData
  | HTMLImageElement
  | HTMLCanvasElement
  | HTMLVideoElement
  | OffscreenCanvas
  | tf.Tensor3D;
```
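The `data` mask is a flat, row-major array of length `width * height`. As an illustrative sketch (these helper functions and the `MaskLike` stand-in are not part of the BodyPix API), you can measure person coverage and index into the mask like this:

```typescript
// MaskLike is a structural stand-in for SemanticPersonSegmentation's mask fields.
interface MaskLike {
  data: Uint8Array; // 0 = background, 1 = person, row-major
  width: number;
  height: number;
}

// Fraction of pixels classified as "person" (0..1).
function personCoverage(seg: MaskLike): number {
  let personPixels = 0;
  for (let i = 0; i < seg.data.length; i++) {
    if (seg.data[i] === 1) personPixels++;
  }
  return personPixels / (seg.width * seg.height);
}

// The mask value at (x, y) lives at data[y * width + x].
function isPersonAt(seg: MaskLike, x: number, y: number): boolean {
  return seg.data[y * seg.width + x] === 1;
}
```

The same indexing applies to the per-person masks returned by instance segmentation, since they share the mask layout.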
### Multi-Person Instance Segmentation

Segments multiple people individually, providing separate masks and poses for each detected person.
```typescript { .api }
/**
 * Segments multiple people individually with instance-level segmentation
 * @param input - Image input
 * @param config - Optional multi-person inference configuration
 * @returns Promise resolving to array of individual person segmentations
 */
segmentMultiPerson(
  input: BodyPixInput,
  config?: MultiPersonInstanceInferenceConfig
): Promise<PersonSegmentation[]>;

interface PersonSegmentation {
  /** Binary segmentation mask (0 = background, 1 = person) for this person */
  data: Uint8Array;
  /** Mask width in pixels */
  width: number;
  /** Mask height in pixels */
  height: number;
  /** Pose keypoints for this person */
  pose: Pose;
}
```
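Since the result is an array, a common follow-up is selecting the person with the highest pose confidence. A minimal sketch (the `mostConfidentPerson` helper is hypothetical, and `PersonSegLike` is a structural stand-in for `PersonSegmentation`):

```typescript
// Structural stand-ins for the interfaces documented above.
interface PoseLike {
  score: number;
}
interface PersonSegLike {
  data: Uint8Array;
  width: number;
  height: number;
  pose: PoseLike;
}

// Returns the segmentation whose pose has the highest score,
// or undefined for an empty result.
function mostConfidentPerson(people: PersonSegLike[]): PersonSegLike | undefined {
  return people.reduce<PersonSegLike | undefined>(
    (best, p) => (best === undefined || p.pose.score > best.pose.score ? p : best),
    undefined
  );
}
```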
### Configuration Options
```typescript { .api }
interface PersonInferenceConfig {
  /** Flip result horizontally for mirrored cameras */
  flipHorizontal?: boolean;
  /** Internal resolution - higher values = better accuracy, slower inference */
  internalResolution?: 'low' | 'medium' | 'high' | 'full' | number;
  /** Threshold for segmentation confidence (0-1) */
  segmentationThreshold?: number;
  /** Maximum number of poses to detect */
  maxDetections?: number;
  /** Minimum pose confidence score (0-1) */
  scoreThreshold?: number;
  /** Non-maximum suppression radius for pose detection */
  nmsRadius?: number;
}

interface MultiPersonInstanceInferenceConfig extends PersonInferenceConfig {
  /** Minimum keypoint score for pose matching */
  minKeypointScore?: number;
  /** Number of refinement steps for accuracy */
  refineSteps?: number;
}

// Default configuration constants
const PERSON_INFERENCE_CONFIG: PersonInferenceConfig = {
  flipHorizontal: false,
  internalResolution: 'medium',
  segmentationThreshold: 0.7,
  maxDetections: 10,
  scoreThreshold: 0.4,
  nmsRadius: 20,
};

const MULTI_PERSON_INSTANCE_INFERENCE_CONFIG: MultiPersonInstanceInferenceConfig = {
  flipHorizontal: false,
  internalResolution: 'medium',
  segmentationThreshold: 0.7,
  maxDetections: 10,
  scoreThreshold: 0.4,
  nmsRadius: 20,
  minKeypointScore: 0.3,
  refineSteps: 10
};
```
**Usage Examples:**
```typescript
import * as bodyPix from '@tensorflow-models/body-pix';

const net = await bodyPix.load();
const imageElement = document.getElementById('people-image') as HTMLImageElement;

// Basic semantic segmentation
const segmentation = await net.segmentPerson(imageElement);
console.log(`Found ${segmentation.allPoses.length} people`);

// High-accuracy semantic segmentation
const highQualitySegmentation = await net.segmentPerson(imageElement, {
  internalResolution: 'high',
  segmentationThreshold: 0.8,
  scoreThreshold: 0.5
});

// Multi-person instance segmentation
const peopleSegmentations = await net.segmentMultiPerson(imageElement, {
  maxDetections: 5,
  scoreThreshold: 0.4,
  segmentationThreshold: 0.7
});

console.log(`Detected ${peopleSegmentations.length} individual people`);
peopleSegmentations.forEach((person, index) => {
  console.log(`Person ${index}: pose score ${person.pose.score}`);
});

// Webcam segmentation with horizontal flip
const videoElement = document.getElementById('webcam') as HTMLVideoElement;
const webcamSegmentation = await net.segmentPerson(videoElement, {
  flipHorizontal: true,
  internalResolution: 'medium'
});
```
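For background removal, one approach is converting the binary mask into RGBA pixel data whose alpha channel is transparent for background pixels, then compositing it on a canvas. The `maskToRGBA` helper below is an illustrative sketch, not a library function:

```typescript
// Turn a binary person mask into RGBA pixel data: person pixels become
// opaque black, background pixels fully transparent. The result can be
// wrapped in `new ImageData(rgba, width, height)` for canvas compositing.
function maskToRGBA(data: Uint8Array, width: number, height: number): Uint8ClampedArray {
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < data.length; i++) {
    // RGB bytes stay 0; only the alpha byte carries the mask.
    rgba[i * 4 + 3] = data[i] === 1 ? 255 : 0;
  }
  return rgba;
}
```

Drawing the original frame to a canvas and then using this mask with `globalCompositeOperation = 'destination-in'` keeps only the person pixels.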
### Performance vs Quality Trade-offs

**Real-time Applications (30+ FPS):**
```typescript
const config = {
  internalResolution: 'low',
  segmentationThreshold: 0.7,
  maxDetections: 3
};
```

**Balanced Quality (15-30 FPS):**
```typescript
const config = {
  internalResolution: 'medium',
  segmentationThreshold: 0.7,
  maxDetections: 5
};
```

**High Quality (5-15 FPS):**
```typescript
const config = {
  internalResolution: 'high',
  segmentationThreshold: 0.8,
  scoreThreshold: 0.5,
  maxDetections: 10
};
```
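The three presets above can be consolidated into a small lookup helper; `presetFor` is a hypothetical convenience function, not part of the library:

```typescript
// Hypothetical helper collecting the three documented presets in one place.
const presets = {
  realtime: { internalResolution: 'low', segmentationThreshold: 0.7, maxDetections: 3 },
  balanced: { internalResolution: 'medium', segmentationThreshold: 0.7, maxDetections: 5 },
  high: { internalResolution: 'high', segmentationThreshold: 0.8, scoreThreshold: 0.5, maxDetections: 10 },
} as const;

function presetFor(quality: keyof typeof presets) {
  return presets[quality];
}
```

A call like `net.segmentPerson(input, presetFor('realtime'))` then keeps the FPS/quality choice in one place.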
## Pose Information

Each segmentation result includes pose keypoints providing additional context:

```typescript { .api }
interface Pose {
  /** Array of 17 body keypoints (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles) */
  keypoints: Keypoint[];
  /** Overall pose confidence score (0-1) */
  score: number;
}

interface Keypoint {
  /** Keypoint confidence score (0-1) */
  score: number;
  /** Pixel coordinates of the keypoint */
  position: Vector2D;
  /** Body part name (e.g., 'nose', 'leftShoulder', 'rightAnkle') */
  part: string;
}

interface Vector2D {
  x: number;
  y: number;
}
```
**Keypoint Names:**
- `nose`, `leftEye`, `rightEye`, `leftEar`, `rightEar`
- `leftShoulder`, `rightShoulder`, `leftElbow`, `rightElbow`
- `leftWrist`, `rightWrist`, `leftHip`, `rightHip`
- `leftKnee`, `rightKnee`, `leftAnkle`, `rightAnkle`
## Use Cases

- **Background removal/replacement** for video calls
- **Virtual backgrounds** in streaming applications
- **People counting** in surveillance or analytics
- **Privacy protection** by blurring or masking people
- **Augmented reality** effects and filters
- **Sports analysis** with pose and movement tracking
- **Photo editing** tools for selective editing