# PoseNet

PoseNet is a TensorFlow.js library for real-time human pose estimation in web browsers. It detects human poses in images and video streams, providing keypoint locations for body parts along with confidence scores. The library supports both single- and multi-person pose detection using pre-trained MobileNetV1 and ResNet50 neural network architectures.

## Package Information

- **Package Name**: @tensorflow-models/posenet
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install @tensorflow-models/posenet`

## Core Imports

```typescript
import * as posenet from '@tensorflow-models/posenet';
```

For named ES module imports:

```typescript
import { load, version, partNames, partIds, poseChain, partChannels } from '@tensorflow-models/posenet';
```

For CommonJS:

```javascript
const posenet = require('@tensorflow-models/posenet');
```

**Note:** PoseNet also requires TensorFlow.js core:

```typescript
import * as tf from '@tensorflow/tfjs-core';
```
## Basic Usage

```typescript
import * as posenet from '@tensorflow-models/posenet';

// Load the model
const net = await posenet.load();

// Single-person pose estimation
const pose = await net.estimateSinglePose(imageElement, {
  flipHorizontal: false
});

console.log('Pose score:', pose.score);
console.log('Keypoints:', pose.keypoints);

// Multi-person pose estimation
const poses = await net.estimateMultiplePoses(imageElement, {
  flipHorizontal: false,
  maxDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20
});

poses.forEach((pose, i) => {
  console.log(`Pose ${i} score:`, pose.score);
});
```
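
A returned pose often contains low-confidence keypoints, so a common first step is filtering them by score. The sketch below is self-contained: the `Keypoint` shape is copied from the Core Types section, and the sample data is hypothetical.

```typescript
// Keypoint shape as returned by PoseNet (see Core Types).
interface Vector2D { x: number; y: number; }
interface Keypoint { score: number; position: Vector2D; part: string; }

// Keep only keypoints detected with reasonable confidence.
function confidentKeypoints(keypoints: Keypoint[], minScore: number): Keypoint[] {
  return keypoints.filter((kp) => kp.score >= minScore);
}

// Hypothetical sample data for illustration.
const keypoints: Keypoint[] = [
  { score: 0.95, position: { x: 120, y: 80 }, part: 'nose' },
  { score: 0.30, position: { x: 100, y: 95 }, part: 'leftEye' },
  { score: 0.80, position: { x: 140, y: 95 }, part: 'rightEye' },
];

const confident = confidentKeypoints(keypoints, 0.5);
console.log(confident.map((kp) => kp.part)); // parts: nose, rightEye
```

With real output from `estimateSinglePose`, the same filter applies directly to `pose.keypoints`.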

## Architecture

PoseNet is built around several key components:

- **Model Loading**: Configurable neural network architectures (MobileNetV1, ResNet50) with different trade-offs between speed and accuracy
- **Pose Estimation Engine**: Core algorithms for single- and multi-person pose detection with different decoding strategies
- **Keypoint System**: 17-point human skeleton model with confidence scoring for each body part
- **Utility Functions**: Helpers for pose manipulation, scaling, and geometric calculations
- **WebGL Acceleration**: TensorFlow.js backend integration for GPU-accelerated inference

## Capabilities

### Model Loading and Configuration

Load and configure PoseNet models with different architectures and performance trade-offs: choose MobileNetV1 for speed or ResNet50 for accuracy.

```typescript { .api }
function load(config?: ModelConfig): Promise<PoseNet>;

interface ModelConfig {
  architecture: PoseNetArchitecture;
  outputStride: PoseNetOutputStride;
  inputResolution: InputResolution;
  multiplier?: MobileNetMultiplier;
  modelUrl?: string;
  quantBytes?: PoseNetQuantBytes;
}
```

[Model Loading and Configuration](./model-loading.md)
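
The speed/accuracy trade-off shows up directly in the config object passed to `load`. The sketch below mirrors the `ModelConfig` fields locally so it is self-contained; the specific field values are illustrative choices, not required defaults.

```typescript
// Local mirror of the ModelConfig fields shown above, for illustration.
type PoseNetArchitecture = 'ResNet50' | 'MobileNetV1';
type PoseNetOutputStride = 32 | 16 | 8;

interface ModelConfig {
  architecture: PoseNetArchitecture;
  outputStride: PoseNetOutputStride;
  inputResolution: number | { width: number; height: number };
  multiplier?: 0.50 | 0.75 | 1.0;
  quantBytes?: 1 | 2 | 4;
}

// Fast and mobile-friendly: MobileNetV1 with a reduced multiplier.
const fastConfig: ModelConfig = {
  architecture: 'MobileNetV1',
  outputStride: 16,
  inputResolution: { width: 257, height: 257 },
  multiplier: 0.75,
};

// More accurate but heavier: ResNet50 with quantized weights.
const accurateConfig: ModelConfig = {
  architecture: 'ResNet50',
  outputStride: 32,
  inputResolution: { width: 257, height: 257 },
  quantBytes: 2,
};

// In an application these would be passed to posenet.load(fastConfig), etc.
```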
94
95
### Single Person Pose Estimation
96
97
Fast pose detection optimized for single person scenarios. Ideal when only one person is expected in the image.
98
99
```typescript { .api }
100
class PoseNet {
101
estimateSinglePose(
102
input: PosenetInput,
103
config?: SinglePersonInterfaceConfig
104
): Promise<Pose>;
105
}
106
107
interface SinglePersonInterfaceConfig {
108
flipHorizontal: boolean;
109
}
110
```
111
112
[Single Person Pose Estimation](./single-pose.md)

### Multiple Person Pose Estimation

Robust pose detection for images containing multiple people. Uses non-maximum suppression to avoid duplicate detections.

```typescript { .api }
class PoseNet {
  estimateMultiplePoses(
    input: PosenetInput,
    config?: MultiPersonInferenceConfig
  ): Promise<Pose[]>;
}

interface MultiPersonInferenceConfig {
  flipHorizontal: boolean;
  maxDetections?: number;
  scoreThreshold?: number;
  nmsRadius?: number;
}
```

[Multiple Person Pose Estimation](./multi-pose.md)
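
`estimateMultiplePoses` returns up to `maxDetections` poses, some with low overall scores. A typical post-processing step, sketched here with hypothetical data and a locally declared `Pose` shape, keeps only confident poses sorted best-first:

```typescript
// Minimal local Pose shape (see Core Types for the full interface).
interface Pose { keypoints: unknown[]; score: number; }

// Keep poses above a score threshold, ordered by descending score.
function topPoses(poses: Pose[], minScore: number): Pose[] {
  return poses
    .filter((p) => p.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Hypothetical detections for illustration.
const poses: Pose[] = [
  { keypoints: [], score: 0.42 },
  { keypoints: [], score: 0.87 },
  { keypoints: [], score: 0.15 },
];

console.log(topPoses(poses, 0.3).map((p) => p.score)); // 0.87, then 0.42
```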

### Pose Processing and Utilities

Utility functions for manipulating, scaling, and analyzing detected poses, including keypoint adjacency and geometric calculations.

```typescript { .api }
function getAdjacentKeyPoints(keypoints: Keypoint[], minConfidence: number): Keypoint[][];
function getBoundingBox(keypoints: Keypoint[]): {maxX: number, maxY: number, minX: number, minY: number};
function getBoundingBoxPoints(keypoints: Keypoint[]): Vector2D[];
function scalePose(pose: Pose, scaleY: number, scaleX: number, offsetY?: number, offsetX?: number): Pose;
function scaleAndFlipPoses(poses: Pose[], imageSize: [number, number], inputResolution: [number, number], padding: Padding, flipHorizontal?: boolean): Pose[];
```

[Pose Processing and Utilities](./pose-utilities.md)
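
To make the geometry concrete, here is a local illustration of what `getBoundingBox` computes: the min/max x and y over all keypoint positions. This is a self-contained sketch with hand-written sample keypoints, not the library's implementation.

```typescript
interface Vector2D { x: number; y: number; }
interface Keypoint { score: number; position: Vector2D; part: string; }

// Sketch of the bounding-box computation: extremes of keypoint positions.
function boundingBox(keypoints: Keypoint[]) {
  const xs = keypoints.map((kp) => kp.position.x);
  const ys = keypoints.map((kp) => kp.position.y);
  return {
    minX: Math.min(...xs),
    maxX: Math.max(...xs),
    minY: Math.min(...ys),
    maxY: Math.max(...ys),
  };
}

// Hypothetical keypoints for illustration.
const keypoints: Keypoint[] = [
  { score: 0.9, position: { x: 50, y: 40 }, part: 'nose' },
  { score: 0.8, position: { x: 30, y: 120 }, part: 'leftWrist' },
  { score: 0.7, position: { x: 90, y: 200 }, part: 'rightAnkle' },
];

console.log(boundingBox(keypoints)); // minX 30, maxX 90, minY 40, maxY 200
```

In application code, `getBoundingBox(pose.keypoints)` returns this shape directly.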

### Keypoint System and Body Parts

Constants and data structures defining the 17-point human skeleton model used by PoseNet.

```typescript { .api }
const partNames: string[];
const partIds: {[jointName: string]: number};
const poseChain: [string, string][];
const partChannels: string[];
```

[Keypoint System and Body Parts](./keypoints.md)
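
The sketch below hard-codes the 17 part names (assumed here to match the library's `partNames` constant) and derives a `partIds`-style lookup from them, mirroring how keypoint order corresponds to part indices in a returned pose:

```typescript
// The 17 part names of PoseNet's skeleton model (assumed to match
// the library's partNames constant).
const partNames: string[] = [
  'nose', 'leftEye', 'rightEye', 'leftEar', 'rightEar',
  'leftShoulder', 'rightShoulder', 'leftElbow', 'rightElbow',
  'leftWrist', 'rightWrist', 'leftHip', 'rightHip',
  'leftKnee', 'rightKnee', 'leftAnkle', 'rightAnkle',
];

// partIds maps each name to its index, which is the position of that
// keypoint in a returned pose's keypoints array.
const partIds: { [jointName: string]: number } = {};
partNames.forEach((name, i) => { partIds[name] = i; });

console.log(partNames.length);        // 17
console.log(partIds['leftShoulder']); // 5
```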

### Advanced APIs and Low-Level Functions

Low-level decoding functions and neural network classes for custom pose estimation pipelines and advanced use cases.

```typescript { .api }
function decodeSinglePose(heatmapScores: tf.Tensor3D, offsets: tf.Tensor3D, outputStride: PoseNetOutputStride): Promise<Pose>;
function decodeMultiplePoses(scoresBuffer: TensorBuffer3D, offsetsBuffer: TensorBuffer3D, displacementsFwdBuffer: TensorBuffer3D, displacementsBwdBuffer: TensorBuffer3D, outputStride: number, maxPoseDetections: number, scoreThreshold?: number, nmsRadius?: number): Pose[];
class MobileNet extends BaseModel;
```

[Advanced APIs and Low-Level Functions](./advanced-apis.md)

### Package Information

Version information and package metadata.

```typescript { .api }
const version: string;
```

## Core Types

```typescript { .api }
interface Pose {
  keypoints: Keypoint[];
  score: number;
}

interface Keypoint {
  score: number;
  position: Vector2D;
  part: string;
}

interface Vector2D {
  x: number;
  y: number;
}

interface Padding {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

type PosenetInput = ImageData | HTMLImageElement | HTMLCanvasElement | HTMLVideoElement | tf.Tensor3D;

type PoseNetArchitecture = 'ResNet50' | 'MobileNetV1';
type PoseNetOutputStride = 32 | 16 | 8;
type PoseNetQuantBytes = 1 | 2 | 4;
type MobileNetMultiplier = 0.50 | 0.75 | 1.0;
type InputResolution = number | {width: number, height: number};
type TensorBuffer3D = tf.TensorBuffer<tf.Rank.R3>;
```