# Profiling

Continuous profiling provides deep insight into application performance through CPU usage analysis, memory allocation tracking, lock contention monitoring, and wall time measurement. The profiler helps identify performance bottlenecks, memory leaks, and resource consumption patterns in production applications.

## Capabilities

### Profiler Management

The main `Profiler` class manages the lifecycle of continuous profiling and data collection across multiple profiling dimensions.

```python { .api }
from typing import Dict, Optional

class Profiler:
    def __init__(
        self,
        service: Optional[str] = None,
        env: Optional[str] = None,
        version: Optional[str] = None,
        tags: Optional[Dict[str, str]] = None
    ):
        """
        Initialize a new profiler instance.

        Parameters:
        - service: Service name to profile (defaults to the global service)
        - env: Environment name (e.g., 'production', 'staging')
        - version: Application version
        - tags: Additional tags to attach to profiles
        """

    def start(self, stop_on_exit: bool = True, profile_children: bool = True) -> None:
        """
        Start the profiler and begin collecting profile data.

        Parameters:
        - stop_on_exit: Whether to automatically stop profiling on program exit
        - profile_children: Whether to start a profiler in child processes
        """

    def stop(self, flush: bool = True) -> None:
        """
        Stop the profiler and optionally flush remaining data.

        Parameters:
        - flush: Whether to upload any remaining profile data
        """
```
Usage examples:

```python
from ddtrace.profiling import Profiler

# Basic profiler setup
profiler = Profiler(
    service="web-service",
    env="production",
    version="2.1.0",
    tags={"team": "backend", "region": "us-east-1"}
)

# Start profiling
profiler.start()

# Your application code runs here
run_application()

# Stop profiling (optional - stops automatically on exit)
profiler.stop()
```

### Profile Collection Types

The profiler automatically collects multiple types of performance data:

#### CPU Profiling

Tracks CPU usage and call stack samples to identify performance hotspots:

```python
# CPU profiling is enabled by default
profiler = Profiler()
profiler.start()

# The profiler samples call stacks periodically
# and identifies the functions consuming the most CPU time
```

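To see what a CPU profile will highlight, you can run a deliberately CPU-bound function and expect most samples to land in it. This is a plain-Python sketch independent of ddtrace; `busy_sum` is an illustrative name:

```python
def busy_sum(n):
    # CPU-bound loop: a sampling CPU profiler attributes most samples here
    total = 0
    for i in range(n):
        total += i * i
    return total

result = busy_sum(1_000_000)
print(result)
```
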
#### Memory Profiling

Monitors memory allocation patterns and tracks memory usage over time:

```python
# Memory profiling tracks allocations and can help identify memory leaks
profiler = Profiler()
profiler.start()

# Large allocations and allocation patterns are captured
large_data = allocate_large_dataset()
process_data(large_data)
```

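For quick local inspection of allocation behavior, independent of the ddtrace profiler, the standard library's `tracemalloc` gives comparable numbers; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()
data = [bytes(1024) for _ in range(1000)]  # allocate roughly 1 MB
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# `current` is traced memory still referenced; `peak` is the high-water mark
print(current, peak)
```
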
#### Lock Contention Profiling

Identifies threading bottlenecks and synchronization issues:

```python
import threading

profiler = Profiler()
profiler.start()

# Lock contention is detected automatically
lock = threading.Lock()
with lock:
    # Critical section - time spent waiting for the lock is measured
    shared_resource_operation()
```

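When validating lock-contention profiling, it helps to generate contention deliberately; two threads competing for one lock suffice. A plain stdlib sketch (`worker` is an illustrative name):

```python
import threading
import time

lock = threading.Lock()
wait_times = []

def worker():
    start = time.perf_counter()
    with lock:
        # Record how long this thread waited to acquire the lock
        wait_times.append(time.perf_counter() - start)
        time.sleep(0.1)  # hold the lock so the other thread must wait

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One thread acquires immediately; the other waits roughly 0.1s
print(sorted(wait_times))
```
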
#### Wall Time Profiling

Measures total elapsed time, including I/O wait and other blocking operations:

```python
profiler = Profiler()
profiler.start()

# Wall time profiling captures total execution time,
# including time spent blocked on I/O
with open('large-file.txt') as f:
    data = f.read()  # I/O wait time is captured
process_file_data(data)
```

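The distinction wall-time profiling adds over CPU profiling can be seen with the standard library alone: wall-clock time includes a blocking sleep, while CPU time does not. A minimal sketch:

```python
import time

def blocking_call():
    time.sleep(0.2)  # blocks without consuming CPU

wall_start = time.perf_counter()
cpu_start = time.process_time()
blocking_call()
wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

# Wall time includes the sleep; CPU time does not
print(f"wall: {wall_elapsed:.2f}s, cpu: {cpu_elapsed:.2f}s")
```
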
### Advanced Configuration

#### Environment-based Configuration

The profiler can also be configured through environment variables:

```python
import os

# Configure via environment variables (set these before
# the profiler is created)
os.environ['DD_PROFILING_ENABLED'] = 'true'
os.environ['DD_SERVICE'] = 'my-python-service'
os.environ['DD_ENV'] = 'production'
os.environ['DD_VERSION'] = '1.2.3'

# The profiler picks up the environment configuration automatically
profiler = Profiler()
profiler.start()
```

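These variables are more commonly set in the process environment than in code; for example, with `ddtrace-run` (the `app.py` entrypoint is illustrative):

```shell
export DD_PROFILING_ENABLED=true
export DD_SERVICE=my-python-service
export DD_ENV=production
export DD_VERSION=1.2.3

# ddtrace-run picks up the configured instrumentation automatically
ddtrace-run python app.py
```
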
#### Custom Profiling Configuration

```python
import time

# Advanced profiler configuration
profiler = Profiler(
    service="api-server",
    env="staging",
    version="1.0.0-beta",
    tags={
        "datacenter": "us-west-2",
        "instance_type": "c5.large",
        "deployment": "canary"
    }
)

# Disable automatic shutdown to control the lifecycle manually
profiler.start(stop_on_exit=False)

# Manual control over the profiler lifecycle
while application_running():
    time.sleep(60)  # profiles are uploaded periodically in the background

profiler.stop(flush=True)
```

### Integration with Tracing

Profiling data is automatically correlated with distributed traces when both profiling and tracing are enabled:

```python
from ddtrace import tracer
from ddtrace.profiling import Profiler

# Enable profiling alongside tracing
profiler = Profiler()
profiler.start()

# Traces and profiles are correlated automatically
with tracer.trace("expensive-operation") as span:
    span.set_tag("operation.type", "data-processing")

    # This operation will appear in both:
    # 1. The distributed trace (timing and metadata)
    # 2. The profiling data (CPU/memory usage details)
    cpu_intensive_operation()
    memory_intensive_operation()
```

### Profiling in Production

#### Performance Impact

The profiler is designed for production use with minimal overhead:

```python
# Production-ready profiler setup
profiler = Profiler(
    service="production-api",
    env="production",
    version="2.3.1"
)

# Start with the default, production-optimized settings
profiler.start()

# Profiler overhead is typically <2% CPU and <1% memory,
# so it is safe to run continuously in production
```

#### Error Handling

```python
from ddtrace.profiling import Profiler

profiler = None
try:
    profiler = Profiler()
    profiler.start()

    # Application code
    run_application()

except Exception as e:
    # Profiler errors don't affect application execution
    print(f"Profiler error (non-fatal): {e}")

finally:
    # Always try to stop cleanly; the profiler may never have been created
    if profiler is not None:
        try:
            profiler.stop()
        except Exception:
            pass  # Ignore cleanup errors
```

### Profiling Best Practices

#### Service Identification

```python
# Use descriptive service names in multi-service applications,
# and create only the profiler this process actually needs
if is_web_server():
    profiler = Profiler(service="web-frontend")
elif is_api_server():
    profiler = Profiler(service="api-backend")
else:
    profiler = Profiler(service="background-worker")

profiler.start()
```

#### Contextual Tagging

```python
# Add context-specific tags for richer profiling insights
profiler = Profiler(
    service="payment-processor",
    tags={
        "payment_provider": "stripe",
        "region": get_deployment_region(),
        "instance_id": get_instance_id(),
        "version": get_application_version()
    }
)
profiler.start()
```

#### Development vs Production

```python
import os

# Different configurations for different environments
if os.environ.get('ENVIRONMENT') == 'development':
    profiler = Profiler(env="development")
else:
    # Standard production configuration
    profiler = Profiler(env="production")

profiler.start()
```

## Profile Data Analysis

The collected profiling data appears in the Datadog UI and provides insight into:

- **CPU Hotspots**: Functions consuming the most CPU time
- **Memory Allocation Patterns**: Which functions allocate the most memory
- **Lock Contention**: Threading bottlenecks and synchronization issues
- **I/O Wait Times**: Blocking operations and external service dependencies
- **Garbage Collection Impact**: Memory management overhead

This data is automatically correlated with:

- Distributed traces (when tracing is enabled)
- Application logs (when log correlation is configured)
- Infrastructure metrics (when the Datadog Agent is deployed)

## Troubleshooting

If profiles are not appearing, first verify the basic profiler lifecycle and confirm that `stop()` flushes without raising:

```python
# Basic profiler lifecycle check
profiler = Profiler()
profiler.start()

# Your application code here
run_application()

# Stop the profiler and surface any flush errors
try:
    profiler.stop(flush=True)
    print("Profiler stopped and flushed successfully")
except Exception as e:
    print(f"Profiler stop failed: {e}")
```