File type identification using libmagic
npx @tessl/cli install tessl/pypi-python-magic@0.4.00
# python-magic

A Python interface to the libmagic file type identification library. python-magic identifies file types by checking file headers against libmagic's database of known patterns, and offers both simple convenience functions and finer control through the `Magic` class.

## Package Information

- **Package Name**: python-magic
- **Language**: Python
- **Installation**: `pip install python-magic`
- **System Requirements**: libmagic C library (`sudo apt-get install libmagic1` on Ubuntu/Debian)

## Core Imports

```python
import magic
```

For convenience functions:

```python
from magic import from_file, from_buffer, from_descriptor
```

For advanced usage:

```python
from magic import Magic, MagicException
```

For the compatibility layer:

```python
from magic import compat
# or deprecated functions directly
# (note: importing `open` this way shadows the built-in open() in your module)
from magic import detect_from_filename, detect_from_content, detect_from_fobj, open
```

## Basic Usage

```python
import magic

# Simple file type detection
file_type = magic.from_file('document.pdf')
print(file_type)  # 'PDF document, version 1.4'

# Get MIME type instead
mime_type = magic.from_file('document.pdf', mime=True)
print(mime_type)  # 'application/pdf'

# Detect from file contents
with open('document.pdf', 'rb') as f:
    content = f.read(2048)  # Read first 2KB for accurate detection
file_type = magic.from_buffer(content)
print(file_type)  # 'PDF document, version 1.4'

# Advanced usage with Magic class
m = magic.Magic(uncompress=True)  # Look inside compressed files
file_type = m.from_file('archive.tar.gz')
print(file_type)  # Shows content type, not just compression
```

## Architecture

python-magic wraps the libmagic C library through ctypes, providing:

- **Convenience Functions**: Simple one-line detection for common use cases
- **Magic Class**: Advanced control with customizable flags and parameters
- **Thread Safety**: Magic instances use internal locking for safe concurrent access
- **Cross-platform**: Automatic library loading for Windows, macOS, and Linux
- **Compatibility Layer**: Optional support for libmagic's native Python bindings

## Capabilities

### File Type Detection from File Path

Identifies file types by examining files on disk.

```python { .api }
def from_file(filename, mime=False):
    """
    Detect filetype from filename.

    Args:
        filename (str | PathLike): Path to file to analyze
        mime (bool): Return MIME type if True, human-readable description if False

    Returns:
        str: File type description or MIME type

    Raises:
        IOError: If file cannot be accessed
        MagicException: If libmagic encounters an error
    """
```

### File Type Detection from Buffer

Identifies file types from file content in memory.

```python { .api }
def from_buffer(buffer, mime=False):
    """
    Detect filetype from file content buffer.

    Args:
        buffer (bytes | str): File content to analyze (recommend ≥2048 bytes)
        mime (bool): Return MIME type if True, human-readable description if False

    Returns:
        str: File type description or MIME type

    Raises:
        MagicException: If libmagic encounters an error
    """
```

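Because `from_buffer` only needs the beginning of a file, a caller can read a fixed-size prefix instead of loading the whole file. The sketch below assumes a hypothetical helper named `sniff_head`; neither the helper nor its parameter names are part of python-magic. The detector is passed in as a callable so the example runs with `magic.from_buffer` or with any stand-in.

```python
from typing import Callable


def sniff_head(path: str, detect: Callable[[bytes], str], head_bytes: int = 2048) -> str:
    """Read at most head_bytes from path and hand them to detect.

    Hypothetical helper, not part of python-magic. In real use, detect
    could be: lambda data: magic.from_buffer(data, mime=True)
    """
    with open(path, 'rb') as f:
        head = f.read(head_bytes)  # never reads more than the requested prefix
    return detect(head)
```

With python-magic installed, one way to call it would be `sniff_head('document.pdf', lambda data: magic.from_buffer(data, mime=True))`.
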
### File Type Detection from File Descriptor

Identifies file types from open file descriptors.

```python { .api }
def from_descriptor(fd, mime=False):
    """
    Detect filetype from file descriptor.

    Args:
        fd (int): File descriptor number
        mime (bool): Return MIME type if True, human-readable description if False

    Returns:
        str: File type description or MIME type

    Raises:
        MagicException: If libmagic encounters an error
    """
```

### Advanced Magic Class

Provides direct control over libmagic with customizable flags and parameters.

```python { .api }
class Magic:
    """
    Advanced wrapper around libmagic with customizable behavior.

    Thread-safe class for fine-grained control over file type detection.
    """

    def __init__(self, mime=False, magic_file=None, mime_encoding=False,
                 keep_going=False, uncompress=False, raw=False, extension=False):
        """
        Create Magic instance with custom configuration.

        Args:
            mime (bool): Return MIME types instead of descriptions
            magic_file (str, optional): Path to custom magic database file
            mime_encoding (bool): Return character encoding information
            keep_going (bool): Continue processing after first match
            uncompress (bool): Look inside compressed files
            raw (bool): Don't decode non-printable characters
            extension (bool): Return file extensions (requires libmagic ≥524)

        Raises:
            NotImplementedError: If extension=True but libmagic version < 524
            ImportError: If libmagic library cannot be loaded
        """

    def from_file(self, filename):
        """
        Identify file type from file path.

        Args:
            filename (str | PathLike): Path to file

        Returns:
            str: File type information based on instance configuration

        Raises:
            IOError: If file cannot be accessed
            MagicException: If libmagic encounters an error
        """

    def from_buffer(self, buf):
        """
        Identify file type from buffer content.

        Args:
            buf (bytes | str): File content to analyze

        Returns:
            str: File type information based on instance configuration

        Raises:
            MagicException: If libmagic encounters an error
        """

    def from_descriptor(self, fd):
        """
        Identify file type from file descriptor.

        Args:
            fd (int): File descriptor number

        Returns:
            str: File type information based on instance configuration

        Raises:
            MagicException: If libmagic encounters an error
        """

    def setparam(self, param, val):
        """
        Set libmagic parameter.

        Args:
            param (int): Parameter constant (MAGIC_PARAM_*)
            val (int): Parameter value

        Returns:
            int: 0 on success, -1 on failure

        Raises:
            NotImplementedError: If libmagic doesn't support parameters
        """

    def getparam(self, param):
        """
        Get libmagic parameter value.

        Args:
            param (int): Parameter constant (MAGIC_PARAM_*)

        Returns:
            int: Current parameter value

        Raises:
            NotImplementedError: If libmagic doesn't support parameters
        """
```

### Compatibility Layer Magic Class

The compat module provides an alternative Magic class interface that matches libmagic's native Python bindings.

```python { .api }
class Magic:
    """
    Compatibility Magic class for libmagic's native Python bindings.

    This class provides lower-level access to libmagic functionality
    and is included for compatibility with existing code.
    """

    def __init__(self, ms):
        """
        Initialize Magic object with magic_t pointer.

        Args:
            ms: Magic structure pointer from magic_open()
        """

    def close(self):
        """
        Close the magic database and deallocate resources.

        Must be called to properly clean up the magic object.
        """

    def file(self, filename):
        """
        Get file type description from filename.

        Args:
            filename (str | bytes): Path to file to analyze

        Returns:
            str | None: File type description or None if error occurred
        """

    def descriptor(self, fd):
        """
        Get file type description from file descriptor.

        Args:
            fd (int): File descriptor number

        Returns:
            str | None: File type description or None if error occurred
        """

    def buffer(self, buf):
        """
        Get file type description from buffer content.

        Args:
            buf (bytes): File content to analyze

        Returns:
            str | None: File type description or None if error occurred
        """

    def error(self):
        """
        Get textual description of last error.

        Returns:
            str | None: Error description or None if no error
        """

    def setflags(self, flags):
        """
        Set flags controlling magic behavior.

        Args:
            flags (int): Bitwise OR of magic flags

        Returns:
            int: 0 on success, -1 on failure
        """

    def load(self, filename=None):
        """
        Load magic database from file.

        Args:
            filename (str, optional): Database file path, None for default

        Returns:
            int: 0 on success, -1 on failure
        """

    def compile(self, dbs):
        """
        Compile magic database files.

        Args:
            dbs (str): Colon-separated list of database files

        Returns:
            int: 0 on success, -1 on failure
        """

    def check(self, dbs):
        """
        Check validity of magic database files.

        Args:
            dbs (str): Colon-separated list of database files

        Returns:
            int: 0 on success, -1 on failure
        """

    def list(self, dbs):
        """
        List entries in magic database files.

        Args:
            dbs (str): Colon-separated list of database files

        Returns:
            int: 0 on success, -1 on failure
        """

    def errno(self):
        """
        Get numeric error code from last operation.

        Returns:
            int: 0 for internal error, non-zero for OS error code
        """
```

### Compatibility Layer Factory Function

Factory function for creating compat Magic objects.

```python { .api }
def open(flags):
    """
    Create Magic object for compatibility layer.

    Args:
        flags (int): Magic flags to use (MAGIC_* constants)

    Returns:
        Magic: Magic instance from compat module

    Example:
        from magic import compat
        m = compat.open(compat.MAGIC_MIME)
        m.load()
        result = m.file('document.pdf')
        m.close()
    """
```

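Since `close()` must be called to release the handle, the open/load/file/close sequence from the docstring example can be wrapped so that cleanup happens even when a step fails. This is a minimal sketch using the standard library's `contextlib.closing`. `describe_with_compat` and its `open_magic` parameter are hypothetical names; the factory is passed in (in real use it would be `magic.compat.open`) so the sketch does not require libmagic to run.

```python
from contextlib import closing


def describe_with_compat(path, open_magic, flags=0):
    """Run open -> load -> file -> close with guaranteed cleanup.

    Hypothetical helper. open_magic is any factory shaped like
    compat.open(flags); flags=0 corresponds to MAGIC_NONE.
    """
    with closing(open_magic(flags)) as m:
        # load() and file() signal failure through return values, not exceptions
        if m.load() != 0:
            raise RuntimeError(m.error() or "could not load magic database")
        result = m.file(path)
        if result is None:
            raise RuntimeError(m.error() or "detection failed")
        return result
```

Raising on a `None` result is a design choice of this sketch; code that prefers the original return-value style can return `None` instead.
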
### Version Information

Get libmagic version information.

```python { .api }
def version():
    """
    Get libmagic version number.

    Returns:
        int: libmagic version number

    Raises:
        NotImplementedError: If version detection not supported
    """
```

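The integer returned by `version()` appears to pack the major and minor release numbers together, so that 524 would correspond to libmagic 5.24. Assuming that encoding holds, a small helper can make version checks easier to read. `split_version` and `supports_extension` are hypothetical names, not part of python-magic.

```python
def split_version(packed: int) -> tuple:
    """Split a packed version such as 524 into (5, 24).

    Assumes the value is encoded as major * 100 + minor.
    """
    return divmod(packed, 100)


def supports_extension(packed: int) -> bool:
    """The extension option is documented as needing version 524 or later."""
    return packed >= 524


print(split_version(524))       # (5, 24)
print(supports_extension(523))  # False
```
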
### Exception Handling

Custom exception for magic-related errors.

```python { .api }
class MagicException(Exception):
    """
    Exception raised by libmagic operations.

    Attributes:
        message (str): Error description from libmagic
    """

    def __init__(self, message):
        """
        Create MagicException with error message.

        Args:
            message (str): Error description
        """
```

## Configuration Constants

### Basic Flags

```python { .api }
MAGIC_NONE = 0x000000           # No special behavior
MAGIC_DEBUG = 0x000001          # Turn on debugging output
MAGIC_SYMLINK = 0x000002        # Follow symbolic links
MAGIC_COMPRESS = 0x000004       # Check inside compressed files
MAGIC_DEVICES = 0x000008        # Look at device file contents
MAGIC_MIME_TYPE = 0x000010      # Return MIME type string
MAGIC_MIME_ENCODING = 0x000400  # Return MIME encoding
MAGIC_MIME = 0x000010           # Return MIME type (same as MIME_TYPE)
MAGIC_EXTENSION = 0x1000000     # Return file extensions
MAGIC_CONTINUE = 0x000020       # Return all matches, not just first
MAGIC_CHECK = 0x000040          # Print warnings to stderr
MAGIC_PRESERVE_ATIME = 0x000080 # Restore access time after reading
MAGIC_RAW = 0x000100            # Don't decode non-printable characters
MAGIC_ERROR = 0x000200          # Handle ENOENT as real error
```

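Each flag occupies its own bit, so several behaviours can be requested at once by combining flags with bitwise OR. The sketch below repeats a few constants from the table so that it runs without libmagic. `build_flags` is a hypothetical helper showing how the keyword options of the `Magic` class could plausibly map onto flags; it is an illustration, not the library's actual code.

```python
# Values copied from the table above
MAGIC_NONE = 0x000000
MAGIC_COMPRESS = 0x000004
MAGIC_MIME_TYPE = 0x000010
MAGIC_CONTINUE = 0x000020
MAGIC_RAW = 0x000100
MAGIC_MIME_ENCODING = 0x000400
MAGIC_EXTENSION = 0x1000000


def build_flags(mime=False, mime_encoding=False, keep_going=False,
                uncompress=False, raw=False, extension=False):
    """Translate boolean options into a single flag word (assumed mapping)."""
    flags = MAGIC_NONE
    if mime:
        flags |= MAGIC_MIME_TYPE
    if mime_encoding:
        flags |= MAGIC_MIME_ENCODING
    if keep_going:
        flags |= MAGIC_CONTINUE
    if uncompress:
        flags |= MAGIC_COMPRESS
    if raw:
        flags |= MAGIC_RAW
    if extension:
        flags |= MAGIC_EXTENSION
    return flags


def has_flag(flags, flag):
    """True when every bit of flag is set in flags."""
    return (flags & flag) == flag


combined = build_flags(mime=True, mime_encoding=True)
print(hex(combined))  # 0x410
```
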
### No-Check Flags

```python { .api }
MAGIC_NO_CHECK_COMPRESS = 0x001000  # Don't check compressed files
MAGIC_NO_CHECK_TAR = 0x002000       # Don't check tar files
MAGIC_NO_CHECK_SOFT = 0x004000      # Don't check magic entries
MAGIC_NO_CHECK_APPTYPE = 0x008000   # Don't check application type
MAGIC_NO_CHECK_ELF = 0x010000       # Don't check ELF details
MAGIC_NO_CHECK_ASCII = 0x020000     # Don't check ASCII files
MAGIC_NO_CHECK_TROFF = 0x040000     # Don't check ASCII/troff
MAGIC_NO_CHECK_FORTRAN = 0x080000   # Don't check ASCII/fortran
MAGIC_NO_CHECK_TOKENS = 0x100000    # Don't check ASCII/tokens
```

### Additional Flags (Compatibility Layer)

```python { .api }
MAGIC_APPLE = 2048                 # Return Apple creator/type
MAGIC_NO_CHECK_TEXT = 131072       # Don't check text files
MAGIC_NO_CHECK_CDF = 262144        # Don't check CDF files
MAGIC_NO_CHECK_ENCODING = 2097152  # Don't check for text encoding
MAGIC_NO_CHECK_BUILTIN = 4173824   # Don't use built-in tests
```

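The decimal value 4173824 is easier to read in hexadecimal (0x3FB000): it works out to the OR of several individual no-check bits. The sketch below checks that arithmetic using only values copied from the tables in this document. Which tests each bit actually disables depends on the libmagic version, so the names here are labels for bits rather than a statement about behaviour.

```python
# Bit values copied from the No-Check and Additional Flags tables
NO_CHECK_BITS = {
    'MAGIC_NO_CHECK_COMPRESS': 0x001000,
    'MAGIC_NO_CHECK_TAR': 0x002000,
    'MAGIC_NO_CHECK_SOFT': 0x004000,
    'MAGIC_NO_CHECK_APPTYPE': 0x008000,
    'MAGIC_NO_CHECK_ELF': 0x010000,
    'MAGIC_NO_CHECK_TEXT': 131072,       # 0x020000, same bit as MAGIC_NO_CHECK_ASCII
    'MAGIC_NO_CHECK_CDF': 262144,        # 0x040000, same bit as MAGIC_NO_CHECK_TROFF
    'MAGIC_NO_CHECK_FORTRAN': 0x080000,
    'MAGIC_NO_CHECK_TOKENS': 0x100000,
    'MAGIC_NO_CHECK_ENCODING': 2097152,  # 0x200000
}
MAGIC_NO_CHECK_BUILTIN = 4173824


def bits_set(value, table):
    """Return the names in table whose bit is set in value."""
    return [name for name, bit in table.items() if value & bit]


included = bits_set(MAGIC_NO_CHECK_BUILTIN, NO_CHECK_BITS)
excluded = [name for name in NO_CHECK_BITS if name not in included]

print(hex(MAGIC_NO_CHECK_BUILTIN))  # 0x3fb000
print(excluded)                     # ['MAGIC_NO_CHECK_SOFT']
```
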
### Parameter Constants

```python { .api }
MAGIC_PARAM_INDIR_MAX = 0      # Recursion limit for indirect magic
MAGIC_PARAM_NAME_MAX = 1       # Use count limit for name/use magic
MAGIC_PARAM_ELF_PHNUM_MAX = 2  # Max ELF program sections processed
MAGIC_PARAM_ELF_SHNUM_MAX = 3  # Max ELF sections processed
MAGIC_PARAM_ELF_NOTES_MAX = 4  # Max ELF notes processed
MAGIC_PARAM_REGEX_MAX = 5      # Length limit for regex searches
MAGIC_PARAM_BYTES_MAX = 6      # Max bytes to read from file
```

### Compatibility Functions (Deprecated)

These functions provide compatibility with libmagic's native Python bindings but generate deprecation warnings.

```python { .api }
def detect_from_filename(filename):
    """
    Detect file type from filename (compatibility function).

    Args:
        filename (str): Path to file

    Returns:
        FileMagic: Named tuple with mime_type, encoding, and name fields

    Warnings:
        PendingDeprecationWarning: This function is deprecated
    """

def detect_from_content(byte_content):
    """
    Detect file type from bytes (compatibility function).

    Args:
        byte_content (bytes): File content to analyze

    Returns:
        FileMagic: Named tuple with mime_type, encoding, and name fields

    Warnings:
        PendingDeprecationWarning: This function is deprecated
    """

def detect_from_fobj(fobj):
    """
    Detect file type from file object (compatibility function).

    Args:
        fobj: File-like object with fileno() method

    Returns:
        FileMagic: Named tuple with mime_type, encoding, and name fields

    Warnings:
        PendingDeprecationWarning: This function is deprecated
    """

def open(flags):
    """
    Create Magic object (compatibility function).

    Args:
        flags (int): Magic flags to use

    Returns:
        Magic: Magic instance from compat module

    Warnings:
        PendingDeprecationWarning: This function is deprecated
    """
```

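Python's default warning filters ignore `PendingDeprecationWarning`, so these warnings usually stay invisible unless a filter enables them. The sketch below shows how such a warning can be surfaced and captured with the standard `warnings` module. `legacy_detect` is a stand-in defined here for illustration; it is not one of python-magic's functions.

```python
import warnings


def legacy_detect(data: bytes) -> str:
    """Stand-in for a deprecated compatibility function (hypothetical)."""
    warnings.warn("compatibility API in use", PendingDeprecationWarning, stacklevel=2)
    return "data"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, including pending deprecations
    result = legacy_detect(b"\x00")

categories = [w.category for w in caught]
print(categories)
```

During a migration, `warnings.simplefilter("error", PendingDeprecationWarning)` can be used instead to turn each such call into an exception.
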
### FileMagic Type

```python { .api }
FileMagic = namedtuple('FileMagic', ('mime_type', 'encoding', 'name'))
```

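As a named tuple, a `FileMagic` result supports attribute access, positional unpacking, and conversion to a dictionary. The field values below are made up for illustration; a real result would come from one of the `detect_from_*` functions.

```python
from collections import namedtuple

# Same definition as in the API listing above
FileMagic = namedtuple('FileMagic', ('mime_type', 'encoding', 'name'))

# Illustrative values only, not output captured from libmagic
result = FileMagic(mime_type='application/pdf', encoding='binary',
                   name='PDF document, version 1.4')

print(result.mime_type)             # application/pdf
mime_type, encoding, name = result  # unpack in field order
print(result._asdict()['name'])     # PDF document, version 1.4
```
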
## Usage Examples

### Working with Different File Types

```python
import magic

# The results shown are typical; exact strings depend on the installed
# libmagic version and its database. Detection is based on file content,
# not the extension, so text formats such as CSS or JavaScript may be
# reported as 'text/plain'.

# Web files
magic.from_file('page.html', mime=True)   # 'text/html'
magic.from_file('style.css', mime=True)   # 'text/css'
magic.from_file('script.js', mime=True)   # 'text/javascript'

# Images
magic.from_file('photo.jpg', mime=True)   # 'image/jpeg'
magic.from_file('icon.png', mime=True)    # 'image/png'
magic.from_file('drawing.svg', mime=True) # 'image/svg+xml'

# Documents
magic.from_file('report.pdf', mime=True)  # 'application/pdf'
magic.from_file('data.xlsx', mime=True)   # 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'

# Archives
magic.from_file('backup.zip', mime=True)    # 'application/zip'
magic.from_file('source.tar.gz', mime=True) # 'application/gzip'
```

### Advanced Detection with Magic Class

```python
import magic

# Create Magic instance for compressed files
m = magic.Magic(uncompress=True)

# This will look inside the compressed file
result = m.from_file('archive.tar.gz')
print(result)  # May show "POSIX tar archive" instead of just "gzip compressed data"

# Get both MIME type and encoding
m = magic.Magic(mime=True, mime_encoding=True)
result = m.from_file('document.txt')
print(result)  # 'text/plain; charset=utf-8'

# Get file extensions
try:
    m = magic.Magic(extension=True)
    extensions = m.from_file('image.jpg')
    print(extensions)  # 'jpeg/jpg/jpe/jfif'
except NotImplementedError:
    print("Extension detection requires libmagic version 524 or higher")
```

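The combined MIME/encoding result and the extension result are plain strings, so callers usually split them before use. This is a minimal sketch that assumes the `type; charset=value` and slash-separated shapes shown in the example comments above. `parse_mime_result` and `parse_extensions` are hypothetical helpers, not part of python-magic.

```python
def parse_mime_result(result: str):
    """Split 'text/plain; charset=utf-8' into ('text/plain', 'utf-8').

    Returns None for the charset when no charset parameter is present.
    """
    mime_type, _, params = result.partition(';')
    charset = None
    for param in params.split(';'):
        key, _, value = param.strip().partition('=')
        if key.lower() == 'charset' and value:
            charset = value
    return mime_type.strip(), charset


def parse_extensions(result: str):
    """Split 'jpeg/jpg/jpe/jfif' into a list of extensions."""
    return [ext for ext in result.split('/') if ext]


print(parse_mime_result('text/plain; charset=utf-8'))  # ('text/plain', 'utf-8')
print(parse_extensions('jpeg/jpg/jpe/jfif'))           # ['jpeg', 'jpg', 'jpe', 'jfif']
```
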
### Error Handling

```python
import magic
from magic import MagicException

try:
    # This will raise IOError if file doesn't exist
    result = magic.from_file('nonexistent.txt')
except IOError as e:
    print(f"File access error: {e}")
except MagicException as e:
    print(f"Magic detection error: {e.message}")

# Handling buffer detection errors
try:
    # Empty buffer might cause issues
    result = magic.from_buffer(b'')
except MagicException as e:
    print(f"Detection failed: {e.message}")
```

### Working with File Descriptors

```python
import sys

import magic

# Open file and get file descriptor
with open('document.pdf', 'rb') as f:
    fd = f.fileno()
    file_type = magic.from_descriptor(fd)
    print(file_type)  # Works while file is open

# Using with stdin/stdout/stderr
try:
    stdin_type = magic.from_descriptor(sys.stdin.fileno())
    print(f"stdin type: {stdin_type}")
except (OSError, ValueError, magic.MagicException):
    # fileno() can fail when stdin is closed or replaced; libmagic can fail on the descriptor
    print("stdin detection failed (may be redirected)")
```

### Custom Magic Database

```python
import magic

# Use custom magic database file
try:
    m = magic.Magic(magic_file='/path/to/custom.mgc')
    result = m.from_file('specialized_file.dat')
    print(result)
except magic.MagicException as e:
    # Assumption: a missing or invalid database surfaces as MagicException
    # when libmagic fails to load it. ImportError signals that the libmagic
    # library itself could not be loaded.
    print(f"Custom magic file not found or invalid: {e}")
```