# Utility Functions

Helper functions for file discovery, result filtering, output formatting, and parallel processing. These utilities support the core analysis functionality.

## Capabilities

### File Discovery

Functions for finding and filtering source files for analysis.

```python { .api }
def get_all_source_files(paths, exclude_patterns, lans):
    """
    Gets all source files from paths with exclusion patterns and language filtering.
    Includes gitignore support and recursive directory traversal.

    Args:
        paths (list): List of file or directory paths to search
        exclude_patterns (list): List of glob patterns to exclude from analysis
        lans (list): List of language names to filter (None for all languages)

    Returns:
        iterator: Iterator of filtered source file paths

    Example:
        files = get_all_source_files(
            ['src/', 'lib/'],
            ['*test*', '*.min.js', 'build/*'],
            ['python', 'javascript']
        )
        for filepath in files:
            print(f"Found: {filepath}")
    """
```
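
To make the effect of glob-style exclusion patterns concrete, the sketch below filters a list of candidate paths with the standard library's `fnmatch`. The helper `filter_excluded` is hypothetical and is not part of lizard; lizard's own matching may differ in detail.

```python
from fnmatch import fnmatch


def filter_excluded(paths, exclude_patterns):
    """Yield paths that match none of the glob patterns (hypothetical helper)."""
    for path in paths:
        if not any(fnmatch(path, pattern) for pattern in exclude_patterns):
            yield path


# Candidate paths are made up for illustration.
candidates = [
    "src/app.py",
    "src/test_app.py",
    "lib/vendor.min.js",
    "build/out.js",
]
kept = list(filter_excluded(candidates, ["*test*", "*.min.js", "build/*"]))
print(kept)
```

Note that `fnmatch` lets `*` match path separators, so a pattern such as `*test*` excludes any path containing `test`, not only file names.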

### Result Filtering

Functions for filtering analysis results based on thresholds and whitelists.

```python { .api }
def warning_filter(option, module_infos):
    """
    Filters functions that exceed specified thresholds.

    Args:
        option: Configuration object with threshold settings (CCN, length, etc.)
        module_infos: Iterator of FileInformation objects

    Returns:
        generator: Generator yielding functions exceeding thresholds

    Example:
        # options.CCN = 10, options.length = 50
        warnings = warning_filter(options, analysis_results)
        for func_info in warnings:
            print(f"Warning: {func_info.name} exceeds thresholds")
    """

def whitelist_filter(warnings, script=None, whitelist=None):
    """
    Filters warnings based on whitelist configuration.
    Removes warnings for functions/files specified in whitelist.

    Args:
        warnings: Iterator of warning objects
        script (str): Whitelist content as a string; when given, it is used
            instead of reading the whitelist file (optional)
        whitelist (str): Path to whitelist file (optional; see
            DEFAULT_WHITELIST, "whitelizard.txt")

    Returns:
        generator: Generator yielding filtered warnings not in whitelist

    Example:
        filtered_warnings = whitelist_filter(warnings, whitelist="ignore.txt")
        for warning in filtered_warnings:
            print(f"Genuine warning: {warning}")
    """
```
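
The threshold check described above can be sketched as a small generator. This is a minimal illustration under stated assumptions, not lizard's implementation: `sketch_warning_filter`, the `limits` dict, and the `SimpleNamespace` stand-ins are all hypothetical, and only the attribute names are taken from this document.

```python
from types import SimpleNamespace


def exceeds_any(func, limits):
    """Return True if any measured attribute is above its limit."""
    return any(getattr(func, attr) > limit for attr, limit in limits.items())


def sketch_warning_filter(limits, module_infos):
    """Yield functions that exceed at least one limit (illustrative only)."""
    for file_info in module_infos:
        for func in file_info.function_list:
            if exceeds_any(func, limits):
                yield func


# Stand-in objects that mimic the attribute names used in this document.
simple = SimpleNamespace(name="simple", cyclomatic_complexity=3, length=12)
tangled = SimpleNamespace(name="tangled", cyclomatic_complexity=14, length=20)
module = SimpleNamespace(filename="src/app.py", function_list=[simple, tangled])

limits = {"cyclomatic_complexity": 10, "length": 50}
flagged = [func.name for func in sketch_warning_filter(limits, [module])]
print(flagged)
```

Because the filter is a generator, nothing is evaluated until the caller iterates, and the result can be consumed only once.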

### File Hashing

Function for generating file hashes for duplicate detection.

```python { .api }
def md5_hash_file(full_path_name):
    """
    Calculates MD5 hash of a file for duplicate detection.

    Args:
        full_path_name (str): Full path to the file to hash

    Returns:
        str: MD5 hash string of file content

    Example:
        hash1 = md5_hash_file('src/file1.py')
        hash2 = md5_hash_file('src/file2.py')
        if hash1 == hash2:
            print("Files are identical")
    """
```
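
For readers unfamiliar with content hashing, here is a minimal sketch of computing an MD5 digest with the standard library's `hashlib`. The function `sketch_md5_of_file` is hypothetical; it hashes the raw bytes in chunks, which is an assumption, and lizard's own function may read or encode the file differently and so produce a different digest.

```python
import hashlib
import os
import tempfile


def sketch_md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write a small temporary file, hash it, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b"print('hello')\n")
    tmp_path = tmp.name

file_digest = sketch_md5_of_file(tmp_path)
os.remove(tmp_path)
print(file_digest)
```

MD5 is adequate for spotting identical files but should not be relied on where collision resistance matters.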

### Output Functions

Functions for formatting and displaying analysis results in different styles.

```python { .api }
def print_clang_style_warning(code_infos, option, scheme, _):
    """
    Prints warnings in clang/gcc compiler format.

    Args:
        code_infos: Iterator of code information objects
        option: Configuration options object
        scheme: Output formatting scheme
        _: Unused parameter (for interface compatibility)

    Returns:
        int: Number of warnings printed

    Example Output:
        src/app.py:25: warning: function has high complexity (15)
    """

def print_msvs_style_warning(code_infos, option, scheme, _):
    """
    Prints warnings in Microsoft Visual Studio format.

    Args:
        code_infos: Iterator of code information objects
        option: Configuration options object
        scheme: Output formatting scheme
        _: Unused parameter (for interface compatibility)

    Returns:
        int: Number of warnings printed

    Example Output:
        src/app.py(25) : warning: function has high complexity (15)
    """

def silent_printer(result, *_):
    """
    Silent printer that exhausts result iterator without output.
    Used for analysis without display output.

    Args:
        result: Iterator of results to consume
        *_: Additional arguments (ignored)

    Returns:
        int: Always returns 0

    Example:
        # Analyze without printing results
        exit_code = silent_printer(analysis_results)
    """
```
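
The two warning styles differ only in how the file name and line number are joined. The sketch below reproduces the example output lines shown in the docstrings above; `format_clang`, `format_msvs`, and `print_warnings` are hypothetical helpers, and the message text is illustrative rather than lizard's actual wording.

```python
def format_clang(filename, line, message):
    """Format as file:line: warning: message (clang/gcc style)."""
    return f"{filename}:{line}: warning: {message}"


def format_msvs(filename, line, message):
    """Format as file(line) : warning: message (Visual Studio style)."""
    return f"{filename}({line}) : warning: {message}"


def print_warnings(warnings, formatter):
    """Print each warning with the given formatter and return the count."""
    count = 0
    for filename, line, message in warnings:
        print(formatter(filename, line, message))
        count += 1
    return count


# Made-up warnings for illustration.
sample = [
    ("src/app.py", 25, "function has high complexity (15)"),
    ("src/util.py", 7, "function has high complexity (12)"),
]
printed = print_warnings(sample, format_clang)
```

Returning the count mirrors the documented printers, which lets a caller derive an exit status from the number of warnings.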

### Threading and Parallel Processing

Functions for running the analysis over many files in parallel, plus a helper for printing extension results.

```python { .api }
def map_files_to_analyzer(files, analyzer, working_threads):
    """
    Maps files to analyzer using appropriate threading method.

    Args:
        files: Iterator of file paths to analyze
        analyzer: FileAnalyzer instance to use for analysis
        working_threads (int): Number of workers to use (1 runs in the current process)

    Returns:
        iterator: Results from analyzing files

    Example:
        analyzer = FileAnalyzer([])
        files = ['app.py', 'utils.py', 'config.py']
        results = map_files_to_analyzer(files, analyzer, 4)
        for result in results:
            print(f"Analyzed: {result.filename}")
    """

def get_map_method(working_threads):
    """
    Returns appropriate mapping method based on thread count.

    Args:
        working_threads (int): Number of working threads

    Returns:
        function: Either multiprocessing.Pool.imap_unordered or built-in map

    Example:
        map_func = get_map_method(4)  # Returns pool.imap_unordered
        map_func = get_map_method(1)  # Returns built-in map
    """

def print_extension_results(extensions):
    """
    Prints results from analysis extensions that have print_result method.

    Args:
        extensions (list): List of extension objects

    Example:
        extensions = get_extensions(['wordcount', 'duplicate'])
        print_extension_results(extensions)
    """
```
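
The choice between built-in `map` and a pool's `imap_unordered` can be sketched as follows. This is an illustration under assumptions, not lizard's code: `map_with_workers` and `count_lines` are hypothetical, and `multiprocessing.pool.ThreadPool` is swapped in for `multiprocessing.Pool` so the snippet runs without pickling or `__main__`-guard concerns.

```python
from multiprocessing.pool import ThreadPool


def map_with_workers(func, items, workers):
    """Apply func to items, sequentially for one worker, pooled otherwise."""
    if workers == 1:
        # Built-in map keeps input order and adds no pool overhead.
        return list(map(func, items))
    # imap_unordered yields results as workers finish, so order may vary.
    with ThreadPool(processes=workers) as pool:
        return list(pool.imap_unordered(func, items))


def count_lines(text):
    """Stand-in for an analyzer: count the lines in a source string."""
    return len(text.splitlines())


sources = ["a\nb\n", "c\n", "d\ne\nf\n"]
sequential = map_with_workers(count_lines, sources, 1)
pooled = sorted(map_with_workers(count_lines, sources, 3))
print(sequential, pooled)
```

Since unordered mapping does not preserve input order, callers that need stable output should sort or key the results, as the `sorted` call does here.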

### Constants

Default configuration values used throughout the system.

```python { .api }
DEFAULT_CCN_THRESHOLD: int = 15
"""Default cyclomatic complexity threshold for warnings"""

DEFAULT_WHITELIST: str = "whitelizard.txt"
"""Default whitelist filename for filtering warnings"""

DEFAULT_MAX_FUNC_LENGTH: int = 1000
"""Default maximum function length threshold"""
```
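
One way such defaults are typically used is as fallbacks when a caller leaves a limit unset. The sketch below assumes that pattern; `resolve_limits` and its dict keys are hypothetical, and the constants are redefined locally (with the values documented above) so the snippet runs without lizard installed.

```python
# Values copied from the constants documented above.
DEFAULT_CCN_THRESHOLD = 15
DEFAULT_MAX_FUNC_LENGTH = 1000


def resolve_limits(ccn=None, length=None):
    """Fill in the documented defaults for limits the caller leaves unset."""
    return {
        "cyclomatic_complexity": DEFAULT_CCN_THRESHOLD if ccn is None else ccn,
        "length": DEFAULT_MAX_FUNC_LENGTH if length is None else length,
    }


default_limits = resolve_limits()
strict_limits = resolve_limits(ccn=8)
print(default_limits, strict_limits)
```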

## Usage Examples

### File Discovery with Filtering

```python
from lizard import get_all_source_files

# Find all Python and JavaScript files, excluding tests and build artifacts
source_files = get_all_source_files(
    paths=['src/', 'lib/', 'app/'],
    exclude_patterns=[
        '*test*',            # Exclude test files
        '*Test*',            # Exclude Test files
        '*/tests/*',         # Exclude tests directories
        '*/node_modules/*',  # Exclude npm dependencies
        '*/build/*',         # Exclude build artifacts
        '*.min.js',          # Exclude minified files
        '*/migrations/*',    # Exclude database migrations
    ],
    lans=['python', 'javascript']
)

print("Source files found:")
for filepath in source_files:
    print(f"  {filepath}")
```

### Threshold-Based Filtering

```python
import lizard
from lizard import warning_filter

# Create configuration with custom thresholds
class AnalysisOptions:
    def __init__(self):
        self.CCN = 8          # Complexity threshold
        self.length = 40      # Function length threshold
        self.arguments = 4    # Parameter count threshold
        self.nloc = 30        # Lines of code threshold
        # Assumption: some lizard versions read limits from a `thresholds`
        # dict keyed by function attribute name instead of the fields above.
        self.thresholds = {
            'cyclomatic_complexity': self.CCN,
            'length': self.length,
            'parameter_count': self.arguments,
            'nloc': self.nloc,
        }

options = AnalysisOptions()

# Analyze files
results = lizard.analyze(['src/'])

# Filter functions exceeding thresholds
warnings = warning_filter(options, results)

print("Functions exceeding thresholds:")
for func_info in warnings:
    # Collect every limit this function exceeds
    issues = []
    if func_info.cyclomatic_complexity > options.CCN:
        issues.append(f"CCN={func_info.cyclomatic_complexity}")
    if func_info.length > options.length:
        issues.append(f"Length={func_info.length}")
    if func_info.parameter_count > options.arguments:
        issues.append(f"Args={func_info.parameter_count}")
    if func_info.nloc > options.nloc:
        issues.append(f"NLOC={func_info.nloc}")

    print(f"  {func_info.name}: {', '.join(issues)}")
```

### Whitelist Filtering

```python
from types import SimpleNamespace

import lizard
from lizard import warning_filter, whitelist_filter

# Illustrative thresholds; warning_filter needs an options object.
# Assumption: the `thresholds` dict covers lizard versions that read
# limits from it rather than from the CCN/length attributes.
options = SimpleNamespace(
    CCN=10,
    length=50,
    thresholds={'cyclomatic_complexity': 10, 'length': 50},
)

# Create whitelist file
whitelist_content = """
# Ignore complex legacy functions
src/legacy.py:old_complex_function
src/legacy.py:another_complex_function

# Ignore generated code
src/generated/*

# Ignore specific patterns
*_test.py:*
"""

with open('project_whitelist.txt', 'w') as f:
    f.write(whitelist_content)

# Analyze and filter
results = lizard.analyze(['src/'])
warnings = warning_filter(options, results)
filtered_warnings = whitelist_filter(warnings, whitelist='project_whitelist.txt')

print("Warnings after whitelist filtering:")
for warning in filtered_warnings:
    print(f"  {warning.name} in {warning.filename}")
```

### File Duplicate Detection

```python
import os

from lizard import md5_hash_file

def find_duplicate_files(directory):
    """Find duplicate files by MD5 hash comparison."""
    file_hashes = {}  # Maps each hash to the first file seen with it
    duplicates = []

    for root, _dirs, files in os.walk(directory):
        for filename in files:
            if filename.endswith(('.py', '.js', '.java', '.cpp')):
                filepath = os.path.join(root, filename)
                try:
                    filehash = md5_hash_file(filepath)
                    if filehash in file_hashes:
                        duplicates.append((filepath, file_hashes[filehash]))
                    else:
                        file_hashes[filehash] = filepath
                except Exception as e:
                    print(f"Error hashing {filepath}: {e}")

    return duplicates

# Find duplicates in source directory
duplicates = find_duplicate_files('src/')
if duplicates:
    print("Duplicate files found:")
    for file1, file2 in duplicates:
        print(f"  {file1} == {file2}")
else:
    print("No duplicate files found")
```

### Custom Output Formatting

```python
import lizard
from lizard import print_clang_style_warning, print_msvs_style_warning

class CustomOptions:
    def __init__(self):
        self.CCN = 10
        self.length = 50

# Assumption: the printers accept any scheme object; the exact methods a
# scheme must provide depend on the lizard version in use.
class CustomScheme:
    def function_info(self, func):
        return f"{func.name}: CCN={func.cyclomatic_complexity}, NLOC={func.nloc}"

options = CustomOptions()
scheme = CustomScheme()

# Analyze code
results = lizard.analyze(['src/'])
warnings = lizard.warning_filter(options, results)

# Print warnings in different formats
print("Clang-style warnings:")
clang_count = print_clang_style_warning(warnings, options, scheme, None)

print(f"\nTotal warnings: {clang_count}")

# The first pass consumed the generator, so run the analysis again
warnings = lizard.warning_filter(options, lizard.analyze(['src/']))
print("\nVisual Studio-style warnings:")
msvs_count = print_msvs_style_warning(warnings, options, scheme, None)
```

### Silent Analysis

```python
import lizard
from lizard import silent_printer

# Perform analysis without output (for programmatic use)
results = lizard.analyze(['src/'])

# Count results without printing
result_list = list(results)
total_files = len(result_list)
total_functions = sum(len(fi.function_list) for fi in result_list)

print("Silent analysis complete:")
print(f"  Files analyzed: {total_files}")
print(f"  Functions found: {total_functions}")

# Use silent printer to consume iterator without output
results = lizard.analyze(['src/'])
exit_code = silent_printer(results)
print(f"Analysis exit code: {exit_code}")
```