# PMD C++ Language Module

The PMD C++ Language Module adds C++ support to PMD's Copy-Paste Detector (CPD). It enables duplicate-code detection in C++ source files by providing lexical analysis and tokenization tailored to C++ syntax, including preprocessor directive handling and configurable code block filtering.

## Package Information

- **Package Name**: pmd-cpp
- **Package Type**: maven
- **Language**: Java
- **Group ID**: net.sourceforge.pmd
- **Artifact ID**: pmd-cpp
- **Installation**: Add as Maven dependency:

```xml
<dependency>
    <groupId>net.sourceforge.pmd</groupId>
    <artifactId>pmd-cpp</artifactId>
    <version>7.13.0</version>
</dependency>
```

## Core Imports

```java
import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.lang.cpp.cpd.CppCpdLexer;
import net.sourceforge.pmd.lang.cpp.cpd.CppEscapeTranslator;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.cpd.CpdLanguageProperties;
```

## Basic Usage

```java
import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;

// Get the C++ language module instance
CppLanguageModule cppModule = CppLanguageModule.getInstance();

// Create a property bundle with default settings
LanguagePropertyBundle properties = cppModule.newPropertyBundle();

// Create a CPD lexer for C++ tokenization
CpdLexer lexer = cppModule.createCpdLexer(properties);

// The lexer can now be used to tokenize C++ source files for duplicate detection
```

## Architecture

The PMD C++ module is built around several key components:

- **CppLanguageModule**: Main entry point that extends PMD's language module framework
- **CppCpdLexer**: Core tokenizer that processes C++ source code into tokens for CPD analysis
- **CppEscapeTranslator**: Processes C++ backslash line continuations according to C++ standards
- **Internal Components**: Block skipping and preprocessing functionality handled internally
- **Token Processing**: Supports comprehensive C++ token recognition including literals, identifiers, operators, and comments

## Capabilities

### Language Module Management

Core language module functionality for registering and configuring C++ support within the PMD framework.

```java { .api }
public class CppLanguageModule extends CpdOnlyLanguageModuleBase {
    public CppLanguageModule();
    public static CppLanguageModule getInstance();
    public LanguagePropertyBundle newPropertyBundle();
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}
```

**Properties**:

```java { .api }
public static final PropertyDescriptor<String> CPD_SKIP_BLOCKS;
```

### C++ Tokenization

Advanced C++ lexical analysis with support for preprocessor directives, line continuations, and configurable filtering.

```java { .api }
public class CppCpdLexer extends JavaccCpdLexer {
    public CppCpdLexer(LanguagePropertyBundle cppProperties);
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
    protected TokenManager<JavaccToken> filterTokenStream(TokenManager<JavaccToken> tokenManager);
    protected void processToken(TokenFactory tokenEntries, JavaccToken currentToken);
}
```

### Escape Processing

Handles C++ backslash line continuation processing according to C++ language standards.

```java { .api }
public class CppEscapeTranslator extends BackslashEscapeTranslator {
    public CppEscapeTranslator(TextDocument input);
    protected int handleBackslash(int maxOff, int backSlashOff);
}
```

## Configuration Properties

The module supports several CPD configuration properties that can be set via the `LanguagePropertyBundle`:

### Skip Blocks Configuration

```java { .api }
// Skip code blocks matching start/end patterns (pipe-separated)
CppLanguageModule.CPD_SKIP_BLOCKS
```

**Default**: `#if 0|#endif`, which skips code disabled through `#if 0` conditional compilation

**Usage**: Set to an empty string to disable block skipping, or provide a custom `start|end` pattern
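
To make the `start|end` format concrete, the sketch below splits a pattern such as `#if 0|#endif` at the pipe and drops every line from one that begins with the start marker through the next line that begins with the end marker. It is a simplified, line-based illustration and not PMD's implementation; `SkipBlocksSketch` and its `filter` method are hypothetical names, and the handling of a pattern without a pipe is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, line-based illustration of a "start|end" skip pattern. */
final class SkipBlocksSketch {

    /** Returns the lines of source that lie outside start..end blocks. */
    static List<String> filter(String source, String skipPattern) {
        String[] lines = source.split("\\R", -1);
        if (skipPattern.isEmpty()) {
            // Empty pattern: block skipping is disabled, keep everything.
            return new ArrayList<>(Arrays.asList(lines));
        }
        int pipe = skipPattern.indexOf('|');
        // Assumption: without a pipe, the same marker opens and closes a block.
        String start = pipe < 0 ? skipPattern : skipPattern.substring(0, pipe);
        String end = pipe < 0 ? skipPattern : skipPattern.substring(pipe + 1);

        List<String> kept = new ArrayList<>();
        boolean skipping = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!skipping && trimmed.startsWith(start)) {
                skipping = true;           // start marker line is dropped
            } else if (skipping && trimmed.startsWith(end)) {
                skipping = false;          // end marker line is dropped too
            } else if (!skipping) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

With the pattern `#if 0|#endif`, only the lines outside the disabled block survive. This simple scan stops at the first end marker, so a nested `#if ... #endif` inside a skipped block would end the skip early.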
121
122
### Standard CPD Properties
123
124
```java { .api }
125
// Ignore literal sequences in duplicate detection
126
CpdLanguageProperties.CPD_IGNORE_LITERAL_SEQUENCES
127
128
// Ignore literal and identifier sequences
129
CpdLanguageProperties.CPD_IGNORE_LITERAL_AND_IDENTIFIER_SEQUENCES
130
131
// Replace identifiers with generic placeholders
132
CpdLanguageProperties.CPD_ANONYMIZE_IDENTIFIERS
133
134
// Replace literals with generic placeholders
135
CpdLanguageProperties.CPD_ANONYMIZE_LITERALS
136
```
137
138
## Supported File Extensions
139
140
The C++ language module automatically recognizes the following file extensions:
141
142
- `.h` - C/C++ header files
143
- `.hpp` - C++ header files
144
- `.hxx` - C++ header files
145
- `.c` - C source files
146
- `.cpp` - C++ source files
147
- `.cxx` - C++ source files
148
- `.cc` - C++ source files
149
- `.C` - C++ source files
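
As a rough illustration of how the list above could be applied when selecting files, the sketch below checks a file name against those extensions. `CppExtensionSketch` is a hypothetical helper, and the case-sensitive comparison (which keeps `.C` distinct from `.c`) is an assumption rather than documented PMD behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that matches file names against the extensions listed above. */
final class CppExtensionSketch {

    // Assumption: comparison is case-sensitive, so "C" and "c" are separate entries.
    static final Set<String> EXTENSIONS = new HashSet<>(
            Arrays.asList("h", "hpp", "hxx", "c", "cpp", "cxx", "cc", "C"));

    /** Returns true if the text after the last dot is one of the listed extensions. */
    static boolean isCppFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(fileName.substring(dot + 1));
    }
}
```

For example, `isCppFile("main.cpp")` and `isCppFile("Widget.C")` are accepted, while `isCppFile("notes.txt")` and a name without a dot such as `Makefile` are rejected.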

## Token Types

The lexer recognizes and processes the following C++ token categories:

### Literals
- **String literals**: Regular strings and raw strings
- **Character literals**: Single character constants
- **Numeric literals**: Decimal, hexadecimal, octal, binary integers and floating-point numbers

### Identifiers and Keywords
- **Identifiers**: Variable, function, and type names
- **C++ Keywords**: All standard C++ language keywords

### Operators and Punctuation
- **Arithmetic operators**: `+`, `-`, `*`, `/`, `%`, etc.
- **Logical operators**: `&&`, `||`, `!`, etc.
- **Comparison operators**: `==`, `!=`, `<`, `>`, `<=`, `>=`
- **Assignment operators**: `=`, `+=`, `-=`, etc.
- **Punctuation**: `{`, `}`, `(`, `)`, `[`, `]`, `;`, `,`, etc.

### Comments
- **Single-line comments**: `// comment text`
- **Multi-line comments**: `/* comment text */`
- **Preprocessor comments**: Comments within preprocessor directives

## Advanced Features

### Preprocessor Directive Handling
- Automatic skipping of preprocessor output
- Configurable conditional compilation block skipping
- Support for line continuation handling in preprocessor directives

### Line Continuation Processing
- Proper handling of backslash-newline sequences
- Support for both Unix (`\n`) and Windows (`\r\n`) line endings
- Maintains source location accuracy after escape processing
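
The splicing step described above can be sketched in a few lines: a backslash immediately followed by `\n` or `\r\n` is removed so that the two physical lines become one logical line. This is a minimal illustration under stated assumptions, not the code of `CppEscapeTranslator`; `LineContinuationSketch` is a hypothetical name, and the sketch does not record the offset mapping a real translator needs to keep source locations accurate.

```java
/** Hypothetical illustration of backslash-newline splicing. */
final class LineContinuationSketch {

    /** Removes every backslash that is directly followed by LF or CRLF, together with that line ending. */
    static String splice(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '\\' && i + 1 < source.length()) {
                char next = source.charAt(i + 1);
                if (next == '\n') {
                    i += 2;                // skip backslash + LF
                    continue;
                }
                if (next == '\r' && i + 2 < source.length()
                        && source.charAt(i + 2) == '\n') {
                    i += 3;                // skip backslash + CRLF
                    continue;
                }
            }
            out.append(c);                 // any other character is copied unchanged
            i++;
        }
        return out.toString();
    }
}
```

A multi-line macro such as `#define ADD(a, b) \` followed by `((a) + (b))` is thereby seen by the lexer as a single directive, while a backslash inside ordinary text (for example in `"a\\b"`) is left alone.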

### Token Filtering
- Configurable literal sequence filtering for improved duplicate detection
- Identifier anonymization for focusing on code structure
- Balanced brace handling in complex expressions
- Smart sequence detection for array/object initializers
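
To show what literal sequence filtering with balanced braces might look like, the sketch below collapses a brace-delimited initializer that contains only literals, commas, and nested braces, so that two arrays differing only in their data produce the same token stream. It works on plain strings rather than PMD tokens; `LiteralSequenceSketch`, its literal test, and the choice to keep the outer braces are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of dropping literal-only initializer sequences. */
final class LiteralSequenceSketch {

    // Assumption: a token counts as a literal if it starts with a digit or a quote.
    static boolean isLiteral(String t) {
        return t.matches("[0-9].*") || t.startsWith("\"") || t.startsWith("'");
    }

    /** Index of the brace closing a literal-only sequence opened at open, or -1 if there is none. */
    static int endOfLiteralSequence(List<String> tokens, int open) {
        int depth = 0;
        for (int i = open; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("{")) {
                depth++;
            } else if (t.equals("}")) {
                if (--depth == 0) {
                    return i;              // braces balanced: sequence ends here
                }
            } else if (!t.equals(",") && !isLiteral(t)) {
                return -1;                 // non-literal content: keep the block
            }
        }
        return -1;                         // unbalanced braces
    }

    /** Replaces each literal-only "{ ... }" sequence with an empty "{ }" pair. */
    static List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            out.add(t);
            if (t.equals("{")) {
                int end = endOfLiteralSequence(tokens, i);
                if (end > i) {
                    out.add("}");
                    i = end;               // jump past the discarded literals
                }
            }
        }
        return out;
    }
}
```

Under these assumptions, the token lists for `int a[] = {1, 2, 3};` and `int a[] = {7, 8};` both collapse to `int a [ ] = { } ;`, whereas a block such as `{ x, 1 }` is left untouched because it contains an identifier.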

### Error Handling
The module provides robust error handling for:
- Malformed C++ source files
- Invalid preprocessor directives
- Unterminated comments or strings
- Character encoding issues

Errors are reported through PMD's standard error reporting mechanisms with accurate source location information.