# Copy-Paste Detection

The PMD JSP module provides copy-paste detection (CPD) for JSP files through a specialized lexer that tokenizes JSP content for duplicate code analysis.

## CPD Lexer

### JspCpdLexer

Tokenizes JSP files so that PMD's Copy-Paste Detector can identify duplicate code blocks.

```java { .api }
public class JspCpdLexer extends JavaccCpdLexer {
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
}
```

**Usage:**

```java
import net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer;
import net.sourceforge.pmd.lang.document.TextDocument;
import net.sourceforge.pmd.cpd.CpdLexer;

// Create CPD lexer for JSP files
CpdLexer lexer = new JspCpdLexer();

// The lexer is typically used by PMD's CPD framework automatically
// when processing JSP files for duplicate detection
```

**Methods:**

- `makeLexerImpl(TextDocument)`: Creates a token manager for the given JSP document

## Integration with PMD CPD

### Language Module Integration

The JSP language module integrates CPD support through the `CpdCapableLanguage` interface:

```java { .api }
public class JspLanguageModule extends SimpleLanguageModuleBase implements CpdCapableLanguage {
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}
```

**Usage:**

```java
import net.sourceforge.pmd.lang.jsp.JspLanguageModule;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;

// Get language module
JspLanguageModule module = JspLanguageModule.getInstance();

// Create CPD lexer with configuration.
// newPropertyBundle() yields the module's default properties; a bundle
// taken from an existing PMD/CPD configuration can be passed instead.
LanguagePropertyBundle properties = module.newPropertyBundle();
CpdLexer lexer = module.createCpdLexer(properties);
```

## Token Management

### JSP Token Handling

The CPD lexer uses the same token management as the main JSP parser:

```java { .api }
public final class JspTokenKinds {
    public static final String[] TOKEN_NAMES;
    public static TokenManager<JavaccToken> newTokenManager(CharStream cs);
}
```

### Token Behavior

```java { .api }
public final class InternalApiBridge {
    @InternalApi
    public static JavaccTokenDocument.TokenDocumentBehavior getJspTokenBehavior();
}
```

**Note:** `InternalApiBridge` provides access to token behavior configuration but is marked as internal API.

## CPD Analysis Process

### Tokenization Process

1. **Document Input**: JSP files are provided as `TextDocument` instances
2. **Token Generation**: `JspCpdLexer` creates tokens representing JSP constructs
3. **Token Filtering**: Tokens are processed to identify meaningful code blocks
4. **Duplicate Detection**: PMD's CPD engine compares token sequences across files
5. **Report Generation**: Duplicate blocks are reported with their file locations, line counts, and token counts
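
The steps above can be illustrated with a deliberately simplified sketch. It does not use the PMD API and is not CPD's actual matching algorithm: tokens are plain strings produced by a stand-in whitespace split, and `TokenWindowSketch`, `tokenize`, and `findSharedWindows` are hypothetical names chosen for this example. It only shows the idea of comparing fixed-size token windows across files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TokenWindowSketch {

    /** Stand-in for a real lexer: splits on whitespace (JspCpdLexer produces proper JSP tokens). */
    static List<String> tokenize(String source) {
        String trimmed = source.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /**
     * Maps every window of minimumTokens consecutive tokens that occurs
     * in at least two different files to the names of those files.
     */
    static Map<String, Set<String>> findSharedWindows(Map<String, String> files, int minimumTokens) {
        Map<String, Set<String>> windowToFiles = new HashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            List<String> tokens = tokenize(file.getValue());
            for (int i = 0; i + minimumTokens <= tokens.size(); i++) {
                String window = String.join(" ", tokens.subList(i, i + minimumTokens));
                windowToFiles.computeIfAbsent(window, k -> new TreeSet<>()).add(file.getKey());
            }
        }
        // Keep only windows seen in two or more files
        windowToFiles.values().removeIf(names -> names.size() < 2);
        return windowToFiles;
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("a.jsp", "<div class=\"form-group\">\n  <label for=\"email\">Email:</label>\n</div>");
        files.put("b.jsp", "<div   class=\"form-group\"> <label for=\"email\">Email:</label> </div>");
        files.put("c.jsp", "<p>Nothing shared here</p>");

        findSharedWindows(files, 4)
                .forEach((window, names) -> System.out.println(names + " share: " + window));
    }
}
```

Because the comparison works on tokens rather than raw text, `a.jsp` and `b.jsp` match even though their whitespace and line breaks differ. The window size plays the role of the `--minimum-tokens` setting.
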

### Token Types

The lexer generates tokens for:

- **HTML Elements**: Tags, attributes, and content
- **JSP Directives**: Page directives, includes, taglib declarations
- **JSP Scripting Elements**: Expressions, scriptlets, declarations
- **Expression Language**: EL expressions and JSF value bindings
- **Comments**: Both HTML and JSP comments
- **Text Content**: Plain text and CDATA sections

## CPD Configuration

### File Extensions

CPD automatically processes files with JSP-related extensions:

- `.jsp`: JavaServer Pages
- `.jspx`: JSP XML format
- `.jspf`: JSP fragment files
- `.tag`: JSP tag files
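
As a rough illustration of what this extension matching amounts to, the sketch below collects candidate files under a directory using only the JDK. The names `JspFileFinder`, `isJspFile`, and `findJspFiles` are hypothetical and not part of PMD; CPD performs its own file discovery, and the case-insensitive comparison is a choice made for this sketch rather than documented PMD behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JspFileFinder {

    // Extensions taken from the list above
    private static final Set<String> JSP_EXTENSIONS = Set.of("jsp", "jspx", "jspf", "tag");

    /** Returns true if the file name ends with one of the JSP-related extensions. */
    static boolean isJspFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return JSP_EXTENSIONS.contains(extension);
    }

    /** Walks the directory tree and returns all regular files with a JSP-related extension. */
    static List<Path> findJspFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> isJspFile(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : "src/main/webapp");
        findJspFiles(root).forEach(System.out::println);
    }
}
```

A listing like this can help check which files a CPD run over a directory is likely to pick up before tuning the token threshold.
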

### Command Line Usage

```bash
# Run CPD on JSP files
pmd cpd --minimum-tokens 50 --language jsp --dir src/main/webapp

# CPD analyzes one language per run; scan Java sources in a separate invocation
pmd cpd --minimum-tokens 50 --language java --dir src
```

### Programmatic Usage

```java
import java.nio.file.Paths;

import net.sourceforge.pmd.cpd.CPDConfiguration;
import net.sourceforge.pmd.cpd.CpdAnalysis;
import net.sourceforge.pmd.cpd.Mark;
import net.sourceforge.pmd.cpd.Match;
import net.sourceforge.pmd.lang.document.FileLocation;
import net.sourceforge.pmd.lang.jsp.JspLanguageModule;

// Configure CPD for JSP analysis (PMD 7 API; PMD 6 used the CPD class instead)
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(50);
config.setOnlyRecognizeLanguage(JspLanguageModule.getInstance());
config.addInputPath(Paths.get("src/main/webapp"));

// Create and run CPD; closing the analysis may throw IOException
try (CpdAnalysis cpd = CpdAnalysis.create(config)) {
    cpd.performAnalysis(report -> {
        // Process results
        for (Match match : report.getMatches()) {
            System.out.println("Duplicate found:");
            System.out.println("  Lines: " + match.getLineCount());
            System.out.println("  Tokens: " + match.getTokenCount());
            for (Mark mark : match) {
                FileLocation loc = mark.getLocation();
                System.out.println("  File: " + loc.getFileId().getAbsolutePath()
                        + " at line " + loc.getStartLine());
            }
        }
    });
}
```

## Duplicate Detection Examples

### Common JSP Duplicates

**Duplicate JSP Expressions:**
```jsp
<!-- File 1 -->
<%= request.getAttribute("userName") %>

<!-- File 2 -->
<%= request.getAttribute("userName") %>
```

**Duplicate Element Structures:**
```jsp
<!-- File 1 -->
<div class="form-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email" required>
</div>

<!-- File 2 -->
<div class="form-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email" required>
</div>
```

**Duplicate Scriptlet Blocks:**
```jsp
<!-- File 1 -->
<%
    String userName = (String) session.getAttribute("user");
    if (userName == null) {
        response.sendRedirect("login.jsp");
        return;
    }
%>

<!-- File 2 -->
<%
    String userName = (String) session.getAttribute("user");
    if (userName == null) {
        response.sendRedirect("login.jsp");
        return;
    }
%>
```

## Advanced CPD Features

### Custom Token Filtering

```java
import net.sourceforge.pmd.lang.TokenManager;
import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
import net.sourceforge.pmd.lang.document.TextDocument;
import net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer;

public class CustomJspCpdLexer extends JspCpdLexer {

    @Override
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc) {
        TokenManager<JavaccToken> tokenManager = super.makeLexerImpl(doc);

        // Apply custom filtering logic
        return new FilteringTokenManager(tokenManager);
    }

    private static class FilteringTokenManager implements TokenManager<JavaccToken> {
        private final TokenManager<JavaccToken> delegate;

        FilteringTokenManager(TokenManager<JavaccToken> delegate) {
            this.delegate = delegate;
        }

        @Override
        public JavaccToken getNextToken() {
            JavaccToken token = delegate.getNextToken();

            // Skip whitespace-only text tokens, but never the EOF token:
            // its image is blank too, so skipping it would loop forever
            while (token != null && !token.isEof() && isWhitespaceOnlyText(token)) {
                token = delegate.getNextToken();
            }

            return token;
        }

        private boolean isWhitespaceOnlyText(JavaccToken token) {
            return token.getImage().trim().isEmpty();
        }
    }
}
```

### Integration with Build Tools

**Maven Integration:**
```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-pmd-plugin</artifactId>
    <configuration>
        <includeTests>false</includeTests>
        <!-- The cpd goal analyzes a single language per execution -->
        <language>jsp</language>
        <minimumTokens>50</minimumTokens>
    </configuration>
</plugin>
```
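
**Gradle Integration:**
```gradle
plugins {
    id 'pmd'
}

pmd {
    consoleOutput = true
    toolVersion = "7.13.0"
    // Rule sets apply to PMD rule analysis only; CPD does not use them
    ruleSetFiles = files("config/pmd/jsp-cpd-rules.xml")
}

// Separate configuration for running CPD; assumes a repository such as mavenCentral() is declared
configurations {
    cpd
}

dependencies {
    cpd "net.sourceforge.pmd:pmd-cli:7.13.0"
    cpd "net.sourceforge.pmd:pmd-jsp:7.13.0"
}

task cpdJsp(type: JavaExec) {
    // In PMD 7 the command-line entry point is PmdCli, with "cpd" as a subcommand
    mainClass = "net.sourceforge.pmd.cli.PmdCli"
    classpath = configurations.cpd
    args = [
        "cpd",
        "--minimum-tokens", "50",
        "--language", "jsp",
        "--dir", "src/main/webapp",
        "--format", "text"
    ]
    // CPD returns a non-zero exit code when duplicates are found
    ignoreExitValue = true
}
```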

## CPD Reporting

### Report Formats

CPD supports multiple output formats for JSP duplicate detection, selected with `--format`:

- **Text** (`text`): Human-readable console output
- **XML** (`xml`): Structured XML for tool integration
- **CSV** (`csv`): Comma-separated values for spreadsheet analysis

Other representations, such as HTML, are usually obtained by transforming the XML report (for example with an XSLT stylesheet). The exact set of formats depends on the PMD version; `pmd cpd --help` shows the options available in your installation.

### Custom Report Processing

```java
import net.sourceforge.pmd.cpd.CPDReport;
import net.sourceforge.pmd.cpd.Mark;
import net.sourceforge.pmd.cpd.Match;
import net.sourceforge.pmd.lang.document.FileLocation;

public class JspDuplicateAnalyzer {

    // Intended to be called from CpdAnalysis#performAnalysis(report -> ...)
    public void analyzeDuplicates(CPDReport report) {
        for (Match match : report.getMatches()) {
            System.out.println("Duplicate Block:");
            System.out.println("  Size: " + match.getTokenCount() + " tokens, "
                    + match.getLineCount() + " lines");

            for (Mark mark : match) {
                FileLocation loc = mark.getLocation();
                System.out.println("  Location: " + loc.getFileId().getAbsolutePath()
                        + ":" + loc.getStartLine() + "-" + loc.getEndLine());

                // Analyze JSP-specific patterns
                if (loc.getFileId().getFileName().endsWith(".jsp")) {
                    analyzeJspDuplicate(report.getSourceCodeSlice(mark).toString());
                }
            }
        }
    }

    // Custom analysis of the duplicated source text
    private void analyzeJspDuplicate(String slice) {
        if (slice.contains("<%=")) {
            System.out.println("    Contains JSP expressions");
        }
        if (slice.contains("${")) {
            System.out.println("    Contains EL expressions");
        }
        if (slice.contains("<%@")) {
            System.out.println("    Contains JSP directives");
        }
    }
}
```

## Performance Considerations

### Large JSP File Handling

For large JSP applications:

1. **Increase Minimum Token Count**: Use higher values (100-200) to focus on significant duplicates
2. **Directory Filtering**: Exclude generated JSP files and third-party libraries
3. **Parallel Processing**: Use CPD's built-in parallel processing for large codebases
4. **Memory Configuration**: Increase JVM heap size for very large projects

### Optimization Tips

```java
import net.sourceforge.pmd.cpd.CPDConfiguration;

// Configure CPD for large JSP code bases
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(100);      // Higher threshold for large projects
config.setSkipDuplicates(true);      // Skip files with the same name and length
// The ignore flags only take effect for lexers that support them,
// which may not include the JSP lexer
config.setIgnoreIdentifiers(false);  // Keep identifier sensitivity
config.setIgnoreLiterals(true);      // Ignore string/numeric literal differences
```

The CPD integration provides duplicate detection for JSP files, helping maintain code quality and identify refactoring opportunities in JSP-based web applications.