# Copy-Paste Detection

The PMD JSP module provides copy-paste detection (CPD) capabilities for JSP files through a specialized lexer that tokenizes JSP content for duplicate code analysis.

## CPD Lexer

### JspCpdLexer

Tokenizes JSP files for PMD's Copy-Paste Detector to identify duplicate code blocks.

```java { .api }
public class JspCpdLexer extends JavaccCpdLexer {
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
}
```

**Usage:**

```java
import net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer;
import net.sourceforge.pmd.lang.document.TextDocument;
import net.sourceforge.pmd.cpd.CpdLexer;

// Create CPD lexer for JSP files
CpdLexer lexer = new JspCpdLexer();

// The lexer is typically used by PMD's CPD framework automatically
// when processing JSP files for duplicate detection
```

**Methods:**

- `makeLexerImpl(TextDocument)`: Creates a token manager for the given JSP document

## Integration with PMD CPD

### Language Module Integration

The JSP language module integrates CPD support through the `CpdCapableLanguage` interface:

```java { .api }
public class JspLanguageModule extends SimpleLanguageModuleBase implements CpdCapableLanguage {
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}
```

**Usage:**

```java
import net.sourceforge.pmd.lang.jsp.JspLanguageModule;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;

// Get language module
JspLanguageModule module = JspLanguageModule.getInstance();

// Create CPD lexer with configuration.
// newPropertyBundle() returns the module's default properties; adjust them as needed.
LanguagePropertyBundle properties = module.newPropertyBundle();
CpdLexer lexer = module.createCpdLexer(properties);
```

## Token Management

### JSP Token Handling

The CPD lexer uses the same token management as the main JSP parser:

```java { .api }
public final class JspTokenKinds {
    public static final String[] TOKEN_NAMES;
    public static TokenManager<JavaccToken> newTokenManager(CharStream cs);
}
```

### Token Behavior

```java { .api }
public final class InternalApiBridge {
    @InternalApi
    public static JavaccTokenDocument.TokenDocumentBehavior getJspTokenBehavior();
}
```

**Note:** `InternalApiBridge` provides access to token behavior configuration, but it is marked `@InternalApi`, so it may change without notice and should not be relied on in client code.

## CPD Analysis Process

### Tokenization Process

1. **Document Input**: JSP files are provided as `TextDocument` instances
2. **Token Generation**: `JspCpdLexer` creates tokens representing JSP constructs
3. **Token Filtering**: Tokens are processed to identify meaningful code blocks
4. **Duplicate Detection**: PMD's CPD engine compares token sequences across files
5. **Report Generation**: Duplicate blocks are reported with their file locations, line counts, and token counts
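The steps above can be illustrated with a deliberately simplified, self-contained sketch. It does not use PMD's classes or its actual matching algorithm: `TileMatchSketch`, its whitespace tokenizer, and `duplicateTileStarts` are hypothetical names for illustration only, and the window size stands in for the minimum token count.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration of window-based duplicate detection; not PMD's implementation. */
class TileMatchSketch {

    /** Splits text on whitespace; a stand-in for the real JSP lexer. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String part : source.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /**
     * Returns the start indices in {@code a} of every window of {@code minTokens}
     * consecutive tokens that also occurs somewhere in {@code b}.
     */
    static List<Integer> duplicateTileStarts(List<String> a, List<String> b, int minTokens) {
        // Index every window of b by its token content
        Set<List<String>> tilesInB = new HashSet<>();
        for (int i = 0; i + minTokens <= b.size(); i++) {
            tilesInB.add(new ArrayList<>(b.subList(i, i + minTokens)));
        }
        // Look up each window of a in that index
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + minTokens <= a.size(); i++) {
            if (tilesInB.contains(a.subList(i, i + minTokens))) {
                starts.add(i);
            }
        }
        return starts;
    }
}
```

In this sketch, comparing `<div> <%= user %> </div>` with `<span> <%= user %> </span>` using a window of 3 tokens finds one shared window (`<%= user %>`), while a window of 4 finds none. This mirrors how a higher minimum token count suppresses small matches.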

### Token Types

### Token Types

The lexer generates tokens for:

- **HTML Elements**: Tags, attributes, and content
- **JSP Directives**: Page directives, includes, taglib declarations
- **JSP Scripting Elements**: JSP expressions, scriptlets, declarations
- **Expression Language**: EL expressions and JSF value bindings
- **Comments**: Both HTML and JSP comments
- **Text Content**: Plain text and CDATA sections
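To make the categories above concrete, here is a rough prefix-based classifier. It is only a sketch for recognizing the constructs by their opening delimiters: the real lexer is generated from a JavaCC grammar and produces much finer-grained tokens, and `JspConstructSketch` and `classify` are hypothetical names that are not part of PMD.

```java
/** Toy classifier for JSP fragments by opening delimiter; not PMD's lexer. */
class JspConstructSketch {

    static String classify(String fragment) {
        String s = fragment.trim();
        // More specific prefixes must be checked before the general ones
        if (s.startsWith("<%--")) return "JSP comment";
        if (s.startsWith("<%@")) return "JSP directive";
        if (s.startsWith("<%=")) return "JSP expression";
        if (s.startsWith("<%!")) return "JSP declaration";
        if (s.startsWith("<%")) return "JSP scriptlet";
        if (s.startsWith("<!--")) return "HTML comment";
        if (s.startsWith("<![CDATA[")) return "CDATA section";
        if (s.startsWith("${")) return "EL expression";
        if (s.startsWith("#{")) return "JSF value binding";
        if (s.startsWith("<")) return "HTML element";
        return "text";
    }
}
```

The order of the checks matters: `<%--`, `<%@`, `<%=`, and `<%!` all begin with `<%`, so the scriptlet test has to come after them.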

## CPD Configuration

### File Extensions

CPD automatically processes files with JSP-related extensions:

- `.jsp`: JavaServer Pages
- `.jspx`: JSP XML format
- `.jspf`: JSP fragment files
- `.tag`: JSP tag files
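If you need to pre-select files yourself (for example, before handing them to a custom build step), the extension list above can be mirrored with a small helper. This is a sketch only: `JspExtensionSketch` and `isJspFile` are hypothetical names, and the case-insensitive comparison is an assumption of this example rather than documented PMD behavior.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that mirrors the JSP extension list; not part of PMD. */
class JspExtensionSketch {

    private static final Set<String> JSP_EXTENSIONS = Set.of("jsp", "jspx", "jspf", "tag");

    static boolean isJspFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        // Assumption: extensions are compared case-insensitively
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return JSP_EXTENSIONS.contains(extension);
    }
}
```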

### Command Line Usage

```bash
# Run CPD on JSP files
pmd cpd --minimum-tokens 50 --language jsp --dir src/main/webapp

# CPD analyzes one language per run, so scan Java sources in a separate invocation
pmd cpd --minimum-tokens 50 --language java --dir src/main/java
```

### Programmatic Usage

```java
import java.nio.file.Paths;

import net.sourceforge.pmd.cpd.CPDConfiguration;
import net.sourceforge.pmd.cpd.CpdAnalysis;
import net.sourceforge.pmd.cpd.Mark;
import net.sourceforge.pmd.lang.document.FileLocation;
import net.sourceforge.pmd.lang.jsp.JspLanguageModule;

// Configure CPD for JSP analysis
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(50);
config.setOnlyRecognizeLanguage(JspLanguageModule.getInstance());
config.addInputPath(Paths.get("src/main/webapp"));

// Create and run CPD (PMD 7 replaces the old CPD class with CpdAnalysis)
try (CpdAnalysis cpd = CpdAnalysis.create(config)) {
    // Process results
    cpd.performAnalysis(report -> report.getMatches().forEach(match -> {
        System.out.println("Duplicate found:");
        System.out.println("  Lines: " + match.getLineCount());
        System.out.println("  Tokens: " + match.getTokenCount());
        for (Mark mark : match.getMarkSet()) {
            FileLocation location = mark.getLocation();
            System.out.println("  File: " + location.getFileId().getOriginalPath()
                    + " at line " + location.getStartLine());
        }
    }));
}
```

## Duplicate Detection Examples

### Common JSP Duplicates

**Duplicate JSP Expressions:**
```jsp
<!-- File 1 -->
<%= request.getAttribute("userName") %>

<!-- File 2 -->
<%= request.getAttribute("userName") %>
```

**Duplicate Element Structures:**
```jsp
<!-- File 1 -->
<div class="form-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email" required>
</div>

<!-- File 2 -->
<div class="form-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email" required>
</div>
```

**Duplicate Scriptlet Blocks:**
```jsp
<!-- File 1 -->
<%
    String userName = (String) session.getAttribute("user");
    if (userName == null) {
        response.sendRedirect("login.jsp");
        return;
    }
%>

<!-- File 2 -->
<%
    String userName = (String) session.getAttribute("user");
    if (userName == null) {
        response.sendRedirect("login.jsp");
        return;
    }
%>
```

## Advanced CPD Features

### Custom Token Filtering

```java
import net.sourceforge.pmd.lang.TokenManager;
import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
import net.sourceforge.pmd.lang.document.TextDocument;
import net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer;

public class CustomJspCpdLexer extends JspCpdLexer {

    @Override
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc) {
        TokenManager<JavaccToken> tokenManager = super.makeLexerImpl(doc);

        // Apply custom filtering logic
        return new FilteringTokenManager(tokenManager);
    }

    private static class FilteringTokenManager implements TokenManager<JavaccToken> {
        private final TokenManager<JavaccToken> delegate;

        public FilteringTokenManager(TokenManager<JavaccToken> delegate) {
            this.delegate = delegate;
        }

        @Override
        public JavaccToken getNextToken() {
            JavaccToken token = delegate.getNextToken();

            // Skip whitespace-only text tokens, but never the EOF token:
            // its image is empty, so it would otherwise be treated as whitespace and skipped
            while (token != null && !token.isEof() && isWhitespaceOnlyText(token)) {
                token = delegate.getNextToken();
            }

            return token;
        }

        private boolean isWhitespaceOnlyText(JavaccToken token) {
            return token.getImage().trim().isEmpty();
        }
    }
}
```

### Integration with Build Tools

**Maven Integration:**
```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-pmd-plugin</artifactId>
    <configuration>
        <includeTests>false</includeTests>
        <!-- CPD analyzes one language per execution -->
        <language>jsp</language>
        <minimumTokens>50</minimumTokens>
    </configuration>
</plugin>
```

**Gradle Integration:**

```gradle
plugins {
    id 'pmd'
}

pmd {
    consoleOutput = true
    toolVersion = "7.13.0"
    ruleSetFiles = files("config/pmd/jsp-cpd-rules.xml")
}

task cpdJsp(type: JavaExec) {
    // PMD 7 removed the net.sourceforge.pmd.cpd.CPD entry point; the CLI main class
    // is PmdCli, which requires the pmd-cli artifact on this classpath
    mainClass = "net.sourceforge.pmd.cli.PmdCli"
    classpath = configurations.pmd
    args = [
        "cpd",
        "--minimum-tokens", "50",
        "--language", "jsp",
        "--dir", "src/main/webapp",
        "--format", "text"
    ]
}
```

## CPD Reporting

### Report Formats

CPD supports multiple output formats for JSP duplicate detection:

- **Text** (`text`): Human-readable console output
- **XML** (`xml`): Structured XML for tool integration
- **CSV** (`csv`, `csv_with_linecount_per_file`): Comma-separated values for spreadsheet analysis
- **Visual Studio** (`vs`): One line per duplicate location, for IDE integration
- **HTML**: Web-viewable reports, produced by transforming the XML report with an XSLT stylesheet

### Custom Report Processing

```java
import net.sourceforge.pmd.cpd.CPDReport;
import net.sourceforge.pmd.cpd.Mark;
import net.sourceforge.pmd.cpd.Match;
import net.sourceforge.pmd.lang.document.FileLocation;

public class JspDuplicateAnalyzer {

    // Pass in the CPDReport received in CpdAnalysis.performAnalysis(...)
    public void analyzeDuplicates(CPDReport report) {
        for (Match match : report.getMatches()) {
            System.out.println("Duplicate Block:");
            System.out.println("  Size: " + match.getTokenCount() + " tokens, "
                    + match.getLineCount() + " lines");

            for (Mark mark : match.getMarkSet()) {
                FileLocation location = mark.getLocation();
                System.out.println("  Location: " + location.getFileId().getOriginalPath()
                        + ":" + location.getStartLine() + "-" + location.getEndLine());

                // Analyze JSP-specific patterns
                if (location.getFileId().getFileName().endsWith(".jsp")) {
                    analyzeJspDuplicate(report.getSourceCodeSlice(mark).toString());
                }
            }
        }
    }

    private void analyzeJspDuplicate(String source) {
        // Custom analysis for JSP duplicates
        if (source.contains("<%=")) {
            System.out.println("    Contains JSP expressions");
        }
        if (source.contains("${")) {
            System.out.println("    Contains EL expressions");
        }
        if (source.contains("<%@")) {
            System.out.println("    Contains JSP directives");
        }
    }
}
```

## Performance Considerations

### Large JSP File Handling

For large JSP applications:

1. **Increase Minimum Token Count**: Use higher values (100-200) to focus on significant duplicates
2. **Directory Filtering**: Exclude generated JSP files and third-party libraries
3. **Parallel Processing**: Use CPD's built-in parallel processing for large codebases
4. **Memory Configuration**: Increase JVM heap size for very large projects
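As one way to approach the directory filtering mentioned above, the candidate file list can be pruned before it is handed to the analysis. The following is a minimal sketch using only `java.nio.file`. `ExcludedDirSketch`, `withoutExcludedDirs`, and the directory names `generated` and `vendor` are hypothetical choices for illustration, not PMD APIs or conventions.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-filter that drops files located under excluded directories. */
class ExcludedDirSketch {

    static List<Path> withoutExcludedDirs(List<Path> candidates, Set<String> excludedDirNames) {
        List<Path> kept = new ArrayList<>();
        for (Path candidate : candidates) {
            if (!isUnderExcludedDir(candidate, excludedDirNames)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    private static boolean isUnderExcludedDir(Path file, Set<String> excludedDirNames) {
        Path parent = file.getParent();
        if (parent == null) {
            return false; // file has no directory component
        }
        // Iterating a Path yields its name elements, one per directory level
        for (Path part : parent) {
            if (excludedDirNames.contains(part.toString())) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on directory names rather than on full path strings keeps the check independent of the platform's path separator.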

### Optimization Tips

```java
// Configure CPD for optimal JSP analysis
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(100);       // Higher threshold for large projects
config.setSkipDuplicateFiles(true);   // Skip identical files
// The two options below are language-dependent and may have no effect on JSP
config.setIgnoreIdentifiers(false);   // Keep identifier sensitivity for JSP
config.setIgnoreLiterals(true);       // Ignore string/numeric literal differences
```

The CPD integration detects duplicated token sequences across JSP files, helping maintain code quality and identify refactoring opportunities in JSP-based web applications.