tessl/maven-net-sourceforge-pmd--pmd-jsp

PMD JSP language module providing static code analysis capabilities for JavaServer Pages files with lexical analysis, AST parsing, and rule-based code quality checks.

Copy-Paste Detection

Name: tessl/maven-net-sourceforge-pmd--pmd-jsp
Author: tessl

The PMD JSP module provides copy-paste detection (CPD) capabilities for JSP files through a specialized lexer that tokenizes JSP content for duplicate code analysis.

CPD Lexer

JspCpdLexer

Tokenizes JSP files for PMD's Copy-Paste Detector to identify duplicate code blocks.

public class JspCpdLexer extends JavaccCpdLexer {
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
}

Usage:

import net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer;
import net.sourceforge.pmd.lang.document.TextDocument;
import net.sourceforge.pmd.cpd.CpdLexer;

// Create CPD lexer for JSP files
CpdLexer lexer = new JspCpdLexer();

// The lexer is typically used by PMD's CPD framework automatically
// when processing JSP files for duplicate detection

Methods:

makeLexerImpl(TextDocument): Creates a token manager for the given JSP document

Integration with PMD CPD

Language Module Integration

The JSP language module integrates CPD support through the CpdCapableLanguage interface:

public class JspLanguageModule extends SimpleLanguageModuleBase implements CpdCapableLanguage {
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}

Usage:

import net.sourceforge.pmd.lang.jsp.JspLanguageModule;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;

// Get the JSP language module
JspLanguageModule module = JspLanguageModule.getInstance();

// Create a CPD lexer from the module's default language properties
LanguagePropertyBundle properties = module.newPropertyBundle();
CpdLexer lexer = module.createCpdLexer(properties);

Token Management

JSP Token Handling

The CPD lexer uses the same token management as the main JSP parser:

public final class JspTokenKinds {
    public static final String[] TOKEN_NAMES;
    public static TokenManager<JavaccToken> newTokenManager(CharStream cs);
}

Token Behavior

public final class InternalApiBridge {
    @InternalApi
    public static JavaccTokenDocument.TokenDocumentBehavior getJspTokenBehavior();
}

Note: InternalApiBridge provides access to token behavior configuration but is marked as internal API.

CPD Analysis Process

Tokenization Process

Document Input: JSP files are provided as TextDocument instances
Token Generation: JspCpdLexer creates tokens representing JSP constructs
Token Filtering: Tokens are processed to identify meaningful code blocks
Duplicate Detection: PMD's CPD engine compares token sequences across files
Report Generation: Duplicate blocks are reported with file locations and similarity metrics
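
The steps above can be pictured with a deliberately simplified model. The sketch below is not PMD's implementation: the class `ToyDuplicateFinder` and its methods are hypothetical, tokens are plain whitespace-separated strings rather than typed JSP tokens, and matching is a naive lookup of fixed-size token windows. It only illustrates why a duplicate is reported once a run of consecutive tokens of at least the minimum size appears in two inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of token-based duplicate detection; NOT how PMD's CPD engine is written. */
public class ToyDuplicateFinder {

    /** Splits text on whitespace. A real lexer such as JspCpdLexer emits typed tokens instead. */
    public static List<String> tokenize(String source) {
        String trimmed = source.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /**
     * Returns the start indices in b of every window of minTokens
     * consecutive tokens that also occurs somewhere in a.
     */
    public static List<Integer> sharedWindows(List<String> a, List<String> b, int minTokens) {
        // Index every window of the first token list by its content
        Map<List<String>, Integer> seen = new HashMap<>();
        for (int i = 0; i + minTokens <= a.size(); i++) {
            seen.putIfAbsent(a.subList(i, i + minTokens), i);
        }
        // Look up every window of the second token list
        List<Integer> hits = new ArrayList<>();
        for (int j = 0; j + minTokens <= b.size(); j++) {
            if (seen.containsKey(b.subList(j, j + minTokens))) {
                hits.add(j);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> file1 = tokenize("<div class=\"a\"> <%= user %> </div> <p> one </p>");
        List<String> file2 = tokenize("<span> x </span> <div class=\"a\"> <%= user %> </div>");
        // With a window of 5 tokens, the shared <div> block is found at offsets 3 and 4 of file2
        System.out.println(sharedWindows(file1, file2, 5));
    }
}
```

Raising `minTokens` above the length of the shared block makes the result empty, which mirrors the effect of CPD's minimum token setting.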

Token Types

The lexer generates tokens for:

HTML Elements: Tags, attributes, and content
JSP Directives: Page directives, includes, taglib declarations
JSP Actions: JSP expressions, scriptlets, declarations
Expression Language: EL expressions and JSF value bindings
Comments: Both HTML and JSP comments
Text Content: Plain text and CDATA sections

CPD Configuration

File Extensions

CPD automatically processes files with JSP-related extensions:

.jsp: JavaServer Pages
.jspx: JSP XML format
.jspf: JSP fragment files
.tag: JSP tag files
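
When file lists are assembled outside of PMD, the same extension set can be applied up front. The helper below is a hypothetical sketch (`JspExtensions` and `isJspFile` are not part of PMD), and lower-casing the extension before comparing is a choice of this sketch rather than documented PMD behavior.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that mirrors the extension list above; not part of PMD. */
public class JspExtensions {

    private static final Set<String> EXTENSIONS = Set.of("jsp", "jspx", "jspf", "tag");

    /** Returns true if the file name ends with one of the JSP-related extensions. */
    public static boolean isJspFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        // Assumption of this sketch: compare case-insensitively
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }
}
```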

Command Line Usage

# Run CPD on JSP files
pmd cpd --minimum-tokens 50 --language jsp --dir src/main/webapp

# CPD analyzes one language per run; scan Java sources in a separate invocation
pmd cpd --minimum-tokens 50 --language java --dir src

Programmatic Usage

import java.nio.file.Paths;

import net.sourceforge.pmd.cpd.CPDConfiguration;
import net.sourceforge.pmd.cpd.CpdAnalysis;
import net.sourceforge.pmd.lang.document.FileLocation;
import net.sourceforge.pmd.lang.jsp.JspLanguageModule;

// Configure CPD for JSP analysis (PMD 7 API)
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(50);
config.setOnlyRecognizeLanguage(JspLanguageModule.getInstance());
config.addInputPath(Paths.get("src/main/webapp"));

// Create and run CPD; closing the analysis may throw IOException
try (CpdAnalysis cpd = CpdAnalysis.create(config)) {
    // Process results from the report passed to the callback
    cpd.performAnalysis(report -> report.getMatches().forEach(match -> {
        System.out.println("Duplicate found:");
        System.out.println("  Lines: " + match.getLineCount());
        System.out.println("  Tokens: " + match.getTokenCount());
        match.getMarkSet().forEach(mark -> {
            FileLocation location = mark.getLocation();
            System.out.println("  File: " + location.getFileId().getOriginalPath()
                    + " at line " + location.getStartLine());
        });
    }));
}

Duplicate Detection Examples

Common JSP Duplicates

Duplicate JSP Expressions:

<!-- File 1 -->
<%= request.getAttribute("userName") %>

<!-- File 2 -->  
<%= request.getAttribute("userName") %>

Duplicate Element Structures:

<!-- File 1 -->
<div class="form-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email" required>
</div>

<!-- File 2 -->
<div class="form-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email" required>
</div>

Duplicate Scriptlet Blocks:

<!-- File 1 -->
<%
    String userName = (String) session.getAttribute("user");
    if (userName == null) {
        response.sendRedirect("login.jsp");
        return;
    }
%>

<!-- File 2 -->
<%
    String userName = (String) session.getAttribute("user");
    if (userName == null) {
        response.sendRedirect("login.jsp");
        return;
    }
%>

Advanced CPD Features

Custom Token Filtering

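import net.sourceforge.pmd.lang.TokenManager;
import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
import net.sourceforge.pmd.lang.document.TextDocument;
import net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer;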

public class CustomJspCpdLexer extends JspCpdLexer {
    
    @Override
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc) {
        TokenManager<JavaccToken> tokenManager = super.makeLexerImpl(doc);
        
        // Apply custom filtering logic
        return new FilteringTokenManager(tokenManager);
    }
    
    private static class FilteringTokenManager implements TokenManager<JavaccToken> {
        private final TokenManager<JavaccToken> delegate;
        
        public FilteringTokenManager(TokenManager<JavaccToken> delegate) {
            this.delegate = delegate;
        }
        
        @Override
        public JavaccToken getNextToken() {
            JavaccToken token = delegate.getNextToken();
            
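            // Skip whitespace-only text tokens, but stop at end of input:
            // an EOF token with an empty image could otherwise keep the loop spinning
            while (token != null && !token.isEof() && isWhitespaceOnlyText(token)) {
                token = delegate.getNextToken();
            }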
            
            return token;
        }
        
        private boolean isWhitespaceOnlyText(JavaccToken token) {
            return token.getImage().trim().isEmpty();
        }
    }
}

Integration with Build Tools

Maven Integration:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-pmd-plugin</artifactId>
    <configuration>
        <includeTests>false</includeTests>
        <!-- The cpd goal analyzes a single language per execution -->
        <language>jsp</language>
        <minimumTokens>50</minimumTokens>
    </configuration>
</plugin>

Gradle Integration:

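plugins {
    id 'pmd'
}

repositories {
    mavenCentral()
}

pmd {
    consoleOutput = true
    toolVersion = "7.13.0"
    ruleSetFiles = files("config/pmd/jsp-cpd-rules.xml")
}

// Assumption: a separate configuration carries the PMD CLI and the JSP module,
// so the pmd plugin's own default dependencies stay untouched
configurations {
    cpd
}

dependencies {
    cpd "net.sourceforge.pmd:pmd-cli:7.13.0"
    cpd "net.sourceforge.pmd:pmd-jsp:7.13.0"
}

task cpdJsp(type: JavaExec) {
    // PMD 7 exposes CPD as the "cpd" subcommand of the CLI entry point
    mainClass = "net.sourceforge.pmd.cli.PmdCli"
    classpath = configurations.cpd
    args = [
        "cpd",
        "--minimum-tokens", "50",
        "--language", "jsp",
        "--dir", "src/main/webapp",
        "--format", "text"
    ]
}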

CPD Reporting

Report Formats

CPD supports multiple output formats for JSP duplicate detection:

Text: Human-readable console output
XML: Structured XML for tool integration
CSV: Comma-separated values for spreadsheet analysis
JSON: JSON format for programmatic processing
HTML: Web-viewable reports with syntax highlighting

Custom Report Processing

import net.sourceforge.pmd.cpd.CPDReport;
import net.sourceforge.pmd.cpd.Mark;
import net.sourceforge.pmd.cpd.Match;
import net.sourceforge.pmd.lang.document.FileLocation;

public class JspDuplicateAnalyzer {

    // Expects the CPDReport handed to the CpdAnalysis.performAnalysis callback (PMD 7 API)
    public void analyzeDuplicates(CPDReport report) {
        for (Match match : report.getMatches()) {
            System.out.println("Duplicate Block:");
            System.out.println("  Size: " + match.getTokenCount() + " tokens, "
                    + match.getLineCount() + " lines");

            for (Mark mark : match.getMarkSet()) {
                FileLocation location = mark.getLocation();
                String path = location.getFileId().getOriginalPath();
                System.out.println("  Location: " + path
                        + ":" + location.getStartLine() + "-" + location.getEndLine());

                // Analyze JSP-specific patterns
                if (path.endsWith(".jsp")) {
                    analyzeJspDuplicate(report.getSourceCodeSlice(mark).toString());
                }
            }
        }
    }

    // Reports which JSP constructs appear in the duplicated source text
    private void analyzeJspDuplicate(String source) {
        if (source.contains("<%=")) {
            System.out.println("    Contains JSP expressions");
        }
        if (source.contains("${")) {
            System.out.println("    Contains EL expressions");
        }
        if (source.contains("<%@")) {
            System.out.println("    Contains JSP directives");
        }
    }
}

Performance Considerations

Large JSP File Handling

For large JSP applications:

Increase Minimum Token Count: Use higher values (100-200) to focus on significant duplicates
Directory Filtering: Exclude generated JSP files and third-party libraries
Parallel Processing: Use CPD's built-in parallel processing for large codebases
Memory Configuration: Increase JVM heap size for very large projects
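
Directory filtering can also be done before files are handed to CPD. The following is a minimal sketch using only the JDK; `JspInputFilter`, `keepAnalyzable`, and the glob patterns in the usage note are hypothetical names chosen for illustration and are not PMD options.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical pre-filter that drops excluded paths before analysis; not part of PMD. */
public class JspInputFilter {

    /** Returns the candidates that match none of the exclude globs. */
    public static List<Path> keepAnalyzable(List<Path> candidates, List<String> excludeGlobs) {
        // Compile each glob once with the JDK's built-in glob syntax
        List<PathMatcher> excludes = excludeGlobs.stream()
                .map(glob -> FileSystems.getDefault().getPathMatcher("glob:" + glob))
                .collect(Collectors.toList());

        return candidates.stream()
                .filter(path -> excludes.stream().noneMatch(matcher -> matcher.matches(path)))
                .collect(Collectors.toList());
    }
}
```

For example, passing the globs `**/generated/**` and `**/WEB-INF/lib/**` would keep `src/main/webapp/index.jsp` while dropping `src/main/webapp/generated/a.jsp`.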

Optimization Tips

// Configure CPD for large JSP code bases (PMD 7 API)
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(100);           // Higher threshold for large projects
config.setSkipDuplicates(true);           // Skip files with the same name and length
config.setIgnoreIdentifiers(false);       // Keep identifiers significant
config.setIgnoreLiterals(true);           // Ignore literal differences where the language lexer supports it

The CPD integration provides comprehensive duplicate detection for JSP files, helping maintain code quality and identify refactoring opportunities in JSP-based web applications.

Install with Tessl CLI

npx tessl i tessl/maven-net-sourceforge-pmd--pmd-jsp
