
tessl/maven-net-sourceforge-pmd--pmd-cpp

C++ language module for PMD's Copy-Paste Detector providing lexical analysis and tokenization support for C++ source code

PMD C++ Language Module

The PMD C++ language module adds C++ support to PMD's Copy-Paste Detector (CPD). It supplies the lexer that tokenizes C++ source files for duplicate code detection, handles preprocessor directives and backslash line continuations, and can skip configurable code blocks.

Package Information

  • Package Name: pmd-cpp
  • Package Type: maven
  • Language: Java
  • Group ID: net.sourceforge.pmd
  • Artifact ID: pmd-cpp
  • Installation: Add as Maven dependency:
<dependency>
    <groupId>net.sourceforge.pmd</groupId>
    <artifactId>pmd-cpp</artifactId>
    <version>7.13.0</version>
</dependency>

Core Imports

import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.lang.cpp.cpd.CppCpdLexer;
import net.sourceforge.pmd.lang.cpp.cpd.CppEscapeTranslator;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.cpd.CpdLanguageProperties;

Basic Usage

import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;

// Get the C++ language module instance
CppLanguageModule cppModule = CppLanguageModule.getInstance();

// Create a property bundle with default settings
LanguagePropertyBundle properties = cppModule.newPropertyBundle();

// Create a CPD lexer for C++ tokenization
CpdLexer lexer = cppModule.createCpdLexer(properties);

// The lexer can now be used to tokenize C++ source files for duplicate detection

Architecture

The PMD C++ module is built around several key components:

  • CppLanguageModule: Main entry point that extends PMD's language module framework
  • CppCpdLexer: Core tokenizer that processes C++ source code into tokens for CPD analysis
  • CppEscapeTranslator: Processes C++ backslash line continuations according to C++ standards
  • Internal Components: Block skipping and preprocessing functionality handled internally
  • Token Processing: Supports comprehensive C++ token recognition including literals, identifiers, operators, and comments

Capabilities

Language Module Management

Core language module functionality for registering and configuring C++ support within the PMD framework.

public class CppLanguageModule extends CpdOnlyLanguageModuleBase {
    public CppLanguageModule();
    public static CppLanguageModule getInstance();
    public LanguagePropertyBundle newPropertyBundle();
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}

Properties:

public static final PropertyDescriptor<String> CPD_SKIP_BLOCKS;

C++ Tokenization

Lexical analysis of C++ source with support for preprocessor directives, line continuations, and configurable token filtering.

public class CppCpdLexer extends JavaccCpdLexer {
    public CppCpdLexer(LanguagePropertyBundle cppProperties);
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
    protected TokenManager<JavaccToken> filterTokenStream(TokenManager<JavaccToken> tokenManager);
    protected void processToken(TokenFactory tokenEntries, JavaccToken currentToken);
}

Escape Processing

Handles C++ backslash line continuation processing according to C++ language standards.

public class CppEscapeTranslator extends BackslashEscapeTranslator {
    public CppEscapeTranslator(TextDocument input);
    protected int handleBackslash(int maxOff, int backSlashOff);
}

Configuration Properties

The module supports several CPD configuration properties that can be set via the LanguagePropertyBundle:

Skip Blocks Configuration

// Skip code blocks matching start/end patterns (pipe-separated)
CppLanguageModule.CPD_SKIP_BLOCKS

Default: #if 0|#endif, which skips code that is conditionally compiled out with #if 0

Usage: Set to an empty string to disable block skipping, or provide a custom start|end delimiter pair
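
To make the start|end convention concrete, here is a minimal line-based sketch in plain Java. It is not PMD's implementation: the class SkipBlocksSketch, its methods, and the choice to use a single delimiter as both start and end are assumptions made for illustration, and nested blocks are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a line-based approximation of start|end block skipping. */
final class SkipBlocksSketch {
    private SkipBlocksSketch() { }

    /** Splits "start|end" at the first pipe; a value without a pipe serves as both delimiters (assumption). */
    static String[] parseDelimiters(String pattern) {
        String[] parts = pattern.split("\\|", 2);
        return parts.length == 2 ? parts : new String[] {parts[0], parts[0]};
    }

    /** Drops each line that starts a block, each line that ends it, and everything in between. */
    static List<String> filter(List<String> lines, String pattern) {
        if (pattern.isEmpty()) {
            return new ArrayList<>(lines); // an empty pattern disables skipping
        }
        String[] delimiters = parseDelimiters(pattern);
        List<String> kept = new ArrayList<>();
        boolean skipping = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!skipping && trimmed.startsWith(delimiters[0])) {
                skipping = true;               // start delimiter: begin skipping
            } else if (skipping && trimmed.startsWith(delimiters[1])) {
                skipping = false;              // end delimiter: resume after this line
            } else if (!skipping) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

With the default pattern, a file containing an #if 0 ... #endif region keeps only the lines outside that region; with an empty pattern, every line is kept.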

Standard CPD Properties

// Ignore literal sequences in duplicate detection
CpdLanguageProperties.CPD_IGNORE_LITERAL_SEQUENCES

// Ignore literal and identifier sequences  
CpdLanguageProperties.CPD_IGNORE_LITERAL_AND_IDENTIFIER_SEQUENCES

// Replace identifiers with generic placeholders
CpdLanguageProperties.CPD_ANONYMIZE_IDENTIFIERS

// Replace literals with generic placeholders
CpdLanguageProperties.CPD_ANONYMIZE_LITERALS
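
The effect of anonymization on duplicate detection can be sketched without PMD. The example below is an assumption-laden illustration, not the library's lexer: the class AnonymizeSketch, the placeholder strings <id> and <lit>, the tiny keyword list, and the simplified regular expression (single-character punctuation, integer and string literals only) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: shows why anonymized token streams match more often. */
final class AnonymizeSketch {
    // Deliberately incomplete keyword list; keywords are never anonymized.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "int", "double", "void", "return", "if", "else", "for", "while"));

    // Group 1: identifier or keyword. Group 2: integer or string literal.
    // Group 3: any other single non-whitespace character.
    private static final Pattern TOKEN = Pattern.compile(
        "([A-Za-z_]\\w*)|(\\d+|\"(?:[^\"\\\\]|\\\\.)*\")|(\\S)");

    private AnonymizeSketch() { }

    static List<String> tokenize(String code, boolean anonymizeIds, boolean anonymizeLiterals) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            if (m.group(1) != null) {
                boolean keyword = KEYWORDS.contains(m.group(1));
                tokens.add(anonymizeIds && !keyword ? "<id>" : m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(anonymizeLiterals ? "<lit>" : m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }
}
```

With both flags enabled, `int total = count + 1;` and `int sum = n + 42;` produce the same token list, so a detector comparing token streams would treat them as duplicates; with both flags disabled they differ.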

Supported File Extensions

The C++ language module automatically recognizes the following file extensions:

  • .h - C/C++ header files
  • .hpp - C++ header files
  • .hxx - C++ header files
  • .c - C source files
  • .cpp - C++ source files
  • .cxx - C++ source files
  • .cc - C++ source files
  • .C - C++ source files
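
As a rough illustration of how the extension list above could drive file selection, the following plain-Java sketch checks a file name against it. The class CppExtensionSketch and the method isCppFile are hypothetical, and the comparison is assumed to be case-sensitive, which is what keeps .c and .C as distinct entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: matches file names against the extensions listed above. */
final class CppExtensionSketch {
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
        "h", "hpp", "hxx", "c", "cpp", "cxx", "cc", "C"));

    private CppExtensionSketch() { }

    /** Returns true if the text after the last dot is a listed extension (case-sensitive by assumption). */
    static boolean isCppFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or the name ends with a dot
        }
        return EXTENSIONS.contains(fileName.substring(dot + 1));
    }
}
```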

Token Types

The lexer recognizes and processes the following C++ token categories:

Literals

  • String literals: Regular strings and raw strings
  • Character literals: Single character constants
  • Numeric literals: Decimal, hexadecimal, octal, binary integers and floating-point numbers

Identifiers and Keywords

  • Identifiers: Variable, function, and type names
  • C++ Keywords: All standard C++ language keywords

Operators and Punctuation

  • Arithmetic operators: +, -, *, /, %, etc.
  • Logical operators: &&, ||, !
  • Comparison operators: ==, !=, <, >, <=, >=
  • Assignment operators: =, +=, -=, etc.
  • Punctuation: {, }, (, ), [, ], ;, ,, etc.

Comments

  • Single-line comments: // comment text
  • Multi-line comments: /* comment text */
  • Preprocessor comments: Comments within preprocessor directives

Advanced Features

Preprocessor Directive Handling

  • Automatic skipping of preprocessor output
  • Configurable conditional compilation block skipping
  • Support for line continuation handling in preprocessor directives

Line Continuation Processing

  • Proper handling of backslash-newline sequences
  • Support for both Unix (\n) and Windows (\r\n) line endings
  • Maintains source location accuracy after escape processing
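
The backslash-newline rule itself is small enough to sketch in plain Java. The class LineSpliceSketch below is hypothetical and is not CppEscapeTranslator: it only removes the continuation sequences for both line-ending styles and, unlike the behavior described above, makes no attempt to preserve source locations.

```java
/** Illustrative only: removes backslash-newline pairs, joining continued lines. */
final class LineSpliceSketch {
    private LineSpliceSketch() { }

    static String splice(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '\\' && i + 1 < source.length()) {
                char next = source.charAt(i + 1);
                if (next == '\n') {
                    i += 2; // backslash + LF (Unix)
                    continue;
                }
                if (next == '\r' && i + 2 < source.length() && source.charAt(i + 2) == '\n') {
                    i += 3; // backslash + CR LF (Windows)
                    continue;
                }
            }
            out.append(c); // any other character, including a lone backslash, is kept
            i++;
        }
        return out.toString();
    }
}
```

A multi-line macro definition therefore becomes a single logical line before tokenization, while a backslash inside an escape sequence such as `\t` is left untouched.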

Token Filtering

  • Configurable literal sequence filtering for improved duplicate detection
  • Identifier anonymization for focusing on code structure
  • Balanced brace handling in complex expressions
  • Smart sequence detection for array/object initializers
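
One plausible reading of literal sequence filtering with balanced braces is sketched below in plain Java. It is an assumption, not PMD's filter: the class LiteralSequenceSketch, the pre-split token list, the simple literal test, and the decision to drop the whole brace-enclosed run are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: drops brace-balanced runs that contain nothing but literals and commas. */
final class LiteralSequenceSketch {
    private LiteralSequenceSketch() { }

    private static boolean isLiteral(String token) {
        return token.matches("\\d+(\\.\\d+)?") || token.startsWith("\"");
    }

    /** Returns the index just past the matching '}' if the run qualifies, otherwise -1. */
    private static int endOfLiteralSequence(List<String> tokens, int start) {
        int depth = 0;
        boolean sawLiteral = false;
        for (int i = start; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("{")) {
                depth++;
            } else if (t.equals("}")) {
                depth--;
                if (depth == 0) {
                    return sawLiteral ? i + 1 : -1; // empty braces are not a literal sequence
                }
            } else if (isLiteral(t)) {
                sawLiteral = true;
            } else if (!t.equals(",")) {
                return -1; // identifier, keyword, or operator: not a pure literal run
            }
        }
        return -1; // braces never balanced
    }

    static List<String> dropLiteralSequences(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (tokens.get(i).equals("{")) {
                int end = endOfLiteralSequence(tokens, i);
                if (end > 0) {
                    i = end; // skip the entire initializer, nested braces included
                    continue;
                }
            }
            out.add(tokens.get(i));
            i++;
        }
        return out;
    }
}
```

In this sketch a large lookup table such as `{1, 2, 3}` or `{{1, 2}, {3, 4}}` disappears from the token stream, while a function body like `{ return x; }` is kept because it contains non-literal tokens.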

Error Handling

The module handles the following error conditions:

  • Malformed C++ source files
  • Invalid preprocessor directives
  • Unterminated comments or strings
  • Character encoding issues

Errors are reported through PMD's standard error reporting mechanisms with accurate source location information.
