tessl/maven-net-sourceforge-pmd--pmd-cpp

C++ language module for PMD's Copy-Paste Detector providing lexical analysis and tokenization support for C++ source code

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/net.sourceforge.pmd/pmd-cpp@7.13.x

To install, run

npx @tessl/cli install tessl/maven-net-sourceforge-pmd--pmd-cpp@7.13.0


PMD C++ Language Module

PMD C++ Language Module provides C++ language support for PMD's Copy-Paste Detector (CPD). It is a CPD-only module (CppLanguageModule extends CpdOnlyLanguageModuleBase). It does not provide PMD rules for C++. Instead, it supplies the lexical analysis and tokenization that CPD needs to detect duplicated code in C++ source files, including preprocessor directive handling and configurable skipping of code blocks.

Package Information

  • Package Name: pmd-cpp
  • Package Type: maven
  • Language: Java
  • Group ID: net.sourceforge.pmd
  • Artifact ID: pmd-cpp
  • Installation: Add as Maven dependency:
<dependency>
    <groupId>net.sourceforge.pmd</groupId>
    <artifactId>pmd-cpp</artifactId>
    <version>7.13.0</version>
</dependency>

Core Imports

import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.lang.cpp.cpd.CppCpdLexer;
import net.sourceforge.pmd.lang.cpp.cpd.CppEscapeTranslator;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.cpd.CpdLanguageProperties;

Basic Usage

import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.lang.cpp.CppLanguageModule;

public class CppCpdExample {
    public static void main(String[] args) {
        // Get the C++ language module instance registered with PMD
        CppLanguageModule cppModule = CppLanguageModule.getInstance();

        // Create a property bundle with the module's default settings
        LanguagePropertyBundle properties = cppModule.newPropertyBundle();

        // Optional: disable "#if 0|#endif" block skipping by setting an empty value
        // properties.setProperty(CppLanguageModule.CPD_SKIP_BLOCKS, "");

        // Create a CPD lexer for C++ tokenization
        CpdLexer lexer = cppModule.createCpdLexer(properties);

        // The lexer can now be used to tokenize C++ source files for duplicate detection
    }
}

Architecture

The PMD C++ module is built around several key components:

  • CppLanguageModule: Main entry point that extends PMD's language module framework
  • CppCpdLexer: Core tokenizer that processes C++ source code into tokens for CPD analysis
  • CppEscapeTranslator: Processes C++ backslash line continuations according to C++ standards
  • Internal Components: Block skipping and preprocessor handling, implemented as internal helpers of the module
  • Token Processing: Supports comprehensive C++ token recognition including literals, identifiers, operators, and comments

Capabilities

Language Module Management

Core language module functionality for registering and configuring C++ support within the PMD framework.

public class CppLanguageModule extends CpdOnlyLanguageModuleBase {
    public CppLanguageModule();
    public static CppLanguageModule getInstance();
    public LanguagePropertyBundle newPropertyBundle();
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}

Properties:

public static final PropertyDescriptor<String> CPD_SKIP_BLOCKS;

C++ Tokenization

Advanced C++ lexical analysis with support for preprocessor directives, line continuations, and configurable filtering.

public class CppCpdLexer extends JavaccCpdLexer {
    public CppCpdLexer(LanguagePropertyBundle cppProperties);
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
    protected TokenManager<JavaccToken> filterTokenStream(TokenManager<JavaccToken> tokenManager);
    protected void processToken(TokenFactory tokenEntries, JavaccToken currentToken);
}

Escape Processing

Handles C++ backslash line continuation processing according to C++ language standards.

public class CppEscapeTranslator extends BackslashEscapeTranslator {
    public CppEscapeTranslator(TextDocument input);
    protected int handleBackslash(int maxOff, int backSlashOff);
}

Configuration Properties

The module supports several CPD configuration properties that can be set via the LanguagePropertyBundle:

Skip Blocks Configuration

// Skip code blocks matching start/end patterns (pipe-separated)
CppLanguageModule.CPD_SKIP_BLOCKS

Default: "#if 0|#endif", which skips code that is conditionally compiled out by an #if 0 ... #endif block

Usage: Set the property to an empty string to disable skipping, or provide a custom start|end delimiter pair
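The skip-blocks behaviour can be illustrated with a small, self-contained sketch. It is a simplified model written for this page, not PMD's internal implementation: the class name SkipBlocksSketch is hypothetical, and so is the rule that a delimiter must equal a whole trimmed line (compared case-insensitively); PMD's own matching rules may differ. Skipped lines are replaced by empty lines so that line numbers reported for the remaining code still match the original file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "start|end" block skipping; not PMD's actual code.
final class SkipBlocksSketch {

    /**
     * Blanks every line from a line equal to the start delimiter up to and
     * including the next line equal to the end delimiter. Delimiters are
     * compared with the trimmed line, ignoring case (assumption of this sketch).
     * The number of lines is preserved.
     */
    static String skipBlocks(String source, String skipBlocksPattern) {
        if (skipBlocksPattern.isEmpty()) {
            return source; // an empty property value disables skipping
        }
        String[] markers = skipBlocksPattern.split("\\|", 2);
        String start = markers[0].trim();
        // Assumption: without a '|', the same delimiter also closes the block.
        String end = markers.length > 1 ? markers[1].trim() : start;

        String[] lines = source.split("\n", -1);
        List<String> out = new ArrayList<>(lines.length);
        boolean skipping = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!skipping && trimmed.equalsIgnoreCase(start)) {
                skipping = true;   // block starts: blank this line
                out.add("");
            } else if (skipping) {
                out.add("");       // inside the block: blank this line
                if (trimmed.equalsIgnoreCase(end)) {
                    skipping = false; // the end delimiter line is blanked too
                }
            } else {
                out.add(line);     // ordinary code is kept unchanged
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String src = "int a;\n#if 0\nint b;\n#endif\nint c;";
        System.out.println(skipBlocks(src, "#if 0|#endif"));
    }
}
```

Keeping blank lines instead of deleting them is the simplest way to make sure any duplicate reported after a skipped block still points at the correct line.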

Standard CPD Properties

// Ignore sequences of literals (such as long array initializers) in duplicate detection
CpdLanguageProperties.CPD_IGNORE_LITERAL_SEQUENCES

// Ignore sequences of literals and identifiers
CpdLanguageProperties.CPD_IGNORE_LITERAL_AND_IDENTIFIER_SEQUENCES

// Replace identifiers with generic placeholders
CpdLanguageProperties.CPD_ANONYMIZE_IDENTIFIERS

// Replace literals with generic placeholders
CpdLanguageProperties.CPD_ANONYMIZE_LITERALS
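The effect of the two anonymization properties can be illustrated with a small model. This sketch is hypothetical and is not PMD's implementation: the class AnonymizeSketch works on an already tokenized list of strings, uses a deliberately tiny keyword set, and the placeholders "ID" and "LIT" are invented for the example (the placeholder images PMD actually records may differ).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of CPD-style anonymization; not PMD's implementation.
final class AnonymizeSketch {
    // Deliberately tiny keyword set for the example (assumption).
    private static final Set<String> KEYWORDS = Set.of("int", "return", "for", "if", "while");

    /** Replaces identifiers and/or literals with fixed placeholders. Tokens must be non-empty. */
    static List<String> anonymize(List<String> tokens, boolean identifiers, boolean literals) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            if (literals && isLiteral(t)) {
                out.add("LIT");
            } else if (identifiers && isIdentifier(t)) {
                out.add("ID");
            } else {
                out.add(t); // keywords, operators and punctuation are kept as-is
            }
        }
        return out;
    }

    // Numbers, string literals and character literals.
    private static boolean isLiteral(String t) {
        char c = t.charAt(0);
        return Character.isDigit(c) || c == '"' || c == '\'';
    }

    // Names that start with a letter or underscore and are not keywords.
    private static boolean isIdentifier(String t) {
        char c = t.charAt(0);
        return (Character.isLetter(c) || c == '_') && !KEYWORDS.contains(t);
    }

    public static void main(String[] args) {
        List<String> a = List.of("int", "total", "=", "count", "*", "2", ";");
        List<String> b = List.of("int", "sum", "=", "n", "*", "3", ";");
        System.out.println(anonymize(a, false, false).equals(anonymize(b, false, false)));
        System.out.println(anonymize(a, true, true).equals(anonymize(b, true, true)));
    }
}
```

Without anonymization the two statements differ; with both options enabled they reduce to the same token sequence, which is why anonymization lets CPD report copies that differ only in names or constants.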

Supported File Extensions

The C++ language module automatically recognizes the following file extensions:

  • .h - C/C++ header files
  • .hpp - C++ header files
  • .hxx - C++ header files
  • .c - C source files
  • .cpp - C++ source files
  • .cxx - C++ source files
  • .cc - C++ source files
  • .C - C++ source files (uppercase extension)

Token Types

The lexer recognizes and processes the following C++ token categories:

Literals

  • String literals: Regular strings and raw strings
  • Character literals: Single character constants
  • Numeric literals: Decimal, hexadecimal, octal, binary integers and floating-point numbers
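The numeric literal categories above can be told apart by their spelling. The following is a hypothetical regex-based sketch written for this page (the class NumericLiteralSketch is invented), not the rules in PMD's C++ grammar. It covers C++14 digit separators and integer suffixes, but leaves out hexadecimal floating-point literals, user-defined literal suffixes and the C++23 z suffix.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: classify C++ numeric literal spellings; not PMD's grammar.
final class NumericLiteralSketch {
    // Optional integer suffix: u, l, ll and their combinations.
    private static final String INT_SUFFIX = "([uU](ll|LL|[lL])?|(ll|LL|[lL])[uU]?)?";
    // Digit sequences may contain C++14 digit separators (a single quote between digits).
    private static final String DIGITS = "[0-9]('?[0-9])*";
    private static final String EXPONENT = "[eE][+-]?[0-9]+";

    private static final Pattern HEX = Pattern.compile("0[xX][0-9a-fA-F]('?[0-9a-fA-F])*" + INT_SUFFIX);
    private static final Pattern BINARY = Pattern.compile("0[bB][01]('?[01])*" + INT_SUFFIX);
    private static final Pattern OCTAL = Pattern.compile("0('?[0-7])*" + INT_SUFFIX);
    private static final Pattern DECIMAL = Pattern.compile("[1-9]('?[0-9])*" + INT_SUFFIX);
    private static final Pattern FLOATING = Pattern.compile(
            "(" + DIGITS + ")?\\." + DIGITS + "(" + EXPONENT + ")?[fFlL]?" // .5, 3.14, 1.5e-3f
            + "|" + DIGITS + "\\.(" + EXPONENT + ")?[fFlL]?"               // 1., 1.e5
            + "|" + DIGITS + EXPONENT + "[fFlL]?");                        // 1e10

    /** Returns "hex", "binary", "octal", "decimal", "floating", or "none". */
    static String classify(String token) {
        if (HEX.matcher(token).matches()) {
            return "hex";
        }
        if (BINARY.matcher(token).matches()) {
            return "binary";
        }
        if (OCTAL.matcher(token).matches()) {
            return "octal"; // a plain "0" is an octal literal in C++
        }
        if (DECIMAL.matcher(token).matches()) {
            return "decimal";
        }
        if (FLOATING.matcher(token).matches()) {
            return "floating";
        }
        return "none";
    }

    public static void main(String[] args) {
        for (String t : new String[] {"0x1F", "0b1010", "017", "42", "1'000'000", "3.14f", "08"}) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```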

Identifiers and Keywords

  • Identifiers: Variable, function, and type names
  • C++ Keywords: All standard C++ language keywords

Operators and Punctuation

  • Arithmetic operators: +, -, *, /, %, etc.
  • Logical operators: &&, ||, !, etc.
  • Comparison operators: ==, !=, <, >, <=, >=
  • Assignment operators: =, +=, -=, etc.
  • Punctuation: { } ( ) [ ] ; and the comma, etc.

Comments

  • Single-line comments: // comment text
  • Multi-line comments: /* comment text */
  • Preprocessor comments: Comments within preprocessor directives

Advanced Features

Preprocessor Directive Handling

  • Automatic skipping of preprocessor output
  • Configurable conditional compilation block skipping
  • Support for line continuation handling in preprocessor directives
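One way such skipping can work is sketched below. This is a hypothetical, line-based model (the class PreprocessorSkipSketch is invented for this page), not PMD's lexer: it treats a line whose first non-blank character is # as a directive, blanks it together with any lines joined to it by a trailing backslash, and keeps every newline so line numbers stay stable. It ignores corner cases a real lexer must handle, such as a # inside a multi-line comment.

```java
// Hypothetical sketch: blank out preprocessor directive lines; not PMD's lexer.
final class PreprocessorSkipSketch {

    static String skipDirectives(String source) {
        String[] lines = source.split("\n", -1);
        StringBuilder out = new StringBuilder(source.length());
        boolean inDirective = false;
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Ignore a trailing '\r' so "\r\n" files behave like "\n" files.
            String content = line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
            if (!inDirective && content.trim().startsWith("#")) {
                inDirective = true; // a new directive starts on this line
            }
            if (inDirective) {
                // Blank the line; a trailing backslash continues the directive.
                inDirective = content.endsWith("\\");
            } else {
                out.append(line);
            }
            if (i < lines.length - 1) {
                out.append('\n'); // keep every newline so line numbers are preserved
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "#define SQUARE(x) \\\n  ((x) * (x))\nint y = SQUARE(3);";
        System.out.println(skipDirectives(src));
    }
}
```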

Line Continuation Processing

  • Proper handling of backslash-newline sequences
  • Support for both Unix (\n) and Windows (\r\n) line endings
  • Maintains source location accuracy after escape processing
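The splicing rule itself (translation phase 2 in the C++ standard: a backslash immediately followed by a new-line is deleted) can be sketched as follows. This is a hypothetical, self-contained model, not CppEscapeTranslator's code; the names LineSpliceSketch and Spliced are invented. It accepts both \n and \r\n after the backslash and records, for every output character, its offset in the original text, which is one way to keep reported locations accurate after splicing.

```java
import java.util.Arrays;

// Hypothetical sketch of C++ line splicing; not CppEscapeTranslator's code.
final class LineSpliceSketch {

    /** Spliced text plus, for each of its characters, the offset in the original input. */
    static final class Spliced {
        final String text;
        final int[] originalOffsets;

        Spliced(String text, int[] originalOffsets) {
            this.text = text;
            this.originalOffsets = originalOffsets;
        }
    }

    static Spliced splice(String input) {
        StringBuilder text = new StringBuilder(input.length());
        int[] offsets = new int[input.length()];
        int n = 0;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                i += 2; // delete backslash + LF (Unix line ending)
            } else if (c == '\\' && i + 2 < input.length()
                    && input.charAt(i + 1) == '\r' && input.charAt(i + 2) == '\n') {
                i += 3; // delete backslash + CRLF (Windows line ending)
            } else {
                text.append(c);
                offsets[n++] = i; // remember where this character came from
                i++;
            }
        }
        return new Spliced(text.toString(), Arrays.copyOf(offsets, n));
    }

    public static void main(String[] args) {
        Spliced s = splice("int total = 1 + \\\n    2;");
        System.out.println(s.text);
    }
}
```

Because each output character keeps its original offset, a token found in the spliced text can still be mapped back to the physical line and column where it was written.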

Token Filtering

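  • Configurable skipping of literal sequences, which keeps long data tables from being reported as duplicated code
  • Identifier (and literal) anonymization, which lets CPD match copies that differ only in names or values and focus on code structure
  • Balanced brace tracking, so nested initializers such as {{1, 2}, {3, 4}} are treated as a single sequence
  • Sequence detection for array and aggregate initializers enclosed in braces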
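The idea behind literal-sequence filtering with balanced braces can be sketched on a pre-split token list. This is a simplified, hypothetical model (the class LiteralSequenceSketch and its token representation are invented for this page), not PMD's token filter: a { ... } group is dropped when it contains only literals, commas and nested braces (plus identifiers, when the second option is enabled), and brace depth is tracked so a nested initializer is removed as a whole. Keywords are not distinguished from identifiers here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of literal-sequence filtering; not PMD's token filter.
final class LiteralSequenceSketch {

    /** Removes every "{ ... }" group whose content is a plain value sequence. */
    static List<String> dropLiteralSequences(List<String> tokens, boolean alsoIdentifiers) {
        List<String> out = new ArrayList<>(tokens.size());
        int i = 0;
        while (i < tokens.size()) {
            if (tokens.get(i).equals("{")) {
                int close = findSequenceEnd(tokens, i + 1, alsoIdentifiers);
                if (close >= 0) {
                    i = close + 1; // skip the opening brace, the sequence and its closing brace
                    continue;
                }
            }
            out.add(tokens.get(i));
            i++;
        }
        return out;
    }

    /** Index of the matching "}" if the group is a plain value sequence, otherwise -1. */
    private static int findSequenceEnd(List<String> tokens, int from, boolean alsoIdentifiers) {
        int depth = 0;            // nesting level of inner braces
        boolean sawValue = false; // require at least one literal/identifier
        for (int j = from; j < tokens.size(); j++) {
            String t = tokens.get(j);
            if (t.equals("{")) {
                depth++;
            } else if (t.equals("}")) {
                if (depth == 0) {
                    return sawValue ? j : -1; // closing brace of the outer group
                }
                depth--;
            } else if (t.equals(",")) {
                continue; // separators are allowed inside the sequence
            } else if (isLiteral(t) || (alsoIdentifiers && isIdentifier(t))) {
                sawValue = true;
            } else {
                return -1; // any other token (operator, ';', ...) means "keep this group"
            }
        }
        return -1; // no matching closing brace
    }

    private static boolean isLiteral(String t) {
        char c = t.charAt(0);
        return Character.isDigit(c) || c == '"' || c == '\'';
    }

    private static boolean isIdentifier(String t) {
        char c = t.charAt(0);
        return Character.isLetter(c) || c == '_';
    }

    public static void main(String[] args) {
        List<String> decl = List.of("int", "a", "[", "]", "=", "{", "1", ",", "2", ",", "3", "}", ";");
        System.out.println(dropLiteralSequences(decl, false));
    }
}
```

A function body such as { return x; } survives because it contains a semicolon, while a pure data initializer disappears, so only the surrounding declaration remains for duplicate matching.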

Error Handling

The module reports errors for problems such as:

  • Malformed C++ source files
  • Invalid preprocessor directives
  • Unterminated comments or strings
  • Character encoding issues

Errors are reported through PMD's standard error reporting mechanisms with accurate source location information.