tessl/maven-net-sourceforge-pmd--pmd-cpp

C++ language module for PMD's Copy-Paste Detector providing lexical analysis and tokenization support for C++ source code

Describes: mavenpkg:maven/net.sourceforge.pmd/pmd-cpp@7.13.x

To install, run

npx @tessl/cli install tessl/maven-net-sourceforge-pmd--pmd-cpp@7.13.0

# PMD C++ Language Module

PMD C++ Language Module provides C++ language support for PMD's Copy-Paste Detector (CPD). This library enables duplicate code detection in C++ source files by providing lexical analysis and tokenization capabilities specifically tailored for C++ syntax, including preprocessor directive handling and configurable code block filtering.

## Package Information

- **Package Name**: pmd-cpp
- **Package Type**: maven
- **Language**: Java
- **Group ID**: net.sourceforge.pmd
- **Artifact ID**: pmd-cpp
- **Installation**: Add as Maven dependency:

```xml
<dependency>
    <groupId>net.sourceforge.pmd</groupId>
    <artifactId>pmd-cpp</artifactId>
    <version>7.13.0</version>
</dependency>
```

## Core Imports

```java
import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.lang.cpp.cpd.CppCpdLexer;
import net.sourceforge.pmd.lang.cpp.cpd.CppEscapeTranslator;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.cpd.CpdLanguageProperties;
```

## Basic Usage

```java
import net.sourceforge.pmd.lang.cpp.CppLanguageModule;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle;

// Get the C++ language module instance
CppLanguageModule cppModule = CppLanguageModule.getInstance();

// Create a property bundle with default settings
LanguagePropertyBundle properties = cppModule.newPropertyBundle();

// Create a CPD lexer for C++ tokenization
CpdLexer lexer = cppModule.createCpdLexer(properties);

// The lexer can now be used to tokenize C++ source files for duplicate detection
```

## Architecture

The PMD C++ module is built around several key components:

- **CppLanguageModule**: Main entry point that extends PMD's language module framework
- **CppCpdLexer**: Core tokenizer that processes C++ source code into tokens for CPD analysis
- **CppEscapeTranslator**: Processes C++ backslash line continuations according to C++ standards
- **Internal Components**: Block skipping and preprocessing functionality handled internally
- **Token Processing**: Supports comprehensive C++ token recognition including literals, identifiers, operators, and comments

## Capabilities

### Language Module Management

Core language module functionality for registering and configuring C++ support within the PMD framework.

```java { .api }
public class CppLanguageModule extends CpdOnlyLanguageModuleBase {
    public CppLanguageModule();
    public static CppLanguageModule getInstance();
    public LanguagePropertyBundle newPropertyBundle();
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
}
```

**Properties**:

```java { .api }
public static final PropertyDescriptor<String> CPD_SKIP_BLOCKS;
```

### C++ Tokenization

Advanced C++ lexical analysis with support for preprocessor directives, line continuations, and configurable filtering.

```java { .api }
public class CppCpdLexer extends JavaccCpdLexer {
    public CppCpdLexer(LanguagePropertyBundle cppProperties);
    protected TokenManager<JavaccToken> makeLexerImpl(TextDocument doc);
    protected TokenManager<JavaccToken> filterTokenStream(TokenManager<JavaccToken> tokenManager);
    protected void processToken(TokenFactory tokenEntries, JavaccToken currentToken);
}
```

### Escape Processing

Handles C++ backslash line continuation processing according to C++ language standards.

```java { .api }
public class CppEscapeTranslator extends BackslashEscapeTranslator {
    public CppEscapeTranslator(TextDocument input);
    protected int handleBackslash(int maxOff, int backSlashOff);
}
```

## Configuration Properties

The module supports several CPD configuration properties that can be set via the `LanguagePropertyBundle`:

### Skip Blocks Configuration

```java { .api }
// Skip code blocks matching start/end patterns (pipe-separated)
CppLanguageModule.CPD_SKIP_BLOCKS
```

**Default**: Skips conditionally compiled code (`#if 0|#endif`)

**Usage**: Set to empty string to disable, or provide custom start|end patterns
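The `start|end` format can be illustrated with a small, self-contained sketch. It does not use PMD's internal block skipper; the class `SkipBlocksSketch` and its method are hypothetical, and the line-based matching is a simplifying assumption (for example, a nested `#if` inside a skipped block would end the skip at the first `#endif`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not PMD's actual block-skipping code.
final class SkipBlocksSketch {

    /**
     * Drops every line from one that starts with the start marker
     * through the next one that starts with the end marker.
     * An empty spec disables skipping, mirroring the property's documented usage.
     */
    static List<String> skipBlocks(List<String> lines, String spec) {
        if (spec.isEmpty()) {
            return new ArrayList<>(lines);
        }
        int pipe = spec.indexOf('|');
        if (pipe < 0) {
            throw new IllegalArgumentException("expected start|end, got: " + spec);
        }
        String start = spec.substring(0, pipe);
        String end = spec.substring(pipe + 1);

        List<String> kept = new ArrayList<>();
        boolean skipping = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!skipping && trimmed.startsWith(start)) {
                skipping = true;            // drop the start marker line itself
            } else if (skipping) {
                if (trimmed.startsWith(end)) {
                    skipping = false;       // drop the end marker line, then resume
                }
            } else {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "int a;", "#if 0", "int dead;", "#endif", "int b;");
        System.out.println(skipBlocks(source, "#if 0|#endif"));
    }
}
```

With the default-style spec, only the lines outside the `#if 0` ... `#endif` region survive, so disabled code cannot contribute to reported duplicates.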

### Standard CPD Properties

```java { .api }
// Ignore literal sequences in duplicate detection
CpdLanguageProperties.CPD_IGNORE_LITERAL_SEQUENCES

// Ignore literal and identifier sequences
CpdLanguageProperties.CPD_IGNORE_LITERAL_AND_IDENTIFIER_SEQUENCES

// Replace identifiers with generic placeholders
CpdLanguageProperties.CPD_ANONYMIZE_IDENTIFIERS

// Replace literals with generic placeholders
CpdLanguageProperties.CPD_ANONYMIZE_LITERALS
```

## Supported File Extensions

The C++ language module automatically recognizes the following file extensions:

- `.h` - C/C++ header files
- `.hpp` - C++ header files
- `.hxx` - C++ header files
- `.c` - C source files
- `.cpp` - C++ source files
- `.cxx` - C++ source files
- `.cc` - C++ source files
- `.C` - C++ source files
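As a rough illustration of the list above, the following sketch checks a file name against the same extensions. The class `CppExtensionSketch` is hypothetical and does not reflect how PMD itself resolves languages. Matching is case-sensitive here so that `.c` and `.C` stay distinct entries, which is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; PMD performs its own language detection.
final class CppExtensionSketch {

    // Extensions copied from the list above, without the leading dot.
    private static final Set<String> EXTENSIONS = new HashSet<>(
        Arrays.asList("h", "hpp", "hxx", "c", "cpp", "cxx", "cc", "C"));

    /** Returns true if the file name ends in one of the listed extensions. */
    static boolean hasCppExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(fileName.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(hasCppExtension("widget.hpp"));
        System.out.println(hasCppExtension("README.md"));
    }
}
```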

## Token Types

The lexer recognizes and processes the following C++ token categories:

### Literals

- **String literals**: Regular strings and raw strings
- **Character literals**: Single character constants
- **Numeric literals**: Decimal, hexadecimal, octal, binary integers and floating-point numbers

### Identifiers and Keywords

- **Identifiers**: Variable, function, and type names
- **C++ Keywords**: All standard C++ language keywords

### Operators and Punctuation

- **Arithmetic operators**: `+`, `-`, `*`, `/`, `%`, etc.
- **Logical operators**: `&&`, `||`, `!`, etc.
- **Comparison operators**: `==`, `!=`, `<`, `>`, `<=`, `>=`
- **Assignment operators**: `=`, `+=`, `-=`, etc.
- **Punctuation**: `{`, `}`, `(`, `)`, `[`, `]`, `;`, `,`, etc.

### Comments

- **Single-line comments**: `// comment text`
- **Multi-line comments**: `/* comment text */`
- **Preprocessor comments**: Comments within preprocessor directives

## Advanced Features

### Preprocessor Directive Handling

- Automatic skipping of preprocessor output
- Configurable conditional compilation block skipping
- Support for line continuation handling in preprocessor directives

### Line Continuation Processing

- Proper handling of backslash-newline sequences
- Support for both Unix (`\n`) and Windows (`\r\n`) line endings
- Maintains source location accuracy after escape processing
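The splicing behaviour described above can be sketched in plain Java. This is not the `CppEscapeTranslator` implementation; the class `LineContinuationSketch` is hypothetical, and unlike the real translator it returns a new string without tracking the original source offsets.

```java
// Hypothetical illustration of backslash-newline splicing; not PMD code.
final class LineContinuationSketch {

    /**
     * Removes every backslash that is immediately followed by a line break
     * (either "\n" or "\r\n"), together with that line break.
     * A backslash followed by anything else is kept unchanged.
     */
    static String spliceContinuations(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '\\' && i + 1 < source.length()) {
                char next = source.charAt(i + 1);
                if (next == '\n') {
                    i += 2;                 // skip backslash + LF
                    continue;
                }
                if (next == '\r' && i + 2 < source.length()
                        && source.charAt(i + 2) == '\n') {
                    i += 3;                 // skip backslash + CR LF
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String macro = "#define MAX(a, b) \\\n    ((a) > (b) ? (a) : (b))";
        System.out.println(spliceContinuations(macro));
    }
}
```

After splicing, a multi-line macro reads as one logical line, which is what lets the lexer treat it as a single preprocessor directive.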

### Token Filtering

- Configurable literal sequence filtering for improved duplicate detection
- Identifier anonymization for focusing on code structure
- Balanced brace handling in complex expressions
- Smart sequence detection for array/object initializers
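To show why anonymization helps duplicate detection, here is a minimal sketch with a toy tokenizer. It is far simpler than the module's JavaCC-based lexer: the class `AnonymizeSketch`, its tiny keyword set, and the `<id>` and `<lit>` placeholders are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy tokenizer for illustration; not the CppCpdLexer implementation.
final class AnonymizeSketch {

    // Deliberately tiny keyword set; a real lexer knows all C++ keywords.
    private static final Set<String> KEYWORDS = new HashSet<>(
        Arrays.asList("int", "return", "if", "else", "for", "while"));

    // An identifier-like word, a run of digits, or any single non-space character.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_]\\w*|\\d+|\\S");

    /** Splits code into tokens, replacing identifiers and integer literals with placeholders. */
    static List<String> anonymizedTokens(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            String text = m.group();
            if (text.matches("\\d+")) {
                tokens.add("<lit>");
            } else if (text.matches("[A-Za-z_]\\w*") && !KEYWORDS.contains(text)) {
                tokens.add("<id>");
            } else {
                tokens.add(text);           // keywords, operators, punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(anonymizedTokens("int total = count + 1;"));
        System.out.println(anonymizedTokens("int sum = n + 42;"));
    }
}
```

Both statements in `main` reduce to the same token sequence, so a detector comparing anonymized tokens would treat them as duplicates even though the names and literal values differ.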

### Error Handling

The module provides robust error handling for:

- Malformed C++ source files
- Invalid preprocessor directives
- Unterminated comments or strings
- Character encoding issues

Errors are reported through PMD's standard error reporting mechanisms with accurate source location information.