# Copy-Paste Detection

Scala tokenization support for PMD's copy-paste detection (CPD) system, providing language-specific tokenization and filtering for duplicate code analysis. The CPD system identifies duplicated code blocks across Scala source files.
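
To build intuition for what the tokenizer feeds into, here is a minimal, stdlib-only sketch of the underlying idea behind duplicate detection: hash fixed-size windows ("tiles") of consecutive tokens and report windows seen more than once. This is an illustration only, not PMD's actual algorithm; the class and method names are hypothetical.

```java
import java.util.*;

// Illustration only (not PMD's implementation): find repeated
// windows of `tile` consecutive tokens in a token stream.
public class TileSketch {

    // Returns the distinct windows of `tile` consecutive tokens
    // that occur more than once in the stream.
    static Set<List<String>> duplicateTiles(List<String> tokens, int tile) {
        Map<List<String>, Integer> seen = new HashMap<>();
        for (int i = 0; i + tile <= tokens.size(); i++) {
            // Copy the window so the map key is independent of the backing list
            seen.merge(new ArrayList<>(tokens.subList(i, i + tile)), 1, Integer::sum);
        }
        Set<List<String>> dups = new HashSet<>();
        for (Map.Entry<List<String>, Integer> e : seen.entrySet()) {
            if (e.getValue() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // "val x = 1" appears twice, so its 4-token tile is reported
        List<String> tokens = Arrays.asList(
                "val", "x", "=", "1", ";", "val", "x", "=", "1");
        System.out.println(duplicateTiles(tokens, 4));
    }
}
```

Real CPD is considerably more involved (it tracks source locations and grows matches beyond the minimum tile size), but the tile-hashing idea is the core of it.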
## Capabilities

### Scala Language Registration

Registers Scala language support with PMD's copy-paste detection system.
```java { .api }
/**
 * Language implementation for Scala CPD.
 */
public class ScalaLanguage extends AbstractLanguage {

    /**
     * Creates a new Scala Language instance for CPD.
     * Registers with name "Scala", terse name "scala", and file extension ".scala".
     */
    public ScalaLanguage();
}
```
**Usage Examples:**

```java
// The language is registered automatically via SPI
// (META-INF/services/net.sourceforge.pmd.cpd.Language),
// so manual instantiation is not typically needed.
ScalaLanguage scalaLang = new ScalaLanguage();
System.out.println("Language: " + scalaLang.getName());         // "Scala"
System.out.println("Extensions: " + scalaLang.getExtensions()); // [.scala]
```
### Scala Tokenizer

Main tokenizer implementation that processes Scala source code into tokens for duplicate detection.
```java { .api }
/**
 * Scala Tokenizer class. Uses the Scalameta tokenizer, but adapts it
 * for use with PMD's generic token filtering.
 */
public class ScalaTokenizer implements Tokenizer {

    /**
     * Property key for specifying the Scala version dialect.
     * Valid values: "2.10", "2.11", "2.12", "2.13".
     */
    public static final String SCALA_VERSION_PROPERTY = "net.sourceforge.pmd.scala.version";

    /**
     * Creates the tokenizer using properties from the system environment.
     * Falls back to the default Scala version if SCALA_VERSION_PROPERTY is not set.
     */
    public ScalaTokenizer();

    /**
     * Creates the tokenizer from a given set of properties.
     *
     * @param properties the Properties object containing configuration
     */
    public ScalaTokenizer(Properties properties);

    /**
     * Tokenizes source code for copy-paste detection.
     *
     * @param sourceCode   the source code to tokenize
     * @param tokenEntries output collection for generated tokens
     * @throws IOException if reading the source fails
     */
    @Override
    public void tokenize(SourceCode sourceCode, Tokens tokenEntries) throws IOException;
}
```
**Usage Examples:**

```java
import net.sourceforge.pmd.cpd.*;
import java.io.IOException;
import java.util.Properties;

// Create a tokenizer with default settings
ScalaTokenizer tokenizer = new ScalaTokenizer();

// Create a tokenizer for a specific Scala version
Properties props = new Properties();
props.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.12");
ScalaTokenizer tokenizer212 = new ScalaTokenizer(props);

// Tokenize source code (SourceCode is built from a CodeLoader)
String code = "object HelloWorld { def main(args: Array[String]): Unit = println(\"Hello\") }";
SourceCode sourceCode = new SourceCode(new SourceCode.StringCodeLoader(code, "HelloWorld.scala"));
Tokens tokens = new Tokens();

try {
    tokenizer.tokenize(sourceCode, tokens);

    // Process tokens for duplicate detection
    for (TokenEntry token : tokens.getTokens()) {
        System.out.println("Token: " + token.getValue()
                + " at line " + token.getBeginLine()
                + ", column " + token.getBeginColumn());
    }
} catch (IOException e) {
    System.err.println("Tokenization failed: " + e.getMessage());
}
```
### Token Filtering and Processing

The tokenizer filters the token stream so that only tokens meaningful for duplicate detection are kept, while irrelevant elements are ignored.

**Filtered Token Types:**

The tokenizer automatically filters out these Scalameta token types:

- `Token.Space` - whitespace characters
- `Token.Tab` - tab characters
- `Token.CR` - carriage returns
- `Token.LF` - line feeds
- `Token.FF` - form feeds
- `Token.LFLF` - double line feeds
- `Token.EOF` - end-of-file markers
- `Token.Comment` - comments (handled separately)
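
Conceptually, the filter is just a predicate over token kinds. The stdlib-only sketch below illustrates that decision; the real code matches Scalameta token classes rather than strings, and the names here are stand-ins for the list above.

```java
import java.util.*;

// Illustration only: the filtering decision as a predicate over
// token-kind names. The real tokenizer matches Scalameta token
// classes (Token.Space, Token.Comment, ...), not strings.
public class TriviaFilter {

    // Kinds that carry no signal for duplicate detection
    private static final Set<String> IGNORED = new HashSet<>(Arrays.asList(
            "Space", "Tab", "CR", "LF", "FF", "LFLF", "EOF", "Comment"));

    static boolean isSignificant(String tokenKind) {
        return !IGNORED.contains(tokenKind);
    }

    public static void main(String[] args) {
        System.out.println(isSignificant("Ident")); // true  - kept
        System.out.println(isSignificant("Space")); // false - filtered out
    }
}
```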
**Comment Handling:**

Comments receive special processing to support PMD's comment-based features:
```java { .api }
/**
 * Internal token adapter that wraps Scalameta tokens for PMD compatibility.
 */
public class ScalaTokenAdapter implements GenericToken {

    /**
     * Creates a token adapter with optional previous-comment context.
     *
     * @param scalaToken      the underlying Scalameta token
     * @param previousComment the most recent comment token, for context
     */
    public ScalaTokenAdapter(Token scalaToken, GenericToken previousComment);

    @Override
    public String getImage();

    @Override
    public int getBeginLine();

    @Override
    public int getBeginColumn();

    @Override
    public int getEndLine();

    @Override
    public int getEndColumn();

    @Override
    public GenericToken getPreviousComment();
}
```
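
The `previousComment` link means each significant token remembers the most recent comment that preceded it, which is what makes comment-based features (e.g. `CPD-OFF`/`CPD-ON` suppression markers) possible even though comments themselves are filtered out. A minimal stdlib-only sketch of that chaining, with hypothetical `Tok`/`Adapted` stand-ins for the real token types:

```java
import java.util.*;

// Illustration only: each emitted token carries the most recent
// comment seen before it; the comments themselves are dropped.
public class CommentChainSketch {
    record Tok(String image, boolean isComment) {}
    record Adapted(String image, String previousComment) {}

    static List<Adapted> adapt(List<Tok> tokens) {
        List<Adapted> out = new ArrayList<>();
        String lastComment = null;
        for (Tok t : tokens) {
            if (t.isComment()) {
                lastComment = t.image(); // remember, but emit nothing
            } else {
                out.add(new Adapted(t.image(), lastComment));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Adapted> r = adapt(List.of(
                new Tok("// CPD-OFF", true),
                new Tok("val", false)));
        // The "val" token carries the preceding comment as context
        System.out.println(r.get(0).previousComment()); // "// CPD-OFF"
    }
}
```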
### Version-Specific Tokenization

The tokenizer supports different Scala versions through dialect configuration, ensuring accurate tokenization of version-specific syntax.

**Supported Dialects:**
```java
// Version selection through properties
Properties versionProps = new Properties();

// Scala 2.10
versionProps.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.10");
ScalaTokenizer tokenizer210 = new ScalaTokenizer(versionProps);

// Scala 2.11
versionProps.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.11");
ScalaTokenizer tokenizer211 = new ScalaTokenizer(versionProps);

// Scala 2.12
versionProps.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.12");
ScalaTokenizer tokenizer212 = new ScalaTokenizer(versionProps);

// Scala 2.13 (default)
versionProps.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.13");
ScalaTokenizer tokenizer213 = new ScalaTokenizer(versionProps);

// Or use a system property
System.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.12");
ScalaTokenizer systemTokenizer = new ScalaTokenizer();
```
### Error Handling in Tokenization

The tokenizer handles various error conditions during tokenization:

```java { .api }
// Exception types thrown by the tokenizer
import net.sourceforge.pmd.lang.ast.TokenMgrError;
import scala.meta.tokenizers.TokenizeException;
```
**Error Handling Examples:**

```java
try {
    tokenizer.tokenize(sourceCode, tokens);
} catch (IOException e) {
    // I/O errors while reading the source
    System.err.println("Failed to read source: " + e.getMessage());
} catch (TokenMgrError e) {
    // Tokenization errors from Scalameta
    System.err.println("Tokenization error at line " + e.getLine()
            + ", column " + e.getColumn() + ": " + e.getMessage());

    // The original Scalameta exception is available as the cause
    if (e.getCause() instanceof TokenizeException) {
        TokenizeException originalError = (TokenizeException) e.getCause();
        System.err.println("Scalameta error: " + originalError.getMessage());
    }
} catch (Exception e) {
    // Other unexpected errors
    System.err.println("Unexpected tokenization error: " + e.getMessage());
}
```
### Integration with PMD CPD

The Scala tokenizer integrates with PMD's copy-paste detection pipeline:

**Complete CPD Integration Example:**
```java
import net.sourceforge.pmd.cpd.*;
import java.io.File;
import java.util.Iterator;

// Create a CPD configuration for Scala
CPDConfiguration config = new CPDConfiguration();
config.setMinimumTileSize(50); // minimum duplicate size, in tokens
config.setLanguage(new ScalaLanguage());

// Run CPD analysis over Scala source files
// (CPD.add takes File arguments, not SourceCode)
CPD cpd = new CPD(config);
cpd.add(new File("File1.scala"));
cpd.add(new File("File2.scala"));
cpd.go();

// Process results
Iterator<Match> matches = cpd.getMatches();
while (matches.hasNext()) {
    Match match = matches.next();
    System.out.println("Duplicate found:");
    System.out.println("  Size: " + match.getTokenCount() + " tokens");
    System.out.println("  Lines: " + match.getLineCount());

    for (Mark mark : match.getMarkSet()) {
        System.out.println("  Location: " + mark.getFilename()
                + " (line " + mark.getBeginLine() + ")");
    }
}
```
### Advanced Tokenization Features

**Custom Token Processing:**
```java
import net.sourceforge.pmd.cpd.*;
import java.io.IOException;
import java.util.Properties;

// Post-process the token stream for custom analyses
public class CustomScalaTokenProcessor {
    public void processTokens(SourceCode sourceCode) {
        Properties props = new Properties();
        props.setProperty(ScalaTokenizer.SCALA_VERSION_PROPERTY, "2.12");

        ScalaTokenizer tokenizer = new ScalaTokenizer(props);
        Tokens tokens = new Tokens();

        try {
            tokenizer.tokenize(sourceCode, tokens);

            // Custom processing of the token stream
            for (TokenEntry token : tokens.getTokens()) {
                if (token.getValue().matches("[A-Z][a-zA-Z]*")) {
                    // Likely a class/object/type name
                    System.out.println("Potential type name: " + token.getValue());
                } else if (token.getValue().matches("[a-z][a-zA-Z]*")) {
                    // Likely a method/variable name (or a keyword)
                    System.out.println("Potential member name: " + token.getValue());
                }
            }
        } catch (IOException e) {
            System.err.println("Processing failed: " + e.getMessage());
        }
    }
}
```
**Token Statistics:**

```java
import net.sourceforge.pmd.cpd.*;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class TokenStatistics {
    public void analyzeTokens(SourceCode sourceCode) throws IOException {
        ScalaTokenizer tokenizer = new ScalaTokenizer();
        Tokens tokens = new Tokens();
        tokenizer.tokenize(sourceCode, tokens);

        Map<String, Integer> tokenCounts = new HashMap<>();
        int totalTokens = 0;

        for (TokenEntry token : tokens.getTokens()) {
            if (!token.getValue().equals(TokenEntry.EOF.getValue())) {
                tokenCounts.merge(token.getValue(), 1, Integer::sum);
                totalTokens++;
            }
        }

        System.out.println("Total tokens: " + totalTokens);
        System.out.println("Unique tokens: " + tokenCounts.size());

        // Ten most frequent tokens
        tokenCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(entry -> System.out.println(entry.getKey() + ": " + entry.getValue()));
    }
}
```