
# Site Management

Site configuration and data management for loading, filtering, and organizing information about supported social networks and platforms. The site management system handles the database of over 400 supported sites with their detection methods and metadata.

## Capabilities

### Individual Site Information

Container class that holds comprehensive information about a single social media platform or website, including detection methods and testing data.

```python { .api }
class SiteInformation:
    """
    Information about a specific website/platform.

    Contains all data needed to check for username existence on a particular site.
    """

    def __init__(
        self,
        name: str,
        url_home: str,
        url_username_format: str,
        username_claimed: str,
        information: dict,
        is_nsfw: bool,
        username_unclaimed: str = secrets.token_urlsafe(10)
    ):
        """
        Create Site Information Object.

        Args:
            name: String which identifies the site
            url_home: String containing URL for home page of site
            url_username_format: String containing URL format for usernames on site
                (should contain "{}" placeholder for username substitution)
                Example: "https://somesite.com/users/{}"
            username_claimed: String containing username known to be claimed on website
            information: Dictionary containing site-specific detection information
                (includes custom detection methods and parameters)
            is_nsfw: Boolean indicating if site is Not Safe For Work
            username_unclaimed: String containing username known to be unclaimed
                (defaults to secrets.token_urlsafe(10) if not provided)
        """

    name: str                  # Site identifier name
    url_home: str              # Homepage URL
    url_username_format: str   # URL template with {} placeholder
    username_claimed: str      # Known claimed username for testing
    username_unclaimed: str    # Known unclaimed username for testing
    information: dict          # Site-specific detection configuration
    is_nsfw: bool              # Not Safe For Work flag

    def __str__(self) -> str:
        """
        String representation showing site name and homepage.

        Returns:
            Formatted string with site name and homepage URL
        """
```

### Site Collection Management

Manager class that loads and organizes information about all supported sites, with filtering and querying capabilities.

```python { .api }
class SitesInformation:
    """
    Container for information about all supported sites.

    Manages the collection of site data and provides filtering and access methods.
    """

    def __init__(self, data_file_path: str = None):
        """
        Create Sites Information Object.

        Loads site data from a JSON file or URL. If no path is specified, uses the
        default live data from the GitHub repository for the most up-to-date information.

        Args:
            data_file_path: Path to JSON data file. Supports:
                - Absolute file path: "/path/to/data.json"
                - Relative file path: "data.json"
                - URL: "https://example.com/data.json"
                - None (default): Uses live GitHub data

        Raises:
            FileNotFoundError: If data file cannot be accessed
            ValueError: If JSON data cannot be parsed
        """

    sites: dict  # Dictionary mapping site names to SiteInformation objects

    def remove_nsfw_sites(self, do_not_remove: list = []):
        """
        Remove NSFW (Not Safe For Work) sites from the collection.

        Filters out sites marked with the isNSFW flag, with optional exceptions.

        Args:
            do_not_remove: List of site names to keep even if marked NSFW
                (case-insensitive matching)
        """

    def site_name_list(self) -> list:
        """
        Get sorted list of all site names.

        Returns:
            List of strings containing site names, sorted alphabetically (case-insensitive)
        """

    def __iter__(self):
        """
        Iterate over SiteInformation objects.

        Yields:
            SiteInformation objects for each site in the collection
        """

    def __len__(self) -> int:
        """
        Get number of sites in collection.

        Returns:
            Integer count of sites
        """
```

## Site Data Structure

### JSON Configuration Format

Site data is stored in JSON format with the following structure for each site:

```json
{
  "SiteName": {
    "urlMain": "https://example.com/",
    "url": "https://example.com/user/{}",
    "username_claimed": "known_user",
    "errorType": "status_code",
    "isNSFW": false,
    "headers": {
      "User-Agent": "custom-agent"
    }
  }
}
```
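When authoring a custom data file, it can help to verify the required fields before loading. The sketch below checks each entry against the keys shown in the example above; it is a minimal illustrative validator, and the key set is drawn from that example rather than any official schema.

```python
import json

# Required keys taken from the example entry above (illustrative, not an
# official schema): every site entry should carry at least these fields.
REQUIRED_KEYS = {"urlMain", "url", "username_claimed", "errorType"}

def find_invalid_entries(raw_json: str) -> list:
    """Return names of site entries missing any required key."""
    data = json.loads(raw_json)
    return [name for name, entry in data.items()
            if not REQUIRED_KEYS.issubset(entry)]

sample = """
{
  "GoodSite": {
    "urlMain": "https://example.com/",
    "url": "https://example.com/user/{}",
    "username_claimed": "known_user",
    "errorType": "status_code"
  },
  "BadSite": {"urlMain": "https://example.org/"}
}
"""
print(find_invalid_entries(sample))  # ['BadSite']
```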

### Detection Methods

Sites use various methods to detect username existence:

- **status_code**: HTTP status codes (200 = exists, 404 = not found)
- **message**: Text content analysis in the response body
- **response_url**: URL redirection patterns
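The three methods above can be sketched as a single dispatch on the `errorType` value. This is an illustrative simplification, not the project's actual implementation: `status`, `body`, and `final_url` stand in for fields of an HTTP response, while `error_msg` and `error_url` are assumed to mirror the optional error-text and error-redirect values a site entry may carry alongside `errorType`.

```python
# Hedged sketch: map each errorType to a yes/no existence decision.
def username_exists(error_type: str, status: int, body: str = "",
                    final_url: str = "", error_msg: str = "x",
                    error_url: str = "x") -> bool:
    if error_type == "status_code":
        # Claimed profiles return 2xx; unclaimed ones typically 404.
        return 200 <= status < 300
    if error_type == "message":
        # The known "not found" text is absent when the username exists.
        return error_msg not in body
    if error_type == "response_url":
        # Unclaimed usernames redirect to a known error URL.
        return not final_url.startswith(error_url)
    raise ValueError(f"unknown errorType: {error_type}")

print(username_exists("status_code", 200))   # True
print(username_exists("status_code", 404))   # False
```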

## Usage Examples

### Load Default Site Data

```python
from sherlock_project.sites import SitesInformation

# Load default site data from GitHub (most up-to-date)
sites = SitesInformation()

print(f"Loaded {len(sites)} sites")
print(f"Available sites: {sites.site_name_list()[:10]}")  # First 10 sites
```

### Load Custom Site Data

```python
# Load from a local file
sites = SitesInformation("custom_sites.json")

# Load from a URL
sites = SitesInformation("https://example.com/my_sites.json")

# Load from an absolute path
sites = SitesInformation("/path/to/sites.json")
```

### Filter NSFW Sites

```python
# Remove all NSFW sites
sites = SitesInformation()
print(f"Before filtering: {len(sites)} sites")

sites.remove_nsfw_sites()
print(f"After filtering: {len(sites)} sites")

# Keep specific NSFW sites while removing others
sites = SitesInformation()
sites.remove_nsfw_sites(do_not_remove=["Reddit", "Tumblr"])
```

### Explore Site Information

```python
# Iterate through all sites
for site in sites:
    print(f"Site: {site.name}")
    print(f"  Homepage: {site.url_home}")
    print(f"  URL Format: {site.url_username_format}")
    print(f"  NSFW: {site.is_nsfw}")
    print(f"  Test User: {site.username_claimed}")
    print()

# Access a specific site
github_site = sites.sites["GitHub"]
print(f"GitHub URL format: {github_site.url_username_format}")
print(f"GitHub test user: {github_site.username_claimed}")

# Check whether a site exists in the collection
if "Twitter" in sites.sites:
    twitter_site = sites.sites["Twitter"]
    print(f"Twitter detection method: {twitter_site.information.get('errorType')}")
```

### Create Site Subsets

```python
# Create a subset of specific sites
social_media_sites = {
    name: sites.sites[name] for name in sites.sites
    if name in ["GitHub", "Twitter", "Instagram", "Facebook", "LinkedIn"]
}

print(f"Social media subset: {len(social_media_sites)} sites")

# Create a subset by category
tech_sites = {}
for name, site in sites.sites.items():
    if any(keyword in site.url_home.lower() for keyword in ["github", "gitlab", "stackoverflow", "dev"]):
        tech_sites[name] = site

print(f"Tech-related sites: {len(tech_sites)} sites")
```

### Analyze Site Configuration

```python
# Analyze detection methods
detection_methods = {}
nsfw_count = 0

for site in sites:
    method = site.information.get('errorType', 'unknown')
    detection_methods[method] = detection_methods.get(method, 0) + 1

    if site.is_nsfw:
        nsfw_count += 1

print("Detection method distribution:")
for method, count in detection_methods.items():
    print(f"  {method}: {count} sites")

print(f"\nNSFW sites: {nsfw_count}/{len(sites)}")
```

### Custom Site Configuration

```python
from sherlock_project.sites import SiteInformation

# Create custom site information
custom_site = SiteInformation(
    name="CustomSite",
    url_home="https://customsite.com/",
    url_username_format="https://customsite.com/profile/{}",
    username_claimed="testuser",
    information={
        "errorType": "status_code",
        "headers": {
            "User-Agent": "Custom-Bot/1.0"
        }
    },
    is_nsfw=False
)

# Add to an existing site collection
sites.sites["CustomSite"] = custom_site

print(f"Added custom site. Total sites: {len(sites)}")
```

### Site Data Export and Import

```python
import json

# Export the current site configuration
site_data = {}
for name, site in sites.sites.items():
    site_data[name] = {
        "urlMain": site.url_home,
        "url": site.url_username_format,
        "username_claimed": site.username_claimed,
        "isNSFW": site.is_nsfw,
        **site.information  # Include all detection-specific data
    }

# Save to a file
with open("exported_sites.json", "w") as f:
    json.dump(site_data, f, indent=2)

# Load and verify
test_sites = SitesInformation("exported_sites.json")
print(f"Exported and reloaded {len(test_sites)} sites")
```

### Site Performance Analysis

```python
from sherlock_project.sherlock import sherlock
from sherlock_project.notify import QueryNotifyPrint
import statistics

# Test a small subset for performance analysis
test_sites = {name: sites.sites[name] for name in list(sites.sites.keys())[:20]}

notify = QueryNotifyPrint(verbose=True)
results = sherlock("testuser", test_sites, notify)

# Analyze response times
response_times = []
for site_name, result_data in results.items():
    result = result_data['status']
    if result.query_time:
        response_times.append(result.query_time)

if response_times:
    print("\nPerformance Analysis:")
    print(f"  Average response time: {statistics.mean(response_times):.3f}s")
    print(f"  Median response time: {statistics.median(response_times):.3f}s")
    print(f"  Fastest site: {min(response_times):.3f}s")
    print(f"  Slowest site: {max(response_times):.3f}s")
```