
# Authentication and Security

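SASL-based authentication for securing shuffle operations between clients and the external shuffle service.

## Capabilities

### ShuffleSecretManager

Manages per-application shuffle secrets used to authenticate connections to the external shuffle service. `ShuffleSecretManager` implements `SecretKeyHolder`, the interface the SASL layer queries for an application's user name and secret key.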

```java { .api }
/**
 * Manages shuffle secrets for external shuffle service authentication.
 */
public class ShuffleSecretManager implements SecretKeyHolder {

    /**
     * SASL user name used for all Spark shuffle connections.
     */
    private static final String SPARK_SASL_USER = "sparkSaslUser";

    /**
     * Create a new, empty shuffle secret manager.
     */
    public ShuffleSecretManager();

    /**
     * Register an application with its shuffle secret. Registering an appId
     * that is already present replaces its previous secret.
     * @param appId Application ID to register
     * @param shuffleSecret Secret key for the application, as a String
     */
    public void registerApp(String appId, String shuffleSecret);

    /**
     * Register an application with its shuffle secret.
     * @param appId Application ID to register
     * @param shuffleSecret Secret key for the application, as a ByteBuffer
     *        (decoded to a String as UTF-8)
     */
    public void registerApp(String appId, ByteBuffer shuffleSecret);

    /**
     * Unregister an application and remove its secret.
     * @param appId Application ID to unregister
     */
    public void unregisterApp(String appId);

    /**
     * Get the SASL user name for an application.
     * @param appId Application ID
     * @return SASL user name; always SPARK_SASL_USER, regardless of appId
     */
    @Override
    public String getSaslUser(String appId);

    /**
     * Get the secret key for an application.
     * @param appId Application ID
     * @return Secret key as a String, or null if the application is not registered
     */
    @Override
    public String getSecretKey(String appId);
}
```
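
To make the contract above concrete, here is a minimal, JDK-only sketch of a class with the same shape. `InMemorySecretHolder` is a hypothetical stand-in, not Spark's source, and it does not implement Spark's `SecretKeyHolder` so that it stays dependency-free. It assumes the behaviour documented above: a thread-safe map from application ID to secret, overwrite on re-registration, a fixed SASL user, and UTF-8 decoding for the `ByteBuffer` overload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in mirroring the documented ShuffleSecretManager contract.
public class InMemorySecretHolder {
    // Fixed SASL user name, matching the value documented above.
    private static final String SASL_USER = "sparkSaslUser";

    // Thread-safe map: registrations may race with lookups from network threads.
    private final ConcurrentHashMap<String, String> secrets = new ConcurrentHashMap<>();

    public void registerApp(String appId, String secret) {
        // put() overwrites, so re-registering an appId replaces its secret.
        secrets.put(appId, secret);
    }

    public void registerApp(String appId, ByteBuffer secret) {
        // Decode as UTF-8 from a slice so the caller's buffer position is untouched.
        registerApp(appId, StandardCharsets.UTF_8.decode(secret.slice()).toString());
    }

    public void unregisterApp(String appId) {
        secrets.remove(appId);
    }

    public String getSaslUser(String appId) {
        // Same user for every application; only the secret differs.
        return SASL_USER;
    }

    public String getSecretKey(String appId) {
        // Returns null for applications that are not registered.
        return secrets.get(appId);
    }

    public static void main(String[] args) {
        InMemorySecretHolder holder = new InMemorySecretHolder();
        holder.registerApp("app-1", "s3cret");
        holder.registerApp("app-2",
            ByteBuffer.wrap("bufferSecret".getBytes(StandardCharsets.UTF_8)));
        System.out.println(holder.getSecretKey("app-1"));  // s3cret
        System.out.println(holder.getSecretKey("app-2"));  // bufferSecret

        holder.registerApp("app-1", "rotated");             // overwrite = rotation
        System.out.println(holder.getSecretKey("app-1"));  // rotated

        holder.unregisterApp("app-1");
        System.out.println(holder.getSecretKey("app-1"));  // null
    }
}
```

Using a `ConcurrentHashMap` rather than a synchronized map keeps lookups lock-free, which matters when many connections authenticate at once.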

**Usage Examples:**

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.spark.network.sasl.ShuffleSecretManager;
import org.apache.spark.network.shuffle.ExternalShuffleClient;
import org.apache.spark.network.util.MapConfigProvider;
import org.apache.spark.network.util.TransportConf;

public class ShuffleSecretManagerExample {
    public static void main(String[] args) throws Exception {
        // Create shuffle secret manager
        ShuffleSecretManager secretManager = new ShuffleSecretManager();

        // Register applications with their secrets (example values only;
        // use randomly generated secrets in practice)
        String appId1 = "app-20231201-001";
        String appId2 = "app-20231201-002";
        secretManager.registerApp(appId1, "mySecretKey123");
        secretManager.registerApp(appId2, "anotherSecretKey456");

        // Verify registration without printing the secret itself
        String retrievedSecret = secretManager.getSecretKey(appId1);
        System.out.println("Retrieved secret for " + appId1 + ": " + (retrievedSecret != null ? "OK" : "MISSING"));

        String saslUser = secretManager.getSaslUser(appId1);
        System.out.println("SASL user for " + appId1 + ": " + saslUser);  // sparkSaslUser

        // Use with the external shuffle client for authenticated connections.
        // TransportConf needs a ConfigProvider; an empty one uses the defaults.
        TransportConf conf = new TransportConf("shuffle", MapConfigProvider.EMPTY);
        ExternalShuffleClient authenticatedClient = new ExternalShuffleClient(
            conf, secretManager, true /* authEnabled */, 10000 /* registrationTimeoutMs */);
        authenticatedClient.init(appId1);  // builds the client factory with auth bootstraps

        // Register a ByteBuffer secret (alternative overload; bytes are decoded as UTF-8)
        String appId3 = "app-20231201-003";
        ByteBuffer secretBuffer = ByteBuffer.wrap("bufferSecret789".getBytes(StandardCharsets.UTF_8));
        secretManager.registerApp(appId3, secretBuffer);

        // Clean up: close the client, then unregister applications
        authenticatedClient.close();
        secretManager.unregisterApp(appId1);
        secretManager.unregisterApp(appId2);
        secretManager.unregisterApp(appId3);

        // Verify cleanup
        String cleanedSecret = secretManager.getSecretKey(appId1);
        System.out.println("Secret after cleanup: " + (cleanedSecret == null ? "REMOVED" : "STILL PRESENT"));
    }
}
```

### Authentication Flow

The SASL authentication flow between shuffle clients and servers works as follows:

1. **Secret Registration**: Applications register their secrets with `ShuffleSecretManager`, keyed by application ID
2. **Client Creation**: `ExternalShuffleClient` is created with authentication enabled (`authEnabled = true`) and initialized with the application ID
3. **Connection Establishment**: The client opens a connection to the shuffle server
4. **SASL Handshake**: Client and server run a SASL (`DIGEST-MD5`) handshake in which each side proves knowledge of the shared secret without sending it in plain text; when `spark.network.crypto.enabled` is set, Spark's AES-based authentication protocol is used instead
5. **Authenticated Communication**: All subsequent shuffle requests on that connection are authenticated
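
The core idea of step 4, proving knowledge of a shared secret without transmitting it, can be illustrated with a simple HMAC challenge-response. This is a simplified sketch built on `javax.crypto.Mac` with HMAC-SHA256; it is **not** Spark's SASL exchange, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Simplified shared-secret challenge-response (illustration only, not Spark's SASL).
public class ChallengeResponseSketch {

    // Client side: answer the server's challenge with HMAC-SHA256(secret, appId || challenge).
    static byte[] respond(String secret, String appId, byte[] challenge) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            mac.update(appId.getBytes(StandardCharsets.UTF_8));
            return mac.doFinal(challenge);
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is required on every Java SE platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Server side: recompute the expected response from its own copy of the secret
    // and compare in constant time so timing does not leak how many bytes matched.
    static boolean verify(String serverSecret, String appId, byte[] challenge, byte[] response) {
        if (serverSecret == null) {
            return false;  // application not registered: reject
        }
        byte[] expected = respond(serverSecret, appId, challenge);
        return MessageDigest.isEqual(expected, response);
    }

    public static void main(String[] args) {
        byte[] challenge = new byte[16];
        new SecureRandom().nextBytes(challenge);  // fresh nonce per connection

        byte[] good = respond("shared-secret", "app-1", challenge);
        byte[] bad = respond("wrong-secret", "app-1", challenge);

        System.out.println(verify("shared-secret", "app-1", challenge, good));  // true
        System.out.println(verify("shared-secret", "app-1", challenge, bad));   // false
    }
}
```

The random challenge prevents a captured response from being replayed on a later connection, and a missing or mismatched secret on either side makes verification fail, which is the behaviour to expect from a misconfigured shuffle service.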

### Security Best Practices

1. **Secret Management**:
   - Use a unique, randomly generated secret for each application
   - Rotate secrets regularly in production environments
   - Never log or expose secrets in plain text

2. **Authentication Configuration**:
   - Always enable authentication in production deployments
   - Use strong secrets with sufficient entropy
   - Configure appropriate timeouts for authentication operations

3. **Network Security**:
   - Encrypt traffic in addition to authenticating it when possible (for example, with `spark.network.crypto.enabled`); authentication alone does not encrypt shuffle data
   - Use firewall rules to restrict access to the shuffle service
   - Monitor authentication failures for signs of misconfiguration or attack

4. **Secret Storage**:
   - Keep secrets out of application code
   - Use a secure key management system in production
   - Unregister secrets with `unregisterApp` when applications finish
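
Unique secrets with sufficient entropy can be produced with `java.security.SecureRandom`. This sketch generates 32 random bytes (256 bits) and Base64-encodes them so the result can be passed to the `String` overload of `registerApp`; the class name `SecretGenerator` and the 32-byte length are illustrative choices, not Spark requirements.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Illustrative helper; the name and default length are assumptions, not a Spark API.
public class SecretGenerator {
    // A single SecureRandom instance is thread-safe and can be reused.
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a Base64 string encoding numBytes cryptographically random bytes.
    static String generateSecret(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        String secret = generateSecret(32);  // 32 bytes = 256 bits of entropy
        // Print only the length, never the secret itself.
        System.out.println("Generated secret of length " + secret.length());  // 44
    }
}
```

Base64 keeps the secret ASCII-only, which also sidesteps charset mismatches between the `String` and `ByteBuffer` registration paths.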

### Common Authentication Patterns

```java
import java.util.Map;

import org.apache.spark.network.sasl.ShuffleSecretManager;
import org.apache.spark.network.shuffle.ExternalShuffleClient;
import org.apache.spark.network.util.TransportConf;

// generateSecureSecret() and loadAppSecretsFromSecureStorage() are hypothetical
// hooks for your own secret generation and secure-storage code.
public abstract class ShuffleAuthPatterns {
    protected final ShuffleSecretManager secretManager = new ShuffleSecretManager();
    protected ExternalShuffleClient client;

    protected abstract String generateSecureSecret();

    protected abstract Map<String, String> loadAppSecretsFromSecureStorage();

    // Pattern 1: Basic authentication setup
    public void setUpClient(TransportConf conf, String appId) {
        secretManager.registerApp(appId, generateSecureSecret());
        client = new ExternalShuffleClient(conf, secretManager, true, 10000);
        client.init(appId);
    }

    // Pattern 2: Multiple application management
    public void registerAllApps() {
        Map<String, String> appSecrets = loadAppSecretsFromSecureStorage();
        for (Map.Entry<String, String> entry : appSecrets.entrySet()) {
            secretManager.registerApp(entry.getKey(), entry.getValue());
        }
    }

    // Pattern 3: Dynamic secret rotation.
    // registerApp() overwrites any existing secret, so unregistering first is
    // unnecessary and would briefly leave the application without a secret.
    public void rotateAppSecret(String appId, String newSecret) {
        secretManager.registerApp(appId, newSecret);
        // Connections authenticated after this call use newSecret;
        // notify clients to reconnect with the new secret.
    }

    // Pattern 4: Cleanup on application termination
    public void cleanupApplication(String appId) {
        try {
            if (client != null) {
                client.close();
            }
        } finally {
            // Always unregister the application secret
            secretManager.unregisterApp(appId);
        }
    }
}
```

### Integration with Spark Security

`ShuffleSecretManager` integrates with Spark's broader security framework:

- **Spark Authentication**: Works with the `spark.authenticate` configuration, which turns authentication on
- **ACLs**: Integrates with Spark's access control lists
- **Encryption**: Can be combined with Spark's network encryption (`spark.network.crypto.enabled`)
- **Kerberos**: Compatible with Kerberos-based Spark deployments

### Troubleshooting Authentication Issues

Common authentication problems and solutions:

1. **Authentication Failures**:
   - Verify that client and server hold the same secret for the application ID
   - Check that authentication is enabled on both sides
   - Ensure the secret is registered before the client is initialized

2. **Connection Timeouts**:
   - Increase the `registrationTimeoutMs` argument of `ExternalShuffleClient` on slow networks
   - Check network connectivity between client and server
   - Verify that the shuffle service is running and reachable

3. **Secret Management Issues**:
   - Ensure secrets are registered before any client operations
   - Verify that secret cleanup does not interfere with active connections
   - Check for string-encoding mismatches; the `ByteBuffer` overload of `registerApp` decodes bytes as UTF-8

4. **Performance Impact**:
   - Authentication adds a small overhead when each connection is established
   - Monitor connection establishment times
   - Reuse clients for high-frequency operations rather than creating one per request; the transport layer already pools connections per peer (`spark.shuffle.io.numConnectionsPerPeer`)
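
The encoding pitfall in item 3 is easy to reproduce: if one side turns a non-ASCII secret into bytes with a charset other than the UTF-8 assumed for the `ByteBuffer` overload, the two sides end up with different secret strings. The JDK-only sketch below shows the mismatch; `decodeLikeRegisterApp` is a hypothetical helper that assumes that UTF-8 decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SecretEncodingCheck {
    // Hypothetical helper mimicking the UTF-8 decoding assumed for registerApp(String, ByteBuffer).
    static String decodeLikeRegisterApp(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.slice()).toString();
    }

    public static void main(String[] args) {
        String secret = "cl\u00e9-secr\u00e8te";  // contains non-ASCII characters

        // Encoded as UTF-8: round-trips to the same string.
        ByteBuffer utf8 = ByteBuffer.wrap(secret.getBytes(StandardCharsets.UTF_8));
        System.out.println(secret.equals(decodeLikeRegisterApp(utf8)));    // true

        // Encoded as ISO-8859-1: the accented bytes are invalid UTF-8 and are replaced
        // during decoding, so the "same" secret no longer matches.
        ByteBuffer latin1 = ByteBuffer.wrap(secret.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(secret.equals(decodeLikeRegisterApp(latin1)));  // false
    }
}
```

Always encoding secrets explicitly with `StandardCharsets.UTF_8`, or using ASCII-only secrets, avoids this class of failure.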