Empirical calibration for DJL face_feature (ArcFace/FaceNet 512-d) embeddings: cosine distance bands, piecewise confidence formula, enrollment quality targets. Replaces the dlib-based jbaruch/face-recognition-calibration tile for Kotlin/JVM pipelines.
For DJL face_feature (ArcFace-derived, 512-d, L2-normalized, cosine distance), the textbook similarity formula does the wrong thing visually. This skill encodes the calibration we measured on real hardware.
// Textbook: from face_recognition tutorials and most blog posts
val conf = max(0f, 1f - dist / TOL) // TOL = 0.6

With our measured baruch-in-frame distance of 0.20–0.30, this formula yields a confidence of only about 0.67 down to 0.50:
The user is clearly recognized at d=0.30 but the bar shows yellow. They lean in to "improve the signal" and the bar... goes to green at d=0.18. The textbook formula compresses strong matches into the middle band.
fun confidenceOf(d: Float): Float = when {
d <= 0.30f -> 1.0f
d >= 0.65f -> 0.0f
else -> (0.65f - d) / 0.35f
}

With the same distances (0.20–0.30), every reading maps to a confidence of 1.0.
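The difference between the two formulas can be checked numerically. The following is a minimal sketch, not part of the original skill: the function names `textbookConfidence` and `piecewiseConfidence` are hypothetical, and the sample distances are chosen for illustration rather than measured.

```kotlin
import kotlin.math.max

// Textbook formula, with TOL = 0.6 as in the snippet above.
fun textbookConfidence(d: Float, tol: Float = 0.6f): Float = max(0f, 1f - d / tol)

// Piecewise formula from this document, repeated so the sketch is self-contained.
fun piecewiseConfidence(d: Float): Float = when {
    d <= 0.30f -> 1.0f
    d >= 0.65f -> 0.0f
    else -> (0.65f - d) / 0.35f
}

fun main() {
    // Illustrative distances only; these are not measurements.
    for (d in listOf(0.20f, 0.30f, 0.45f, 0.60f, 0.65f)) {
        val line = "d=%.2f textbook=%.2f piecewise=%.2f"
            .format(d, textbookConfidence(d), piecewiseConfidence(d))
        println(line)
    }
}
```

At d = 0.30 the textbook formula sits at about 0.5 while the piecewise formula is already saturated at 1.0, which is the compression effect described above.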
import ai.djl.modality.cv.Image
import ai.djl.modality.cv.ImageFactory
import ai.djl.ndarray.NDList
import ai.djl.ndarray.types.DataType
import ai.djl.repository.zoo.Criteria
import ai.djl.translate.Batchifier
import ai.djl.translate.Translator
import ai.djl.translate.TranslatorContext
import kotlin.math.sqrt
class FaceFeatureTranslator : Translator<Image, FloatArray> {
override fun getBatchifier(): Batchifier = Batchifier.STACK
override fun processInput(ctx: TranslatorContext, input: Image): NDList {
var array = input.toNDArray(ctx.ndManager, Image.Flag.COLOR)
array = array.transpose(2, 0, 1).toType(DataType.FLOAT32, false)
array = array.sub(127.5f).mul(0.0078125f) // (x - 127.5) / 128
return NDList(array)
}
override fun processOutput(ctx: TranslatorContext, list: NDList): FloatArray {
val raw = list.singletonOrThrow().toFloatArray()
// L2 normalize so cosine distance = 1 - dot(a, b)
val norm = sqrt(raw.sumOf { (it * it).toDouble() }).toFloat().coerceAtLeast(1e-8f)
return FloatArray(raw.size) { raw[it] / norm }
}
}
fun loadFaceFeatureModel() = Criteria.builder()
.setTypes(Image::class.java, FloatArray::class.java)
.optModelUrls("https://resources.djl.ai/test-models/pytorch/face_feature.zip")
.optModelName("face_feature")
.optTranslator(FaceFeatureTranslator())
.optEngine("PyTorch")
.build()
.loadModel()
fun cosineDistance(a: FloatArray, b: FloatArray): Float {
var dot = 0f
for (i in a.indices) dot += a[i] * b[i]
return 1f - dot // both L2-normalized
}
fun confidenceOf(d: Float): Float = when {
d <= 0.30f -> 1.0f
d >= 0.65f -> 0.0f
else -> (0.65f - d) / 0.35f
}

For each enrolled person, embed N reference photos and average:
val embeddings: List<FloatArray> = photos.map { predictor.predict(it) }
val avg = FloatArray(embeddings[0].size)
for (e in embeddings) for (i in e.indices) avg[i] += e[i]
for (i in avg.indices) avg[i] /= embeddings.size.toFloat()
// Re-normalize the average (otherwise cosineDistance is meaningless)
val norm = sqrt(avg.sumOf { (it * it).toDouble() }).toFloat().coerceAtLeast(1e-8f)
for (i in avg.indices) avg[i] /= norm

For the "known vs unknown" decision (Stage 2 identity color), the threshold is 0.60 for our measured distances:
val (who, dist) = enrolled
.map { (name, ref) -> name to cosineDistance(emb, ref) }
.minBy { it.second }
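val label = if (dist > 0.60f) "unknown" else who

For a confidence MEASURE (Stage 3 semaphore), use the piecewise confidenceOf(d). They serve different purposes: the 0.60 threshold is a binary known/unknown decision, while confidenceOf(d) is a graded measure of match strength.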
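The threshold decision and the confidence measure can be computed side by side from the same distance. The sketch below is one possible way to combine them, under stated assumptions: the `Match` type, the `identify` function, the `UNKNOWN_THRESHOLD` constant name, the empty-enrollment fallback, and the toy 2-d vectors are all hypothetical and exist only for illustration. Real embeddings are 512-d and L2-normalized.

```kotlin
// Hypothetical result type for this sketch.
data class Match(val label: String, val distance: Float, val confidence: Float)

// Decision threshold from this document (Stage 2).
const val UNKNOWN_THRESHOLD = 0.60f

// Assumes both vectors are already L2-normalized.
fun cosineDistance(a: FloatArray, b: FloatArray): Float {
    var dot = 0f
    for (i in a.indices) dot += a[i] * b[i]
    return 1f - dot
}

// Piecewise confidence from this document (Stage 3).
fun confidenceOf(d: Float): Float = when {
    d <= 0.30f -> 1.0f
    d >= 0.65f -> 0.0f
    else -> (0.65f - d) / 0.35f
}

fun identify(emb: FloatArray, enrolled: Map<String, FloatArray>): Match {
    val best = enrolled.entries
        .map { it.key to cosineDistance(emb, it.value) }
        .minByOrNull { it.second }
        // Assumption: with nobody enrolled, report "unknown" with zero confidence.
        ?: return Match("unknown", Float.POSITIVE_INFINITY, 0f)
    val (who, dist) = best
    val label = if (dist > UNKNOWN_THRESHOLD) "unknown" else who
    return Match(label, dist, confidenceOf(dist))
}

fun main() {
    // Toy unit vectors standing in for real embeddings.
    val enrolled = mapOf(
        "alice" to floatArrayOf(1f, 0f),
        "bob" to floatArrayOf(0f, 1f),
    )
    println(identify(floatArrayOf(0.8f, 0.6f), enrolled))
}
```

Note that with these constants a distance between 0.60 and 0.65 is labeled "unknown" while confidenceOf still returns a small nonzero value, so the two outputs are best treated as independent signals.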
Avoid:
conf = 1 - d / TOL (textbook): compresses strong matches.
The face-recognition-calibration constants (d ≤ 0.30 → 1.0, d ≥ 0.60 → 0.0) used verbatim: the upper bound is wrong for DJL face_feature (use 0.65 instead).

If recognition is unreliable, print the distance to every enrolled person for 5 s:
val dists = enrolled.map { (name, ref) -> name to cosineDistance(emb, ref) }
logger.info("dists: {}", dists.joinToString { "${it.first}=${"%.3f".format(it.second)}" })

Look for:
If "true identity = 0.55, others = 0.58" → enrollment is too loose. Re-enroll with tighter face crops.
If "true identity = 0.20, others = 0.25" → enrollment is too tight (probably you enrolled the same photo multiple times).
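Both patterns above come down to how far the best match is from the runner-up. The sketch below computes that gap from the logged distances. It is illustrative only: the `Separation` type and `separation` function are hypothetical names, the sample names are placeholders, and no cutoff for the margin is implied, since the document gives example readings but not a margin threshold.

```kotlin
// Hypothetical summary type for this sketch.
data class Separation(
    val best: Pair<String, Float>,
    val runnerUp: Pair<String, Float>?,
    val margin: Float?, // runner-up distance minus best distance; null if only one person
)

// Returns null when there are no distances to compare.
fun separation(dists: List<Pair<String, Float>>): Separation? {
    if (dists.isEmpty()) return null
    val sorted = dists.sortedBy { it.second }
    val best = sorted[0]
    val runnerUp = sorted.getOrNull(1)
    return Separation(best, runnerUp, runnerUp?.let { it.second - best.second })
}

fun main() {
    // Example readings taken from the "too loose" case above; names are placeholders.
    val dists = listOf("true identity" to 0.55f, "other" to 0.58f)
    println(separation(dists))
}
```

Logging the margin alongside the raw distances makes it easier to tell a high-distance, low-margin enrollment from a low-distance, low-margin one when reading the 5 s trace.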