Q. How is the probability distribution function measured for a biometric system's authorized and unauthorized users?

To investigate the performance of a biometric verification system, one looks at how the system reacts to a large number of inquiries with biometric features from authorized as well as unauthorized users.  Due to natural fluctuations and measurement imperfections, the results of such an investigation are never absolutely certain; they are only predictable to a certain extent.  To determine the error rates "false acceptance" and "false rejection," one does not use the yes/no decisions "authorized/unauthorized," but rather the underlying degree of similarity between an inquiry and the saved reference feature.  In a series of measurements, similarity ratings ("score values") are collected for authorized and unauthorized users.  Then the frequency of occurrence is counted for every similarity rating.  After normalization by the total number of inquiries, the two resulting histograms approximate the probability distribution functions.  They show the measured estimate of the probability that a certain similarity rating n occurs for authorized users (pB(n)) and unauthorized users (pN(n)):
\[ p_B(n) \sim \frac{\text{number of measurements with similarity rating } n \text{ for authorized users}}{\text{total number of measurements for authorized users}} \]

\[ p_N(n) \sim \frac{\text{number of measurements with similarity rating } n \text{ for unauthorized users}}{\text{total number of measurements for unauthorized users}} \]

The higher the total number of measurements, the more accurate the estimate (see "Statistical Significance").  A mathematical determination of the probabilities as the ratio of the relevant possibilities to the total number of possibilities fails because, unlike with dice, there are simply too many different possibilities to enumerate.  In the ideal case (unfortunately unachievable), the two distribution curves do not overlap.  That means all inquiries from unauthorized users have low similarity ratings, whereas all the high similarity ratings belong to authorized users.  In such a case it is easy to define a decision threshold that clearly separates authorized from unauthorized users.  In practice, however, there is always an overlap when the number of users is high enough.  A typical diagram: [figure not available]
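The histogram estimate described above can be sketched in a few lines of Python. This is a minimal illustration, assuming scores are whole numbers between 0 and K; the function name `estimate_distribution` and the toy score lists are hypothetical and not measured data.

```python
from collections import Counter

def estimate_distribution(scores, K):
    """Return [p(0), ..., p(K)], the relative frequency of each integer score."""
    if not scores:
        raise ValueError("at least one measurement is required")
    if any(not 0 <= s <= K for s in scores):
        raise ValueError("scores must be whole numbers between 0 and K")
    counts = Counter(scores)          # frequency of occurrence per rating
    total = len(scores)               # total number of measurements
    return [counts[n] / total for n in range(K + 1)]

# Hypothetical toy data for illustration only.
genuine_scores = [7, 8, 8, 9, 6, 8, 7, 9]    # authorized users
impostor_scores = [1, 2, 2, 3, 0, 2, 4, 1]   # unauthorized users

pB = estimate_distribution(genuine_scores, K=10)
pN = estimate_distribution(impostor_scores, K=10)
```

With so few samples the estimates are of course very rough; the sketch only shows the counting and normalization steps.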

Q. How do the FAR/FRR paired graphs affect a biometric system?

The error graphs FAR and FRR are defined, respectively, as the probability that an unauthorized user is accepted as authorized and the probability that an authorized user is rejected as unauthorized.  The curves depend on an adjustable decision threshold for the similarity of a scanned biometric characteristic to a saved reference.  The following derivations assume that a similarity rating can be any whole number between 0 and K and that, for simplicity's sake, the probability of the value K occurring is 0.  For practical applications it also makes sense to consider the FMR and the FNMR first and only later to account for the threshold-independent rejections due to insufficient image quality when moving to the FAR and FRR.  Furthermore, we assume that acceptance requires the coincidence of two features and rejection their non-coincidence.

If a general probability distribution function p is given for discrete similarity values n, the probability PM(th) that the scanned biometric characteristic with similarity rating n falls below threshold th ("misses") is:

\[ P_M(0) := 0, \qquad P_M(\mathrm{th}) = \sum_{n=0}^{\mathrm{th}-1} p(n), \qquad \mathrm{th} = 1, 2, 3, \ldots, K \]

The sum of correct matches and mismatches must equal the number of total events.  For that reason, the probability PH(th) that the similarity rating of the scanned trait reaches or exceeds threshold th ("hits") will be:
\[ P_H(\mathrm{th}) = 1 - P_M(\mathrm{th}) = \sum_{n=\mathrm{th}}^{K} p(n), \qquad \mathrm{th} = 0, 1, 2, \ldots, K \]

The False Match Rate FMR(th) is an estimate of the probability that the similarity of two non-identical features reaches or exceeds a certain threshold value th.  Therefore:
\[ \mathrm{FMR}(\mathrm{th}) \sim P_H(\mathrm{th}) = 1 - \sum_{n=0}^{\mathrm{th}-1} p_N(n), \qquad \mathrm{th} = 1, 2, 3, \ldots, K \]

The analogous relation applies to the False Non-Match Rate FNMR(th):
\[ \mathrm{FNMR}(\mathrm{th}) \sim P_M(\mathrm{th}) = \sum_{n=0}^{\mathrm{th}-1} p_B(n), \qquad \mathrm{th} = 1, 2, 3, \ldots, K \]

where pN is the probability distribution function for unauthorized users and pB is the one for authorized users. The approximation (~) indicates that only the expected values of the measured failure rates FMR and FNMR are identical to the probabilities PH and PM, respectively. The limit values are:
FMR(0) = 1, FMR(K) = 0
FNMR(0) = 0, FNMR(K) = 1
To calculate FAR and FRR, the threshold-independent quality rejection rate QRR (equals FTA, depending on definition) has to be taken into consideration. Provided that a false acceptance is assigned to a false match, we obtain:
FAR(th) = (1 - QRR) FMR(th)
FRR(th) = QRR + (1 - QRR) FNMR(th)
For the border values we then get:
FAR(0) = 1 - QRR, FAR(K) = 0
FRR(0) = QRR, FRR(K) = 1
Setting a similarity rating th as the threshold to differentiate between authorized and unauthorized users yields the experimental estimate of the false acceptance rate FAR(th): the number of similarity ratings of unauthorized users that reach or exceed this threshold, divided by the total number of trials for unauthorized users.  Conversely, the false rejection rate FRR(th) is the number of authorized users' similarity ratings that fall below this same threshold, divided by the total number of their inquiries.  Through integration (in practice, successive summation) of the probability distribution curves, the FAR and FRR graphs are determined as functions of the adjustable threshold th.  Typical results in linear and logarithmic scale: [figures not available]
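The successive summation can be sketched as follows. This is a minimal Python illustration of the formulas above, assuming pB and pN are lists indexed by the ratings 0..K; the function name `error_curves` and the toy distributions are hypothetical.

```python
def error_curves(pB, pN, qrr=0.0):
    """Return (FMR, FNMR, FAR, FRR) as lists indexed by threshold th = 0..K."""
    if len(pB) != len(pN):
        raise ValueError("pB and pN must cover the same ratings 0..K")
    K = len(pB) - 1
    fnmr = [0.0] * (K + 1)   # FNMR(0) = PM(0) := 0
    fmr = [1.0] * (K + 1)    # FMR(0) = PH(0) = 1
    miss_b = 0.0             # running sum of pB(0..th-1)
    miss_n = 0.0             # running sum of pN(0..th-1)
    for th in range(1, K + 1):
        miss_b += pB[th - 1]
        miss_n += pN[th - 1]
        fnmr[th] = miss_b
        fmr[th] = 1.0 - miss_n
    # Include the threshold-independent quality rejection rate QRR.
    far = [(1.0 - qrr) * m for m in fmr]
    frr = [qrr + (1.0 - qrr) * m for m in fnmr]
    return fmr, fnmr, far, frr

# Hypothetical toy distributions with K = 5 and p(K) = 0.
pB = [0.0, 0.0, 0.1, 0.3, 0.6, 0.0]
pN = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]
fmr, fnmr, far, frr = error_curves(pB, pN, qrr=0.1)
```

The first and last entries of the returned lists reproduce the limit and border values stated above.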

Q. How does one determine the Receiver Operating Characteristic (ROC) of a biometric system?

The FAR/FRR curve pair is excellently suited for setting an optimal threshold for the biometric system.  Its use as a further predictor of a system's performance, however, is limited.  This is partially due to the interpretation of the threshold and the similarity measures.  The definition of the similarity measures is a question of implementation.  Almost arbitrary scalings and transformations are possible, which affect the appearance of the FAR/FRR curves but not the FAR and FRR values at a certain threshold.  A popular example is the use of a "distance measure" between the biometric reference and the scanned biometric features: the greater the similarity, the smaller the distance.  The result is a mirror image of the FAR/FRR curves.  A favorite trick is to stretch the scale of the FAR/FRR curves near the EER (Equal Error Rate: FAR(th) = FRR(th)), i.e., to use more threshold values there, thus making the system appear less sensitive to threshold changes.

To compare different systems effectively, a description independent of threshold scaling is required.  One such description, borrowed from radar technology, is the Receiver Operating Characteristic (ROC), which plots FRR values directly against FAR values, thereby eliminating the threshold parameter.  The ROC, like the FRR, can only take on values between 0 and 1, and its x axis (FAR) is likewise limited to values between 0 and 1.  It has the following characteristics:
  • The ideal ROC only has values that lie either on the x axis (FAR) or on the y axis (FRR); i.e., when the FRR is not 0, the FAR is 0, and vice versa.
  • The highest point (linear scale under the definitions used here) is for all systems given by FAR=0 and FRR=1.
  • The ROC cannot increase with increasing FAR.
As the ROC curves of good systems lie very near the coordinate axes, it is reasonable to use a logarithmic scale for one or both axes. [Figure: ROC diagram, not available]  Remark: Instead of "ROC", the term "DET" (Detection Error Tradeoff) is sometimes used. In those cases, the term "ROC" is reserved for the complementary plot of 1 - FRR against FAR.
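How the threshold drops out of the ROC can be illustrated with a short Python sketch. It assumes FAR and FRR are given as equally long lists, one entry per threshold; the function names and the toy values are hypothetical.

```python
def roc_points(far, frr):
    """Pair FAR (x) with FRR (y) per threshold and discard the threshold."""
    if len(far) != len(frr):
        raise ValueError("far and frr need one value per threshold")
    # Sort by increasing FAR; within equal FAR, by decreasing FRR.
    return sorted(set(zip(far, frr)), key=lambda p: (p[0], -p[1]))

def is_non_increasing(points):
    """Check that FRR never rises as FAR grows."""
    return all(y2 <= y1 for (_, y1), (_, y2) in zip(points, points[1:]))

# Hypothetical FAR/FRR values for thresholds th = 0..5.
far = [1.0, 0.5, 0.2, 0.0, 0.0, 0.0]
frr = [0.0, 0.0, 0.0, 0.1, 0.4, 1.0]
points = roc_points(far, frr)

# Stretching the threshold scale (here: every threshold value duplicated)
# changes the FAR/FRR curves but leaves the ROC points unchanged.
far_stretched = [x for x in far for _ in (0, 1)]
frr_stretched = [y for y in frr for _ in (0, 1)]
points_stretched = roc_points(far_stretched, frr_stretched)
```

In this toy case the first point is (FAR=0, FRR=1), the highest point named in the list of characteristics.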

Q. How does a transition from verification to identification affect the FAR?

In a verification a biometric feature is compared with only one reference, whereas in an identification, it is compared with N (N>1) different references.  This transition to an identification results in higher FAR, and in an ideal case is as follows:
\[ \mathrm{FAR}_N = 1 - (1 - \mathrm{FAR}_1)^N \]
where FAR_N is the false acceptance rate for N different stored references and FAR_1 is the rate for a single reference. The formula is restricted to the "access control" case where the correct assignment to an identity is not essential. If N·FAR_1 is significantly smaller than 1, this is approximately:
\[ \mathrm{FAR}_N \sim N \cdot \mathrm{FAR}_1 \]
Example:  A database has 100 000 different references.  In an identification, the FAR rises from 10^-7 to about 10^-2!  If in an application the correct assignment of ID data is essential (e.g., for bank transactions), other methods have to be used, as explained under Determination of FIR.
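The example can be checked numerically. The following Python sketch evaluates both the exact expression and the linear approximation; the function names are hypothetical, and `math.log1p`/`math.expm1` are used only to keep the result accurate for very small single-reference rates.

```python
import math

def far_identification(far1, n):
    """1 - (1 - far1)**n for 0 <= far1 < 1, evaluated stably for tiny far1."""
    return -math.expm1(n * math.log1p(-far1))

def far_identification_approx(far1, n):
    """Linear approximation n * far1, valid when n * far1 is much less than 1."""
    return n * far1

exact = far_identification(1e-7, 100_000)          # roughly 0.00995
approx = far_identification_approx(1e-7, 100_000)  # 0.01
```

The exact value lies slightly below the approximation, as expected for a product that is small compared with 1.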

Q. How does a transition from verification to identification affect the FRR?

During identification, the biometric features presented for recognition are compared to all references. In contrast to a verification, this generates more than one similarity value (score), which complicates the decision whether a biometric characteristic is to be accepted or not. In particular, there are multiple ways to decide when, e.g., several scores exceed a threshold. As a result, each decision procedure needs its own definition of a false rejection. Two examples are given: one must differentiate between applications which allow access to personal data after a successful identification (e.g., access to a personal bank account) and applications which grant general access independent of one's identity (e.g., entrance to a room without logging the identified person's presence). In the first case a biometric characteristic may be assigned to a false identity. This is called a false identification and is characterized by the False Identification Rate FIR. Furthermore, it is conceivable that more than one reference template generates a score above the threshold. This case is treated in Determination of FIR, which shows that different decision strategies may yield different results.

In the second case, the false rejection rate FRR decreases as the number of different references increases!  How can that be?  Very simply: a larger number of references increases the probability that an authorized user is "identified" not only by his or her own reference but also by someone else's, which would normally be considered a false acceptance.  The user, however, does not notice the system's mistake.  Mathematically, under ideal conditions:
\[ \mathrm{FRR}_N = \mathrm{FRR}_1 (1 - \mathrm{FAR}_1)^{N-1} \]
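A short Python sketch shows how quickly this rate falls as references are added. The function name and the single-reference rates (FRR of 2 %, FAR of 0.1 %) are hypothetical values chosen for illustration.

```python
def frr_open_access(frr1, far1, n):
    """False rejection rate with n references when identity does not matter."""
    # The user is rejected only if the own reference fails (frr1)
    # and none of the other n - 1 references matches by mistake.
    return frr1 * (1.0 - far1) ** (n - 1)

# Hypothetical single-reference rates.
frr_single = frr_open_access(0.02, 0.001, 1)     # equals frr1
frr_large = frr_open_access(0.02, 0.001, 1000)   # roughly 0.0074
```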

Q. How is the False Identification Rate (FIR) calculated?

During an identification, the recognition biometric features are compared to many references and possibly, the similarity value will exceed the threshold for more than one reference. This is non-critical if only granting access, but can be very problematic if the correct assignment of personal data to the biometric characteristic is required (Example: access to a bank account via ATM).

The probability for the identification of further (by definition false) candidates (independent of the correct reference) can be calculated from the FAR since these candidates would represent false acceptances in the case of verification. Its value is given by:
\[ 1 - (1 - \mathrm{FAR}_1)^{N-1} \sim (N - 1)\,\mathrm{FAR}_1 \]
where FAR_1 is the False Acceptance Rate for a system with one reference and N is the number of references. The approximation (right side) applies when the resulting value lies considerably below 1.

The False Identification Rate can only be calculated once a rule for selecting one of the candidates has been fixed. One rule often found in practical applications is, for example, to choose the candidate with the highest similarity value (presuming that there is only one). Unfortunately, the FIR is then only ascertainable when the probability distribution functions underlying both false acceptance and false rejection are available.

Easier to calculate is the rule that multiple candidates are completely rejected, which raises the FRR and lowers FAR. The following definitions apply here:
FAR: probability that a non-authorized person is identified
FRR: probability that an authorized person is not identified
FIR: probability that an authorized person is identified, but is assigned a false ID
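These definitions result in the following formulas under ideal conditions (statistical independence, same error rates for all people, ...), where the index N is again the number of references:
\[ \mathrm{FAR}_N = N\,\mathrm{FAR}_1 (1 - \mathrm{FAR}_1)^{N-1} \]
\[ \mathrm{FRR}_N = 1 - (1 - \mathrm{FRR}_1 - \mathrm{FAR}_1 + N\,\mathrm{FRR}_1\,\mathrm{FAR}_1)(1 - \mathrm{FAR}_1)^{N-2} \]
\[ \mathrm{FIR}_N = (N - 1)\,\mathrm{FRR}_1\,\mathrm{FAR}_1 (1 - \mathrm{FAR}_1)^{N-2} \]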
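The three formulas can be evaluated and cross-checked with a small Python sketch. For an authorized person, the outcomes "correctly identified", "not identified" and "identified with a false ID" must add up to 1, where the probability of a correct identification is (1 - FRR_1)(1 - FAR_1)^(N-1). The function name and the numeric rates are hypothetical, and the sketch assumes at least two references.

```python
def identification_rates(far1, frr1, n):
    """Return (FAR_N, FRR_N, FIR_N) when multiple candidates are rejected."""
    if n < 2:
        raise ValueError("the sketch assumes at least two references")
    no_match = 1.0 - far1   # one foreign reference does not match
    far_n = n * far1 * no_match ** (n - 1)
    frr_n = 1.0 - (1.0 - frr1 - far1 + n * frr1 * far1) * no_match ** (n - 2)
    fir_n = (n - 1) * frr1 * far1 * no_match ** (n - 2)
    return far_n, frr_n, fir_n

# Hypothetical single-reference rates: FAR of 0.1 %, FRR of 2 %.
far_n, frr_n, fir_n = identification_rates(0.001, 0.02, 1000)

# Consistency check for an authorized person.
p_correct = (1.0 - 0.02) * (1.0 - 0.001) ** 999
total = p_correct + frr_n + fir_n   # should equal 1 up to rounding
```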

Q. When are FAR and FRR values statistically significant?

A value is considered statistically significant when it is likely that it falls within a given error interval and the probability of falling outside this interval by chance is relatively low.  Statistical significance depends on the number of trials, or sample size.  Because biometric values are difficult to model, statistical significance is hard to estimate.  As a rule of thumb ("Doddington's rule"), one must conduct enough tests that a minimum of 30 erroneous cases occur [Porter 1977]. Example: An FAR of 10^-6 can be considered reliable when 30 errors occur in 30 million trials. One error in a million trials also yields an FAR of 10^-6, but is statistically far less significant.  One can see that biometric tests are very expensive if performance needs to be very high.  The situation would be easier if further information could be considered along with the yes/no (accept/reject) decisions, for example the proximity of a decision to the acceptance threshold.
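The arithmetic behind the rule of thumb can be sketched in Python. The function name is hypothetical, and the error rate is passed as a string so that `fractions.Fraction` can represent it exactly and avoid floating-point rounding.

```python
import math
from fractions import Fraction

def trials_needed(error_rate, min_errors=30):
    """Smallest number of trials in which min_errors errors are expected."""
    rate = Fraction(error_rate)   # e.g. "1e-6" becomes exactly 1/1000000
    if not 0 < rate <= 1:
        raise ValueError("error_rate must lie in (0, 1]")
    return math.ceil(min_errors / rate)

trials_for_far = trials_needed("1e-6")   # the 30 million trials of the example
```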

Q. What is essential when comparing the ROC performance of biometric systems?

The accuracy of a verification system can be characterized by exactly three statistical quantities: FAR, FER (the failure-to-enrol rate), and FRR. Since these three quantities influence each other when parameters (e.g., quality acceptance thresholds for enrolment and authentication) are changed, comparing one quantity between two systems only makes sense when the other two quantities are equal in both systems. For example, if the FARs of different systems are to be compared, the corresponding FRRs must be equal, and the FERs must be equal, too. In a ROC diagram, this condition is easily fulfilled for all FRRs for which the curve has been measured, provided that the FERs of all curves are constant and the same. However, this condition is often violated because the FERs actually differ!

A solution to this problem comes from the procedure used, e.g., in the Fingerprint Verification Competition FVC2002, where different algorithms for fingerprint recognition have been tested. The idea is to consider a failure-to-enrol case as a virtual "FTE user" with the properties:

  • If the virtual FTE user tries a (virtual!) authentication, the result is always a rejection, thus increasing the FRR.
  • If an impostor tries an authentication attempt against a virtual FTE user, a rejection is always assumed, thus decreasing the FAR.
This way, the FER is eliminated and the ROC curves as well as the FAR/FRR values are forced to become comparable. Mathematically, we implement this method by introducing a Generalized FRR (GFRR) and a Generalized FAR (GFAR). (It will be a matter of standardization to fix these terms. Here they are used until standardization is finalized.) The calculation of GFRR and GFAR is quite simple if we assume that each authentication trial is preceded by its own enrolment trial. This makes sense because authentication performance is not independent of enrolment: a good enrolment delivers better FRR values than a poor one. It therefore seems statistically more accurate not to base an entire FRR statistic on a single enrolment!
GFAR(th) = (1 - FER) FAR(th)
GFRR(th) = FER + (1 - FER) FRR(th)
Here (th) denotes the dependency on the decision threshold parameter th which is assumed to range between 0 and K (arbitrary), see "How do the FAR/FRR paired graphs affect a biometric system?". These formulas show a strong relationship to those derived for FAR and FRR when including the FTA (Failure-to-Acquire). Similarly, we get for the border values:
GFAR(0) = (1 - FER)(1 - QRR), GFAR(K) = 0
GFRR(0) = FER + (1 - FER) QRR, GFRR(K) = 1
Both formulas are symmetric in QRR (= FTA) and FER (= FTE), showing the strong relationship between Failure to Enrol and Failure to Acquire. In some cases these two values are even equal. This happens when the biometric system uses the same quality rejection mechanisms and levels for enrolment and for authentication. In practice, higher quality requirements during enrolment, leading to a higher FTE, might be quite reasonable to prevent enrolment of nonsense features. Furthermore, too low an enrolment quality will decrease usability of the authentication systems in daily use. In many applications it is better to spend more time during enrolment than losing time by multiple authentication trials.
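The generalized rates are a simple rescaling of measured FAR/FRR curves, as the following Python sketch shows. The function name, the example curves (with QRR = 0.1) and the FER of 5 % are hypothetical values for illustration.

```python
def generalized_rates(far, frr, fer):
    """Fold the failure-to-enrol rate FER into FAR/FRR curves (lists over th)."""
    if not 0.0 <= fer <= 1.0:
        raise ValueError("fer must lie between 0 and 1")
    gfar = [(1.0 - fer) * a for a in far]          # GFAR(th)
    gfrr = [fer + (1.0 - fer) * r for r in frr]    # GFRR(th)
    return gfar, gfrr

# Hypothetical curves for th = 0, one intermediate threshold, and th = K.
far = [0.9, 0.18, 0.0]
frr = [0.1, 0.19, 1.0]
gfar, gfrr = generalized_rates(far, frr, fer=0.05)
```

The first and last entries correspond to the border values given above: GFAR(0) = 0.95 · 0.9 and GFRR(0) = 0.05 + 0.95 · 0.1.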

A ROC diagram using GFAR and GFRR will be called Generalized ROC (GROC) diagram for consistency.

Q. What does separability of a biometric system mean?

The Receiver Operating Characteristic (ROC) offers an objective comparison of different biometric systems in the form of a graph.  More practical would be a single measured value that forms a kind of average over all of the system's settings.  Such a value, however, can only describe the system globally.  One must therefore understand that a system can be better overall despite performing worse locally, for example at a particular operating point.

Separability is intuitively the ability of a biometric system to differentiate authorized and unauthorized users on the basis of a biometric feature.  The higher the separability, the fewer the errors while differentiating authorized and unauthorized users.  The measure of the separability, like that of the ROC, cannot be dependent on implementation specific scales.  Additionally, a separability measure should be easy to calculate.

A well known measure for the (inverse) separability is the Equal Error Rate (EER).  Unfortunately, the EER describes only one single point of the ROC.  While the definition is simple, the calculation is not so easy; the EER point does not exist as a measurement, instead it is derived through decision and approximation.
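One possible way to approximate the EER from sampled curves is linear interpolation between the two neighbouring thresholds at which FAR - FRR changes sign. The Python sketch below illustrates this; the interpolation rule, the function name and the toy values are assumptions made for illustration, not a method prescribed by the text.

```python
def approximate_eer(far, frr):
    """Interpolate the point where the FAR and FRR curves cross."""
    if len(far) != len(frr):
        raise ValueError("far and frr need one value per threshold")
    for th in range(len(far) - 1):
        d0 = far[th] - frr[th]
        d1 = far[th + 1] - frr[th + 1]
        if d0 == 0:
            return far[th]                 # curves meet exactly at th
        if d0 > 0 >= d1:
            t = d0 / (d0 - d1)             # fraction of the way to th + 1
            return far[th] + t * (far[th + 1] - far[th])
    raise ValueError("FAR and FRR do not cross in the given range")

# Hypothetical FAR/FRR values for thresholds th = 0..5.
far = [1.0, 0.5, 0.2, 0.0, 0.0, 0.0]
frr = [0.0, 0.0, 0.0, 0.1, 0.4, 1.0]
eer = approximate_eer(far, frr)   # crossing lies between th = 2 and th = 3
```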

An (inverse) separability measure that also avoids the disadvantages of the EER is the area below the ROC graph.  It is easily calculated by summing over all ROC values.  The only difficulty is that the ROC values are not equidistant.  Therefore, every y value (FRR) must be weighted by the distance between its corresponding x value (FAR) and the next one.  For every ROC point this distance is just the difference (that is, the gradient) of two consecutive values in the FAR graph.  As a result, the distance is given by the probability distribution of unauthorized users.  (For continuous functions, in which the sum can be replaced by an integral, this would be a consequence of the substitution rule for integrals!)  The ROC area, here called ROCA, is (K+1 is the number of similarity ratings considered):

\[ \mathrm{ROCA} = \sum_{n=1}^{K} \mathrm{FRR}(n)\, p_N(n-1) \]

pN: Probability distribution function for unauthorized users

This formula needs only additions and multiplications of existing measured values.  Even though the sum runs over implementation-specific similarity ratings n, the ROCA is independent of their definition.  However, one must assume that no threshold-independent rejections occur, i.e., FRR = FNMR and FAR = FMR.
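The summation can be sketched in Python as follows, assuming pB and pN are lists indexed by the ratings 0..K and that no threshold-independent rejections occur, so that FRR(n) is the running sum of pB. The function name `roca` and the toy distributions are hypothetical.

```python
def roca(pB, pN):
    """Area below the ROC: sum over n = 1..K of FRR(n) * pN(n - 1)."""
    if len(pB) != len(pN):
        raise ValueError("pB and pN must cover the same ratings 0..K")
    K = len(pB) - 1
    area = 0.0
    frr = 0.0                      # FRR(0) = 0
    for n in range(1, K + 1):
        frr += pB[n - 1]           # FRR(n) = pB(0) + ... + pB(n - 1)
        area += frr * pN[n - 1]    # weight: pN(n - 1) = FAR(n - 1) - FAR(n)
    return area

# Hypothetical toy distributions with K = 5.
pN = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]
pB_overlap = [0.0, 0.0, 0.1, 0.3, 0.6, 0.0]     # overlaps pN at rating 2
pB_separate = [0.0, 0.0, 0.0, 0.4, 0.6, 0.0]    # no overlap with pN

area_overlap = roca(pB_overlap, pN)     # 0.1 * 0.2 = 0.02
area_separate = roca(pB_separate, pN)   # 0.0
```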

Both EER and ROCA can take on values between 0 and 1.  Ideal separability of a biometric system, and thus of the distributions pB and pN, obviously results in EER and ROCA values of 0.  But what value corresponds to ideal non-separability?  Intuitively, ideal non-separability can only mean that the distributions pB and pN are exactly the same.  In that case:
\[ p_N = p_B \;\Rightarrow\; \mathrm{FAR} = 1 - \mathrm{FRR} \;\Rightarrow\; \mathrm{EER} = \tfrac{1}{2} \]
and:
\[ p_N = p_B \;\Rightarrow\; \mathrm{ROCA} = \sum_{n=1}^{K} \mathrm{FRR}(n)\, p_B(n-1) \sim \tfrac{1}{2} \]

(Proof of the approximation: replace the sum with an integral and consider pB as the derivative of FRR.  Then only the rule for integration by parts is needed.)
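Written out, the proof sketch reads as follows. It treats the threshold as a continuous variable t on [0, K], which is an idealization, and assumes FRR(0) = 0 and FRR(K) = 1 (no threshold-independent rejections):

```latex
\begin{align*}
\mathrm{ROCA} &\sim \int_0^K \mathrm{FRR}(t)\, p_B(t)\, dt
  = \int_0^K \mathrm{FRR}(t)\, \mathrm{FRR}'(t)\, dt \\
\int_0^K \mathrm{FRR}(t)\, \mathrm{FRR}'(t)\, dt
  &= \Bigl[\mathrm{FRR}(t)^2\Bigr]_0^K - \int_0^K \mathrm{FRR}'(t)\, \mathrm{FRR}(t)\, dt
  \quad \text{(integration by parts)} \\
\Rightarrow\quad 2 \int_0^K \mathrm{FRR}(t)\, \mathrm{FRR}'(t)\, dt
  &= \mathrm{FRR}(K)^2 - \mathrm{FRR}(0)^2 = 1 \\
\Rightarrow\quad \mathrm{ROCA} &\sim \tfrac{1}{2}
\end{align*}
```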

Reasonable values for EER and ROCA lie between the extremes: 0 for perfect separability and ½ for perfect non-separability.  What do values between ½ and 1 mean, then?  This range is left for cases in which the distributions pB and pN trade roles and change places in the diagram.  For separability, this range has practically no meaning in biometrics.
