In statistics, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It combines the precision p and the recall r of the test: p is the number of correct results divided by the number of all returned results, and r is the number of correct results divided by the number of results that should have been returned. The F1 score can be interpreted as a weighted average of the precision and recall; it reaches its best value at 1 and its worst at 0.

The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall:

- $F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$
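
As a minimal sketch of these definitions, the following Python snippet computes precision, recall, and the F1 score from a set of returned results and a set of results that should have been returned. The function name `precision_recall_f1`, the example document IDs, and the convention of reporting 0 when a denominator is zero are illustrative assumptions, not part of the definition.

```python
def precision_recall_f1(returned, relevant):
    """Return (precision, recall, F1) for two collections of item IDs."""
    returned, relevant = set(returned), set(relevant)
    correct = len(returned & relevant)  # results that are both returned and relevant
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical example: 4 results returned, 3 of them correct, 6 relevant overall.
p, r, f1 = precision_recall_f1(
    {"d1", "d2", "d3", "d7"},
    {"d1", "d2", "d3", "d4", "d5", "d6"},
)
print(p, r, round(f1, 4))
```

Here precision is 3/4 and recall is 3/6, so the harmonic mean lies between the two values and closer to the smaller one.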

The general formula for non-negative real β is:

- $F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}}.$
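
The general formula translates directly into code. The sketch below is one possible implementation; the function name `f_beta`, the sample precision and recall values, and the choice to return 0 when the denominator vanishes are assumptions made for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score computed from precision and recall, for beta >= 0."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when the score is otherwise undefined
    return (1 + beta**2) * precision * recall / denominator


p, r = 0.5, 1.0  # hypothetical precision and recall
print(f_beta(p, r, beta=1.0))  # beta = 1 gives the balanced F1 score
print(f_beta(p, r, beta=0.0))  # beta = 0 reduces to the precision
```

Setting β = 1 recovers the harmonic mean, while β = 0 ignores recall entirely; larger values of β move the score toward the recall.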

In terms of true positives, false positives (type I errors), and false negatives (type II errors), the formula is:

- $F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{true\ positive}}{(1 + \beta^2) \cdot \mathrm{true\ positive} + \beta^2 \cdot \mathrm{false\ negative} + \mathrm{false\ positive}}.$
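
Substituting precision = TP / (TP + FP) and recall = TP / (TP + FN) into the general formula gives a form that uses only the counts, in which the false negatives carry the β² weight. The sketch below checks numerically that the two forms agree; the function names and the counts are hypothetical.

```python
import math


def f_beta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from true positive, false positive, and false negative counts."""
    b2 = beta**2
    denominator = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denominator if denominator else 0.0


def f_beta_from_rates(tp, fp, fn, beta=1.0):
    """Same score via precision and recall (assumes tp > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


tp, fp, fn = 30, 10, 20  # hypothetical confusion-matrix counts
for beta in (0.5, 1.0, 2.0):
    by_counts = f_beta_from_counts(tp, fp, fn, beta)
    by_rates = f_beta_from_rates(tp, fp, fn, beta)
    assert math.isclose(by_counts, by_rates)
    print(beta, round(by_counts, 4))
```

The count-based form avoids dividing by TP + FP or TP + FN separately, so it remains defined whenever at least one of the three counts is non-zero.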

Two other commonly used F measures are the $F_2$ measure, which weights recall twice as much as precision, and the $F_{0.5}$ measure, which weights precision twice as much as recall.
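
As a worked illustration with assumed values of precision 0.8 and recall 0.4 (chosen only for this example), the three measures come out as:

- $F_2 = \frac{(1 + 4) \cdot 0.8 \cdot 0.4}{4 \cdot 0.8 + 0.4} = \frac{1.6}{3.6} \approx 0.444$
- $F_1 = \frac{2 \cdot 0.8 \cdot 0.4}{0.8 + 0.4} = \frac{0.64}{1.2} \approx 0.533$
- $F_{0.5} = \frac{(1 + 0.25) \cdot 0.8 \cdot 0.4}{0.25 \cdot 0.8 + 0.4} = \frac{0.4}{0.6} \approx 0.667$

Because recall is the weaker of the two values here, the recall-oriented $F_2$ is the lowest score and the precision-oriented $F_{0.5}$ is the highest.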

The F-measure was derived by van Rijsbergen (1979) so that $F_\beta$ "measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision". It is based on van Rijsbergen's effectiveness measure $E = 1 - \frac{1}{\frac{\alpha}{P} + \frac{1-\alpha}{R}}$. Their relationship is $F_\beta = 1 - E$, where $\alpha = \frac{1}{1 + \beta^2}$.
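
The relationship can be verified in two steps, taking P as the precision and R as the recall. Substituting $\alpha = \frac{1}{1 + \beta^2}$, so that $1 - \alpha = \frac{\beta^2}{1 + \beta^2}$, gives:

- $\frac{\alpha}{P} + \frac{1 - \alpha}{R} = \frac{1}{(1 + \beta^2) P} + \frac{\beta^2}{(1 + \beta^2) R} = \frac{\beta^2 P + R}{(1 + \beta^2) \cdot P \cdot R}$
- $1 - E = \frac{1}{\frac{\alpha}{P} + \frac{1 - \alpha}{R}} = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 P + R} = F_\beta$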

