http://www.zorba-xquery.com/modules/excel/statistical
import module namespace excel = "http://www.zorba-xquery.com/modules/excel/statistical";
This is a library module offering a part of the set of statistical functions defined by Microsoft Excel 2003.
Daniel Turcanu
xquery version "3.0" encoding "utf-8";
- the XQuery module can be found here.
Imported modules:
For more details please also see:
| excel | http://www.zorba-xquery.com/modules/excel/statistical |
| excel-err | http://www.zorba-xquery.com/modules/excel/errors |
| excel-math | http://www.zorba-xquery.com/modules/excel/math |
| ver | http://www.zorba-xquery.com/options/versioning |
|
avedev
(
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType Returns the average of the absolute deviations of data points from their mean. |
|
|
average
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the average (arithmetic mean) of the arguments. |
|
|
averagea
(
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType Calculates the average (arithmetic mean) of the values in the sequence of arguments. |
|
|
count
(
$numbers as xs:anyAtomicType*
) as xs:integer Counts the number of cells that contain numbers or values castable to numeric. |
|
|
counta
(
$numbers as xs:anyAtomicType*
) as xs:integer Counts the number of values that are not empty. |
|
|
countblank
(
$cells as xs:anyAtomicType*
) as xs:integer Counts the empty values in a sequence. |
|
|
large
(
$numbers as xs:anyAtomicType*,
$k as xs:integer
) as xs:anyAtomicType Returns the k-th largest value in a data set. |
|
|
max
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the largest number in a sequence. |
|
|
maxa
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the largest value in a list of arguments. |
|
|
median
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the median of the given numbers. |
|
|
min
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the smallest number in a sequence. |
|
|
mina
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the smallest value in a list of arguments. |
|
|
mode
(
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType Returns the most frequently occurring, or repetitive, value in a sequence. |
|
|
percentile
(
$numbers as xs:anyAtomicType*,
$k_at as xs:anyAtomicType
) as xs:anyAtomicType Returns the k-th percentile of values in a sequence. |
|
|
percentrank
(
$numbers as xs:anyAtomicType*,
$x as xs:anyAtomicType
) as xs:decimal Returns the rank of a value in a data set as a percentage of the data set. |
|
|
prob
(
$x_range as xs:anyAtomicType+,
$prob_range as xs:anyAtomicType+,
$range_lower_limit as xs:anyAtomicType
) as xs:anyAtomicType This is the same as above, only that upper_limit is not specified. |
|
|
prob
(
$x_range as xs:anyAtomicType+,
$prob_range as xs:anyAtomicType+,
$range_lower_limit as xs:anyAtomicType,
$upper_limit as xs:anyAtomicType
) as xs:anyAtomicType Returns the probability that values in a range are between two limits. |
|
|
quartile
(
$numbers as xs:anyAtomicType*,
$quart as xs:integer
) as xs:anyAtomicType Returns the quartile of a data set. |
|
|
rank
(
$x as xs:anyAtomicType,
$numbers as xs:anyAtomicType*
) as xs:decimal This RANK function is same as the above, only that $order_ascending is set by default to false. |
|
|
rank
(
$x as xs:anyAtomicType,
$numbers as xs:anyAtomicType*,
$order_ascending as xs:boolean
) as xs:decimal Returns the rank of a number in a list of numbers. |
|
|
slope
(
$known_y as xs:anyAtomicType+,
$known_x as xs:anyAtomicType+
) as xs:anyAtomicType Returns the slope of the linear regression line through data points in known_y's and known_x's. |
|
|
small
(
$numbers as xs:anyAtomicType*,
$k as xs:integer
) as xs:anyAtomicType This function computes the k-th smallest value in a data set. |
|
|
standardize
(
$x as xs:anyAtomicType,
$mean as xs:anyAtomicType,
$standard_dev as xs:anyAtomicType
) as xs:double Returns a normalized value from a distribution characterized by mean and standard_dev. |
|
|
var
(
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType Estimates variance based on a sample. |
|
|
vara
(
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType Estimates variance based on a sample. |
|
|
varp
(
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType Calculates variance based on the entire population. |
|
|
varpa
(
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType Calculates variance based on the entire population. |
declare function excel:avedev (
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType
Returns the average of the absolute deviations of data points from their mean. The formula is sum(abs(x - average_x))/n, where n is the count of x in the sequence.
- $numbers the sequence of numbers or values castable to numeric. Sequence can be of any length from 1 up.
- The formula result
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:average (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the average (arithmetic mean) of the arguments. Arguments can be empty values, otherwise must be castable to numeric. If sequence is empty then zero is returned. The sequence can be of any length.
- $numbers The sequence of numbers or empty values.
- The sum of all numbers divided by the number of non-empty values.
- excel-err:Value if the parameters cannot be casted to numeric type.
declare function excel:averagea (
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType
Calculates the average (arithmetic mean) of the values in the sequence of arguments. Arguments can be of any type. The numbers are added, and the sum is divided by the size of entire sequence.
- $numbers the sequence of values of any type. The sequence can be of any length, from 1 up.
- The result
declare function excel:count (
$numbers as xs:anyAtomicType*
) as xs:integer
Counts the number of cells that contain numbers or values castable to numeric.
- $numbers The sequence of values, of any length.
- The count of numbers.
declare function excel:counta (
$numbers as xs:anyAtomicType*
) as xs:integer
Counts the number of values that are not empty. Empty values are the one with string value "".
- $numbers the sequence of values of any type, any length
- The count of non-empty values
declare function excel:countblank (
$cells as xs:anyAtomicType*
) as xs:integer
Counts the empty values in a sequence. The empty values are the ones with string value "". The value 0 is not counted.
- $cells the sequence of values, of any length
- The count
declare function excel:large ( $numbers as xs:anyAtomicType*, $k as xs:integer ) as xs:anyAtomicType
Returns the k-th largest value in a data set. If n is the number of data points in a range, then LARGE(array,1) returns the largest value, and LARGE(array,n) returns the smallest value.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
- $k the position of largest value, with value from 1 to count of values
- The k-th largest value as numeric type
- excel-err:Value if the parameters cannot be casted to numeric type
- excel-err:Num if the sequence is empty
- excel-err:Num if k is not a value between 1 and the sequence size
declare function excel:max (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the largest number in a sequence.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length.
- The max
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:maxa (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the largest value in a list of arguments. In this implementation there is no difference between MAX and MAXA.
- $numbers the sequence of numbers or values castable to numeric The sequence can be of any length.
- The max
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:median (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the median of the given numbers. The median is the number in the middle of a set of numbers. Half the numbers have values that are greater than the median, and half the numbers have values that are less than the median.
- $numbers the sequence of numbers, of any length
- for odd count of numbers return the number in the middle of the sorted sequence. For even count of numbers return the average of the two numbers in the middle.
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:min (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the smallest number in a sequence.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length.
- The min
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:mina (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the smallest value in a list of arguments. In this implementation there is no difference between MAX and MAXA.
- $numbers the sequence of numbers or values castable to numeric The sequence can be of any length.
- The min
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:mode (
$numbers as xs:anyAtomicType*
) as xs:anyAtomicType
Returns the most frequently occurring, or repetitive, value in a sequence. Arguments must be castable to numeric.
- $numbers the sequence of numbers, of any length
- The most occuring number
- excel-err:Value if the parameters cannot be casted to numeric type
- fn:QName("http://www.zorba-xquery.com/modules/excel/errors", "excel-err:NA") if there are no duplicate numbers
declare function excel:percentile ( $numbers as xs:anyAtomicType*, $k_at as xs:anyAtomicType ) as xs:anyAtomicType
Returns the k-th percentile of values in a sequence. If k is not a multiple of 1/(n - 1), PERCENTILE interpolates to determine the value at the k-th percentile. The function is computed by (max-min)*k + min
- $numbers the sequence of numbers, of any length
- $k_at the percentile, with value between 0 .. 1 inclusive
- The computed percentile
- excel-err:Value if the parameters cannot be casted to numeric type
- excel-err:Num if percentile is not between 0 .. 1
declare function excel:percentrank ( $numbers as xs:anyAtomicType*, $x as xs:anyAtomicType ) as xs:decimal
Returns the rank of a value in a data set as a percentage of the data set.
If x does not match one of the values in array,
PERCENTRANK interpolates to return the correct percentage rank.
The formula is uses: (RANK - 1) / (size - 1) .
- $numbers the sequence of numbers or values castable to numbers. The sequence can be of any length, from 1 up.
- $x is the value for which you want to know the rank
- The percentage of rank.
- excel-err:Value if the parameters cannot be casted to numeric type
- excel-err:Num if the sequence is zero length
declare function excel:prob ( $x_range as xs:anyAtomicType+, $prob_range as xs:anyAtomicType+, $range_lower_limit as xs:anyAtomicType ) as xs:anyAtomicType
This is the same as above, only that upper_limit is not specified. The probability is computed only for range_lower_limit.
- $x_range is the range of numeric values of x with which there are associated probabilities. This does not need to be ordered.
- $prob_range is a set of probabilities associated with values in x_range.
- $range_lower_limit is the value for which you want a probability.
- The probability of the range_lower_limit value
- excel-err:Num if any probability is not between 0 and 1
- excel-err:Num if the sum of probabilities is not equal to 1
- excel-err:Value if any parameter is not castable to numeric
- excel-err:Num if x_range and prob_range do not have the same number of values
declare function excel:prob ( $x_range as xs:anyAtomicType+, $prob_range as xs:anyAtomicType+, $range_lower_limit as xs:anyAtomicType, $upper_limit as xs:anyAtomicType ) as xs:anyAtomicType
Returns the probability that values in a range are between two limits.
- $x_range is the range of numeric values of x with which there are associated probabilities. This does not need to be ordered.
- $prob_range is a set of probabilities associated with values in x_range.
- $range_lower_limit is the lower bound on the value for which you want a probability.
- $upper_limit is the upper bound on the value for which you want a probability.
- The probability of the entire range
- excel-err:Num if any probability is not between 0 and 1
- excel-err:Num if the sum of probabilities is not equal to 1
- excel-err:Value if any parameter is not castable to numeric
- excel-err:Num if x_range and prob_range do not have the same number of values
declare function excel:quartile ( $numbers as xs:anyAtomicType*, $quart as xs:integer ) as xs:anyAtomicType
Returns the quartile of a data set.
- $numbers sequence of numbers or values castable to numbers. The sequence can be of any length, from 1 up.
- $quart one of the values 0, 1, 2, 3, 4 with meaning: 0 compute minimum value 1 compute first quartile (25th percentile) 2 compute median value (50th percentile) 3 compute third quartile (75th percentile) 4 compute maximum value
- the computed quartile, as numeric type
- excel-err:Value if the parameters cannot be casted to numeric type
- excel-err:Num if the sequence is zero length
- excel-err:Num if $quart is not one of the values 0, 1, 2, 3, 4
declare function excel:rank ( $x as xs:anyAtomicType, $numbers as xs:anyAtomicType* ) as xs:decimal
This RANK function is same as the above, only that $order_ascending is set by default to false.
- $x The number whose rank you want to find.
- $numbers the sequence of numbers or values castable to numbers. The sequence can be of any length.
- The rank of $x.
- excel-err:Value if the parameters cannot be casted to numeric type.
declare function excel:rank ( $x as xs:anyAtomicType, $numbers as xs:anyAtomicType*, $order_ascending as xs:boolean ) as xs:decimal
Returns the rank of a number in a list of numbers. The rank of a number is its size relative to other values in a list. (If you were to sort the list, the rank of the number would be its position.) RANK gives duplicate numbers the same rank.
- $x The number whose rank you want to find.
- $numbers The sequence of numbers or values castable to numbers. The sequence can be of any length.
- $order_ascending A boolean having the meaning: false then rank the number as if the sequence was sorted in descending order. true then rank the number as if the sequence was sorted in ascending order.
- The rank of $x.
- excel-err:Value if the parameters cannot be casted to numeric type.
declare function excel:slope ( $known_y as xs:anyAtomicType+, $known_x as xs:anyAtomicType+ ) as xs:anyAtomicType
Returns the slope of the linear regression line through data points in known_y's and known_x's.
The slope is the vertical distance divided by the horizontal distance between
any two points on the line, which is the rate of change along the regression line.
It computes the formula:
sum((x - average_x)(y - average_y)) / sum((x - average_x)^2)
where average_x and average_y are computed with AVERAGE function.
- $known_y the sequence of y numbers. The sequence can be of any length, from 1 up.
- $known_x the sequence of x numbers. The sequence can be of any length, from 1 up.
- The slope value, as numeric type
- excel-err:Value if any parameter cannot be casted to numeric
- fn:QName("http://www.zorba-xquery.com/modules/excel/errors", "excel-err:NA") if there are different numbers of x's and y's
- fn:QName("http://www.zorba-xquery.com/modules/excel/errors", "excel-err:NA") if any sequence is empty
- excel-err:Div0 if all x's are equal
declare function excel:small ( $numbers as xs:anyAtomicType*, $k as xs:integer ) as xs:anyAtomicType
This function computes the k-th smallest value in a data set. Use this function to return values with a particular relative standing in a data set. If n is the number of data points in array, SMALL(array,1) equals the smallest value, and SMALL(array,n) equals the largest value.
- $numbers A sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
- $k The position (from the smallest) in the sequence of data to return. Must have value between 1 and size of sequence.
- The k-th smallest value of $numbers.
- excel-err:Value if the parameters cannot be casted to numeric type.
- excel-err:Num if the sequence is zero length.
- excel-err:Num if $k is not a value between 1 and the size of sequence.
declare function excel:standardize ( $x as xs:anyAtomicType, $mean as xs:anyAtomicType, $standard_dev as xs:anyAtomicType ) as xs:double
Returns a normalized value from a distribution characterized by mean and standard_dev.
The formula is (x - mean) / standard_dev .
- $x is the value you want to normalize
- $mean is the arithmetic mean of the distribution.
- $standard_dev is the standard deviation of the distribution.
- The normalized x, as numeric type
- excel-err:Value if any parameter cannot be casted to numeric
- excel-err:Num if standard_dev is a value smaller than zero or equal
declare function excel:var (
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType
Estimates variance based on a sample.
The formula is sum(x - average_x)^2 / (n - 1).
average_x is computed with AVERAGE function.
n is the count of numbers from the sequence, excluding empty values.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
- The variance, as numeric type
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:vara (
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType
Estimates variance based on a sample.
The formula is sum(x - average_x)^2 / (n - 1).
average_x is computed with AVERAGE function.
n is the size of sequence, including empty values.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
- The variance, as numeric type
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:varp (
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType
Calculates variance based on the entire population.
The formula is sum(x - average_x)^2 / n.
average_x is computed with AVERAGE function.
n is the count of numbers from the sequence, excluding empty values.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
- The variance, as numeric type
- excel-err:Value if the parameters cannot be casted to numeric type
declare function excel:varpa (
$numbers as xs:anyAtomicType+
) as xs:anyAtomicType
Calculates variance based on the entire population.
The formula is sum(x - average_x)^2 / n.
average_x is computed with AVERAGE function.
n is the size of sequence, including empty values.
- $numbers the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
- The variance, as numeric type
- excel-err:Value if the parameters cannot be casted to numeric type