Walkthrough of the Bayesian audit sampling workflow

Koen Derks



This vignette aims to show how the jfa package facilitates auditors in the standard audit sampling workflow (hereafter “audit workflow”). In this example of the audit workflow, we will consider the case of BuildIt. BuildIt is a fictional construction company in the United States that is being audited by an external auditor for a fictional audit firm. At the end of the year, BuildIt has provided a summary of its financial situation in the financial statements. The objective of the auditor is to formulate an opinion about the fairness BuildIt’s financial statements.

The auditor needs to obtain sufficient and appropriate evidence for the hypothesis that the misstatement in the financial statements is lower than a certain amount: the materiality. If the financial statements contain misstatements that are considered material, this means that the errors in the financial statements are large enough that they might influence the decision of stakeholders relying on these financial statements. The performance materiality is the materiality that applies to each of the populations on which the financial statements are based. For this example, the performance materiality is set at 5% of the total value of the population.

In this example, we focus on the BuildIt data set that comes with the jfa package.

##      ID bookValue auditValue
## 1 82884    242.61     242.61
## 2 25064    642.99     642.99
## 3 81235    628.53     628.53
## 4 71769    431.87     431.87
## 5 55080    620.88     620.88
## 6 93224    501.76     501.76

The population of interest consists of 3500 items, each with a booked value. Let’s assume that, before performing audit sampling, the auditor has assessed the quality of BuildIt’s internal control systems and found that they were working properly.

In order to formulate an opinion about the misstatement in the population, the auditor separates their audit workflow into four stages. First, they will need to plan the minimum size of a sample they need to inspect to perform inference about the population. Second, they will need to select the required sample from the population. Third, they will need to inspect the selected sample and determine the audit (true) value of the items it contains. Fourth, they will need to use the information from the inspected sample to perform inference about the misstatement in the population.


Setting up the audit

The auditor wants to make a statement that, with 95% confidence, the misstatement in the population is lower than the performance materiality of 5%. Based on last year’s audit at BuildIt, where the upper bound of the misstatement turned out to be 2.5%, they want to tolerate at most 2.5% errors in the intended sample. The auditor can therefore re-formulate their statistical statement as that they want to conclude that, when 2.5% errors are found in the sample, they can conclude with 95% confidence that the misstatement in the population is lower than the performance materiality of 5%.

Below, the auditor defines the performance materiality, confidence level, and expected misstatements in the sample.

# Specify the confidence, materiality, and expected misstatements.
confidence <- 0.95 # 95%
materiality <- 0.05 # 5%
expected <- 0.025 # 2.5%

Many audits are performed according to the audit risk model (ARM), which determines that the uncertainty about the auditor’s statement as a whole (1 - the confidence) is a factor of three terms: the inherent risk, the control risk, and the detection risk. Inherent risk is the risk posed by an error in BuildIt’s financial statement that could be material, before consideration of any related control systems (e.g., computer systems). Control risk is the risk that a material misstatement is not prevented or detected by BuildIt’s internal control systems. Detection risk is the risk that the auditor will fail to find material misstatements that exist in an BuildIt’s financial statements. The ARM is practically useful because for a given level of audit risk, the tolerable detection risk bears an inverse relation to the other two risks. The ARM is useful for the auditor because it enables them to incorporate existing information on BuildIt’s organization to increase the required risk that they will fail to find a material misstatement.

\[ \text{Audit risk} = \text{Inherent risk} \,\times\, \text{Control risk} \,\times\, \text{Detection risk}\]

Usually the auditor judges inherent risk and control risk on a three-point scale consisting of low, medium, and high. Different audit firms handle different standard percentages for these categories. The auditor’s firm defines the probabilities of low, medium, and high respectively as 50%, 60%, and 100%. Because the auditor assessed BuildIt’s internal control systems, they assess the control risk as medium (60%).

# Specify the inherent risk (ir) and control risk (cr).
ir <- 1 # 100%
cr <- 0.6 # 60%

Stage 1: Planning an audit sample


The auditor can choose to either perform a frequentist analysis, where they use the increased detection risk as their level of uncertainty, or perform a Bayesian analysis, where they incorporate the information about the control risk into a prior distribution. For this example, we will show how to perform a Bayesian analysis. A frequentist analysis can easily be done through the following functions by setting prior = FALSE. In a frequentist analysis, the auditor uses the value c.adj as their new value for confidence.

# Adjust the required confidence for a frequentist analysis.
c.adj <- 1 - ((1 - confidence) / (ir * cr))

However, in a Bayesian audit, the auditor starts at step 0 by defining the prior distribution that corresponds to their assessment of the control risk. Using the auditPrior() function, they can create a prior distribution that incorporates the information in the risk assessments from the ARM. For more information on how this is done, see Derks et al. (2019).

# Step 0: Create a prior distribution according to the audit risk model.
prior <- auditPrior(
  method = "arm", likelihood = "poisson", expected = expected,
  materiality = materiality, ir = ir, cr = cr

The prior distribution can be shown by using the plot() function.


Now that the prior distribution is specified, the auditor can calculate the required sample size for their desired inference by using the planning() function. They uses the prior object as input for the planning() function.

# Step 1: Calculate the required sample size.
stage1 <- planning(materiality = materiality, expected = expected, conf.level = confidence, prior = prior)

The auditor can then inspect the result from her planning procedure by using the summary() function. The result show that, given the prior distribution, the auditor needs to select a sample of 178 items so that, when at most 4.45 misstatements are found, they can conclude with 95% confidence that the misstatement in BuildIt’s financial statements is lower the performance materiality of 5%.

##  Bayesian Audit Sample Planning Summary
## Options:
##   Confidence level:              0.95 
##   Materiality:                   0.05 
##   Hypotheses:                    H₀: Θ > 0.05 vs. H₁: Θ < 0.05 
##   Expected:                      0.025 
##   Likelihood:                    poisson 
##   Prior distribution:            gamma(α = 2.325, β = 53) 
## Results:
##   Minimum sample size:           178 
##   Tolerable errors:              4.45 
##   Posterior distribution:        gamma(α = 6.775, β = 231) 
##   Expected most likely error:    0.025 
##   Expected upper bound:          0.049981 
##   Expected precision:            0.024981 
##   Expected BF₁₀:                 9.6614

The auditor can inspect how the prior distribution compares to the expected posterior distribution by using the plot() function. The expected posterior distribution is the posterior distribution that would occur if the auditor actually observed a sample of 178 items, from which 4.45 were misstated.


Stage 2: Selecting a sample


The auditor is now ready to select the required 178 items from the population. They can choose to do this according to one of two statistical methods. In record sampling (units = "items"), inclusion probabilities are assigned on the item level, treating item with a high value and a low value the same, an item of $5,000 is equally likely to be selected as an item of $1,000. In monetary unit sampling (units = "values"), inclusion probabilities are assigned on the level of individual monetary units (e.g., a dollar). When a dollar is selected to be in the sample, the item that includes that dollar is selected. This favors items with a higher value, as an item with a value of $5,000 is now five times more likely to be selected than an item with a value of $1,000.

The auditor chooses to use monetary unit sampling, as they wants to include more high-valued items. The selection() function enables them to select the sample from the population. She uses the stage1 object as an input for the selection() function, since this passes the calculated sample size to the function.

# Step 2: Draw a sample from the financial statements.
stage2 <- selection(data = BuildIt, size = stage1, units = "values", values = "bookValue", method = "interval")

Like before, the auditor can inspect the outcomes of their sampling procedure
by using the summary() function.

##  Audit Sample Selection Summary
## Options:
##   Requested sample size:         178 
##   Sampling units:                monetary units 
##   Method:                        fixed interval sampling 
##   Starting point:                1 
## Data:
##   Population size:               3500 
##   Population value:              1403221 
##   Selection interval:            7883.3 
## Results:
##   Selected sampling units:       178 
##   Proportion of value:           0.062843 
##   Selected items:                178 
##   Proportion of size:            0.050857

Stage 3: Executing the audit

The selected sample can be isolated by indexing the sample object from the sampling result.

# Step 3: Isolate the sample for execution of the audit.
sample <- stage2$sample

Next, the auditor can execute the audit by annotating the items in the sample with their audit values (for exampling by writing the sample to a .csv file using write.csv(). They can then load the annotated sample back into the R session for further evaluation. For this example, the audit values of the sample items are already included in the auditValue column of the data set.

# To write the sample to a .csv file:
write.csv(x = sample, file = "auditSample.csv", row.names = FALSE)

# To load annotated sample back into R:
sample <- read.csv(file = "auditSample.csv")

Stage 4: Evaluating the sample


Using the annotated sample, the auditor can perform inference about the misstatement in the population via the evaluation() function. By passing the prior object to the function, it automatically sets method = "binomial" to be consistent with the prior distribution.

# Step 4: Evaluate the sample.
stage4 <- evaluation(
  materiality = materiality, conf.level = confidence, data = sample,
  values = "bookValue", values.audit = "auditValue", prior = prior

The auditor can inspect the outcomes of their inference by using the summary() function. The resulting upper bound is 2.278%, which is lower than the performance materiality of 5%. Therefore, the auditor can concluse that there is a 95% probability that the misstatement in BuildIt’s population is lower than 2.278%.

##  Bayesian Audit Sample Evaluation Summary
## Options:
##   Confidence level:               0.95 
##   Materiality:                    0.05 
##   Hypotheses:                     H₀: Θ > 0.05 vs. H₁: Θ < 0.05 
##   Method:                         poisson 
##   Prior distribution:             gamma(α = 2.325, β = 53) 
## Data:
##   Sample size:                    178 
##   Number of errors:               0 
##   Sum of taints:                  0 
## Results:
##   Posterior distribution:         gamma(α = 2.325, β = 231) 
##   Most likely error:              0.0057359 
##   95 percent credible interval:   [0, 0.022781] 
##   Precision:                      0.017045 
##   BF₁₀:                            2179.8

They can inspect the prior and posterior distribution by using the plot() function.



Since the 95% upper credible bound on the misstatement in population is lower than the performance materiality, the auditor has obtained sufficient evidence to conclude that the population does not contain material misstatements. The auditor can create a html or pdf report of the statistical results using the report() function, as shown below.

report(stage4, file = "report.html")