Title: Variational Bayesian Estimation for Diagnostic Classification Models
Description: Enables computationally efficient parameter estimation by variational Bayesian methods for various diagnostic classification models (DCMs). DCMs are a class of discrete latent variable models for classifying respondents into latent classes that typically represent distinct combinations of skills they possess. Recently, to meet the growing need for large-scale diagnostic measurement in the fields of educational, psychological, and psychiatric measurement, variational Bayesian inference has been developed as a computationally efficient alternative to Markov chain Monte Carlo methods, e.g., Yamaguchi and Okada (2020a) <doi:10.1007/s11336-020-09739-w>, Yamaguchi and Okada (2020b) <doi:10.3102/1076998620911934>, Yamaguchi (2020) <doi:10.1007/s41237-020-00104-w>, Oka and Okada (2023) <doi:10.1007/s11336-022-09884-4>, and Yamaguchi and Martinez (2024) <doi:10.1111/bmsp.12308>. To facilitate their application, 'variationalDCM' provides a collection of recently proposed variational Bayesian estimation methods for various DCMs.
Authors: Keiichiro Hijikata [aut, cre], Motonori Oka [aut], Kazuhiro Yamaguchi [aut], Kensuke Okada [aut]
Maintainer: Keiichiro Hijikata <[email protected]>
License: GPL-3
Version: 2.0.2
Built: 2024-11-11 06:15:38 UTC
Source: https://github.com/khijikata/variationaldcm
dina_data_gen()
returns the artificially generated item response data for the DINA model
dina_data_gen(Q, N, attr_cor = 0.1, s = 0.2, g = 0.2, seed = 17)
Q: the J × K binary Q-matrix
N: the number of assumed respondents
attr_cor: the true value of the correlation among attributes (default: 0.1)
s: the true value of the slip parameter (default: 0.2)
g: the true value of the guessing parameter (default: 0.2)
seed: the seed value used for random number generation (default: 17)
A list including:
the generated artificial item response data
the generated true value of the attribute mastery pattern
Oka, M., & Okada, K. (2023). Scalable Bayesian approach for the DINA Q-matrix estimation combining stochastic optimization and variational inference. Psychometrika, 88, 302–331. doi:10.1007/s11336-022-09884-4
# load Q-matrix
Q = sim_Q_J80K5
sim_data = dina_data_gen(Q = Q, N = 200)
hm_dcm_data_gen()
returns the artificially generated item response data for the HM-DCM
hm_dcm_data_gen(N, Q, min_theta = 0.2, max_theta = 0.8, attr_cor = 0.1, seed = 17)
N: the number of assumed respondents
Q: the list of binary Q-matrices, one per time point
min_theta: the minimum value of the item parameter (default: 0.2)
max_theta: the maximum value of the item parameter (default: 0.8)
attr_cor: the true value of the correlation among attributes (default: 0.1)
seed: the seed value used for random number generation (default: 17)
A list including:
the generated artificial item response data
the generated true value of the attribute mastery pattern, matrix form
the generated true value of the attribute mastery pattern, string form
Yamaguchi, K., & Martinez, A. J. (2024). Variational Bayes inference for hidden Markov diagnostic classification models. British Journal of Mathematical and Statistical Psychology, 77(1), 55–79. doi:10.1111/bmsp.12308
indT = 3
Q = sim_Q_J30K3
hm_sim_Q = lapply(1:indT, function(time_point) Q)
hm_sim_data = hm_dcm_data_gen(Q = hm_sim_Q, N = 200)
mc_dina_data_gen()
returns the artificially generated item response data for the MC-DINA model
mc_dina_data_gen(N, Q, attr_cor = 0.1, seed = 17)
N: the number of assumed respondents
Q: the Q-matrix for the MC-DINA model, whose components give the item number, stem, and attributes
attr_cor: the true value of the correlation among attributes (default: 0.1)
seed: the seed value used for random number generation (default: 17)
A list including:
the generated artificial item response data
the generated true value of the attribute mastery pattern
Yamaguchi, K. (2020). Variational Bayesian inference for the multiple-choice DINA model. Behaviormetrika, 47(1), 159-187. doi:10.1007/s41237-020-00104-w
# load a simulated Q-matrix
mc_Q = mc_sim_Q
mc_sim_data = mc_dina_data_gen(Q = mc_Q, N = 200)
Artificial Q-matrix for a 30-item test measuring 5 attributes.
mc_sim_Q
A matrix with components: item number, stem, and attributes.
Yamaguchi, K. (2020). Variational Bayesian inference for the multiple-choice DINA model. Behaviormetrika, 47(1), 159-187. doi:10.1007/s41237-020-00104-w
This matrix represents an artificial Q-matrix for 30 items and 3 attributes.
sim_Q_J30K3
An object of class matrix (inherits from array) with 30 rows and 3 columns.
artificially simulated
Artificial Q-matrix for an 80-item test measuring 5 attributes.
sim_Q_J80K5
An object of class matrix (inherits from array) with 80 rows and 5 columns.
artificially simulated
variationalDCM()
fits DCMs by VB algorithms.
variationalDCM(X, Q, model, max_it = 500, epsilon = 1e-04, verbose = TRUE, ...)

## S3 method for class 'variationalDCM'
summary(object, ...)
X: the N × J binary item response data matrix
Q: the J × K binary Q-matrix
model: specify one of "dina", "dino", "mc_dina", "satu_dcm", or "hm_dcm"
max_it: maximum number of iterations (default: 500)
epsilon: convergence tolerance for iterations (default: 1e-04)
verbose: logical, controls whether to print progress (default: TRUE)
...: additional arguments such as hyperparameter values
object: the return of the variationalDCM() function
variationalDCM() returns an object of class variationalDCM. We provide the summary() function to summarize a result, and users can check the following information:
estimates of the posterior means and posterior standard deviations of the model parameters, which vary by model
MAP estimates of the attribute mastery patterns
the resulting value of the evidence lower bound
time spent in computation
summary(variationalDCM): print summary information
The variationalDCM() function performs recently developed variational Bayesian inference for various DCMs. The current version supports the DINA, DINO, MC-DINA, saturated DCM, and HM-DCM models. We briefly introduce the additional arguments that are specific to each model.
The DINA model has two types of model parameters: the slip parameter s_j and the guessing parameter g_j for each item j = 1, ..., J. The hyperparameters for the DINA model are named as follows: delta_0 is an L-dimensional vector, which is the hyperparameter for the Dirichlet prior distribution of the class mixing parameter (default: NULL). When delta_0 is specified as NULL, it is set to a vector of ones. alpha_s, beta_s, alpha_g, and beta_g are positive values. They are the hyperparameters that determine the shape of the prior beta distributions for the slip and guessing parameters (default: NULL). When they are specified as NULL, they are each set to 1.
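As a sketch, the DINA hyperparameters above can be passed through the ... argument of variationalDCM(); the values below are illustrative, not recommendations, and the defaults (NULL) are usually sufficient:

```r
library(variationalDCM)

# fit the DINA model with user-specified hyperparameters (illustrative values)
Q = sim_Q_J80K5
sim_data = dina_data_gen(Q = Q, N = 200)
res = variationalDCM(
  X = sim_data$X, Q = Q, model = "dina",
  delta_0 = rep(1, 2^ncol(Q)),   # Dirichlet hyperparameter, length L = 2^K
  alpha_s = 1, beta_s = 1,       # beta prior for the slip parameter
  alpha_g = 1, beta_g = 1        # beta prior for the guessing parameter
)
summary(res)
```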
The DINO model has the same model parameters and hyperparameters as the DINA model. We thus refer the readers to the DINA model.
The MC-DINA model has the additional arguments delta_0 and a_0. a_0 corresponds to the positive hyperparameters of the Dirichlet priors on the item parameters for all items and response options. a_0 is by default set to NULL, and it is then set to 1 for all elements.
The saturated DCM is a generalized model that subsumes models such as the G-DINA and the GDM. In the saturated DCM, we have two further hyperparameters in addition to delta_0, which can be specified via the arguments A_0 and B_0. They are specified by default as NULL, in which case weakly informative priors are set.
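A minimal sketch of fitting the saturated DCM with the default weakly informative priors; sim_Q_J30K3 and dina_data_gen() are reused here purely for illustration (the generated responses follow the DINA model, not a saturated DCM):

```r
library(variationalDCM)

# fit the saturated DCM (A_0, B_0, delta_0 left at their NULL defaults)
Q = sim_Q_J30K3
sim_data = dina_data_gen(Q = Q, N = 200)  # illustrative response data
res_satu = variationalDCM(X = sim_data$X, Q = Q, model = "satu_dcm")
summary(res_satu)
```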
When model is specified as "hm_dcm", users have the additional arguments nondecreasing_attribute, measurement_model, random_block_design, Test_versions, Test_order, random_start, A_0, B_0, delta_0, and omega_0. Users can accommodate the nondecreasing attribute constraint, which represents the assumption that mastered attributes are not forgotten, by setting the logical argument nondecreasing_attribute to TRUE (default: FALSE). Users can also control the measurement model by specifying measurement_model (default: "general"); the current version supports the HM-general DCM ("general") and HM-DINA ("dina") models. This function can also handle datasets collected by a random block design, which is indicated by the logical argument random_block_design (default: FALSE). When it is specified as TRUE, users must enter Test_versions and Test_order. Test_versions is an argument indicating which version of the test each respondent has been assigned to based on the random block design, while Test_order indicates the sequence in which items are rearranged based on the random block design. A_0, B_0, delta_0, and omega_0 correspond to the hyperparameters of the priors for the item parameters, the class mixing parameter, and the attribute transitions. omega_0 is a matrix of nonnegative hyperparameters of the Dirichlet distributions for the attribute transition probabilities. omega_0 is by default set to NULL, and it is then set to 1 for all elements.
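The HM-DCM options above can be combined as in the following sketch, which reuses the hm_dcm_data_gen() example data; the argument values are illustrative:

```r
library(variationalDCM)

# generate longitudinal data over indT = 3 time points
indT = 3
Q = sim_Q_J30K3
hm_sim_Q = lapply(1:indT, function(time_point) Q)
hm_sim_data = hm_dcm_data_gen(Q = hm_sim_Q, N = 200)

# fit the HM-DCM with the nondecreasing attribute constraint
res_hm = variationalDCM(
  X = hm_sim_data$X, Q = hm_sim_Q, model = "hm_dcm",
  measurement_model = "general",    # HM-general DCM
  nondecreasing_attribute = TRUE    # mastered attributes are not forgotten
)
summary(res_hm)
```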
Yamaguchi, K., & Okada, K. (2020). Variational Bayes inference for the DINA model. Journal of Educational and Behavioral Statistics, 45(5), 569-597. doi:10.3102/1076998620911934
Yamaguchi, K. (2020). Variational Bayesian inference for the multiple-choice DINA model. Behaviormetrika, 47(1), 159-187. doi:10.1007/s41237-020-00104-w
Yamaguchi, K., & Okada, K. (2020). Variational Bayes inference algorithm for the saturated diagnostic classification model. Psychometrika, 85(4), 973–995. doi:10.1007/s11336-020-09739-w
Yamaguchi, K., & Martinez, A. J. (2024). Variational Bayes inference for hidden Markov diagnostic classification models. British Journal of Mathematical and Statistical Psychology, 77(1), 55–79. doi:10.1111/bmsp.12308
# fit the DINA model
Q = sim_Q_J80K5
sim_data = dina_data_gen(Q = Q, N = 200)
res = variationalDCM(X = sim_data$X, Q = Q, model = "dina")
summary(res)