
An Official Website of the United States Government

Markov Chain Monte Carlo and Exact Logistic Regression

Award Information

Agency:
Department of Health and Human Services
Branch:
N/A
Award ID:
53900
Program Year/Program:
2001 / SBIR
Agency Tracking Number:
1R43CA093112-01
Solicitation Year:
N/A
Solicitation Topic Code:
N/A
Solicitation Number:
N/A
Small Business Information
CYTEL SOFTWARE CORPORATION
CYTEL SOFTWARE CORPORATION 675 MASSACHUSETTS AVE CAMBRIDGE, MA 02139
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: Yes
 
Phase 1
Fiscal Year: 2001
Title: Markov Chain Monte Carlo and Exact Logistic Regression
Agency: HHS
Contract: N/A
Award Amount: $113,111.00
 

Abstract:

DESCRIPTION (provided by applicant): Logistic regression is a very popular model for the analysis of binary data, with widespread applicability in the physical, behavioral and biomedical sciences. Parameter inference for this model is usually based on maximizing the unconditional likelihood function. However, unconditional maximum likelihood inference can produce inconsistent point estimates, inaccurate p-values and inaccurate confidence intervals for small or unbalanced data sets, and for data sets with a large number of parameters relative to the number of observations. Sometimes the method fails entirely because no finite estimates maximize the unconditional likelihood function. A methodologically sound alternative that has none of these drawbacks is the exact conditional approach: one generates the permutation distributions of the sufficient statistics for the parameters of interest, conditional on fixing the sufficient statistics of the remaining nuisance parameters at their observed values. The major stumbling block to this approach is its heavy computational burden. Monte Carlo methods attempt to overcome this problem by sampling from the reference set of possible permutations instead of enumerating them all. Two competing Monte Carlo methods are network-based sampling and Markov Chain Monte Carlo (MCMC) sampling. Network sampling suffers from memory limitations, while MCMC sampling can produce incorrect results if the Markov chain is not ergodic or has not reached its steady state. We propose a novel approach that combines network and MCMC sampling, draws on the strengths of each, and overcomes their individual limitations. We propose to implement this hybrid network-MCMC method in our LogXact software and as an external procedure in the SAS system.
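To make the exact conditional approach and the MCMC alternative concrete, here is a minimal Python sketch for the simplest case only: one covariate x, with the intercept as the sole nuisance parameter. It is not the proposed hybrid network-MCMC method and not LogXact's algorithm, and the function names (`suff_stat`, `exact_pvalue`, `mcmc_pvalue`) are hypothetical. Conditioning on the intercept's sufficient statistic t0 (the number of successes) leaves a reference set of all 0/1 response vectors with t0 successes, each equally likely under H0: beta1 = 0. The exact p-value can therefore be found by enumeration, or estimated by a Markov chain whose swap moves never leave the reference set.

```python
import itertools
import random


def suff_stat(y, x):
    """Sufficient statistic for the slope: t1 = sum_i y_i * x_i."""
    return sum(yi * xi for yi, xi in zip(y, x))


def exact_pvalue(y, x):
    """Exact one-sided p-value P(T1 >= t1_obs) under H0: beta1 = 0.

    Conditioning on t0 = sum(y) (the intercept's sufficient statistic) makes
    every 0/1 vector with t0 successes equally likely under H0, so this
    sketch simply enumerates the full reference set.
    """
    n, t0 = len(y), sum(y)
    t1_obs = suff_stat(y, x)
    hits = total = 0
    for ones in itertools.combinations(range(n), t0):
        total += 1
        hits += sum(x[i] for i in ones) >= t1_obs
    return hits / total


def mcmc_pvalue(y, x, n_steps=20000, burn_in=1000, seed=0):
    """Estimate the same p-value with a Markov chain on the reference set.

    Each step swaps one randomly chosen 1 with one randomly chosen 0. The
    swap keeps t0 fixed, and because the proposal is symmetric and every move
    is accepted, the uniform distribution over the reference set is stationary.
    """
    ones = [i for i, v in enumerate(y) if v == 1]
    zeros = [i for i, v in enumerate(y) if v == 0]
    if not ones or not zeros:
        return 1.0  # reference set has a single element: the observed y
    rng = random.Random(seed)
    state = list(y)
    t1_obs = suff_stat(y, x)
    hits = 0
    for step in range(burn_in + n_steps):
        a, b = rng.randrange(len(ones)), rng.randrange(len(zeros))
        i, j = ones[a], zeros[b]
        state[i], state[j] = 0, 1  # move one success from position i to j
        ones[a], zeros[b] = j, i
        if step >= burn_in:
            hits += suff_stat(state, x) >= t1_obs
    return hits / n_steps


# Completely separated toy data: y = 1 exactly when x >= 3, so the
# unconditional maximum likelihood estimate of beta1 does not exist.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]
print(exact_pvalue(y, x))  # 0.05 (1 of the 20 vectors with three successes)
print(mcmc_pvalue(y, x))   # a Monte Carlo estimate close to 0.05
```

With more nuisance covariates, the chain's moves would have to preserve every nuisance sufficient statistic, and ensuring that such moves connect the whole reference set is the ergodicity issue raised above; full enumeration, meanwhile, grows combinatorially with the number of observations.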
PROPOSED COMMERCIAL APPLICATION: There is great demand for logistic regression software that can handle small, sparse or unbalanced data sets by exact methods. Our LogXact package is the only software that can provide exact inference for data sets that are not "toy problems." Yet even LogXact quickly breaks down on moderate-sized problems. The new generation of hybrid network-MCMC algorithms will handle substantially larger problems that still need exact inference. The commercial potential is considerable, since such data sets are common in scientific studies.

Principal Investigator:

Cyrus R. Mehta

Business Contact:


(617) 661-2011
MEHTA@CYTEL.COM
Small Business Information at Submission:

CYTEL SOFTWARE CORPORATION
675 MASSACHUSETTS AVE CAMBRIDGE, MA 02139

EIN/Tax ID: 023421342
DUNS: N/A
Number of Employees: N/A
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No