Statistical contributions in modeling breast cancer data through structured additive regression (STAR) models
- Castro Rocha Duarte, Elisa Maria
- Carmen María Cadarso Suárez (Supervisor)
- Bruno Cecilio de Sousa (Co-supervisor)
University: Universidade de Santiago de Compostela
Defense date: 10 November 2017
- María Luz Durbán Reguera (Chair)
- Francisco Gude Sampedro (Secretary)
- Christel Faes (Committee member)
Type: Thesis
Abstract
The breast cancer screening program was first launched in Portugal by the Central Branch of the Liga Portuguesa Contra o Cancro - Núcleo Regional do Centro (LPCC/NRC). Since 1990, the program has been collecting valuable data from women attending screening, and these data are analyzed in this dissertation. They include information on women's lifestyle, on individual characteristics, and on each woman's municipality of residence. Information on cancer diagnosis only became available after the first study in this work had been carried out; it enabled us to evaluate which variables may act as breast cancer risk factors and which may play a protective role. Data from cancer screening programs have a highly complex structure, so flexible regression specifications are required to analyze them. Structured Additive Regression (STAR) models were used to explore spatial and temporal correlations together with a wide range of covariates. These models are flexible enough to handle a variety of complex data sets, allowing us to reveal possible relationships among the variables considered in the studies presented here. Since breast cancer risk is thought to be associated with several reproductive factors, such as early menarche and late menopause, the first study presents a spatio-temporal analysis of age at menarche and age at menopause, along with other reproductive and socioeconomic factors, based on the records of each woman's first entry into the screening program. The database consists of 259,652 records of women who entered the screening program for the first time between 1990 and 2007.
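In a STAR model, a smooth temporal trend and a municipality-level spatial effect are typically given Gaussian smoothness priors whose precision is a structure (penalty) matrix K divided by a variance parameter tau^2. The minimal Python sketch that follows builds two such matrices under stated assumptions: a first-order random walk for a yearly trend and an intrinsic Markov random field over a hypothetical adjacency of four municipalities labeled A to D. The function names and the adjacency are illustrative only, and the thesis may have used other priors (for example second-order random walks or P-splines).

```python
"""Structure (penalty) matrices for two common STAR effect types.

A minimal sketch under assumed inputs: a first-order random walk (RW1)
for a temporal effect and an intrinsic Markov random field (MRF) for a
municipality-level spatial effect. Names and adjacency are hypothetical.
"""


def rw1_penalty(n):
    """Return K = D'D, where D is the (n-1) x n first-difference matrix.

    Under an RW1 prior, f_t - f_{t-1} ~ N(0, tau^2), so the (improper)
    prior precision of (f_1, ..., f_n) is K / tau^2.
    """
    K = [[0] * n for _ in range(n)]
    for t in range(n - 1):
        # Row t of D has -1 at column t and +1 at column t+1;
        # adding its outer product to K accumulates D'D.
        K[t][t] += 1
        K[t + 1][t + 1] += 1
        K[t][t + 1] -= 1
        K[t + 1][t] -= 1
    return K


def mrf_penalty(regions, neighbours):
    """Return the intrinsic MRF structure matrix K = diag(n_s) - W.

    n_s is the number of neighbours of region s and W the 0/1 adjacency
    matrix (assumed symmetric). Under this prior, the conditional mean
    of f_s given all other effects is the average of its neighbours.
    """
    index = {r: i for i, r in enumerate(regions)}
    n = len(regions)
    K = [[0] * n for _ in range(n)]
    for r, nbrs in neighbours.items():
        i = index[r]
        K[i][i] = len(nbrs)
        for s in nbrs:
            K[i][index[s]] = -1
    return K


if __name__ == "__main__":
    # Hypothetical adjacency for four municipalities A-D.
    regions = ["A", "B", "C", "D"]
    neighbours = {"A": ["B", "C"], "B": ["A", "C"],
                  "C": ["A", "B", "D"], "D": ["C"]}
    for row in mrf_penalty(regions, neighbours):
        print(row)
    for row in rw1_penalty(4):
        print(row)
```

Every row of both matrices sums to zero, so the priors are improper and leave the overall level of each effect unidentified; STAR software typically handles this by adding a centering constraint on the effect.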
The first study portrays the time evolution of age at menarche and age at menopause and their spatial characterization, helping to identify factors that could be of the utmost importance in future breast cancer incidence research (published in the Biometrical Journal). The second study considers an extension of STAR models that, in addition to nonlinear and spatial effects, includes a trivariate interaction between the attendance rate, detection rate, and mortality rate of the screening program. While the spatial effects capture unobserved heterogeneity at the municipality level, the trivariate interaction proves important for understanding the complex interaction effects that result from differences in coverage and attendance rates across municipalities. The trivariate interaction is implemented through a Markov random field representation, which enables efficient Bayesian inference, and the resulting model showed a significant improvement in fit compared with a simpler geoadditive regression model (work under revision at Spatial Statistics). Studies of breast cancer risk factors point to a downward trend in age at menarche and an upward trend in age at menopause, implying a longer reproductive lifespan. Beyond studying the effect of year of birth on the expected age at menarche and on a woman's reproductive lifespan, it is important to understand how a woman's birth cohort affects the correlation between these two variables. Since age at menarche and age at menopause may vary with the geographic location of a woman's residence, the spatial effect of her municipality of residence also needs to be considered. Thus, in the third study, the STAR model is extended to a Bayesian multivariate structured additive distributional regression model (to appear in the Biometrical Journal).
This model is used to analyze how a woman's municipality of residence and year of birth affect her age at menarche, her reproductive lifespan, and the correlation between the two.
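To make the distributional idea concrete, the Python sketch that follows assumes a bivariate normal response (age at menarche, reproductive lifespan) with one additive predictor per distribution parameter: identity links for the two means, log links for the two standard deviations, and rho = eta / sqrt(1 + eta^2) for the correlation, a response function often used for correlation parameters in Bayesian distributional regression. These link choices, the function names, and all numbers are assumptions for illustration, not the specification or estimates of the thesis.

```python
import math


def inv_rho_link(eta):
    """Map an unrestricted predictor to a correlation in (-1, 1).

    rho = eta / sqrt(1 + eta^2) is assumed here as the response function;
    it is one common choice, not necessarily the one used in the thesis.
    """
    return eta / math.sqrt(1.0 + eta * eta)


def additive_predictor(intercept, cohort_effect, spatial_effect, cohort, municipality):
    """eta = beta_0 + f_cohort(cohort) + f_spat(municipality).

    Both effects are passed as precomputed lookup tables (hypothetical
    values), standing in for a smooth cohort effect and an MRF effect.
    """
    return intercept + cohort_effect[cohort] + spatial_effect[municipality]


def bivariate_normal_logpdf(y1, y2, mu1, mu2, sigma1, sigma2, rho):
    """Log-density of a bivariate normal distribution with correlation rho."""
    z1 = (y1 - mu1) / sigma1
    z2 = (y2 - mu2) / sigma2
    one_minus_r2 = 1.0 - rho * rho
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus_r2
    return -(math.log(2.0 * math.pi * sigma1 * sigma2)
             + 0.5 * math.log(one_minus_r2) + 0.5 * quad)


if __name__ == "__main__":
    # Hypothetical cohort and municipality effects on the correlation
    # predictor (illustrative numbers only, not thesis estimates).
    cohort_eff = {1940: 0.0, 1960: -0.2}
    spatial_eff = {"A": 0.05, "B": -0.05}
    rho = inv_rho_link(additive_predictor(-0.3, cohort_eff, spatial_eff, 1960, "A"))
    # Means use the identity link; standard deviations use a log link.
    mu1, mu2 = 12.5, 36.0
    sigma1, sigma2 = math.exp(0.3), math.exp(1.5)
    print(rho, bivariate_normal_logpdf(12.0, 37.0, mu1, mu2, sigma1, sigma2, rho))
```

Because each parameter has its own predictor, cohort and municipality effects can shift not only the expected values but also the spread of each variable and the strength of their correlation, which is the question the third study addresses.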