
Avoid Diabetes to Live Alive

This content was written for a PDF viewer; please refer to the attachment.

 

 

 

Avoid Diabetes and Long Live.pdf

 

 

 

 

Abstract

Diabetes is one of the most common chronic diseases in the United States. It is a severe condition in which the body loses the ability to regulate blood glucose levels effectively, and it can reduce both quality of life and life expectancy. This paper develops predictive models to identify risk factors for diabetes and pre-diabetes, which could help facilitate early diagnosis and intervention.

 

1.      Introduction

One of my relatives had diabetes, and the disease became a critical factor in her passing. It is widely known that genetics is one risk factor for diabetes.[1] The disease eventually reached my mom as well, so I naturally want to prevent diabetes as far as possible, setting the genetic factor aside. Namely, which factors are likely to cause diabetes has become my primary concern in order to literally ‘stay alive.’ This paper estimates the weight of each diabetes factor from empirical statistics, using two matching methods (subclassification and propensity scores), to determine which factors are statistically the most important in inducing diabetes in real life. Note that this paper does not aim to discover the fundamental cause of diabetes; it asks which factors are likely to cause diabetes according to practical statistics.

 

2.      The data

The data in this paper come from the Behavioral Risk Factor Surveillance System (BRFSS) of the Centers for Disease Control and Prevention (CDC). BRFSS conducts health-related telephone surveys that collect state-level data from U.S. residents on their use of preventive services, health-related risk behaviors, and chronic health conditions. The current format dates from 2011, with more than 500,000 interviews conducted in the states, the District of Columbia, participating U.S. territories, and other geographic areas. Although the overall survey structure and items stay the same from year to year, some new questions are added and some old questions are omitted or revised. I selected the 2017 data[2] because it has more detailed questions on the food intake behaviors of U.S. residents than other years.

In the original data, the surveyed group is the noninstitutionalized adult population, aged 18 years or older, who resided in the United States in 2017. Respondents are identified by telephone. The questionnaire includes the following components:

 

 

-      Core component: current health-related perceptions, conditions, and behaviors, such as health status, health care access, alcohol consumption and smoking, fruit and vegetable consumption, and HIV/AIDS risks, plus demographic questions.

 

-      Optional BRFSS modules: questions on specific topics (e.g., pre-diabetes, diabetes, sugar-sweetened beverages, excess sun exposure, caregiving, shingles, cancer survivorship)

 

 

Thirteen survey questions are selected in this paper from the core component and the optional modules. They are grouped into physical/mental health conditions, behavioral factors (physical activity, alcohol, smoking), and demographics (sex, income, education, age group). The variables are listed below:



[1] Tabackman, L. (2021, June 2). Is type 2 diabetes genetic? Environmental factors and more. Healthline. Retrieved May 18, 2022, from https://www.healthline.com/health/type-2-diabetes/genetics

[2] Centers for Disease Control and Prevention. (2018, October 11). CDC - 2017 BRFSS survey data and Documentation. Centers for Disease Control and Prevention. Retrieved May 18, 2022, from https://www.cdc.gov/brfss/annual_data/annual_2017.html


 

 

 

 

Table 1: Notation

*Treatment variables are marked as (T).
**‘age65’ is a simplified binary variable from the variable ‘_AGE_G’ consisting of six-level age groups in the original data.

 

3.      The Matching Method - Subclassification

Subclassification requires the conditional independence assumption (CIA). The respondents are restricted to adults aged 18 years or older who resided in the United States in 2017. That is, country (U.S.) and age were fixed by design, and respondents were selected at random. Hence, the data are taken to satisfy the CIA. The directed acyclic graph (DAG) in Figure 1 uses only the treatment variables and leaves out ‘sex’ and ‘age65’, because these two are confounders that would bias the estimate of the average treatment effect (ATE).

 

 

 

 

 

 

 

 

Figure 1: Directed Acyclic Graphs

‘incom50’ and ‘cllgr’ form the backdoor pathway because they are demographic factors that can combine with the others. In Figure 1, ‘incom50’ and ‘cllgr’ are shown for the case where their value is zero, since they are considered prevention factors against diabetes. The figure assumes that ‘cllgr’ and ‘incom50’ can be combined. Every treatment variable can form a direct causal relationship with diabetes.

Table 2 below shows the result of subclassification weighting, which controls for the confounders ‘age65’ and ‘sex’ to avoid a biased estimate of the average treatment effect (ATE). These confounders stratify the data into four groups: young males, young females, old males, and old females. First, within each stratum, calculate the difference in diabetes probability between respondents with and without each factor. Next, compute each stratum's weight as the number of people without the factor in that stratum divided by the total number of people without the factor. Lastly, calculate the weighted average of the differences in diabetes rate using the strata weights. Note that all of the elements below are binary treatment variables.

 

 

Table 2: The weighted average treatment effect (ATE) estimates by percentage
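The stratify-and-weight steps described before Table 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code behind the table: the function names (`subclassification_ate`, `make_rows`), the dictionary layout of each respondent, and the toy counts are all hypothetical, and strata that lack either treated or untreated respondents are simply dropped.

```python
from collections import defaultdict


def subclassification_ate(rows, treatment, outcome="diabetes", strata=("age65", "sex")):
    """Weighted difference in outcome rates between treated and untreated rows."""
    # cells[stratum][t] = [row count, count of rows with outcome == 1]
    cells = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for row in rows:
        key = tuple(row[name] for name in strata)
        cell = cells[key][row[treatment]]
        cell[0] += 1
        cell[1] += row[outcome]

    # Keep only strata that contain both treated and untreated respondents.
    usable = [c for c in cells.values() if c[0][0] > 0 and c[1][0] > 0]
    total_untreated = sum(c[0][0] for c in usable)

    ate = 0.0
    for c in usable:
        (n0, d0), (n1, d1) = c[0], c[1]
        diff = d1 / n1 - d0 / n0        # difference in diabetes rates
        weight = n0 / total_untreated   # stratum share of untreated respondents
        ate += weight * diff
    return ate


def make_rows(age65, sex, hblpr, n, n_diabetic):
    """Build n toy respondents, the first n_diabetic of whom have diabetes."""
    return [
        {"age65": age65, "sex": sex, "hblpr": hblpr, "diabetes": int(i < n_diabetic)}
        for i in range(n)
    ]


# Toy data (hypothetical), two strata only for brevity.
rows = (
    make_rows(0, 0, 1, 10, 2) + make_rows(0, 0, 0, 30, 3)    # young: 0.20 vs 0.10
    + make_rows(1, 0, 1, 20, 8) + make_rows(1, 0, 0, 10, 2)  # old:   0.40 vs 0.20
)
ate_hblpr = subclassification_ate(rows, "hblpr")
print(round(ate_hblpr * 100, 1))  # weights 0.75 and 0.25 give 12.5 percentage points
```

In the toy data the young stratum holds 30 of the 40 untreated respondents, so its difference of 0.10 gets weight 0.75 and the old stratum's difference of 0.20 gets weight 0.25.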

As seen in Table 2, the most important factor in absolute value is high blood pressure (hblpr), which is a medical factor. Education status (cllgr) has almost no effect according to the table. A negative percentage implies an effect that works against having diabetes; in other words, ‘incom50’ helps prevent diabetes. This might be because people with high income have better access to favorable surroundings such as housing, diet, or medical services. However, heavy drinker status (hvdr) also shows a negative percentage. This result goes against the conventional wisdom that more drinking raises the chance of diabetes. It may be due to a limitation of the subclassification method or to inconsistent measurement in the original data: the original survey used a threshold of 14 drinks per week for men and seven drinks per week for women.

 

 

 

 

4.      The Matching Method - Propensity score methods

 

Table 3: The experimental average treatment effect estimate

Although Table 3 includes the ATE of ‘sex’ and ‘age65’, unlike Table 2, the overall trend is similar. ‘hblpr’ is the factor most related to diabetes, and ‘incom50’ is the best prevention factor against diabetes. ‘cllgr’ is no longer negligible. ‘sex’ shows the demographic trend of diabetes. This table is an initial step toward finding propensity scores.

Logistic regression is needed to find the propensity score. It applies to this data because diabetes is a binary response variable with multiple independent variables. As for the assumptions that logistic regression requires, the correlation coefficients between the selected independent variables are all less than 0.5, and the VIF[3] for every independent variable is smaller than 2. Hence, there is no strong multicollinearity among the independent variables. Influential outliers were removed through data cleaning, and the sample size is large enough. This indicates that logistic regression is an appropriate method to analyze the data. The result of the multiple logistic regression model is below:



[3] Variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables.
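The multicollinearity check can be illustrated with a small Python sketch. It relies on the identity that the VIFs of a set of predictors are the diagonal entries of the inverse of their correlation matrix. The function names (`pearson`, `invert`, `vif`) and the two toy columns are hypothetical; this is not the code used for the paper.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(columns))]


# Two toy binary predictors (hypothetical) whose correlation is exactly 0.5.
x1 = [1, 1, 1, 1, 0, 0, 0, 0]
x2 = [1, 1, 1, 0, 1, 0, 0, 0]
vifs = vif([x1, x2])
print([round(v, 3) for v in vifs])
```

With only two predictors the VIF reduces to 1 / (1 - r^2), so a correlation of 0.5 gives about 1.33, which stays below the threshold of 2 quoted above.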


 

 

 

 

 

 

Table 4: Regression result

 

 

 

 

 

Table 5: Regression result for treatment variables only

 

 

 

Table 6: The average propensity score by treatment variables

Unlike the previous ATE results, ‘hblpr’ is no longer the factor most related to diabetes. Cardiac or myocardial status (michd) is the strongest causal factor for diabetes, with a conditional probability of 33%. ‘incom50’ is still the best prevention factor against diabetes at 19%, and ‘cllgr’ follows at 17%. The scores of ‘incom50’ and ‘cllgr’ differ only slightly, which indicates that ‘cllgr’ should not be underestimated, unlike in the previous ATE result. However, heavy drinker status (hvdr) still contradicts conventional wisdom: the untreated propensity score of ‘hvdr’ is higher than the treated one, so ‘hvdr’ appears as a prevention factor. Because subclassification shows the same result, this disparity from reality is unlikely to stem from a limitation of the analysis method for causal relationships. In other words, the result supports the claim that inconsistent measurement in the original data created the disparity for ‘hvdr.’
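As a minimal sketch of how group averages like those in Table 6 might be produced, the Python below fits a logistic regression with diabetes as the response, takes each respondent's fitted probability as the score, and averages it separately over respondents with and without a factor. Reading the score this way is an assumption, as are the helper names (`predict_proba`, `fit_logistic`, `mean_score_by_group`), the plain gradient-ascent fit, and the toy data; none of it is the paper's actual code or data.

```python
import math


def predict_proba(beta, x):
    """Fitted probability for one row x given coefficients [intercept, b1, ...]."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            residual = yi - predict_proba(beta, xi)
            grad[0] += residual
            for j in range(k):
                grad[j + 1] += residual * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def mean_score_by_group(scores, group):
    """Average score among untreated (0) and treated (1) respondents."""
    totals, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for s, g in zip(scores, group):
        totals[g] += s
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in (0, 1)}


# Toy data (hypothetical): columns are hblpr and smok, ten rows per cell.
cells = {(0, 0): 1, (0, 1): 2, (1, 0): 4, (1, 1): 5}  # diabetics out of 10
X, y = [], []
for (hblpr, smok), n_diabetic in cells.items():
    for i in range(10):
        X.append([hblpr, smok])
        y.append(1 if i < n_diabetic else 0)

beta = fit_logistic(X, y)
scores = [predict_proba(beta, x) for x in X]
by_hblpr = mean_score_by_group(scores, [x[0] for x in X])
by_smok = mean_score_by_group(scores, [x[1] for x in X])
print(by_hblpr, by_smok)
```

Because the model has an intercept and a dummy for each factor, the fitted averages reproduce the observed diabetes rate in each group: 0.45 with high blood pressure versus 0.15 without in the toy data.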

 

 

 

 

 

 

Figure 2: Histogram of propensity score by Smoke

Smoking status (smok) has similar propensity scores for the treated and untreated groups, with a conditional probability of 14%. As seen in Figure 2 above, the two distributions of propensity scores are similar. This suggests that smoking status (smok) is a comparatively unimportant factor for diabetes.
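A comparison like the one in Figure 2 can be approximated without plotting by counting scores in shared bins for each group. The sketch below is only an illustration: `histogram_by_group`, the bin count, and the toy scores are hypothetical and do not come from the paper's data.

```python
def histogram_by_group(scores, group, n_bins=10):
    """Count scores in equal-width bins on [0, 1], separately per group."""
    counts = {0: [0] * n_bins, 1: [0] * n_bins}
    for s, g in zip(scores, group):
        index = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[g][index] += 1
    return counts


# Toy scores (hypothetical) for non-smokers (0) and smokers (1).
scores = [0.05, 0.12, 0.14, 0.18, 0.12, 0.14, 0.16, 0.95]
smok = [0, 0, 0, 0, 1, 1, 1, 1]
hist = histogram_by_group(scores, smok, n_bins=10)
for g in (0, 1):
    print(g, hist[g])
```

Using the same bin edges for both groups is what makes the two count lists directly comparable; similar counts per bin correspond to the overlapping histograms described above.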

 

 

 

 

 

 

 

Figure 3: Histogram of propensity score by Michd

The distribution of propensity scores for cardiac or myocardial status (michd) suggests that preventing a poor cardiac condition can be the paramount factor in preventing diabetes. This indicates that identifying causal factors can guide prevention: had it not been for them, diabetes might have been avoided.

 

 

 

 

 

 

5.       Conclusions

According to the matching methods, drinking (hvdr) and smoking (smok) seem to have a reverse or negligible relationship with diabetes, contradicting the conventional wisdom that drinking or smoking raises the chance of getting diabetes. This relationship can be attributed to the nature of cross-sectional data, which is simply a snapshot of the sample at one point in time. To clarify the relationship, a follow-up study is needed that traces the subjects' smoking and alcohol intake over a specific period.

High blood pressure and cardiac or myocardial status are both critical causes of diabetes, even though the two matching methods rank them differently. Then which lifestyle factors, such as smoking and alcohol, are the most related to diabetes? The answer is earning a lot of money and having a high level of education. Although the answer seems materialistic on the surface, avoiding the medical factors requires money and knowledge, so it is a pretty realistic conclusion for our lives. “You want to live alive? Just study and earn lots of money.”

 
