Social Mobility

March 30th, 2023

Social mobility by age interval in California, United States, in 2014 and 2021

Date Published: 03/04/2023 Author: Nguyen Duc Minh – UPP technology

Abstract: This report analyzes the potential social mobility of people in California, United States, in 2014 and 2021, using the income dataset provided by the US Census and Diverse Counterfactual Explanations (DiCE), an explainable AI [1] method. The results show how social mobility in this state changed between the two years. I hope the report contributes to applying explainable AI to real-life case studies.

KEYWORDS: Explainable AI, DiCE, social mobility.


Social mobility [2] is the change in a person's socio-economic situation, either relative to their parents or over their own lifetime. It can be measured in terms of earnings, income, social class, and similar measures.

In this study, the potential change in income is estimated by changing a person's age and one other feature. Age can only be increased, and only up to the maximum of the next age interval. Potential social mobility (PSM) is defined by this potential change in income.
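The age constraint can be sketched as a small helper. This is only an illustration, not the study's code: the interval bounds are the ones reported later in this report, while the cap on the open-ended last interval (`MAX_AGE`), the choice to place age 70 in [61-70], and the function names are assumptions.

```python
# Age intervals as reported in the study. The cap on the open-ended last
# interval (MAX_AGE) and the treatment of age 70 are assumptions.
MAX_AGE = 95  # hypothetical upper bound for the oldest group
AGE_INTERVALS = [(24, 35), (36, 45), (46, 60), (61, 70), (71, MAX_AGE)]


def interval_index(age: int) -> int:
    """Return the index of the interval that contains `age`."""
    for i, (low, high) in enumerate(AGE_INTERVALS):
        if low <= age <= high:
            return i
    raise ValueError(f"age {age} is outside the studied intervals")


def allowed_age_range(age: int) -> tuple:
    """Ages a counterfactual may use: from the current age up to the
    maximum of the next interval (or of the last interval if none follows)."""
    i = interval_index(age)
    next_i = min(i + 1, len(AGE_INTERVALS) - 1)
    return age, AGE_INTERVALS[next_i][1]


print(allowed_age_range(30))  # (30, 45)
print(allowed_age_range(65))  # (65, 95)
```

A range of this kind is what one would pass to DiCE through the `permitted_range` argument of `generate_counterfactuals`, so that age is never decreased and never jumps more than one interval ahead.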


First, the report uses a dataset that is part of ACS PUMS, collected from the US Census with the folktables tool [3]. The data cover California: the 2014 dataset has 180K records and the 2021 dataset has 190K records. The data include 7 features: 'COW' (class of worker), 'SCHL' (educational level), 'MAR' (marital status), 'RELP' (relationship of each person within the family), 'SEX', 'RAC1P' (race), and 'WKHP' (working hours per week), plus the label 'income' (0: income < 50K dollars, 1: income > 50K dollars). More details are available for 2014 and 2021.

Second, the report finds counterfactuals to see how income changes with the features. A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output. DiCE [4] is a method that generates counterfactuals (CFs) while changing as few features as possible. In this study, DiCE changes age and one other feature to generate counterfactuals that move each person to the contrasting income class. Potential social mobility (PSM) is measured as the percentage of counterfactuals that can be generated out of the maximum number of counterfactuals attempted for each person, grouped by age interval.
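Read this way, PSM is a simple ratio. The sketch below shows one possible reading of the definition; the function name, its arguments, and the counts in the example are made up for illustration and are not taken from the study.

```python
def potential_social_mobility(generated, max_cfs):
    """Percentage of requested counterfactuals that were actually found.

    `generated[i]` is the number of counterfactuals found for person i;
    `max_cfs` is the number requested for each person.
    """
    if not generated or max_cfs <= 0:
        raise ValueError("need at least one person and a positive max_cfs")
    # Never count more than the requested number for a single person.
    found = sum(min(n, max_cfs) for n in generated)
    return 100.0 * found / (len(generated) * max_cfs)


# Made-up example: 4 people, up to 4 counterfactuals requested for each.
counts = [4, 0, 2, 4]
print(potential_social_mobility(counts, max_cfs=4))  # 62.5
```

A person for whom no counterfactual can be found contributes zero, which is how cases where the income class cannot be changed lower the score.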

Third, this section gives more detail on how the CFs are generated and on some limitations of the study. Generating CFs requires a model first. A model with an accuracy of 81% on the 2014 data and 80% on the 2021 data predicts income on the test data. The test data are then split by age (from 24) into 5 intervals, and from each interval I draw 2 samples after prediction: 40 random records with income class 0 (<50K$) and 40 records with income class 1 (>50K$). Finally, DiCE generates counterfactuals for both samples in each age interval by changing age and one of five features: 'MAR', 'COW', 'RELP', 'WKHP', 'SCHL'. One limitation of this method is that the model is imperfect, so its errors and bias can lead to wrong CFs. Moreover, in some cases the income class is very difficult or impossible to change, and no CFs can be generated.
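The sampling step can be sketched as follows. This is a minimal illustration on synthetic records, not the study's pipeline: the record layout (`"age"`, `"income"` keys), the function name, the seed, and the choice to treat the last interval as open-ended from 71 are all assumptions.

```python
import random
from collections import defaultdict

# Interval bounds follow the study; putting age 70 in [61-70] and leaving the
# last interval open-ended are assumptions.
AGE_INTERVALS = [(24, 35), (36, 45), (46, 60), (61, 70), (71, float("inf"))]

# Age is always varied together with exactly one of these five features.
# The column name "age" is a placeholder for whatever the dataset uses.
FEATURE_PAIRS = [["age", f] for f in ("MAR", "COW", "RELP", "WKHP", "SCHL")]


def sample_by_interval_and_class(records, per_class=40, seed=0):
    """Draw up to `per_class` random records for each (interval, class) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        for low, high in AGE_INTERVALS:
            if low <= rec["age"] <= high:
                groups[((low, high), rec["income"])].append(rec)
                break  # each record belongs to at most one interval
    return {
        key: rng.sample(rows, min(per_class, len(rows)))
        for key, rows in groups.items()
    }


# Synthetic records, for illustration only.
demo_rng = random.Random(1)
records = [
    {"age": demo_rng.randint(24, 90), "income": demo_rng.randint(0, 1)}
    for _ in range(5000)
]
samples = sample_by_interval_and_class(records)
for (interval, income_class), rows in sorted(samples.items()):
    print(interval, income_class, len(rows))
```

Each entry of `FEATURE_PAIRS` has the shape DiCE expects for the `features_to_vary` argument of `generate_counterfactuals`, so one run per pair and per sample would cover the five experiments described above.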


This study analyzes PSM in California in 2014 and 2021 in three parts. The first part gives an overview by analyzing PSM for every age interval in the two years. The second part gives insights through the PSM of each feature, with an example of the most influential attributes. The third part gives more detail on how potential social mobility changes, that is, on its trend.


California's PSM reaches 49.52% in 2014 and increases to 51.95% in 2021. Looking at PSM by age interval (Fig. 1), the intervals in increasing order of PSM are: [24-35], [61-70], [36-45], [46-60], [>=70]. In both years, the age interval [>=70] has the highest PSM. A plausible explanation is that people in this group have more experience and may have accumulated savings over a long time.


Turning to PSM by feature (Fig. 2), educational level (SCHL) has the highest PSM and marital status (MAR) has the lowest, while, interestingly, the relationship of the person within the family (RELP) has a fairly high PSM (3rd). Working hours per week (WKHP) also shows a high PSM. It is encouraging that educational level and working hours per week have the highest PSM scores: this suggests that knowledge and hard work are factors that can help people change their lives. In short, marital status has little effect on PSM, the family relationship is fairly important in third place, and educational level has the strongest effect.

Moreover, DiCE lets us see the difference between the two years through PSM by feature (Fig. 2). The two features 'RELP' and 'SCHL' differ the most between 2014 and 2021: the PSM of 'RELP' increased by 18% while that of 'SCHL' decreased by 10%.


For more detail about how attributes affect PSM, the study reports a trend for the attributes of each feature. The trend of an attribute is its percentage in CFs that move to high income minus its percentage in CFs that move to low income. For 'COW', Fig. 3 shows the top 2 positive and top 2 negative attributes, which are the ones that affect the PSM of 'COW' the most. People who work in government have a positive income trend in both years, plausibly because government work is the most stable. In particular, federal government employees and company owners have the highest positive income trend. In contrast, gig workers and unpaid workers in a family business or farm have a negative income trend in both years, and it is lower in 2021, the year affected by COVID, when work became difficult or was canceled. The income trend of unpaid family and farm workers is especially negative: -36.8% in 2014, decreasing to -40.8% in 2021. This is easy to understand, since they do not have a stable income or a service contract.

Self-employed people who do not own an incorporated business (gig workers) have the second most negative income trend, at -6.2% in 2014 and decreasing to -13% in 2021, even though the share of high-income people among gig workers rose from 27.5% in 2014 to 32.35% in 2021. High incomes are therefore becoming more common among gig workers, but the likelihood of a future change to high income is declining. A likely reason is that the gig workforce has expanded: people in this class do more types of jobs, so income and social mobility are more diverse. For example, more people work in low-skilled jobs, such as driving for Uber and Lyft, without a stable income. This conclusion is consistent with the report on the gig economy [5].
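The trend measure used in this section (share of an attribute among CFs to high income minus its share among CFs to low income) can be sketched as below. The function name and the attribute labels in the example are invented for illustration and do not reproduce the study's figures.

```python
from collections import Counter


def attribute_trend(values_in_cfs_to_high, values_in_cfs_to_low):
    """Trend per attribute value, in percentage points.

    Each argument lists the value a feature (e.g. 'COW') takes in one group
    of counterfactuals: those moving to high income, and those moving to low.
    """
    high, low = Counter(values_in_cfs_to_high), Counter(values_in_cfs_to_low)
    n_high, n_low = len(values_in_cfs_to_high), len(values_in_cfs_to_low)
    trend = {}
    for value in set(high) | set(low):
        share_high = 100.0 * high[value] / n_high if n_high else 0.0
        share_low = 100.0 * low[value] / n_low if n_low else 0.0
        trend[value] = share_high - share_low
    return trend


# Made-up example with four counterfactuals in each direction.
to_high = ["federal", "federal", "private", "gig"]
to_low = ["gig", "gig", "private", "unpaid"]
for value, score in sorted(attribute_trend(to_high, to_low).items()):
    print(value, score)
```

A positive score means the attribute appears more often in counterfactuals that reach high income than in those that fall to low income; a negative score means the opposite.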


CFs also tell us in which direction the income class tends to change, for every age interval or every feature, through the share of CFs that move to the high income class: potential high income (PHI) (Fig. 4). Senior citizens are becoming more involved in work (returnship, that is, going back to work) or are delaying retirement [6].


At all intervals, income tends to decrease more than increase (PHI < 0.5); in 2021, only the (>70) group shows an increasing trend. In more detail, the PHI of the features that affect this age interval changed between the two years (Fig. 5). 'COW' has the highest PHI in 2014 and drops to 3rd place in 2021, while 'WKHP' reaches first place and 'SCHL' second place. Specifically, the PHI of 'WKHP' and 'SCHL' changed from <50% to >50%. This means that in this age interval, education and working hours are the factors with a high chance of helping to increase income.
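One way to read PHI is as the fraction of generated counterfactuals whose target class is high income. The sketch below follows that reading, which is an interpretation of the text rather than the study's stated formula; the function name and the example values are illustrative.

```python
def potential_high_income(cf_target_classes):
    """Fraction of generated counterfactuals that move to high income.

    `cf_target_classes` holds the income class of each counterfactual:
    1 for high income (>50K), 0 for low income (<50K).
    """
    if not cf_target_classes:
        raise ValueError("no counterfactuals were generated")
    to_high = sum(1 for c in cf_target_classes if c == 1)
    return to_high / len(cf_target_classes)


# Made-up example: 2 of 5 counterfactuals move to high income.
phi = potential_high_income([1, 0, 0, 1, 0])
print(phi)  # 0.4
print("tends to decrease" if phi < 0.5 else "tends to increase")
```

Under this reading, a PHI below 0.5 means that more counterfactuals fall to low income than rise to high income, which matches the interpretation used in the text.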


We analyzed potential social mobility in 2014 and 2021 by using DiCE to create counterfactuals for income, and then examined the factors that influence it. DiCE helps us see and understand the dataset in more depth. With further data processing, a better model, and adjusted feature ranges, this experiment could yield even more interesting and useful results.


[1] Explainable AI (XAI) - IBM

[2] Comparative Social Mobility

[3] Folktables

[4] Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations

[5] Gig economy in the U.S. - Statistics & Facts

[6] The Effects of Changes in Social Security's Delayed Retirement Credit: Evidence from Administrative Data
