Add Angrist-Krueger-CPS dataset #4

@@ -0,0 +1,121 @@
Creative Commons Legal Code

CC0 1.0 Universal

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
HEREUNDER.

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator
and subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for
the purpose of contributing to a commons of creative, cultural and
scientific works ("Commons") that the public can reliably and without fear
of later claims of infringement build upon, modify, incorporate in other
works, reuse and redistribute as freely as possible in any form whatsoever
and for any purposes, including without limitation commercial purposes.
These owners may contribute to the Commons to promote the ideal of a free
culture and the further production of creative, cultural and scientific
works, or to gain reputation or greater distribution for their Work in
part through the use and efforts of others.

For these and/or other purposes and motivations, and without any
expectation of additional consideration or compensation, the person
associating CC0 with a Work (the "Affirmer"), to the extent that he or she
is an owner of Copyright and Related Rights in the Work, voluntarily
elects to apply CC0 to the Work and publicly distribute the Work under its
terms, with knowledge of his or her Copyright and Related Rights in the
Work and the meaning and intended legal effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not
limited to, the following:

i. the right to reproduce, adapt, distribute, perform, display,
communicate, and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or
likeness depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data
in a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation
thereof, including any amended or successor version of such
directive); and
vii. other similar, equivalent or corresponding rights throughout the
world based on applicable law or treaty, and any national
implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention
of, applicable law, Affirmer hereby overtly, fully, permanently,
irrevocably and unconditionally waives, abandons, and surrenders all of
Affirmer's Copyright and Related Rights and associated claims and causes
of action, whether now known or unknown (including existing as well as
future claims and causes of action), in the Work (i) in all territories
worldwide, (ii) for the maximum duration provided by applicable law or
treaty (including future time extensions), (iii) in any current or future
medium and for any number of copies, and (iv) for any purpose whatsoever,
including without limitation commercial, advertising or promotional
purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
member of the public at large and to the detriment of Affirmer's heirs and
successors, fully intending that such Waiver shall not be subject to
revocation, rescission, cancellation, termination, or any other legal or
equitable action to disrupt the quiet enjoyment of the Work by the public
as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason
be judged legally invalid or ineffective under applicable law, then the
Waiver shall be preserved to the maximum extent permitted taking into
account Affirmer's express Statement of Purpose. In addition, to the
extent the Waiver is so judged Affirmer hereby grants to each affected
person a royalty-free, non transferable, non sublicensable, non exclusive,
irrevocable and unconditional license to exercise Affirmer's Copyright and
Related Rights in the Work (i) in all territories worldwide, (ii) for the
maximum duration provided by applicable law or treaty (including future
time extensions), (iii) in any current or future medium and for any number
of copies, and (iv) for any purpose whatsoever, including without
limitation commercial, advertising or promotional purposes (the
"License"). The License shall be deemed effective as of the date CC0 was
applied by Affirmer to the Work. Should any part of the License for any
reason be judged legally invalid or ineffective under applicable law, such
partial invalidity or ineffectiveness shall not invalidate the remainder
of the License, and in such case Affirmer hereby affirms that he or she
will not (i) exercise any of his or her remaining Copyright and Related
Rights in the Work or (ii) assert any associated claims and causes of
action with respect to the Work, in either case contrary to Affirmer's
express Statement of Purpose.

4. Limitations and Disclaimers.

a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.
b. Affirmer offers the Work as-is and makes no representations or
warranties of any kind concerning the Work, express, implied,
statutory or otherwise, including without limitation warranties of
title, merchantability, fitness for a particular purpose, non
infringement, or the absence of latent or other defects, accuracy, or
the present or absence of errors, whether or not discoverable, all to
the greatest extent permissible under applicable law.
c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without
limitation any person's Copyright and Related Rights in the Work.
Further, Affirmer disclaims responsibility for obtaining any necessary
consents, permissions or other rights required for any use of the
Work.
d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to
this CC0 or use of the Work.

Member
How exactly was this file generated? Was it taken from somewhere, or was it LLM-generated?
Author
The descriptions were LLM-generated. Many of them made sense and matched the variable descriptions at https://cps.ipums.org/cps-action/variables/{tag}; for example, the description for educ at https://cps.ipums.org/cps-action/variables/educ was similar. I verified most of them this way, but some tags could not be found under the same name, so for those I had to assume the LLM's description was correct. I have a list of verified and unverified tags and can send it if you want. In my view, these 2 tags were the only suspicious ones in the list.

@@ -0,0 +1,96 @@
# Angrist-Krueger-CPS Dataset

This dataset is an extract of the CPS containing 30,967 observations on men born 1944-53, drawn from the 1979 and 1981-85 March CPS and matched to lottery-number dummies for groups of 25 lottery numbers. There are 72 variables, including all covariates. The raw files (`extract.dta` and `samplcps.do`) were replicated and processed into a ready-to-use tabular `.mixed.txt` file for use with `pgmpy`.
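
A minimal loading sketch, assuming the `.mixed.txt` file is tab-separated with a header row of column names (the delimiter is an assumption; it is not stated above). The function name `load_ak_cps` is hypothetical; it checks the loaded shape against the 30,967 observations and 72 variables described above.

```python
import io

import pandas as pd

EXPECTED_ROWS = 30_967  # observations reported for the extract
EXPECTED_COLS = 72      # variables reported for the extract


def load_ak_cps(source, sep="\t", check_shape=True):
    """Read the processed extract into a DataFrame.

    `source` may be a path or a file-like object. The tab separator is an
    assumption about the `.mixed.txt` layout, not a documented fact.
    """
    df = pd.read_csv(source, sep=sep)
    if check_shape and df.shape != (EXPECTED_ROWS, EXPECTED_COLS):
        raise ValueError(
            f"expected {(EXPECTED_ROWS, EXPECTED_COLS)}, got {df.shape}"
        )
    return df


# Tiny in-memory stand-in for the real file (values are made up).
demo = io.StringIO("educ\tage\tyob\n12\t30\t50\n16\t35\t45\n")
small = load_ak_cps(demo, check_shape=False)
print(small.shape)  # (2, 3)
```

With the real file, calling `load_ak_cps(path)` keeps the default `check_shape=True`, so a file that does not have 30,967 rows and 72 columns raises `ValueError` instead of failing silently later.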
Can you explain how this processing was done?
Author
Also, the website says
Should I instead do the preprocessing described in the paper that provides this? That will give a final shape of dataframe to
Also, should I include all the preprocessing I did in the README as well?

## Column Descriptions

The dataset contains 72 variables derived from the CPS extracts that were used to estimate the causal return to schooling with Vietnam draft lottery instruments.
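
As a sketch, the column names documented in the sections below can be gathered into an expected schema and compared with the columns of a loaded DataFrame. `educ` is listed under two headings but is a single column, so it is counted once here; `EXPECTED_COLUMNS` and `schema_diff` are hypothetical names.

```python
# Column names as listed in the README sections (educ listed once).
CORE = ["educ", "annwage", "weeks", "hrsly", "hrslw", "wageflag"]
DEMOGRAPHIC = ["age", "agesq", "age2", "race", "black", "other",
               "marital", "spsepres"]
EDUCATION = ["higratt", "higrcomp", "college", "someco"]
LABOR = ["esr", "esrflag", "class", "ind", "occ", "vet", "veteran"]
GEOGRAPHIC = ["state", "division", "smsa", "metcode", "city", "balsmsa"]
DIVISIONS = ["neweng", "midatl", "eastnth", "westnth", "sthatl",
             "eaststh", "weststh", "mount", "pacific"]
BIRTH_YEAR = ["yob"] + [f"yob{y}" for y in range(44, 54)]    # yob44..yob53
SURVEY_YEAR = ["year"] + [f"yr{y}" for y in range(81, 86)]   # yr81..yr85
LOTTERY = [f"lott{i}" for i in range(1, 14)]                 # lott1..lott13
SAMPLING = ["marchwt", "recode"]

EXPECTED_COLUMNS = (CORE + DEMOGRAPHIC + EDUCATION + LABOR + GEOGRAPHIC
                    + DIVISIONS + BIRTH_YEAR + SURVEY_YEAR + LOTTERY
                    + SAMPLING)


def schema_diff(columns):
    """Return (missing, unexpected) column names relative to the README."""
    actual = set(columns)
    expected = set(EXPECTED_COLUMNS)
    return sorted(expected - actual), sorted(actual - expected)


print(len(EXPECTED_COLUMNS))  # 72
```

Calling `schema_diff(df.columns)` on the loaded table should return two empty lists if the file matches the documented columns.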

### Core Economic Variables
- educ: Years of completed education.
- annwage: Annual wage income.
- weeks: Number of weeks worked during the previous year.
- hrsly: Hours worked during the previous year.
- hrslw: Hours worked during the last week.
- wageflag: Indicator that wage information is valid/observed.

### Demographic Variables
- age: Age of the respondent.
- agesq: Age squared, used to model nonlinear age effects.
- age2: Alternative squared age variable used in some regressions.
Author
Here,
- race: Race category from CPS.
- black: Dummy variable indicating Black respondents.
- other: Dummy variable for race other than White or Black.
- marital: Marital status indicator.
- spsepres: Indicator for spouse present in the household.
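
Because `agesq` is described as age squared while the description of `age2` ("alternative squared age variable") is vague, a quick consistency check can show which columns actually equal `age**2` row by row. This is a sketch; the function name `squared_age_matches` is hypothetical, and it only tests the squared-age hypothesis rather than establishing what `age2` means.

```python
import numpy as np
import pandas as pd


def squared_age_matches(df, candidates=("agesq", "age2")):
    """For each candidate column present, report whether it equals age**2 on every row."""
    age_sq = df["age"].astype(float) ** 2
    return {
        col: bool(np.allclose(df[col].astype(float), age_sq))
        for col in candidates
        if col in df.columns
    }


# Made-up rows: agesq is age**2, age2 is deliberately something else.
toy = pd.DataFrame({"age": [30, 40], "agesq": [900, 1600], "age2": [9.0, 16.0]})
print(squared_age_matches(toy))  # {'agesq': True, 'age2': False}
```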

### Education Variables
- higratt: Highest grade attended.
- higrcomp: Highest grade completed.
- educ: Years of completed schooling (the same educ column listed under Core Economic Variables).
- college: Indicator for college education.
- someco: Indicator for some college attendance.

### Labor Market Variables
- esr: Employment status recode.
- esrflag: Indicator for valid employment status data.
- class: Class of worker (private, government, self-employed, etc.).
- ind: Industry classification code.
- occ: Occupation classification code.
- vet: Veteran status indicator.
- veteran: Recoded veteran status variable.

### Geographic Variables
- state: State code.
- division: Census division classification.
- smsa: Indicator for residence in a Standard Metropolitan Statistical Area.
- metcode: Metropolitan area code.
- city: Indicator for residence in a central city.
- balsmsa: Balanced SMSA classification.

### Regional Indicator Variables
These variables are indicators for the nine U.S. Census divisions, used as regression controls.

- neweng: New England division indicator.
- midatl: Mid-Atlantic division indicator.
- eastnth: East North Central division indicator.
- westnth: West North Central division indicator.
- sthatl: South Atlantic division indicator.
- eaststh: East South Central division indicator.
- weststh: West South Central division indicator.
- mount: Mountain division indicator.
- pacific: Pacific division indicator.
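
Because the nine Census divisions partition the country, one would expect exactly one of these indicators to equal 1 for each respondent; that is an assumption worth checking rather than taking for granted. A sketch that checks it and collapses the dummies back into a single label per row; `DIVISION_DUMMIES` and `collapse_divisions` are hypothetical names.

```python
import pandas as pd

DIVISION_DUMMIES = ["neweng", "midatl", "eastnth", "westnth", "sthatl",
                    "eaststh", "weststh", "mount", "pacific"]


def collapse_divisions(df):
    """Turn the nine division dummies into one label per row.

    Raises ValueError if any row does not have exactly one dummy set to 1.
    """
    dummies = df[DIVISION_DUMMIES]
    per_row = dummies.sum(axis=1)
    if not (per_row == 1).all():
        bad = int((per_row != 1).sum())
        raise ValueError(f"{bad} rows do not have exactly one division set")
    # idxmax returns the column name holding the 1 in each row.
    return dummies.idxmax(axis=1)


toy_div = pd.DataFrame(0, index=range(2), columns=DIVISION_DUMMIES)
toy_div.loc[0, "pacific"] = 1
toy_div.loc[1, "neweng"] = 1
print(collapse_divisions(toy_div).tolist())  # ['pacific', 'neweng']
```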

### Birth Year Variables
The respondent's year of birth and the corresponding dummy variables.

- yob: Year of birth.
- yob44–yob53: Indicator variables for birth years 1944 through 1953.
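
A sketch that rebuilds the `yob44`–`yob53` indicators from `yob` and compares them with the stored columns. Whether `yob` is stored as a two-digit (44) or four-digit (1944) year is not stated above, so the code uses `yob % 100`, which works for either; the function name `yob_dummies_consistent` is hypothetical.

```python
import pandas as pd


def yob_dummies_consistent(df):
    """Check that each yobNN column equals 1 exactly when yob ends in NN."""
    two_digit = df["yob"].astype(int) % 100  # handles 44 and 1944 alike
    for year in range(44, 54):
        expected = (two_digit == year).astype(int)
        if not (df[f"yob{year}"].astype(int) == expected).all():
            return False
    return True


# Made-up rows with dummies built to match.
toy_yob = pd.DataFrame({"yob": [1944, 1950]})
for year in range(44, 54):
    toy_yob[f"yob{year}"] = (toy_yob["yob"] % 100 == year).astype(int)
print(yob_dummies_consistent(toy_yob))  # True
```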

### Survey Year Variables
The CPS survey year and the corresponding dummy variables.

- year: CPS survey year.
- yr81: Indicator for survey year 1981.
- yr82: Indicator for survey year 1982.
- yr83: Indicator for survey year 1983.
- yr84: Indicator for survey year 1984.
- yr85: Indicator for survey year 1985.
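
There are dummies for 1981 through 1985 but none for 1979, so 1979 appears to be the omitted reference year; that is an inference from the variable list, not something the source states. A sketch that tabulates observations per survey year and checks that 1979 rows have every `yr` dummy equal to 0. As with `yob`, `year % 100` is used because the storage format of `year` is an assumption, and `check_survey_years` is a hypothetical name.

```python
import pandas as pd

YEAR_DUMMIES = [f"yr{y}" for y in range(81, 86)]


def check_survey_years(df):
    """Return counts per two-digit survey year; raise if a 1979 row has a yr dummy set."""
    two_digit = df["year"].astype(int) % 100
    base_rows = df.loc[two_digit == 79, YEAR_DUMMIES]
    if (base_rows.to_numpy() != 0).any():
        raise ValueError("1979 rows should have every yr dummy equal to 0")
    return two_digit.value_counts().sort_index().to_dict()


# Made-up rows with dummies built to match.
toy_year = pd.DataFrame({"year": [79, 81, 81, 85]})
for y in range(81, 86):
    toy_year[f"yr{y}"] = (toy_year["year"] == y).astype(int)
print(check_survey_years(toy_year))  # {79: 1, 81: 2, 85: 1}
```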

### Draft Lottery Instrument Variables
These variables represent grouped Vietnam draft lottery numbers used as instruments for education.

- lott1–lott13: Lottery number group indicator variables.
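
The grouping into blocks of 25 lottery numbers can be illustrated with a small helper that maps a draft lottery number (1 to 366) to a group index. This is a hypothetical sketch: the dataset stores only the `lott1`–`lott13` dummies, not the raw lottery number, and the exact group boundaries used in the extract (for example, how numbers above 13 × 25 = 325 are handled) are assumptions here.

```python
def lottery_group(number, group_size=25):
    """Map a draft lottery number (1-366) to a 1-based group of `group_size` numbers.

    Assumes groups are consecutive blocks: 1-25 -> 1, 26-50 -> 2, and so on.
    """
    if not 1 <= number <= 366:
        raise ValueError("draft lottery numbers run from 1 to 366")
    return (number - 1) // group_size + 1


print([lottery_group(n) for n in (1, 25, 26, 325, 326)])  # [1, 1, 2, 13, 14]
```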

### Sampling and Administrative Variables
- marchwt: CPS March supplement sampling weight.
- recode: Observation identifier used in the replication dataset.
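
Since `marchwt` is a sampling weight, population-level summaries would normally weight each observation by it. A minimal sketch using `numpy.average`; the helper name `weighted_mean` is hypothetical and the toy values are made up.

```python
import numpy as np
import pandas as pd


def weighted_mean(df, column, weight="marchwt"):
    """Weighted mean of `column`, dropping rows where either value is missing."""
    sub = df[[column, weight]].dropna()
    return float(np.average(sub[column], weights=sub[weight]))


toy_wt = pd.DataFrame({"educ": [12, 16], "marchwt": [3.0, 1.0]})
print(weighted_mean(toy_wt, "educ"))  # 13.0
```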

## Dataset Purpose

This dataset is used to estimate the causal effect of education on wages using instrumental variables derived from Vietnam draft lottery numbers. The lottery provides exogenous variation in schooling decisions among men born between 1944 and 1953.
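
The instrumental-variables idea can be sketched as two-stage least squares: regress schooling on the lottery-group dummies (plus exogenous controls), then regress the wage outcome on the fitted schooling. This is a from-scratch NumPy sketch on synthetic data, not the authors' estimator; the outcome (for example, a log of `annwage`), the controls, and the weighting are all assumptions, and the naive second-stage standard errors would be wrong, so only the point estimate is computed. The function name `two_sls` is hypothetical.

```python
import numpy as np


def two_sls(y, x_endog, z, w):
    """Two-stage least squares point estimate for the endogenous regressor.

    y: outcome (n,); x_endog: endogenous regressor (n,);
    z: excluded instruments (n, k); w: exogenous controls incl. a constant (n, m).
    """
    # First stage: project schooling on instruments and controls.
    first = np.column_stack([z, w])
    gamma, *_ = np.linalg.lstsq(first, x_endog, rcond=None)
    x_hat = first @ gamma
    # Second stage: regress the outcome on fitted schooling and controls.
    second = np.column_stack([x_hat, w])
    beta, *_ = np.linalg.lstsq(second, y, rcond=None)
    return beta[0]


# Synthetic data: unobserved ability raises both schooling and wages.
rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 4, size=n)
z = np.eye(4)[group][:, 1:]  # three instrument dummies, group 0 omitted
ability = rng.normal(size=n)
educ = 12 + 0.5 * group + ability + rng.normal(size=n)
log_wage = 1.0 + 0.08 * educ + 0.3 * ability + rng.normal(scale=0.5, size=n)
const = np.ones((n, 1))
iv_estimate = two_sls(log_wage, educ, z, const)
print(round(iv_estimate, 3))  # close to the true 0.08; OLS would be biased upward
```

With the real data, `z` would be the `lott1`–`lott13` columns and `w` a constant plus whichever controls are chosen; `marchwt` weighting is not handled in this sketch.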

## References
**Source Citation:**
Angrist, J. D., & Krueger, A. B. (1995). Split-Sample Instrumental Variables Estimates of the Return to Schooling. Journal of Business & Economic Statistics, 13(2), 225-235.
Data extracted from the [Angrist Data Archive](https://economics.mit.edu/people/faculty/josh-angrist/angrist-data-archive).