- All legal tax code XML from https://www.irs.gov/…, https://uscode.house.gov, or https://www.law.cornell.edu/uscode/text, transformed into English tax code text.
- A validated algorithmic mapping from the English legal tax code to a format suitable for storage in a relational database.
- All of the legal tax code stored in a relational database (MySQL) using this mapping, in a form suitable for translation to ErgoAI or Catala-lang.
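
As a rough illustration of the kind of mapping described above, the sketch below flattens a tiny hand-written XML fragment into parent/child rows. The element names (`section`, `subsection`, `num`, `text`), the `provision` table, and the use of Python's built-in `sqlite3` module in place of MySQL are all assumptions made for the example; the published tax code XML uses a richer schema, and the sample text is placeholder content.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; not the real schema of the published XML.
SAMPLE_XML = """
<section id="s1">
  <num>1</num>
  <text>Tax imposed.</text>
  <subsection id="s1a">
    <num>(a)</num>
    <text>Married individuals filing joint returns.</text>
  </subsection>
  <subsection id="s1b">
    <num>(b)</num>
    <text>Heads of households.</text>
  </subsection>
</section>
"""

STRUCTURAL_TAGS = ("section", "subsection")


def flatten(elem, parent_id=None):
    """Yield (id, parent_id, level, num, body) rows in depth-first order."""
    yield (
        elem.get("id"),
        parent_id,
        elem.tag,
        elem.findtext("num", default=""),
        elem.findtext("text", default=""),
    )
    for child in elem:
        if child.tag in STRUCTURAL_TAGS:
            yield from flatten(child, elem.get("id"))


def load(xml_text, conn):
    """Create the (hypothetical) provision table and insert one row per element."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS provision (
               id TEXT PRIMARY KEY,
               parent_id TEXT REFERENCES provision(id),
               level TEXT,
               num TEXT,
               body TEXT)"""
    )
    rows = list(flatten(ET.fromstring(xml_text)))
    conn.executemany("INSERT INTO provision VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return rows


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
rows = load(SAMPLE_XML, conn)
```

Storing each provision with a `parent_id` keeps the hierarchy of the legal text recoverable, which is one plausible way to keep later translation to a rule language straightforward.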
{Empty}
They should be able to learn Python, or already know how to code in Python or a similar language.
They should be able to learn to parse XML with Python.
They should also be able to learn to work with one of several Python NLP libraries, such as NLTK (https://realpython.com/nltk-nlp-python/).
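
To give a sense of the level of Python involved, here is a minimal sketch that parses a small XML fragment with the standard library and splits the text into sentences and tokens. The element names (`heading`, `content`) and the sample text are assumptions for the example. The regex-based splitting is a dependency-free stand-in; in the project itself an NLP library such as NLTK would take over that step.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names are assumptions for illustration.
SAMPLE = (
    "<section>"
    "<heading>Tax imposed</heading>"
    "<content>There is hereby imposed a tax. "
    "The tax is determined by the table.</content>"
    "</section>"
)


def section_parts(xml_text):
    """Return (heading, body) with whitespace in the body normalized."""
    root = ET.fromstring(xml_text)
    heading = root.findtext("heading", default="").strip()
    content = root.find("content")
    raw = "".join(content.itertext()) if content is not None else ""
    return heading, " ".join(raw.split())


def split_sentences(text):
    """Naive split after '.' or ';' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.;])\s+", text) if s]


def tokenize(sentence):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


heading, body = section_parts(SAMPLE)
sentences = split_sentences(body)
tokens = tokenize(sentences[0])
```

Legal text contains citations and abbreviations that a naive splitter like this would likely mishandle, which is part of why a dedicated NLP library is worth learning.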
{Empty}
Some hands-on experience
{Empty}
University of Connecticut - Stamford
Stamford, Connecticut
CR-Yale
{Empty}
Yes
Already behind
3
Start date is flexible
6
{Empty}
12/08/2021
{Empty}
06/08/2022
{Empty}
{Empty}
{Empty}
The student will learn how to parse XML, transform it (using XSLT), and store the transformed data.
The student will learn to parse the English tax code using a Python NLP library such as NLTK.
The student will learn to organize the legal text for storage so that retrieval, and mapping to either ErgoAI or Catala-lang, is easy.
The student will learn some data architecture.
This transformed/organized tax code will be stored in a relational database such as MySQL.
The student will learn SQL and how to interact with a relational database through a database workbench.
The student will learn how to work with a relational database from a language like Python.
If there is time, the student will learn about deduction in ErgoAI or Catala-lang.
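
The database-related goals above could look roughly like the following sketch, which retrieves one provision together with everything nested under it using a recursive query issued from Python. The `provision` table, its columns, and the sample rows are hypothetical, and the built-in `sqlite3` module stands in for MySQL so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute(
    "CREATE TABLE provision ("
    "id TEXT PRIMARY KEY, parent_id TEXT, num TEXT, body TEXT)"
)
# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO provision VALUES (?, ?, ?, ?)",
    [
        ("s1", None, "1", "Tax imposed."),
        ("s1a", "s1", "(a)", "Married individuals filing joint returns."),
        ("s1a1", "s1a", "(1)", "Example paragraph text."),
        ("s2", None, "2", "Definitions and special rules."),
    ],
)

# Recursive common table expression: start at one row, then repeatedly
# join in the rows whose parent_id matches a row already collected.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id, num, body, depth) AS (
    SELECT id, num, body, 0 FROM provision WHERE id = ?
    UNION ALL
    SELECT p.id, p.num, p.body, s.depth + 1
    FROM provision AS p
    JOIN subtree AS s ON p.parent_id = s.id
)
SELECT id, num, body, depth FROM subtree ORDER BY depth, id
"""


def fetch_subtree(conn, root_id):
    """Return the provision root_id and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()


result = fetch_subtree(conn, "s1")
```

Passing `root_id` as a query parameter rather than formatting it into the SQL string is the usual way to avoid injection problems when working with a database from Python.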
{Empty}
{Empty}
No clear need for HPC.
However, the tax code is substantial, so the NLP application may require a good deal of CPU cycles.
{Empty}