Blog

Creating a word embedding in Python (using gensim) from the U.S. Tax Code

Tuesday, May 12, 2020 by Matt Pickard. Tags: word embedding, natural language processing, tax

Introduction

I’ve run across a need to create a word embedding for legal tax jargon, specifically centered on the United States Internal Revenue Code (a.k.a. the tax code). The tax code is available in XML format, which is convenient because it lets us extract the individual sections of the tax code. A section is the basic “level” of the tax code document hierarchy (see section 7.

Continue Reading
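To make the excerpt above a little more concrete, here is a minimal sketch of pulling section text out of an XML document using only Python's standard library. The sample document, the `section`, `heading`, and `content` element names, and the helper functions `extract_sections` and `tokenize` are all hypothetical, chosen for illustration; the real tax code XML follows its own schema and likely needs namespace handling that is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document. The real tax code XML uses its own
# schema, namespaces, and element names, which are NOT reproduced here.
SAMPLE_XML = """
<title>
  <section id="s1">
    <heading>Tax imposed</heading>
    <content>There is hereby imposed on the taxable income of every individual a tax.</content>
  </section>
  <section id="s2">
    <heading>Definitions</heading>
    <content>For purposes of this title, the term surviving spouse means a taxpayer.</content>
  </section>
</title>
"""


def extract_sections(xml_text, tag="section"):
    """Return one whitespace-normalized text string per matching element."""
    root = ET.fromstring(xml_text)
    sections = []
    for elem in root.iter(tag):
        # itertext() walks all nested text, so heading and content are both kept.
        text = " ".join("".join(elem.itertext()).split())
        sections.append(text)
    return sections


def tokenize(text):
    """Naive lowercase tokenizer that strips surrounding punctuation (illustrative only)."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


sections = extract_sections(SAMPLE_XML)
corpus = [tokenize(s) for s in sections]  # one token list per section
```

A list of token lists like `corpus` is the shape that gensim's `Word2Vec` accepts for its `sentences` argument, so a structure along these lines could serve as training input once the real documents are parsed.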

Calling all Computer and Data Scientists! Accounting Needs You!

Monday, May 27, 2019 by Matt Pickard. Tags: accounting, data science, computer science

I’ve recently attended several presentations and engaged in several conversations that leave me panning the horizon of the Accounting world asking, “Where are all the computer and data scientists?” Seriously! Where are you? Yes, the Big Four accounting firms are hiring some of you. But the need is greater. It does not seem unreasonable that many of the revolutionary changes in the field will occur more organically, from smaller start-ups.

Continue Reading