A tool belt approach for auto-populating fields

Rishi Khullar // September 21, 2021

Congratulations, you’ve won a contract review project. Sure, there are 5,000 documents and 100 fields that need to be extracted per document, but you can worry about that after you take a moment to celebrate!

OK – now it’s time to worry about that.

Why populating 500,000 fields is a daunting task

It’s not just the sheer number of fields that is problematic; it’s also the accuracy requirement. The contract data your clients need must, at the end of the day, be structured into fields in a pristine way. Accuracy is not a nice-to-have because the data points needed from contracts are often mission-critical.

Another less talked-about problem is the data structure requirement. Text fields alone are simply not going to cut it. As contract review vets will attest, clients don’t care about simply finding the assignment clause. Clients want to know if they can assign, whether consent is required, what the exceptions to consent are, and other answers to specific questions. An Excel spreadsheet full of assignment provisions is far less helpful.
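To make the difference between a text field and structured fields concrete, here is a minimal Python sketch of what the assignment example could look like as data. The class name, field names, and the Yes/No/Silent choices are all hypothetical and chosen for illustration; they are not Heretik's actual field schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class YesNo(Enum):
    """Hypothetical single-choice values for a yes/no style field."""
    YES = "Yes"
    NO = "No"
    SILENT = "Silent"  # the contract does not address the question


@dataclass
class AssignmentFields:
    """Hypothetical structured fields for one contract's assignment clause."""
    clause_text: str            # the raw provision (a plain text field)
    assignable: YesNo           # can the party assign at all?
    consent_required: YesNo     # is counterparty consent needed?
    consent_exceptions: List[str] = field(default_factory=list)


def to_row(doc_id: str, fields: AssignmentFields) -> dict:
    """Flatten the structured fields into one spreadsheet-style row."""
    return {
        "doc_id": doc_id,
        "assignable": fields.assignable.value,
        "consent_required": fields.consent_required.value,
        "consent_exceptions": "; ".join(fields.consent_exceptions),
        "clause_text": fields.clause_text,
    }


example = AssignmentFields(
    clause_text=(
        "Neither party may assign this Agreement without the prior written "
        "consent of the other party, except to an affiliate."
    ),
    assignable=YesNo.YES,
    consent_required=YesNo.YES,
    consent_exceptions=["Assignment to an affiliate"],
)
row = to_row("DOC-0001", example)
```

The clause text is still captured, but the client's actual questions each get their own column that can be filtered and counted across all 5,000 documents.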

Don’t shoot the messenger: this is just the reality of market expectations for contract review projects and other document review projects where data needs to be extracted into fields (think invoice extraction, for example). Now, let’s walk through the options…

Machine learning alone is not an option

Let’s get this out of the way up front. We are, as an industry, not at the point where out-of-the-box machine learning models alone can address all three of the above problems (volume, accuracy, and data structure) in a contract review project. No tool can simply run all 5,000 documents in the above example through the machine and have all the fields, across all of the required field types, filled out in a highly accurate manner.

Don’t get me wrong – we’re still working on a rocket that can eventually go to Mars, but as technologists, an important part of our job (and your success) is being honest about where the technology is today.

Human review alone is likely a bad option

This is another easy one to dispel. While you do solve the problem of capturing many fields accurately in the right field types, you run up against time and cost, while also adding back in a highly unpredictable variable: humans. The whole point of choosing the right contract review tech and then mastering it is to reduce time and cost so that you can be more competitive in winning bids and building client trust.

Regular expressions alone are also likely a bad option

When there are 100 fields to populate per document (which is not uncommon), there is a good chance some percentage will simply require a subject matter expert to weigh in and decide. With Heretik, you can craft regular expressions (regex) to auto-populate fields in a powerful and accurate way, but you may not be able to do so for every field.

Furthermore, creating regular expressions to auto-populate fields does have some downsides. It requires regex-writing expertise on your team (at Heretik, we help with this), it requires you to know the patterns you want to capture ahead of time, and it’s rigid in that when new and variant examples pop up, you may have to adjust your regex.

That said, for some percentage of fields, regex to auto-populate will likely be a very smart choice given the time savings and the accuracy.
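As a rough illustration of both the payoff and the rigidity, here is a small sketch using Python's standard re module. The governing-law pattern and the function name are assumptions made up for this example; this is not Heretik's regex syntax or a pattern from a real project.

```python
import re
from typing import Optional

# Illustrative pattern only: it expects wording along the lines of
# "governed by ... the laws of (the State of) <Jurisdiction>".
GOVERNING_LAW = re.compile(
    r"governed\s+by[^.]*?\blaws\s+of\s+(?:the\s+)?(?:State\s+of\s+)?"
    r"(?P<jurisdiction>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def extract_governing_law(text: str) -> Optional[str]:
    """Return the jurisdiction if the pattern matches, otherwise None."""
    match = GOVERNING_LAW.search(text)
    return match.group("jurisdiction") if match else None


# A clause that follows the anticipated wording is captured.
standard = extract_governing_law(
    "This Agreement shall be governed by and construed in accordance "
    "with the laws of the State of New York."
)

# A variant phrasing slips through until the regex is adjusted.
variant = extract_governing_law(
    "The laws of Delaware shall govern this Agreement."
)
```

Here standard comes back as "New York", while variant comes back as None because the clause never says "governed by". That second result is exactly the kind of field you would either fix with a broader pattern or leave for a reviewer.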

Machine learning + regex + human review all together now

The most efficient, accurate, and cost-effective way to conduct a contract review project is by having the ability to auto-populate fields with either machine learning or regex (or both) within a document review system that is purpose-built for human review workflows.
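One way to picture that combination is as a fallback chain per field: try regex, then a model, then hand the field to a reviewer. The sketch below is only an assumption about how such routing could work. The names, the extractor signatures, the 0.9 confidence threshold, and the choice to accept regex hits without review are all hypothetical and do not describe Heretik's product.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FieldResult:
    value: Optional[str]
    source: str          # "regex", "model", or "human"
    needs_review: bool   # should a reviewer confirm or fill this field?


# Hypothetical extractor signatures for this sketch.
RegexExtractor = Callable[[str], Optional[str]]
ModelExtractor = Callable[[str], Tuple[Optional[str], float]]


def populate_field(
    text: str,
    regex_extractor: Optional[RegexExtractor] = None,
    model_extractor: Optional[ModelExtractor] = None,
    min_confidence: float = 0.9,  # assumed threshold, tune per project
) -> FieldResult:
    """Try regex first, then a model, then fall back to human review."""
    if regex_extractor is not None:
        value = regex_extractor(text)
        if value is not None:
            # Assumption: a regex hit is trusted without review.
            return FieldResult(value, "regex", needs_review=False)

    if model_extractor is not None:
        value, confidence = model_extractor(text)
        if value is not None:
            # Low-confidence predictions are pre-filled but flagged.
            return FieldResult(value, "model", confidence < min_confidence)

    # Nothing could auto-populate the field, so a reviewer fills it in.
    return FieldResult(None, "human", needs_review=True)


# Stub extractors stand in for real ones in this usage example.
result = populate_field(
    "sample contract text",
    regex_extractor=lambda t: None,
    model_extractor=lambda t: ("Delaware", 0.6),
)
```

In this usage example the regex finds nothing, the stub model answers with low confidence, and the field ends up pre-filled from the model but flagged for a reviewer.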

The Heretik approach

At Heretik, our philosophy is to provide a flexible and robust toolkit for machine learning, regular expressions, and human review to populate fields so that you can choose the best combo of tools for the job.

In our upcoming Pythagoras release, we’ll be adding a feature that gives users the ability to select text within a document, send it to a field, and then train a custom extraction model to learn how to auto-populate that field based on the examples you provide. This is an alternative to writing regular expressions that doesn’t require you to know patterns in advance, since you can find and capture them as you go. It’s also not as rigid, in that the model will learn variances based on the examples you provide. Lastly, it doesn’t require any technical expertise to select some text and click on a field. It won’t be your only tool for auto-populating fields, but for some percentage of fields, it may be a better choice than regex or human-only review.

On the machine learning side, we also fully recognize that there may be an ML model somewhere else in the market that is perfect for your use case but is external to Heretik. In Pythagoras, you can take the results from Amazon Textract, Microsoft Azure Cognitive Services, Google Cloud AI, or any other contract AI tech with an API and push the results to fields for review in the Heretik Viewer, where the locations of results in documents are hyperlinked to each auto-populated field.

On the regex and human review sides, this has been our bread and butter. Check out our Partner Resource Center to learn more about creating regular expressions in Heretik and auto-populating fields. And if you’re interested in learning more about our upcoming Pythagoras release, or want to talk through auto-populating fields specifically for your project, let’s chat!


Rishi is the Head of Product at Heretik. Prior to Heretik, he worked as a product manager at Relativity. Rishi began his career as a lawyer and worked as a judicial clerk for the Virgin Islands.