The Basics:
Regular Expressions

Claire Williams | August 29, 2019

If you are anything like me, you often find yourself getting lost in the sea of technical buzzwords that get thrown around, especially in legal tech. However, many of these words and concepts are extremely important for understanding the full capacity of what software can do. In our new series, ‘The Basics,’ we’ll be unpacking some of these words from both technical and non-technical perspectives.

We’re kicking off the series with one you’ve definitely heard while using Heretik: ‘regular expressions’ (also known as ‘regex’).

While we will focus on the basics, many online resources are great starting points for furthering your understanding of regular expressions, and we will link to them throughout this post.

So, what are regular expressions? A regular expression is simply a string of text that describes a search pattern. This is an example of what one might look like:

“\bpayments\W+(?:\w+\W+){1,9}?days\b”

But it’s what they can do that is most impressive. Regular expressions are able to highlight areas of interest and key information based on a pattern of words or characters in a document’s text. (Spoiler alert – this is especially helpful when there are a lot of documents to sort through.)

Regular expressions are essentially pattern matching. You write a customized line of code that finds what you are looking for in a document. How you structure the expression tells it exactly how to search for what you want to find.

These are some of the characters that are used in a regular expression that each have specific meaning when it comes to searching:

( ), [ ], { }, \, $, ?, ^, -, and more.

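For example, ^cat matches the letters ‘cat’ only where they appear at the very beginning of the text (or, when the expression is run in multiline mode, at the beginning of a line). If you were looking for the word ‘cat’ at the start of a line in a document, it would pick out that occurrence specifically and skip any ‘cat’ that appears mid-line.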
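As a minimal sketch of how that anchor behaves, here is a short example using Python's standard `re` module. The sample text and variable names are made up for illustration, and other regex engines may differ slightly in how multiline mode is switched on.

```python
import re

# Hypothetical sample text: 'cat' starts the first line, sits in the
# middle of the second line, and starts the third line (inside 'cats').
sample = "cat sat on the mat\nthe cat slept\ncats purr"

# Without flags, ^ matches only at the very start of the text.
start_only = [m.start() for m in re.finditer(r"^cat", sample)]

# With re.MULTILINE, ^ also matches at the start of every line.
each_line = [m.start() for m in re.finditer(r"^cat", sample, re.MULTILINE)]

print(start_only)
print(each_line)
```

Note that ^cat also matches the first three letters of ‘cats’. To require the whole word, the pattern could be written as ^cat\b.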

So, that is a very brief intro to how regular expressions work. We offer our users a library of specific regular expressions with Heretik. Our expressions are built to find things like dates, durations, currencies, and percentages. These are helpful for finding basic information in your contracts, but we’ve taken regular expressions a step further to help you easily find even more information: capture groups.

What are capture groups? Capture groups allow someone who isn’t necessarily a regular expression expert to quickly find what they are looking for in their documents. Sometimes you might write (or re-use) an expression that initially captures far more than you need. Instead of taking hours to figure out how to rewrite that expression to match exactly what you’re looking for, you can add a capture group that extracts just the part you need from the broader match.

Heretik Software Engineer Sam Miles explains, “For example, I can’t write a perfect regular expression to find ‘Effective Dates’. But I can write one to capture the part of the section in the contract that has the ‘Effective Date’. Boom. Capture group just allowed me to find ‘Effective Dates’ in my contracts because it had enough of what we are looking for.”
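To make that idea concrete, here is a rough sketch in Python. The contract sentence, the pattern, and the group name `effective_date` are all invented for this example and are not taken from Heretik's expression library. The expression loosely matches the passage around ‘Effective Date’, and the capture group pulls out only the date.

```python
import re

# Hypothetical contract sentence, invented for this example.
clause = (
    "This Agreement is entered into by the parties and shall have an "
    "Effective Date of January 15, 2020, unless terminated earlier."
)

# Match 'Effective Date', allow up to five intervening words, then
# capture only a date written like 'January 15, 2020'.
# (?P<name>...) is Python's syntax for a named capture group.
pattern = re.compile(
    r"\bEffective\s+Date\W+(?:\w+\W+){0,5}?"
    r"(?P<effective_date>[A-Z][a-z]+\s+\d{1,2},\s+\d{4})"
)

match = pattern.search(clause)
if match:
    print(match.group(0))                 # the broader matched passage
    print(match.group("effective_date"))  # just the captured date
```

The broader match does not need to be perfect. It only has to land on the right passage, and the capture group does the rest.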

Chris Tkach, Director of Customer Success at Heretik, has over a decade of experience in project management. He fully understands what project managers and teams are looking for, as well as their pain points.

Tkach says, “Ultimately what clients want is the ability to quickly know key information within their contracts – what contracts expire in the next six months? Which contracts do I need consent in order to assign? How many days do I have to pay incoming invoices? These questions can be answered by capturing individual data points throughout contracts using regular expressions and capture groups.”

Tkach’s excitement about regular expressions extends to Heretik’s auto-population functionality.

“We stress that there is no easy button in getting a client to structured data. Capturing dozens of data points in a contract could take over an hour. Our aim is to cut down this time considerably, and we can accomplish that through auto-populating these data points. Regular expressions allow us to look for patterns in a contract and auto-populate certain fields.”

For example, utilizing regular expressions, I can isolate payment term language such as “payments must be made within 60 days.” With capture groups, I can get more granular and auto-populate “60” in a whole number field. This is obviously easier to report and filter on than a text field.

In action, if we have a block of text from a contract:

“Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas at neque eu felis rutrum efficitur. Pellentesque lacinia mauris metus, at dapibus ante varius non. Maecenas non pulvinar velit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Fusce ut risus sollicitudin, interdum dolor condimentum, hendrerit nibh. Donec efficitur sem nisl, ut tristique mauris pulvinar eget. payments must be made within 60 days. Duis eu odio vitae turpis pulvinar placerat vulputate a diam. Maecenas eget risus mauris. Nulla ac convallis erat, a commodo velit. Fusce convallis ultrices neque. Etiam id urna lacus. Mauris fermentum tortor lacus, id congue dui suscipit vitae. Donec mi odio, laoreet a enim et, suscipit varius mauris. Nulla facilisi.”

Utilizing regular expressions, we can quickly pull out the information that we need. We wrote a regular expression specifically to capture the text ‘payments must be made within 60 days’:

“\bpayments\W+(?:\w+\W+){1,9}?days\b”
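For readers who want to try this themselves, the following is a small sketch that runs the expression with Python's standard `re` module against a shortened stand-in for the contract text. Heretik and Relativity are not involved here, and the variable names are illustrative only.

```python
import re

# Shortened stand-in for the block of contract text.
text = (
    "Donec efficitur sem nisl, ut tristique mauris pulvinar eget. "
    "payments must be made within 60 days. "
    "Duis eu odio vitae turpis pulvinar placerat vulputate a diam."
)

# \bpayments         the whole word 'payments'
# \W+                one or more non-word characters (spaces, punctuation)
# (?:\w+\W+){1,9}?   between 1 and 9 intervening words, as few as possible
# days\b             the whole word 'days'
pattern = re.compile(r"\bpayments\W+(?:\w+\W+){1,9}?days\b")

match = pattern.search(text)
matched_text = match.group(0) if match else None
print(matched_text)
```

As written, the expression is case-sensitive, so a sentence beginning with ‘Payments’ would not match unless the search is made case-insensitive (in Python, by passing `re.IGNORECASE`).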

To show another example, we can modify the regular expression so that it matches the same text but extracts only the number “60”, allowing us to populate a Whole Number field in Relativity for a better filtering/reporting experience:

“\bpayments\W+(?:\w+\W+){1,9}?(?<highlight>\d{1,3})\s+?days\b”
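The `(?<highlight>...)` portion is a named capture group. Here is a hedged sketch of the same idea in Python, with one assumption worth flagging: Python's `re` module spells named groups `(?P<highlight>...)`, so the expression is adjusted accordingly. Converting the captured text to an integer stands in for populating a whole-number field. The actual Relativity integration is not shown.

```python
import re

# Shortened stand-in for the block of contract text.
text = (
    "Maecenas non pulvinar velit. "
    "payments must be made within 60 days. "
    "Nulla facilisi."
)

# Same expression as in the article, except that the named group uses
# Python's (?P<highlight>...) spelling instead of (?<highlight>...).
pattern = re.compile(
    r"\bpayments\W+(?:\w+\W+){1,9}?(?P<highlight>\d{1,3})\s+?days\b"
)

match = pattern.search(text)

# Keep only the captured digits and convert them to a whole number.
payment_days = int(match.group("highlight")) if match else None
print(payment_days)
```

The full match is still the whole payment sentence, but only the digits inside the `highlight` group are kept, which is what makes the value easy to filter and report on.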

If the text of the regular expression looks like someone rolled on their keyboard, don’t worry. That’s normal. We use Regex101, a great site to help explain how that mess of characters translates into a powerful tool for finding data in contracts.

In summary, regular expressions offer a way for Heretik to find the information you are looking for, while also cutting down the amount of time it takes to find that info. If you are interested in seeing regular expressions in action and how Heretik can change the way you analyze documents, click the link below!

CLAIRE WILLIAMS

Claire is a Marketing Coordinator at Heretik. She recently graduated from Miami University Ohio with a double major in Journalism and Mandarin Chinese. Prior to Heretik, Claire worked at Amdur Productions and for Miami University College of Arts and Science.
