Heretik Analysis Part 1:
Segmentation & Classification

Claire Williams | March 21, 2019

Heretik is a contract review solution that helps our clients make smarter, faster, and more favorable decisions. But how do we utilize machine learning to help turn an enormous amount of information into digestible and well-organized data?

Heretik Analysis Sets are used to classify contracts by type (such as Lease, Purchase Agreement, etc.), then break them down into individual sections, each labeled so you can easily find it with a quick filter. The process combines three analysis techniques: Contract Classification, Segmentation, and Section Classification.

In this post, we will focus on a high-level overview of these analysis techniques. As Heretik Analysis evolves, this foundation will help provide a clear picture of where we are headed… but more on that toward the end of the post!

So let’s start with the basics. What are ‘Segmentation’ and ‘Classification’? What do they do, and why are they valuable in document review?

Running our Segmentation Analysis allows you to dissect a document or contract by breaking it up into specific sections. Our machine learning pipeline will analyze a slew of different attributes related to the structure of your document and make decisions as to how it should be broken into parts. A reviewer can manually correct any sections that Heretik misses during analysis.

Heretik Software Engineer Sam Miles offered his own explanation: “Segmentation’s purpose is to chop up documents. We look for certain features that say ‘I’m probably a section,’ like a section heading, capital letters on a line by themselves, punctuation, and more. If we are confident a piece of the document is a section, it’s time to start cutting up that piece. We continue this all the way down the document until we hit a clue that says ‘this is the end… of the document.’ Once this completes, we do a quality control check to see if the sections we created make sense.”
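To make Sam’s description more concrete, here is a minimal Python sketch of heuristic segmentation. It is not Heretik’s pipeline, which uses machine learning over many more attributes; the point values, the `heading_score` and `segment` helpers, and the threshold of 6 points are all hypothetical. Each line earns points for the kinds of cues Sam mentions (a numbered or “Section”/“Article” heading, capital letters on a line by themselves, and punctuation), the document is cut wherever a line scores high enough, and a simple quality check drops any section that ended up empty.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical heading pattern: "Section 5", "Article IV", or "1." / "2.1." numbering.
NUMBERED_HEADING = re.compile(
    r"^(?:section|article)\s+[\dIVXLC]+\b|^\d+(?:\.\d+)*\.\s+\S",
    re.IGNORECASE,
)
HEADING_THRESHOLD = 6  # hypothetical cut-off, in points


@dataclass
class Section:
    heading: str
    body: list[str] = field(default_factory=list)


def heading_score(line: str) -> int:
    """Award points for cues that say 'I'm probably a section heading'."""
    text = line.strip()
    if not text:
        return 0
    points = 0
    if NUMBERED_HEADING.match(text):
        points += 4  # numbered or "Section"/"Article" heading
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        points += 4  # capital letters on a line by themselves
    if len(text.split()) <= 8:
        points += 2  # headings tend to be short
    if not text.endswith((".", ";", ",")):
        points += 2  # headings rarely end like running sentences
    return points


def segment(text: str) -> list[Section]:
    """Cut the document wherever a line looks like a heading."""
    sections: list[Section] = []
    current = Section(heading="(preamble)")
    for line in text.splitlines():
        if heading_score(line) >= HEADING_THRESHOLD:
            sections.append(current)
            current = Section(heading=line.strip())
        elif line.strip():
            current.body.append(line.strip())
    sections.append(current)  # reaching the end of the document closes the last section
    # Simple quality check: drop sections that ended up with no body text.
    return [s for s in sections if s.body]


SAMPLE_LEASE = """MASTER LEASE
This Lease is made between Landlord and Tenant.
1. Definitions
Capitalized terms have the meanings below.
TERM AND TERMINATION
Either party may terminate on 30 days notice."""

if __name__ == "__main__":
    for s in segment(SAMPLE_LEASE):
        print(f"{s.heading}: {len(s.body)} line(s)")
```

On the sample lease, this yields three sections (“MASTER LEASE”, “1. Definitions”, and “TERM AND TERMINATION”), while a long sentence such as “Section 5 of this Lease applies to all renewals made by Tenant.” stays in the body because it is neither short nor free of sentence punctuation. Hand-picked weights like these break easily on unusual formatting, which is one reason a learned model, plus a reviewer’s ability to correct missed sections, is useful.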

When you think of classification, you think of dividing things into groups. Our Classification Analysis does just that, arranging contracts and sections into categories such as Licensing Agreement (for contracts) and Term & Termination (for sections).

Sam explained in a bit more detail how classification does its job: “When we classify a contract or section, we use content within the contract or section to place it into a category we think is closest. We then automatically tag the contract or section with this category when you run analysis.”
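The idea of placing text into the “closest” category can be illustrated with a simple nearest-prototype classifier. The sketch below is an assumption for illustration, not Heretik’s actual model: it turns text into word counts, compares those counts to one hypothetical keyword profile per category using cosine similarity, and tags the section with the best match (or “Uncategorized” when nothing overlaps). The category names and keyword lists in `CATEGORY_EXAMPLES` are made up.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical category profiles; a production model would learn these from labeled data.
CATEGORY_EXAMPLES = {
    "Term & Termination": "term termination terminate expire expiration notice renewal",
    "Payment": "payment rent fee invoice due pay amount late",
    "Confidentiality": "confidential information disclose disclosure secret protect",
}


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts, ignoring numbers and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 when either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


PROTOTYPES = {label: bag_of_words(text) for label, text in CATEGORY_EXAMPLES.items()}


def classify(text: str) -> tuple[str, float]:
    """Return the closest category and its similarity score."""
    words = bag_of_words(text)
    label, score = max(
        ((lbl, cosine(words, proto)) for lbl, proto in PROTOTYPES.items()),
        key=lambda pair: pair[1],
    )
    if score == 0.0:
        return "Uncategorized", 0.0  # no shared vocabulary with any category
    return label, score


if __name__ == "__main__":
    clause = "Either party may terminate this Agreement upon 30 days written notice."
    print(classify(clause)[0])  # prints: Term & Termination
```

Returning the similarity score alongside the label leaves room for a confidence cut-off, so low-scoring sections could be flagged for a reviewer instead of being tagged automatically.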

Heretik Product Support Manager Mitch Kozak described the value of classification and segmentation for those analyzing larger documents: “At the end, you have visibility into your contract data set. Instead of struggling to recognize what documents are in front of you, you know what kinds of contracts you have in your data set by using classification. You’re able to break it down even further and get more context as to what that contract includes with segmentation.”

Hopefully this makes clear what Classification and Segmentation do and how they can be useful. In our next post, we’ll talk about our vision for Extraction and Unitization, additional analysis techniques that we are currently building at Heretik!



Claire is a Marketing Coordinator at Heretik. She recently graduated from Miami University Ohio with a double major in Journalism and Mandarin Chinese. Prior to Heretik, Claire worked at Amdur Productions and for Miami University College of Arts and Science.
