Optimizing Contract Risk Analysis with 90% Accuracy through RAG

Many companies are exploring the use of LLMs to streamline the analysis of large, complex documents. Although the idea might seem simple, it introduces a range of challenges. This complexity raises questions about how accurate these tools can truly be and whether they’re ready for high-stakes business environments.

In the past two months, CLIWANT has successfully developed a system capable of analyzing complex English contracts—exceeding 100 pages—with 90% accuracy in identifying risk clauses for our enterprise clients. This project was particularly challenging as it was conducted in an on-premises environment, meaning we couldn’t leverage the extended context length of the latest GPT or Gemini models. Instead, we had to work within the constraints of Meta’s openly licensed Llama 3.1 model.

This blog post delves into the main challenges enterprises face in contract analysis and explains how CLIWANT leveraged AI technology to tackle these issues in a technically sophisticated way.


Problem Definition and Objectives

While we can’t disclose specific company names due to NDAs, the client is a global manufacturer managing large-scale projects with contract values reaching millions of dollars per agreement.

Given the high financial stakes, the contracts are extensive and detailed. In the past, missing even a minor clause led to substantial compensation payments, prompting the client to establish a rigorous checklist of over 200 review items, refined through decades of experience. Currently, each item is meticulously reviewed manually, with examples including:

  • In cases of project suspension due to natural disasters or unforeseen events, who holds the liability?
  • If defects emerge post-testing, who bears responsibility?

If a clause shifts all responsibility to the client, they could face astronomical liabilities for issues arising during the project.

The goal of this project is clear: to reduce the time needed for reviewing lengthy contracts from an entire day to just five minutes. Additionally, we aim to enhance the speed and accuracy of contract reviews by minimizing human error through an AI-powered system.

Technical Solution

We’ll start with an overview of the architecture and then walk through each component in detail.

This pipeline employs Retrieval-Augmented Generation (RAG): documents are divided into chunks, the chunks most likely to contain relevant information are retrieved through similarity search, and an LLM then analyzes them to generate the final output. A minimal code sketch of this flow follows the component list below.

Architecture Components

  1. Building metadata for RAG and prompting
  2. Preprocessing ITT (contract) documents and extracting text
  3. Text Chunking:
    • Context-Aware Chunking: creating chunks while maintaining context
    • Fixed-Size Chunking: chunking based on a fixed length (n characters)
  4. Similarity Search (RAG Search): finding highly relevant portions among extracted chunks
  5. LLM:
    • Chunk Analysis (Llama 70b): conducting in-depth analysis of specific chunks to extract necessary information
    • Content Extraction (Llama 70b): precisely selecting relevant information
    • Korean Summary (Yanolja): providing a summary in Korean for the final output
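
To make this flow concrete before walking through each step, here is a minimal end-to-end sketch. Every name in it is an illustrative placeholder rather than CLIWANT's actual code; each helper is sketched in the corresponding section below.

```python
# A minimal end-to-end sketch of the pipeline above. Every name here is an
# illustrative placeholder, not CLIWANT's actual code; the helpers are
# defined in the per-step sketches later in this post.

def review_contract(pdf_path: str, review_items: list) -> list:
    text = extract_clean_text(pdf_path)        # 2. preprocessing and text extraction
    chunks = chunk_document(text)              # 3. context-aware / fixed-size chunking
    results = []
    for item in review_items:                  # one pass per checklist review item
        candidates = similarity_search(item.search_query, chunks, top_k=5)      # 4. RAG search
        relevant = [c for c in candidates
                    if validate_chunk(item.question_query, item.few_shot_examples, c)]  # 5. validation
        summary_en = extract_and_summarize(item.question_query, relevant)       # 5. extraction
        results.append(translate_to_korean(summary_en))                         # 5. Korean summary
    return results
```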

1. Building Metadata

Metadata is constructed to align with each review item, with a structure that includes the following elements (a code sketch follows the list):

Similarity Search Queries for Chunk Retrieval:

  • To create effective similarity search queries, we reference past documents, combining frequently appearing keywords within specific sections or leveraging the full content of relevant sections for search. The most effective query depends on the embedding model used in RAG and the characteristics of the review item.

Question Queries for Relevant Content Extraction:

  • These are prompts for the LLM to confirm and summarize information relevant to the review item within each chunk. For instance, a query might ask, "Is there a clause specifying liability in cases of natural disasters (e.g., typhoons, earthquakes)?"

Example Set for Few-Shot Prompting:

  • To enhance accuracy, we include related examples extracted from previous contract documents in a few-shot format. This approach has proven effective in helping the LLM grasp context and deliver precise results.

Background Information for Additional Clarity:

  • When necessary, we provide additional explanations or contextual details to enhance the LLM's understanding. For example, specifying, "Natural disasters include earthquakes, typhoons, tornadoes, and floods," ensures the LLM interprets the context accurately.
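
Putting these four elements together, a per-item metadata record might look like the following sketch. The class and field names are illustrative assumptions, not CLIWANT's actual schema.

```python
from dataclasses import dataclass, field

# A sketch of per-item metadata covering the four elements above.
# The class and field names are illustrative, not CLIWANT's actual schema.
@dataclass
class ReviewItemMetadata:
    item_id: str                     # e.g. "force-majeure-liability"
    search_query: str                # similarity-search query for chunk retrieval
    question_query: str              # question the LLM answers for each chunk
    few_shot_examples: list = field(default_factory=list)  # [{"chunk": ..., "label": "Yes"/"No"}]
    background: str = ""             # extra context to aid interpretation

force_majeure = ReviewItemMetadata(
    item_id="force-majeure-liability",
    search_query="force majeure natural disaster suspension liability",
    question_query=("Is there a clause specifying liability in cases of "
                    "natural disasters (e.g., typhoons, earthquakes)?"),
    background="Natural disasters include earthquakes, typhoons, tornadoes, and floods.",
)
```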

2. Preprocessing and Text Extraction

Contract data may be provided in various formats, such as Word documents and PDFs. To standardize processing, all documents are first converted to PDF format, and raw text is extracted.

During this extraction, elements that could reduce accuracy—such as page numbers and unnecessary annotations—are removed to ensure clean, high-quality data.

Additionally, we developed an algorithm to automatically detect sections within the document, enabling efficient parsing of metadata for each section. This structured approach improves the overall accuracy and reliability of the data processing.
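
As a rough sketch of this stage, the snippet below extracts text with pdfplumber (an assumed library choice; pypdf or PyMuPDF would work similarly) and strips lines that are nothing but page numbers.

```python
import re
import pdfplumber  # assumed library choice; pypdf or PyMuPDF would work similarly

def extract_clean_text(pdf_path: str) -> str:
    """Extract raw text from a PDF and drop lines that are only page numbers."""
    cleaned_pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            # Drop lines that are only a page number, e.g. "12" or "Page 12"
            lines = [line for line in text.splitlines()
                     if not re.fullmatch(r"(Page\s+)?\d+", line.strip())]
            cleaned_pages.append("\n".join(lines))
    return "\n".join(cleaned_pages)
```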

3. Chunking

Chunking is performed based on the automatically detected sections from the preprocessing stage. However, if a section isn’t precisely extracted, we employ an alternative approach: fixed-length chunking based on a set number of characters.

We apply both chunking methods because, while contract formats are generally consistent, some documents follow different structures, so section detection may not always succeed. Since target items are typically contained within a single section, the fixed-length fallback can split a section across chunks and lose context, which may reduce the accuracy of the final output.

Sections are generally formatted as “#.#,” but to account for cases where alphabetic characters are included, we designed a flexible chunking system to maintain adaptability.
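
A minimal sketch of this two-mode chunking might look as follows; the section-heading pattern, the section-count threshold, and the 2,000-character fallback size are illustrative assumptions.

```python
import re

# Matches headings like "3.2", "10.1.4", or "3.2(a)" at the start of a line.
SECTION_RE = re.compile(r"^\d+\.\d+(\.\d+)*\s*(\([a-z]\))?", re.MULTILINE)

def chunk_document(text: str, fixed_size: int = 2000) -> list:
    """Section-based chunking with a fixed-length fallback."""
    starts = [m.start() for m in SECTION_RE.finditer(text)]
    if len(starts) >= 2:                        # section detection succeeded
        bounds = starts + [len(text)]
        return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Fallback: fixed-length chunking by character count
    return [text[i:i + fixed_size] for i in range(0, len(text), fixed_size)]
```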

4. Similarity Search (RAG Search)

In this step, the chunks produced during the chunking phase are vectorized and compared against the query for each review item, retrieving the top k most similar chunks. For instance, if a 100-page document yields 500 chunks, similarity search surfaces the handful of chunks containing terms like “criminal offence” or “legal,” which are then passed on to the next analysis phase.
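
A sketch of this retrieval step, assuming a sentence-transformers embedding model (the embedding model CLIWANT actually uses is not disclosed in this post):

```python
from sentence_transformers import SentenceTransformer, util

# The embedding model below is an illustrative stand-in; the post does not
# disclose which embedding model CLIWANT uses.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def similarity_search(query: str, chunks: list, top_k: int = 5) -> list:
    """Return the top-k chunks most similar to the review-item query."""
    query_vec = embedder.encode(query, convert_to_tensor=True)
    chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, chunk_vecs)[0]   # cosine similarity per chunk
    top = scores.topk(k=min(top_k, len(chunks)))
    return [chunks[i] for i in top.indices.tolist()]
```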

5. LLM Review and Summary

In this system, the LLM is employed at three distinct stages; in current terminology, each of these functions would be called an “agent.”

Stage 1: Validation

  • The first validation step is critical for reducing hallucinations and increasing accuracy. The chunks extracted through similarity search in RAG serve as a net to capture relevant information from the entire document. In this step, each chunk’s relevance is thoroughly checked before moving on to the next stage.
  • Passing unrelated chunks to Stage 2 increases the risk of summarizing irrelevant content, so we reinforce validation accuracy with few-shot prompting, using around five examples that show both relevant and irrelevant cases (a sketch of this prompt follows below). We initially used the Llama 3 8B model, but its validation accuracy was too low, so we upgraded to the 70B model, which significantly improved precision.
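
A sketch of this validation call, assuming an OpenAI-compatible endpoint serving the model on-premises (for example, via vLLM); the URL, model identifier, and prompt wording are illustrative:

```python
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint serving the model on-premises
# (e.g. vLLM); the URL, model identifier, and prompt wording are illustrative.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def validate_chunk(question_query: str, few_shot: list, chunk: str) -> bool:
    """Stage 1: decide whether a retrieved chunk is actually relevant."""
    examples = "\n\n".join(f"Chunk: {ex['chunk']}\nRelevant: {ex['label']}"
                           for ex in few_shot)
    prompt = (f"Decide whether the chunk is relevant to this question. "
              f"Answer Yes or No.\nQuestion: {question_query}\n\n"
              f"{examples}\n\nChunk: {chunk}\nRelevant:")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")
```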

Stage 2: Extraction and Summary of Relevant Information

  • From the validated chunks, essential information is precisely extracted and summarized in a user-friendly format. For example, a summary might be as concise as, “Clause 10.3 specifies that neither party is liable in case of a natural disaster.” The summarized content is generated in English, as Llama models currently perform best in this language, particularly in legal contexts (a prompt sketch follows below).
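
A sketch of the Stage 2 call, reusing the client from the validation sketch above; the prompt wording is an illustrative assumption:

```python
def extract_and_summarize(question_query: str, chunks: list) -> str:
    """Stage 2: extract the relevant clause and summarize it in English."""
    context = "\n\n".join(chunks)
    prompt = ("From the contract excerpts below, find the clause that answers "
              "the question and summarize it in one sentence, citing the "
              "clause number (e.g. 'Clause 10.3 specifies that ...').\n\n"
              f"Question: {question_query}\n\nExcerpts:\n{context}")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```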

Stage 3: Korean Translation

  • In the final step, the summary from Stage 2 is translated into Korean. Testing showed that LLM models outperform standard translation tools like Papago for this purpose. After experimenting with various models, including KKULM, fine-tuned Llama3, and Solar-based models, we found the Yanolja model to deliver the best results.
  • Specifically, the model yanolja/EEVE-Korean-10.8B-v1.0 proved exceptionally suitable for translating legal clauses at a professional level (see the sketch below).
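
A sketch of this translation step with Hugging Face transformers and the model named above; the Korean prompt template is an illustrative assumption.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yanolja/EEVE-Korean-10.8B-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "yanolja/EEVE-Korean-10.8B-v1.0", device_map="auto"
)

def translate_to_korean(summary_en: str) -> str:
    """Stage 3: translate the English clause summary into Korean."""
    # The prompt template is an illustrative assumption.
    prompt = f"다음 영문 계약 요약을 한국어로 번역하세요:\n{summary_en}\n번역:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```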

Conclusion

Through the methods described above, we successfully built metadata for each item and collaborated with client experts to implement a system capable of identifying risk items with up to 90% accuracy.

While recent solutions like ChatPDF aim to provide all-in-one Q&A over uploaded documents, we believe such approaches fall short for this task because they lack domain-specific nuance and per-query verification. To precisely extract specific information from documents, it is essential to build comprehensive metadata for each item and thoroughly validate the source text.

Enterprise clients interested in leveraging LLMs for complex business document analysis on a project basis are welcome to reach out at any time. 😊
