NODES 2024 - From Image to Graph: Leveraging Tesseract OCR Engine for Document Chunking and GraphRAG

Posted by admin
In this session, you will learn how to translate the content of images into a graph representation using the Tesseract OCR Engine. Using the locations of the words that Tesseract identifies, you will learn how to create a hierarchy of document chunk nodes: level, block, paragraph, and line. With a hierarchy of chunks, you can easily traverse chunks of different sizes that relate to the same information. This can prove beneficial for RAG, where smaller chunks tend to work better for vector similarity and larger chunks tend to serve as better context for question answering.

With Kim Adler

Get certified with GraphAcademy: https://dev.neo4j.com/learngraph
Neo4j AuraDB: https://dev.neo4j.com/auradb
Knowledge Graph Builder: https://dev.neo4j.com/KGBuilder
Neo4j GenAI: https://dev.neo4j.com/graphrag
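The chunk hierarchy described in the session summary can be sketched in a few lines of Python. This is only an illustration, not the presenter's implementation. It assumes the OCR result is shaped like the dictionary that `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns (parallel lists keyed by column name). The function name `build_chunk_hierarchy`, the chunk dictionary layout, and the sample data are hypothetical. The sketch uses Tesseract's page level as the root of the hierarchy.

```python
from collections import defaultdict


def build_chunk_hierarchy(data):
    """Group OCR word rows into page, block, paragraph, and line chunks.

    Assumption: `data` has the parallel-list layout of Tesseract's
    word-level output (page_num, block_num, par_num, line_num,
    word_num, text). Returns a dict mapping a chunk id tuple to
    {"level", "text", "parent"}.
    """
    # Collect the words of each line, keyed by (page, block, paragraph, line).
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        word = text.strip()
        if not word:
            continue  # structural rows (page, block, ...) carry no text
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((data["word_num"][i], word))

    # Roll each line's text up into its paragraph, block, and page.
    level_names = {1: "page", 2: "block", 3: "paragraph", 4: "line"}
    chunks = {}
    for key in sorted(lines):
        line_text = " ".join(word for _, word in sorted(lines[key]))
        for depth, level in level_names.items():
            chunk_id = key[:depth]
            parent = key[: depth - 1] if depth > 1 else None
            chunk = chunks.setdefault(
                chunk_id, {"level": level, "text": "", "parent": parent}
            )
            chunk["text"] = (chunk["text"] + " " + line_text).strip()
    return chunks


# Hypothetical sample: one page, two blocks, three lines.
sample = {
    "page_num": [1, 1, 1, 1, 1],
    "block_num": [1, 1, 1, 1, 2],
    "par_num": [1, 1, 1, 1, 1],
    "line_num": [1, 1, 2, 2, 1],
    "word_num": [1, 2, 1, 2, 1],
    "text": ["Graph", "RAG", "uses", "chunks", "Footer"],
}
chunks = build_chunk_hierarchy(sample)
```

In a graph, each entry in `chunks` could become a chunk node, with its `parent` field becoming a relationship to the enclosing chunk. Small line chunks could then be used for vector similarity, with a traversal to the parent paragraph or block supplying the larger context.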
Posted Nov 20