DataStax delivers Glean and Unstructured integrations to AI platform at RAG++ event in NYC
Simplifying Data Ingestion to Improve Relevancy with Unstructured.io
Data preparation and ingestion are among the biggest challenges when building a GenAI application. Developers must convert massive amounts of existing data, in different formats, into a form suitable for use in retrieval-augmented generation (RAG). These documents are often too large for embedding models to ingest and must be broken up into smaller segments, or chunks.
To solve this problem, Unstructured is now natively integrated with Langflow and Astra DB, simplifying complex configuration options and bringing the power of Unstructured ingestion pipelines to DataStax users. Developers can easily import multiple PDF files of any size, chunk those files, and use DataStax Vectorize to generate vector embeddings for improved query relevancy.
This update adds support for more file types and streamlines data processing by bringing data preparation directly into the data loading process. Users can control chunk sizes to optimise semantic relevance and improve RAG performance. This leads to more relevant query results and better application resource utilisation.
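To make the idea of controlling chunk sizes concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the algorithm used by Unstructured, Langflow, or Astra DB; the function name `chunk_text` and its parameters `max_chars` and `overlap` are hypothetical and chosen only for illustration.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of at most max_chars.

    Hypothetical helper for illustration only; real ingestion pipelines
    typically split on document structure (titles, paragraphs) instead.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")

    step = max_chars - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", max_chars=4, overlap=1):
        print(piece)
```

Smaller chunks tend to keep each embedding focused on a single topic, while a modest overlap reduces the chance that a sentence is cut off at a chunk boundary.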
“As developers move beyond the ideation and experimental phase that has characterised the past year or so, they’re looking to deploy GenAI applications into production with ease,” said Ed Anuff, Chief Product Officer at DataStax. “The DataStax AI PaaS offers users the ability to quickly build, iterate, and deploy applications with speed, at scale. It’s a field-proven platform that enables some of the largest global companies to leverage their data to power production-ready GenAI applications and deliver new internal and customer-facing experiences to the market.”
“Data preparation is a common issue for developers as they build their GenAI apps. They need to ingest, process, and chunk more data to ensure applications are delivering accurate, relevant query responses,” said Brian Raymond, CEO of Unstructured. “With our new, native integration with Langflow and Astra DB, we’re allowing AI developers to easily import and process unstructured data like PDFs, emails, and more. This enhanced capability sharpens query results and centralises unstructured data handling within DataStax’s AI PaaS.”
Enabling Seamless Access to Data with New Glean Integrations
DataStax will introduce a new integration that allows users to seamlessly connect their data stored in Astra DB with Glean. With this integration, Glean will be able to directly access and analyse data stored in Astra DB, enabling the platform to answer complex questions and provide relevant, accurate query responses.
Additionally, users will be able to use a new Glean Component for DataStax Langflow, which enables developers to easily create Glean queries within a Langflow flow. Users can tap into Glean’s indexing capabilities to enrich the context of their operations and make more informed decisions based on real-time data insights.
The Glean integration is another example of the robust GenAI ecosystem being built into DataStax Langflow, which will provide developers the most diverse ecosystem of integration partners via its AI PaaS.
Driving Agility in GenAI Application Development with the Langflow API
DataStax has further enhanced its AI PaaS with the free public preview of the DataStax Langflow API. The Langflow API lets developers build and host their GenAI application anywhere with a simple HTTP call to an API endpoint hosted by DataStax, providing a fast and easy path to production.
The hosted API simplifies and speeds up deployment by removing the overhead of self-hosting an application, and it integrates with external applications so that GenAI can be embedded into existing projects. The API is accessible over HTTP, and Langflow includes JavaScript and Python code snippets that can be dropped into a developer’s application.
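As a rough illustration of what calling a hosted flow over HTTP might look like, the sketch below builds a POST request using only the Python standard library. The URL path, the payload field names, and the bearer-token header are assumptions modelled on the open-source Langflow run endpoint, and the helper `build_flow_request` is hypothetical. The snippet that Langflow generates for a given flow should be treated as the authoritative version.

```python
import json
import urllib.request


def build_flow_request(base_url: str, flow_id: str, message: str,
                       token: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that runs a flow.

    Hypothetical helper: the path and payload shape are assumptions,
    not a confirmed description of the DataStax-hosted API.
    """
    url = f"{base_url.rstrip('/')}/api/v1/run/{flow_id}"
    payload = {
        "input_value": message,  # text passed to the flow's input
        "input_type": "chat",    # assumed field
        "output_type": "chat",   # assumed field
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_flow_request("https://example.invalid", "my-flow",
                             "Hello", "APPLICATION_TOKEN")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. It is built separately here so that the URL, headers, and body can be inspected before any network call is made.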
“DataStax Langflow makes developing AI apps easy,” said Arvind Jain, CEO of Glean. “Now with Glean built-in, developers can connect to all their important corporate data sources and build custom AI experiences that help their company automate work with AI. DataStax plus Glean will enable both structured and unstructured data to feed AI workflows.”