Snowflake expands capabilities for enterprises to deliver trustworthy AI into production
Snowflake today announced new advancements that accelerate the path for organizations to deliver easy, efficient, and trusted Artificial Intelligence (AI) into production with their enterprise data.
With Snowflake’s latest innovations, developers can effortlessly build conversational apps for structured and unstructured data with high accuracy, efficiently run batch large language model (LLM) inference for natural language processing (NLP) pipelines, and train custom models with GPU-powered containers — all with built-in governance, access controls, observability, and safety guardrails to help ensure AI security and trust remain at the forefront.
“For enterprises, AI hallucinations are simply unacceptable. Today’s organizations require accurate, trustworthy AI in order to drive effective decision-making, and this starts with access to high-quality data from diverse sources to power AI models,” said Baris Gultekin (pictured), Head of AI, Snowflake. “The latest innovations to Snowflake Cortex AI and Snowflake ML enable data teams and developers to accelerate the path to delivering trusted AI with their enterprise data, so they can build chatbots faster, improve the cost and performance of their AI initiatives, and accelerate ML development.”
Snowflake Enables Enterprises to Build High-Quality Conversational Apps, Faster
Thousands of global enterprises leverage Cortex AI to seamlessly scale and productionize their AI-powered apps, with adoption more than doubling in just the past six months alone. Snowflake’s latest innovations make it easier for enterprises to build reliable AI apps with more diverse data sources, simplified orchestration, and built-in evaluation and monitoring — all from within Snowflake Cortex AI, Snowflake’s fully managed AI service that provides a suite of generative AI features. Snowflake’s advancements for end-to-end conversational app development enable customers to:
- Create More Engaging Responses with Multimodal Support: Organizations can now enhance their conversational apps with multimodal inputs like images, soon to be followed by audio and other data types, using multimodal LLMs such as Meta’s Llama 3.2 models with the new Cortex COMPLETE Multimodal Input Support (private preview soon).
- Gain Access to More Comprehensive Answers with New Knowledge Base Connectors: Users can quickly integrate internal knowledge bases using managed connectors such as the new Snowflake Connector for SharePoint (now in public preview), so they can tap into their Microsoft 365 SharePoint files and documents, automatically ingesting files without having to manually preprocess documents. Snowflake is also helping enterprises chat with unstructured data from third parties — including news articles, research publications, scientific journals, textbooks, and more — using the new Cortex Knowledge Extensions on Snowflake Marketplace (now in private preview). This is the first and only third-party data integration for generative AI that respects the publishers’ intellectual property through isolation and clear attribution. It also creates a direct pathway to monetization for content providers.
- Achieve Faster Data Readiness with Document Preprocessing Functions: Business analysts and data engineers can now easily preprocess data using short SQL functions to make PDFs and other documents AI-ready through the new PARSE_DOCUMENT (now in public preview) for layout-aware document text extraction and SPLIT_TEXT_RECURSIVE_CHARACTER (now in private preview) for text chunking functions in Cortex Search (now generally available).
- Reduce Manual Integration and Orchestration Work: To make it easier to receive and respond to questions grounded on enterprise data, developers can use the Cortex Chat API (public preview soon) to streamline the integration between the app front-end and Snowflake. The Cortex Chat API combines structured and unstructured data into a single REST API call, helping developers quickly create Retrieval-Augmented Generation (RAG) and agentic analytical apps with less effort.
- Increase App Trustworthiness and Enhance Compliance Processes with Built-in Evaluation and Monitoring: Users can now evaluate and monitor their generative AI apps with over 20 metrics for relevance, groundedness, stereotype, and latency, both during development and while in production using AI Observability for LLM Apps (now in private preview) — with technology integrated from TruEra (acquired by Snowflake in May 2024).
- Unlock More Accurate Self-Serve Analytics: To help enterprises easily glean insights from their structured data with high accuracy, Snowflake is announcing several improvements to Cortex Analyst (in public preview), including simplified data analysis with advanced joins (now in public preview), increased user friendliness with multi-turn conversations (now in public preview), and more dynamic retrieval with a Cortex Search integration (now in public preview).
Snowflake Empowers Users to Run Cost-Effective Batch LLM Inference for Natural Language Processing
Batch inference allows businesses to process massive datasets with LLMs simultaneously, as opposed to the individual, one-by-one approach used for most conversational apps. In turn, NLP pipelines for batch data offer a structured approach to processing and analyzing various forms of natural language data, including text, speech, and more. To help enterprises with both, Snowflake is unveiling more customization options for large batch text processing, so data teams can build NLP pipelines with high processing speeds at scale, while optimizing for both cost and performance.
Snowflake is adding a broader selection of pre-trained LLMs, embedding model sizes, context window lengths, and supported languages to Cortex AI, providing organizations increased choice and flexibility when selecting which LLM to use, while maximizing performance and reducing cost. This includes adding the multi-lingual embedding model from Voyage, multimodal 3.1 and 3.2 models from Llama, and huge context window models from Jamba for serverless inference. To help organizations choose the best LLM for their specific use case, Snowflake is introducing Cortex Playground (now in public preview), an integrated chat interface designed to generate and compare responses from different LLMs so users can easily find the best model for their needs.
When using an LLM for various tasks at scale, consistent outputs become even more crucial to effectively understand results. As a result, Snowflake is unveiling the new Cortex Serverless Fine-Tuning (generally available soon), allowing developers to customize models with proprietary data to generate results with more accurate outputs. For enterprises that need to process large inference jobs with guaranteed throughput, the new Provisioned Throughput (public preview soon) helps them successfully do so.
Customers Can Now Expedite Reliable ML with GPU-Powered Notebooks and Enhanced Monitoring
Having easy access to scalable and accelerated compute significantly impacts how quickly teams can iterate and deploy models, especially when working with large datasets or using advanced deep learning frameworks. To support these compute-intensive workflows and speed up model development, Snowflake ML now supports Container Runtime (now in public preview on AWS and public preview soon on Microsoft Azure), enabling users to efficiently execute distributed ML training jobs on GPUs. Container Runtime is a fully managed container environment accessible through Snowflake Notebooks (now generally available) and pre-configured with access to distributed processing on both CPUs and GPUs. ML development teams can now build powerful ML models at scale, using any Python framework or language model of their choice, on top of their Snowflake data.
“As an organization connecting over 700,000 healthcare professionals to hospitals across the US, we rely on machine learning to accelerate our ability to place medical providers into temporary and permanent jobs. Using GPUs from Snowflake Notebooks on Container Runtime turned out to be the most cost-effective solution for our machine learning needs, enabling us to drive faster staffing results with higher success rates,” said Andrew Christensen, Data Scientist, CHG Healthcare. “We appreciate the ability to take advantage of Snowflake’s parallel processing with any open source library in Snowflake ML, offering flexibility and improved efficiency for our workflows.”
Organizations also often require GPU compute for inference. As a result, Snowflake is providing customers with new Model Serving in Containers (now public preview on AWS). This enables teams to deploy both internally and externally-trained models, including open source LLMs and embedding models, from the Model Registry into Snowpark Container Services (now generally available on AWS and Microsoft Azure) for production using distributed CPUs or GPUs — without complex resource optimizations.
In addition, users can quickly detect model degradation in production with built-in monitoring with the new Observability for ML Models (now in public preview), which integrates technology from TruEra to monitor performance and various metrics for any model running inference in Snowflake. In turn, Snowflake’s new Model Explainability (now in public preview) allows users to easily compute Shapley values — a widely-used approach that helps explain how each feature is impacting the overall output of the model — for models logged in the Model Registry. Users can now understand exactly how a model is arriving at its final conclusion, and detect model weaknesses by noticing unintuitive behavior in production.