Thursday, November 21, 2024

Cohere adds vision to its RAG search capabilities



Cohere has added multimodal embeddings to its search model, allowing users to bring images into RAG-style enterprise search. 

Embed 3, which emerged last year, is an embedding model: it transforms data into numerical representations. Embeddings have become crucial in retrieval-augmented generation (RAG) because enterprises can create embeddings of their documents, which the system can then compare against the prompt to retrieve the requested information. 
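The comparison step described above can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats and that similarity is measured with cosine similarity, a common choice that the article itself does not specify. The three-dimensional vectors, the document names and the `retrieve` helper are all hypothetical; real embeddings come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written embeddings used purely for illustration.
DOC_VECS = {
    "q3_revenue_report": [0.9, 0.1, 0.0],
    "office_party_memo": [0.0, 0.2, 0.9],
    "sales_forecast": [0.8, 0.3, 0.1],
}
QUERY_VEC = [1.0, 0.2, 0.0]  # stands in for an embedded user prompt

top = retrieve(QUERY_VEC, DOC_VECS)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

In a RAG pipeline, the documents retrieved this way would then be passed to a generative model as context for answering the prompt.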

The new multimodal version can generate embeddings for both images and text. Cohere claims Embed 3 is “now the most generally capable multimodal embedding model on the market.” Aidan Gomez, Cohere co-founder and CEO, posted a graph on X showing performance improvements in image search with Embed 3. 

“This advancement enables enterprises to unlock real value from their vast amount of data stored in images,” Cohere said in a blog post. “Businesses can now build systems that accurately and quickly search important multimodal assets such as complex reports, product catalogs and design files to boost workforce productivity.”

Cohere said a more multimodal focus expands the volume of data enterprises can access through a RAG search. Many organizations limit RAG searches to structured and unstructured text despite having multiple file formats in their data libraries. Customers can now bring in more charts, graphs, product images and design templates. 

Performance improvements

Cohere said encoders in Embed 3 “share a unified latent space,” allowing users to include both images and text in a single database. Some image-embedding methods require maintaining separate databases for images and text. The company said its unified approach leads to better mixed-modality searches. 

According to the company, “Other models tend to cluster text and image data into separate areas, which leads to weak search results that are biased toward text-only data. Embed 3, on the other hand, prioritizes the meaning behind the data without biasing towards a specific modality.”
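To make the idea of a shared latent space concrete, here is a rough sketch of a single index that holds both text and image entries and ranks them with one query. It is an illustration under stated assumptions, not Cohere's implementation: the `Item` class, the `search` function, the item names and the toy three-dimensional vectors are all hypothetical, and cosine similarity is assumed as the ranking measure.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    modality: str  # "text" or "image"
    vector: list   # embedding in the shared space (toy values here)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(index, query_vec, top_k=2):
    """Rank every item by similarity to the query, regardless of modality."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item.vector), reverse=True)
    return ranked[:top_k]


# One index for both modalities; the vectors are invented for illustration.
INDEX = [
    Item("catalog_page_text", "text", [0.7, 0.6, 0.1]),
    Item("product_photo", "image", [0.8, 0.5, 0.0]),
    Item("hr_policy_text", "text", [0.0, 0.1, 0.9]),
    Item("org_chart_image", "image", [0.1, 0.0, 0.8]),
]
QUERY = [0.9, 0.5, 0.0]  # stands in for an embedded product-related query

results = search(INDEX, QUERY)
for item in results:
    print(item.item_id, item.modality)
```

Because the toy vectors are grouped by meaning rather than by modality, the top results for the product query include one image and one text entry, which is the behavior the company describes.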

Embed 3 is available in more than 100 languages. 

Cohere said multimodal Embed 3 is now available on its platform and Amazon SageMaker. 

Playing catch up

Many consumers are fast becoming familiar with multimodal search, thanks to the introduction of image-based search in platforms like Google and chat interfaces like ChatGPT. As individual users get used to looking for information from pictures, it makes sense that they would want to get the same experience in their working life. 

Enterprises have begun seeing this benefit, too, as other companies that offer embedding models provide some multimodal options. Some model developers, like Google and OpenAI, offer some type of multimodal embedding. Open-source models can also facilitate embeddings for images and other modalities. The fight is now over which multimodal embedding model can perform at the speed, accuracy and security enterprises demand. 

Cohere, which was founded by some of the researchers responsible for the Transformer model (Gomez is a co-author of the famous “Attention Is All You Need” paper), has struggled to be top of mind for many in the enterprise space. It updated its APIs in September to allow customers to switch easily from competitor models to Cohere models. At the time, Cohere said the move was meant to align it with industry standards, as customers often toggle between models. 


