Mirascope is a powerful and user-friendly library that provides a unified interface for working with a wide range of Large Language Model (LLM) providers, including OpenAI, Anthropic, Mistral, Google (Gemini and Vertex AI), Groq, Cohere, LiteLLM, Azure AI, and Amazon Bedrock. It simplifies everything from text generation and structured data extraction to building complex AI-powered workflows and agent systems.
In this guide, we'll focus on using Mirascope's OpenAI integration to identify and remove semantic duplicates (entries that may differ in wording but carry the same meaning) from a list of customer reviews.
Installing the dependencies
pip install "mirascope[openai]"
OpenAI Key
To get an OpenAI API key, visit https://platform.openai.com/settings/organization/api-keys and generate a new key. If you're a new user, you may need to add billing details and make a minimum payment of $5 to activate API access.
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Defining the list of customer reviews
customer_reviews = [
    "Sound quality is amazing!",
    "Audio is crystal clear and very immersive.",
    "Incredible sound, especially the bass response.",
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Battery drains quickly -- not ideal for travel.",
    "Setup was super easy and straightforward.",
    "Very user-friendly, even for my parents.",
    "Simple interface and smooth experience.",
    "Feels cheap and plasticky.",
    "Build quality could be better.",
    "Broke within the first week of use.",
    "People say they can't hear me during calls.",
    "Mic quality is terrible on Zoom meetings.",
    "Great product for the price!",
]
These reviews capture key customer sentiments: praise for sound quality and ease of use, complaints about battery life, build quality, and call/mic issues, along with a positive note on value for money. They reflect common themes found in real user feedback.
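To see why exact matching is not enough for reviews like these, here is a minimal baseline sketch using only the standard library. The helpers `normalize` and `exact_dedup` are hypothetical names introduced for illustration, and the last entry in `sample_reviews` is an invented near-verbatim repeat that does not appear in the list above.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def exact_dedup(reviews: list[str]) -> list[str]:
    """Keep the first occurrence of each review, comparing normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for review in reviews:
        key = normalize(review)
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


sample_reviews = [
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Battery drains quickly -- not ideal for travel.",
    "Needs charging too often!",  # hypothetical repeat, added for illustration
]

unique_reviews = exact_dedup(sample_reviews)
print(len(sample_reviews), "->", len(unique_reviews))
```

Only the near-verbatim repeat is dropped. The three remaining battery complaints share a meaning but not a wording, which is the gap the LLM-based approach in this guide is meant to close.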
Defining a Pydantic Schema
This Pydantic model defines the structure of the response for a semantic deduplication task on customer reviews. The schema helps structure and validate the output of a language model tasked with clustering or deduplicating natural language input (e.g., user feedback, bug reports, product reviews).
from pydantic import BaseModel, Field

class DeduplicatedReviews(BaseModel):
    # Groups of reviews that express the same underlying feedback
    duplicates: list[list[str]] = Field(
        ..., description="A list of semantically equivalent customer review groups"
    )
    # One entry per distinct feedback theme
    opinions: list[str] = Field(
        ..., description="The deduplicated list of core customer feedback themes"
    )
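As a rough illustration of what the schema enforces, the sketch below validates two hand-written payloads with Pydantic v2's `model_validate`. Both payloads are invented for this example (they are not model output), and the class is repeated only so the snippet runs on its own.

```python
from pydantic import BaseModel, Field, ValidationError


class DeduplicatedReviews(BaseModel):
    duplicates: list[list[str]] = Field(
        ..., description="A list of semantically equivalent customer review groups"
    )
    opinions: list[str] = Field(
        ..., description="The deduplicated list of core customer feedback themes"
    )


# Hypothetical, hand-written payload with the expected shape
good_payload = {
    "duplicates": [
        ["Needs charging too often.", "Battery drains quickly -- not ideal for travel."]
    ],
    "opinions": ["Battery life is shorter than expected."],
}
parsed = DeduplicatedReviews.model_validate(good_payload)
print(parsed.opinions)

# Hypothetical malformed payload: `duplicates` must be a list of lists
bad_payload = {"duplicates": ["not a nested list"], "opinions": ["..."]}
try:
    DeduplicatedReviews.model_validate(bad_payload)
    rejected = False
except ValidationError as exc:
    rejected = True
    print("Rejected with", exc.error_count(), "validation error(s)")
```

A payload with the wrong shape fails at validation time rather than surfacing later as a confusing error in downstream code.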
Defining a Mirascope @openai.call for Semantic Deduplication
This code defines a semantic deduplication function using Mirascope's @openai.call decorator, which connects the function to OpenAI's gpt-4o model. The deduplicate_customer_reviews function takes a list of customer reviews and uses a structured prompt, defined by the @prompt_template decorator, to guide the LLM in identifying and grouping semantically similar reviews.
The system message instructs the model to analyze the meaning, tone, and intent behind each review, clustering those that convey the same feedback even if they are worded differently. The function expects a structured response conforming to the DeduplicatedReviews Pydantic model, which includes two outputs: a list of unique, deduplicated review sentiments, and a list of grouped duplicates.
This design ensures that the LLM's output is validated against the schema and machine-readable, making it well suited to customer feedback analysis, survey deduplication, or product review clustering.
from mirascope.core import openai, prompt_template

@openai.call(model="gpt-4o", response_model=DeduplicatedReviews)
@prompt_template(
    """
    SYSTEM:
    You are an AI assistant helping to analyze customer reviews.
    Your task is to group semantically similar reviews together -- even if they are worded differently.

    - Use your understanding of meaning, tone, and implication to group duplicates.
    - Return two lists:
      1. A deduplicated list of the key distinct review sentiments.
      2. A list of grouped duplicates that share the same underlying feedback.

    USER:
    {reviews}
    """
)
def deduplicate_customer_reviews(reviews: list[str]): ...
The following code executes the deduplicate_customer_reviews function on the list of customer reviews and prints the structured output. First, it calls the function and stores the result in the response variable. To check that the model's output conforms to the expected format, it uses an assert statement to validate that the response is an instance of the DeduplicatedReviews Pydantic model.
Once validated, it prints the deduplicated results in two sections. The first section, labeled "✅ Distinct Customer Feedback," displays the list of unique review sentiments identified by the model. The second section, "🌀 Grouped Duplicates," lists clusters of reviews that were identified as semantically equivalent.
response = deduplicate_customer_reviews(customer_reviews)

# Ensure response format
assert isinstance(response, DeduplicatedReviews)

# Print output
print("✅ Distinct Customer Feedback:")
for item in response.opinions:
    print("-", item)

print("\n🌀 Grouped Duplicates:")
for group in response.duplicates:
    print("-", group)
The output shows a clear summary of customer feedback by grouping semantically similar reviews. The Distinct Customer Feedback section highlights key insights, while the Grouped Duplicates section captures different phrasings of the same sentiment. This helps eliminate redundancy and makes the feedback easier to analyze.
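Because the groups are produced by an LLM, it can be worth checking them against the input before relying on them. The sketch below is one possible sanity check, not part of Mirascope: `find_unknown_reviews` and `find_ungrouped_reviews` are hypothetical helpers, and `groups` is a hand-written stand-in for `response.duplicates`.

```python
def find_unknown_reviews(original: list[str], groups: list[list[str]]) -> list[str]:
    """Return grouped strings that do not appear verbatim in the original list."""
    known = set(original)
    return [review for group in groups for review in group if review not in known]


def find_ungrouped_reviews(original: list[str], groups: list[list[str]]) -> list[str]:
    """Return original reviews that were not placed in any group."""
    grouped = {review for group in groups for review in group}
    return [review for review in original if review not in grouped]


original = [
    "Feels cheap and plasticky.",
    "Build quality could be better.",
    "Great product for the price!",
]

# Hypothetical stand-in for response.duplicates; the second string in the
# group is deliberately reworded to show what the check catches.
groups = [["Feels cheap and plasticky.", "Build quality is poor."]]

unknown = find_unknown_reviews(original, groups)
ungrouped = find_ungrouped_reviews(original, groups)
print("Not in input:", unknown)
print("Not grouped:", ungrouped)
```

A non-empty `unknown` list suggests the model paraphrased or invented a review instead of copying it, while `ungrouped` lists inputs that belong to no cluster, which is expected for reviews with no semantic duplicate.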

I'm a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.

