Reddit is bursting with unfiltered, high-context product feedback, but turning that noisy goldmine into something useful requires structure and automation. In this post, we walk through how we built a system to monitor Reddit for HubSpot mentions, filter them for relevance, and use AI to extract insights we can actually use.
Whether you're in product, marketing, or customer experience, this breakdown will show you how to capture voice-of-customer data from Reddit without ever opening a browser tab.
Why Reddit Is a Hidden Source of Product Feedback
Reddit may not be your typical B2B data source, but it is full of high-signal content. Users post detailed questions, share their pain points, and compare tools openly, often more honestly than they would on professional networks.
That honesty comes from anonymity. Unlike on LinkedIn, users do not feel pressured to self-promote or sugar-coat their opinions, which gives you access to raw, unfiltered feedback of the kind that rarely surfaces in NPS forms or support tickets.
But it is also overwhelming. Without automation, you will spend hours wading through irrelevant threads, reposts, or off-topic rants. This is why we built a smart, AI-powered workflow to handle the heavy lifting.
Step 1: Collecting Reddit Posts About HubSpot

Our system kicks off on a schedule: every Sunday, Tuesday, and Thursday. It uses a Reddit node to search subreddits like r/HubSpot, r/SaaS, and r/CRM for recent posts.
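To make the schedule and the search concrete, here is a minimal Python sketch. It uses Reddit's public search.json listing for readability; an authenticated client would call the equivalent search endpoint on oauth.reddit.com. The 09:00 run time, the helper names and the single "HubSpot" query are illustrative assumptions, not our exact node configuration.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical cron equivalent for a scheduler (09:00 is an assumed time):
#   0 9 * * 0,2,4   -> Sunday, Tuesday, Thursday
RUN_WEEKDAYS = {6, 1, 3}  # Python's date.weekday(): Monday=0 ... Sunday=6

SUBREDDITS = ["HubSpot", "SaaS", "CRM"]


def should_run(today: date) -> bool:
    """Return True on the days the workflow is scheduled to run."""
    return today.weekday() in RUN_WEEKDAYS


def build_search_url(query: str, subreddits: list[str], window: str = "week") -> str:
    """Build one Reddit search URL covering several subreddits.

    Joining names with '+' targets a combined listing; restrict_sr keeps
    results inside those subreddits, and sort=new returns the newest first.
    """
    combined = "+".join(subreddits)
    params = urlencode({"q": query, "restrict_sr": "1", "sort": "new", "t": window})
    return f"https://www.reddit.com/r/{combined}/search.json?{params}"
```

Searching all three subreddits in a single request keeps the call count low, which matters once rate limits come into play.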
We immediately apply three filters:
Recency: Only posts from the past two days are included.
Deduplication: We exclude already-processed posts.
Upvotes: Higher-voted posts are prioritised as they are more likely to have value.
This cuts down the volume and ensures the next steps only process fresh and relevant content.
We also skip posts published on Fridays and Saturdays, since weekend engagement is lower and there is more noise. This gives us cleaner data and reduces false positives.
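The three filters and the weekend rule can be sketched as a single Python function. This is a minimal sketch, assuming each post arrives with its Reddit ID, score and created_utc timestamp, that already-processed IDs are available as a set (for example, read back from the Google Sheet), and that weekdays are judged in UTC. The Post class and filter_posts name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    id: str
    title: str
    score: int          # net upvotes as reported by Reddit
    created_utc: float  # Unix timestamp, as in Reddit's listing JSON


SKIP_WEEKDAYS = {4, 5}  # Friday and Saturday (datetime.weekday(): Monday=0)


def filter_posts(posts: list[Post], seen_ids: set[str], now: datetime,
                 max_age: timedelta = timedelta(days=2)) -> list[Post]:
    """Apply recency, deduplication and weekend filters, then rank by upvotes."""
    kept = []
    for post in posts:
        created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        if now - created > max_age:
            continue  # recency: older than the two-day window
        if post.id in seen_ids:
            continue  # deduplication: already processed in an earlier run
        if created.weekday() in SKIP_WEEKDAYS:
            continue  # weekend skip: Friday/Saturday posts tend to be noisier
        kept.append(post)
    # Upvotes: highest-scored posts first, so later steps see them first
    return sorted(kept, key=lambda p: p.score, reverse=True)
```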
Step 2: Using AI to Check for Compliance
As a first step in our analysis, we check each post for "compliance". In other words, does this post meet our standards for relevance?
An AI agent reviews each post and adds a field to the data: compliant: yes or no.
If compliant: The post moves forward.
If non-compliant or there's an error: We discard it to keep our output clean.
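One way to make this yes/no gate fail closed is sketched below in Python. It assumes the agent is prompted to reply with JSON such as {"compliant": "yes"}; that reply format and the function names are assumptions. Anything that is not an unambiguous yes, including a parse error, discards the post.

```python
import json


def is_compliant(raw_response: str) -> bool:
    """Return True only when the model clearly answered compliant: yes.

    A "no", malformed JSON, a missing field or an unexpected value all count
    as non-compliant, so errors fail closed and the post is discarded.
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    value = data.get("compliant")
    return isinstance(value, str) and value.strip().lower() == "yes"


def gate(posts_with_responses: list[tuple]) -> list:
    """Keep only posts whose AI compliance check returned yes."""
    return [post for post, raw in posts_with_responses if is_compliant(raw)]
```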
This AI filter works far better than keyword rules because it understands tone, context, and intent, so it drops irrelevant content while keeping meaningful posts. It also saves time for downstream teams, who do not want to sift through posts about hiring, memes, or unrelated product lines.
Step 3: Analysing Top Comments with AI
Next, we extract the top two comments from each compliant post. These are where most of the gold lies. Users are often troubleshooting, comparing tools, or venting frustrations.
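Selecting the top two comments can be a plain sort once the comment listing has been fetched. This sketch assumes each comment is a dict carrying Reddit's body, score and stickied fields. Skipping stickied comments (usually moderator notices) and deleted or removed comments is our assumption, not a documented part of the workflow.

```python
def top_comments(comments: list[dict], n: int = 2) -> list[dict]:
    """Return the n highest-scoring usable comments.

    Stickied comments and comments whose body was deleted or removed are
    skipped, because they carry no customer feedback.
    """
    usable = [
        c for c in comments
        if not c.get("stickied", False)
        and c.get("body") not in ("[deleted]", "[removed]", None, "")
    ]
    return sorted(usable, key=lambda c: c.get("score", 0), reverse=True)[:n]
```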
An AI agent helps us:
Identify the best responses
Generate a possible reply
Clean the messy text into usable insights
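The comment-analysis agent is easier to rely on when its output has a fixed shape. The prompt wording, the three JSON keys and the CommentAnalysis class below are hypothetical; they mirror the three tasks listed above and show how a structured reply can be validated before it is stored.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt; the three keys mirror the agent's three tasks
PROMPT_TEMPLATE = """You are analysing Reddit feedback about HubSpot.
Post title: {title}

Top comments:
{comments}

Reply with JSON only, using these keys:
"best_response": the most useful comment, quoted verbatim
"suggested_reply": a short reply we could post
"insight": one or two plain sentences on what users need"""

REQUIRED_KEYS = ("best_response", "suggested_reply", "insight")


@dataclass
class CommentAnalysis:
    best_response: str
    suggested_reply: str
    insight: str


def build_prompt(title: str, comments: list[str]) -> str:
    """Number the comments so the model can refer to them unambiguously."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(comments, start=1))
    return PROMPT_TEMPLATE.format(title=title, comments=numbered)


def parse_analysis(raw: str) -> CommentAnalysis:
    """Validate the model's reply; raises ValueError on bad JSON or missing keys."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return CommentAnalysis(data["best_response"], data["suggested_reply"], data["insight"])
```

Rejecting malformed replies here means a bad model response stops at this step instead of leaving a half-filled row in the sheet.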
We have found that summarising top comments often reveals better insights than the post itself. This is especially true when the original post is vague, but the community responds with solutions, references, or contrasting tools.
All of this is stored in a Google Sheet, which acts as our running Reddit database.
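Writing to the sheet is simpler if every enriched post is flattened into the same column order. The column names below are hypothetical. With the gspread library, each row could then be appended with worksheet.append_row(row), and the post_id column doubles as the lookup for the deduplication filter in Step 1.

```python
from datetime import datetime, timezone

# Hypothetical column order for the running Reddit sheet
COLUMNS = ["post_id", "subreddit", "title", "url", "score",
           "top_comments", "insight", "processed_at"]


def to_sheet_row(record: dict, processed_at: datetime) -> list:
    """Flatten one enriched post into a row matching COLUMNS.

    Missing fields become empty strings so every row has the same width,
    and the list of comments is joined into a single cell.
    """
    values = dict(record)
    values["top_comments"] = "\n---\n".join(record.get("top_comments", []))
    values["processed_at"] = processed_at.isoformat()
    return [values.get(col, "") for col in COLUMNS]
```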
Step 4: Enriching the Data with Multiple AI Agents

Once we have our batch of good posts and their top comments, we run them through a series of enrichment steps:
a. Summarise each conversation
We generate concise summaries and tag each one with a sentiment label (positive, negative, or neutral).
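Because the sentiment tag comes from an LLM, it helps to map the model's free-text answer onto the three allowed labels. In this sketch, the function name and the "unknown" fallback are assumptions: anything that does not match is tagged "unknown" instead of being silently counted as neutral, so it can be reviewed.

```python
ALLOWED_SENTIMENTS = ("positive", "negative", "neutral")


def normalise_sentiment(raw: str) -> str:
    """Map replies like 'Sentiment: Negative.' onto one of the allowed labels."""
    label = raw.strip().lower()
    if label.startswith("sentiment:"):
        label = label[len("sentiment:"):].strip()
    label = label.rstrip(".")
    return label if label in ALLOWED_SENTIMENTS else "unknown"
```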
b. Detect company mentions
An LLM scans the content for competitor or third-party tool mentions. It excludes any comments made by HubSpot employees to maintain external focus.
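The employee exclusion is easiest to do deterministically before the LLM sees the text. This sketch assumes staff can be recognised from a hand-maintained username list or an author flair containing "HubSpot employee"; the list contents and the flair wording are hypothetical. The keyword check against a small watch-list is a swapped-in, cheap cross-check on the LLM's output, not a replacement for it.

```python
import re

# Hypothetical hand-maintained list of staff usernames (lowercase)
HUBSPOT_STAFF = {"example_staff_account"}
# Hypothetical watch-list used to cross-check the LLM's company mentions
KNOWN_TOOLS = ["Salesforce", "Pipedrive", "Zoho", "ActiveCampaign", "Mailchimp"]


def is_staff(comment: dict) -> bool:
    """Flag comments from known staff accounts or with a staff flair."""
    author = (comment.get("author") or "").lower()
    flair = (comment.get("author_flair_text") or "").lower()
    return author in HUBSPOT_STAFF or "hubspot employee" in flair


def external_comments(comments: list[dict]) -> list[dict]:
    """Drop staff comments so only external voices reach the LLM."""
    return [c for c in comments if not is_staff(c)]


def mentioned_tools(text: str) -> list[str]:
    """Whole-word, case-insensitive match against the watch-list."""
    return [
        tool for tool in KNOWN_TOOLS
        if re.search(rf"\b{re.escape(tool)}\b", text, flags=re.IGNORECASE)
    ]
```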
This is incredibly useful for understanding what tools users are comparing or switching from. It also helps us see how HubSpot is being positioned in organic, unprompted conversations.
c. Identify pain points
We isolate common frustrations, broken features, or confusing workflows mentioned in posts or replies. These are tagged and clustered for later review.
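A deliberately simple stand-in for clustering is to normalise the tags and count how many posts share each one. This sketch groups exact tag matches only; grouping near-synonyms would need embeddings or another LLM pass. The function name and the min_count threshold are assumptions.

```python
from collections import Counter


def cluster_pain_points(tagged_posts: list[list[str]], min_count: int = 2) -> list[tuple[str, int]]:
    """Return recurring pain-point tags, most frequent first.

    Tags are lowercased and whitespace is collapsed, so "Workflow  Limits"
    and "workflow limits" land in the same group. Each post counts a tag at
    most once, so one long rant cannot dominate the ranking.
    """
    counts = Counter()
    for tags in tagged_posts:
        counts.update({" ".join(t.lower().split()) for t in tags})
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]
```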
d. Optional: Redly.AI analysis
We use Redly.AI to go deeper. It identifies automation opportunities, user friction, or risk indicators. While this is optional and paid, it adds valuable context.
Step 5: Delivering Insights via Email

After enrichment, the best insights are assembled into an HTML email:
Each item includes the Reddit post, summary, sentiment, and top comments.
Links are clickable so stakeholders can jump straight to the thread.
The email goes out to selected team members automatically.
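Assembling the email is mostly templating, but every Reddit-sourced string should be HTML-escaped, since post titles and comments are untrusted input. This is a minimal sketch using Python's standard html.escape; the item keys and the layout are hypothetical. The resulting string could then be sent with email.message.EmailMessage.add_alternative(html, subtype="html") and smtplib.

```python
from html import escape


def render_item(item: dict) -> str:
    """Render one post: linked title, sentiment, summary and top comments."""
    comments = "".join(f"<li>{escape(c)}</li>" for c in item["top_comments"])
    return (
        f'<h3><a href="{escape(item["url"])}">{escape(item["title"])}</a></h3>'
        f"<p><strong>Sentiment:</strong> {escape(item['sentiment'])}</p>"
        f"<p>{escape(item['summary'])}</p>"
        f"<ul>{comments}</ul>"
    )


def render_email(items: list[dict]) -> str:
    """Wrap all rendered items in a minimal HTML document."""
    body = "\n".join(render_item(i) for i in items)
    return f"<html><body><h2>Reddit insights</h2>\n{body}\n</body></html>"
```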
We also enrich the email with suggestions. These might include whether to follow up on a post, investigate a trend further, or forward it to customer support or sales.
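The suggestions could come from another LLM call or from simple rules. This sketch shows the rule-based option; the rules, their order and the score threshold of 50 are hypothetical, and the first matching rule wins.

```python
def suggest_action(item: dict) -> str:
    """Return a suggested next step for one enriched post (hypothetical rules)."""
    if item.get("competitors"):
        return "Forward to sales"
    if item.get("sentiment") == "negative" and item.get("pain_points"):
        return "Forward to customer support"
    if item.get("score", 0) >= 50:
        return "Investigate the trend further"
    return "Consider following up on the post"
```

Rules like these are easy to audit, which helps when stakeholders ask why a post was routed to them.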
There is a second version of the email in development. It is designed for stakeholders who prefer a high-level summary with fewer links and more visuals.
The Impact: From Community Chatter to Strategic Insight
Reddit may seem like a messy place to mine insights. However, once you automate the pipeline, its value becomes obvious. Our workflow has helped us:
Spot common pain points before they hit support
Discover what customers are saying about competitors
Capture unfiltered feedback that rarely makes it into tickets or surveys
Identify use-case gaps and market opportunities early
This setup saves hours of manual research and gives our product and marketing teams better signal from a trusted source.
It also builds a habit. Teams begin to expect these insights, talk about them, and integrate Reddit data into roadmap planning.
FAQs About Using AI to Scrape Comments on Reddit
1. How hard is it to set up a Reddit monitoring system like this from scratch?
It depends on your tech stack. If you're familiar with tools like n8n or custom workflows using APIs and AI models, it is very doable. The most time-consuming part is designing the filters and prompts to make the AI enrichment meaningful.
2. What kind of AI models are used in this workflow?
We use large language models (LLMs) to summarise posts, identify sentiment, extract pain points, and detect company mentions. You can use models from providers like OpenAI, Anthropic, or open-source options if data privacy is a concern.
3. Can this be used outside of HubSpot or SaaS conversations?
Absolutely. The structure is flexible. You can target subreddits related to any industry, from fintech to e-commerce, and adjust the AI prompts to match your domain language and insight goals.
4. What is the benefit of summarising Reddit threads instead of just reading them?
Reddit threads are long and inconsistent, so reading them is time-consuming. Summarising them surfaces what matters most, including key issues, user intent, and emotional tone, so teams do not have to comb through hundreds of comments by hand.
5. Is this workflow compliant with Reddit’s API and terms?
Generally yes, provided you use the official Reddit API, respect its rate limits, and avoid storing personal data unnecessarily. Always check Reddit’s latest developer terms before deploying at scale.
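Respecting rate limits usually means pacing requests on the client side. This sketch is a sliding-window limiter with an assumed budget of 100 requests per minute; confirm Reddit’s current limits before relying on that number, and note that some client libraries already pace requests for you. The clock and sleep parameters exist only so the limiter can be tested without waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most max_calls in any window_s seconds."""

    def __init__(self, max_calls: int = 100, window_s: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls  # assumed budget, not an official figure
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent requests, oldest first

    def acquire(self) -> None:
        """Block until another request is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have left the window
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest request in the window expires
            self.sleep(self.window_s - (now - self.calls[0]))
```

Call acquire() immediately before each API request.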
Want to Turn Reddit into a Real-Time Insight Engine?
If you're curious about using AI to tap into Reddit or want to adapt a similar system for your product, contact us. We would love to hear what you're working on.