How to automate Instagram engagements with computer vision (and get banned)

Obviously, Instagram does not want you to automate engagement. Their HTML is a mess of randomly generated class names and deeply nested divs. The structure changes with every deployment. Any script that relies on DOM selectors breaks within weeks because the class names it targets no longer exist.

But it doesn't matter anyway. Instagram can obfuscate their code all they want because code is for machines. But UI... The UI is for humans. A heart icon has to look like a heart icon. A comment button has to be where users expect it. The layout has to be consistent enough that a person can easily navigate it.

So instead of fighting the DOM, let's just bypass it entirely. Take a screenshot. Find the heart by its visual appearance. Get its coordinates. Move the cursor there. Click. Done.

This works on anything that renders to pixels. Web apps, native apps, games, terminals. If a human can see it and click it, a computer can too. No selectors, no APIs, no platform-specific hooks. Just computer vision and cursor automation.

Unfortunately, you can't just hardcode a position. Things move around all the time. A long caption pushes the action bar down. A location tag adds a line. A carousel of images takes up more vertical space. Every post compresses or expands the layout differently.

Navigate between two posts and watch what happens to the hearts' positions. They move from post to post and are never in the same place twice.

Computer vision solves this. Instead of guessing where the hearts should be, you look at the screen and find where they actually are.

The First Problem: Too Much Screen, Too Many False Positives

The naive approach is simple: take the heart icon as a template and find it on the screen. Wherever it matches, that's a heart. It's the most basic computer vision operation you can do.

It doesn't work very well.

A full screenshot is huge. Over 7 million pixels on a typical screen. And a heart is small, roughly 70x60 pixels. That's a lot of surface area to search. In that sea of pixels there are plenty of things that vaguely resemble a heart. You get too many false positives.

The detection technique is fine. The search space is the problem. The screen is full of noise. The more area you search, the more noise you find.

Shrink the Search Space

The fix is to stop searching the whole screen. Instead, find things on screen that are easy to detect and use them to figure out where the hearts must be.

On Instagram, two things are consistently easy to find: the triple-dots icon and the action bar.

Both can be found with basic template matching in milliseconds. But we don't care about them for their own sake. We care about what they tell us: the triple-dots icon sits directly above the heart column. The action bar sits directly below it. If we know where those two landmarks are, we know exactly where the hearts are. They're in the vertical strip between them.

crop.x      = triple_dots.x
crop.y      = triple_dots.y + triple_dots.height
crop.width  = triple_dots.width
crop.height = action_bar.y - crop.y - action_bar.height * 0.2

The only things left in the search region are actual hearts and whatever happens to be in that exact column.

And since the crop region is derived from the actual positions of the landmarks on screen rather than being hardcoded, it adapts to every post automatically. The landmarks might be higher or lower depending on the post content, but the geometric relationship between them and the hearts is always the same.
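Here is a minimal sketch of that crop calculation in Python. The `Rect` type, the function name, and the landmark coordinates are made up for illustration. Only the four formulas come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in screen pixels; (x, y) is the top-left corner."""
    x: int
    y: int
    width: int
    height: int


def heart_search_region(triple_dots: Rect, action_bar: Rect) -> Rect:
    """Derive the vertical strip between the two landmarks."""
    top = triple_dots.y + triple_dots.height
    # Stop short of the action bar by 20% of its height, as in the formula.
    height = action_bar.y - top - round(action_bar.height * 0.2)
    if height <= 0:
        raise ValueError("no room between the triple-dots icon and the action bar")
    return Rect(x=triple_dots.x, y=top, width=triple_dots.width, height=height)


# Hypothetical landmark positions for two posts with different layouts.
short_post = heart_search_region(Rect(1400, 200, 60, 40), Rect(900, 1000, 560, 100))
long_post = heart_search_region(Rect(1400, 260, 60, 40), Rect(900, 1300, 560, 100))
print(short_post)
print(long_post)
```

The two example posts produce strips of different heights and offsets from the same function, which is the whole point: nothing about the position is hardcoded.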

The Sliding Window

Now that the search space is tiny and clean, you can run the sliding window. Take the heart template and slide it across the search region pixel by pixel. Score every position. The better the match, the more likely it's a heart.

The sliding window is deliberately loose to catch every possible heart. But that means it also catches things that aren't hearts.
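To make the idea concrete, here is a toy sliding window in plain Python. It is a sketch, not the code from this project: it runs at a single scale, works on tiny grayscale grids, and scores with a sum of absolute differences, which is an assumption on my part. In practice you would reach for a library routine such as OpenCV's `matchTemplate` instead of nested loops.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, values 0-255


def match_scores(region: Image, template: Image) -> List[Tuple[int, int, float]]:
    """Slide the template over the region; return (x, y, score), score in [0, 1]."""
    th, tw = len(template), len(template[0])
    rh, rw = len(region), len(region[0])
    max_diff = 255 * th * tw  # worst possible mismatch for this template size
    results = []
    for y in range(rh - th + 1):
        for x in range(rw - tw + 1):
            diff = sum(
                abs(region[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            results.append((x, y, 1.0 - diff / max_diff))
    return results


def detect(region: Image, template: Image, threshold: float) -> List[Tuple[int, int]]:
    """Keep every window position whose score reaches the threshold."""
    return [(x, y) for x, y, score in match_scores(region, template) if score >= threshold]


template = [[255, 0],
            [0, 255]]
region = [[0, 0, 0, 0, 0],
          [0, 0, 255, 0, 0],
          [0, 0, 0, 255, 0],
          [0, 0, 0, 0, 0]]

strict = detect(region, template, threshold=0.9)
loose = detect(region, template, threshold=0.7)
print(strict, loose)
```

With the strict threshold only the exact match at (2, 1) survives. With the loose one, two partial matches come along for the ride, which is exactly the trade-off described above.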

Hearts on Instagram are all in one vertical column. Every single one. Most detections will be on that line. Anything not on it is an outlier:

·   ·   ·   ·   ·   ·   ·   ·
·   ·   ·   ·   ·   ♡   ·   ·     ← most detections are here
·   ·   ·   ·   ·   ·   ·   ·
·   ·   ·   ·   ·   ♡   ·   ·     ← same column
·   ✕   ·   ·   ·   ·   ·   ·     ← outlier (off to the left)
·   ·   ·   ·   ·   ♡   ·   ·     ← same column
·   ·   ·   ·   ·   ·   ·   ·
·   ·   ·   ·   ·   ♡   ·   ·     ← same column
·   ·   ·   ·   ·   ·   ·   ·
·   ·   ·   ·   ·   ·   ✕   ·     ← outlier (off to the right)
·   ·   ·   ·   ·   ♡   ·   ·     ← same column
·   ·   ·   ·   ·   ·   ·   ·

The hearts (♡) cluster on one X coordinate. The false positives (✕) are scattered. The sliding window thought they looked heart-shaped, but they're not in the column.

So we find the most common X among all detections, the consensus line, and discard anything more than 10 pixels away. It's just finding the mode of the X values and treating anything far from it as noise. A few lines of code and nearly all false positives are gone. The sliding window is loose so that it misses no heart. This filter is tight so that it keeps nothing else.
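A sketch of the filter, assuming detections arrive as `(x, y)` tuples. The function name and the sample points are hypothetical. The 10-pixel tolerance is the one mentioned above.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[int, int]


def filter_to_consensus_column(detections: List[Point], tolerance: int = 10) -> List[Point]:
    """Keep only detections within `tolerance` px of the most common X value."""
    if not detections:
        return []
    # The mode of the X values is the consensus line.
    consensus_x, _ = Counter(x for x, _ in detections).most_common(1)[0]
    return [(x, y) for x, y in detections if abs(x - consensus_x) <= tolerance]


# Made-up detections: five hearts near x=30 and two outliers.
detections = [(30, 50), (31, 120), (5, 160), (30, 200), (30, 280), (52, 340), (29, 400)]
hearts = filter_to_consensus_column(detections)
print(hearts)
```

The outliers at x=5 and x=52 are dropped, while the hearts that landed a pixel off the line at x=31 and x=29 are kept.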

The Full Detection Pipeline

Take screenshot
  → Find triple-dots and action bar via template matching
  → Calculate crop region from their positions
  → Crop to a 60-pixel-wide strip
  → Run sliding window template matching at multiple scales
  → Vertical alignment filter
  → Deduplicate, sort top to bottom
  → Return [{x, y}, ...]

The explainer video walks through this whole process.

Each detected heart becomes a click coordinate. The server returns them as JSON. The client moves the cursor to each one and clicks.
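The last two steps might look something like this. It is a sketch under a few assumptions: the 20-pixel `min_gap` used to merge duplicate hits is a placeholder value, and I'm assuming the detections are relative to the cropped strip, so they get shifted back to screen coordinates before being serialized. The function names are made up.

```python
import json
from typing import List, Tuple

Point = Tuple[int, int]


def dedupe_and_sort(points: List[Point], min_gap: int = 20) -> List[Point]:
    """Sort top to bottom and drop hits closer than min_gap px to the previous one."""
    kept: List[Point] = []
    for x, y in sorted(points, key=lambda p: (p[1], p[0])):
        if kept and y - kept[-1][1] < min_gap:
            continue  # same heart matched again a few pixels lower
        kept.append((x, y))
    return kept


def to_screen_json(points: List[Point], crop_x: int, crop_y: int) -> str:
    """Shift crop-relative points back to screen coordinates and serialize."""
    return json.dumps([{"x": x + crop_x, "y": y + crop_y} for x, y in points])


# Made-up detections inside the strip, with two near-duplicate hits.
raw = [(30, 200), (31, 52), (30, 50), (29, 400), (30, 203)]
unique = dedupe_and_sort(raw)
payload = to_screen_json(unique, crop_x=1400, crop_y=240)
print(payload)
```

Template matching usually reports the top-left corner of a match, so a real client would probably also add half the template size to land the click in the middle of the heart.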

The Ban

The account was banned within days.

I tried a few things to make it less obvious. Bezier curves for natural cursor movement, randomized click timing, idle fidgeting between actions, random offsets on the follow button. All the usual tricks.

Of course, it didn't work. Instagram spends a lot of money on bot detection and their team is working on it full time.

But the experiment was still interesting. Can you point a camera at any screen and interact with it using only computer vision and cursor automation? The answer is yes.

Read the code