How Do Websites Track User Activity?

10 min read

Explore how websites track users with cookies, pixels, fingerprinting, and server logs—what’s collected, why it’s used, and how to stay compliant.


Simul Sarker

Founder & Product Designer of DataCops

Last Updated

May 17, 2026

A website tracks you in about a dozen ways, and roughly a third of the time it gets you wrong anyway. That second number is the one nobody puts in the guides. They love to explain the cookies, the pixels, the fingerprinting. They go quiet on the part where the tracking misses real people and counts fake ones.

So this is two posts in one. If you are a curious visitor wondering what a site knows about you, you get the honest mechanics. If you are a marketer who runs these trackers for a living, you get the part that should worry you: the data your tools collect is not a recording of real users. It is a distorted simulation - missing a quarter to a third of your actual humans, and salted with bots pretending to be humans.

This is not a "here is how cookies work" post. It is a "how accurately does any of this work" post.

DataCops comes up later as the architectural fix for that accuracy gap - first-party collection that filters and separates the data at the source. We will get there. First, the mechanics.

Quick stuff people keep asking

What methods do websites use to track behavior? The main ones: cookies (small files stored in your browser), tracking pixels (tiny invisible images that report back when loaded), JavaScript tags (code that watches clicks, scrolls, form fills), session recording and heatmaps (replays of your actual movement), server logs, and device fingerprinting (identifying you by your hardware and browser configuration). Most sites run several at once through a tag manager.

Do websites track you without cookies? Yes. Device fingerprinting needs no cookie at all - it identifies you by screen size, fonts, browser version, GPU, timezone, and dozens of other small signals combined into a near-unique ID. Server-side tracking and IP-based methods also work cookie-free. Killing cookies does not make you invisible.
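
To make the fingerprinting idea concrete, here is a minimal sketch, assuming the site has already gathered a handful of signals from the browser. The signal names and values are hypothetical, and real fingerprinting libraries use far more inputs and fuzzier matching than a plain hash.

```python
import hashlib
import json


def fingerprint(signals: dict) -> str:
    """Combine browser/device signals into one stable identifier."""
    # Sort keys so the same signals always serialize the same way.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical signal values, for illustration only.
visitor_a = {
    "screen": "2560x1440",
    "timezone": "Europe/Berlin",
    "browser": "Firefox 126",
    "gpu": "hypothetical-gpu-model",
    "fonts": ["Arial", "Fira Sans", "Noto Serif"],
}
# The same device on a later visit, with no cookie involved.
visitor_a_again = dict(visitor_a)
# A different device that differs in a single signal.
visitor_b = {**visitor_a, "timezone": "America/Chicago"}

print(fingerprint(visitor_a) == fingerprint(visitor_a_again))  # True
print(fingerprint(visitor_a) == fingerprint(visitor_b))        # False
```

No single signal identifies anyone. The combination does, which is why clearing cookies changes nothing here.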

How does a tracking pixel work? A pixel is a 1x1 transparent image embedded in a page or email. When your browser loads it, it sends a request to the tracking server. That request alone - the fact you loaded it, plus your IP, device, and timestamp - is the data. The "image" is just the delivery mechanism. Meta and Google ad pixels rest on the same request mechanism, though today they usually ship as JavaScript snippets layered on top of it.
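
Seen from the server side, the pixel is almost boring. The sketch below is not any vendor's implementation; it is a hypothetical handler showing that the useful data comes from the request itself, while the response is a throwaway image. The function name, the `c` query parameter, and the record fields are all assumptions.

```python
import base64
from datetime import datetime, timezone

# A 1x1 transparent GIF: the "image" the browser thinks it is fetching.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(client_ip: str, headers: dict, query: dict):
    """Log what the request reveals, then return the image response."""
    record = {
        "ip": client_ip,
        "user_agent": headers.get("User-Agent", ""),
        "referer": headers.get("Referer", ""),
        "campaign": query.get("c", ""),  # hypothetical query parameter
        "seen_at": datetime.now(timezone.utc).isoformat(),
    }
    response_headers = {
        "Content-Type": "image/gif",
        # Discourage caching so each view triggers a fresh request.
        "Cache-Control": "no-store",
    }
    return record, response_headers, PIXEL_GIF


record, response_headers, body = handle_pixel_request(
    client_ip="203.0.113.7",  # documentation-range address
    headers={
        "User-Agent": "ExampleBrowser/1.0",
        "Referer": "https://example.com/pricing",
    },
    query={"c": "spring-sale"},
)
print(record["ip"], record["referer"], response_headers["Content-Type"])
```

Nothing in the response matters to the tracker. By the time the browser renders the invisible image, the record has already been written.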

What is session recording and is it legal? Session recording captures your real interaction - mouse movement, clicks, scrolling, sometimes keystrokes - and lets the site replay it. It is legal in most places if the site discloses it, gets consent where required, and masks sensitive input like passwords and payment fields. Recording without consent in the EU, or capturing keystrokes in form fields, is where sites get into legal trouble.

How do websites track you across devices? Two ways. Deterministic - you log in with the same account on your phone and laptop, so the site knows it is you. Probabilistic - matching shared signals (same IP, same behavior patterns, same location) to guess that two devices are one person. Logged-in identity is the strong one.
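
The difference between the two approaches is easier to see in code. This is a rough sketch under stated assumptions: the `DeviceSession` fields, the weights, and the 0.7 threshold are all made up for illustration, and production identity graphs are considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceSession:
    device_id: str
    account_id: Optional[str]  # set only when the visitor is logged in
    ip: str
    city: str
    active_hours: frozenset  # hours of day (0-23) when the device is seen


def same_person(a: DeviceSession, b: DeviceSession, threshold: float = 0.7):
    """Return (verdict, method). Weights and threshold are illustrative."""
    # Deterministic: both sessions carry the same login.
    if a.account_id and a.account_id == b.account_id:
        return True, "deterministic"

    # Probabilistic: add up weak shared signals into a score.
    score = 0.0
    if a.ip == b.ip:
        score += 0.4
    if a.city == b.city:
        score += 0.2
    union = a.active_hours | b.active_hours
    if union:
        score += 0.4 * len(a.active_hours & b.active_hours) / len(union)
    return score >= threshold, "probabilistic"


phone = DeviceSession("phone", "user-42", "198.51.100.4", "Lyon", frozenset({8, 12, 21}))
laptop = DeviceSession("laptop", "user-42", "192.0.2.9", "Paris", frozenset({10, 15}))
tablet = DeviceSession("tablet", None, "198.51.100.4", "Lyon", frozenset({8, 12, 21, 22}))
kiosk = DeviceSession("kiosk", None, "192.0.2.50", "Lyon", frozenset({3}))

print(same_person(phone, laptop))  # (True, 'deterministic')
print(same_person(phone, tablet))  # (True, 'probabilistic')
print(same_person(phone, kiosk))   # (False, 'probabilistic')
```

The login match needs no scoring at all. The probabilistic match is a guess with a threshold, which is why it is the weaker of the two.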

Can websites track users with ad blockers on? Partly. Ad blockers and privacy browsers block known tracker domains - so the pixels and third-party JavaScript fail to load. But first-party server-side tracking, server logs, and IP capture still work. Blockers reduce tracking. They do not end it. And as you will see, the blocked share is exactly where the accuracy problem starts.

What data do websites collect about visitors? Commonly: pages viewed, time on page, clicks, scroll depth, referrer (where you came from), device and browser, approximate location from IP, and - if you submit anything - whatever you typed. Logged-in users get tied to their account history.

How does Google Analytics track activity? GA4 loads a JavaScript tag in your browser. The tag fires events - page views, scrolls, clicks, conversions - and sends them to Google's servers, identifying the session with a first-party cookie or a generated ID. It is a client-side tag, which means it lives or dies in the visitor's browser. That dependency is the whole problem.

How accurately does any of this actually work

Now the part the vendor guides skip. Almost every tracking method above - cookies, pixels, JavaScript tags, GA-style analytics - runs in the visitor's browser. The browser is no longer neutral ground. Two failures happen there, and they distort the data in opposite directions.

Failure one: real users go missing. A real and growing share of people run uBlock Origin, run Brave with shields up, or sit behind networks that filter tracker domains. For those visitors, the pixels and third-party tags simply never load. They browse your site, they read, they buy - and your analytics records none of it. That leaves 25 to 35 percent of genuine human activity invisible. Worse, it is not random invisibility. The people most likely to block trackers are the privacy-aware, often higher-value segment. You are systematically blind to a specific kind of customer.

Failure two: fake users get counted. Of the traffic that does get tracked, a serious slice is not human. Bots, scrapers, crawlers, automated agents, click-fraud scripts - modern ones execute JavaScript just like a browser. They trip your tags. They fire your pixels. They land in your analytics as "sessions" and "users." On a typical site, 24 to 31 percent of collected events are synthetic. Your "users" report is part real people, part machines.

Here is what that looks like in real life. A company called PillarlabAI built a honeypot - a signup flow designed as bait for automated traffic. Three thousand signups came in. They looked like users. They would have shown up in any analytics tool as 3,000 new sessions, 3,000 conversions. When PillarlabAI took the data apart, 77 percent of it was fraudulent. And 650 of those "signups" traced back to a single device fingerprint. One machine, wearing 650 faces, and every analytics platform on earth would have counted it as 650 different interested humans.

Put the two failures together. Your analytics is missing a quarter to a third of your real users, and a quarter to a third of what it does record is fake. These do not cancel out. They corrupt. You are not looking at a recording of user behavior. You are looking at a distorted simulation - and making decisions on it.
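
A quick back-of-the-envelope calculation shows why the two failures do not cancel. The sketch below assumes a true audience of 10,000 humans, a 30 percent blocked share, and a 27 percent bot share of collected sessions. Those inputs are illustrative picks from inside the ranges quoted above, not measurements.

```python
def reported_sessions(real_humans: int, blocked_share: float, bot_share: float):
    """Estimate what a client-side dashboard shows for a given true audience.

    blocked_share: fraction of real humans whose trackers never load.
    bot_share: fraction of *collected* sessions that are synthetic.
    """
    tracked_humans = real_humans * (1 - blocked_share)
    # Bots make up bot_share of the collected total, so humans are the rest.
    total_reported = tracked_humans / (1 - bot_share)
    bots = total_reported - tracked_humans
    return round(tracked_humans), round(bots), round(total_reported)


humans, bots, total = reported_sessions(10_000, blocked_share=0.30, bot_share=0.27)
print(humans, bots, total)  # 7000 2589 9589
```

The dashboard shows 9,589 sessions against a true audience of 10,000, which looks close enough to trust. But only 7,000 of those sessions are people. Three thousand real visitors are missing and 2,589 machines are standing in for them.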

It gets worse for anyone running ads. That contaminated data does not just sit in a dashboard. It gets pushed to Meta and Google to build lookalike audiences. So you are telling the ad platforms: find me more people like these users. Some of those users are bots. The platforms obligingly find you more bot-like traffic. Your return on ad spend slips, quarter after quarter, and the dashboard never explains why - because the dashboard is built from the same poisoned data. Garbage in, garbage optimized, garbage out.

The root cause is not that cookies or pixels are badly made. The root cause is structural: third-party tracking scripts, running in an environment the website does not control, scooping real humans and bots into one undifferentiated stream, with no filtering and no isolation before the data leaves the page.

What accurate tracking actually requires

The fix is to stop depending on the visitor's browser as the collection point, and to filter the stream before you trust it.

First-party, server-side collection. Instead of third-party scripts the blockers recognize, collection runs through a first-party endpoint on the website's own subdomain. Because it is the site's own infrastructure, blockers do not treat it as a foreign tracker, and collection is far more resilient. Much of that lost 25 to 35 percent comes back.

Filtering at the point of collection. Every incoming hit gets scored before it counts as a user. Is the IP a known datacenter range? Does the device fingerprint match 650 other "sessions"? Residential human or proxy? The bot gets flagged at the door - not discovered months later when someone finally audits the funnel.
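
Here is a minimal sketch of what scoring at the door could look like, assuming two checks only: a datacenter IP lookup and a count of sessions per fingerprint. The IP range, the threshold, and the scoring scheme are hypothetical; a real system would draw on an IP intelligence feed and many more signals.

```python
import ipaddress
from collections import Counter

# Hypothetical datacenter range; a real list would come from an IP intelligence feed.
DATACENTER_RANGES = [ipaddress.ip_network("198.51.100.0/24")]
MAX_SESSIONS_PER_FINGERPRINT = 5  # illustrative threshold


def score_hits(hits):
    """Attach a bot score and reasons to each hit before it counts as a user."""
    sessions_per_fp = Counter(h["fingerprint"] for h in hits)
    scored = []
    for hit in hits:
        reasons = []
        ip = ipaddress.ip_address(hit["ip"])
        if any(ip in net for net in DATACENTER_RANGES):
            reasons.append("datacenter_ip")
        if sessions_per_fp[hit["fingerprint"]] > MAX_SESSIONS_PER_FINGERPRINT:
            reasons.append("fingerprint_reuse")
        # Score is the fraction of checks that fired (two checks in this sketch).
        scored.append({**hit, "bot_score": len(reasons) / 2, "reasons": reasons})
    return scored


# One ordinary visitor, plus 650 "signups" sharing one fingerprint.
hits = [{"ip": "203.0.113.10", "fingerprint": "fp-human"}]
hits += [{"ip": "198.51.100.20", "fingerprint": "fp-farm"} for _ in range(650)]

scored = score_hits(hits)
flagged = [h for h in scored if h["bot_score"] > 0]
print(len(scored), len(flagged))  # 651 650
```

Each hit keeps its reasons alongside the score, so a human reviewing the data can see why something was flagged rather than taking the number on faith.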

Two tiers, separated at the source. This is the part that also keeps it legal. Anonymous session analytics - aggregate counts, no identity - are lawful in most jurisdictions and can generally be collected without consent, though the specifics vary and EU rules on storing or reading data on a visitor's device can still apply. "Reject all" on a cookie banner does not mean a site gets zero data; it means it should only get the anonymous tier. Identifiable, personal-level tracking is what needs real consent. An honest architecture splits those two streams at collection, so the anonymous picture stays complete and the identifiable data is properly gated.
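
The split can be sketched in a few lines. This is an illustration under assumptions, not legal guidance and not any product's actual schema: the field names are hypothetical, and which fields count as anonymous is a judgment each site has to make for its own jurisdiction.

```python
def split_event(event: dict, consent_given: bool):
    """Return (anonymous_record, identifiable_record_or_None)."""
    # Tier 1: aggregate-friendly fields only, nothing that identifies a person.
    anonymous = {
        "page": event["page"],
        "event": event["event"],
        "hour": event["timestamp"][:13],  # truncate ISO timestamp to the hour
    }
    # Tier 2: identity-level fields, released only with consent.
    identifiable = None
    if consent_given:
        identifiable = {
            **anonymous,
            "user_id": event.get("user_id"),
            "ip": event.get("ip"),
        }
    return anonymous, identifiable


event = {
    "page": "/pricing",
    "event": "page_view",
    "timestamp": "2026-05-17T14:32:10Z",
    "user_id": "u-42",
    "ip": "203.0.113.7",
}

anon, ident = split_event(event, consent_given=False)
print(anon)   # {'page': '/pricing', 'event': 'page_view', 'hour': '2026-05-17T14'}
print(ident)  # None
```

The point of doing this at collection is that the identifiable fields never enter the anonymous stream in the first place, so a "reject all" visitor still shows up in the counts without leaving a personal trail.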

That is what DataCops is built to do. First-party architecture on the site's own subdomain. Bot filtering at ingestion, checked against an IP database of more than 361.8 billion addresses. Two-tier isolation so anonymous analytics flow freely and identifiable data is consent-gated. Clean signal forwarded to Meta, Google, TikTok, and LinkedIn through CAPI.

To be straight about the limits: DataCops is a newer brand than the household analytics names, and SOC 2 Type II is in progress rather than finished. No system catches 100 percent of bots either - what a good one does is surface the context and the score so you can judge. That honesty is the point. A tool that promises perfect tracking is selling you the same illusion this article just took apart.

Decision guide

Just a curious visitor. A privacy browser or a good blocker cuts most pixel and third-party tracking. It will not stop fingerprinting or server-side logging. There is no full invisibility online - only less exposure.

Small site owner, light traffic. Standard client-side analytics is fine to start. Just know your numbers run low, and do not over-read small swings.

Marketer running paid ads. Your analytics is feeding the ad platforms. If it is contaminated, your audiences are too. Audit bot traffic before you trust another lookalike.

Running session recording or heatmaps. Disclose it, get consent where the law requires, mask sensitive fields. And remember bot sessions show up in replays as noise.

Care about decisions, not just dashboards. Move to first-party server-side collection with bot filtering. It is the only way the numbers describe real people.

You are not measuring users. You are measuring a guess.

Here is the mistake almost everyone makes. They ask "how do websites track users" and they stop once they understand the cookies and pixels. They treat a populated analytics dashboard as the truth. It is not the truth. It is an estimate with a quarter to a third of the real people missing and a quarter to a third of the visible "people" being machines.

So the better question - the one to actually sit with - is not how your site tracks users. It is how accurately. Of the "users" in your analytics this month, how many are real humans, and how would you even prove it? If you cannot answer that, you are not measuring your audience. You are measuring a guess, and steering a business on it.

