A store rating below four stars is not a cosmetic problem. For most consumer categories, 4.0 is the line at which a listing stops being judged on its merits and starts being filtered on its number — and below it, every install gets more expensive. We have written elsewhere about why the threshold behaves the way it does. This is the other half: what the apps around that line actually look like when you read the data instead of the intuition.
So we took a sample of 100 consumer apps on Google Play — across finance, tools, productivity, travel, communication, education, shopping and social — and looked only at what is publicly visible: the rating, the size of the review base, and the themes inside the negative reviews. No client data, no names. Just the public storefront, aggregated.
1 · The danger zone is quiet
We built this sample to study apps near the line, so the honest caveat comes first: roughly three in four apps in the sample sat below 4.0, and we are not claiming the same of the store as a whole. What is worth noticing is the shape. Among that struggling majority, almost none were in crisis territory. They were not at two stars with users in revolt. They sat in a tight band between 3.5 and 3.9: high enough that no dashboard turns red, low enough that the listing is being taxed on every visit.
That is the part that makes sub-4.0 drift expensive. A 2.5-star app knows it has a problem; someone owns it. A 3.7-star app looks broadly fine in every internal review, and the cost — fewer of the people who see the listing choosing to install — never appears as a line item. The median app in our sample was a 3.9: one tenth of a star from the line, and paying for it daily.
2 · A large review base does not protect you
The common assumption is that scale is a buffer — that an app with hundreds of thousands of ratings has earned enough goodwill to sit comfortably. The data did not support it. Apps with 100,000 or more reviews averaged 3.90; apps with smaller bases averaged 3.85. A rounding difference. Size bought no meaningful lift.
If anything, scale cuts the other way. A rating is a slow-moving average, and the larger the base, the more inertia it carries: on a hundred thousand reviews, a year of fresh five-star sentiment barely moves the number, because every new review is diluted by the hundred thousand before it. Big review bases do not float above the line on their own. They sit exactly where their history put them, and they are the hardest to move once they slip — which is precisely why the apps that most need a deliberate intervention are often the largest ones.
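The dilution described above is plain arithmetic. What follows is a minimal sketch in Python, assuming the displayed rating is an unweighted mean of every rating ever left; a storefront may weight recent ratings differently, which this ignores. The function names and input figures are illustrative and are not drawn from the sample.

```python
import math


def rating_after_new_reviews(current_avg, base_count, new_count, new_avg=5.0):
    """Unweighted lifetime mean after adding new_count ratings averaging new_avg.

    Assumption: the rating is a simple mean over all ratings ever left.
    """
    total_stars = current_avg * base_count + new_avg * new_count
    return total_stars / (base_count + new_count)


def reviews_needed(current_avg, base_count, target_avg, new_avg=5.0):
    """Smallest number of new ratings averaging new_avg that reaches target_avg."""
    if current_avg >= target_avg:
        return 0
    if new_avg <= target_avg:
        raise ValueError("new ratings must average above the target")
    # Solve (current_avg * B + new_avg * n) / (B + n) >= target_avg for n.
    needed = base_count * (target_avg - current_avg) / (new_avg - target_avg)
    return math.ceil(needed)


if __name__ == "__main__":
    # Hypothetical app: 3.7 stars on a base of 100,000 ratings.
    print(round(rating_after_new_reviews(3.7, 100_000, 5_000), 2))
    print(reviews_needed(3.7, 100_000, 4.0))
```

Under that assumption, an app at 3.7 on 100,000 ratings that collects 5,000 new five-star ratings lands at about 3.76, and it would need about 30,000 of them to reach 4.0. The same app on a base of 1,000 ratings would need about 300.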
3 · What is actually dragging the number
Reading the negative reviews by theme, the drivers were not evenly spread. Averaged across the sample, two themes dominated the complaint mix — and they were not the ones teams usually reach for first.
Narrowing to the apps that actually sat below 4.0, the picture sharpens. For 36% of them, more than a third, the single largest complaint was technical: crashes and bugs. Pricing led for another 23%, and missing features for 19%. Together, those three were the top complaint for 78% of the struggling apps, roughly four in five.
The useful read is not "fix bugs." It is that a sub-4.0 rating is almost always a few specific, identifiable signal clusters — not a diffuse sense that users dislike the app. The number looks like a vague verdict. Underneath, it is usually two or three nameable problems carrying most of the weight, and a long tail that only looks loud.
What it means
None of this is a recipe. The point of reading the data this way is the opposite of a recipe: it is that the rating is a lagging indicator of specific causes, and the causes are legible if you separate them from the noise. The apps in this sample were not bad products. Most were competent apps a few tenths of a star below where installs convert, carrying a review base too large to drift back up on its own, with a complaint mix concentrated enough to act on.
That is the whole shape of the problem RIENVOR works on: a slow number with fast consequences, sitting just below a line, owned by no one — and recoverable once someone reads which complaints are actually moving it and which only look like they should.
Method & limits
Sample: 100 consumer apps on Google Play, drawn across eight categories and selected to sit near or below the 4.0 line, so it is a study of apps around the threshold, not a representative census of the store. Figures are from publicly visible storefront data (rating, review-base size, and complaint themes derived from public reviews) captured in mid-2026. Complaint themes are classified into eight buckets. Sample-wide percentages are the average share of negative reviews per theme; the sub-4.0 percentages are the share of those apps for which a theme was the single largest complaint. No client or engagement data is used, and no individual app is named. Ratings move over time; these are a point-in-time read.
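The two kinds of percentage used in this piece, the average share of negative reviews per theme and the share of sub-4.0 apps led by each theme, can be sketched as follows. This is an illustration only, not the pipeline behind the figures: the record layout, the theme labels, and the three sample apps are all made up, and the negative reviews are assumed to have been bucketed by theme already.

```python
from collections import Counter


def average_theme_share(apps):
    """Mean share of negative reviews per theme, averaged across apps.

    Each app counts equally, however many negative reviews it has.
    """
    totals = Counter()
    counted = 0
    for app in apps:
        n_negative = sum(app["themes"].values())
        if n_negative == 0:
            continue  # no negative reviews, so no shares to average
        counted += 1
        for theme, count in app["themes"].items():
            totals[theme] += count / n_negative
    if counted == 0:
        return {}
    return {theme: total / counted for theme, total in totals.items()}


def top_theme_share(apps, threshold=4.0):
    """Share of apps rated below threshold for which each theme is the largest."""
    below = [
        app for app in apps
        if app["rating"] < threshold and sum(app["themes"].values()) > 0
    ]
    if not below:
        return {}
    # On a tie, max() keeps the theme listed first for that app.
    tops = Counter(max(app["themes"], key=app["themes"].get) for app in below)
    return {theme: n / len(below) for theme, n in tops.items()}


# Made-up records for illustration; counts are negative reviews per theme.
sample_apps = [
    {"rating": 3.7, "themes": {"crashes_bugs": 6, "pricing": 3, "missing_features": 1}},
    {"rating": 3.9, "themes": {"crashes_bugs": 2, "pricing": 6, "missing_features": 2}},
    {"rating": 4.2, "themes": {"crashes_bugs": 1, "pricing": 1, "missing_features": 8}},
]

if __name__ == "__main__":
    print(average_theme_share(sample_apps))
    print(top_theme_share(sample_apps))
```

The two measures answer different questions. The first describes the overall complaint mix. The second says which single problem leads for each struggling app, so a theme can rank high on one and low on the other.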