Marketshare: According to what/who/how? #190
Separating this from #174, though I expect the best answer to 174 could rely on it.

In fact, countless decisions are made (even, for example, about selling ads or default search) in large part based on how many users a browser has. So we wind up with lots of people trying their honest best to serve users and make good decisions, but with no real hope of knowing. Without facts that we can agree on and trust, it seems impossible for anyone to make good decisions here.

As it relates to Baseline in particular, there are almost countless ways one could draw the baseline, but discussing whether any of them are really 'good' requires agreement on a trustworthy data source.

It's not clear to me that such a thing really exists. What we cite today, as I understand it, involves trackers, "popular sites" lists, and UA string inspection (which is literally full of lies, especially for browsers that would be undercounted, and especially on popular sites). Most of those are things that I personally would like less of, so I don't think these methods are going to get more useful, only less. I could be wrong! I would love to understand how, if so!

I'm not sure there has ever been a truly great way to know this. Of course, a browser could collect a fairly simple data point itself, sending a ping once a week or so when it is used, but then who would believe a browser self-reporting? I also think it's important that we have a way to measure the "outliers", not just how many Chrome, Safari, and Firefox versions there are. Maybe an "everything else" bucket is OK, but it would be great to have a way to really count those.

Comments
On UA strings: at least Chrome, Edge, Firefox, Opera, Safari, Samsung Internet, and UC Browser can be distinguished by their UA strings. AIUI Brave and Vivaldi cannot be distinguished from Chrome. I think the hardest problem here is deciding which sites' logs count as representative of the global population. Wikipedia comes to mind, but is it alone representative? If we had stats from many big sites, a general picture would emerge, but could we turn it into concrete numbers that would be acceptable for use in the definition of which features are broadly available? It seems fraught, but not impossible.
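For illustration, here is a rough sketch of the token-based classification this implies, written as SQL in the spirit of the queries later in this thread. The `requests` table and `ua` column are hypothetical, and the match order matters because Chromium derivatives also carry a `Chrome/` token (which is also why Brave and Vivaldi land in the Chrome bucket):

```sql
-- Hypothetical sketch: classify raw User-Agent strings by their
-- distinguishing tokens. The `requests` table and `ua` column are
-- assumed, not a real dataset. Order matters: Chromium derivatives
-- also contain "Chrome/", and Chrome's UA itself ends with "Safari/...".
SELECT
  CASE
    WHEN ua LIKE '%SamsungBrowser/%' THEN 'Samsung Internet'
    WHEN ua LIKE '%UCBrowser/%'      THEN 'UC Browser'
    WHEN ua LIKE '%Edg/%'            THEN 'Edge'
    WHEN ua LIKE '%OPR/%'            THEN 'Opera'
    WHEN ua LIKE '%Firefox/%'        THEN 'Firefox'
    WHEN ua LIKE '%Chrome/%'         THEN 'Chrome'   -- also Brave and Vivaldi
    WHEN ua LIKE '%Safari/%'         THEN 'Safari'
    ELSE 'Other'
  END AS browser,
  COUNT(*) AS hits
FROM requests
GROUP BY browser
ORDER BY hits DESC
```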
FWIW, even if we had this data, I think we ought not to use it as the direct input to what features are broadly available. An obvious problem is that usage fluctuates, and we don't want features to flicker in and out of Baseline as that happens. Some level of indirection is needed.

One possibility is per-browser models of release uptake and "decay" of old releases. It probably looks like S-curve adoption for new browser versions and exponential decay for old ones, and each browser vendor is in the best position to define this for their own browser. Given this, we could model when a feature is available to 95% of each browser's users. In practice that would mean >95% overall availability, but without having to use a weight (market share) for each browser.
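To make that concrete, here is one hypothetical way such a per-browser model could be written down. The logistic and exponential forms, the parameters k and lambda, and the 95% threshold are illustrative assumptions, not anything a vendor has published:

```latex
% Illustrative sketch only: share of a browser's users on release r at
% time t, with S-curve uptake after the release date t_r and exponential
% decay once the next release t_{r+1} ships. k and lambda are assumed.
\[
  \mathrm{share}_r(t) \;\propto\;
    \frac{1}{1 + e^{-k\,(t - t_r)}}
    \cdot e^{-\lambda \max(0,\; t - t_{r+1})}
\]
% A feature F first shipped in release r_0(F) counts as available once
% the combined share of releases r >= r_0(F) crosses the threshold.
\[
  t^{*}(F) \;=\; \min\Bigl\{\, t \;:\; \sum_{r \ge r_0(F)} \mathrm{share}_r(t) \;\ge\; 0.95 \Bigr\}
\]
```

Here the shares would be normalised to sum to 1 across a browser's releases at each time t; the point is only that vendor-published curves would let anyone compute when a feature crosses the threshold without needing per-browser market-share weights.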
I would argue that if this can easily happen, then that definition of Baseline is kind of broken, or at least not especially useful.
Is it conceivable that we could come up with a way that doesn't use specific regular sites or UA strings? Idk, I don't have a real suggestion, but both of those seem (and have always seemed) problematic to me. I feel like there aren't "representative sites", and browsers (especially the ones that don't control an overly large market share) sometimes (more often than I realized) have to lie in their UA strings to popular sites.
You can test actual features in a client-side script and gather stats on the passes/failures of that. Each test would be minimal. Distribution could be an iframe that anyone can add to their sites. It wouldn't be unbiased, but it would have different biases that are maybe less bad.
Yes, this is why I don't consider the seemingly simple "available to 95% of users" an option. (Nobody has proposed it, I'm not arguing against anyone here.) The current definition of "2+ major releases" has other downsides, but not flip-flopping.
I feel like we're understanding that differently somehow. I was suggesting that we'd have to define it in such a way, still using real numbers, that it didn't flip-flop easily. But actually reversing isn't necessarily a bug at some point, right? It could happen over time even with the releases-based version, and if it did, we should admit that.
Are you thinking of features that are removed from the platform, or are there other cases where a widely supported feature is no longer widely supported? The only condition I can think of is when a browser without support gains a lot of users fast, but is there a concrete case?
I guess it can (very rarely) happen when a feature that had shipped in all major browsers is removed, in which case yes, we'd have to remove the Baseline indicator for that feature. It's much harder when it comes to the "Baseline 24" named feature set, which is supposed to be a fixed feature set, discussed in #176.

Re market share: browser versions are only proxies for what is, in the end, market share, and I agree that we're lacking reliable data on that. Maybe a first step would be to document the current state and the related issues? And maybe there are multiple problems to solve here: market share per browser (which might be impossible in the end), but also versions used as a share of a given browser, as @foolip said, which might be much more tractable.
Maybe it shouldn't be a fixed set? There's a nuance between "Passes baseline requirements since 2024" and "Passes baseline requirements of 2024".
Yes, that's definitely one way to handle it, and maybe the only real option, but it removes some of the utility that comes from a fixed set. Let's continue this conversation in #176.
As far as I know, the most widely used data source for browser version market share is StatCounter: https://gs.statcounter.com/browser-version-market-share. This is the data source that caniuse.com uses. Browserslist, in turn, pulls its data from caniuse.com, so it's StatCounter all the way down. But that's why Browserslist and caniuse allow you to submit your own data from Google Analytics.
Is anyone aware of any large web properties that make their browser stats publicly available?
https://radar.cloudflare.com/adoption-and-usage has some bits of info.
@JakeChampion we're interested in stats on browser version usage and the uptake of new versions. As far as I know, no stats are gathered for polyfill.io.
That's correct.
As @romainmenke mentioned, Browserslist doesn't provide per-version data for mobile browsers (not sure why), so caniuse.com doesn't use that as the source. @Fyrd would you mind sharing where the mobile usage data on caniuse.com comes from?
The issue about mobile Android Chrome versions is filed here: Fyrd/caniuse#3518. As I said:
Last year Akamai launched RUMArchive, a public BigQuery dataset of their traffic (anonymised), including the browser and version used. For example, this query (1.94 GB, so pretty cheap) gets browser version usage for all of April:

```sql
SELECT DEVICETYPE, USERAGENTFAMILY, USERAGENTVERSION, SUM(BEACONS) AS BEACONCOUNT
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE date >= '2023-04-01' AND date < '2023-05-01'
GROUP BY DEVICETYPE, USERAGENTFAMILY, USERAGENTVERSION
ORDER BY BEACONCOUNT DESC
```

Which gives this result:
Full data for that query is in this spreadsheet: https://docs.google.com/spreadsheets/d/14KiJdLG5iEtYoUp1E_mpLBxHBAsvUPbOZlNK6RKj6H8/edit#gid=1816182137. Probably best to reach out to @nicjansma if you want more details (or to verify my query!).
@tunetheweb that's very cool, thank you! Pardon my clumsy SQL, but I managed to query the version breakdown of Chrome and Safari mobile:

```sql
WITH versions AS (
  SELECT USERAGENTVERSION, SUM(BEACONS) AS VERSIONBEACONS
  FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
  WHERE date >= '2023-04-01' AND date < '2023-05-01'
    AND USERAGENTFAMILY = 'Chrome Mobile' # or 'Mobile Safari'
  GROUP BY USERAGENTVERSION
),
total AS (
  SELECT SUM(VERSIONBEACONS) AS TOTALBEACONS FROM versions
)
SELECT USERAGENTVERSION, VERSIONBEACONS,
       100 * VERSIONBEACONS / TOTALBEACONS AS PERCENTAGE
FROM versions, total
ORDER BY VERSIONBEACONS DESC
```

The top 10 rows for Chrome Android:
And Safari iOS:
According to this data, then, Chrome 111+112 make up 90%, and Chrome 106-112 make up 95%. For Safari iOS, 15+16 make up 95%, and I've filed rum-archive/rum-archive#16 about minor versions.
Counting how many releases are needed to cover 95% of users on a bunch of browsers, according to the above RUMArchive data:
I don't know anything about what kinds of users are over- and underrepresented in this data, but it's a data point at least.
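For reproducibility, a count like that could be computed directly in BigQuery. The following is an untested sketch against the same RUMArchive table as the queries above; note that it greedily accumulates each browser's most-used versions rather than a contiguous release range, so it gives a lower bound on the number of releases needed:

```sql
-- Untested sketch: how many versions of each browser family are needed
-- before their combined beacon share first reaches 95%.
WITH versions AS (
  SELECT USERAGENTFAMILY, USERAGENTVERSION, SUM(BEACONS) AS VERSIONBEACONS
  FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
  WHERE date >= '2023-04-01' AND date < '2023-05-01'
  GROUP BY USERAGENTFAMILY, USERAGENTVERSION
),
ranked AS (
  SELECT
    USERAGENTFAMILY,
    -- Running share of this family's beacons, most-used versions first.
    SUM(VERSIONBEACONS) OVER (
      PARTITION BY USERAGENTFAMILY ORDER BY VERSIONBEACONS DESC
    ) / SUM(VERSIONBEACONS) OVER (PARTITION BY USERAGENTFAMILY)
      AS CUMULATIVESHARE
  FROM versions
)
SELECT
  USERAGENTFAMILY,
  -- Versions whose running share is still below 95%, plus the one that
  -- pushes it over the threshold.
  COUNTIF(CUMULATIVESHARE < 0.95) + 1 AS VERSIONSFOR95PCT
FROM ranked
GROUP BY USERAGENTFAMILY
ORDER BY VERSIONSFOR95PCT DESC
```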
In dfabulich/baseline-calculator#7 I've reported that something seems quite strange about the Firefox version breakdown that caniuse has, which I believe comes from StatCounter. I think a useful exercise (for all browsers) would be to compare the version breakdown between sources (RUMArchive and StatCounter, currently) and look for differences in the shape of the distribution, in particular in how fat the long tail is. This makes a huge difference to any availability calculation, and has a bigger impact the closer to 100% we want to get.
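As a starting point for such a comparison, here's a rough, untested sketch of one way to quantify the tail on the RUMArchive side: the share of beacons held by everything outside each browser's five most-used versions (the top-5 cutoff is an arbitrary assumption):

```sql
-- Untested sketch: beacon share outside each family's top 5 versions,
-- as a crude measure of how fat the long tail is.
WITH versions AS (
  SELECT USERAGENTFAMILY, USERAGENTVERSION, SUM(BEACONS) AS VERSIONBEACONS
  FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
  WHERE date >= '2023-04-01' AND date < '2023-05-01'
  GROUP BY USERAGENTFAMILY, USERAGENTVERSION
),
ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY USERAGENTFAMILY ORDER BY VERSIONBEACONS DESC
    ) AS VERSIONRANK
  FROM versions
)
SELECT
  USERAGENTFAMILY,
  SUM(IF(VERSIONRANK > 5, VERSIONBEACONS, 0)) / SUM(VERSIONBEACONS) AS TAILSHARE
FROM ranked
GROUP BY USERAGENTFAMILY
ORDER BY TAILSHARE DESC
```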
Keep in mind that RUMArchive has usage data, not users. Each user has multiple hits.
https://en.wikipedia.org/wiki/Usage_share_of_web_browsers has some good information. From there I found https://analytics.usa.gov/, which at first glance seems like it should be representative of the USA, but the site itself says that 97.7% of traffic is international, which seems strange. It does say "Visitor Locations Right Now" and also "Realtime data may not be accurate", so maybe it's a temporary hiccup.
I've taken a look at the StatCounter data (via caniuse) in a similar way to #190 (comment), to see how many versions are needed to reach 95% of each browser individually, directly from the data and ignoring caniuse features in the analysis. I've sent dfabulich/baseline-calculator#10 and created a spreadsheet to explore, and found that you need these version ranges to get to 95%:
For Chrome and Firefox, these ranges are much wider than in #190 (comment). I don't have a theory as to why, and I don't know which data source is closer to the truth.
I'm going to close this now-historic issue relating to writing the definition of Baseline, but I'll try to recap a little bit: ultimately, the Baseline definition did not rely on market share directly, though we used discussions like this one to come up with a definition of Baseline (widely available) that covers an overwhelming majority of browser usage share, no matter how you make that measurement.

There's probably more interesting work to be done in this area, but I don't know of any plans or active work, with one exception: RUMArchive reports on Baseline's current usage share at https://rumarchive.com/insights/#baseline.

All that said, if I've missed something and there's still interest in this topic, please ask and I'll reopen it. Thank you!