Measuring the Prevalence of Single Sign-On Providers
This project aims to understand how prevalent logins are on websites, as a step toward large-scale measurement of user-gated content on the Web.
We develop two techniques to measure the number of first-party and third-party login mechanisms, the latter supported by Single Sign-On (SSO), on the top 10K websites drawn from the Chrome User Experience Report (CrUX), and find that:
- 51% of the top 10K have a login, and more than half of those (30% of the top 10K) offer 3rd-party SSO login.
- The most popular SSO providers are Google, Facebook, and Apple. These three enable sign-in for 47% of all sites with login and 24% of the top 10K sites.
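The two denominators used above (all top 10K sites versus only sites with a login) are related by a simple division. The following is a minimal sketch of that arithmetic in Python, using only the rounded percentages quoted on this page, so results can differ by about a point from the exact figures in the paper. The function and constant names are illustrative and do not come from the project's code.

```python
def share_among_login_sites(share_of_top10k: float, login_share: float) -> float:
    """Convert a share of all top 10K sites into a share of sites that have a login."""
    if not 0 < login_share <= 1:
        raise ValueError("login_share must be in (0, 1]")
    return share_of_top10k / login_share


# Rounded figures quoted above (assumption: used as-is, not the raw counts).
LOGIN_SHARE = 0.51      # top 10K sites with any login
SSO_SHARE = 0.30        # top 10K sites offering 3rd-party SSO login
BIG_THREE_SHARE = 0.24  # top 10K sites with Google, Facebook, or Apple sign-in

sso_among_login = share_among_login_sites(SSO_SHARE, LOGIN_SHARE)
big_three_among_login = share_among_login_sites(BIG_THREE_SHARE, LOGIN_SHARE)

print(f"SSO among sites with login: {sso_among_login:.1%}")
print(f"Google/Facebook/Apple among sites with login: {big_three_among_login:.1%}")
```

With these rounded inputs, the first ratio comes out above one half and the second rounds to 47%, matching the statements in the list above.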
This page covers our research, code, and dataset.
Research Paper
We published our initial results at ACM IMC 2023 in Montréal, Canada:
@inproceedings{10.1145/3618257.3624841,
author = {Ardi, Calvin and Calder, Matt},
title = {The Prevalence of Single Sign-On on the Web: Towards the Next
Generation of Web Content Measurement},
year = {2023},
isbn = {9798400703829},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3618257.3624841},
doi = {10.1145/3618257.3624841},
abstract = {Much of the content and structure of the Web remains
inaccessible to evaluate at scale because it is gated by user
authentication. This limitation restricts researchers to examining only
a superficial layer of a website: the landing page or public,
search-indexable pages. Since it is infeasible to create individual
accounts across thousands of webpages, we examine the prevalence of
Single Sign-On (SSO) on the web to explore the feasibility of using a
few accounts to authenticate to many sites. We find that 58\% of the top
10K websites with logins are accessible with popular 3rd-party SSO
providers, such as Google, Facebook, and Apple, indicating that
leveraging SSO offers a scalable solution to access a large volume of
user-gated content.},
booktitle = {Proceedings of the 2023 ACM on Internet Measurement
Conference},
pages = {124--130},
numpages = {7},
keywords = {web measurement, web authentication, top lists, single
sign-on},
location = {Montreal QC, Canada},
series = {IMC '23}
}
The code and data used in the paper can be found at https://github.com/webmeasurements/imc2023-sso.
Documentation for the code (TODO) and dataset (TODO) will be found here.
Applications
TODO
Contributing
TODO
Contact
- Calvin Ardi (calvin@isi.edu)
- Matt Calder (mjc2317@columbia.edu)