This article discusses the challenges of analyzing data that follows a Power Law distribution and presents a technique called the “Log-Log approach” to detect Power Laws in real-world data. It also introduces the Maximum Likelihood method as a more mathematically sound approach to estimating the parameters of a Power Law distribution. The article provides example code using the powerlaw Python library to fit Power Laws to social media data and compares the results to alternative distributions. The author concludes that Power Laws can be a useful framework for analyzing certain types of data, but caution should be exercised in interpreting the results.
Breaking down a Maximum Likelihood-based approach with example code
Introduction
In this article, we will discuss how to detect Power Laws from real-world data using a Maximum Likelihood-based approach. We will provide a concrete example using social media data.
Power Laws and Gaussian Distributions
Power Laws and Gaussian distributions have very different statistical properties. In a Power Law, rare, extreme events dominate aggregate statistics; in a Gaussian, they do not. As a result, standard statistical tools such as the mean and regression can give misleading results when applied to Power Law data.
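To make the contrast concrete, the following is a minimal sketch using only the Python standard library. It measures what share of a sample's total comes from its largest 1% of values. The helper name top_share, the sample size, and the distribution parameters are illustrative assumptions, not values from the article.

```python
import random

def top_share(values, fraction=0.01):
    """Share of the total contributed by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Gaussian sample; mean 100 and standard deviation 15 are arbitrary choices
gaussian = [rng.gauss(100, 15) for _ in range(n)]

# Pareto (Power Law) sample with x_min = 1 and tail index 1.2,
# i.e. a PDF proportional to x^(-2.2)
pareto = [rng.paretovariate(1.2) for _ in range(n)]

print(f"Top 1% share of total, Gaussian: {top_share(gaussian):.3f}")
print(f"Top 1% share of total, Pareto:   {top_share(pareto):.3f}")
```

In the Gaussian sample the top 1% of values should account for only slightly more than 1% of the total, while in the Pareto sample a handful of extreme values should account for a large share of it. This is why summary statistics such as the mean behave so differently in the two cases.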
The Log-Log Approach
The Log-Log approach is a popular way to fit a Power Law to real-world data. It involves taking the logarithm of the Power Law’s probability density function (PDF), which transforms it into a linear equation. If we generate a histogram of the data and plot it on log-log axes, points that fall roughly on a straight line suggest a Power Law, and the slope of that line gives an estimate of the exponent.
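Concretely, taking logs of p(x) = C x^(-alpha) gives log p(x) = log C - alpha log x, a straight line with slope -alpha. The sketch below is one possible implementation of the idea, not the article's own code: it bins the data in log-spaced bins, estimates the density in each bin, and fits a least-squares line. The function name loglog_slope, the number of bins, and the synthetic Pareto sample are assumptions made for illustration.

```python
import math
import random

def loglog_slope(data, n_bins=20):
    """Least-squares slope of log(density) against log(x), using log-spaced bins."""
    lo, hi = min(data), max(data)
    log_lo, log_range = math.log(lo), math.log(hi / lo)
    # Log-spaced bin edges between the smallest and largest observation
    edges = [lo * (hi / lo) ** (i / n_bins) for i in range(n_bins + 1)]

    counts = [0] * n_bins
    for x in data:
        i = int((math.log(x) - log_lo) / log_range * n_bins)
        counts[min(i, n_bins - 1)] += 1  # the maximum goes in the last bin

    xs, ys = [], []
    for i, c in enumerate(counts):
        if c == 0:
            continue  # an empty bin has no defined logarithm
        width = edges[i + 1] - edges[i]
        centre = math.sqrt(edges[i] * edges[i + 1])  # geometric bin centre
        xs.append(math.log(centre))
        ys.append(math.log(c / (len(data) * width)))  # log of estimated density

    # Ordinary least squares for the slope
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
# Synthetic Pareto data: tail index 1.5, so the PDF exponent alpha is 2.5
sample = [rng.paretovariate(1.5) for _ in range(50_000)]
slope = loglog_slope(sample)
print(f"Estimated alpha from log-log slope: {-slope:.2f} (true value 2.5)")
```

The estimate depends on the binning choices and on how sparsely populated the bins in the far tail are, so the recovered exponent is only approximate.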
Limitations of the Log-Log Approach
The Log-Log approach has several limitations: slope estimates can carry systematic errors, regression errors can be hard to estimate, a fit can look good even when the data does not follow a Power Law, and the fitted function may not satisfy the basic conditions for a probability distribution.
The Maximum Likelihood Approach
The Maximum Likelihood approach is a mathematically sound method for inferring the best parameters for a model given some data. It involves obtaining a likelihood function, which quantifies the probability of the data given a particular model, and maximizing the likelihood with respect to the model parameters.
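For a continuous Power Law with a known lower cutoff x_min, maximizing the likelihood has a closed-form solution: alpha_hat = 1 + n / sum(ln(x_i / x_min)), with an approximate standard error of (alpha_hat - 1) / sqrt(n) for large n. The sketch below applies this formula to synthetic data. It assumes x_min is known in advance, which is a simplification, and the function name power_law_mle is hypothetical.

```python
import math
import random

def power_law_mle(data, x_min):
    """Closed-form MLE of alpha for a PDF proportional to x^(-alpha), x >= x_min.

    Applies to the continuous case, with x_min assumed known.
    """
    tail = [x for x in data if x >= x_min]  # only the tail enters the fit
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)  # large-sample approximation
    return alpha_hat, std_err

rng = random.Random(42)
# Synthetic Pareto data: tail index 1.5, so the true PDF exponent is 2.5
sample = [rng.paretovariate(1.5) for _ in range(10_000)]
alpha_hat, std_err = power_law_mle(sample, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f} (true value 2.5)")
```

Unlike the histogram-based fit, this estimate needs no binning and comes with a simple error estimate. In practice x_min is usually unknown and has to be estimated as well.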
Example Code: Fitting Power Laws to Social Media Data
To demonstrate the approach, we will use the powerlaw Python library to determine if data from various social media channels follow a Power Law distribution. We will generate artificial data from Pareto and Log Normal distributions and fit a Power Law to each sample. We will also apply the approach to real-world data, including monthly followers gained on Medium, earnings from YouTube videos, and daily impressions on LinkedIn posts.
Conclusion
Detecting Power Laws in real-world data can help avoid incorrect analyses and misleading conclusions.