The Green Sheet Online Edition
January 25, 2016 • Issue 16:01:02
FTC takes on big data
A 50-page report released Jan. 6, 2016, by the Federal Trade Commission challenges business owners and consumers to evaluate the benefits and risks of data analytics. Big Data: A Tool for Inclusion or Exclusion? poses tough questions about how data is collected and analyzed and where it is stored at the end of its useful life. The report stated that big data's lifecycle typically spans four stages: collection, compilation and consolidation, analysis, and use.
"Big data's role is growing in nearly every area of business, affecting millions of consumers in concrete ways," said FTC Chairwoman Edith Ramirez. "The potential benefits to consumers are significant, but businesses must ensure that their big data use does not lead to harmful exclusion or discrimination."
The issues covered in the FTC report were initially explored in a workshop of the same title held Sept. 15, 2014, and have remained at the forefront of FTC efforts to enforce best practices in big data usage in conformance with the Fair Credit Reporting Act.
Following are topics covered in the 2014 workshop and subsequent 2016 report:
How are organizations using big data to categorize consumers?
- What benefits do consumers gain from these practices? Do these practices raise consumer protection concerns?
- What benefits do organizations gain from these practices? What are the social and economic impacts, both positive and negative, from the use of big data to categorize consumers?
- How do existing laws apply to such practices? Are there gaps in the legal framework?
Are companies appropriately assessing the impact of big data practices on low income and underserved populations? Should additional measures be considered?
Big data benefits, risks
The FTC solicited opinions from the public on data analytics trends in the business and private sectors, particularly in the areas of health care, education and credit scoring. The commission also cited numerous ways in which big data can help underserved communities improve access to services in health care, education, employment, and alternative forms of credit and nonbank financing.
The report also cautioned that inaccurate profiles and biases in data analytics and credit reporting can harm and marginalize individuals and groups, with consequences ranging from denial of credit and loss of privacy to cybercriminals targeting vulnerable consumers.
The FTC urges business owners to adhere to the Fair Credit Reporting Act, the FTC Act and the equal opportunity laws that govern the use of big data. The report provides guidance on how to assess levels of compliance with these laws.
The following four policy questions cited in the report are designed to help companies examine potential biases and determine their level of compliance with legal and ethical guidelines related to big data usage:
- How representative is your data set? Companies should consider whether their data sets are missing information about certain populations and take steps to address issues of underrepresentation and overrepresentation. For example, if a company targets services to consumers who communicate through an application or social media, it may be neglecting populations that are not as tech-savvy.
- Does your data model account for biases? Companies should consider whether biases are being incorporated at both the collection and analytics stages of big data's life cycle, and where biases exist, develop strategies to overcome them. For example, if a company's big data algorithm considers only applicants from top-tier colleges when making hiring decisions, it may be incorporating previous biases in college admission decisions.
- How accurate are your predictions based on big data? Companies should remember that while big data is very good at detecting correlations, it does not explain which correlations are meaningful. A prime example that demonstrates the limitations of big data analytics is Google Flu Trends, a machine-learning algorithm for predicting the number of flu cases based on Google search terms.
While the algorithm at first appeared to predict accurately where the flu was most prevalent, its estimates grew highly inaccurate over time. This may be because the algorithm failed to take certain variables into account. For example, it may not have accounted for the fact that people are more likely to search for flu-related terms when the local news runs a story on a flu outbreak, even if the outbreak occurred halfway around the world.
- Does your reliance on big data raise ethical or fairness concerns? Companies should assess the factors that go into an analytics model and balance the predictive value of the model with fairness considerations. For example, one company determined that employees who live closer to their jobs stay at these jobs longer than those who live farther away. However, another company decided to exclude this factor from its hiring algorithm because of concerns about racial discrimination, particularly since different neighborhoods can have different racial compositions.
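The first question, on representativeness, can be made concrete with a simple check. The following Python sketch compares each group's share of a sample against a population benchmark and flags gaps; the group labels and figures are invented for illustration, not drawn from the report.

```python
# Hypothetical representativeness check: compare each group's share of a
# sample against a known population benchmark. All labels and numbers here
# are invented for illustration.

population_share = {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}  # assumed benchmark
sample_counts = {"18-34": 620, "35-54": 290, "55+": 90}         # assumed app-user sample

total = sum(sample_counts.values())
for group, benchmark in population_share.items():
    share = sample_counts[group] / total
    # Flag groups whose sample share deviates from the benchmark by >10 points
    if abs(share - benchmark) > 0.10:
        status = "under" if share < benchmark else "over"
        print(f"{group}: sample {share:.0%} vs. population {benchmark:.0%} ({status}represented)")
```

In this invented sample, the tech-savvy youngest group dominates while older consumers are missed, echoing the app-and-social-media example above.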
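The Google Flu Trends lesson, that a strong correlation is not the same as a meaningful one, is easy to reproduce in miniature. In this Python sketch, all numbers are invented: search volume tracks local news coverage almost perfectly while barely tracking actual illness.

```python
# Hypothetical sketch of the correlation-vs-meaning pitfall: two series that
# merely share a trend can correlate strongly with no causal link. These
# numbers are invented; they are not the Google Flu Trends data.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

flu_searches = [120, 150, 210, 260, 330, 400]  # weekly "flu" queries (invented)
actual_cases = [44, 46, 43, 47, 44, 45]        # roughly flat case counts (invented)
news_stories = [2, 3, 5, 7, 9, 12]             # flu stories in local news (invented)

# Searches track news coverage, not illness:
print(f"searches vs. news:  r = {pearson(flu_searches, news_stories):.2f}")
print(f"searches vs. cases: r = {pearson(flu_searches, actual_cases):.2f}")
```

A model trained on the search signal alone would "predict" the news cycle, not the flu.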
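The commute-distance example can likewise be sketched as a deliberate design choice in code: a feature known to be predictive is excluded from the model's inputs on fairness grounds. The feature names and weights below are invented for illustration.

```python
# Hypothetical hiring-score sketch: commute distance is predictive (it has a
# weight) but is deliberately excluded as a potential proxy for neighborhood,
# and hence race. Feature names and weights are invented for illustration.

FAIRNESS_EXCLUDED = {"commute_miles"}

weights = {"years_experience": 0.6, "skills_score": 0.4, "commute_miles": -0.2}

def score_candidate(features):
    # Drop excluded features before scoring, even though a weight exists for them
    usable = {k: v for k, v in features.items() if k not in FAIRNESS_EXCLUDED}
    return sum(weights[k] * v for k, v in usable.items() if k in weights)

print(score_candidate({"years_experience": 5, "skills_score": 8, "commute_miles": 30}))
```

The exclusion list makes the trade-off explicit and auditable, rather than buried in the model.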