Big Data Speaks Loudly and Carries a Big Stick

Jan 06 2015

“Speak softly and carry a big stick” connotes a policy of beginning gently but holding a decisive weapon in reserve. Big Data doesn’t do that. When Big Data ‘speaks,’ it tends to blurt out its conclusion and can have an immediate impact, deserved or undeserved, because Big Data is not based on careful statistical sampling and is not aimed at determining causation. Big Data correlates masses of good, bad and indifferent data, i.e., it can be ‘messy,’ and its correlations are not necessarily accurate for the question being asked. So what will happen when ‘the data’ indicates an outcome that people believe or act on because the ‘data says so’? That is the topic of this blog.

Consider (if only by analogy) the data-driven results of ‘civic apps’ that use a smart phone’s camera, global positioning system and truncated email functions to snap and send the locations of potholes, graffiti or other city ‘fix-it’ needs. Boston reportedly used such an app to allow residents to pinpoint potholes so that the city could devote its resources to repairing, instead of finding, potholes. The streets of higher income neighborhoods became passable again, much to the dismay of lower income neighborhoods. The city assumed it was doing its job based on the data, but the data was not coming from a statistical sampling of the entire city. In hindsight, a look at causation indicated that more potholes were reported in higher income areas because more of those residents could afford smart phones or cars; reporting might also be influenced by neighborhood leadership or cohesion, e.g., a neighborhood forming a ‘watch group’ will file more reports than a less watchful neighborhood. If the goal is to allocate city repair resources evenly, following the data without a corresponding review of the factors producing the data can miss the mark.
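
For readers who like to see the arithmetic, the pothole problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the neighborhood names, pothole counts, reporting rates and crew numbers are hypothetical and do not come from Boston’s data, and the `Neighborhood` class and `allocate` function are invented for this example. The sketch compares repair crews allocated in proportion to raw app reports with crews allocated after adjusting each count for how likely a pothole is to be reported in that neighborhood.

```python
from dataclasses import dataclass


@dataclass
class Neighborhood:
    name: str
    actual_potholes: int   # ground truth, which a city normally does not know
    reporting_rate: float  # assumed share of potholes reported through the app


def reported(n: Neighborhood) -> int:
    """Number of potholes the app would show for this neighborhood."""
    return round(n.actual_potholes * n.reporting_rate)


def allocate(weights: dict, crews: int) -> dict:
    """Split crews across neighborhoods in proportion to the given weights."""
    total = sum(weights.values())
    return {name: crews * w / total for name, w in weights.items()}


# Hypothetical inputs: the same number of potholes, very different reporting rates.
neighborhoods = [
    Neighborhood("Higher income", actual_potholes=100, reporting_rate=0.60),
    Neighborhood("Lower income", actual_potholes=100, reporting_rate=0.20),
]

# "Following the data": weight each neighborhood by raw report counts.
raw = {n.name: reported(n) for n in neighborhoods}

# Reviewing what produced the data: scale reports by the estimated reporting rate.
adjusted = {n.name: reported(n) / n.reporting_rate for n in neighborhoods}

by_raw = allocate(raw, crews=20)
by_adjusted = allocate(adjusted, crews=20)

print("Crews by raw reports:     ", by_raw)
print("Crews by adjusted reports:", by_adjusted)
```

With these made-up inputs, the raw reports send 15 of the 20 crews to the higher income neighborhood even though both neighborhoods have the same number of potholes; the adjusted figures split the crews evenly. In practice the reporting rate is itself an estimate, which is one more reason to look at what produces the data.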

What produces most Big Data outcomes/results? Algorithms.

We’ll take a look at those in two parts. Part I is below and Part II will be our next blog.

Part I

1. Legal Protection for Algorithms
May an algorithm be protected by intellectual property laws, such as the laws governing patents, copyrights or trade secrets?

Yes, all three intellectual property structures are candidates for a proprietary claim of ownership of an algorithm, or for making it available to others under a license (public or private). Our next blog in this series will focus on the particulars of what specific intellectual property protections are available.

An algorithm can also be protected as proprietary by contract. When looking at such a contract, consider what it ought to contain beyond protection for the algorithm itself.

2. Legal Nature of an Algorithm
Lessons can be learned about the nature of an algorithm from litigation concerning search engines. In Search King v. Google Technology, the algorithm used by Google for its search engine produced a ‘PageRank’ intended to correlate a particular web site to search queries. The PageRank was derived from a combination of factors determined by Google. Google did not sell PageRanks, and the ranked sites had no power to determine their individual ranking (other than by trying to ‘game’ the system) or even whether they were included on Google’s search engine at all. Search King, a stranger to Google, charged its clients a fee for locating highly ranked sites receptive to advertising on their sites, and in turn compensated those sites with a portion of the fees earned by Search King. Search King alleged that Google purposefully and maliciously decreased the PageRanks assigned to web sites involving Search King when Google learned that a Search King affiliate was competing with Google and profiting by selling advertising space on highly ranked sites. Search King sued for tortious interference with contractual relations. That tort requires malicious and wrongful interference that is not justified, privileged, or excusable.

Google’s defense was simple: it could not be liable because its PageRanks were subjective opinions/speech protected by the First Amendment, notwithstanding the fact that Google’s founder held a patent on the PageRank system and that Google advertised the search engine results as being objective facts. The court framed the questions as follows:

“First, are PageRanks constitutionally protected opinions? Second, if PageRanks fall within the scope of protection afforded by the First Amendment, is the publication of PageRanks per se lawful under Oklahoma law, thereby precluding tort liability premised on the intentional and even malicious manipulation of PageRanks by Google? The Court answers both questions in the affirmative.” [For more exciting (and relevant) text from the Court, see Endnote [i].]

This is not a lone case. Just as some data itself can be viewed as a building block of free speech and be protected by the First Amendment (see our previous blog), so can algorithms reflect opinions of a speaker. A 2014 illustration is Zhang v. Baidu.Com, Inc., 10 F. Supp. 3d 433 (S.D.N.Y. 2014), where New York residents and supporters of a democracy movement in China sued a Chinese search engine that was suppressing, in the U.S., search engine results about that movement. The court held that the First Amendment protected the search company:

“[T]here is a strong argument to be made that the First Amendment fully immunizes search-engine results from most, if not all, kinds of civil liability and government regulation. . . . The central purpose of a search engine is to retrieve relevant information from the vast universe of data on the Internet and to organize it in a way that would be most helpful to the searcher. In doing so, search engines inevitably make editorial judgments about what information (or kinds of information) to include in the results and how and where to display that information (for example, on the first page of the search results or later).”

The court noted a vigorous academic counter-argument to the effect that search engine algorithms are not expressive enough to qualify as speech, but the argument failed, at least in the circumstances of the case, i.e., where the search engine was exercising editorial judgment, objectionable or not.

None of this means that algorithms escape all legal restrictions. For example, the First Amendment does not preclude laws prohibiting speakers from shouting ‘Fire’ in a crowded movie theatre; defamation and libel laws are alive and well; speakers engaging in deceptive practices will find that the First Amendment won’t save them from laws prohibiting deception; and the federal Fair Credit Reporting Act (FCRA) (see No. 4 below) is an example of a law regulating data reports that, we assume for now, passes muster under First Amendment scrutiny. Our point is that some algorithms can be speech or opinions and, frankly, flat out subjective, i.e., Big Data is not about mere statistical sampling. The scope of First Amendment protection, or lack of protection, will be worked out as use of Big Data increases.

3. Algorithm Transparency
Notwithstanding the nature of, or protections for, algorithms, there will be demands for information about what they are actually doing. For example, if the algorithm purports to identify “job applicants not suited for job type X” and you are (a) the applicant who wants, but is denied, job X, or (b) the regulator in charge of preventing discriminatory hiring practices, you may want to argue that the algorithm is wrong or discriminatory. If you are the employer using the algorithm, you will want to be able to prove the opposite or require the algorithm provider to do so. If you are the algorithm provider, you may want to protect a trade secret in the algorithm or preclude knowledge of how it works so that no one can game it and distort results.

This is where successful juggling of the multi-faceted nature of algorithms will be needed. For example, the good news for providers of algorithms is that the First Amendment might prohibit lawmakers from attempting to chill the speech of the algorithm. The bad news is that the provider’s customers (such as employers) might be unwilling to contract for use of the algorithm absent transparency or assurances about what it will do and how it goes about it. Similarly, the provider of the algorithm or a data analytic service may want assurances of its own from customers, such as an agreement to restrict particular uses or alterations of the data.

4. Furnishing Reports on Individuals Based on the Algorithm or Data
FCRA applies to consumer reporting agencies (CRAs) and to users of ‘consumer reports’ issued by a CRA. The traditional example of a CRA is a company like Experian or TransUnion, i.e., the national companies that issue credit reports to lenders or background checks to employers for use in determining eligibility of an applicant for credit, insurance, employment, or for certain other purposes for which a ‘consumer report’ may be furnished under FCRA. Any business that reports information for an FCRA-covered purpose might be a ‘consumer reporting agency’ whether or not it realizes that it is one.

With the advent of Big Data and data brokers, the FTC has begun to allege through enforcement actions that some data brokers are, actually and already, noncompliant consumer reporting agencies.[ii][iii] Debate has also begun over the policy question of whether and how FCRA will be expanded to deal with some of the Big Data impacts on individuals.

For example, if John Eager applies for a job, the employer may not order a background check without providing him a set disclosure and obtaining his consent. If the employer nixes John for the job based on information in the background check, the employer must give John a notice stating that its ‘adverse action’ (nixing John) was based on information in a ‘consumer report’ from a named consumer reporting agency. That allows John to contact the CRA and challenge any erroneous information. The debate centers on whether the statute should be expanded to cover situations where ‘Big Data’ indicated that John was not appropriate for job X.

From the discussion above, when working with Big Data, “speak softly and carry a big stick” is more appropriately modified to “Let the Big Data reveal trends, but be prepared to also reveal your algorithms transparently.” Just as you were obliged in your ‘old school’ math classes to ‘show your work,’ the Big Data audience is quickly learning to demand transparency into the algorithm, no matter what intellectual property protection applies to that algorithm.

Our next installment of this Big Data series will unpack the specific intellectual property assets to be found in the Big Data box, and also address another aspect of algorithms in conjunction with the Big Data watershed of processing information.

[i] “Here, the process, which involves the application of the PageRank algorithm, is objective in nature. In contrast, the result, which is the PageRank – or the numerical representation of relative significance of a particular web site – is fundamentally subjective in nature. This is so because every algorithm employed by every search engine is different, and will produce a different representation of the relative significance of a particular web site depending on the various factors, and the weight of the factors, used to determine whether a web site corresponds to a search query. In the case at bar, it is the subjective result, the PageRank, which was modified, and which forms the basis for Search King’s tort action.” Search King Inc. v. Google Technology, Inc., 2003 U.S. Dist. LEXIS 27193 (W.D. Okla. 2003) (emphasis added).

[ii] See, e.g., U.S. v. Instant Checkmate, Inc., SD Cal., No. 3:14-cv-00675-H-JMA, 3/28/14, and U.S. v. InfoTrack Information Services, Inc., ND Ill., No. 1:14-cv-02054, 3/25/14. In each case, the FTC alleged that the companies, which sold public record information about consumers to users such as prospective employers and landlords, did not take reasonable steps to make sure that the information in their reports was accurate or that their customers had an FCRA permissible purpose to have them, and that each operated as a consumer reporting agency without complying with FCRA.

[iii] For a discussion of FCRA and its application beyond traditional consumer reporting agencies, see Chapter 11.06[11] of Holly K. Towle and Raymond T. Nimmer, The Law of Electronic Commercial Transactions (2003-2015 LexisNexis).