Princeton computer science professor Edward Felten has posted on his Web site a summary of a study he and Princeton student Sauhard Sahi conducted involving BitTorrent, the peer-to-peer network protocol. Felten and Sahi summarize their study as an investigation into what types of files are available on the system:

BitTorrent is popular because it lets anyone distribute large files at low cost. Which kinds of files are available on BitTorrent? Sauhard Sahi, a Princeton senior, decided to find out. Sauhard’s independent work last semester, under my supervision, set out to measure what was available on BitTorrent. This post, summarizing his results, was co-written by Sauhard and me.

Sahi and Felten chose a random sample of files available “via the trackerless variant of BitTorrent, using the Mainline DHT. The sample comprised 1021 files. He classified the files in the sample by file type, language, and apparent copyright status.” The summary does not clearly identify the time frame (either in length of time, or the time of year) in which Sahi and Felten performed the study.

Summary of the Study Summary

In summary, Sahi and Felten concluded that nearly half the files (46 percent) in the study consisted of non-adult movies and “shows.” (We presume the scholars mean shows, whether dramatic serials or game shows, that appear on television.) This category of content would include what the Copyright Act of 1976 defines in Section 101 as “motion pictures” (“Motion pictures are audiovisual works consisting of a series of related images which, when shown in succession, impart an impression of motion, together with accompanying sounds, if any.”). Adult films and computer games and software each accounted for 14 percent of the total files; music accounted for another 10 percent of the files.

The part of the Sahi-Felten study summary that seemed to garner the most attention was the section entitled “Apparent Copyright Infringement.” Wrote the scholars:

Our final assessment involved determining whether or not each file seemed likely to be copyright-infringing. We classified a file as likely non-infringing if it appeared to be (1) in the public domain, (2) freely available through legitimate channels, or (3) user-generated content. These were judgment calls on our part, based on the contents of the files, together with some external research.

Overall, we classified ten of the 1021 files, or approximately 1%, as likely non-infringing. This result should be interpreted with caution, as we may have missed some non-infringing files, and our sample is of files available, not files actually downloaded. Still, the result suggests strongly that copyright infringement is widespread among BitTorrent users.

In other words, the pair have drawn a preliminary conclusion that 99 percent of the files in this BitTorrent study infringed U.S. copyright law.
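To put the 1 percent figure in context, here is a minimal sketch of the arithmetic behind it, along with an approximate 95 percent Wilson score interval for the share of likely non-infringing files. The counts (10 of 1021) come from the summary. The interval is our own illustration, not something the scholars reported, and it assumes a simple random sample, which the summary does not confirm.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Counts reported in the study summary.
SAMPLE_SIZE = 1021
LIKELY_NON_INFRINGING = 10

share = LIKELY_NON_INFRINGING / SAMPLE_SIZE

# Assumption: simple random sampling, which the summary does not confirm.
low, high = wilson_interval(LIKELY_NON_INFRINGING, SAMPLE_SIZE)

print(f"point estimate: {share:.2%}")
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

Under those assumptions, the interval runs from roughly half a percent to just under 2 percent. It reflects sampling error only; it says nothing about the judgment calls made in classifying each file, which is the issue we take up below.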

It is virtually impossible to discuss this study or its conclusion without reviewing the final paper, the data, and the data analysis that led to the conclusions about “Apparent Copyright Infringement.” We and another reader have requested to review that information. We also specifically asked to see the coding sheets, the variables, and a closer look at the variable operationalizations; upon a second glance at the summary, we also would like to review the study design, particularly its sampling design.

(By the way, none of these requests are abnormal for social science studies. It is possible a reviewer may not request coding sheets, for example, but if coding schema are integral to variable operationalizations, then requesting the coding schema is not abnormal either.)

Our Questions

Still, we present some preliminary comments about the summary, and ask some questions about it. (We presume a forthcoming paper will present the study, its data, and findings in more detail.)

First, we would like to know both the time frame and the time span that the study captured. The time frame would determine time of day and time zone; the time span would identify whether the study spanned the entire summer, a month, a week, or a day. Both are important in terms of measurement and potential data skew, especially if there is only a single temporal element captured and that temporal element is not compared to a second, third, or fourth temporal element.

Also, we would be interested in knowing whether this study was a longitudinal study, or a snapshot of activity; if it is the latter, both the time frame and time span become much more important.

Second, we hope the final paper identifies why the scholars chose “the trackerless variant of BitTorrent, using the Mainline DHT” as the data source, and what were the reasons for excluding other BitTorrent data sources.

Third, we find the scholars’ operationalization of copyright infringement to be interesting. On this issue, the scholars wrote the following:

Our final assessment involved determining whether or not each file seemed likely to be copyright-infringing. We classified a file as likely non-infringing if it appeared to be (1) in the public domain, (2) freely available through legitimate channels, or (3) user-generated content. These were judgment calls on our part, based on the contents of the files, together with some external research.

Based upon the information in the summary, this operationalization of copyright infringement could be problematic for practical and theoretical reasons because it could skew the findings, or fail to provide proper context. To see why we find this problematic, consider our rationale.

The actual definition of copyright infringement in the Copyright Act of 1976 (Section 501(a)) states the following:

Anyone who violates any of the exclusive rights of the copyright owner as provided by sections 106 through 122 or of the author as provided in section 106A(a) … is an infringer of the copyright or right of the author, as the case may be.

Effectively, this means that any time any person other than the copyright owner or its authorized agent invokes or uses any of the exclusive rights of reproduction, derivative work/adaptation, distribution, public performance or public display, that person is infringing per Section 501(a). As we have outlined in our sister publication Core Copyright, this use or invocation occurs every minute, of every hour of every day under the current legal regime.

This finding of infringement, of course, is subject to a raft of limitations or compulsory licenses in Sections 107 through 122. These limitations and licenses may mean that a de facto finding of infringement — which, too, is common and virtually automatic under the current legal regime — ultimately falls away, leaving the alleged infringer without legal liability, for reasons of public or economic policy.

The Importance of Operationalizing Infringement

But let’s return to the finding of infringement under the definition in Section 501(a), using movies as an example. Since copyright infringement is a strict liability issue (i.e., roughly, liability without fault), this essentially means that anytime anyone posts a file on a BitTorrent system, even a digital movie or music file ripped from their own collection, there is, arguably, an infringement because

(a) the person who owns the source disc from which the movie or music file was ripped is likely not the person who owns any of the Section 106 exclusive rights in the work on that disc (per Section 202); and
(b) that person therefore has no authority to distribute that file on a digital network.

(The first sale limitation in Section 109 may or may not apply. We will presume for the sake of this argument that it is inapplicable. We also forestall any discussion of reproducing the movies into a digital format in order to get the digital file onto the BitTorrent network in the first place; that activity — which almost certainly occurs by circumventing a digital copy protection technology — likely would violate the Digital Millennium Copyright Act.)

This means that from a legal standpoint, it is possible that any file on such a distributed peer-to-peer network is an infringement under Section 501(a), regardless of whether or not the person who uploads the file owns the source disc. (Again, an ultimate and determinative finding of liability would be subject to the limitations and compulsory licenses in Sections 107 through 122 of the current Act.)

How does the legal definition of infringement affect the scholars’ operationalization of infringement in their study?

First, it could affect the study in a significant way if it does not take into account a variable for actual ownership of the source material from which the traded digital file was ripped. This matters, in turn, because the first sale doctrine may be an applicable limitation. (Again, more analysis would need to be done, but it’s worth an investigation.)

Second, if you can determine, operationalize, and make a variable for source ownership, then the study can probe deeper into what type of infringement is really at issue. Again, the issue is not whether or not there is infringing activity occurring on the network; by virtue of the way Congress wrote the infringement statute, infringement is occurring. (See our reasoning above.) Any normative arguments about the realism of applying that statute in that way in a digital networked economy are worthwhile, but will not be addressed in this specific article.

Context, Evidence-Based Findings & Scientific Method

But what we do not yet know is what type of infringement is occurring in this study. And here we distinguish between technical infringement (i.e., people who post stuff they own in disc form, but are trading, lending, or making available in digital form, without knowing that what they are doing is, technically, a violation of Section 501(a)) and rogue, behavioral infringement (i.e., people who post stuff they never have rightfully purchased or possessed, and who never intend to buy the source material and merely want to get stuff for free).
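As a rough illustration of what such a variable might look like on a coding sheet, consider the sketch below. Everything in it is hypothetical: the field names, the categories, and the sample records are ours, not the scholars’, and we do not know how (or whether) source ownership could be observed in practice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InfringementType(Enum):
    LIKELY_NON_INFRINGING = "likely non-infringing"
    TECHNICAL = "technical infringement"
    BEHAVIORAL = "behavioral infringement"
    UNDETERMINED = "undetermined"


@dataclass
class CodedFile:
    """One row of a hypothetical coding sheet."""
    name: str
    # Outcome of the summary's three-part test: public domain, freely
    # available through legitimate channels, or user-generated content.
    likely_non_infringing: bool
    # Hypothetical variable: does the uploader own the source disc?
    # None means the coder could not determine ownership.
    uploader_owns_source: Optional[bool] = None


def classify(record: CodedFile) -> InfringementType:
    """Map a coded record to the categories distinguished in the text."""
    if record.likely_non_infringing:
        return InfringementType.LIKELY_NON_INFRINGING
    if record.uploader_owns_source is None:
        return InfringementType.UNDETERMINED
    if record.uploader_owns_source:
        return InfringementType.TECHNICAL
    return InfringementType.BEHAVIORAL


# Made-up records, for illustration only.
records = [
    CodedFile("public-domain-film", likely_non_infringing=True),
    CodedFile("ripped-from-own-disc", False, uploader_owns_source=True),
    CodedFile("never-purchased", False, uploader_owns_source=False),
    CodedFile("ownership-unknown", False),
]

for record in records:
    print(f"{record.name}: {classify(record).value}")
```

The “undetermined” category matters here: if ownership cannot be established for most files, the variable would tell us little, and that in itself would be a limitation worth reporting.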

This distinction is critical for several reasons. First, identifying this factor through an operationalized variable and applicable statistical analysis would help begin to classify what type of behavior is behind the infringing activity. In turn, this is important because it begins to strike at the fit between normal behavior and legal standards. It is the common “speed limit” theory of law: if all people are traveling safely at 65 in a 55 m.p.h. zone, why write a speeding ticket? In contrast, if some are traveling at 95 in a 55 m.p.h. zone, is there any good reason not to write a speeding ticket, regardless of the level of traffic?

Second, this distinction is critical because of a phenomenon that already has begun to occur. For example, there are some who will point to this study as evidence that BitTorrent especially, and peer-to-peer networking more broadly, is rife with illegal (“piratical”) activity that threatens the livelihood of creators and the companies that help manufacture, distribute, and own the discs that hold the source content (and own the content as well).

Indeed, one commentator already has issued a reflexive and impetuous claim that attempts to link the summary’s findings to a broader policy issue about net neutrality. “Valuable information to keep in mind while debating net neutrality rules and ISPs’ right to manage their networks and fight piracy,” wrote Ben Sheffner of Copyrights and Campaigns last week. In this quote and subsequent responses to reader comments, Sheffner suggested that Internet service providers have a duty to restrict infringing traffic on their networks, and that this duty should manifest itself in a three-strikes/graduated response policy of the kind that has been adopted nationwide in France and is beginning to be adopted in other European Union countries.

(There is plenty of background available on three strikes/graduated response. This article by Canadian attorney Barry Sookman outlines an argument in favor of three-strikes/graduated response. Last year, Sheffner gave his take on what he views as the distinction between “graduated response” and “three-strikes.” EFF posted in November about the Anti-Counterfeiting Trade Agreement (ACTA), which has been negotiated in secret, and allegedly includes a three-strikes provision that would affect U.S. law. Michael Geist did a five-part series (1, 2, 3, 4, 5) about ACTA in January, and wrote a separate column about three strikes.)

It is all the more convenient and useful for an advocacy-driven argument in favor of graduated response that “evidence” of BitTorrent’s transmissions would come from someone like Edward Felten because of his credentials and history. As a tenured computer science professor at Princeton, Felten’s work receives a default presumption of validity and prestige. Additionally, Felten had a high-profile experience with U.S. copyright law in 2000, when the recording industry lobby used the DMCA to squelch a scientific paper Felten and fellow scholars wanted to present about circumventing digital encryption on music files. Contextualizing all this information, an advocate could presume that Felten is hostile to copyright law because of this experience, and that publication of this type of result, on this type of paper, with this type of subject matter helps prove beyond a reasonable doubt, along with his Ivy League credentials, that BitTorrent (and by extension, peer-to-peer networks) are dens of copyright iniquity.

But drawing such correlations at this point — with respect to the summary, the resulting paper (which has not yet been vetted, reviewed or published), or Felten’s perceived or actual personal or professional biases — is premature and careless. At this point, no one can state definitively that the Sahi-Felten study provides any correlation between the level of infringing files and the BitTorrent network because no one has nearly enough information based exclusively upon the summary they presented. We cannot say whether Sahi and Felten considered the issues we have raised, or intentionally chose not to address them because they were deemed to be outside the scope of their study. On the basis of the summary alone, we cannot draw even an indirect correlation between this study summary and any need (or even a lack of need) for a three-strikes approach in the United States.

This is why it is important to read — and understand — the design, the variables, the operationalizations, the data collection methods, the statistical analyses in a final, peer-reviewed paper before rendering impulsive opinions about potential applicability to a major policy issue. Further, one needs to know enough about statistical analysis and research design to determine whether there is a skew, whether that skew may have been intentional, and if that skew negatively influences the study’s results. Finally, we need to hear what Sahi and Felten say about the study’s scope, and directions for further research. No matter how well-designed and presented, every study has some limitation, if only because scientific research is not static. Scientists typically live with, and explain, such limitations.

Jumping past this investigation and analysis may be considered acceptable within the context of litigation advocacy, where the objective is to win a specific objective for one’s client. But it is intellectually sloppy from a scientific and empirical perspective. As law professor Justin Hughes once wrote, “[T]he historian or the scientist is trained to research, to explain, and, we hope, to get to the bottom of things. The lawyer — hence, most legal academics— prepares just enough precedent to convince.”

Empiricism and science are the standards from which Sahi and Felten presented their research summary, and those are the standards any resulting final paper must meet. Our questions above are presented from the perspective of social science. Further, research and empirical support — not blind, unilateral advocacy — should be the bases upon which any information policy (especially three-strikes) should be proposed and promulgated.

We can say with a strong level of confidence, however, that the way the current statutes are written, it would have been shocking if anything significantly less than 100% of the files on BitTorrent were technical infringements of copyright law. That reality — and the gap between it and societal norms — is worth continued study.

© Copyright 2010, Copycense. Twitter: @copycense


In light of the U.S. Navy’s rescue of Capt. Richard Phillips on Easter, many news outlets understandably are interested in writing about piracy. Interestingly, some news outlets have raised an important question about “piracy” as a term: in light of the ongoing (and newly news-worthy) threat of violence on the high seas, should “piracy” continue to be used to mean theft of works that are protected by copyright or other forms of intellectual property?

Stephen J. Dubner, a co-author of The New York Times’ “Freakonomics” blog, was one of the first to pose the question openly. In his April 13 post, Dubner even asked his audience to suggest substitutes. Dubner followed with a second post on April 17 to anoint “downlifting” as the linguistic successor to “piracy.” In the meantime, the Washington Post and The Guardian (UK) followed with their own takes on “piracy” language.

It seems each of these publications, however, may have been beaten to the punch by Jenny Kakasuleff and the Indianapolis Liberal Examiner. Kakasuleff’s post was the first we saw this year to question using “piracy” within the context of intellectual property, and the timeline on her post suggests she addressed this before Dubner by about 10 hours. Better yet, her lede is flat-out entertaining:

“When I heard that “piracy” was the latest buzz word to light up the world wide web, I thought for sure Lars Ulrich had summoned Congress to bellyache about how fans like Metallica’s music so much that they–gasp–download it for their listening pleasure. But alas, all the hype was nothing more than a U.S. Navy showdown with three rogue pirates on a lifeboat, armed with AK-47’s and a hostage. Limewire lives to see another day.”

Source: http://tinyurl.com/c3f3oc

Of course, regular Copycense readers have known for quite some time that we never use “piracy” as a proxy for IP theft. We wrote about this in these virtual pages in an April 2007 post entitled Dismantling the “Piracy” Frame. Today, we re-post some of that writing:

“Since at least late 2005, Copycense assiduously has avoided using the word “piracy” as a synonym for allegedly illegal uses of protected intellectual property. Since then, whenever the term has appeared in this publication, it usually appears in quotes (i.e. “piracy”). There are several reasons for our care. First, since Copycense reports on the intersection of business, law, and technology, it is unusual that we would report on anything remotely related to “acts of robbery and depredation upon the high seas.”

“Second, as we have shown here, the term “piracy” has nothing to do with copyright or any other form of intellectual property, much less the allegedly illegal taking of such material. Any use of the term piracy that relates to intellectual property is wrong or an overt linguistic manipulation for political or economic advantage. We’ll concede the entertainment industry’s “piracy” frame has been artful and successful. We also know that it is wrong.

“Third, perpetuating the “piracy” frame pigeonholes intellectual property dialogue into a narrow box that considers only an owner’s rights. All intellectual property law is a delicate balance between the rights of the owner, author, or inventor, and the public interest. In copyright law, for example, an owner’s exclusive rights generally are outlined in Sections 106 and 106A, while the public policy-oriented limitations (or exceptions) to those exclusive rights generally are codified in Sections 107 through 122.

“[Several publications and organizations … reinforce] the “piracy” frame through [their] reporting. The New York Times, The Wall Street Journal, The Washington Post are among them, and they continue to do so even though their coverage over the last 18 months increasingly has been critical of the entertainment industry, their lobbyists, and the overtly protectionist copyright laws those groups are responsible for proposing and ramming through a Congress that has been ignorant about the frame, too weak to stop it, or complicit in accepting it without the mildest investigation.

“But at some point the “piracy” frame must be uncovered for what it is: public relations blather. It is sexy, simple, and concededly well-designed blather, but blather nevertheless. We have committed to avoid using “piracy” except where such use is consistent with its definition (which means we will not have much need to use it at all). Instead of “piracy,” we call on journalists, editors, and bloggers to use the phrase “alleged infringement.” Unlike “piracy,” the phrase “alleged infringement” is legally accurate, simple, and suggests that accusations of unsanctioned use of copyrighted materials are subject to exceptions and a legal process by which a judge or jury may or may not hold the accused liable for infringement or damages.”

Source: http://www.copycense.com/2007/04/dismantling_the.html

As it turns out, Copycense executive editor K. Matthew Dames has been studying the intersection of framing, law, and policy extensively for more than two years. In addition to the aforementioned Copycense post, Dames first addressed framing in a September 2006 article published in Information Today magazine, and presented a paper about the meanings of piracy in September 2008 at Syracuse University. He has updated the 2008 paper, which is part of a broader study he is conducting on framing, rhetoric, and U.S. copyright policy, and it is now available on SSRN.

Finally, to answer Dubner’s question, instead of “piracy,” why don’t we call these things what they are: allegations of copyright infringement?


Copycense: Incisive IP.


We missed this story when it appeared last month, so we are commenting on it now.

A woman who was arrested on allegations she sold illegal music compact discs was jailed last month and left by law enforcement authorities in solitary confinement for more than four days. The woman, Adriana Torres-Flores, 38, of Springdale, Arkansas, was left without food, toilet facilities, or sleeping facilities. Torres-Flores said she drank her own urine for fluids.

Torres-Flores had been arrested in December 2007 on criminal charges she was selling bootlegged compact discs at a Springdale, Arkansas flea market. Torres-Flores faces deportation proceedings because she is not a U.S. citizen.

We discovered news of Ms. Torres-Flores’ situation after we read a New York Times story last week about the bootlegged entertainment that no longer is available on Canal Street, long known as one of New York City’s major distribution points for discount goods, many of which are counterfeit. The story details an initiative Mayor Michael Bloomberg began in December 2003 with the aim of reducing the amount of counterfeit goods in the city that never sleeps.

A separate December 2003 story from the Times details the results of an afternoon raid against counterfeit goods.

In both Times stories, the newspaper quotes financial estimates from trade associations (the Motion Picture Association of America in last week’s story; the International Chamber of Commerce in the 2003 story) that purport to detail the amount of money the associations’ members lose to counterfeit or bootlegged goods.

Ms. Torres-Flores’ situation is egregious because of the unusual circumstances surrounding her detention. In many other ways, however, her situation is consistent with an effort by multinational copyright industries to use municipal police to enforce and uphold the protection of their narrow interests. We wrote about this situation last year when editorializing about the Fulton County Sheriff’s involvement (with blue-jacketed representatives from the Recording Industry Association of America) in a raid of DJ Drama’s Atlanta studio.

DJ Drama and several of his colleagues were arrested in January 2007 for making mixtapes allegedly in violation of the Copyright Act of 1976.

See also:

Eric A. Taub. Off New York Streets, Film Piracy Is Online. The New York Times. April 14, 2008.

Mark Minton. Woman Forgotten 4 Days In Tiny Cell. Arkansas Democrat Gazette. March 11, 2008.

Copycense. Mix Tapes Compared to Cocaine? February 7, 2007.

Michael Wilson. 2 Chinatown Stores Raided In Counterfeit-Goods Sweep. The New York Times. Dec. 3, 2003.


Public Knowledge. Don’t Trust the Media to Get Copyright Right: Scrabulous Coverage Scores Few Points. Jan. 22, 2008. Often, we have taken the press to task for its frequently errant and one-sided coverage of intellectual property issues. When IP was a backwater issue, poor (and sometimes inaccurate) coverage was a problem, but was not evident. Now IP often warrants front-page, above-the-fold coverage, and the mistakes not only are evident, they are harmful. A news organization’s primary professional objective is to get it right. Writing flowery prose like Selena Roberts is optional. Talking loud and saying little like Stephen A. is optional. Whining while cashing checks like Mr. Tony is optional.

Getting it right, on the other hand, is mandatory.

Marc Fisher got it wrong in a big way last month, likely because he relied on second-hand reporting and did not do the requisite amount of fact- and document checking. Now, as PK points out, several media outlets seem to have gotten the Scrabulous/Facebook story wrong, attributing to an alleged copyright infringement problem what really is an alleged trademark infringement problem. Not only is this unacceptable, it is grossly unprofessional. If the news media can’t get it right, above all else, it’s useless.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 29, 2008, edition of Copycense Clippings.)



Larry Barrett. Publishing Company Settles Software Suit With SIIA. Internetnews.com. Jan. 18, 2008. We find it interesting that while SIIA promotes that it will pay informants up to $1 million to snitch on others for alleged copyright infringement, the lobbying group (which counts among its membership Bloomberg, Dow Jones, Reed Elsevier, and Copyright Clearance Center) has paid out only $39,500, or an average of $2,821.43 per informant. This makes us wonder whether McNulty and Greggs pay Bubbles better for his information than the multinational database content industry pays for its information.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 22, 2008, edition of Copycense Clippings.)



Julie Hilden. Seinfeld Sued: Will “Sneaky Chef” Author Missy Chase Lapine Succeed In Her Suit Against Jerry and Jessica Seinfeld? FindLaw. Jan. 15, 2008. We reported on Jessica Seinfeld’s cookbook back in October. Now the inevitable lawsuit (.pdf) has been filed, alleging copyright infringement and defamation, among other things. As William Patry noted in a comment about an infringement case involving the Baltimore Ravens’ logo, substantial similarity should not be enough to win an infringement lawsuit. The evidence also should show the defendant had access to the allegedly infringed work. Stay tuned.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 22, 2008, edition of Copycense Clippings.)



AdRants. Ford Slaps Brand Enthusiasts, Returns Love With Legal Punch. Jan. 14, 2007. Ford, which desperately needs some love from the public, shoots itself in the foot by threatening legal action over the use of its logo in a calendar sold by a Mustang owners club. Our first reaction was “how dumb can you be?” Upon reconsideration, though, American trademark law may have required Ford to take some level of action because of potential dilution issues. The issue has been resolved now, but one has to think this issue could have been handled in a manner that would not have left Ford looking like a bully. Just because there’s a legal issue doesn’t mean the law needs to be used like a club.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 15, 2008, edition of Copycense Clippings.)

