Archive for February 2010
Science vs. Advocacy: Thoughts on the Felten BitTorrent Study
Princeton computer science professor Edward Felten has posted on his Web site a summary of a study he and Princeton student Sauhard Sahi conducted involving BitTorrent, the peer-to-peer network protocol. Felten and Sahi summarize their study as an investigation into what types of files are available on the system:
BitTorrent is popular because it lets anyone distribute large files at low cost. Which kinds of files are available on BitTorrent? Sauhard Sahi, a Princeton senior, decided to find out. Sauhard’s independent work last semester, under my supervision, set out to measure what was available on BitTorrent. This post, summarizing his results, was co-written by Sauhard and me.
Sahi and Felten chose a random sample of files available “via the trackerless variant of BitTorrent, using the Mainline DHT. The sample comprised 1021 files. He classified the files in the sample by file type, language, and apparent copyright status.” The summary does not clearly identify the time frame (either in length of time, or the time of year) in which Sahi and Felten performed the study.
Summary of the Study Summary
In summary, Sahi and Felten concluded that nearly half the files (46 percent) in the study were non-adult movies and “shows.” (We presume the scholars mean shows, whether dramatic serials or game shows, that appear on television.) This category of content would include what the Copyright Act of 1976 defines in Section 101 as “motion pictures” (“Motion pictures are audiovisual works consisting of a series of related images which, when shown in succession, impart an impression of motion, together with accompanying sounds, if any.”). Adult films accounted for 14 percent of the total files, as did computer games and software; music accounted for another 10 percent of the files.
The part of the Sahi-Felten study summary that seemed to garner the most attention was the section entitled “Apparent Copyright Infringement.” Wrote the scholars:
Our final assessment involved determining whether or not each file seemed likely to be copyright-infringing. We classified a file as likely non-infringing if it appeared to be (1) in the public domain, (2) freely available through legitimate channels, or (3) user-generated content. These were judgment calls on our part, based on the contents of the files, together with some external research.
…
Overall, we classified ten of the 1021 files, or approximately 1%, as likely non-infringing. This result should be interpreted with caution, as we may have missed some non-infringing files, and our sample is of files available, not files actually downloaded. Still, the result suggests strongly that copyright infringement is widespread among BitTorrent users.
In other words, the pair have drawn a preliminary conclusion that 99 percent of the files in this BitTorrent study infringed U.S. copyright law.
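The arithmetic behind the 1 percent figure is easy to check. The short Python sketch below is our own illustration and is not part of the Sahi-Felten summary. It assumes the 1021 files were a simple random sample, which the summary does not document, and the function name wilson_interval is hypothetical. It computes the share of files classified as likely non-infringing and a 95 percent Wilson score interval around that share.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%).

    Assumption: the n items are a simple random sample, which the
    study summary does not confirm.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


# Counts quoted in the study summary.
non_infringing = 10
sample_size = 1021

share = non_infringing / sample_size
low, high = wilson_interval(non_infringing, sample_size)

print(f"share likely non-infringing: {share:.4f}")
print(f"95% Wilson interval: {low:.4f} to {high:.4f}")
```

The interval reflects sampling error only. It says nothing about the classification judgment calls the authors themselves flag, which is one reason the coding sheets matter.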
It is virtually impossible to discuss this study or its conclusion without reviewing the final paper, the data, and the data analysis that led to the conclusions about “Apparent Copyright Infringement.” We and another reader have requested to review that information. We also specifically asked to see the coding sheets, the variables, and a closer look at the variable operationalizations; upon a second glance at the summary, we also would like to review the study design, particularly its sampling design.
(By the way, none of these requests are abnormal for social science studies. It is possible a reviewer may not request coding sheets, for example, but if coding schema are integral to variable operationalizations, then requesting the coding schema is not abnormal either.)