Princeton computer science professor Edward Felten has posted on his Web site a summary of a study he and Princeton student Sauhard Sahi conducted involving BitTorrent, the peer-to-peer network protocol. Felten and Sahi summarize their study as an investigation into what types of files are available on the system:

BitTorrent is popular because it lets anyone distribute large files at low cost. Which kinds of files are available on BitTorrent? Sauhard Sahi, a Princeton senior, decided to find out. Sauhard’s independent work last semester, under my supervision, set out to measure what was available on BitTorrent. This post, summarizing his results, was co-written by Sauhard and me.

Sahi and Felten chose a random sample of files available “via the trackerless variant of BitTorrent, using the Mainline DHT. The sample comprised 1021 files. He classified the files in the sample by file type, language, and apparent copyright status.” The summary does not clearly identify the time frame (either in length of time, or the time of year) in which Sahi and Felten performed the study.

Summary of the Study Summary

In summary, Sahi and Felten concluded that nearly half the files (46 percent) in the study consisted of non-adult movies and “shows.” (We presume the scholars mean shows, either dramatic serials or game shows, that appear on television.) This category of content would include what the Copyright Act of 1976 defines in Section 101 as “motion pictures” (“Motion pictures are audiovisual works consisting of a series of related images which, when shown in succession, impart an impression of motion, together with accompanying sounds, if any.”). Adult films and computer games and software each accounted for 14 percent of the total files; music accounted for another 10 percent of the files.

The part of the Sahi-Felten study summary that seemed to garner the most attention was the section entitled “Apparent Copyright Infringement.” Wrote the scholars:

Our final assessment involved determining whether or not each file seemed likely to be copyright-infringing. We classified a file as likely non-infringing if it appeared to be (1) in the public domain, (2) freely available through legitimate channels, or (3) user-generated content. These were judgment calls on our part, based on the contents of the files, together with some external research.

Overall, we classified ten of the 1021 files, or approximately 1%, as likely non-infringing. This result should be interpreted with caution, as we may have missed some non-infringing files, and our sample is of files available, not files actually downloaded. Still, the result suggests strongly that copyright infringement is widespread among BitTorrent users.

In other words, the pair have drawn a preliminary conclusion that 99 percent of the files in this BitTorrent study infringed U.S. copyright law.
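To put the “approximately 1%” figure in perspective, the arithmetic and its sampling uncertainty can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the 1021 files were a simple random sample and that every classification was correct, and the Wilson score interval is our own choice of technique. The function and variable names are ours.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


SAMPLE_SIZE = 1021     # files in the sample, per the summary
NON_INFRINGING = 10    # files classified as likely non-infringing, per the summary

share = NON_INFRINGING / SAMPLE_SIZE
low, high = wilson_interval(NON_INFRINGING, SAMPLE_SIZE)

print(f"Likely non-infringing share: {share:.2%}")
print(f"Approximate 95% interval: {low:.2%} to {high:.2%}")
```

Under those assumptions the interval runs from roughly half a percent to a little under two percent. It captures sampling error only; it says nothing about the judgment calls behind each classification, which is where our questions lie.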

It is virtually impossible to discuss this study or its conclusion without reviewing the final paper, the data, and the data analysis that led to the conclusions about “Apparent Copyright Infringement.” We and another reader have requested to review that information. We also specifically asked to see the coding sheets, the variables, and a closer look at the variable operationalizations; upon a second glance at the summary, we also would like to review the study design, particularly its sampling design.

(By the way, none of these requests are abnormal for social science studies. It is possible a reviewer may not request coding sheets, for example, but if coding schema are integral to variable operationalizations, then requesting the coding schema is not abnormal either.)

Our Questions

Still, we present some preliminary comments about the summary, and ask some questions about it. (We presume a forthcoming paper will present the study, its data, and findings in more detail.)

First, we would like to know both the time frame and the time span that the study captured. The time frame would determine time of day and time zone; the time span would identify whether the study spanned the entire summer, a month, a week, or a day. Both are important in terms of measurement and potential data skew, especially if there is only a single temporal element captured and that temporal element is not compared to a second, third, or fourth temporal element.

Also, we would be interested in knowing whether this study was a longitudinal study, or a snapshot of activity; if it is the latter, both the time frame and time span become much more important.

Second, we hope the final paper identifies why the scholars chose “the trackerless variant of BitTorrent, using the Mainline DHT” as the data source, and what were the reasons for excluding other BitTorrent data sources.

Third, we find the scholars’ operationalization of copyright infringement to be interesting. On this issue, the scholars wrote the following:

Our final assessment involved determining whether or not each file seemed likely to be copyright-infringing. We classified a file as likely non-infringing if it appeared to be (1) in the public domain, (2) freely available through legitimate channels, or (3) user-generated content. These were judgment calls on our part, based on the contents of the files, together with some external research.

Based upon the information in the summary, this operationalization of copyright infringement could be problematic for practical and theoretical reasons because it could skew the findings, or fail to provide proper context. To see why we find this problematic, consider our rationale.

The actual definition of copyright infringement in the Copyright Act of 1976 (Section 501(a)) states the following:

Anyone who violates any of the exclusive rights of the copyright owner as provided by sections 106 through 122 or of the author as provided in section 106A(a) … is an infringer of the copyright or right of the author, as the case may be.

Effectively, this means that any time any person other than the copyright owner or its authorized agent invokes or uses any of the exclusive rights of reproduction, derivative work/adaptation, distribution, public performance or public display, that person is infringing per Section 501(a). As we have outlined in our sister publication Core Copyright, this use or invocation occurs every minute, of every hour of every day under the current legal regime.

This finding of infringement, of course, is subject to a raft of limitations or compulsory licenses in Sections 107 through 122. These limitations and licenses may mean that a de facto finding of infringement — which, too, is common and virtually automatic under the current legal regime — ultimately falls away, leaving the alleged infringer without legal liability, for reasons of public or economic policy.

The Importance of Operationalizing Infringement

But let’s return to the finding of infringement using the definition in Section 501(a), taking movies as an example. Since copyright infringement is a strict liability issue (i.e., roughly, liability without fault), this essentially means that anytime anyone posts a file on a BitTorrent system, even a digital movie or music file ripped from their own collection, there is, arguably, an infringement because

(a) the person who owns the source disc from which the movie or music file was ripped is likely not the person who owns any of the Section 106 exclusive rights in the disc (per Section 202); and
(b) that person therefore has no authority to distribute that file on a digital network.

(The first sale limitation in Section 109 may or may not apply. We will presume for the sake of this argument that it is inapplicable. We also forestall any discussion of reproducing the movies into a digital format in order to get the digital file onto the BitTorrent network in the first place; that activity — which almost certainly occurs by circumventing a digital copy protection technology — likely would violate the Digital Millennium Copyright Act.)

This means that from a legal standpoint, it is possible that any file on such a distributed peer-to-peer network is an infringement under Section 501(a), regardless of whether or not the person who uploads the file owns the source disc. (Again, an ultimate and determinative finding of liability would be subject to the limitations and compulsory licenses in Sections 107 through 122 of the current Act.)

How does the legal definition of infringement affect the scholars’ operationalization of infringement in their study?

First, it could affect the study in a significant way if it does not take into account a variable for actual ownership of the source material from which the traded digital file was ripped. This matters, in turn, because the first sale doctrine may be an applicable limitation. (Again, more analysis would need to be done, but it’s worth an investigation.)

Second, if you can determine, operationalize, and make a variable for source ownership, then the study can probe deeper into what type of infringement is really at issue. Again, the issue is not whether or not there is infringing activity occurring on the network; by virtue of the way Congress wrote the infringement statute, infringement is occurring. (See our reasoning above.) Any normative arguments about the realism of applying that statute in that way in a digital networked economy are worthwhile, but will not be addressed in this specific article.

Context, Evidence-Based Findings & Scientific Method

But what we do not yet know is what type of infringement is occurring in this study. And here we distinguish between technical infringement (i.e., people who post stuff they own in disc form, but are trading, lending, or making it available in digital form, without knowing that what they are doing is, technically, a violation of Section 501(a)) and rogue, behavioral infringement (i.e., people who post stuff they never have rightfully purchased or possessed, who never intend to buy the source material, and who merely want to get stuff for free).
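One way to picture a source-ownership variable on a coding sheet is sketched below. Everything here is hypothetical: the field names, the categories, and the sample rows are our own inventions for illustration, not anything taken from the Sahi-Felten study. The sketch also leaves open how a researcher would actually observe ownership, and it simplifies the behavioral category by ignoring the poster's intent.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class SourceOwnership(Enum):
    """Hypothetical coding variable: does the poster own the source disc?"""
    OWNS_SOURCE = "owns source"
    NO_SOURCE = "never owned source"
    UNKNOWN = "unknown"


class InfringementType(Enum):
    TECHNICAL = "technical"
    BEHAVIORAL = "behavioral"
    UNDETERMINED = "undetermined"


@dataclass(frozen=True)
class CodedFile:
    """One row of a hypothetical coding sheet."""
    file_id: str
    file_type: str
    ownership: SourceOwnership


def infringement_type(record: CodedFile) -> InfringementType:
    """Map the ownership variable onto the technical/behavioral distinction."""
    if record.ownership is SourceOwnership.OWNS_SOURCE:
        return InfringementType.TECHNICAL
    if record.ownership is SourceOwnership.NO_SOURCE:
        return InfringementType.BEHAVIORAL
    return InfringementType.UNDETERMINED


# Invented placeholder rows, not data from the Sahi-Felten study.
sample = [
    CodedFile("f1", "movie", SourceOwnership.OWNS_SOURCE),
    CodedFile("f2", "music", SourceOwnership.NO_SOURCE),
    CodedFile("f3", "movie", SourceOwnership.UNKNOWN),
    CodedFile("f4", "music", SourceOwnership.NO_SOURCE),
]

tally = Counter(infringement_type(r) for r in sample)
for kind in InfringementType:
    print(f"{kind.value}: {tally[kind]}")
```

Keeping an explicit "unknown" value matters: if most files land there, that itself tells readers how much of the behavior behind the infringement a study of this kind can actually see.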

This distinction is critical for several reasons. First, identifying this factor through an operationalized variable and applicable statistical analysis would help begin to classify what type of behavior is behind the infringing activity. In turn, this is important because it begins to strike at the fit between normal behavior and legal standards. It is the common “speed limit” theory of law: if all people are traveling safely at 65 in a 55 m.p.h. zone, why write a speeding ticket? In contrast, if some are traveling at 95 in a 55 m.p.h. zone, is there any good reason not to write a speeding ticket, regardless of the level of traffic?

Second, this distinction is critical because of a phenomenon that already has begun to occur. For example, there are some who will point to this study as evidence that BitTorrent especially (and peer-to-peer networking, more broadly) is rife with illegal (“piratical”) activity that threatens the livelihood of creators and the companies that help manufacture, distribute, and own the discs that hold the source content (and own the content as well).

Indeed, one commentator already has issued a reflexive and impetuous claim that attempts to link the summary’s findings to a broader policy issue about net neutrality. “Valuable information to keep in mind while debating net neutrality rules and ISPs’ right to manage their networks and fight piracy,” wrote Ben Sheffner of Copyrights and Campaigns last week. In this quote and subsequent responses to reader comments, Sheffner suggested that Internet service providers have a duty to restrict infringing traffic on their networks, and that this duty should manifest itself in a three-strikes/graduated response policy that has been adopted nationwide in France and is beginning to be adopted in other European Union countries.

(There is plenty of background available on three strikes/graduated response. This article by Canadian attorney Barry Sookman outlines an argument in favor of three-strikes/graduated response. Last year, Sheffner gave his take on what he views as the distinction between “graduated response” and “three-strikes.” EFF posted in November about the Anti-Counterfeiting Trade Agreement (ACTA), which has been negotiated in secret, and allegedly includes a three-strikes provision that would affect U.S. law. Michael Geist did a five-part series (1, 2, 3, 4, 5) about ACTA in January, and wrote a separate column about three strikes.)

It is all the more convenient and useful for an advocacy-driven argument in favor of graduated response that “evidence” of BitTorrent’s transmissions would come from someone like Edward Felten because of his credentials and history. Because Felten is a tenured computer science professor at Princeton, his work receives a default presumption of validity and prestige. Additionally, Felten had a high-profile experience with U.S. copyright law in 2000, when the recording industry lobby used the DMCA to squelch a scientific paper Felten and fellow scholars wanted to present about circumventing digital encryption on music files. Contextualizing all this information, an advocate could presume that Felten is hostile to copyright law because of this experience, and that publication of this type of result, on this type of paper, with this type of subject matter helps prove beyond a reasonable doubt, along with his Ivy League credentials, that BitTorrent (and by extension, peer-to-peer networks) are dens of copyright iniquity.

But drawing such correlations at this point — with respect to the summary, the resulting paper (which has not yet been vetted, reviewed or published), or Felten’s perceived or actual personal or professional biases — is premature and careless. At this point, no one can state definitively that the Sahi-Felten study provides any correlation between the level of infringing files and the BitTorrent network because no one has nearly enough information based exclusively upon the summary they presented. We cannot say whether Sahi and Felten considered the issues we have raised, or intentionally chose not to address them because they were deemed to be outside the scope of their study. On the basis of the summary alone, we cannot draw even an indirect correlation between this study summary and any need (or even a lack of need) for a three-strikes approach in the United States.

This is why it is important to read — and understand — the design, the variables, the operationalizations, the data collection methods, the statistical analyses in a final, peer-reviewed paper before rendering impulsive opinions about potential applicability to a major policy issue. Further, one needs to know enough about statistical analysis and research design to determine whether there is a skew, whether that skew may have been intentional, and if that skew negatively influences the study’s results. Finally, we need to hear what Sahi and Felten say about the study’s scope, and directions for further research. No matter how well-designed and presented, every study has some limitation, if only because scientific research is not static. Scientists typically live with, and explain, such limitations.

Jumping past this investigation and analysis may be considered acceptable within the context of litigation advocacy, where the goal is to win a specific objective for one’s client. But it is intellectually sloppy from a scientific and empirical perspective. As law professor Justin Hughes once wrote, “[T]he historian or the scientist is trained to research, to explain, and, we hope, to get to the bottom of things. The lawyer — hence, most legal academics — prepares just enough precedent to convince.”

Empiricism and science are the standards from which Sahi and Felten presented their research summary, and those are the standards any resulting final paper must meet. Our questions above are presented from the perspective of social science. Further, research and empirical support — not blind, unilateral advocacy — should be the bases upon which any information policy (especially three-strikes) should be proposed and promulgated.

We can say with a strong level of confidence, however, that the way the current statutes are written, it would have been shocking if anything significantly less than 100% of the files on BitTorrent were technical infringements of copyright law. That reality — and the gap between it and societal norms — is worth continued study.

© Copyright 2010, Copycense. Twitter: @copycense


In October 2007, we wrote in these digital pages an article entitled “Should We Still ‘Free Jammie’?” The article’s title referred to a then-existing campaign to elevate Jammie Thomas-Rasset (then Jammie Thomas) to political prisoner status because she lost at trial after being sued by several record companies for copyright infringement.

The core issues in Thomas-Rasset’s second trial, which concluded last month, were substantially similar to those litigated in her first trial as defendant, in which a federal jury found her liable for copyright infringement, and awarded the copyright-owning record companies $222,000 in damages. At the time, we questioned whether the “Free Jammie” campaign was appropriate, given the facts of the case and Ms. Thomas-Rasset’s reported appellate strategy:

Technology publication ArsTechnica is reporting that Jammie Thomas’ appellate strategy will be to question the damages award first, leaving to a later date the broader (and arguably more important) issue of whether or not “making available” files violates the reproduction and distribution rights in Section 106. Ars reports that if the court decides against granting a new trial, Thomas would have 30 days to appeal the original verdict, and she could use that opportunity to argue against the “making available” doctrine, which the judge conveyed in jury instructions.

William Patry has observed that he would be “stunned if there is any room for overturning the award. There is doubt that any award within the permissible range, even the tippy-top, is subject to review. I think there may well be cases where a damage award may be constitutionally flawed, but this is not one of them.”

Still, since Thomas currently is responsible for more than $200,000 in statutory copyright infringement damages, there is little surprise that she would look to reduce that figure. The strategy, however, smells like an unfortunate case of CYA and seems narrow considering the broader stakes at hand.

Thomas and her counsel certainly knew this case would be a high profile matter, but both also had to know this case would have a significant effect on the substance and interpretation of copyright law. It is reasonable to expect any party in litigation to do only what is best for their personal and legal interests. Likewise, attorneys have an obligation to work primarily in their clients’ best interests. But in light of the context, it also seems reasonable for those interested in copyright to expect the litigants in these high profile, important cases to recognize the cases’ legal and societal issues and attempt to resolve such issues, while continuing to do what is best for the client.

The more we dig into this case, the more likely we are to conclude that Thomas was the wrong defendant to support in the wrong case. A five-minute deliberation on liability indicates Thomas’ liability never was in doubt. That fact alone suggests Thomas (as well as proponents for balanced copyright) would have been much better off had she settled for $3,000.

Good party or no, good facts or no, Thomas now is in the belly of the beast, but in ways much more significant than the $220,000 in damages she must pay. This verdict will embolden the music industry to continue its ridiculous litigation campaign; at a minimum, focusing on continuing the campaign likely will keep the industry from making the fundamental business changes it needs to make in order to provide valuable services to consumers in a vastly changed business environment. No one wins in this arrangement.

We were willing to go along with the “Free Jammie” ride so long as she and her legal team recognize they have a responsibility to litigate and resolve the broader, more significant policy issues — particularly this issue of “making available” being made a de facto seventh exclusive right in Section 106.

But if Thomas and her legal team are unable or unwilling to make a legitimate attempt to resolve this and similar broader policy issues, we cannot continue to support their cause because it seems there is no doubt she committed widescale copyright infringement and her appeal seems confined to soothe the sting of what seems to be a sound, if harsh, penalty. We would much rather support [defendants] like Tanya Anderson, who is much more the victim of RIAA’s overly aggressive and flawed litigation tactics than Thomas seems to be.

Ms. Thomas-Rasset ended up getting a reprieve, as a federal judge vacated the first jury’s verdict because of an errant jury instruction. At the time, William Patry opined that he failed to see where the $222,000 verdict could be overturned on constitutional grounds. Ms. Thomas-Rasset refused to settle the case with RIAA, leading to a retrial that occurred last month.

In the second trial, a second federal jury determined that Ms. Thomas-Rasset committed copyright infringement by downloading 24 songs onto a personal computer over which she had control (or to which she had access). The second jury then awarded the copyright owning music companies $1.92 million in willful infringement damages per Section 504(c)(2). In the wake of this second verdict — in which the jury took only five hours to deliberate (compared to five minutes in the first trial) — much of the commentary has been about potential backlash against the recording industry, the recording industry’s “capacity for evil,” and the “scapegoating” of Ms. Thomas-Rasset.
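For a sense of scale, the two verdicts can be broken down per song. The sketch below is simple arithmetic on the figures reported above, with two assumptions of ours labeled in the comments: that the first verdict covered the same 24 songs (the text here does not say so), and that the relevant ceiling is the $150,000 per-work maximum for willful infringement in Section 504(c)(2). The variable names are our own.

```python
SONGS = 24                       # songs underlying the second verdict, per the text
FIRST_VERDICT = 222_000          # first jury's award, in dollars
SECOND_VERDICT = 1_920_000       # second jury's award, in dollars
WILLFUL_MAX_PER_WORK = 150_000   # Section 504(c)(2) ceiling per work

# Assumption: the first verdict also covered the same 24 songs.
per_song_first = FIRST_VERDICT / SONGS
per_song_second = SECOND_VERDICT / SONGS

increase = SECOND_VERDICT / FIRST_VERDICT
share_of_ceiling = per_song_second / WILLFUL_MAX_PER_WORK

print(f"First verdict per song:  ${per_song_first:,.0f}")
print(f"Second verdict per song: ${per_song_second:,.0f}")
print(f"Second verdict is {increase:.1f} times the first")
print(f"Second verdict uses {share_of_ceiling:.0%} of the willful ceiling per song")
```

On these figures the second jury awarded $80,000 per song, a little over half of the statutory ceiling for willful infringement.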

What? Are you kidding us?

As before, Jammie Thomas-Rasset is the wrong person to support against the music industry. Let us elaborate, so no one gets the idea that we’re front running only on winning cases (of which there really have not been many) or cherry picking cases or factual circumstances to create an ideal situation in which individuals can avoid purchasing music or other forms of entertainment.

We purchase a lot of media: literally, thousands of dollars per year in compact discs, DVDs, and game cartridges. Almost all of that content is material we have purchased from a retail outlet: despite having had this outlet for more than five years, we tend not to get schwag. It’s all good, though, because creators tend to get penalized when the copyright-owning company sends out promos and schwag. Since we buy what and who we like, we’re more than happy to support the artists we like through media sales and through tickets to live performances.

So merely as consumers and purchasers of copyrighted material, we have no sympathy for Jammie Thomas-Rasset. (And let’s not get into puerile arguments about privilege, net worth, disposable income or the ability to pay for music or entertainment.)

Additionally, though, we have no sympathy for Ms. Thomas-Rasset from a broader legal and policy perspective. As we mentioned in 2007, if your issue becomes a legal case that encapsulates a broader societal or political issue, we believe you and your attorneys have a broader obligation than just winning or working exclusively for a singular benefit. This is not too much to demand from Thomas-Rasset, as she has been willing to benefit all she can from martyrdom. But with her willingness to benefit from that role comes the responsibility of doing the right thing for a broader effort (especially where that effort is put forth in the name of balanced copyright). Thus far, though, it seems she has been willing to benefit from the quid, but not provide the quo.

Twice Thomas-Rasset has remained defiant after a jury’s overwhelmingly swift conclusions that her explanations had no credibility, explanations that effectively blamed her children for the downloading of 1,700 songs. (The latest willful infringement award is based upon 24 songs.) Given that the recording industry has shown absolutely no compunction in chasing after children and senior citizens in this litigation campaign, why would she do that?

Twice she has been to trial, and twice she has lost convincingly on a matter of significant legal, societal, and policy importance. And she has lost primarily because no one on either jury believed her; twice a jury of her peers has determined that Jammie Thomas-Rasset had absolutely no credibility.

Twice, she has benefited from legal representation that she either has yet to pay for, or has gotten free of charge.

We have scorched the music industry repeatedly in these pages for business and strategic errors, sloth, greed, and frequently (and with Congress’ help, successfully) tilting copyright law beyond what we believe (and our extensive research has proven) is a Constitutionally-mandated balance that should benefit creators, owners, and the public equally. But in its case against Thomas-Rasset, we have absolutely no problems with what the labels did and why they did it. Even though hers were civil jury trials, there can be no reasonable doubt: Jammie Thomas-Rasset was downloading and exchanging hundreds of songs without compensating the artists or the copyright owners, and she got caught.

As much as we think the music industry’s broader campaign against customers is poor business and policy, we completely support a system in which a copyright owner has the exclusive right “to do and to authorize” anything under Section 106 or Section 106A. We never have questioned that principle, or the system that supports that principle.

Jammie Thomas-Rasset has violated these principles without a legal excuse or limitation, and it seems she has played the public for fools by taking advantage of a broader anger at the music industry specifically, and more generally, a U.S. copyright system that has some problems and needs calibration, but still is one worth supporting.

Ms. Thomas-Rasset should pay her lawyers for their work, and the copyright owners for her infringements.

Copycense on Twitter: http://twitter.com/copycense

Copycense on FriendFeed: http://friendfeed.com/copycense


CommuniK Commentary by K. Matthew Dames

The news cycle has been abuzz about digital music and iTunes’ ascendance to a position as the country’s leading music retailer. Likewise, the mainstream press has continued to feed its desire for an iTunes-Amazon.com octagon-style retail death match, and steadily has been promoting Amazon.com’s mp3 download service as a worthy challenger to the iTunes hegemony.

(The music labels, long irritated with Steve Jobs’ control of the legal download market, silently would approve of such a challenge.)

We don’t see what the big deal is. There are several problems with music downloads, and none of them have anything to do with three-letter acronyms that purport to “protect” the underlying content. The primary problem with downloaded music is that it sucks.


Anna Ringstrom. Sweden to Charge Pirate Bay in Copyright Case. Yahoo! News. Jan. 28, 2008. Sweden’s involvement in enforcement efforts on the entertainment industry’s behalf is related directly to the Special 301 process and Sweden’s fear of being placed on a priority list (penalties for which include trade sanctions).

Copycense™: Incisive IP.


Yahoo! News (via The Associated Press). MPAA Admits Mistake on Downloading Study. Jan. 23, 2008; Inside Higher Ed. Downloading by Students Overstated. Jan. 23, 2008; Association for Computing Machinery. MPAA’s Data Oops: How Will Congress React? Jan. 23, 2008; News Blog (News.com). Why Did Colleges Stay Mum on MPAA Stats? Jan. 25, 2008. We don’t think this is a mistake, actually. For several years, we have questioned as biased and invalid many of the “studies” the entertainment industry creates that purport to link alleged infringement activity to a specific environment (e.g., file sharing networks) or population (e.g., college students). More investigation should be done into the numbers and methodology of these reports, especially since the entertainment industry parades them before Congress as evidence that it needs more restrictive intellectual property rights. If you think there is no connection between these sorts of studies and legislation like the PRO IP bill (H.R. 4279) or the HEA Reauthorization bill, think again.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 29, 2008, edition of Copycense Clippings.)


Jeremy Kirk. Antipiracy Group’s Tactics Violate Swiss Law. InfoWorld. Jan. 25, 2008. This is another novel theory of the privacy issues that are raised when the music industry uses private firms to track file sharing networks for alleged copyright infringement. We first heard about this approach late last year, when the University of Oregon questioned the authority MediaSentry had to engage in investigative tracking on the RIAA’s behalf. The University argued, among other things, that MediaSentry’s tracking activities may be illegal because the Maryland-based company does not hold an investigator’s license in Oregon.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 29, 2008, edition of Copycense Clippings.)


Tim Wu. Has AT&T Lost Its Mind? Slate. Jan. 16, 2008. Columbia law professor Wu rhetorically poses the obvious question in response to news that AT&T is considering proposals to filter content, ostensibly to halt alleged copyright infringement. Wu delves more deeply into the “safe harbor” provisions of Section 512 than we did when we first reported this story in last week’s Clippings, and offers some interesting thoughts about why AT&T would even consider such an effort.

(Editor’s Note: Copycense editors originally commented on this article in the Jan. 22, 2008, edition of Copycense Clippings.)
