Protecting and Licensing Internet Content Databases
Eric Goldman
Marquette University Law School

1. Introduction

  • The challenge of protecting non-copyrightable data in a digital era
  • Information aggregators and scraping, harvesting and extraction
    • The age-old build v. buy question, but here “building” means constructing a way to steal the data
  • For complete protection, clients need to consider law, technology and business models
    • Licensing requires foresight

2. Legal Protection—Copyright

  • Copyright protects original works of authorship—not facts or ideas
  • Some cases find copyright in data that is a product of judgment (e.g., CDN v. Kapes)
  • Even if individual items aren’t copyrightable, the owner should be able to protect the compilation (selection, arrangement, coordination)
  • Ways to “manufacture” copyright protection:
    • “Meta” info (classifications/taxonomies)
    • Software for formats or transfers
    • Copyright management information (17 U.S.C. § 1202)

3. Legal Protection—Hot News

  • Misappropriation claims over intangible information are usually preempted by copyright law
  • But hot news doctrine:
    • Information generated/collected at some expense
    • Information is highly time-sensitive
    • Defendant free-rides on plaintiff’s efforts
    • Defendant’s use directly competes with plaintiff
    • Free-riding reduces production incentives so as to substantially threaten production
  • Examples: Headlines, scores, weather, prices?

4. Legal Protection—Contract

  • Contracts can provide excellent protection (except against after-acquirers)
  • Online formation: mandatory, non-leaky clickthrough (no path to the data that bypasses assent)
    • Bootscreen process should work
    • Other placement can work if notice and call to action done carefully
  • Subject to all standard contract defenses
    • Incapacity, unconscionability, public policy

5. Legal Protection—Trespass

  • Protect information by protecting the servers
  • Trespass:
    • Use or intermeddling
    • Dispossession, impairment, deprivation or harm
    • Notification and self-help?
  • Computer Fraud & Abuse Act:
    • Accessing protected computer without authorization (or in excess of authorization)
    • Taking info or causing damage
  • Proactive steps: onsite notice, email notice, robot exclusion headers, IP address blocks
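
The “robot exclusion” step above can be illustrated with a short sketch. It reads “robot exclusion headers” as a robots.txt file under the Robots Exclusion Protocol, which is an assumption about what the outline means. The site paths and the "ExampleHarvester" agent name are hypothetical. The sketch only shows what notice a well-behaved robot would receive; it does not stop a robot that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site. The paths and the
# "ExampleHarvester" agent name are assumptions, not taken from the outline.
ROBOTS_TXT = """\
User-agent: ExampleHarvester
Disallow: /

User-agent: *
Disallow: /database/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A named harvester is excluded from the whole site.
print(parser.can_fetch("ExampleHarvester", "https://example.com/index.html"))  # False
# Every other robot is excluded from the database pages only.
print(parser.can_fetch("GenericBot", "https://example.com/database/item1"))    # False
print(parser.can_fetch("GenericBot", "https://example.com/about.html"))        # True
```

A file like this works as notice rather than as a barrier, which is why the outline pairs it with onsite notice, email notice and IP address blocks.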

6. Non-Legal Protections

  • Anti-robot techniques:
    • IP address blocks; exclusion headers
    • Dynamically-created pages
    • Password protection
    • Monitor data served; limit amount served to any one user
  • Encryption envelopes
  • Provide custom interface rather than licensing entire database
  • Sell freshness/currency
  • Sell organizing info/implementation ease

7. License Grants

  • What intellectual property (IP) rights are being licensed?
    • Copyright
      • Software, entire database, taxonomy?, teaser portions?, individual items?
      • The challenge of weak collection practices
    • Trade secret
      • Software, proprietary codes, usually NOT the entire database or individual items
    • Trademark
      • Logos
  • Redistribution, co-branding, framing and content serving
  • “Derivative works” (edits, summaries, abridgements and commingling)
  • Display rules
  • Post-termination rights
    • Replacement data
    • Replacement taxonomy
    • Compliance enforcement?

8. Other Licensing Issues

  • Transfer protocols and service levels
    • Data dump (electronic or physical), on-demand calls or joint page serving; sales tax implications
    • Data refreshing/caching
  • Anti-scraping obligations
  • Pass-throughs to end user
    • Contract restrictions against extraction
    • Liability disclaimers
  • Indemnity
    • Being the cheese in a sandwich
    • 47 U.S.C. § 230