Protecting and Licensing Internet Content Databases

Eric Goldman

Marquette University Law School

eric.goldman@marquette.edu

http://eric_goldman.tripod.com

 

1. Introduction

• The challenge of protecting non-copyrightable data in a digital era
• Information aggregators and scraping, harvesting and extraction
    ◦ The age-old build v. buy question; but here building means constructing a way to steal
• For complete protection, clients need to consider law, technology and business models
    ◦ Licensing requires foresight

 

2. Legal Protection: Copyright

• Copyright protects original works of authorship, not facts or ideas
• Some cases find copyright in data that is a product of judgment (e.g., CDN v. Kapes)
• Even if individual items aren’t copyrightable, the compilation should be protectable (selection, arrangement, coordination)
• Ways to “manufacture” copyright protection:
    ◦ “Meta” info (classifications/taxonomies)
    ◦ Software for formats or transfers
    ◦ Copyright management information (17 U.S.C. § 1202)

 

3. Legal Protection: Hot News

• Misappropriation of intangible information is usually preempted by copyright
• Exception: the hot news doctrine, whose elements are:
    ◦ Information generated/collected at some expense
    ◦ Information is highly time-sensitive
    ◦ Defendant free-rides on plaintiff’s efforts
    ◦ Defendant’s use directly competes with plaintiff
    ◦ Free-riding reduces production incentives so as to substantially threaten production
• Examples: Headlines, scores, weather, prices?

 

4. Legal Protection: Contract

• Contracts can provide excellent protection (except against after-acquirers)
• Online formation: mandatory non-leaky clickthrough
    ◦ Bootscreen process should work
    ◦ Other placement can work if notice and call to action are done carefully
• Subject to all standard contract defenses
    ◦ Incapacity, unconscionability, public policy
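One way to picture a "non-leaky" clickthrough is a single gate that every content request passes through, so that nothing is served until acceptance of the current terms has been recorded. The sketch below is illustrative only: the names Session, TERMS_VERSION, record_acceptance and serve are hypothetical, and it shows the technical gating idea, not what any court requires for contract formation.

```python
from dataclasses import dataclass, field

# Hypothetical identifier for the terms currently in force.
TERMS_VERSION = "v1"


@dataclass
class Session:
    """Minimal stand-in for a user session (an assumption for this sketch)."""
    user_id: str
    accepted_terms: set = field(default_factory=set)


def record_acceptance(session: Session, version: str) -> None:
    """Record that the user affirmatively clicked 'I agree' for this version."""
    session.accepted_terms.add(version)


def serve(session: Session, path: str) -> str:
    """Single entry point for content: no acceptance on record, no data."""
    if TERMS_VERSION not in session.accepted_terms:
        # Show the terms and the call to action instead of the content.
        return "TERMS_PAGE"
    return f"CONTENT:{path}"
```

Because serve is the only path to content in this sketch, a deep link or bookmarked URL still lands on the terms page until the user has accepted.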

 

5. Legal Protection: Trespass

• Protect information by protecting the servers
• Trespass:
    ◦ Use or intermeddling
    ◦ Dispossession, impairment, deprivation or harm
    ◦ Notification and self-help?
• Computer Fraud & Abuse Act:
    ◦ Accessing protected computer without authorization (or in excess of authorization)
    ◦ Taking info or causing damage
• Proactive steps: onsite notice, email notice, robot exclusion headers, IP address blocks
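Two of the proactive steps above, robot exclusion and IP address blocks, can be sketched in a few lines of Python. The sketch assumes "robot exclusion headers" refers to a robots.txt file; the disallowed path, the blocked address range and the function names are all hypothetical. Note that robots.txt is only a request that well-behaved crawlers honor, so it works as notice, while the IP check is the part that actually refuses service.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks all crawlers to stay out of the database pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""

# Hypothetical blocklist; 203.0.113.0/24 is a range reserved for documentation.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def robots_allows(user_agent: str, path: str) -> bool:
    """Return True if the published robots.txt permits this agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def ip_is_blocked(remote_addr: str) -> bool:
    """Return True if the requesting address falls inside a blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)
```

A server would typically call ip_is_blocked before serving any page; robots_allows is shown from the crawler's side, to make clear what the notice communicates.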

 

6. Non-Legal Protections

• Anti-robot techniques:
    ◦ IP address blocks; exclusion headers
    ◦ Dynamically-created pages
    ◦ Password protection
    ◦ Monitor data served; limit amount served to any one user
• Encryption envelopes
• Provide custom interface rather than licensing entire database
• Sell freshness/currency
• Sell organizing info/implementation ease
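The "monitor data served; limit amount served to any one user" technique can be sketched as a per-user sliding-window counter. This is a minimal illustration under stated assumptions: the class name ServingLimiter, the record limit and the window length are hypothetical, and a production system would also need persistence and a reliable way to identify users.

```python
import time
from collections import defaultdict, deque


class ServingLimiter:
    """Caps how many records one user may receive within a time window."""

    def __init__(self, max_records, window_seconds, clock=time.monotonic):
        self.max_records = max_records
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._events = defaultdict(deque)  # user_id -> deque of (time, count)

    def allow(self, user_id, records_requested):
        """Return True and log the request if it fits within the user's quota."""
        now = self._clock()
        events = self._events[user_id]
        # Discard requests that have aged out of the window.
        while events and now - events[0][0] >= self.window_seconds:
            events.popleft()
        served = sum(count for _, count in events)
        if served + records_requested > self.max_records:
            return False
        events.append((now, records_requested))
        return True
```

For example, ServingLimiter(max_records=100, window_seconds=60) would refuse a request once a user has already received 100 records in the past minute, while other users remain unaffected.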

 

7. License Grants

• What IP rights are being licensed?
    ◦ Copyright
        ▪ Software, entire database, taxonomy?, teaser portions?, individual items?
        ▪ The challenge of weak collection practices
    ◦ Trade secret
        ▪ Software, proprietary codes, usually NOT the entire database or individual items
    ◦ Trademark
        ▪ Logos
• Redistribution, co-branding, framing and content serving
• “Derivative works” (edits, summaries, abridgements and commingling)
• Display rules
• Post-termination rights
    ◦ Replacement data
    ◦ Replacement taxonomy
    ◦ Compliance enforcement?

 

8. Other Licensing Issues

• Transfer protocols and service levels
    ◦ Data dump (electronic or physical), on-demand calls or joint page serving; sales tax implications
    ◦ Data refreshing/caching
• Anti-scraping obligations
• Pass-throughs to end user
    ◦ Contract restrictions against extraction
    ◦ Liability disclaimers
• Indemnity
    ◦ Being the cheese in a sandwich
    ◦ 47 U.S.C. § 230