lumendatabase org: what is it?

Lumen Tools for Researchers

An Introduction to the Lumen Project

What is Lumen?

Lumen is an independent research project studying cease and desist letters concerning online content. We collect and analyze requests to remove material from the web. Our goals are to educate the public, to facilitate research about the different kinds of complaints and requests for removal—both legitimate and questionable—that are being sent to Internet publishers and service providers, and to provide as much transparency as possible about the “ecology” of such notices, in terms of who is sending them and why, and to what effect.

Our database contains millions of notices, some of them with valid legal basis, some of them without, and some difficult to determine. The fact that Lumen has a notice in its database does not mean that Lumen is authenticating the provenance of that notice or making any judgment on the validity of the claims it raises.

Conceived, developed, and founded in 2002 by then-Berkman Klein Center Fellow Wendy Seltzer, the project, then called “Chilling Effects”, was initially focused on requests submitted under the United States’ Digital Millennium Copyright Act. As the Internet and its usage have evolved, so has Lumen, and the database now includes complaints of all varieties, including trademark, defamation, and privacy, domestic and international, and court orders. The Lumen database grows by more than 40,000 notices per week, with voluntary submissions provided by companies such as Google, Twitter, YouTube, Wikipedia, Counterfeit Technology, Medium, Stack Exchange, Vimeo, DuckDuckGo, aspects of the University of California system, and WordPress. As of the summer of 2019, the project hosts approximately twelve million notices, referencing close to four billion URLs. In 2018, the project website was visited over ten million times by users from virtually every country in the world.

Lumen is supported by a grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin.

What content can I find in the database, and where does it come from?

See here for comprehensive and up-to-date details of who sends Lumen notices, what notices they send, and what details are part of each notice. Aggregating all of these different requests to remove material facilitates the research, study and mapping of the Internet’s removal request landscape by journalists, NGOs, policy-makers, and academics.

A notice or work contains «[redacted]» – what is missing?

Lumen staff make a good faith effort to review and redact any potentially sensitive notices that the project receives in order to remove sensitive or personal information from the text of notices. Such information might include phone numbers, email addresses, or allegedly defamatory content. Further, an individual or company submitting a notice directly to the Lumen database may have decided not to share with Lumen, or to keep private, certain pieces of information in the notice. Finally, Lumen runs automated processes to remove certain sensitive information from notices and descriptions of their associated works where possible.

Please note that for DMCA notices, Lumen does not typically redact the name of the rightsholder making the request or the URL(s) of the material complained of. Without the location of the complained-of material and the complainant, the notices are meaningless from a transparency or research perspective, to say nothing of offering no insight as to possible misuse of takedown notices as a vehicle for censorship.

When a company shares copies of court orders it has received with us, Lumen typically displays those orders in the form in which they have been shared with Lumen and further, makes a good faith effort to do so in accordance with the applicable law of the jurisdiction from which the order emerged. United States court orders, unless sealed, are public documents.

See here for more details on what gets redacted from what notices.

Who is Lumen for?

Lumen is designed both for casual use by lay Internet users who are curious about a notice they may have encountered, perhaps in the news, or who are browsing out of personal interest (see below for more details about viewing notices), and for journalists, NGOs, policy-makers, academics, and other legal researchers conducting more in-depth and focused research or studying larger trends about content removal online.

Lumen is not intended to be, is not set up to be, and should not be used as, part of the work-flow of any particular business model. Companies interested in takedown notices regarding them or their clients that have been sent to other platforms would be best served contacting those platforms directly for more information. If you or your organization are interested in conducting journalistic, academic, legal, or policy-focused research of your own, or have further ideas about how we might improve the database and its interfaces, email us at team@lumendatabase.org.

Viewing a Notice

For non-researchers, Lumen currently offers access to one full notice per email address every twenty-four (24) hours. Submitting an email address through the request form will provide a one-time-use URL that displays the full contents of that particular notice. Access through this URL will last for 24 hours. See here for more details.

How does it work?

Most users will find that the web interface will suffice for browsing and discovery within the database. However, for those that need to access larger swaths of data for their research, or for those interested in submitting copies of takedown notices to Lumen, we offer our API. Read on for further information.

BASIC FACTS ABOUT THE API AND DATABASE

API Documentation

The documentation for the Lumen API can be found here.

Formatting

When a query or request is submitted to the database, the system will return a response with a list of JSON-encoded attributes. Learn more about JSON (JavaScript Object Notation) here. This format is designed to be “machine readable,” and not necessarily useful to a human reader in its raw form. However, there are many tools for rendering JSON output into a friendlier form, and we recommend finding one that works for you.

Example JSON Request: Example Successful JSON Output:

The Lumen database accepts dates in a variety of formats but always outputs dates in Unix Time, which is the number of seconds elapsed since the beginning of the Unix epoch. This can be quite confusing at first, and we recommend using a Unix Timestamp conversion tool (like this one here) to transform these raw date outputs into something a human can understand.
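
As a minimal sketch of the conversion described above, the following Python snippet turns a Unix timestamp into a human-readable UTC date using only the standard library. The sample value is illustrative and is not taken from a real notice.

```python
from datetime import datetime, timezone


def unix_to_iso(timestamp: int) -> str:
    """Convert seconds since the Unix epoch to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


# Illustrative value only; not taken from a real Lumen notice.
print(unix_to_iso(1561939200))  # prints 2019-07-01T00:00:00+00:00
```

Passing an explicit UTC timezone avoids results that silently depend on the local timezone of the machine running the script.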

Searching the Database

Most users will find that the web interface will suffice for browsing and discovery within the database. However, for those that need to access larger swaths of data or create automated processes to digest data trends, we offer our new API.

Searching the database, whether through the web interface or with the API, is done via full-text search. The default search covers all possible notice fields and facets. Searches can also be refined based on specific slices of the database or on specific facets of the data. See the documentation for the applicable notice parameters and metadata.

QUERYING THE DATABASE WITH THE API

Getting an API Key

Basic search from the command line

To query the database, use your preferred tools for HTTP GET requests. There are a number of options available, so pick one depending on your research needs.
Examples include:

Example search query for Batman where

is the database field or facet that is the object of the search.

Here’s a search query for star where term is the parameter.

Running these search queries through the API will allow you to search for some period of time, as well as download search results for use and reuse in applications. A complete list of searchable parameters can be found here.
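
As a rough illustration of such a query, the Python sketch below builds (but does not send) an HTTP GET request that searches on the term parameter. The endpoint path and the name of the authentication header are assumptions made for this example and should be checked against the API documentation; YOUR_API_KEY is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint path; confirm it against the API documentation.
SEARCH_URL = "https://lumendatabase.org/notices/search"


def build_search_request(term: str, api_key: str) -> Request:
    """Build, without sending, a GET request for a full-text search."""
    query = urlencode({"term": term})
    headers = {
        "Accept": "application/json",
        # Assumed header name for passing the API key.
        "X-Authentication-Token": api_key,
    }
    return Request(f"{SEARCH_URL}?{query}", headers=headers, method="GET")


request = build_search_request("Batman", "YOUR_API_KEY")
print(request.full_url)
```

Once the endpoint and key are confirmed, the request could be sent with urllib.request.urlopen(request) and the response body decoded with json.loads.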

Requesting a List of Topics

The database classifies notices into one or more topics, more of which may be added over time. Certain topics are categorized as subtopics of a larger, comprehensive root topic. For example, topics like “DMCA,” “fair use,” and “anti-circumvention” all fall under “Copyright.” Each topic has a unique numerical ID in the database. To request a list of topics, use the following command.

This command will return results with three pieces of information: 1) the topic’s unique ID number, 2) the name of the topic, and 3) either the ID number of the parent topic or null if the topic is a root topic.

id (integer): The unique ID used for the topic_ids array during notice creation.
name (string): The topic name.
parent_id (integer): The parent topic_id of this topic, or “null” if this is a root topic.
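
To show how the three fields fit together, here is a small Python sketch that groups subtopics under their parent topics. The sample data is hypothetical (the IDs are invented), and the sketch assumes the response is a JSON list of objects carrying exactly these three fields; the real response may be wrapped differently.

```python
import json

# Hypothetical sample response; the IDs are invented for illustration.
SAMPLE = """
[
  {"id": 1, "name": "Copyright", "parent_id": null},
  {"id": 2, "name": "DMCA", "parent_id": 1},
  {"id": 3, "name": "Fair Use", "parent_id": 1},
  {"id": 4, "name": "Defamation", "parent_id": null}
]
"""


def group_by_parent(topics):
    """Map each parent topic name to the names of its direct subtopics."""
    names = {t["id"]: t["name"] for t in topics}
    tree = {}
    for t in topics:
        if t["parent_id"] is None:
            # Root topic: make sure it appears even with no subtopics.
            tree.setdefault(t["name"], [])
        else:
            tree.setdefault(names[t["parent_id"]], []).append(t["name"])
    return tree


tree = group_by_parent(json.loads(SAMPLE))
print(tree)
```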

Searching the notices

On the web interface, search results are paginated above a certain number of hits. By default, results are sorted by descending relevance. Full-text search results contain the same data as individually requested notices, with the addition of a score field that expresses the result’s relevance to the query term; higher numbers are more relevant. Terms are joined with an ‘OR’ by default.

Every takedown notice and removal request in the Lumen database is there because one of the parties involved with the sending or receipt of the notice (usually, but not always, its receipt) has chosen to share a copy of that notice with Lumen. That being the case, Lumen has only the information the sharing entity has chosen to share with Lumen regarding that notice, and different companies and individuals have chosen to share different types of information. On this page are more details regarding what information a given notice contains, on both a source-by-source and notice type-by-notice type basis.

Who is Involved With A Notice

The recipient is to whom the notice or request or order was directed, the person or company that the sender thinks has the ability to take the material down. Examples are Google, Twitter, DuckDuckGo, Medium and others.

Often, the sender of a notice is also the person or company whose rights are at stake. For example, an artist might send a DMCA complaint, or a person might send a court order regarding defamation. However, larger rightsholders sometimes use third parties, such as an agency or lawyer, to manage their copyrights, and in that case it will be those third parties who send the notice. For such notices, the person or entity whose rights are at issue will be listed as the principal, while the agent who actually sent the notice will be listed as the sender, and as having sent “on behalf of” the principal. For notices where the principal and the sender are the same, they will not be listed separately. You can see an example of a notice with both principal and sender here.

The submitter of a notice is the person or company, almost always the latter, responsible for sharing a copy with Lumen. Most often, but not always, the submitter is the notice’s recipient. The submitter is listed separately for clarity, even if the same as the recipient or sender.
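
The roles described above can be pictured as a small data structure. The Python sketch below is only an illustration: the class and field names are hypothetical and do not reflect Lumen’s actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoticeParties:
    # Hypothetical field names, chosen to mirror the roles described above.
    sender: str
    recipient: str
    submitter: str
    principal: Optional[str] = None  # set only when it differs from the sender

    def sent_by(self) -> str:
        """Describe who sent the notice, naming the principal when separate."""
        if self.principal and self.principal != self.sender:
            return f"{self.sender} on behalf of {self.principal}"
        return self.sender


parties = NoticeParties(
    sender="Example Agency",
    recipient="Example Platform",
    submitter="Example Platform",
    principal="Example Artist",
)
print(parties.sent_by())  # prints Example Agency on behalf of Example Artist
```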

General Redactions Performed by Lumen

Lumen makes a good faith effort to redact out all personally identifying information (“PII”) contained within notices other than the name of the sender or rightsholder, and the country of origin of the notice. Our automatic redaction processes seek to identify and remove the following:

Lumen also makes a good faith effort to not display the street addresses of individuals who are the Senders or Recipients of notices if that information has been included in a notice. Lumen will generally remove such information, as well as other PII, on request if it is inadvertently included in notice fields by a notice Sender.

Lumen generally does NOT remove the names of the individual or entity who holds the right(s) at issue that the notice is seeking to exercise. This is typically the notice’s Sender and/or Principal, but sometimes only the Principal, in the case of notices sent by a 3rd party, such as a lawyer or agency.

If the Sender of a notice is such a 3rd party individual, Lumen makes a good faith effort to redact out the Sender’s name. Lumen does not generally redact out the names of 3rd party companies, law firms or other agencies.

Types of notices found on Lumen and the redactions performed on them

DMCA notices

In general, when Lumen receives a DMCA notice, it displays the notice as it was received by Lumen, subject only to the automatic redactions as described above. This means Lumen displays the name of the rights-holder making the request in the form that it originally appeared on the notice as sent. Lumen also generally displays the URL location(s) of the material complained of, although truncated to the top-level domain of the URL text. We include complained-of URLs as raw text only, and use robots.txt to request that the URLs, as well as the entire content of notice pages, not be indexed by search engines. Without the location of the complained-of material and the identity of the complainant, the notices are meaningless from a public transparency or research perspective, to say nothing of offering no insight as to possible misuse of takedown notices as a vehicle for censorship or other ends.

Defamation notices

Lumen displays defamation notices that it receives in the form that it receives them, but makes a good faith effort to remove any allegedly defamatory language that may have been included, and will generally remove inadvertently included allegedly defamatory language on the request of the notice Sender. In addition to these general redactions by Lumen, different companies that share notices with Lumen share different aspects of Defamation notices; see below for further details.

Private information

Lumen displays private information notices that it receives in the form that it receives them, but makes a good faith effort to remove any PII that was included (see above), and will remove any remaining PII on the request of the notice Sender.

Court Orders

Orders from United States courts are publicly available documents, and Lumen therefore generally shares them in the form in which they were received, with no redactions. For court orders issued from jurisdictions other than the United States, Lumen makes a good faith effort to redact them in a way that matches the relevant legal restrictions of the jurisdiction in question.

“Counterfeit” notices

These notices are those sent seeking the removal of material having to do with counterfeit goods. Lumen treats these in the same manner as the DMCA notices it receives.

Other notices

For notices in any categories other than those listed above, Lumen performs redactions as if the notice was a DMCA notice.

Notice submitters

Different submitters submit notices within different notice categories. It is possible that the details of a notice may qualify for more than one category, but each notice will fall only within the single category chosen by the submitter when sharing with Lumen. For example, a court order might concern defamation, yet be filed only as a court order.

Most of the submitters with which Lumen works submit only notices they have received under the guidelines of the United States’ Digital Millennium Copyright Act (“DMCA”) but some submit other categories of notices, including notices having to do with trademark and patent, court orders for removal from both U.S. domestic and foreign courts, notices having to do with private information or allegedly defamatory content, as well as other types. Lumen is never the original sender or recipient of these notices, and for any given notice, has only the information that the submitter in question chose to share with it.

Lumen does not have any additional information regarding a notice or its provenance beyond what is visible on the Lumen website, and any further questions about a given notice, including requests for retraction or changes, should be directed to the notice’s sender or recipient or both.

Automattic/Wordpress

Automattic is the parent company for wordpress.com and Tumblr, and the host of many WordPress blogs. It shares with Lumen copies of the non-DMCA removal requests it receives from governments or agencies regarding content it hosts. You can see an example of an Automattic notice here: https://www.lumendatabase.org/notices/10783920

Cloudflare, Inc.

Cloudflare, Inc. is an American web infrastructure and website security company that provides content delivery network services, DDoS mitigation, Internet security, and distributed domain name server services. Cloudflare’s services sit between a website’s visitor and the Cloudflare user’s hosting provider, acting as a reverse proxy for websites. Cloudflare shares with Lumen copies of DMCA notices and court orders sent directly to Cloudflare regarding their products.

Counterfeit Technology

Counterfeit Technology is a rights management company that sends DMCA notices on behalf of its clients. It shares with Lumen copies of the DMCA notices it sends to others. You can see an example of a Counterfeit Technology notice here: https://www.lumendatabase.org/notices/12506014

DuckDuckGo

DuckDuckGo “is an internet search engine that emphasizes protecting searchers’ privacy and avoiding the filter bubble of personalized search results. DuckDuckGo distinguishes itself from other search engines by not profiling its users and by showing all users the same search results for a given search term.” It shares with Lumen copies of the DMCA requests it receives.

Also of note, when DuckDuckGo receives a DMCA notification regarding a disputed text or thumbnail search result, it directs the complainant who sent the DMCA to Yahoo! Search or Bing, with the following response:

Thank you for your email. If you are requesting removal of content in DuckDuckGo search results, please note that we do not store or host the content. We syndicate that content from industry partners, including Verizon Media/Oath/Yahoo! and Microsoft/Bing. You should direct your request to Yahoo! at https://policies.oath.com/us/en/oath/ip/index.html and/or to Bing at https://www.microsoft.com/info/Search.html. DuckDuckGo search results will automatically and promptly (within about 10 days) reflect their takedown actions. Thank you.

Google

Google is Lumen’s largest submitter by both total volume of notices and number of possible notice types. Below is an up-to-date list of the types of notices Google has chosen to share with Lumen, arranged by notice type (DMCA, Defamation, Circumvention, etc.), and a list of the various Google products (Search, Blogger, Images, etc.) regarding which Google may receive notices that it then shares with Lumen.

Types of Notices that Google shares with Lumen

A list of Google Products regarding which Lumen may have related notices

Relevant Google Notice Types

DMCA NOTICES

As described above, DMCA notices from Google, like all DMCA notices on Lumen, are published in the form in which they are originally sent. Lumen makes a good faith effort to redact out personally identifying information other than the Sender’s name, which will be displayed in the form in which it was shared with Lumen.

CIRCUMVENTION

Circumvention notices that Lumen receives from Google are published in the form in which they are originally sent. Lumen makes a good faith effort to redact out personally identifying information other than the Sender’s name, which will be displayed in the form in which it was shared with Lumen.

COUNTERFEIT (as of June 2020)

Google’s help page on this topic describes these notices as follows:
“Upon notice, Google will remove web pages selling counterfeit goods from our search results. Counterfeit goods contain a trademark or logo that is identical to or substantially indistinguishable from the trademark of another. They mimic the brand features of the product in an attempt to pass themselves off as a genuine product of the brand owner.

This policy only applies to specific web pages selling counterfeit goods and does not apply to non-counterfeit trademark issues. Failure to limit complaints under this policy to web pages selling counterfeit goods may result in restrictions on sending future complaints.”

COURT ORDERS

Google shares with Lumen copies of court orders for removal that it receives, from both U.S. domestic and foreign courts. As described above, U.S. court orders, unless sealed, are public records, and those court orders are presented on Lumen in the form in which Lumen receives them from Google. If Google is prohibited from sharing the contents of a court order it has received, that will be explicitly stated in the text of the notice.

Lumen makes a good faith effort to redact foreign court orders that are shared with Lumen in a manner according to whatever local law is applicable in the order’s jurisdiction of origin. These redactions will typically include names, addresses, and other forms of PII, and may include aspects of the URLs in question. What other redactions are present within a document may vary on a country-by-country basis.

DEFAMATION NOTICES

As a matter of Google’s internal policies, Google does not share with Lumen the names of the Senders of Defamation notices. Further, if the name of a Sender/Principal occurs in some form within one of the complained-of URLs that are the subject of the notice, or within any text within the notice, that text will be redacted out as well.

For instance, an original URL of

would be shared by Google with Lumen as

Further, Google does not share with Lumen the text entered by a complainant in the

“In order to ensure specificity, please quote the exact text…”

field on Google’s webform.

Text entered by a complainant in the

“Please explain in detail why you believe the content on the above URLs is unlawful, citing specific provisions of law wherever possible.”

field of the webform is shared by Google with Lumen and displayed as part of the notice on Lumen as is, subject only to Lumen’s automatic redaction processes.

Please note that if the subject of the alleged defamation is not the sender of the notice, that subject’s name will not be automatically redacted from the notice. Please notify Lumen if the name of the allegedly defamed person is still present within a notice.

Notices sent to Google from Senders in countries other than the United States are typically governed by the local law of the country of origin. In such cases, in addition to Lumen’s standard good faith effort redactions, redactions are made according to whatever local law is applicable in the order’s jurisdiction of origin. These redactions will include names, addresses, and other forms of PII, and may include aspects of the URLs in question. What other redactions are present within a document may vary on a country-by-country basis as well as according to Google’s internal policy decisions, to which Lumen is not privy.

OTHER TYPES OF NOTICES

*** Please note that Google does NOT, at this time, share with Lumen notices that it receives from EU citizens as part of the so-called “Right to be forgotten” (“RTBF”) or notices submitted through Google’s form for reporting sexually explicit images, which can be found here.*** https://support.google.com/websearch/troubleshooter/3111061#ts=2889054%2C2889099

Google sometimes indicates, via reference to a “placeholder” notice on Lumen, when it has received requests to remove material about which it cannot share more information.

The Lumen Database collects and analyzes legal complaints and requests for removal of online materials, helping Internet users to know their rights and understand the law. These data enable us to study the prevalence of legal threats and let Internet users see the source of content removals.

Automated Submissions and Search Using the API

The main Lumen Database instance has an API that allows individuals and organizations that receive large numbers of notices to submit them without using the web interface. The API also provides an easy way for researchers to search the database. Members of the public can test the database, but will likely need to request an API key from the Lumen team to receive a token that provides full access. To learn about the capabilities of the API, you can consult the API documentation.

You can customize behavior during seeding (db:setup) with a couple of environment variables:

Sample user logins

The seed data creates logins of the following form:

You will need to do some setup before the first time you run this:

It will default to using the number of processors parallel_tests believes to be available, but you can change this by setting ENV['PARALLEL_TEST_PROCESSORS'] to the desired number.

Use rubocop and leave the code at least as clean as you found it. If you make linting-only changes, it’s considerate to your code reviewer to keep them in their own commit.

Here are all the environment variables which Lumen recognizes. Find them in the code for documentation.

Most of these are optional and have sensible defaults (which may vary by environment).

The application requires a mail server. In development, it’s best to use a local SMTP server that will catch all outgoing emails. Mailcatcher is a good option.

You can search the database and, if you have a contributor token, add to the database using our API.

Lumen Database is licensed under GPLv2. See LICENSE.txt for more information.

Copyright (c) 2016 President and Fellows of Harvard College
