<?xml version="1.0" encoding="UTF-8"?>
<rfc category="exp" consensus="true" docName="draft-liao-aipref-autoctl-ext-00" ipr="trust200902" sortRefs="true" submissionType="IETF" symRefs="true" tocInclude="true" version="3" xmlns:xi="http://www.w3.org/2001/XInclude">
  <front>
    <title abbrev="automation-preferences-ext">Protocol Extension for Advanced Automation Control</title>
    <seriesInfo name="Internet-Draft" value="draft-liao-aipref-autoctl-ext-00"/>
    <author fullname="Liao Peiyuan">
      <organization>Condé Nast</organization>
      <address>
        <postal>
          <country>United States of America</country>
        </postal>
        <email>peiyuan_liao@condenast.com</email>
      </address>
    </author>
    <date day="8" month="April" year="2025"/>
    <area>Applications</area>
    <workgroup>AI Preferences</workgroup>
    <keyword>Automation Preferences</keyword>
    <keyword>Automation Control</keyword>
    <keyword>Advanced Web Automation</keyword>
    <abstract>
      <t>This document specifies extensions to the automation-preferences.txt protocol,
      providing advanced controls for server-side automation permissions. It builds upon
      the core specification by adding sophisticated features such as rate limiting,
      automation technology restrictions, API permissions, session requirements, and
      HTML asset annotations. These extensions enable content providers to exercise
      more granular control over automated interactions while maintaining backward
      compatibility with implementations of the core protocol.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>The latest revision of this draft and its status information can be found at
      <eref target="https://datatracker.ietf.org/doc/draft-liao-aipref-autoctl-ext/"/>.</t>
      <t>Discussion of this document takes place on the
      AI Preferences Working Group mailing list (<eref target="mailto:ai-control@ietf.org"/>),
      which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/ai-control/"/>.
      Subscribe at <eref target="https://www.ietf.org/mailman/listinfo/ai-control/"/>.</t>
    </note>
  </front>
  
  <middle>
    <section anchor="introduction">
      <name>Introduction</name>
      <t>This document extends the automation-preferences.txt protocol defined in
      "Protocol for Basic Automation Control"
      <xref target="CORE-SPEC"/> by introducing advanced directives and capabilities for more
      sophisticated control over automated interactions. These extensions address
      complex automation scenarios while maintaining backward compatibility with
      implementations of the core specification.</t>
      
      <t>The extensions defined in this document enable content providers to exercise
      more granular control over automated access, including rate limiting,
      specific technology restrictions, API usage policies, session validation
      requirements, and asset-level annotation methods. These capabilities are
      designed to complement the basic controls provided by the core specification,
      offering a progressive path to more comprehensive automation management.</t>
      
      <section anchor="relationship-to-core-specification">
        <name>Relationship to Core Specification</name>
        <t>This document builds upon the core specification without modifying its
        requirements. All directives and mechanisms defined in the core specification
        remain valid and are not redefined here. This document assumes familiarity
        with the core specification and uses its terminology and concepts throughout.</t>
        
        <t>The extensions defined in this document are OPTIONAL for both servers and
        clients. Implementations that support only the core specification are
        considered compliant with the automation-preferences.txt protocol, though they
        will not benefit from the advanced controls defined here.</t>
        
        <t>When both core and extended directives are present in an automation-preferences.txt
        file, parsers that do not support the extensions defined in this document
        MUST ignore the unrecognized directives, as specified in the core
        specification's extension mechanism.</t>
      </section>
    </section>
    
    <section anchor="conventions-and-terminology">
      <name>Conventions and Terminology</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
      described in BCP&#xa0;14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
      
      <t>This document uses the terminology defined in the automation-preferences.txt
      protocol <xref target="CORE-SPEC"/>. The following additional terms are introduced in this document:</t>
      
      <ul spacing="normal">
        <li>
          <t><strong>Rate limiting</strong>: Constraints on the frequency or concurrency of automated
          requests to prevent excessive server load.</t>
        </li>
        <li>
          <t><strong>Automation technology</strong>: Specific tools or frameworks used for automation,
          such as headless browsers or browser automation protocols.</t>
        </li>
        <li>
          <t><strong>XHR/Fetch</strong>: XMLHttpRequest or Fetch API calls performed programmatically.</t>
        </li>
        <li>
          <t><strong>Session validation</strong>: Mechanisms to verify that automated requests are part
          of a legitimate user session.</t>
        </li>
        <li>
          <t><strong>Asset annotation</strong>: Metadata embedded within HTML documents to specify
          automation policies for individual content elements.</t>
        </li>
      </ul>
    </section>
    
    <section anchor="extended-protocol-specification">
      <name>Extended Protocol Specification</name>
      <t>This section defines additional directives that extend the automation-preferences.txt
      protocol. These directives may be used alongside the core directives in any
      group within the automation-preferences.txt file.</t>
      
      <section anchor="rate-limiting">
        <name>Rate Limiting</name>
        <t>Rate limiting directives specify constraints on the frequency and concurrency
        of automated requests to prevent excessive server load. The following directives
        are defined:</t>
        
        <ul spacing="normal">
          <li>
            <t><tt>RequestLimit</tt>: Specifies the maximum number of requests allowed within a
            time period, expressed as a count and a time unit separated by a slash
            (e.g., "60/minute"). Supported time units are "second", "minute", "hour", and "day".</t>
          </li>
          <li>
            <t><tt>ConcurrentLimit</tt>: Specifies the maximum number of concurrent connections
            allowed from a single client.</t>
          </li>
        </ul>
        
        <t>Example:</t>
        <figure><artwork><![CDATA[
RequestLimit: 60/minute
ConcurrentLimit: 5
        ]]></artwork></figure>
        
        <t>Rate limiting directives apply to all requests within the scope of the group,
        regardless of HTTP method. If no rate limiting directives are specified,
        clients SHOULD NOT assume any specific rate limits, but SHOULD implement
        reasonable self-throttling to avoid overloading the server.</t>
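
        <t>As an illustration of the value syntax above, the following Python sketch parses a
        <tt>RequestLimit</tt> value and derives an even spacing between requests. The function
        names are hypothetical, and the tolerance for surrounding whitespace and the rejection
        of a zero count are assumptions of this sketch rather than requirements of this document.</t>
        <figure><artwork><![CDATA[
```python
import re

# Seconds per unit, for the four time units named in this section.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_request_limit(value):
    """Parse a value such as "60/minute" into (count, period_in_seconds)."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(second|minute|hour|day)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized RequestLimit value: {value!r}")
    count = int(match.group(1))
    if count == 0:
        # Assumption: a zero count is treated as invalid in this sketch.
        raise ValueError("RequestLimit count must be positive")
    return count, UNIT_SECONDS[match.group(2)]


def min_interval_seconds(value):
    """Even spacing between requests that stays at or below the limit."""
    count, period = parse_request_limit(value)
    return period / count
```
        ]]></artwork></figure>
        <t>Even spacing is only one way to self-throttle; a client could also send requests
        in bursts as long as the count within each period stays at or below the limit.</t>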
      </section>
      
      <section anchor="automation-technology-restrictions">
        <name>Automation Technology Restrictions</name>
        <t>Automation technology directives specify whether specific automation tools or
        frameworks are permitted. The following directives are defined:</t>
        
        <ul spacing="normal">
          <li>
            <t><tt>AllowCDP</tt>: Boolean value indicating whether the use of Chrome DevTools
            Protocol (CDP) is permitted.</t>
          </li>
          <li>
            <t><tt>AllowHeadless</tt>: Boolean value indicating whether the use of headless
            browsers is permitted.</t>
          </li>
          <li>
            <t><tt>AllowSelenium</tt>: Boolean value indicating whether the use of Selenium
            WebDriver is permitted.</t>
          </li>
          <li>
            <t><tt>AllowPuppeteer</tt>: Boolean value indicating whether the use of Puppeteer
            is permitted.</t>
          </li>
          <li>
            <t><tt>AllowPlaywright</tt>: Boolean value indicating whether the use of Playwright
            is permitted.</t>
          </li>
        </ul>
        
        <t>Example:</t>
        <figure><artwork><![CDATA[
AllowCDP: false
AllowHeadless: false
AllowSelenium: false
AllowPuppeteer: false
AllowPlaywright: false
        ]]></artwork></figure>
        
        <t>If an automation technology directive is not specified, clients SHOULD NOT
        assume that the use of that technology is permitted. Implementations SHOULD
        respect these directives whenever they apply, regardless of how a server
        detects the technology in use.</t>
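
        <t>A client-side reading of these directives might look like the following Python
        sketch. It assumes the directives of one group have already been collected into a
        dictionary of strings; the function names and the case-insensitive handling of
        "true" and "false" are assumptions of the sketch. A missing directive is treated
        as "not permitted", in line with the paragraph above.</t>
        <figure><artwork><![CDATA[
```python
# The five technology directives defined in this section.
TECHNOLOGY_DIRECTIVES = (
    "AllowCDP",
    "AllowHeadless",
    "AllowSelenium",
    "AllowPuppeteer",
    "AllowPlaywright",
)


def parse_boolean(value):
    """Convert a directive value to a bool (case-insensitive by assumption)."""
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean directive value: {value!r}")


def permitted_technologies(directives):
    """Map each technology directive to True or False for one group.

    A directive that is absent maps to False, because clients should not
    assume that an unlisted technology is permitted.
    """
    result = {}
    for name in TECHNOLOGY_DIRECTIVES:
        raw = directives.get(name)
        result[name] = parse_boolean(raw) if raw is not None else False
    return result
```
        ]]></artwork></figure>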
      </section>
      
      <section anchor="api-and-xhr-permissions">
        <name>API and XHR Permissions</name>
        <t>API and XHR permission directives specify rules for API usage and automated
        use of XMLHttpRequest, Fetch, or AJAX. The following directives are defined:</t>
        
        <ul spacing="normal">
          <li>
            <t><tt>APIAutomation</tt>: Indicates how API endpoints may be accessed by automated
            clients. Valid values are:</t>
            <ul spacing="normal">
              <li>
                <t><em>none</em>: No API automation is permitted.</t>
              </li>
              <li>
                <t><em>with-key-only</em>: API automation is permitted only with proper authentication.</t>
              </li>
              <li>
                <t><em>open</em>: API automation is generally permitted.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><tt>AllowXHR</tt>: Indicates how XMLHttpRequest or Fetch API may be used by
            automated clients. Valid values are:</t>
            <ul spacing="normal">
              <li>
                <t><em>none</em>: No XHR/Fetch automation is permitted.</t>
              </li>
              <li>
                <t><em>read-only</em>: Only GET requests are permitted via XHR/Fetch.</t>
              </li>
              <li>
                <t><em>open</em>: XHR/Fetch automation is generally permitted.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><tt>DisallowFetchFrom</tt>: Comma-separated list of URL patterns from which
            automated XHR/Fetch requests are prohibited. Wildcards MAY be used.</t>
          </li>
        </ul>
        
        <t>Example:</t>
        <figure><artwork><![CDATA[
APIAutomation: with-key-only
AllowXHR: read-only
DisallowFetchFrom: /account/*, /checkout/*, /admin/*
        ]]></artwork></figure>
        
        <t>If API and XHR permission directives are not specified, clients SHOULD assume
        the most restrictive value (i.e., "none" for APIAutomation and AllowXHR).</t>
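
        <t>The following Python sketch shows one way a client could evaluate
        <tt>AllowXHR</tt> and <tt>DisallowFetchFrom</tt> before issuing a request. It
        assumes that the patterns are matched against the request path and that the
        wildcard behaves like a shell-style "*"; this document does not define the matching
        rules in that detail, so both points are assumptions, as are the function names.
        Note that Python's <tt>fnmatchcase</tt> also treats "?" and "[...]" as special.</t>
        <figure><artwork><![CDATA[
```python
from fnmatch import fnmatchcase


def split_patterns(value):
    """Split a comma-separated pattern list, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def xhr_request_permitted(directives, method, path):
    """Decide whether an automated XHR/Fetch request may be sent.

    `directives` holds the directive strings of one group. A missing
    AllowXHR directive is read as "none", the most restrictive value.
    """
    patterns = split_patterns(directives.get("DisallowFetchFrom", ""))
    if any(fnmatchcase(path, pattern) for pattern in patterns):
        return False
    mode = directives.get("AllowXHR", "none").strip()
    if mode == "open":
        return True
    if mode == "read-only":
        return method.upper() == "GET"
    return False
```
        ]]></artwork></figure>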
      </section>
      
      <section anchor="session-requirements">
        <name>Session Requirements</name>
        <t>Session requirement directives specify whether automated requests must be part
        of a legitimate user session. The following directives are defined:</t>
        
        <ul spacing="normal">
          <li>
            <t><tt>RequireHumanInitiatedSession</tt>: Boolean value indicating whether automated
            requests must be part of a session that was initiated by a human user.</t>
          </li>
          <li>
            <t><tt>SessionValidation</tt>: Specifies the method used to validate sessions. Valid
            values are:</t>
            <ul spacing="normal">
              <li>
                <t><em>cookie-based</em>: Sessions are validated using HTTP cookies.</t>
              </li>
              <li>
                <t><em>token-based</em>: Sessions are validated using authentication tokens.</t>
              </li>
              <li>
                <t><em>oauth</em>: Sessions are validated using OAuth.</t>
              </li>
              <li>
                <t><em>none</em>: No session validation is required.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><tt>SessionTTL</tt>: Specifies the maximum time-to-live for a session, expressed
            as a duration (e.g., "30m", "2h", "1d").</t>
          </li>
          <li>
            <t><tt>RequireUserAgent</tt>: Boolean value indicating whether automated requests
            must include a valid User-Agent header.</t>
          </li>
        </ul>
        
        <t>Example:</t>
        <figure><artwork><![CDATA[
RequireHumanInitiatedSession: true
SessionValidation: cookie-based
SessionTTL: 1h
RequireUserAgent: true
        ]]></artwork></figure>
        
        <t>If session requirement directives are not specified, clients SHOULD NOT assume
        any specific session requirements, but SHOULD include a valid User-Agent header
        in all requests.</t>
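
        <t>To illustrate the duration syntax of <tt>SessionTTL</tt>, the Python sketch
        below converts a value into a <tt>timedelta</tt> and checks whether a session has
        outlived it. Only the suffixes that appear in the examples above ("m", "h", "d") are
        accepted; whether other units are valid is not stated here, so the sketch rejects
        them. The function names are hypothetical.</t>
        <figure><artwork><![CDATA[
```python
import re
from datetime import datetime, timedelta

# Suffixes taken from the examples in this section: "30m", "2h", "1d".
_SUFFIX_TO_UNIT = {"m": "minutes", "h": "hours", "d": "days"}


def parse_session_ttl(value):
    """Parse a SessionTTL value such as "30m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized SessionTTL value: {value!r}")
    amount = int(match.group(1))
    return timedelta(**{_SUFFIX_TO_UNIT[match.group(2)]: amount})


def session_expired(started_at: datetime, now: datetime, ttl_value: str) -> bool:
    """Return True once the session is older than its time-to-live."""
    return now - started_at > parse_session_ttl(ttl_value)
```
        ]]></artwork></figure>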
      </section>
      
      <section anchor="html-asset-annotation">
        <name>HTML Asset Annotation</name>
        <t>In addition to a site-level automation-preferences.txt file, automation preferences
        MAY be embedded directly within HTML documents to annotate individual
        assets. This mechanism enables content creators to specify fine-grained
        automation policies for particular content items.</t>
        
        <t>Authors SHOULD use structured data markup using JSON-LD embedded in a <tt>&lt;script&gt;</tt>
        element. The JSON object SHOULD use a defined type (e.g., "AutomationPolicyAnnotation")
        and include relevant fields that mirror those used in automation-preferences.txt.</t>

        <t>Note that unlike site-wide directives, asset-level annotations SHOULD NOT include
        HTTP method restrictions, request limits, or concurrency limits, as these concepts
        apply to endpoints and services rather than to individual content assets.</t>
        
        <t>Example:</t>
        <figure><artwork><![CDATA[
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "AutomationPolicyAnnotation",
  "automationPolicy": "limited",
  "allowCDP": false,
  "allowHeadless": false,
  "automationPurpose": {
    "require": true,
    "allowed": ["[PLACEHOLDER_PURPOSE1]", "[PLACEHOLDER_PURPOSE2]"],
    "disallowed": ["[PLACEHOLDER_PURPOSE3]"]
  },
  "contactEmail": "automation-policy@example.com"
}
</script>
        ]]></artwork></figure>
        
        <t>When both an automation-preferences.txt file and HTML asset annotations are present,
        the more specific rule (typically the HTML annotation) SHALL be applied to
        the corresponding content asset. Clients supporting HTML asset annotations
        SHOULD parse and respect these annotations when present.</t>
        
        <t>The annotation schema MAY include any directives defined in the core or
        extension specifications. Fields in the annotation SHOULD use camelCase naming
        to align with JSON-LD conventions, while maintaining semantic equivalence to
        the corresponding directives in the automation-preferences.txt file.</t>
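
        <t>The precedence and naming rules above can be sketched in Python as follows. The
        sketch maps camelCase annotation fields to directive names by capitalizing the first
        letter, lets annotation values override site-level values, and skips the fields
        that asset-level annotations are advised not to carry. The function names and the
        <tt>SITE_ONLY</tt> set are hypothetical. Nested objects such as
        <tt>automationPurpose</tt> are copied through unchanged, since their mapping to
        individual directives is not spelled out in this document.</t>
        <figure><artwork><![CDATA[
```python
import json

# Directives that apply to endpoints rather than to individual assets.
SITE_ONLY = {
    "AllowedMethods",
    "DisallowedMethods",
    "RequestLimit",
    "ConcurrentLimit",
}


def directive_name(field):
    """Turn a camelCase field such as "allowCDP" into "AllowCDP"."""
    return field[:1].upper() + field[1:]


def effective_policy(site_directives, annotation_json):
    """Overlay an asset annotation on the site-level directives.

    Values keep their source types: strings from the text file and JSON
    values from the annotation. JSON-LD keywords ("@context", "@type")
    are not directives and are skipped.
    """
    merged = dict(site_directives)
    for field, value in json.loads(annotation_json).items():
        if field.startswith("@"):
            continue
        name = directive_name(field)
        if name in SITE_ONLY:
            continue  # keep the site-level value for endpoint concepts
        merged[name] = value
    return merged
```
        ]]></artwork></figure>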
      </section>
    </section>
    
    <section anchor="backward-compatibility">
      <name>Backward Compatibility</name>
      <t>The extensions defined in this document maintain backward compatibility with
      implementations of the core specification. This compatibility is achieved
      through the following mechanisms:</t>
      
      <ul spacing="normal">
        <li>
          <t>All directives defined in this document are OPTIONAL. Implementations that
          support only the core specification can safely ignore these directives, as
          specified in the core specification's extension mechanism.</t>
        </li>
        <li>
          <t>The extensions do not modify or override the behavior of any directives
          defined in the core specification.</t>
        </li>
        <li>
          <t>Extended directives enhance but do not replace core functionality.</t>
        </li>
      </ul>
      
      <t>Implementations supporting these extensions SHOULD degrade gracefully when
      interacting with servers or clients that support only the core specification:</t>
      
      <ul spacing="normal">
        <li>
          <t>Servers supporting extensions SHOULD still process all core directives
          correctly, even if extended directives are also present.</t>
        </li>
        <li>
          <t>Clients supporting extensions SHOULD still honor all core directives, even
          if they do not recognize every extended directive in a file.</t>
        </li>
        <li>
          <t>When HTML asset annotations are not supported by a client, the client SHOULD
          fall back to the site-level automation-preferences.txt file for guidance.</t>
        </li>
      </ul>
      
      <t>This approach ensures that the introduction of extensions does not break
      existing implementations while providing a path for enhanced functionality.</t>
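
      <t>The "ignore what you do not recognize" behavior can be sketched in Python as
      follows. The sketch assumes the line syntax visible in the sample file of this
      document ("Name: value" lines and "#" comments) and handles a single group only; the
      authoritative grammar and the full list of core directives are defined in the core
      specification. The <tt>CORE_DIRECTIVES</tt> set is taken from that sample file, and
      the function name is hypothetical.</t>
      <figure><artwork><![CDATA[
```python
# Core directive names as they appear in this document's sample file.
CORE_DIRECTIVES = {
    "Host",
    "Scope",
    "AutomationPolicy",
    "AllowedMethods",
    "DisallowedMethods",
    "RequireAutomationPurpose",
    "AllowedPurposes",
    "DisallowedPurposes",
    "ContactEmail",
}


def parse_known_directives(text, known):
    """Return (directives, ignored_names) for one group of lines.

    Lines whose directive name is not in `known` are skipped rather than
    treated as errors, which is how a core-only parser stays compatible
    with files that also carry extended directives.
    """
    directives = {}
    ignored = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not a "Name: value" line
        name = name.strip()
        if name in known:
            directives[name] = value.strip()
        else:
            ignored.append(name)
    return directives, ignored
```
      ]]></artwork></figure>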
    </section>
    
    <section anchor="implementation-and-enforcement">
      <name>Implementation and Enforcement</name>
      <t>Servers implementing the extensions defined in this document SHOULD:</t>
      
      <ul spacing="normal">
        <li>
          <t>Employ detection mechanisms (e.g., CDP fingerprinting, headless browser
          detection) to identify automated clients using specific technologies.</t>
        </li>
        <li>
          <t>Implement rate limiting according to the specified directives.</t>
        </li>
        <li>
          <t>Validate sessions as required by the session requirement directives.</t>
        </li>
        <li>
          <t>Process HTML asset annotations when interpreting automation policies for
          specific content.</t>
        </li>
        <li>
          <t>Respond with appropriate HTTP status codes for non-compliant requests,
          such as:</t>
          <ul spacing="normal">
            <li>
              <t>429 Too Many Requests for rate limit violations.</t>
            </li>
            <li>
              <t>403 Forbidden for unauthorized automation technology use.</t>
            </li>
            <li>
              <t>401 Unauthorized for missing or invalid authentication.</t>
            </li>
          </ul>
        </li>
      </ul>
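
      <t>As one possible server-side reading of the rate limiting item above, the Python
      sketch below keeps a sliding window of request timestamps per client and returns
      429 once the window is full. The class name, the use of a caller-supplied client
      identifier, and the choice not to count rejected requests are assumptions of this
      sketch; this document does not prescribe a particular algorithm.</t>
      <figure><artwork><![CDATA[
```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window` seconds."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client id -> accepted timestamps

    def status_for(self, client_id, now):
        """Return the HTTP status to use for a request arriving at `now`."""
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429  # Too Many Requests
        hits.append(now)
        return 200
```
      ]]></artwork></figure>
      <t>For a directive such as "RequestLimit: 60/minute", such a limiter would be
      constructed with a limit of 60 and a window of 60 seconds.</t>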
      
      <t>Clients supporting these extensions SHOULD:</t>
      
      <ul spacing="normal">
        <li>
          <t>Honor rate limiting directives by self-throttling requests.</t>
        </li>
        <li>
          <t>Respect automation technology restrictions by avoiding prohibited tools.</t>
        </li>
        <li>
          <t>Adhere to API and XHR permissions as specified.</t>
        </li>
        <li>
          <t>Establish and maintain valid sessions when required.</t>
        </li>
        <li>
          <t>Parse and respect HTML asset annotations when present.</t>
        </li>
      </ul>
      
      <t>Both servers and clients MAY implement additional detection and enforcement
      mechanisms beyond those explicitly described in this document, as long as they
      maintain compatibility with the specified directives.</t>
    </section>
    
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>In addition to the security considerations mentioned in the core specification,
      the extensions defined in this document introduce the following considerations:</t>
      
      <ul spacing="normal">
        <li>
          <t><strong>Rate Limiting</strong>: Implementations of rate limiting SHOULD use secure methods
          to track request counts and prevent circumvention through IP spoofing or
          other means.</t>
        </li>
        <li>
          <t><strong>Technology Detection</strong>: Methods used to detect specific automation
          technologies can be circumvented by sophisticated clients. Servers SHOULD
          employ multiple detection approaches and adapt to evolving evasion techniques.</t>
        </li>
        <li>
          <t><strong>Session Validation</strong>: Session validation mechanisms SHOULD be resistant to
          replay attacks and session hijacking attempts.</t>
        </li>
        <li>
          <t><strong>HTML Asset Annotations</strong>: Parsing of JSON-LD annotations MUST be performed
          securely to prevent injection attacks or denial-of-service through malformed
          input.</t>
        </li>
      </ul>
      
      <t>The extensions provide more granular control over automated access, which can
      enhance security, but they also introduce complexity that may lead to
      misconfiguration. Implementers SHOULD carefully test and validate their
      configurations to ensure they provide the intended protections.</t>
    </section>
    
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
    
    <section anchor="future-work">
      <name>Future Work</name>
      <t>Future work on the automation-preferences.txt protocol may include:</t>
      
      <ul spacing="normal">
        <li>
          <t>Soliciting further feedback from browser vendors, content owners, and
          developers of AI models and automation tools.</t>
        </li>
        <li>
          <t>Developing reference implementations and comprehensive detection libraries.</t>
        </li>
        <li>
          <t>Formalizing the protocol in collaboration with the IETF and W3C.</t>
        </li>
        <li>
          <t>Expanding interoperability with related protocols for consistent content preference 
          signaling across the web.</t>
        </li>
        <li>
          <t>Standardizing the HTML asset annotation schema through formal registration
          with schema.org or similar organizations.</t>
        </li>
      </ul>
    </section>
  </middle>
  
  <back>
    <references>
      <name>References</name>
      
      <references anchor="normative">
        <name>Normative References</name>
        <reference anchor="RFC2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author initials="S." surname="Bradner" fullname="Scott Bradner">
              <organization>Harvard University</organization>
            </author>
            <date year="1997" month="March" />
          </front>
          <seriesInfo name="BCP" value="14" />
          <seriesInfo name="RFC" value="2119" />
          <seriesInfo name="DOI" value="10.17487/RFC2119" />
        </reference>
        <reference anchor="RFC8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author initials="B." surname="Leiba" fullname="Barry Leiba">
              <organization>Huawei Technologies</organization>
            </author>
            <date year="2017" month="May" />
          </front>
          <seriesInfo name="BCP" value="14" />
          <seriesInfo name="RFC" value="8174" />
          <seriesInfo name="DOI" value="10.17487/RFC8174" />
        </reference>
        <reference anchor="CORE-SPEC">
          <front>
            <title>automation-preferences.txt Protocol for Basic Automation Control</title>
            <author initials="P." surname="Liao" fullname="Liao Peiyuan">
              <organization>Condé Nast</organization>
            </author>
            <date year="2025" month="April" />
          </front>
          <seriesInfo name="Internet-Draft" value="draft-liao-aipref-autoctl-core-00" />
        </reference>
      </references>
      
      <references anchor="informative">
        <name>Informative References</name>
        <reference anchor="RFC9309">
          <front>
            <title>Robots Exclusion Protocol</title>
            <author initials="M." surname="Koster" fullname="Martijn Koster">
              <organization></organization>
            </author>
            <author initials="G." surname="Illyes" fullname="Gary Illyes">
              <organization>Google LLC</organization>
            </author>
            <author initials="H." surname="Zeller" fullname="Henner Zeller">
              <organization>Google LLC</organization>
            </author>
            <author initials="L." surname="Sassman" fullname="Lizzi Sassman">
              <organization>Google LLC</organization>
            </author>
            <date year="2022" month="September" />
          </front>
          <seriesInfo name="RFC" value="9309" />
          <seriesInfo name="DOI" value="10.17487/RFC9309" />
        </reference>
      </references>
    </references>
    
    <section numbered="false" anchor="sample-extended-automation-preferences-txt-file">
      <name>Sample Extended automation-preferences.txt File</name>
      <t>The following is an example of an automation-preferences.txt file that includes
      both core and extended directives:</t>
      
      <figure>
        <artwork><![CDATA[
# Automation preferences for example.com
# Version: 2.0
# Last updated: 2025-04-08

# Group 1: Applies to the entire site
Host: example.com
Scope: /
AutomationPolicy: limited
AllowedMethods: GET, HEAD
DisallowedMethods: POST, PUT, DELETE, PATCH
RequireAutomationPurpose: true
AllowedPurposes: [PLACEHOLDER_PURPOSE1], [PLACEHOLDER_PURPOSE2]
DisallowedPurposes: [PLACEHOLDER_PURPOSE3]
ContactEmail: automation-policy@example.com

# Extended directives
RequestLimit: 60/minute
ConcurrentLimit: 5
AllowCDP: false
AllowHeadless: false
AllowSelenium: false
AllowPuppeteer: false
AllowPlaywright: false
APIAutomation: with-key-only
AllowXHR: read-only
DisallowFetchFrom: /account/*, /checkout/*, /admin/*
RequireHumanInitiatedSession: true
SessionValidation: cookie-based
SessionTTL: 1h
RequireUserAgent: true

# Group 2: Specific preferences for the /admin/ path
Host: example.com
Scope: /admin/
AutomationPolicy: strict
AllowedMethods: GET
DisallowedMethods: POST, PUT, DELETE, PATCH
AllowedPurposes: [PLACEHOLDER_PURPOSE1]
DisallowedPurposes: [PLACEHOLDER_PURPOSE2], [PLACEHOLDER_PURPOSE3]

# Extended directives for admin path
RequestLimit: 10/minute
ConcurrentLimit: 2
RequireHumanInitiatedSession: true
SessionValidation: token-based
SessionTTL: 30m
        ]]></artwork>
      </figure>
    </section>
  </back>
</rfc>