Skip to content

Full-Text Search & Grep

EdgarTools provides two complementary tools for searching the text content of SEC filings:

Tool Purpose Scope
search_filings() Find filings that mention a topic All of EDGAR
filing.grep() Find exact text within a filing One filing's documents

search_filings() answers "which filings talk about this?" using SEC's full-text search index. grep() answers "where exactly does this text appear?" within a specific filing.

How this relates to other search features

  • Search & Filter — find filings by metadata (form type, date, company)
  • Advanced Search — BM25-ranked search within a single parsed document
  • This page — search filing text content across EDGAR, and grep within filings

search_filings() queries SEC's EFTS (EDGAR Full-Text Search) index — the same engine behind the search box on sec.gov. It searches the actual text inside filings, not just metadata.

Basic Usage

from edgar import search_filings

# Find filings mentioning artificial intelligence
results = search_filings("artificial intelligence", forms=["10-K"])

# Scoped to a company
results = search_filings("supply chain risk", ticker="AAPL")

# Date range
results = search_filings("tariff impact", forms=["8-K"], start_date="2024-01-01")

# Use quotes for phrase matching
results = search_filings('"exclusive license" "trade secret"', forms=["8-K"])

Each result includes relevance score, document type, and metadata:

r = results[0]
r.score           # 21.45 — relevance from EFTS
r.form            # '8-K'
r.company         # 'PyroTec, Inc.'
r.filed           # '2012-09-20'
r.file_type       # 'EX-10.05' — which document matched
r.items           # ['1.01', '2.01'] — 8-K item numbers
r.sic             # '6770' — SIC code
r.location        # 'Foster City, CA'
r.accession_number  # '0001193125-12-400000'

Filtering Results

Filter the fetched results client-side without re-querying:

results = search_filings('"going concern"', forms=["8-K", "10-K"])

# By SIC code (e.g. shell companies)
shells = results.filter(sic="6770")

# By 8-K item number
material = results.filter(items="1.01")  # Material agreements

# By relevance score
strong = results.filter(min_score=15.0)

# By document type (prefix match)
exhibits = results.filter(file_type="EX-10")  # Matches EX-10.1, EX-10.05, etc.

# By date range
recent = results.filter(start_date="2024-01-01", end_date="2024-12-31")

# By state
california = results.filter(state="CA")

# Chain filters
targeted = results.filter(sic="6770").filter(items="1.01").filter(min_score=10.0)

Sort, slice, and sample:

# Sort by score (default), date, company, or SIC
by_date = results.sort_by("filed", reverse=False)  # Oldest first
by_score = results.sort_by("score")                  # Highest relevance first

# Slice and sample
top5 = results.head(5)
last5 = results.tail(5)
random10 = results.sample(10)

# Python slicing
page = results[5:15]

Aggregations

Every search returns faceted counts — a summary of who and what matched without downloading filings:

results = search_filings('"exclusive license" "trade secret"', forms=["8-K"])

# Top entities by hit count
for a in results.aggregations.entities[:5]:
    print(f"{a.key}: {a.count} filings")

# Top SIC codes
for a in results.aggregations.sics[:5]:
    print(f"SIC {a.key}: {a.count} filings")

# Also available: .states, .forms

This is useful for exploratory analysis — understand the landscape before drilling into individual filings.

Pagination

search_filings() returns one page of results (default 20, max 100 per call). Paginate to get more:

# Get first 100 results
results = search_filings("cybersecurity incident", forms=["8-K"], limit=100)
print(f"{results.total:,} total matches, showing {len(results)}")

# Fetch the next page
page2 = results.next()  # Returns None when exhausted

# Or fetch many more at once (up to 5,000 additional)
all_results = results.fetch_more(500)  # Accumulates 500 more, rate-limited
print(f"Now have {len(all_results)} results")

Loading a Filing

Each result can load its full Filing object for deeper analysis:

r = results[0]
filing = r.get_filing()  # Loads the full Filing
tenk = filing.obj()       # Parse as TenK, EightK, etc.

Grep

grep() is the universal exact-match search for content within a filing. It searches all documents (primary filing + exhibits) by default, like grep -ri on a directory.

Every AI agent has grep semantics burned into its training. Zero learning curve.

Filing.grep()

from edgar import Company

company = Company("AAPL")
filing = company.get_filings(form="10-K").latest(1)

# Search all documents in the filing
matches = filing.grep("going concern")
print(f"{len(matches)} matches found")

for m in matches:
    print(m)
    # primary:  ...substantial doubt about the entity's ability to continue as a going concern...
    # EX-99.1:  ...the report includes a going concern qualification...

Each match includes:

m = matches[0]
m.location   # "primary", "EX-10.1", "EX-99.1", etc.
m.match      # The matched text
m.context    # Surrounding text (~100 chars each side)

Search a Specific Document

# Only the primary filing document
filing.grep("risk factor", document="primary")

# Only a specific exhibit
filing.grep("intellectual property", document="EX-10.1")

Regex Support

# Regex for flexible matching
filing.grep(r"Level\s+3", regex=True)              # "Level 3", "Level  3"
filing.grep(r"(?:right|option) of first refusal", regex=True)

Notes.grep()

Notes.search() matches note titles. Notes.grep() searches note content — the full narrative text of each note.

tenk = filing.obj()

# Search all note content
matches = tenk.notes.grep("going concern")
for m in matches:
    print(m)
    # Note 1 - Organization:  ...conditions raise substantial doubt about going concern...

# Fair value hierarchy
matches = tenk.notes.grep("Level 3")

# Regex in notes
matches = tenk.notes.grep(r"intangible\s+asset", regex=True)

Report Object grep (TenK, TenQ, EightK)

Report objects delegate to their underlying filing:

tenk = filing.obj()

# Same as filing.grep() — searches all documents
tenk.grep("going concern")

# Narrow to primary document
tenk.grep("going concern", document="primary")

Both coexist — they serve different purposes:

grep() search()
Mode Exact match (string or regex) BM25 fuzzy ranking
Returns Every match with location + context Best sections ranked by relevance
Case Case-insensitive by default Case-insensitive
Use case "Does this filing mention 'going concern'?" "What does this filing say about debt?"
Agent use Verification, due diligence checks Exploration, topic discovery

An agent checking for "Level 3" or "right of first refusal" wants grep. A human exploring "what about debt?" wants search (it also finds "borrowings", "credit facility").

Putting It Together

A typical analytical workflow uses both tools:

from edgar import search_filings, Company

# Step 1: Find filings across EDGAR
results = search_filings('"exclusive license" "trade secret"', forms=["8-K"])
print(f"{results.total:,} filings mention these terms")

# Step 2: Triage from metadata
material = results.filter(items="1.01")       # Material agreements
high_score = results.filter(min_score=15.0)    # Strong matches

# Step 3: Check who shows up most
for a in results.aggregations.entities[:5]:
    print(f"{a.key}: {a.count} filings")

# Step 4: Deep dive on an interesting hit
filing = results[0].get_filing()
tenk = Company(results[0].cik).get_filings(form="10-K").latest(1).obj()

# Step 5: Grep the 10-K for related terms
tenk.grep("going concern")
tenk.grep("Level 3")
tenk.notes.grep("intangible asset")

API Reference

search_filings()

search_filings(
    query: str,                    # Search text (supports quoted phrases)
    *,
    forms: str | list = None,      # Form type filter: "10-K", ["8-K", "10-K"]
    cik: str | int = None,         # CIK number
    ticker: str = None,            # Ticker symbol (resolved to CIK)
    start_date: str = None,        # Filing date start (YYYY-MM-DD)
    end_date: str = None,          # Filing date end (YYYY-MM-DD)
    limit: int = 20,               # Results per page (max 100)
) -> EFTSSearch

EFTSSearch

Method Returns Description
filter(...) EFTSSearch Filter by form, sic, items, file_type, min_score, dates, state
sort_by(field) EFTSSearch Sort by "score", "filed", "company", or "sic"
head(n) EFTSSearch First n results
tail(n) EFTSSearch Last n results
sample(n) EFTSSearch Random n results
next() EFTSSearch \| None Next page from EFTS
fetch_more(n) EFTSSearch Fetch up to n more results (max 5,000)
.aggregations EFTSAggregations Faceted counts (.entities, .sics, .states, .forms)
.total int Total matches on EFTS server
.empty bool True if no results

EFTSResult

Field Type Description
accession_number str Filing accession number
form str Form type
filed str Filing date (YYYY-MM-DD)
company str Company name
cik str CIK number
score float EFTS relevance score
file_type str Document type that matched (EX-10.1, 8-K, etc.)
file_description str Human-readable document description
document_id str Specific document filename within the filing
items list[str] 8-K item numbers
sic str Primary SIC code
location str Business location
state str Business state code
get_filing() Filing Load the full Filing object

grep()

# On Filing
filing.grep(
    pattern: str,              # Text to search for
    *,
    regex: bool = False,       # Treat pattern as regex
    document: str = None,      # "primary", "EX-10.1", etc.
) -> GrepResult

# On Notes
notes.grep(
    pattern: str,
    *,
    regex: bool = False,
) -> GrepResult

# On TenK, TenQ, EightK (delegates to filing.grep)
tenk.grep(pattern, *, regex=False, document=None) -> GrepResult

GrepResult / GrepMatch

GrepResult is list-like (len(), iteration, indexing, bool()).

GrepMatch field Type Description
location str "primary", "EX-10.1", note title, etc.
match str The matched text
context str Surrounding text (~100 chars each side)