Full-Text Search & Grep
EdgarTools provides two complementary tools for searching the text content of SEC filings:
| Tool | Purpose | Scope |
|---|---|---|
| search_filings() | Find filings that mention a topic | All of EDGAR |
| filing.grep() | Find exact text within a filing | One filing's documents |
search_filings() answers "which filings talk about this?" using SEC's full-text search index.
grep() answers "where exactly does this text appear?" within a specific filing.
How this relates to other search features
- Search & Filter — find filings by metadata (form type, date, company)
- Advanced Search — BM25-ranked search within a single parsed document
- This page — search filing text content across EDGAR, and grep within filings
Full-Text Search
search_filings() queries SEC's EFTS (EDGAR Full-Text Search) index — the same engine behind
the search box on sec.gov. It searches the actual text inside filings, not just metadata.
Basic Usage
from edgar import search_filings
# Find filings mentioning artificial intelligence
results = search_filings("artificial intelligence", forms=["10-K"])
# Scoped to a company
results = search_filings("supply chain risk", ticker="AAPL")
# Date range
results = search_filings("tariff impact", forms=["8-K"], start_date="2024-01-01")
# Use quotes for phrase matching
results = search_filings('"exclusive license" "trade secret"', forms=["8-K"])
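When the phrases come from a list rather than a hand-written string, the quoting can be done programmatically. The sketch below is a minimal illustration; `build_phrase_query` is a hypothetical helper, not part of EdgarTools, and it only assumes what this page states: that quoted phrases in the query string are matched as phrases.

```python
def build_phrase_query(phrases):
    """Join phrases into one query string, quoting each for phrase matching.

    Hypothetical helper for illustration; not an EdgarTools function.
    """
    quoted = []
    for phrase in phrases:
        cleaned = phrase.replace('"', "").strip()  # drop stray quotes and padding
        if cleaned:  # skip empty entries
            quoted.append(f'"{cleaned}"')
    return " ".join(quoted)


query = build_phrase_query(["exclusive license", "trade secret"])
print(query)  # "exclusive license" "trade secret"
# The string can then be passed on, e.g. search_filings(query, forms=["8-K"])
```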
Each result includes a relevance score, the type of document that matched, and filing metadata:
r = results[0]
r.score # 21.45 — relevance from EFTS
r.form # '8-K'
r.company # 'PyroTec, Inc.'
r.filed # '2012-09-20'
r.file_type # 'EX-10.05' — which document matched
r.items # ['1.01', '2.01'] — 8-K item numbers
r.sic # '6770' — SIC code
r.location # 'Foster City, CA'
r.accession_number # '0001193125-12-400000'
Filtering Results
Filter the fetched results client-side without re-querying:
results = search_filings('"going concern"', forms=["8-K", "10-K"])
# By SIC code (e.g. shell companies)
shells = results.filter(sic="6770")
# By 8-K item number
material = results.filter(items="1.01") # Material agreements
# By relevance score
strong = results.filter(min_score=15.0)
# By document type (prefix match)
exhibits = results.filter(file_type="EX-10") # Matches EX-10.1, EX-10.05, etc.
# By date range
recent = results.filter(start_date="2024-01-01", end_date="2024-12-31")
# By state
california = results.filter(state="CA")
# Chain filters
targeted = results.filter(sic="6770").filter(items="1.01").filter(min_score=10.0)
Sort, slice, and sample:
# Sort by score (default), date, company, or SIC
by_date = results.sort_by("filed", reverse=False) # Oldest first
by_score = results.sort_by("score") # Highest relevance first
# Slice and sample
top5 = results.head(5)
last5 = results.tail(5)
random10 = results.sample(10)
# Python slicing
page = results[5:15]
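To make the filter semantics concrete, here is a rough stand-alone sketch of client-side filtering over plain Python objects. It is not how EdgarTools implements `filter()`; the `Hit` class and `keep` function are hypothetical stand-ins, and the data is made up. It illustrates two behaviours described above: prefix matching on the document type, and chaining, where each filter narrows the previous result.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Stand-in for a search result; only the fields used below."""
    company: str
    file_type: str
    score: float
    filed: str  # YYYY-MM-DD, so string comparison orders dates correctly
    items: list = field(default_factory=list)


def keep(hits, file_type=None, min_score=None, items=None, start_date=None):
    """Return the hits that pass every filter that was supplied."""
    out = []
    for h in hits:
        if file_type is not None and not h.file_type.startswith(file_type):
            continue  # prefix match: "EX-10" accepts "EX-10.1" and "EX-10.05"
        if min_score is not None and h.score < min_score:
            continue
        if items is not None and items not in h.items:
            continue
        if start_date is not None and h.filed < start_date:
            continue
        out.append(h)
    return out


hits = [
    Hit("Alpha Corp", "EX-10.1", 18.2, "2024-03-01", ["1.01"]),
    Hit("Beta Inc", "EX-99.1", 22.0, "2024-05-10", ["2.02"]),
    Hit("Gamma LLC", "EX-10.05", 9.5, "2023-11-20", ["1.01", "2.01"]),
]

exhibits = keep(hits, file_type="EX-10")
strong_exhibits = keep(exhibits, min_score=15.0)  # chaining narrows further
print([h.company for h in exhibits])         # ['Alpha Corp', 'Gamma LLC']
print([h.company for h in strong_exhibits])  # ['Alpha Corp']
```

Note that a plain prefix test like the one in this sketch would also accept a type such as "EX-101"; whether the library guards against that is not stated on this page, so it is worth checking the `file_type` of the results you get back.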
Aggregations
Every search returns faceted counts — a summary of who and what matched without downloading filings:
results = search_filings('"exclusive license" "trade secret"', forms=["8-K"])
# Top entities by hit count
for a in results.aggregations.entities[:5]:
    print(f"{a.key}: {a.count} filings")
# Top SIC codes
for a in results.aggregations.sics[:5]:
    print(f"SIC {a.key}: {a.count} filings")
# Also available: .states, .forms
This is useful for exploratory analysis — understand the landscape before drilling into individual filings.
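For intuition about what a facet is, the sketch below tallies the same kind of key/count pairs locally with `collections.Counter`. The rows are made up, and this is only an analogy: the aggregations described above come from the EFTS server and cover all matches, whereas a local tally covers only the rows you hold.

```python
from collections import Counter

# Made-up (company, sic) pairs standing in for fetched results.
rows = [
    ("Alpha Corp", "6770"),
    ("Beta Inc", "2834"),
    ("Alpha Corp", "6770"),
    ("Gamma LLC", "2834"),
    ("Alpha Corp", "6770"),
]

entity_counts = Counter(company for company, _ in rows)
sic_counts = Counter(sic for _, sic in rows)

# most_common(n) returns the n largest (key, count) pairs, highest first.
for key, count in entity_counts.most_common(2):
    print(f"{key}: {count} filings")
for key, count in sic_counts.most_common(2):
    print(f"SIC {key}: {count} filings")
```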
Pagination
search_filings() returns one page of results (default 20, max 100 per call). Paginate to get more:
# Get first 100 results
results = search_filings("cybersecurity incident", forms=["8-K"], limit=100)
print(f"{results.total:,} total matches, showing {len(results)}")
# Fetch the next page
page2 = results.next() # Returns None when exhausted
# Or fetch many more at once (up to 5,000 additional)
all_results = results.fetch_more(500) # Accumulates 500 more, rate-limited
print(f"Now have {len(all_results)} results")
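If you prefer to walk pages one at a time, a small generator can wrap the `next()` pattern. This is a sketch under two assumptions not confirmed on this page: that a page can be iterated directly, and that a page returned by `next()` itself supports `next()`. `iter_results` and `FakePage` are hypothetical names; `FakePage` exists only so the example runs without network access.

```python
def iter_results(first_page, max_results=None):
    """Yield results page by page until next() returns None or a cap is hit."""
    page = first_page
    seen = 0
    while page is not None:
        for result in page:
            if max_results is not None and seen >= max_results:
                return
            yield result
            seen += 1
        page = page.next()  # None signals that no pages remain


class FakePage:
    """Test double: a list of items plus a next() returning the following page."""

    def __init__(self, items, following=None):
        self._items = items
        self._following = following

    def __iter__(self):
        return iter(self._items)

    def next(self):
        return self._following


pages = FakePage([1, 2], FakePage([3, 4], FakePage([5])))
print(list(iter_results(pages)))                 # [1, 2, 3, 4, 5]
print(list(iter_results(pages, max_results=3)))  # [1, 2, 3]
```

Passing a cap keeps an open-ended query from issuing more requests than intended. For bulk retrieval, `fetch_more()` is the simpler route since, as noted above, it is rate-limited for you.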
Loading a Filing
Each result can load its full Filing object for deeper analysis:
r = results[0]
filing = r.get_filing() # Loads the full Filing
tenk = filing.obj() # Parse as TenK, EightK, etc.
Grep
grep() is the universal exact-match search for content within a filing. It searches all
documents (primary filing + exhibits) by default, like grep -ri on a directory.
Every AI agent has grep semantics burned into its training. Zero learning curve.
Filing.grep()
from edgar import Company
company = Company("AAPL")
filing = company.get_filings(form="10-K").latest(1)
# Search all documents in the filing
matches = filing.grep("going concern")
print(f"{len(matches)} matches found")
for m in matches:
    print(m)
# primary: ...substantial doubt about the entity's ability to continue as a going concern...
# EX-99.1: ...the report includes a going concern qualification...
Each match includes:
m = matches[0]
m.location # "primary", "EX-10.1", "EX-99.1", etc.
m.match # The matched text
m.context # Surrounding text (~100 chars each side)
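The three fields are easiest to understand by seeing how such a match could be produced. The sketch below is not the EdgarTools implementation; `grep_text` and `Match` are hypothetical names, and the code only mimics the behaviour described on this page (case-insensitive matching, literal or regex patterns, a context window on each side) using Python's `re` module on a plain string.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    location: str  # which document the text came from
    match: str     # the text exactly as it appears in the document
    context: str   # the match plus up to `width` characters on each side


def grep_text(text, pattern, location="primary", regex=False, width=100):
    """Find every case-insensitive occurrence of pattern in text."""
    # Literal patterns are escaped so characters like "." or "(" match themselves.
    compiled = re.compile(pattern if regex else re.escape(pattern), re.IGNORECASE)
    matches = []
    for m in compiled.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        matches.append(Match(location, m.group(0), text[start:end]))
    return matches


doc = (
    "Management concluded there is substantial doubt about the Company's "
    "ability to continue as a Going Concern."
)
found = grep_text(doc, "going concern", width=20)
print(len(found))
print(found[0].match)
print(found[0].context)
```

Because matching is case-insensitive, `match` preserves the casing found in the document rather than the casing of the pattern.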
Search a Specific Document
# Only the primary filing document
filing.grep("risk factor", document="primary")
# Only a specific exhibit
filing.grep("intellectual property", document="EX-10.1")
Regex Support
# Regex for flexible matching
filing.grep(r"Level\s+3", regex=True) # Matches "Level 3", "Level  3", and other whitespace runs
filing.grep(r"(?:right|option) of first refusal", regex=True)
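It can help to try a pattern on a few sample strings before running it against a filing. The check below uses only the standard `re` module and assumes that `grep()` follows Python regular-expression syntax and ignores case, as the comparison table on this page indicates; the sample strings are made up.

```python
import re

samples = [
    "Level 3 inputs",
    "Level  3 inputs",   # two spaces
    "Level\n3 inputs",   # line break between the words
    "Level3 inputs",     # no whitespace, so \s+ cannot match
]
level3 = re.compile(r"Level\s+3", re.IGNORECASE)
print([bool(level3.search(s)) for s in samples])  # [True, True, True, False]

# A non-capturing group lets one pattern cover two alternative wordings.
rofr = re.compile(r"(?:right|option) of first refusal", re.IGNORECASE)
print(bool(rofr.search("an Option of First Refusal was granted")))  # True
print(bool(rofr.search("a right of refusal")))                      # False
```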
Notes.grep()
Notes.search() matches note titles. Notes.grep() searches note content — the full
narrative text of each note.
tenk = filing.obj()
# Search all note content
matches = tenk.notes.grep("going concern")
for m in matches:
    print(m)
# Note 1 - Organization: ...conditions raise substantial doubt about going concern...
# Fair value hierarchy
matches = tenk.notes.grep("Level 3")
# Regex in notes
matches = tenk.notes.grep(r"intangible\s+asset", regex=True)
Report Object grep (TenK, TenQ, EightK)
Report objects delegate to their underlying filing:
tenk = filing.obj()
# Same as filing.grep() — searches all documents
tenk.grep("going concern")
# Narrow to primary document
tenk.grep("going concern", document="primary")
grep vs search
Both coexist — they serve different purposes:
| | grep() | search() |
|---|---|---|
| Mode | Exact match (string or regex) | BM25 fuzzy ranking |
| Returns | Every match with location + context | Best sections ranked by relevance |
| Case | Case-insensitive by default | Case-insensitive |
| Use case | "Does this filing mention 'going concern'?" | "What does this filing say about debt?" |
| Agent use | Verification, due diligence checks | Exploration, topic discovery |
An agent checking for "Level 3" or "right of first refusal" wants grep. A human exploring "what about debt?" wants search (it also finds "borrowings", "credit facility").
Putting It Together
A typical analytical workflow uses both tools:
from edgar import search_filings, Company
# Step 1: Find filings across EDGAR
results = search_filings('"exclusive license" "trade secret"', forms=["8-K"])
print(f"{results.total:,} filings mention these terms")
# Step 2: Triage from metadata
material = results.filter(items="1.01") # Material agreements
high_score = results.filter(min_score=15.0) # Strong matches
# Step 3: Check who shows up most
for a in results.aggregations.entities[:5]:
    print(f"{a.key}: {a.count} filings")
# Step 4: Deep dive on an interesting hit
filing = results[0].get_filing()
tenk = Company(results[0].cik).get_filings(form="10-K").latest(1).obj()
# Step 5: Grep the 10-K for related terms
tenk.grep("going concern")
tenk.grep("Level 3")
tenk.notes.grep("intangible asset")
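Step 5 often turns into a checklist of terms. One way to sketch that is a small helper that works with any object exposing `grep(pattern)` and relies only on the result supporting `len()`, which this page states for GrepResult. `count_mentions` and `FakeFiling` are hypothetical names; `FakeFiling` is a test double so the example runs offline.

```python
def count_mentions(target, terms):
    """Return {term: number of matches} for anything with a grep(pattern) method."""
    return {term: len(target.grep(term)) for term in terms}


class FakeFiling:
    """Test double whose grep() finds case-insensitive substring positions."""

    def __init__(self, text):
        self._text = text.lower()

    def grep(self, pattern):
        needle = pattern.lower()
        hits = []
        start = self._text.find(needle)
        while start != -1:
            hits.append(start)
            start = self._text.find(needle, start + 1)
        return hits


fake = FakeFiling("Going concern doubt. Level 3 assets. No going concern waiver.")
report = count_mentions(fake, ["going concern", "Level 3", "intangible asset"])
print(report)  # {'going concern': 2, 'Level 3': 1, 'intangible asset': 0}
```

With a real report object, the call would look like `count_mentions(tenk, [...])`. Terms with a count of zero are as informative as the hits when the goal is verification.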
API Reference
search_filings()
search_filings(
    query: str,                  # Search text (supports quoted phrases)
    *,
    forms: str | list = None,    # Form type filter: "10-K", ["8-K", "10-K"]
    cik: str | int = None,       # CIK number
    ticker: str = None,          # Ticker symbol (resolved to CIK)
    start_date: str = None,      # Filing date start (YYYY-MM-DD)
    end_date: str = None,        # Filing date end (YYYY-MM-DD)
    limit: int = 20,             # Results per page (max 100)
) -> EFTSSearch
EFTSSearch
| Method | Returns | Description |
|---|---|---|
| filter(...) | EFTSSearch | Filter by form, sic, items, file_type, min_score, dates, state |
| sort_by(field) | EFTSSearch | Sort by "score", "filed", "company", or "sic" |
| head(n) | EFTSSearch | First n results |
| tail(n) | EFTSSearch | Last n results |
| sample(n) | EFTSSearch | Random n results |
| next() | EFTSSearch \| None | Next page from EFTS |
| fetch_more(n) | EFTSSearch | Fetch up to n more results (max 5,000) |
| .aggregations | EFTSAggregations | Faceted counts (.entities, .sics, .states, .forms) |
| .total | int | Total matches on EFTS server |
| .empty | bool | True if no results |
EFTSResult
| Field | Type | Description |
|---|---|---|
| accession_number | str | Filing accession number |
| form | str | Form type |
| filed | str | Filing date (YYYY-MM-DD) |
| company | str | Company name |
| cik | str | CIK number |
| score | float | EFTS relevance score |
| file_type | str | Document type that matched (EX-10.1, 8-K, etc.) |
| file_description | str | Human-readable document description |
| document_id | str | Specific document filename within the filing |
| items | list[str] | 8-K item numbers |
| sic | str | Primary SIC code |
| location | str | Business location |
| state | str | Business state code |
| get_filing() | Filing | Load the full Filing object |
grep()
# On Filing
filing.grep(
    pattern: str,                # Text to search for
    *,
    regex: bool = False,         # Treat pattern as regex
    document: str = None,        # "primary", "EX-10.1", etc.
) -> GrepResult
# On Notes
notes.grep(
    pattern: str,
    *,
    regex: bool = False,
) -> GrepResult
# On TenK, TenQ, EightK (delegates to filing.grep)
tenk.grep(pattern, *, regex=False, document=None) -> GrepResult
GrepResult / GrepMatch
GrepResult is list-like (len(), iteration, indexing, bool()).
| GrepMatch field | Type | Description |
|---|---|---|
| location | str | "primary", "EX-10.1", note title, etc. |
| match | str | The matched text |
| context | str | Surrounding text (~100 chars each side) |