The Intern Who Lived on Coffee and Typos
Let me tell you about Barry. Barry was our first intern. His primary job was to read the 100+ inbound lead emails we got every day and manually copy the contact info, company name, and project details into a spreadsheet.
Barry was a good kid, but he was human. Some days, the spreadsheet was a work of art. Other days, it looked like he’d typed it with his elbows after three espressos and a night of collegiate debauchery. Names were misspelled. Phone numbers had extra digits. Entire rows were just… missing.
Every error Barry made was a potential lead lost, a follow-up missed, a tiny crack in the foundation of our business. We weren’t mad at Barry. We were mad at the process. A process so mind-numbingly boring it could sedate a bull.
This lesson is about firing Barry. Not the person, but the *job*. We’re going to build a tireless, flawless, infinitely fast robot to do his work. A robot that never gets tired, never makes a typo, and runs 24/7 for pennies.
Why This Matters
Computers speak in structure. They love clean rows, columns, and predictable data formats like JSON. Humans, on the other hand, speak in chaos. We send long, rambling emails, messy invoices, and resumes full of fluff.
The single biggest bottleneck in business automation is translating human chaos into computer-friendly structure. This workflow isn’t just a cool party trick; it’s the fundamental building block for any serious automation.
This automation replaces:
- Manual data entry from emails, PDFs, or support tickets.
- Expensive, slow data processing teams.
- Error-prone copy-pasting that kills productivity and loses revenue.
Master this, and you’ve mastered the “intake system” for 90% of business information. You can turn a firehose of messy text into a clean, orderly database that other automations can actually use.
What This Tool / Workflow Actually Is
We are going to use Claude 3 (via its API) as a hyper-intelligent data parser. Think of it less as a creative genius and more as the world’s most diligent, rule-following bureaucrat.
What it does:
We give the AI two things: a piece of messy, unstructured text (like an email) and a very strict template (a JSON schema). The AI’s only job is to read the text and fill out our template. It finds the name, the email, the company, and puts them in the exact boxes we told it to.
What it does NOT do:
This system doesn’t “understand” your business strategy or make decisions. It won’t call the lead for you. It’s a specialized tool for one thing and one thing only: transforming unstructured text into structured data. It’s the first, most crucial gear in a much larger machine.
Prerequisites
I know some of you are allergic to code. Relax. If you can copy and paste, you can do this. I promise.
- An Anthropic API Key. This is just a password that lets your code talk to the Claude 3 AI. Go to the Anthropic Console, sign up, and find your API key in the settings. It’s free to get started, and they give you some starting credits.
- Python installed on your machine. If you don’t have it, just search “install Python” and follow the instructions from the official Python website. It’s simpler than setting up a new TV.
- The ability to stay calm. We’re writing about 50 short lines of code, and most of them are just the sample email and the template. You’ve got this.
Step-by-Step Tutorial
Let’s build our robot data-entry clerk. Open up a plain text editor (like VS Code, Sublime Text, or even Notepad).
Step 1: Install the Anthropic Library
This is a one-time setup. Open your computer’s terminal or command prompt and type this command. This installs the special toolkit we need to talk to Claude.
pip install anthropic
Step 2: Create a Python File
Create a new file and save it as extractor.py. The .py part is important.
Step 3: The Code – Let’s Go
Copy and paste the following code into your extractor.py file. We’ll go through it piece by piece so you know *why* it works.
import anthropic
import json

# 1. SETUP: Put your API key here
client = anthropic.Anthropic(
    api_key="YOUR_ANTHROPIC_API_KEY",
)

# 2. THE CHAOS: This is the messy email we want to parse
messy_email_text = """
Hi there,
My name is Sarah Johnson and I'm the marketing director at Innovate Corp.
We're looking for help building a new e-commerce website. Our budget is around $25,000.
You can reach me at sarah.j@innovatecorp.com to discuss further.
Best,
Sarah
"""

# 3. THE RULES: This is our strict template for the AI
json_schema = {
    "type": "object",
    "properties": {
        "contact_name": {"type": "string", "description": "Full name of the contact person."},
        "company_name": {"type": "string", "description": "Name of the company."},
        "contact_email": {"type": "string", "description": "Email address of the contact."},
        "project_summary": {"type": "string", "description": "A one-sentence summary of their request."},
        "budget_usd": {"type": "number", "description": "The project budget, as a number."}
    },
    "required": ["contact_name", "contact_email", "project_summary"]
}

# 4. THE MAGIC: We send everything to Claude and demand JSON back
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=[  # This is the key part for forcing structured output
        {
            "name": "extract_lead_data",
            "description": "Extract lead information from an email body.",
            "input_schema": json_schema
        }
    ],
    tool_choice={"type": "tool", "name": "extract_lead_data"},  # Force it to use our tool
    messages=[
        {
            "role": "user",
            "content": messy_email_text
        }
    ]
)

# 5. THE RESULT: Print the clean, structured data
extracted_data = response.content[0].input
print(json.dumps(extracted_data, indent=2))
Step 4: Understanding The Pieces
#1. SETUP: This is where you paste your secret API key. Keep it safe!
#2. THE CHAOS: This is our sample input. In a real system, this text would come from an email server or a web form, but for now, we’re just pasting it in.
#3. THE RULES: This is the most important part. We define a `json_schema`. We’re telling the AI *exactly* what fields we want (`contact_name`, `budget_usd`, etc.), what type of data each field should be (string, number), and which ones are required. This strictness is what makes the output so reliable.
#4. THE MAGIC: We call the Claude 3 model. The critical parts are `tools` and `tool_choice`. This is a feature that fits this job very well. We’re not just asking a question; we’re giving it a tool named `extract_lead_data` and forcing it to use that tool. The tool’s `input_schema` is our schema, so the arguments Claude fills in come back as JSON shaped by our rules. That is how we get a reply that follows the template instead of a chatty paragraph.
#5. THE RESULT: We grab the structured data from the response and print it out.
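A small note on piece #5: `response.content[0]` assumes the tool call is the first block in the reply. With a forced tool that is normally true, but a slightly more defensive sketch looks the block up by type and name instead. The helper name `first_tool_input` is made up for this example, and the stand-in objects below only mimic the `type`, `name`, and `input` attributes that tool-use blocks carry, so the snippet runs without an API call.

```python
from types import SimpleNamespace


def first_tool_input(content_blocks, tool_name):
    """Return the input dict of the first tool_use block for tool_name."""
    for block in content_blocks:
        # getattr with a default, because text blocks have no `name` attribute.
        is_tool_call = getattr(block, "type", None) == "tool_use"
        if is_tool_call and getattr(block, "name", None) == tool_name:
            return block.input
    raise ValueError(f"No tool_use block named {tool_name!r} in the response")


# Stand-in for response.content (an assumption for the demo, not real API output).
fake_content = [
    SimpleNamespace(type="text", text="Here is the data."),
    SimpleNamespace(
        type="tool_use",
        name="extract_lead_data",
        input={"contact_name": "Sarah Johnson"},
    ),
]

print(first_tool_input(fake_content, "extract_lead_data"))
```

In extractor.py, the equivalent call would be `first_tool_input(response.content, "extract_lead_data")` in place of `response.content[0].input`.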
Step 5: Run It!
Replace "YOUR_ANTHROPIC_API_KEY" with your actual key. Save the file. Go back to your terminal, navigate to where you saved the file, and run:
python extractor.py
You should see beautiful, clean, structured output very close to this (the exact wording of the summary can vary a little from run to run):
{
  "contact_name": "Sarah Johnson",
  "company_name": "Innovate Corp",
  "contact_email": "sarah.j@innovatecorp.com",
  "project_summary": "Looking for help building a new e-commerce website.",
  "budget_usd": 25000
}
Look at that. Chaos turned into clean, computer-readable data. Barry is officially obsolete.
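One optional tidy-up before moving on: the script keeps the API key inside the source file, which is easy to leak if you ever share or commit it. A common alternative is to read the key from an environment variable. The sketch below assumes you have set a variable called `ANTHROPIC_API_KEY` in your terminal; the helper name `load_api_key` and the fake key in the demo are made up for illustration.

```python
import os


def load_api_key(env=os.environ):
    """Read the Anthropic key from the environment instead of the source file."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the ANTHROPIC_API_KEY environment variable first.")
    return key


# Demo with a fake environment so no real key is needed or printed.
demo_env = {"ANTHROPIC_API_KEY": "sk-ant-example-not-a-real-key"}
print(load_api_key(demo_env)[:6] + "...")
```

In extractor.py you would then write `client = anthropic.Anthropic(api_key=load_api_key())`. The SDK should also pick up `ANTHROPIC_API_KEY` on its own if you leave `api_key` out, but the explicit helper gives a friendlier error message when the variable is missing.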
Complete Automation Example
Let’s use a slightly trickier example. Imagine you get customer support tickets. They’re often filled with emotion and irrelevant details.
The Goal: Extract the customer’s account ID, the product they’re talking about, and a category for their issue (Billing, Technical, Shipping).
Step 1: The Messy Input
Update the messy_email_text variable in your script with this:
messy_email_text = """
Hey, I am SO frustrated. My order for the 'SuperWidget 5000' still hasn't arrived. My account is user-12345. I was charged on my credit card last week but the tracking number doesn't work. Can someone PLEASE help me with this shipping problem!?
"""
Step 2: The New Rules
Update the json_schema to match our new requirements. Notice we can even tell it the *only* allowed values for `issue_category`.
json_schema = {
    "type": "object",
    "properties": {
        "account_id": {"type": "string", "description": "The customer's unique account identifier."},
        "product_name": {"type": "string", "description": "The specific product mentioned in the ticket."},
        "issue_category": {
            "type": "string",
            "description": "The general category of the support issue.",
            "enum": ["Billing", "Technical", "Shipping", "Other"]
        }
    },
    "required": ["account_id", "issue_category"]
}
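Don’t forget to update the name of the tool in the API call to something like extract_ticket_data to keep things clean! Change it in both places, `tools` and `tool_choice`, because the forced tool name has to match a tool you actually defined. Update the tool’s description to talk about support tickets while you’re at it.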
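Because the tool name appears twice in the API call, it is easy to rename one copy and forget the other. One way to avoid that, sketched here with a hypothetical helper called `build_tool_request`, is to build both arguments from a single name. The shortened schema is only there to keep the example small.

```python
def build_tool_request(name, description, schema):
    """Return the tools/tool_choice keyword arguments for one forced tool."""
    return {
        "tools": [
            {"name": name, "description": description, "input_schema": schema}
        ],
        # Same name on purpose: tool_choice must point at a tool that exists.
        "tool_choice": {"type": "tool", "name": name},
    }


# Trimmed-down version of the ticket schema, just for the demo.
ticket_schema = {
    "type": "object",
    "properties": {"account_id": {"type": "string"}},
    "required": ["account_id"],
}

request_kwargs = build_tool_request(
    "extract_ticket_data",
    "Extract key fields from a customer support ticket.",
    ticket_schema,
)
print(request_kwargs["tool_choice"])
```

The result can be unpacked into the existing call, for example `client.messages.create(model=..., max_tokens=1024, messages=[...], **request_kwargs)`.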
Step 3: Run It Again
Run python extractor.py. You’ll get this:
{
  "account_id": "user-12345",
  "product_name": "SuperWidget 5000",
  "issue_category": "Shipping"
}
The AI ignored the customer’s frustration and all the extra words. It just read the text, followed our rules, and filled out the form. Flawlessly.
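Even when the output looks this clean, it is worth a quick sanity check before anything downstream trusts it. The sketch below is a deliberately tiny checker written for this lesson's schemas only (required fields, `string` and `number` types, and `enum` values); the function name `check_against_schema` is made up, and a real project would more likely lean on a full validator such as the `jsonschema` package.

```python
def check_against_schema(data, schema):
    """Return a list of problems; an empty list means the data looks fine."""
    # Maps the JSON Schema type names used in this lesson to Python types.
    python_types = {"string": str, "number": (int, float)}
    problems = []

    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")

    for field, value in data.items():
        rules = schema["properties"].get(field)
        if rules is None:
            problems.append(f"unexpected field: {field}")
            continue
        type_name = rules.get("type")
        # Only simple string type names are checked in this sketch.
        expected = python_types.get(type_name) if isinstance(type_name, str) else None
        if expected and not isinstance(value, expected):
            problems.append(f"wrong type for {field}")
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"value not allowed for {field}: {value!r}")
    return problems


ticket_schema = {
    "type": "object",
    "properties": {
        "account_id": {"type": "string"},
        "product_name": {"type": "string"},
        "issue_category": {
            "type": "string",
            "enum": ["Billing", "Technical", "Shipping", "Other"],
        },
    },
    "required": ["account_id", "issue_category"],
}

good = {
    "account_id": "user-12345",
    "product_name": "SuperWidget 5000",
    "issue_category": "Shipping",
}
bad = {"product_name": "SuperWidget 5000", "issue_category": "Delivery"}

print(check_against_schema(good, ticket_schema))
print(check_against_schema(bad, ticket_schema))
```

If the list comes back non-empty, the simplest policy is to log the ticket and route it to a human instead of writing a broken row.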
Real Business Use Cases
This exact same pattern can be used everywhere:
- Recruiting Agency: Automatically parse resumes (as text) to extract candidate name, email, years of experience, and key skills into a candidate database.
- Law Firm: Feed in a contract clause and extract the parties involved, effective dates, and liability cap into a summary document.
- E-commerce Store: Analyze product reviews to extract the product SKU, star rating, and a summary of pros and cons mentioned by the customer.
- Financial Analyst: Pull key numbers from a quarterly earnings report transcript, like revenue, net income, and forward-looking guidance, into a spreadsheet-ready format.
- Healthcare Provider: (With HIPAA compliance) Process anonymized patient intake forms to extract symptoms, duration, and medical history into a structured preliminary report.
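To show how little changes between use cases, here is what a schema for the recruiting example might look like. Every field name below is hypothetical; the only new idea compared with the lead schema is an `array` field for the list of skills.

```python
import json

# Hypothetical field names for the recruiting use case; rename to fit your database.
resume_schema = {
    "type": "object",
    "properties": {
        "candidate_name": {"type": "string", "description": "Full name of the candidate."},
        "candidate_email": {"type": "string", "description": "Email address of the candidate."},
        "years_of_experience": {
            "type": "number",
            "description": "Total years of professional experience.",
        },
        "key_skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Skills named in the resume, one per entry.",
        },
    },
    "required": ["candidate_name", "candidate_email"],
}

# Quick self-check: every required field must also be defined under properties.
undefined = [f for f in resume_schema["required"] if f not in resume_schema["properties"]]
print(json.dumps({"fields": list(resume_schema["properties"]), "undefined": undefined}))
```

Swap this in for `json_schema`, rename the tool to something like `extract_resume_data`, and the rest of the script stays the same.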
Common Mistakes & Gotchas
- Vague Schemas: If your schema is lazy, your output will be lazy. Don’t just say `"details"`. Be specific: `"contact_phone_number"`, `"project_deadline"`. The more specific your rules, the more reliable the result.
- Forgetting Edge Cases: What happens if the text doesn’t contain a company name? The AI might guess or fail. A good schema plans for this. You can make fields non-required, and instruct the AI to return `null` if a value is not found.
- Ignoring Model Choice: For simple extraction, a smaller, faster model like Claude 3 Haiku might be cheaper and just as effective. For complex legal documents, the power of Opus (like we used) is worth it. Test and see what works.
- Not Handling Errors: Sometimes, the API will be down or your key will be wrong. A real production script needs a `try`/`except` block to catch these errors instead of just crashing. We’ll cover that in a later lesson.
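To make the edge-case advice concrete, here is one way to tell the model that a field may legitimately be empty. Standard JSON Schema allows `type` to be a list, so `["string", "null"]` means "a string, or null". The helper name `make_nullable` is made up for this sketch, and it assumes the API accepts this form of schema, so try it on a few emails that really are missing the field.

```python
import copy


def make_nullable(schema, field):
    """Return a copy of schema where `field` may hold a value or null."""
    updated = copy.deepcopy(schema)  # leave the caller's schema untouched
    rules = updated["properties"][field]
    types = rules["type"] if isinstance(rules["type"], list) else [rules["type"]]
    if "null" not in types:
        types = types + ["null"]
    rules["type"] = types
    # Spell the rule out in the description too, since that is what the model reads.
    hint = "Use null if the text does not state it."
    rules["description"] = (rules.get("description", "") + " " + hint).strip()
    return updated


lead_schema = {
    "type": "object",
    "properties": {
        "contact_name": {"type": "string", "description": "Full name of the contact person."},
        "company_name": {"type": "string", "description": "Name of the company."},
    },
    "required": ["contact_name"],
}

nullable_schema = make_nullable(lead_schema, "company_name")
print(nullable_schema["properties"]["company_name"]["type"])
print(nullable_schema["properties"]["company_name"]["description"])
```

Keeping `company_name` out of `required` and marking it nullable gives the model two honest ways to say "not found", which should make a guess less likely.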
How This Fits Into a Bigger Automation System
What we built today is the front door of your automation factory. It’s the worker who opens the mail and sorts it into the right piles.
The clean JSON it produces is the fuel for everything else. That JSON object can be:
- Sent to a CRM API to automatically create a new lead in HubSpot or Salesforce.
- Inserted into a Google Sheet or Airtable to build a real-time dashboard of inbound requests.
- Passed to an email sending service like Resend or SendGrid to send an automated, personalized acknowledgement.
- Used as input for a second AI agent whose job is to take the structured data and draft a response.
- Logged in a database to track business intelligence and analytics over time.
Without this first step—turning chaos into structure—none of the fancy downstream automations are possible.
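As a very small taste of "downstream", the sketch below writes one extracted record as a spreadsheet-friendly CSV row using only Python's standard library. It is a stand-in, not the Google Sheets integration: the helper name `append_row`, the column list, and the `leads.csv` file name in the comment are all assumptions for illustration, and an in-memory buffer is used so the example leaves no file behind.

```python
import csv
import io


def append_row(file_obj, record, columns):
    """Write one extracted record as a CSV row, in a fixed column order."""
    # extrasaction="ignore" drops fields we did not ask for (here: project_summary).
    # Fields the model left out become empty cells (DictWriter's default restval).
    writer = csv.DictWriter(file_obj, fieldnames=columns, extrasaction="ignore")
    writer.writerow(record)


columns = ["contact_name", "company_name", "contact_email", "budget_usd"]
extracted_data = {
    "contact_name": "Sarah Johnson",
    "company_name": "Innovate Corp",
    "contact_email": "sarah.j@innovatecorp.com",
    "project_summary": "Looking for help building a new e-commerce website.",
    "budget_usd": 25000,
}

buffer = io.StringIO()  # stands in for open("leads.csv", "a", newline="")
csv.DictWriter(buffer, fieldnames=columns).writeheader()
append_row(buffer, extracted_data, columns)
print(buffer.getvalue())
```

Fixing the column order up front matters more than it looks: the model decides which optional fields appear, but your spreadsheet columns should never shift because of it.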
What to Learn Next
Okay, Professor. You’ve turned a messy email into clean JSON that just prints to your screen. Cool. So what? It’s still trapped inside your terminal.
You’ve built the engine, but you haven’t connected it to the wheels. The real power comes when this process runs automatically, without you even touching the keyboard.
In our next lesson, we’re connecting the engine to the transmission. We will take this exact script and hook it up to a trigger. We’ll build a system that automatically reads a new email *as it arrives*, runs our Claude 3 extractor, and dumps the clean data into a Google Sheet in real time.
By the end of the next lesson, you will have a fully functional, 24/7 lead capture system. Get ready to build the rest of the factory.

