
AI Data Extraction: A Guide to Claude’s Tool Use API

The Intern Who Cried “Copy-Paste”

I once had a client, a small e-commerce shop, drowning in success. Their problem? Every order confirmation, customer query, and supplier update was a unique, unstructured email. The founder, let’s call him Dave, had hired an intern specifically to read these emails and manually type the details into a spreadsheet. Name, order number, issue, urgency… you get the picture.

After two weeks, the intern quit. Not with a dramatic farewell, but with a quiet, defeated Slack message: “My soul can no longer handle the copy-paste.”

Dave’s spreadsheet was a disaster. Typos, missed emails, order numbers in the ‘name’ column. It was pure chaos, costing him time, money, and sanity. He was running a business, not a data entry summer camp. This, my friends, is the quiet nightmare that every growing business faces: the unstructured data apocalypse.

Why This Matters

This isn’t just about saving an intern’s soul. Manual data entry is the termite in the foundation of your business. It’s slow, expensive, and riddled with human error. Every mistake costs you something: a delayed order, an angry customer, a missed sales lead.

The workflow we’re building today replaces that chaos. It’s a precision robot that reads any piece of messy text—an email, a PDF transcript, a customer review—and instantly pulls out the exact information you need, perfectly structured every single time. This isn’t an upgrade; it’s a paradigm shift. You go from a system that breaks under pressure to one that scales effortlessly. You’re not hiring more people to do boring work; you’re building an automated data pipeline.

What This Tool / Workflow Actually Is

We’re using a feature in Anthropic’s Claude 3 API called Tool Use. Now, don’t let the name confuse you. We aren’t giving the AI a hammer and nails.

What it is: Tool Use is a way to force Claude to respond in a perfectly structured format that you define. Think of it like handing an assistant a very specific form and saying, “Read this 10-page report and fill out ONLY the fields on this form. Nothing else. Just the facts, in these exact boxes.” The “form” is a schema we define, and the “filled-out form” is a clean JSON object a machine can read.

What it is NOT: It is NOT the AI autonomously deciding to run code or use external software. It doesn’t *execute* anything. It simply packages the data so another part of your system *can* execute something. It’s the ultimate data prepper.

Prerequisites

I know some of you are allergic to code. Don’t worry, this is pure copy-paste territory. Here’s what you’ll need.

  1. An Anthropic API Key: Go to the Anthropic website, create an account, and get your API key. You get some free credits to start, which is more than enough for this lesson.
  2. A way to run Python: If you don’t have Python set up, don’t panic. You can use a free online tool like Replit. Just create an account and start a new Python project.
  3. Five minutes of focused attention: Turn off your notifications. Close the 17 browser tabs you have open. Let’s build something real.

That’s it. If you can copy text from a webpage, you can do this.

Step-by-Step Tutorial

We’re going to build a system that reads a messy customer support email and extracts the key details into a structured format.

Step 1: Set Up Your Environment

First, we need to install the official Anthropic Python library. Open your terminal or the shell in Replit and run this command:

pip install anthropic

Next, you need to set your API key. The best way is as an environment variable so you don’t accidentally paste it into your code. In your terminal/shell, do this (replace `YOUR_API_KEY` with your actual key):

export ANTHROPIC_API_KEY="YOUR_API_KEY"

If you’re using Replit, look for a ‘Secrets’ tab on the left. You can add it there. The key will be `ANTHROPIC_API_KEY` and the value will be your key.
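
Before moving on, it can help to confirm that Python actually sees the key. Here is a minimal sketch; the helper name `api_key_is_set` is made up for this lesson, and it only checks that the variable exists, not that the key is valid.

```python
import os


def api_key_is_set(env=None):
    """Return True if ANTHROPIC_API_KEY is present and non-empty."""
    # Allow passing a plain dict for testing; default to the real environment
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if api_key_is_set():
    print("API key found. You're ready for Step 2.")
else:
    print("ANTHROPIC_API_KEY is not set. Re-check the export command or your Replit Secrets.")
```

If this prints the second message, fix the environment variable first; the Anthropic client in Step 3 reads the same variable.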

Step 2: Define Your Tool (The Data ‘Form’)

This is the magic step. We tell Claude exactly what our “form” looks like. We want to extract a user’s name, email, urgency level, and a summary. We define this using a JSON schema.

In your Python script, create a variable for this tool definition. Don’t be intimidated by the syntax; just notice how it clearly defines the fields we want.

tool_definition = {
    "name": "extract_customer_info",
    "description": "Extracts key information from a customer email.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "The full name of the customer."
            },
            "email": {
                "type": "string",
                "description": "The customer's email address."
            },
            "urgency": {
                "type": "string",
                "description": "The urgency of the issue, categorized as Low, Medium, or High.",
                "enum": ["Low", "Medium", "High"]
            },
            "summary": {
                "type": "string",
                "description": "A one-sentence summary of the customer's problem."
            }
        },
        "required": ["name", "email", "urgency", "summary"]
    }
}

Why this works: The `name` and `description` tell the AI the tool’s purpose. The `properties` block defines our desired fields. The `description` inside each property is CRITICAL—it’s your instruction to the AI on *how* to fill that field. The `required` array tells the AI it MUST return all these fields.
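
The schema guides the AI, but you may still want a safety net on your side in case a response comes back with a missing field or an urgency value outside the list. The sketch below is one plain-Python way to check a record against the schema. The function `check_against_schema` and the trimmed-down `demo_schema` are illustrations for this lesson, not part of the Anthropic library, and the check only covers `required`, string types, and `enum`.

```python
def check_against_schema(data, input_schema):
    """Return a list of problems; an empty list means the record looks fine."""
    problems = []
    properties = input_schema.get("properties", {})

    # Every field listed under "required" must be present
    for field in input_schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")

    # Every returned field must be known, a string, and within its enum
    for field, value in data.items():
        spec = properties.get(field)
        if spec is None:
            problems.append(f"unexpected field: {field}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{field} should be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{field} must be one of {spec['enum']}")

    return problems


# Hypothetical, shortened version of the input_schema above
demo_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "urgency": {"type": "string", "enum": ["Low", "Medium", "High"]},
    },
    "required": ["name", "urgency"],
}

print(check_against_schema({"name": "Sarah Connor", "urgency": "Critical"}, demo_schema))
```

In your own script you would pass `tool_definition["input_schema"]` instead of `demo_schema`.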

Step 3: Craft the Prompt and Make the API Call

Now we write the Python code to send the messy email text to Claude, along with our tool definition. This code tells the AI: “Read this text, and use the `extract_customer_info` tool to structure your answer.”

import anthropic
import json

# This assumes you set your API key as an environment variable
client = anthropic.Anthropic()

# The messy email we want to process
messy_email_text = """
Hi team,

My name is Sarah Connor and my server is on fire, literally. Our entire billing system is down. THIS IS URGENT. Please help us immediately. My contact is sarah.c@sky.net.

Thanks,
Sarah
"""

# The actual API call
response = client.messages.create(
    model="claude-3-opus-20240229", # Opus is best for complex tool use
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": f"Please extract the information from this email: {messy_email_text}"
        }
    ],
    tools=[tool_definition] # Here's where we pass our 'form'
)

print("API Response:")
print(response)
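
One caveat: the call above lets Claude decide whether to use the tool or just reply in plain text. If you want the tool used every time, the Messages API accepts a `tool_choice` parameter that names the tool. Here is a minimal sketch of the request arguments; the helper `build_request` and the stand-in `demo_tool` are made up for illustration, and nothing here contacts the API.

```python
def build_request(email_text, tool):
    """Build keyword arguments for client.messages.create() that force one tool."""
    return {
        "model": "claude-3-opus-20240229",  # same model as the lesson
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": f"Please extract the information from this email: {email_text}",
            }
        ],
        "tools": [tool],
        # Ask Claude to always answer by calling this specific tool
        "tool_choice": {"type": "tool", "name": tool["name"]},
    }


# Hypothetical stand-in for the tool_definition from Step 2
demo_tool = {
    "name": "extract_customer_info",
    "description": "Extracts key information from a customer email.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}

request_kwargs = build_request("Hi, my order is late.", demo_tool)
print(request_kwargs["tool_choice"])
# With a real client: response = client.messages.create(**request_kwargs)
```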

Step 4: Parse the Output

If you run the code above, you’ll get a big, complicated response object. We don’t care about most of it. The gold is hidden inside `response.content`, in a block whose type is `tool_use`. The final step is to extract it.

Let’s add a few lines to our script to find and print only the structured data we care about.

# Find the tool call in the response
tool_call = None
for content_block in response.content:
    if content_block.type == 'tool_use':
        tool_call = content_block
        break

if tool_call:
    extracted_data = tool_call.input
    print("\n✅ Extracted Structured Data:")
    print(json.dumps(extracted_data, indent=2))
else:
    print("\n❌ No tool call found in the response.")
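
If you’d like to test this parsing logic without spending API credits, you can wrap it in a function and feed it a fake response. This is a sketch under assumptions: `get_tool_input` is a made-up helper name, and `fake_response` only imitates the few attributes of the real response object that the loop reads (`content`, `type`, `name`, `input`).

```python
from types import SimpleNamespace


def get_tool_input(response, tool_name):
    """Return the input dict of the first matching tool_use block, or None."""
    for block in response.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    return None


# Hypothetical fake response, shaped like the parts of the real one we read
fake_response = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="I'll extract the details."),
        SimpleNamespace(
            type="tool_use",
            name="extract_customer_info",
            input={"name": "Sarah Connor", "urgency": "High"},
        ),
    ]
)

print(get_tool_input(fake_response, "extract_customer_info"))
```

Returning `None` when nothing matches keeps the “no tool call” case explicit for whatever code runs next.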

Complete Automation Example

Let’s put it all together. Here is the complete, final script you can run. It takes the messy email about Sarah Connor’s fiery server and outputs perfect JSON.

import anthropic
import json

# 1. Initialize the client (assumes ANTHROPIC_API_KEY is set)
client = anthropic.Anthropic()

# 2. Define the 'form' (our tool schema)
tool_definition = {
    "name": "extract_customer_info",
    "description": "Extracts key information from a customer email.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "The full name of the customer."
            },
            "email": {
                "type": "string",
                "description": "The customer's email address."
            },
            "urgency": {
                "type": "string",
                "description": "The urgency of the issue, categorized as Low, Medium, or High.",
                "enum": ["Low", "Medium", "High"]
            },
            "summary": {
                "type": "string",
                "description": "A one-sentence summary of the customer's problem."
            }
        },
        "required": ["name", "email", "urgency", "summary"]
    }
}

# 3. The messy, unstructured input text
messy_email_text = """
Hi team,

My name is Sarah Connor and my server is on fire, literally. Our entire billing system is down. THIS IS URGENT. Please help us immediately. My contact is sarah.c@sky.net.

Thanks,
Sarah
"""

# 4. Make the API call, passing the text and the tool definition
print("Calling Claude API...")
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": f"Please extract the information from this email: {messy_email_text}"
        }
    ],
    tools=[tool_definition]
)

# 5. Find and parse the structured output
tool_call = None
for content_block in response.content:
    if content_block.type == 'tool_use':
        tool_call = content_block
        break

if tool_call:
    extracted_data = tool_call.input
    print("\n✅ Extracted Structured Data:")
    # Pretty-print the JSON output
    print(json.dumps(extracted_data, indent=2))
else:
    print("\n❌ No tool call found in the response.")

Expected Output:

✅ Extracted Structured Data:
{
  "name": "Sarah Connor",
  "email": "sarah.c@sky.net",
  "urgency": "High",
  "summary": "The customer's server is on fire and their entire billing system is down."
}

Look at that. Perfect, clean, machine-readable data. Ready to be sent to a CRM, a ticketing system, or a database. No intern required.
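
To bring this back to Dave’s spreadsheet, here is one way the extracted record could become a CSV row. It’s a sketch, not a finished integration: `append_record` is a made-up helper, the column order is an assumption, and the in-memory buffer stands in for a real file such as a hypothetical `tickets.csv`.

```python
import csv
import io

# Assumed column order, matching the fields in our tool schema
FIELDNAMES = ["name", "email", "urgency", "summary"]


def append_record(file_obj, record, write_header=False):
    """Write one extracted record as a CSV row."""
    # extrasaction="ignore" drops any fields that are not in FIELDNAMES
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES, extrasaction="ignore")
    if write_header:
        writer.writeheader()
    # Fields missing from the record are written as empty cells
    writer.writerow(record)


extracted_data = {
    "name": "Sarah Connor",
    "email": "sarah.c@sky.net",
    "urgency": "High",
    "summary": "The customer's server is on fire and their entire billing system is down.",
}

buffer = io.StringIO()  # stands in for open("tickets.csv", "a", newline="")
append_record(buffer, extracted_data, write_header=True)
print(buffer.getvalue())
```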

Real Business Use Cases

  1. E-commerce Order Issues: A clothing store gets emails like “My order #12345 hasn’t arrived.” The automation extracts `order_number`, `customer_name`, and `issue_type` (e.g., ‘shipping’, ‘refund’) to automatically create a support ticket.
  2. Financial Document Processing: A wealth management firm receives PDF bank statements. After an OCR tool converts the PDF to text, this automation extracts `account_number`, `statement_period`, `total_deposits`, and `total_withdrawals` for analysis.
  3. Sales Lead Qualification: A SaaS company gets leads from a ‘Contact Us’ form that just has one big text box. The automation extracts `company_name`, `contact_person`, `company_size`, and `pain_point` to route the lead to the correct sales rep.
  4. Product Feedback Analysis: A software company scrapes reviews from G2 or Capterra. This automation processes each review to pull out `feature_requested`, `sentiment` (Positive/Negative/Neutral), and `user_role` (e.g., ‘Admin’, ‘End-User’).
  5. Legal Contract Review: A law firm needs to quickly analyze contracts. This automation can extract key clauses like `effective_date`, `termination_clause`, `governing_law`, and `liability_limit` for faster initial review.

Common Mistakes & Gotchas

  • Being too vague in your schema: The descriptions in your `input_schema` are your instructions to the AI. `"description": "The full name of the customer."` is good. `"description": "name"` is bad. Be specific.
  • Using the wrong model: Forcing structured output is a complex task. Claude 3 Opus is your best bet. Haiku or Sonnet might work for very simple schemas, but Opus is more reliable.
  • Not handling the ‘no data’ case: What if an email doesn’t contain a name? The AI might fail to fill the field. Your downstream code needs to be able to handle missing data or an empty response.
  • Forgetting the `required` field: If you don’t specify which fields are `required`, the AI might skip them if it feels like it. Always define what’s non-negotiable.
  • Trying to extract everything at once: Don’t create a tool with 50 fields. It’s better to have multiple, smaller, more focused tools. One for customer info, another for order details, etc.
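
For the “no data” case in particular, a small check before you pass the record along can save you from storing junk. The sketch below treats absent, empty, and placeholder-looking values as missing. The helper `find_missing_fields` and the `PLACEHOLDERS` list are assumptions for illustration; adjust the list to whatever your own emails actually produce.

```python
# Assumed placeholder values the model might write when it finds nothing
PLACEHOLDERS = {"", "unknown", "n/a", "none", "not provided"}


def find_missing_fields(record, required_fields):
    """Return required fields that are absent, empty, or a placeholder."""
    missing = []
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip().lower() in PLACEHOLDERS:
            missing.append(field)
    return missing


record = {"name": "Unknown", "email": "sarah.c@sky.net", "urgency": "High", "summary": ""}
missing = find_missing_fields(record, ["name", "email", "urgency", "summary"])
if missing:
    print(f"Send to manual review, missing: {missing}")
```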

How This Fits Into a Bigger Automation System

This tool is a foundational building block. It’s the ‘translation layer’ that converts the messy, unpredictable human world into the clean, structured language that machines understand. Think of it as the loading dock for your data factory.

  • Input: The unstructured text can come from anywhere. An email automation in Make/Zapier, a webhook from a form, a transcription from a voice agent, or the output of a web scraper.
  • Processing: This Claude workflow is the ‘refinery’. It takes the raw material and turns it into pure, valuable data (JSON).
  • Output: The structured JSON can then be sent anywhere. Add a new row in Airtable, create a new lead in HubSpot, send a Slack notification to the support team, or even pass it to *another* AI agent for the next step in a process. This is the first link in a powerful chain.

What to Learn Next

Fantastic. You’ve officially built a robot that can read and understand text better than a bored intern. You’ve turned unstructured chaos into structured clarity.

But now what? You have this perfect little JSON object. Are you just going to stare at it?

In the next lesson in this course, we’re going to take this structured data and build a ‘Router Agent’. This new agent will look at the JSON and make a decision. If `"urgency": "High"`, it will page an on-call engineer. If `"urgency": "Low"`, it will create a ticket in a backlog. We’re moving from just *structuring* information to *acting* on it intelligently.
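
As a small preview, the core of that decision can be sketched in a few lines. This is only an illustration: the function `route_ticket` and the action names are hypothetical, and sending anything that isn’t High or Low (including Medium) to a standard queue is an assumption.

```python
def route_ticket(record):
    """Pick a next action from the extracted urgency."""
    urgency = record.get("urgency")
    if urgency == "High":
        return "page_on_call"
    if urgency == "Low":
        return "backlog"
    # Assumption: anything else (including Medium) goes to a normal queue
    return "standard_queue"


print(route_ticket({"name": "Sarah Connor", "urgency": "High"}))
```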

You’ve built the eyes and ears. Next, we build the brain.
