MIRA
Guide

How to Structure Your Data the Right Way

Good data structure is the foundation of reliable analysis. This guide covers what structured data is, what makes it useful, and the common mistakes that silently break your results.

What Is Structured Data?

Structured data is information organised into rows and columns where every value has a defined type and meaning. It lives in tables, spreadsheets, or databases — as opposed to unstructured data like emails, PDFs, or chat logs.

Each column represents one attribute — a name, a date, an amount. Each row represents one entity — a customer, an order, a transaction. When data follows this pattern consistently, any tool — from a simple spreadsheet formula to an AI query engine — can read it reliably.

A well-structured orders table
order_idcustomer_nameorder_dateamountstatus
1001Alice Johnson2025-03-15249.00shipped
1002Bob Chen2025-03-1689.50delivered
1003Carol Smith2025-03-17412.00processing
1004David Park2025-03-1867.25shipped

One fact per cell. Consistent types in every column. A stable ID for every row.

What Makes Structured Data “Good”?

Not all tables are created equal. A spreadsheet with rows and columns can still be a nightmare to analyse if the data inside doesn't follow clear rules. Here's what separates usable data from data that looks structured but isn't.

One fact per cell

Each cell should contain exactly one piece of information. No combined values like “John Smith / London” or “$500 (paid)”. If you need both a name and a city, those are two separate columns.

Consistent data types

Every value in a column should be the same type — all dates, all numbers, all text. A date column that contains “March 15”, “2025-03-16”, and “last Tuesday” will break any tool that tries to sort or filter it.

Stable, descriptive column names

Use clear names like order_date or customer_id — not “Date (Q1)”, “Column F”, or names that change between exports. Column names are the contract between your data and every tool that reads it.

One table per sheet

Each file or sheet should contain exactly one structured table starting at row 1, column A. No side-by-side tables, no floating summaries in the corner, no gaps. If you have two different datasets, they belong in two separate sheets or files.

No hidden logic

Summary rows, merged cells, colour-coded meaning, and notes embedded in data cells — these are invisible to any automated tool. If information matters, it belongs in its own column, not in formatting.

Honest missing values

When a value is missing, it should be genuinely empty — not a zero, a dash, or “N/A”. A zero in a revenue column means the customer paid nothing. An empty cell means you don't know. These are very different things.

Spot the Difference

The same data can be stored in ways that look similar but behave completely differently when you try to analyse it. Here's a side-by-side comparison.

Problematic
Name / CityDateRevenue
John Smith / LondonMarch 15$1,200
Jane Doe / NYC2025-03-16
Acme Corplast Tuesday0
TOTAL$1,200
  • Combined name and city in one column
  • Inconsistent date formats
  • Dash used instead of null for missing revenue
  • Summary row mixed into data rows
Clean
customer_namecityorder_daterevenue
John SmithLondon2025-03-151200.00
Jane DoeNew York2025-03-16
Acme Corp2025-03-180.00
  • Name and city are separate columns
  • All dates in ISO format
  • Null means unknown; 0 means zero
  • No summary rows — totals are computed, not stored

One Table Per Sheet — No Exceptions

One of the most common problems isn't bad data inside a table — it's multiple tables scattered across a single sheet. A summary block in the corner, a lookup table pasted next to the main data, or two unrelated datasets stacked with a blank row between them. It might look organised to a human, but to any automated tool it's unreadable.

Every file or sheet should contain exactly one table that starts at row 1, column A, with headers in the first row and data immediately below. If you have a separate reference table or a summary, move it to its own sheet or file.

Scattered
1
2
3
4
5
6
7
8
9
10
11
12
A
B
C
D
E
F
Orders
Totals
id
name
amount
region
sum
1001
Alice
249.00
West
338.50
1002
Bob
89.50
East
89.50
1003
Carol
412.00
Returns
order_id
reason
refund
1001
damaged
249.00
1003
wrong item
412.00
  • Multiple tables on one sheet
  • Summary block floating to the side
  • Tools can't tell where one table ends and another begins
Clean
orders
returns
totals
1
2
3
4
5
6
7
8
9
10
11
12
A
B
C
D
E
F
order_id
customer
city
date
amount
status
1001
Alice
London
2025-03-15
249.00
shipped
1002
Bob
New York
2025-03-16
89.50
delivered
1003
Carol
Berlin
2025-03-17
412.00
processing
1004
David
Tokyo
2025-03-18
67.25
shipped
1005
Emma
Sydney
2025-03-19
183.00
delivered
1006
Frank
London
2025-03-20
55.75
shipped
1007
Grace
Paris
2025-03-21
310.00
processing
1008
Henry
New York
2025-03-22
142.50
delivered
1009
Iris
Berlin
2025-03-23
98.00
shipped
1010
Jack
Tokyo
2025-03-24
225.00
processing
  • One table per sheet, starting at row 1
  • Separate datasets in separate sheets
  • Any tool can read each sheet unambiguously

Why It Matters for Analysis

Structured data isn't an academic exercise — it directly determines what your data tools can and can't do. Here's what well-structured data enables.

01

Filter, sort, and group without cleanup

When every column has a consistent type and meaning, you can immediately filter orders by date, group revenue by region, or sort customers by spend — no manual reformatting required.

02

Join tables reliably using ID columns

Stable, unique identifiers like customer_id or order_id let you connect tables together. This is how you link a customer to their orders, or an order to its line items — the foundation of any real analysis.

03

AI and query tools get accurate answers

AI tools read what's in your data, not what you meant. When your data says a customer's revenue is 0 but you meant “unknown”, the AI will include that zero in its averages. Clean structure means accurate results.

04

Errors become visible, not silent

Messy data doesn't cause obvious errors — it causes subtly wrong answers. A misformatted date gets quietly excluded from a filter. A summary row gets counted as a real transaction. You get a result that looks right but isn't.

How Tables Connect: A Simple Example

In practice, your data will live across multiple tables. The power of structured data is that these tables link together through shared ID columns — so you can ask questions that span across them.

customers
customer_idnameregion
C-201Alice JohnsonWest
C-202Bob ChenEast
customer_id
orders
order_idcustomer_idamount
1001C-201249.00
1002C-20289.50

The customer_id column exists in both tables, creating a reliable link between a customer and their orders.

A Quick Checklist

Before you connect your data to any analytics tool, run through these checks. They're simple, but they prevent the majority of data quality issues.

Every column has one meaning. If a column contains both names and addresses, split it.
Data types are consistent. Dates are all dates. Numbers are all numbers. No mixing.
Column names are stable and descriptive. They don't change between exports or contain dates/quarters.
No summary rows or merged cells. Aggregations belong in reports, not in your raw data.
Missing data is actually empty. Remove placeholder dashes, “N/A” strings, and misleading zeros.
Every table has a unique ID column. This is what lets tables link to each other and what prevents duplicate rows.