13 Mins Read
31 January 2026SQL Basics: Introduction to Databases and Queries
In today's data-driven world, the ability to manipulate and retrieve information efficiently isn't just a technical skill—it's a superpower. Whether you're analyzing customer behavior, managing inventory, or building the next great app, Structured Query Language (SQL) serves as the universal key to unlocking the treasures hidden within databases.
SQL (pronounced "sequel" or "S-Q-L") has remained the gold standard for database management since the 1970s, outlasting countless programming trends. Why? Because it's intuitive, powerful, and standardized across virtually every relational database system—from MySQL and PostgreSQL to Microsoft SQL Server and Oracle.
This comprehensive guide will transform you from a complete beginner to someone who can confidently create tables, insert data, and write complex queries. By the end, you'll understand not just how to write SQL commands, but why they work the way they do.
Author
Arthur P.
Category
PostgreSQL
What Is a Database?
Before diving into SQL syntax, let's understand the foundation: databases. A database is an organized collection of structured information stored electronically in a computer system. Think of it as a digital filing cabinet where data is stored in a way that makes it easily accessible, manageable, and updatable.
Relational vs. Non-Relational Databases
Databases generally fall into two categories:
Relational Databases (RDBMS): Store data in tables with predefined relationships. They use SQL and include systems like MySQL, PostgreSQL, and SQL Server. These are ideal for structured data with clear relationships (e.g., e-commerce platforms, banking systems).
Non-Relational Databases (NoSQL): Store data in flexible formats like JSON documents, key-value pairs, or graphs. Examples include MongoDB and Redis. These excel with unstructured data or rapid scaling needs.
For this guide, we'll focus exclusively on relational databases, as SQL is their native language.
Understanding SQL: The Language of Data
SQL was developed by IBM researchers in the early 1970s based on Edgar F. Codd's relational model. It became an ANSI (American National Standards Institute) standard in 1986 and an ISO standard in 1987.
SQL consists of several sub-languages:
DDL (Data Definition Language): Defines database structure
DML (Data Manipulation Language): Manages data within tables
DCL (Data Control Language): Controls access permissions
TCL (Transaction Control Language): Manages transactions
Setting Up Your Environment
To follow along with the examples, you'll need a database system. Here are free options for beginners:
MySQL Community Edition (Industry standard, excellent documentation)
PostgreSQL (Advanced features, strict standards compliance)
SQLite (File-based, perfect for learning, no installation required)
SQL Server Express (Microsoft's free version)
For this tutorial, we'll use standard SQL that works across all these platforms, with notes where syntax might vary slightly.
Part 1: Data Definition Language (DDL)
DDL commands define and modify database structures. Let's build a sample database for an Online Bookstore to demonstrate these concepts.
Creating a Database
First, we need a container for our tables:
CREATE DATABASE BookstoreDB;
-- Switch to the new database (syntax varies by system)
USE BookstoreDB; -- MySQL/SQL Server
-- OR
-- \c BookstoreDB -- PostgreSQLCreating Tables
Tables are the heart of relational databases. Each table represents an entity (like a book or customer), and columns represent attributes.
-- Authors table
CREATE TABLE Authors (
author_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
birth_date DATE
);
-- Books table
CREATE TABLE Books (
book_id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(200) NOT NULL,
isbn VARCHAR(13) UNIQUE NOT NULL,
price DECIMAL(10, 2) CHECK (price > 0),
publication_year INT,
author_id INT,
stock_quantity INT DEFAULT 0,
FOREIGN KEY (author_id) REFERENCES Authors(author_id)
);
-- Customers table
CREATE TABLE Customers (
customer_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
phone VARCHAR(20),
registration_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Orders table
CREATE TABLE Orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
total_amount DECIMAL(10, 2),
status VARCHAR(20) DEFAULT 'Pending',
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);Key Constraints Explained:
PRIMARY KEY: Unique identifier for each row (cannot be null, must be unique)
FOREIGN KEY: Establishes relationships between tables
NOT NULL: Column must have a value
UNIQUE: No duplicate values allowed
DEFAULT: Sets a default value if none is provided
CHECK: Ensures data meets specific criteria (e.g., price must be positive)
Modifying Tables
Business requirements change, and so must your database structure:
-- Adding a new column
ALTER TABLE Books
ADD COLUMN genre VARCHAR(50);
-- Modifying column data type
ALTER TABLE Books
MODIFY COLUMN price DECIMAL(12, 2); -- MySQL syntax
-- OR
ALTER TABLE Books
ALTER COLUMN price TYPE DECIMAL(12, 2); -- PostgreSQL syntax
-- Adding a constraint
ALTER TABLE Customers
ADD CONSTRAINT chk_email CHECK (email LIKE '%@%');
-- Removing a column
ALTER TABLE Books
DROP COLUMN genre;
-- Renaming a table
RENAME TABLE OldTableName TO NewTableName; -- MySQL
-- OR
ALTER TABLE OldTableName RENAME TO NewTableName; -- PostgreSQLRemoving Objects
Use these with extreme caution in production environments:
-- Delete a table (removes structure and data permanently)
DROP TABLE IF EXISTS TempData;
-- Delete entire database
DROP DATABASE IF EXISTS TestDB;Part 2: Data Manipulation Language (DML)
Now that we have a structure, let's populate it with data and learn to manipulate it.
INSERT: Adding Data
-- Inserting single row
INSERT INTO Authors (first_name, last_name, email, birth_date)
VALUES ('George', 'Orwell', 'gorwell@example.com', '1903-06-25');
-- Inserting multiple rows (more efficient)
INSERT INTO Authors (first_name, last_name, email, birth_date) VALUES
('J.K.', 'Rowling', 'jk.rowling@example.com', '1965-07-31'),
('Stephen', 'King', 'sking@example.com', '1947-09-21'),
('Agatha', 'Christie', 'agatha@example.com', '1890-09-15');
-- Inserting with SELECT (copying data from another table)
INSERT INTO ArchivedBooks (book_id, title, author_id)
SELECT book_id, title, author_id
FROM Books
WHERE publication_year < 1950;
-- Inserting specific columns (others get default values)
INSERT INTO Customers (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@email.com');
-- registration_date will automatically use CURRENT_TIMESTAMPUPDATE: Modifying Existing Data
-- Simple update
UPDATE Books
SET price = 15.99
WHERE book_id = 1;
-- Updating multiple columns
UPDATE Books
SET price = 12.99, stock_quantity = 50
WHERE title = '1984';
-- Calculated updates
UPDATE Books
SET price = price * 1.10 -- 10% price increase
WHERE publication_year < 2020;
-- BE CAREFUL: Without WHERE, updates affect ALL rows
UPDATE Books
SET stock_quantity = 0; -- This empties your entire inventory!Critical Safety Tip: Always write your WHERE clause first when drafting UPDATE statements to avoid accidental mass updates.
DELETE: Removing Data
-- Delete specific records
DELETE FROM Orders
WHERE status = 'Cancelled'
AND order_date < DATE_SUB(CURRENT_DATE, INTERVAL 1 YEAR); -- MySQL
-- OR
AND order_date < CURRENT_DATE - INTERVAL '1 year'; -- PostgreSQL
-- Delete all records (keeps table structure)
DELETE FROM TempLogs;
-- Truncate (faster delete, cannot be rolled back in some systems, resets auto-increment)
TRUNCATE TABLE TempLogs;Part 3: The SELECT Statement (Querying Data)
SELECT is the most used SQL command, allowing you to retrieve data with precision. This is where SQL truly shines.
Basic SELECT Syntax
-- Select all columns (convenient but inefficient for large tables)
SELECT * FROM Authors;
-- Select specific columns (better performance)
SELECT first_name, last_name, email
FROM Authors;
-- Aliasing columns (renaming for display)
SELECT
first_name AS "First Name",
last_name AS "Last Name",
email AS "Email Address"
FROM Authors;
-- Eliminating duplicates
SELECT DISTINCT publication_year
FROM Books
ORDER BY publication_year;Filtering with WHERE Clause
-- Comparison operators: =, != (or <>), >, <, >=, <=
SELECT * FROM Books
WHERE price > 20.00;
-- Logical operators: AND, OR, NOT
SELECT * FROM Books
WHERE price > 15.00
AND stock_quantity > 10;
-- Pattern matching with LIKE
SELECT * FROM Authors
WHERE last_name LIKE 'K%'; -- Starts with K
SELECT * FROM Books
WHERE title LIKE '%Harry Potter%'; -- Contains phrase
-- Range queries
SELECT * FROM Books
WHERE publication_year BETWEEN 1990 AND 2000;
-- Set membership
SELECT * FROM Books
WHERE genre IN ('Mystery', 'Thriller', 'Horror');
-- NULL checking
SELECT * FROM Customers
WHERE phone IS NULL;
-- Negation
SELECT * FROM Books
WHERE genre NOT IN ('Romance', 'Poetry');Sorting Results
-- Single column sort (ascending is default)
SELECT * FROM Books
ORDER BY price;
-- Explicit direction
SELECT * FROM Books
ORDER BY publication_year DESC;
-- Multi-level sorting
SELECT * FROM Books
ORDER BY author_id ASC, publication_year DESC;
-- Sorting by column position (avoid in production code)
SELECT title, price, stock_quantity
FROM Books
ORDER BY 2 DESC; -- Sorts by price (second column)Aggregation and Grouping
Aggregate functions perform calculations on multiple rows:
-- Basic aggregates
SELECT
COUNT(*) AS total_books,
AVG(price) AS average_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
SUM(stock_quantity) AS total_inventory
FROM Books;
-- Grouping data
SELECT
author_id,
COUNT(*) AS books_written,
AVG(price) AS avg_book_price
FROM Books
GROUP BY author_id;
-- Filtering groups with HAVING (WHERE filters rows, HAVING filters groups)
SELECT
author_id,
COUNT(*) AS book_count
FROM Books
GROUP BY author_id
HAVING COUNT(*) > 3; -- Only authors with more than 3 books
-- Complex grouping
SELECT
publication_year,
genre,
COUNT(*) AS count,
AVG(price) AS avg_price
FROM Books
WHERE publication_year > 2000
GROUP BY publication_year, genre
HAVING COUNT(*) >= 2
ORDER BY publication_year DESC, avg_price;Part 4: Joining Tables (Relational Power)
The true power of relational databases lies in connecting tables. JOIN operations combine rows from two or more tables based on related columns.
Types of JOINs
Let's populate our tables first:
-- Insert sample books referencing authors
INSERT INTO Books (title, isbn, price, publication_year, author_id, stock_quantity, genre) VALUES
('1984', '9780451524935', 15.99, 1949, 1, 45, 'Dystopian'),
('Animal Farm', '9780451526342', 12.99, 1945, 1, 30, 'Political Satire'),
('Harry Potter and the Sorcerer''s Stone', '9780590353427', 24.99, 1997, 2, 100, 'Fantasy'),
('Harry Potter and the Chamber of Secrets', '9780439064873', 24.99, 1998, 2, 85, 'Fantasy'),
('The Shining', '9780307743657', 18.99, 1977, 3, 25, 'Horror'),
('Murder on the Orient Express', '9780062693662', 14.99, 1934, 4, 40, 'Mystery');
-- Insert customers and orders
INSERT INTO Customers (first_name, last_name, email, phone) VALUES
('Alice', 'Johnson', 'alice@email.com', '555-0101'),
('Bob', 'Smith', 'bob@email.com', '555-0102'),
('Carol', 'White', 'carol@email.com', NULL);
INSERT INTO Orders (customer_id, total_amount, status) VALUES
(1, 40.98, 'Completed'),
(1, 15.99, 'Pending'),
(2, 24.99, 'Completed'),
(3, 63.97, 'Shipped');INNER JOIN (Most Common)
Returns only rows where there is a match in both tables:
-- Books with their authors
SELECT
b.title,
b.price,
a.first_name,
a.last_name
FROM Books b
INNER JOIN Authors a ON b.author_id = a.author_id;
-- Alternative syntax (older standard)
SELECT
b.title,
a.first_name
FROM Books b, Authors a
WHERE b.author_id = a.author_id;LEFT JOIN (LEFT OUTER JOIN)
Returns all records from the left table and matched records from the right. Unmatched right-side records show NULL:
-- All authors, even those without books in our inventory
SELECT
a.first_name,
a.last_name,
b.title
FROM Authors a
LEFT JOIN Books b ON a.author_id = b.author_id;
-- Finding authors with no books (anti-join pattern)
SELECT
a.first_name,
a.last_name
FROM Authors a
LEFT JOIN Books b ON a.author_id = b.author_id
WHERE b.book_id IS NULL;RIGHT JOIN and FULL JOIN
-- RIGHT JOIN: All books, even if author missing (data integrity check)
SELECT
b.title,
a.first_name
FROM Authors a
RIGHT JOIN Books b ON a.author_id = b.author_id;
-- FULL OUTER JOIN: All records from both (MySQL doesn't support directly)
-- Workaround in MySQL:
SELECT
a.first_name,
b.title
FROM Authors a
LEFT JOIN Books b ON a.author_id = b.author_id
UNION
SELECT
a.first_name,
b.title
FROM Authors a
RIGHT JOIN Books b ON a.author_id = b.author_id;Self JOIN and Multiple JOINs
-- Multiple joins: Orders with customer and book details
SELECT
o.order_id,
c.first_name AS customer_name,
b.title AS book_ordered,
o.total_amount
FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id
-- Assuming we had an OrderDetails table linking to Books
-- JOIN OrderDetails od ON o.order_id = od.order_id
-- JOIN Books b ON od.book_id = b.book_id
;
-- Self JOIN: Comparing books by same author
SELECT
b1.title AS book1,
b2.title AS book2,
b1.price AS price1,
b2.price AS price2
FROM Books b1
JOIN Books b2 ON b1.author_id = b2.author_id
WHERE b1.book_id < b2.book_id; -- Avoid duplicates and self-matchingPart 5: Advanced Query Techniques
Subqueries (Nested Queries)
-- Subquery in WHERE
SELECT title, price
FROM Books
WHERE price > (SELECT AVG(price) FROM Books);
-- Subquery in FROM (Derived table)
SELECT
genre,
avg_price
FROM (
SELECT genre, AVG(price) AS avg_price
FROM Books
GROUP BY genre
) AS genre_averages
WHERE avg_price > 20;
-- Correlated subquery (runs once for each row)
SELECT
b.title,
b.price,
(SELECT COUNT(*)
FROM Books b2
WHERE b2.author_id = b.author_id) AS author_book_count
FROM Books b;
-- EXISTS operator
SELECT
a.first_name,
a.last_name
FROM Authors a
WHERE EXISTS (
SELECT 1
FROM Books b
WHERE b.author_id = a.author_id
AND b.price > 25
);Common Table Expressions (CTEs)
CTEs make complex queries more readable:
WITH ExpensiveBooks AS (
SELECT * FROM Books WHERE price > 20
),
AuthorStats AS (
SELECT
author_id,
COUNT(*) AS book_count,
AVG(price) AS avg_price
FROM Books
GROUP BY author_id
)
SELECT
eb.title,
eb.price,
eb.genre,
ast.book_count
FROM ExpensiveBooks eb
JOIN AuthorStats ast ON eb.author_id = ast.author_id;Window Functions (Advanced Analytics)
-- Ranking books by price within each genre
SELECT
title,
genre,
price,
RANK() OVER (PARTITION BY genre ORDER BY price DESC) AS price_rank_in_genre,
AVG(price) OVER (PARTITION BY genre) AS genre_average,
price - AVG(price) OVER (PARTITION BY genre) AS diff_from_genre_avg
FROM Books;
-- Running totals
SELECT
order_id,
customer_id,
total_amount,
SUM(total_amount) OVER (ORDER BY order_date) AS running_total,
SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_running_total
FROM Orders;Part 6: Data Integrity and Optimization
Constraints for Data Quality
-- Adding constraints after table creation
ALTER TABLE Books
ADD CONSTRAINT chk_positive_price
CHECK (price > 0);
-- Composite primary key
CREATE TABLE OrderDetails (
order_id INT,
book_id INT,
quantity INT,
price_at_purchase DECIMAL(10,2),
PRIMARY KEY (order_id, book_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (book_id) REFERENCES Books(book_id)
);
-- Cascade operations
CREATE TABLE Reviews (
review_id INT PRIMARY KEY,
book_id INT,
rating INT CHECK (rating BETWEEN 1 AND 5),
review_text TEXT,
FOREIGN KEY (book_id) REFERENCES Books(book_id)
ON DELETE CASCADE -- Deletes reviews if book is deleted
ON UPDATE CASCADE -- Updates book_id in reviews if changed in Books
);Indexes for Performance
Indexes speed up queries but slow down writes (INSERT/UPDATE/DELETE):
-- Creating indexes
CREATE INDEX idx_books_author ON Books(author_id);
CREATE INDEX idx_books_price ON Books(price);
CREATE UNIQUE INDEX idx_email ON Customers(email);
-- Composite index
CREATE INDEX idx_name ON Customers(last_name, first_name);
-- Viewing query execution plan (check if indexes are used)
EXPLAIN SELECT * FROM Books WHERE author_id = 2;
-- (Syntax varies: EXPLAIN in MySQL, EXPLAIN ANALYZE in PostgreSQL)SQL Best Practices for Beginners
Always use aliases for complex queries to improve readability
Format your code with consistent indentation and capitalization
**Never use SELECT *** in production; specify only needed columns
Use parameterized queries in applications to prevent SQL injection
Test destructive commands (UPDATE/DELETE) with SELECT first
Comment your code using -- for single lines or /* */ for blocks
Normalize your database to reduce redundancy (3NF is usually sufficient)
Backup before major changes—always
Sample Commented Query
/*
Query: Monthly Sales Report
Author: Database Team
Date: 2024
Description: Calculates revenue by month for completed orders
*/
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month, -- MySQL date formatting
COUNT(*) AS order_count,
SUM(total_amount) AS revenue,
AVG(total_amount) AS avg_order_value
FROM Orders
WHERE status = 'Completed' -- Only count finished orders
AND order_date >= '2024-01-01' -- Current year only
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;Conclusion
SQL remains the cornerstone of data management four decades after its creation because it strikes the perfect balance between simplicity and power. From basic data retrieval to complex analytical queries, the concepts covered in this guide—DDL for structure, DML for manipulation, JOINs for relationships, and window functions for analytics—provide the foundation for any data-related career.
Remember that SQL is a declarative language: you tell the database what you want, not how to get it. This abstraction allows you to focus on business logic while the database engine optimizes execution.
Next Steps:
Practice with real datasets (GitHub has excellent sample databases)
Learn your specific RDBMS's extensions (PL/pgSQL for PostgreSQL, T-SQL for SQL Server)
Explore ORMs (Object-Relational Mappers) that generate SQL programmatically
Study database design principles and normalization forms
The journey from SQL basics to mastery is a marathon, not a sprint. Start with simple SELECT statements, gradually incorporate JOINs and subqueries, and before you know it, you'll be optimizing queries and designing schemas like a seasoned database administrator.