Database Concepts and Threats
RDBMS Architecture
Tables (relations): A table comprises multiple attributes, or fields. Each attribute corresponds to a column in the table.
Rows (records/tuples): A single data record in a table. Each row represents one specific item; rows hold different values but share the same structure.
Columns (fields/attributes): A column contains a set of data values of a particular type and holds one value for each row of the table (e.g., firstname, lastname, job).
Firstname | Lastname | Job |
---|---|---|
John | Doe | IT |
Row: John, Doe, IT. Column: Firstname.
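The row/column distinction can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table name `people` and the second record (Jane Smith, HR) are assumptions added so that the column has more than one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT, lastname TEXT, job TEXT)")
conn.execute("INSERT INTO people VALUES ('John', 'Doe', 'IT')")
conn.execute("INSERT INTO people VALUES ('Jane', 'Smith', 'HR')")  # assumed extra record

# A row: every attribute of one record.
row = conn.execute("SELECT * FROM people WHERE firstname = 'John'").fetchone()

# A column: one attribute across all records.
column = [r[0] for r in conn.execute("SELECT firstname FROM people ORDER BY rowid")]

print(row)     # ('John', 'Doe', 'IT')
print(column)  # ['John', 'Jane']
```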
RDBMS Threats
Inference attacks and aggregation attacks are both techniques that can compromise user privacy, especially in the context of databases or machine learning models. However, they target different aspects and work in distinct ways.
Inference Attack
An inference attack happens when an attacker is able to deduce sensitive information by observing non-sensitive data.
Setting: We're outside Ehra-Lessien, a high-speed test track used primarily by the Volkswagen Group (which includes Bugatti), and we notice a single object: a car under a tarp.
Observations:
- Bugatti Uniforms: A group of people are seen wearing Bugatti uniforms. This already establishes the presence and interest of Bugatti officials or engineers at this location on this particular day.
- Car Under Tarp: There's a car under a tarp, indicating something is being concealed. The very act of concealment in such an environment suggests something new or yet-to-be-revealed.
- Distinctive W16 Sound: There are sounds that hint at a W16 engine, which is characteristic of Bugatti. However, the sound signature doesn't match known vehicles like the Veyron or Chiron.
Inference: Based on the collective evidence, it's reasonable to infer that Bugatti might be testing a new vehicle model or variant equipped with a W16 engine. The presence of personnel in Bugatti uniforms, combined with the concealed car and the distinctive yet unfamiliar engine sound, makes a strong case for this deduction.
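In a database setting, the same kind of deduction often works through aggregate queries. The sketch below assumes a hypothetical `payroll` table and a low-privilege user who may only run a SUM query; the names and salary figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payroll (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO payroll VALUES (?, ?)",
    [("alice", 70000), ("bob", 65000)],  # hypothetical data
)

def total_payroll(conn):
    """The only query the low-privilege user may run (assumption)."""
    return conn.execute("SELECT SUM(salary) FROM payroll").fetchone()[0]

before = total_payroll(conn)
conn.execute("INSERT INTO payroll VALUES (?, ?)", ("carol", 82000))  # a new hire
after = total_payroll(conn)

# The individual salary was never queried, yet it falls out of two totals.
inferred_salary = after - before
print(inferred_salary)  # 82000
```

Neither total is sensitive by itself; the sensitive value appears only when the two observations are compared.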
Aggregation Attack
An aggregation attack is when an attacker combines non-sensitive data from various sources to reveal sensitive information that was not apparent from the individual pieces of data alone.
For a contrasting example in a similar vein:
You're a car enthusiast who loves tracking new car launches and rumors. You visit various car forums and online magazines and piece together the following bits of seemingly unrelated information:
- A car journalist subtly hints at having test-driven a new high-performance vehicle but doesn’t specify the brand.
- A Bugatti supplier mentions increased orders for certain high-performance parts around the same time.
- Some tourists near Ehra-Lessien report hearing an unusual engine sound, different from known Bugatti cars.
- A local hotel has bookings from Bugatti executives and renowned car journalists simultaneously.
By aggregating these diverse pieces of information, you surmise that Bugatti is probably set to reveal a new model soon, and certain journalists might already have had a sneak peek but are under embargo.
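A database version of aggregation can be sketched as follows. The `part_orders` table, its supplier names, and the unit counts are all hypothetical; the point is only that each row is harmless while the combined total may not be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_orders (supplier TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO part_orders VALUES (?, ?)",
    [("north", 1200), ("south", 800), ("east", 1500)],  # hypothetical data
)

# Each row on its own says little about overall plans.
rows = conn.execute("SELECT supplier, units FROM part_orders").fetchall()

# Combining every low-sensitivity row reveals the overall order volume.
total_units = sum(units for _, units in rows)
print(total_units)  # 3500
```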
Two common defenses against inference are blurring and partitioning. Blurring alters specific data to make it less precise, whereas partitioning separates data into distinct segments and restricts access to each segment based on roles.
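A minimal sketch of both defenses, in plain Python. The bucket size, the role names, and the field lists are assumptions chosen for the example, not a prescribed scheme.

```python
def blur_salary(salary, bucket=10_000):
    """Blurring: report only the range a value falls in, not the exact figure."""
    low = (salary // bucket) * bucket
    return f"{low}-{low + bucket - 1}"

# Partitioning: each role sees only its own segment of a record (hypothetical roles).
VISIBLE_FIELDS = {
    "hr": {"name", "salary"},
    "helpdesk": {"name", "department"},
}

def view_for(role, record):
    """Return only the fields the given role is allowed to read."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}

record = {"name": "alice", "department": "IT", "salary": 82_000}
print(blur_salary(record["salary"]))  # 80000-89999
print(view_for("helpdesk", record))   # {'name': 'alice', 'department': 'IT'}
```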
Other attacks:
- SQL injection
- TOC/TOU (time of check to time of use)
- Backdoor
- DoS (denial of service)
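Of the attacks listed, SQL injection is the easiest to demonstrate in a few lines. The sketch below uses a hypothetical `users` table and compares string concatenation with parameter binding in Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

malicious = "nobody' OR '1'='1"  # attacker-controlled input

# Vulnerable: the input becomes part of the SQL text, so the OR clause runs.
unsafe_sql = "SELECT username FROM users WHERE username = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (malicious,)
).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
```

The concatenated query returns every user because the injected condition is always true; the bound query looks for a user literally named `nobody' OR '1'='1` and finds none.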
Candidate keys
A candidate key is a subset of attributes that uniquely identifies a record in a table. No two records in the same table will have identical values for all attributes forming a candidate key. This helps distinguish records that would otherwise collide, such as people with similar names.
Imagine a table storing details of students at a university. For identification, both the student's email address and the student ID number are unique.
Students table:
Student_ID (CK) | Student_Email (CK) | Full_Name | Major |
---|---|---|---|
S001 | john.doe@example.com | John Doe | IT |
S002 | jane.smith@example.com | Jane Smith | Math |
S003 | bob.lee@example.com | Bob Lee | Physics |
In this table, both Student_ID and Student_Email can be Candidate Keys (CK) because both are unique for each student.
Primary Key: A specific key chosen from the set of candidate keys to uniquely identify records in a table. Each table possesses only one primary key, determined by the database designer.
From the previous example, let's say the university chooses Student_ID as the preferred way to uniquely identify students because it follows a standardized format.
Each table in a database will typically have one, and only one, primary key. This is the main way records in the table are identified. The primary key's values must be unique for each record, and a record cannot have a null (empty) value for its primary key attributes.
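As a rough sketch, the Students example could be declared in SQLite as below, with Student_ID as the primary key and Student_Email kept unique as the remaining candidate key. The helper `try_insert` and the extra rows are hypothetical; `NOT NULL` is spelled out because SQLite does not otherwise reject nulls in a TEXT primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id    TEXT PRIMARY KEY NOT NULL,  -- chosen primary key
        student_email TEXT NOT NULL UNIQUE,       -- other candidate key
        full_name     TEXT,
        major         TEXT
    )
""")
conn.execute(
    "INSERT INTO students VALUES ('S001', 'john.doe@example.com', 'John Doe', 'IT')"
)

def try_insert(values):
    """Hypothetical helper: report whether the database accepts the row."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?, ?, ?)", values)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

dup_id = try_insert(("S001", "other@example.com", "Other Person", "Math"))
dup_email = try_insert(("S009", "john.doe@example.com", "Other Person", "Math"))
new_row = try_insert(("S002", "jane.smith@example.com", "Jane Smith", "Math"))
print(dup_id, dup_email, new_row)  # rejected rejected ok
```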
Foreign Keys
Foreign keys enforce the relationship between two tables through referential integrity: if a table contains a foreign key value, that value must correspond to an existing primary key in the related table.
Let's assume two tables: Students and Courses.
Courses table:
Course_ID (PK) | Course_Name |
---|---|
C001 | Computer Science |
C002 | Mathematics |
C003 | Physics |
(PK) denotes the Primary Key for the Courses table, which is Course_ID.
Students table:
Student_ID (PK) | Student_Name | Enrolled_Course_ID (FK) |
---|---|---|
S001 | John Doe | C001 |
S002 | Jane Smith | C002 |
S003 | Alice Brown | C001 |
S004 | Bob White | C003 |
In the Students table, the column Enrolled_Course_ID is a Foreign Key (FK). It references the Primary Key of the Courses table, establishing a connection between a student and the course they're enrolled in.
The relationship formed by this foreign key ensures that you cannot have a student enrolled in a course that doesn't exist in the Courses table. For instance, if you tried to insert a student enrolled in a Course_ID of C004, it wouldn't be permitted, as there is no course with the ID of C004 in the Courses table.
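The rejected C004 insert can be reproduced with a small SQLite sketch. The student ID `S005` and the name "New Student" are made up for the failing row. SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection; other database engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE courses (course_id TEXT PRIMARY KEY, course_name TEXT)")
conn.execute("""
    CREATE TABLE students (
        student_id         TEXT PRIMARY KEY,
        student_name       TEXT,
        enrolled_course_id TEXT REFERENCES courses(course_id)
    )
""")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [("C001", "Computer Science"), ("C002", "Mathematics"), ("C003", "Physics")],
)
conn.execute("INSERT INTO students VALUES ('S001', 'John Doe', 'C001')")  # valid course

try:
    # C004 does not exist in courses, so referential integrity blocks the row.
    conn.execute("INSERT INTO students VALUES ('S005', 'New Student', 'C004')")
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "rejected"

print(outcome)  # rejected
```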
Types of Storage
Memory Types
Primary (Real) Memory
- Most direct form of storage, directly accessible to the CPU.
- Consists mostly of volatile RAM.
- Fastest storage available.
Virtual Memory
- Simulates additional primary memory using secondary storage.
- The mechanism is often referred to as "paging", and it can slow performance.
- If RAM runs low, the system moves pages of memory out to a hard disk and reads them back when they are needed. This results in slower performance but avoids crashes.
Storage Access Methods
Random Access Storage
- OS can request contents from any point in the media.
- Examples: RAM and Hard Drives.
Sequential Access Storage
- Requires reading through the media from the start to reach a specific address.
- Think of it as fast-forwarding a cassette tape to reach a particular song.
- Example: Magnetic tape.
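The difference between the two access methods can be simulated in a few lines. Here an in-memory `io.BytesIO` buffer stands in for the storage medium, and the two helper functions are hypothetical; real devices are far more complex, so treat this purely as a model of the idea.

```python
import io

media = io.BytesIO(bytes(range(100)))  # stand-in for a storage medium

def random_read(media, address):
    """Random access: jump straight to the requested address."""
    media.seek(address)
    return media.read(1)[0]

def sequential_read(media, address):
    """Sequential access: rewind, then read every position up to the address."""
    media.seek(0)
    for _ in range(address + 1):
        value = media.read(1)[0]
    return value, address + 1  # value found, number of positions read

print(random_read(media, 42))      # 42
print(sequential_read(media, 42))  # (42, 43)
```

Both calls return the same byte, but the sequential version has to pass over 43 positions to get there.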
Storage Persistence
Volatile Storage
- Loses its contents when power is removed, so a power outage risks data loss.
- RAM is the most common example.
Non-Volatile Storage
- Maintains its contents without power.
- Examples: Magnetic/optical media and Non-Volatile RAM (NVRAM).
Miscellaneous Storage Types
Secondary Storage
- Cheaper than primary memory.
- Non-volatile and intended for long-term use.
- Examples: CD/DVD, HDD, SSD.
Virtual Storage
- Simulates secondary storage using primary storage.
- Common example: a RAM disk, which appears to the system as secondary storage but lives in volatile RAM. It gives applications very fast storage, but its contents cannot be recovered after power loss.
- Useful for quick load-ins, like in eSports tournaments.