Database Concepts and Threats

RDBMS Architecture

Tables (relations): A table comprises multiple attributes (fields). Each attribute corresponds to a column in the table.

Rows (records/tuples): A single data record in a table. Each row represents one specific item; rows hold different values but share the same structure.

Columns (fields/attributes): A column contains a set of data values of a particular type. It holds one value for each row of the table (e.g., firstname, lastname, job).

Firstname | Lastname | Job
John | Doe | IT

Row: John, Doe, IT
Column: Firstname
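The row/column distinction can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `people` and the in-memory database are assumptions, not part of the notes above.

```python
import sqlite3

# In-memory database; the table name "people" is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT, lastname TEXT, job TEXT)")
conn.execute("INSERT INTO people VALUES ('John', 'Doe', 'IT')")

# A row (record/tuple): every attribute of one item.
row = conn.execute("SELECT firstname, lastname, job FROM people").fetchone()

# A column (field/attribute): one attribute across all rows.
firstnames = [r[0] for r in conn.execute("SELECT firstname FROM people")]

print(row)         # ('John', 'Doe', 'IT')
print(firstnames)  # ['John']
```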

RDBMS Threats

Inference attacks and aggregation attacks are both techniques that can compromise user privacy, especially in the context of databases or machine learning models. However, they target different aspects and work in distinct ways.

Inference Attack

An inference attack happens when an attacker is able to deduce sensitive information by observing non-sensitive data.

Setting: We're outside Ehra-Lessien, a high-speed test track used primarily by the Volkswagen Group (which includes Bugatti), and we notice a single object: a car under a tarp.

Observations:

  1. Bugatti Uniforms: A group of people are seen wearing Bugatti uniforms. This already establishes the presence and interest of Bugatti officials or engineers at this location on this particular day.

  2. Car Under Tarp: There's a car under a tarp, indicating something is being concealed. The very act of concealment in such an environment suggests something new or yet-to-be-revealed.

  3. Distinctive W16 Sound: There are sounds that hint at a W16 engine, which is characteristic of Bugatti. However, the sound signature doesn't match known vehicles like the Veyron or Chiron.

Inference: Based on the collective evidence, it's reasonable to infer that Bugatti might be testing a new vehicle model or variant equipped with a W16 engine. The presence of personnel in Bugatti uniforms, combined with the concealed car and the distinctive yet unfamiliar engine sound, makes a strong case for this deduction.
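In a database setting, the same kind of deduction might look like the sketch below. It assumes a hypothetical `staff` table in which individual salaries are sensitive, while a low-privilege user may still query the total payroll. Comparing the total before and after a known new hire reveals that person's salary. All names and figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Alice", 70000), ("Bob", 65000)],  # hypothetical data
)

def total_payroll(conn):
    # Assumption of this sketch: this aggregate is the only query
    # the low-privilege user is allowed to run.
    return conn.execute("SELECT SUM(salary) FROM staff").fetchone()[0]

before = total_payroll(conn)
conn.execute("INSERT INTO staff VALUES ('Carol', 82000)")  # a new hire joins
after = total_payroll(conn)

# Neither total is sensitive on its own, but their difference is.
inferred_salary = after - before
print(inferred_salary)  # 82000
```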

Aggregation Attack

An aggregation attack is when an attacker combines non-sensitive data from various sources to reveal sensitive information that was not apparent from the individual pieces of data alone.

For a contrasting example in a similar vein:

You're a car enthusiast who loves tracking new car launches and rumors. You visit various car forums and online magazines and piece together the following bits of seemingly unrelated information:

  1. A car journalist subtly hints at having test-driven a new high-performance vehicle but doesn’t specify the brand.
  2. A Bugatti supplier mentions increased orders for certain high-performance parts around the same time.
  3. Some tourists near Ehra-Lessien report hearing an unusual engine sound, different from known Bugatti cars.
  4. A local hotel has bookings from Bugatti executives and renowned car journalists simultaneously.

By aggregating these diverse pieces of information, you surmise that Bugatti is probably set to reveal a new model soon, and certain journalists might already have had a sneak peek but are under embargo.


Two common defenses against these attacks are blurring and partitioning. While both mitigate inference risks, blurring alters specific data to make it less precise, whereas partitioning separates data into distinct segments and restricts access to each segment based on roles.
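A minimal sketch of the two defenses, assuming hypothetical records, a salary band width of 10,000, and two made-up roles (`helpdesk` and `payroll`). Real systems would enforce this inside the database rather than in application code.

```python
# Hypothetical records; field names and roles are assumptions.
records = [
    {"name": "John Doe", "department": "IT", "salary": 83250},
    {"name": "Jane Smith", "department": "Math", "salary": 91780},
]

def blur_salary(salary, step=10000):
    """Blurring: report only the band the salary falls into."""
    low = (salary // step) * step
    return f"{low}-{low + step - 1}"

# Partitioning: each role sees only its own segment of the columns.
VISIBLE_COLUMNS = {
    "helpdesk": {"name", "department"},
    "payroll": {"name", "department", "salary"},
}

def view_for(role, record):
    """Return only the fields the given role may see (nothing for unknown roles)."""
    allowed = VISIBLE_COLUMNS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}

print(blur_salary(records[0]["salary"]))  # 80000-89999
print(view_for("helpdesk", records[0]))   # {'name': 'John Doe', 'department': 'IT'}
```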

Other attacks:

  • SQL injection
  • TOC/TOU (time of check to time of use)
  • Backdoor
  • DoS (denial of service)

Candidate keys

A candidate key is a subset of attributes that uniquely identifies a record in a table. No two records in the same table will have identical values for all attributes forming a candidate key. This helps distinguish records that would otherwise collide, such as people with the same name.

Imagine a table storing details of students at a university. For identification, both the student's email address and the student ID number are unique.

Students Table:

Student_ID (CK) | Student_Email (CK) | Full_Name | Major
S001 | john.doe@example.com | John Doe | IT
S002 | jane.smith@example.com | Jane Smith | Math
S003 | bob.lee@example.com | Bob Lee | Physics

In this table, both Student_ID and Student_Email can be Candidate Keys (CK) because both are unique for each student.
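The uniqueness requirement can be checked against sample data with a small helper. This is only a sketch: the function name `is_unique_key` and the fourth student row are hypothetical, added to show a column that fails the test. Note that uniqueness in the current rows is necessary but not sufficient; whether an attribute set is truly a candidate key is a design decision about all data the table may ever hold.

```python
students = [
    {"Student_ID": "S001", "Student_Email": "john.doe@example.com", "Full_Name": "John Doe", "Major": "IT"},
    {"Student_ID": "S002", "Student_Email": "jane.smith@example.com", "Full_Name": "Jane Smith", "Major": "Math"},
    {"Student_ID": "S003", "Student_Email": "bob.lee@example.com", "Full_Name": "Bob Lee", "Major": "Physics"},
    # Hypothetical extra row (not in the table above) sharing a major with S001.
    {"Student_ID": "S004", "Student_Email": "amy.wong@example.com", "Full_Name": "Amy Wong", "Major": "IT"},
]

def is_unique_key(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

print(is_unique_key(students, ["Student_ID"]))     # True
print(is_unique_key(students, ["Student_Email"]))  # True
print(is_unique_key(students, ["Major"]))          # False
```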


Primary Key: A specific key chosen from the set of candidate keys to uniquely identify records in a table. Each table possesses only one primary key, determined by the database designer.

From the previous example, let's say the university chooses Student_ID as the preferred way to uniquely identify students because it follows a standardized format.

The primary key is the main way records in the table are identified. Its values must be unique for each record, and no primary key attribute may be null (empty).
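One way to express this choice is sketched below with Python's sqlite3 module. Student_ID is declared as the primary key and Student_Email, the remaining candidate key, as UNIQUE. The helper `try_insert` is hypothetical. NOT NULL is spelled out on the primary key because SQLite, for historical reasons, otherwise allows NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        Student_ID    TEXT PRIMARY KEY NOT NULL,  -- chosen primary key
        Student_Email TEXT NOT NULL UNIQUE,       -- remaining candidate key
        Full_Name     TEXT NOT NULL,
        Major         TEXT
    )
""")
conn.execute(
    "INSERT INTO Students VALUES ('S001', 'john.doe@example.com', 'John Doe', 'IT')"
)

def try_insert(values):
    """Attempt an insert and report whether the key constraints allowed it."""
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?)", values)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Duplicate primary key, null primary key, and duplicate email are all refused.
print(try_insert(("S001", "someone.else@example.com", "Someone Else", "Math")))
print(try_insert((None, "no.id@example.com", "No Id", "Math")))
print(try_insert(("S009", "john.doe@example.com", "Other John", "Math")))
# A record with fresh key values is accepted.
print(try_insert(("S002", "jane.smith@example.com", "Jane Smith", "Math")))
```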

Foreign Keys

A foreign key enforces the relationship between two tables through referential integrity. Every foreign key value in one table must correspond to an existing primary key value in the related table.

Let's assume two tables: Students and Courses.

Courses Table:

Course_ID (PK) | Course_Name
C001 | Computer Science
C002 | Mathematics
C003 | Physics

(PK) denotes the Primary Key for the Courses table, which is Course_ID.

Students Table:

Student_ID (PK) | Student_Name | Enrolled_Course_ID (FK)
S001 | John Doe | C001
S002 | Jane Smith | C002
S003 | Alice Brown | C001
S004 | Bob White | C003

In the Students table, the column Enrolled_Course_ID is a Foreign Key (FK). It references the Primary Key of the Courses table, establishing a connection between a student and the course they're enrolled in.

The relationship formed by this foreign key ensures that you cannot have a student enrolled in a course that doesn't exist in the Courses table. For instance, if you tried to insert a student enrolled in a Course_ID of C004, it wouldn't be permitted, as there is no course with the ID of C004 in the Courses table.
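The C004 case can be reproduced with a short sqlite3 sketch. The helper `enroll` and the student used for the failing insert are hypothetical. SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute(
    "CREATE TABLE Courses (Course_ID TEXT PRIMARY KEY NOT NULL, Course_Name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE Students (
        Student_ID         TEXT PRIMARY KEY NOT NULL,
        Student_Name       TEXT NOT NULL,
        Enrolled_Course_ID TEXT REFERENCES Courses(Course_ID)  -- foreign key
    )
""")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?)",
    [("C001", "Computer Science"), ("C002", "Mathematics"), ("C003", "Physics")],
)

def enroll(student_id, name, course_id):
    """Insert a student; return False if a constraint (such as the foreign key) rejects it."""
    try:
        conn.execute(
            "INSERT INTO Students VALUES (?, ?, ?)", (student_id, name, course_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(enroll("S001", "John Doe", "C001"))              # True: C001 exists in Courses
print(enroll("S005", "Hypothetical Student", "C004"))  # False: no course C004
```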


Types of Storage

Memory Types

Primary (Real) Memory

  • Most direct and fastest form of storage.
  • Directly accessible to the CPU.
  • Consists mostly of volatile RAM.

Virtual Memory

  • Simulates additional primary memory using secondary storage.
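  • Implemented through paging (swapping memory pages between RAM and disk), which can slow performance.
  • If low on RAM, the system moves less-used pages to a hard disk and loads them back on demand. The CPU cannot address the disk directly, so these accesses are much slower than RAM, but the system avoids crashing for lack of memory.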

Storage Access Methods

Random Access Storage

  • OS can request contents from any point in the media.
  • Examples: RAM and Hard Drives.

Sequential Access Storage

  • Requires reading through the media in order to reach a specific address; everything stored before that address must be passed first.
  • Unlike random access, it cannot jump directly to a location. Think of it as fast-forwarding a cassette tape.
  • Example: Magnetic tape.
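The difference between the two access methods can be sketched by counting how many records must be read to reach a target. This is an illustration only: the in-memory buffer stands in for the media, and the fixed record size and function names are assumptions.

```python
import io

RECORD_SIZE = 4  # bytes per fixed-size record (an assumption of this sketch)

# Simulated media holding records 0..999, each stored as a 4-byte integer.
media = io.BytesIO(b"".join(i.to_bytes(RECORD_SIZE, "big") for i in range(1000)))

def read_random(media, index):
    """Random access: jump straight to the record's offset."""
    media.seek(index * RECORD_SIZE)
    value = int.from_bytes(media.read(RECORD_SIZE), "big")
    return value, 1  # (value, number of records read)

def read_sequential(media, index):
    """Sequential access: rewind, then read every record up to the target."""
    media.seek(0)
    chunk = b""
    for _ in range(index + 1):
        chunk = media.read(RECORD_SIZE)
    return int.from_bytes(chunk, "big"), index + 1

print(read_random(media, 750))      # (750, 1)
print(read_sequential(media, 750))  # (750, 751)
```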

Storage Persistence

Volatile Storage

  • Loses its contents when power is removed, so a power outage risks data loss.
  • RAM is the most common example.

Non-Volatile Storage

  • Maintains its contents without power.
  • Examples: Magnetic/optical media and Non-Volatile RAM (NVRAM).

Miscellaneous Storage Types

Secondary Storage

  • Cheaper per unit of capacity than primary memory.
  • Non-volatile and intended for long-term use.
  • Examples: CD/DVD, HDD, SSD.

Virtual Storage

  • Simulates secondary storage using primary storage.
  • Common example: a RAM disk, which appears to the system as secondary storage but resides in volatile RAM. It gives applications very fast storage, but its contents cannot be recovered after power loss.
  • Useful for quick load-ins, like in eSports tournaments.