Table of Contents

MongoDB

Intro

Definition: A database management system (DBMS) that uses a document-oriented data model. It is designed for high availability, easy scalability, and high performance, and it supports dynamic schema design.

Key features:

ACID

Historically, MongoDB did not support multi-document ACID transactions (multi-document updates that can be rolled back and are ACID-compliant), although it has always provided atomic operations on a single document. MongoDB 4.0 added support for multi-document transactions, combining the speed and flexibility of the document model with ACID data-integrity guarantees.

Normalization vs. Denormalization

Normalization: Dividing up data into multiple collections with references between collections. Denormalization: embedding all of the data in a single document.

Normalization gives an update-efficient data representation. Denormalization makes reads efficient.

Use embedded (denorm) when:

Use normalized data models:

Avoid normalization when it forces you to do lookups, since lookups are slow, especially on sharded collections.
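The trade-off above can be sketched with plain Python dicts (no MongoDB driver; the collection and field names are hypothetical). The embedded form answers "a user and their orders" with a single document read; the normalized form needs a join-like pass over a second collection:

```python
# Embedded (denormalized): everything in one document -- one read fetches all.
user_embedded = {
    "_id": "u1",
    "name": "Alice",
    "orders": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# Normalized: orders live in their own collection and reference the user.
user_normalized = {"_id": "u1", "name": "Alice"}
orders = [
    {"_id": "o1", "user_id": "u1", "sku": "A-100", "qty": 2},
    {"_id": "o2", "user_id": "u1", "sku": "B-200", "qty": 1},
]

# Reading the normalized form requires an application-side "join".
joined = [o for o in orders if o["user_id"] == user_normalized["_id"]]
```

Updating a single order is cheaper in the normalized form (one small document changes); reading the whole user is cheaper in the embedded form.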

Miscellanea

Cardinality: one-to-one, one-to-many, or many-to-many.

Namespace: The concatenation of the database name and the collection name => instasent-admin.sms

Documents: Data is stored in BSON documents. Documents that tend to share a similar structure are organized as collections. Advantages of documents:

-   Documents correspond to native data types in many programming languages.
-   Embedded documents and arrays reduce need for expensive joins.

Creating a schema:

-   Combine objects into one document if you use them together. Otherwise, separate them
-   Do joins on write, not on read
-   Optimize your schema for the most frequent use cases
-   Do complex aggregations in the schema

Profiler: MongoDB includes a database profiler which shows performance characteristics of each operation against the database. With this profiler you can find queries (and write operations) which are slower than they should be and use this information for determining when an index is needed.

ObjectID:

-   a 4-byte value representing the seconds since the Unix epoch,
-   a 5-byte random value, and
-   a 3-byte counter, starting with a random value.
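The 4/5/3-byte layout above can be decoded with the standard library alone; this is a minimal sketch that splits a 24-hex-character ObjectId string into its three parts (the sample id is made up):

```python
import datetime

def parse_object_id(oid_hex: str) -> dict:
    """Split a 12-byte ObjectId into timestamp, random value and counter."""
    raw = bytes.fromhex(oid_hex)
    assert len(raw) == 12, "an ObjectId is always 12 bytes"
    return {
        # first 4 bytes: seconds since the Unix epoch, big-endian
        "timestamp": datetime.datetime.fromtimestamp(
            int.from_bytes(raw[0:4], "big"), datetime.timezone.utc),
        # next 5 bytes: per-process random value
        "random": raw[4:9].hex(),
        # last 3 bytes: incrementing counter, random initial value
        "counter": int.from_bytes(raw[9:12], "big"),
    }

parts = parse_object_id("5f4e7a3b2c1d0e0f10111213")
```

Because the timestamp comes first, sorting on _id roughly orders documents by creation time.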

Covered query:

-   the fields used in the query are part of an index, and
-   the fields returned in the results are in that same index
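A covered query can be answered from the index alone, without touching the documents. This toy Python check (not MongoDB internals; names are illustrative) captures the two conditions, plus the detail that _id must be excluded from the projection unless it is in the index:

```python
def is_covered(index_fields, query_fields, projected_fields):
    """True when every queried field and every returned field is in the
    index, and _id is not projected (it would force a document fetch)."""
    idx = set(index_fields)
    return (set(query_fields) <= idx
            and set(projected_fields) <= idx
            and "_id" not in projected_fields)

# Index on (status, user); query by status, return only user -> covered.
covered = is_covered(["status", "user"], ["status"], ["user"])
# email is not in the index -> the query is not covered.
not_covered = is_covered(["status"], ["status"], ["user", "email"])
```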

Transaction: A logical, atomic unit of work that contains one or more statements. MongoDB (prior to 4.0) does not use traditional locking or complex transactions with rollback, as it is designed to be lightweight, fast, and predictable in its performance. By keeping transaction support extremely simple, performance is enhanced, especially in a system that may run across many servers.

Why are data files so large?: MongoDB does aggressive preallocation of reserved space to avoid file system fragmentation.

How does MongoDB provide consistency? MongoDB uses reader-writer locks: many readers can share access to a resource, such as a database or a collection, while writes take exclusive access.

Dot notation: used to access the elements of an array and the fields of an embedded document.
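The path-resolution rule behind dot notation can be sketched in plain Python: split on ".", treat numeric parts as array indexes, and everything else as field names (the sample document is made up):

```python
def get_path(doc, path):
    """Resolve a MongoDB-style dotted path against nested dicts/lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric part: index into an array
        else:
            current = current[part]       # otherwise: field of a sub-document
    return current

doc = {"name": "Ada", "tags": ["db", "nosql"], "address": {"city": "Madrid"}}
get_path(doc, "address.city")  # "Madrid"  (embedded document field)
get_path(doc, "tags.1")        # "nosql"   (array element by position)
```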

Indexes

Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. MongoDB automatically creates a unique index on the _id field. Index properties:

Indexing an array: an array field can be indexed in MongoDB (a multikey index). MongoDB indexes each value of the array, so you can query for individual items.
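The "one index entry per array element" idea can be mimicked with a toy inverted index in Python (the documents and field names are hypothetical, and real multikey indexes are B-trees, not hash maps):

```python
from collections import defaultdict

docs = [
    {"_id": 1, "tags": ["red", "blue"]},
    {"_id": 2, "tags": ["blue", "green"]},
]

index = defaultdict(set)
for d in docs:
    for value in d["tags"]:   # each array element gets its own index entry
        index[value].add(d["_id"])

# Querying for a single item finds every document whose array contains it.
index["blue"]  # {1, 2}
```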

Aggregation

Aggregations are operations that process data records and return computed results. Types:

What are the disadvantages of MongoDB?

Tips

Sharding

The procedure of splitting the data set into discrete parts. By putting a subset of the data on each machine, it becomes possible to store more data and handle more load without requiring larger or more powerful machines, just a larger quantity of less-powerful machines.
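How a subset of data ends up on each machine can be sketched with hash-based routing, a simplified stand-in for MongoDB's hashed shard keys (the key format and shard count are made up):

```python
import hashlib

def shard_for(shard_key: str, n_shards: int) -> int:
    """Route a document to a shard by hashing its shard key.
    The same key always lands on the same shard; different keys spread out."""
    digest = hashlib.md5(shard_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Each document is placed on exactly one of 4 shards.
placements = {k: shard_for(k, 4) for k in ["user:1", "user:2", "user:3"]}
```

Real MongoDB routes by chunks of shard-key ranges via mongos, but the core property is the same: the shard key deterministically decides where a document lives.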

Database systems with large data sets:

To address these issues of scale, database systems have two basic approaches:


Replication

The process of duplicating the data-set. Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server.

Replica set: a group of servers with one primary and multiple secondaries, servers that keep copies of the primary’s data. If the primary crashes, the secondaries elect a new primary from among themselves.


Oplog: a capped collection that keeps a rolling record of all operations that modify the data stored in your databases.

Journal: a feature of the underlying storage engine. Without a journal, if mongod exits unexpectedly, you must assume your data is in an inconsistent state. With journaling enabled, if mongod stops unexpectedly, the program can recover everything written to the journal, and the data remains in a consistent state.

GridFs

A mechanism for storing large binary files in MongoDB. GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB. GridFS does not have issues with storing large numbers of files in the same directory. It is used for storing and retrieving large files such as images, video files, and audio files. By default, it uses two collections, fs.files and fs.chunks, to store the file’s metadata and its chunks.
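The fs.files/fs.chunks split can be sketched without a driver: GridFS stores the file body as ordered chunks (255 KB each by default), keyed back to the metadata document by files_id. A minimal stand-alone version of the chunking step:

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size

def to_chunks(files_id, data: bytes):
    """Split raw bytes into ordered chunk documents, GridFS-style.
    Each chunk records its parent file id and its sequence number n."""
    return [
        {"files_id": files_id, "n": i, "data": data[off:off + CHUNK_SIZE]}
        for i, off in enumerate(range(0, len(data), CHUNK_SIZE))
    ]

# A payload of two full chunks plus 100 trailing bytes -> 3 chunk documents.
chunks = to_chunks("f1", b"x" * (CHUNK_SIZE * 2 + 100))
```

Reading a file back means fetching the chunks by files_id, sorting on n, and concatenating the data fields.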

Rollback: rollback can fail if there is more than 300 MB of data or about 30 minutes of operations to roll back. In those cases, you must re-sync the node that is stuck in rollback.