Definition: A database management system (DBMS) that uses a document-oriented data model. MongoDB is designed for high availability, easy scalability, and high performance, and it supports dynamic schema design.
Historically, MongoDB did not support multi-document ACID transactions (multi-document updates that can be rolled back and are ACID-compliant). However, MongoDB does provide atomic operations on a single document. MongoDB 4.0 added support for multi-document transactions, combining the speed and flexibility of the document model with ACID data integrity guarantees.
Normalization: dividing data into multiple collections with references between them. Denormalization: embedding all of the data in a single document.
Normalization gives an update-efficient data representation; denormalization makes reads efficient.
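A toy sketch of the trade-off, using plain Python dicts in place of collections (the user/address data and helper names are made up for illustration, not a MongoDB API):

```python
# Denormalized: the address is embedded, so one read returns everything.
user_embedded = {
    "_id": 1,
    "name": "Ada",
    "address": {"city": "London", "zip": "EC1"},
}

# Normalized: the address lives in its own "collection" and is referenced by id.
users = {1: {"_id": 1, "name": "Ada", "address_id": 10}}
addresses = {10: {"_id": 10, "city": "London", "zip": "EC1"}}

def read_user_normalized(user_id):
    """Reading needs an application-level join: two lookups instead of one."""
    user = dict(users[user_id])
    user["address"] = addresses[user["address_id"]]
    return user

def update_city_normalized(address_id, city):
    """Updating touches one small document, however many users reference it."""
    addresses[address_id]["city"] = city
```

With the embedded model the same city update would have to touch every user document that embeds that address.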
Use embedded (denorm) when:
Use normalized data models:
Avoid normalization when it forces lookups at read time: $lookup is slow, especially on sharded collections.
Cardinality: one-to-one, one-to-many, or many-to-many.
Namespace: The concatenation of the collection name and database name => instasent-admin.sms
Documents: Data is stored in BSON documents. Documents that tend to share a similar structure are organized as collections. Advantages of documents:
- Documents correspond to native data types in many programming languages.
- Embedded documents and arrays reduce the need for expensive joins.
Creating a schema:
- Combine objects into one document if you use them together; otherwise, keep them separate.
- Do joins on write, not on read.
- Optimize your schema for the most frequent use cases.
- Do complex aggregation in the schema.
Profiler: MongoDB includes a database profiler which shows performance characteristics of each operation against the database. With this profiler you can find queries (and write operations) which are slower than they should be and use this information for determining when an index is needed.
ObjectId: a 12-byte value consisting of:
- a 4-byte value representing the seconds since the Unix epoch,
- a 5-byte random value, and
- a 3-byte counter, starting with a random value.
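The byte layout above can be decoded by hand; a small sketch (the hex id below is a made-up example, not a real ObjectId):

```python
def parse_object_id(hex_id):
    """Split a 12-byte ObjectId (given as a 24-char hex string) into its parts."""
    raw = bytes.fromhex(hex_id)
    assert len(raw) == 12
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),  # seconds since Unix epoch
        "random": raw[4:9].hex(),                      # 5-byte random value
        "counter": int.from_bytes(raw[9:12], "big"),   # 3-byte counter
    }

parts = parse_object_id("65f1c2a3" + "0102030405" + "000001")
```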
Covered query: a query that can be satisfied entirely from an index, which happens when:
- the fields used in the query are part of an index, and
- the fields returned in the results are in the same index.
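A toy model of why covering works, not MongoDB's implementation: if every queried and returned field already lives in the index entries, results come straight from the index and no document is fetched (index contents below are invented):

```python
# Entries of a hypothetical compound index on (status, qty): (key, document id).
index = [
    ({"status": "A", "qty": 5}, "doc1"),
    ({"status": "B", "qty": 7}, "doc2"),
]

def covered_find(status):
    """Answer the query from index entries alone; no document fetch needed."""
    return [key for key, _doc_id in index if key["status"] == status]
```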
Transaction: a logical, atomic unit of work that contains one or more statements. MongoDB (prior to 4.0) does not use traditional locking or complex transactions with rollback, as it is designed to be lightweight, fast, and predictable in its performance. Keeping transaction support simple enhances performance, especially in a system that may run across many servers.
Why are data files so large?: MongoDB does aggressive preallocation of reserved space to avoid file system fragmentation.
How does MongoDB provide consistency? MongoDB uses reader-writer locks: concurrent readers share access to a resource such as a database or a collection, while each write gets exclusive access.
Dot notation: used to access the elements of an array and the fields of an embedded document.
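A minimal sketch of how a dot-notation path resolves against a document, assuming dicts for embedded documents and lists for arrays (the sample document is made up):

```python
def get_path(doc, path):
    """Resolve a MongoDB-style dot-notation path like "address.city" or "tags.1"."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # a numeric part indexes into an array
        else:
            current = current[part]       # otherwise it names an embedded field
    return current

doc = {"name": "Ada", "tags": ["db", "nosql"], "address": {"city": "London"}}
```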
Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. MongoDB automatically creates a unique index on the _id field. Index properties:
Indexing an array: an array field can be indexed in MongoDB (a multikey index). MongoDB indexes each value of the array, so you can query for individual items.
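The multikey idea, as a toy in-memory structure (the documents and field name are invented, and a real index is an ordered tree, not a dict):

```python
# Each array element gets its own index entry pointing back to the document id.
docs = {
    1: {"tags": ["red", "blue"]},
    2: {"tags": ["blue", "green"]},
}

multikey = {}
for doc_id, doc in docs.items():
    for value in doc["tags"]:
        multikey.setdefault(value, set()).add(doc_id)
```

Querying for a single element ("blue") now finds every document whose array contains it.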
Aggregations are operations that process data records and return computed results. Types:
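The "process records, return computed results" idea can be sketched as a toy pipeline with stages resembling $match and $group/$sum (the order data and helper names are made up, not MongoDB's API):

```python
orders = [
    {"status": "A", "amount": 10},
    {"status": "A", "amount": 5},
    {"status": "B", "amount": 7},
]

def match(docs, predicate):
    """Filter stage: keep only documents satisfying the predicate ($match)."""
    return [d for d in docs if predicate(d)]

def group_sum(docs, key, field):
    """Group stage: total one field per distinct key ($group with $sum)."""
    totals = {}
    for d in docs:
        totals[d[key]] = totals.get(d[key], 0) + d[field]
    return totals

result = group_sum(match(orders, lambda d: d["amount"] > 0), "status", "amount")
```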
The procedure of splitting the data set into discrete parts. By putting a subset of the data on each machine, it becomes possible to store more data and handle more load without requiring larger or more powerful machines, just a larger quantity of less powerful machines.
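One common way to pick which machine holds a document is hashing its shard key; a sketch under invented names (three shards, SHA-256 as the hash):

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(key):
    """Route a document to a shard deterministically by hashing its shard key."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Because the hash is deterministic, every read and write for the same key lands on the same shard, while different keys spread across the cluster.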
Database systems with large data sets:
To address these issues of scale, database systems have two basic approaches:
Sharding or Horizontal Scaling
Shards are used to store the data.
Replication: the process of duplicating the data set. Replication provides redundancy and increases data availability. With multiple copies of the data on different database servers, replication protects a database from the loss of a single server.
Replica set: a group of servers with one primary and multiple secondaries, servers that keep copies of the primary’s data. If the primary crashes, the secondaries can elect a new primary from amongst themselves.
Oplog: a capped collection in which MongoDB keeps a rolling record of all operations that modify the data stored in your databases.
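The "capped" behaviour is a fixed-size rolling window: once full, the oldest entries are discarded as new ones arrive. A sketch with a bounded deque standing in for the capped collection (cap of 3 and the operation strings are made up):

```python
from collections import deque

oplog = deque(maxlen=3)  # capped at 3 entries; oldest is evicted when full
for op in ["insert a", "insert b", "update a", "delete b"]:
    oplog.append(op)
# "insert a" has rolled off; only the 3 most recent operations remain.
```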
Journal: a feature of the underlying storage engine. Without a journal, if mongod exits unexpectedly, you must assume your data is in an inconsistent state. With journaling enabled, if mongod stops unexpectedly, the program can recover everything written to the journal, and the data remains in a consistent state.
GridFS: a mechanism for storing large binary files in MongoDB. GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB, such as images, video, and audio, and it has no issues with storing large numbers of files. By default it uses two collections, fs.files and fs.chunks, to store the file’s metadata and the chunks.
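The chunking idea can be sketched as splitting a byte string into numbered pieces, one per fs.chunks document (255 KB is GridFS's default chunk size; the field names here are illustrative):

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into numbered chunks, GridFS-style."""
    return [
        {"n": i, "data": data[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(data), chunk_size))
    ]

# A file just over two chunks long produces three chunk documents.
chunks = split_into_chunks(b"x" * (CHUNK_SIZE * 2 + 10))
```

Reassembly is the reverse: fetch the chunks ordered by `n` and concatenate their data.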
Rollback: can fail if there is more than 300 MB of data or about 30 minutes of operations to roll back. In these cases, you must re-sync the node that is stuck in rollback.