How does a document store work?

A database is needed to organize information in a meaningful way. There are, however, several different approaches to building one. In electronic data processing (EDP), relational databases are especially common and widely used. There is also the document-oriented database, which is built on a simple structure in which information is stored in documents rather than in linked tables. How does this kind of database work, and what are its benefits?

What is a Document Store?

Document-oriented databases, also known as document stores, are used to manage semi-structured data. This is data that does not follow a fixed schema but carries its structure within itself. Markers within the semi-structured data (for example tags or field names) still allow the information to be ordered. Because it lacks a fixed structure, such data is a poor fit for relational databases: the information cannot be neatly sorted into tables.

A document-oriented database is built from simple pairs: each key is assigned to a specific document. The document, which can be formatted in XML, JSON, or YAML, for example, holds the actual information. Because the database does not require a specific schema, different types of documents can be stored together in the same document store. Changes to a document's structure do not need to be declared to the database.
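The key-and-document pairing can be illustrated with a minimal sketch. It assumes a plain Python dictionary standing in for the store and JSON as the document format; the keys (`customer:1`, `invoice:7`) and the field names are made up for illustration and do not come from any particular product.

```python
import json

# Hypothetical in-memory "store": each key maps to one JSON document (as text).
store = {}

# Two documents with entirely different structures live side by side,
# because nothing forces them to share a schema.
store["customer:1"] = json.dumps({"name": "Ada", "email": "ada@example.com"})
store["invoice:7"] = json.dumps({"total": 120.5, "items": ["cable", "router"]})

# Reading a document means looking up its key and parsing the content.
customer = json.loads(store["customer:1"])
invoice = json.loads(store["invoice:7"])

print(customer["name"])
print(invoice["items"])
```

A real document store adds persistence, indexing, and concurrency control on top of this idea, but the basic unit remains one key pointing at one self-describing document.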

How does a document-oriented database work?

In theory, a document-oriented database can hold data in different formats and without a consistent schema. In practice, however, people usually settle on one file format for their documents and arrange the information in a fixed structure. This makes it easier to work with the information and the database; a consistent structure lets queries to the database be processed more effectively, for example. In general, you can perform the same actions with a document-oriented database as with a relational system: you can insert, modify, delete, and query information.

So that these actions can be carried out, each document is given a unique ID. What the ID looks like is largely unimportant: both simple strings and full paths can be used to address documents. When information is searched for, the document itself is examined: instead of looking through the matching columns of a table, the database reads the data directly from the document.
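The four basic actions and the role of the ID can be sketched roughly as follows. The `DocumentStore` class and its method names are hypothetical and exist only for this illustration; real systems offer far richer query languages and use indexes instead of scanning every document.

```python
class DocumentStore:
    """Hypothetical toy store: documents are dicts addressed by a unique ID."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # The ID must be unique; its form (plain string, path, ...) does not matter.
        if doc_id in self._docs:
            raise KeyError(f"duplicate id: {doc_id}")
        self._docs[doc_id] = dict(document)

    def get(self, doc_id):
        return self._docs[doc_id]

    def update(self, doc_id, changes):
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        del self._docs[doc_id]

    def find(self, **criteria):
        # A query inspects the content of each document directly.
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


store = DocumentStore()
store.insert("users/ada", {"name": "Ada", "city": "London"})  # path-like ID
store.insert("42", {"name": "Linus", "city": "Helsinki"})     # simple string ID

store.update("42", {"city": "Portland"})
matches = store.find(city="London")
store.delete("users/ada")
```

Here `find` walks over all documents for simplicity; the point is only that the search criteria are checked against the documents themselves, not against predefined table columns.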

What are the pros and cons of a document oriented database?

In a classic relational database, every record must have a field for every piece of information. If a value is not available, the cell stays empty, but it must still exist. Document-oriented databases are much more flexible: the structure of individual documents does not have to be consistent. Even large amounts of unstructured data can be stored in the database.

Adding new information is also easier. In a relational database, a new attribute has to be added to all records, whereas in a document store it is enough to add it to just a few. The additional content can be added to other documents as well, but it does not have to be.
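The difference can be made concrete with a small comparison. This sketch assumes Python's built-in `sqlite3` module as the relational side and a plain dictionary as a stand-in for a document store; the table, the field names, and the sample values are invented for the example.

```python
import sqlite3

# Relational side: a new attribute becomes a column that exists for every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [("Ada",), ("Linus",)])
conn.execute("ALTER TABLE people ADD COLUMN phone TEXT")
conn.execute("UPDATE people SET phone = ? WHERE name = ?", ("555-0100", "Ada"))
rows = conn.execute("SELECT name, phone FROM people ORDER BY id").fetchall()
conn.close()

# Document side (dict as a stand-in store): only the affected document changes.
documents = {
    "1": {"name": "Ada"},
    "2": {"name": "Linus"},
}
documents["1"]["phone"] = "555-0100"

print(rows)       # every row carries a phone cell, empty (None) where unknown
print(documents)  # the second document simply has no "phone" field
```

In the relational table, the record without a phone number still carries an empty cell, while the second document is left untouched.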

In addition, the information in a document store is not spread across several linked tables. Everything is in one place, which can result in better performance. However, document-oriented databases can only exploit this speed advantage as long as you do not try to impose relational elements on them: references do not really fit the concept of a document store. If you nevertheless try to link documents to one another, the system quickly becomes complex and unwieldy. For highly interconnected data sets, a relational database system is therefore preferable.
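The trade-off between keeping everything in one document and linking documents can be sketched as follows. The dictionaries stand in for a store, and all keys, field names, and the `resolve_customer` helper are hypothetical.

```python
# Embedded: everything about the order lives in a single document,
# so one lookup returns all of it.
orders_embedded = {
    "order:1": {
        "customer": {"name": "Ada", "city": "London"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    }
}
city = orders_embedded["order:1"]["customer"]["city"]

# Referenced: the order only stores the customer's ID, so the application
# has to follow the link itself with a second lookup.
customers = {"customer:1": {"name": "Ada", "city": "London"}}
orders_referenced = {
    "order:1": {"customer_id": "customer:1", "items": [{"sku": "A1", "qty": 2}]}
}


def resolve_customer(order_id):
    """Follow the reference by hand; the store does not do this for us here."""
    order = orders_referenced[order_id]
    return customers[order["customer_id"]]


city_via_reference = resolve_customer("order:1")["city"]
```

With one reference the extra step is small, but every additional link adds another lookup and more application logic, which is where a relational system with joins tends to be the better fit.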

The most popular document-oriented databases

Document databases are particularly important for web application development. Because of the high demand from web development, many database management systems (DBMS) are now available on the market. The following list presents some of the best known:

  • BaseX: An open source project that is written in Java and works with XML. BaseX comes with a graphical user interface.
  • CouchDB: Open source software released by the Apache Software Foundation. The database management system is written in Erlang, uses JavaScript, and has been used by Ubuntu and by Facebook applications, among others.
  • Elasticsearch: This search engine works on the basis of a document-oriented database and uses JSON documents.
  • eXist: The open source DBMS eXist runs on the Java Virtual Machine and can therefore be used independently of the operating system. It works with XML documents.
  • MongoDB: One of the most widely used NoSQL databases. The software is written in C++ and uses JSON-like documents.
  • SimpleDB: Amazon developed SimpleDB (written in Erlang) as a DBMS for its own cloud services. Use of the service is subject to a fee.