More than 2.5 quintillion bytes of data are created every day, and it is only predicted to grow from there according to the last DOMO report. By the end of 2020, it is estimated that 1.7MB of data will be created every second for every person on Earth. Can you see the importance of the database?
Companies have invested literally billions of dollars in infrastructure to not only store but also search, update and delete this simply absurd amount of data.
So let’s understand what they are, what they’re for, and what the main types of databases we have at our disposal today, and find out what the ideal database is for your next project.
In the simplest of definitions, a database is just one way to store and organize information. Ideally, this information is stored in an orderly manner to assist in searches.
You can imagine a database as if it were a library of books, each book is organized according to some kind of category, either by genre, author, language in which it was written, etc.
And each library has a system of organization that depends on factors such as the size of the library, what kind of books it stores, how many visitors it receives, and so on.
Databases are very similar structures. Choosing the best type for your project and organizing your data in the best possible way will make all the difference.
Today we have two main types of databases:
To talk about these two, I need to introduce what SQL is.
SQL stands for Structured Query Language
It is a programming language used to create a connection between data stored in a Relational Database Management System (RDBMS).
Yes, SQL is a programming language, but not for general purposes like Java, PHP, Python, and others. SQL is a specific language for database communication.
An example of code in SQL would be like this, this code retrieves all names and emails from the Users table, for example.
SELECT username, email FROM Users;
Many RDBMSs use SQL (and variations of SQL) to access data in tables.
An RDBMS is a program that allows you to create, update, and manage relational databases.
MySQL, for example, is RDBMS.
A relational database is a type of database. It stores and provides access to related data. Hence the name ‘relational’. These relationships are built using tables, similar to this one:
Id | username | password | |
---|---|---|---|
1 | user1 | user1@example.com | !*&JGs1ˆ& |
2 | user2 | user2@example.com | ($%ˆgˆ8r*7 |
3 | user3 | user3@example.com | $%rtg&nd |
In the relational model, each row in the table is a record with a unique ID called the primary key.
It is from this primary key that we can create the relationships between the tables’ data.
For example, we could have a second table with addresses. And we can generate a relationship between them through the user ID.
Id | address | country | userID |
---|---|---|---|
1 | Main Av, 500 | Brazil | 1 |
2 | Second Av, 300 | Portugal | 2 |
3 | Third Av, 100 | Mozambique | 3 |
With this second table it is simple to recognize that the address with Id 1 belongs to the user with Id 1, address with Id 2 belongs to the user with Id 2, and address with Id 3 belongs to the user with Id 3.
If we wanted to retrieve user1’s address through an SQL Querry, we could do so:
SELECT * FROM Address WHERE userId = 1;
If you enjoyed this structure and method of organizing the data, I have some great news for you, there are several free options you can choose to work with. I’ll mention the main ones down here.
According to themselves “MySQL is the most popular Open Source SQL database management system”. MySQL is supported by Oracle.
As key advantages of MySQL I can mention:
In this project I showed how to connect Python to a MySQL database.
Difficulties in scalability and poor performance involving tables with a lot of data can be mentioned with some of the weaknesses of this system.
PostgreSQL is also an open-source RDBMS, but it is not controlled by any corporation.
PostgreSQL provides many of the same benefits as MySQL. It is easy to use, inexpensive, reliable and has a broad community of developers.
PostgreSQL has a typical lower performance than other RDBMSs, such as MySQL, and although it has been used more in recent years, PostgreSQL lags behind in terms of popularity.
Many of the world’s leading banks operate with Oracle’s systems, as Oracle offers a strong mix of pre-integrated, robust business software and technologies, including features designed specifically for banks.
Oracle offers a free version called OracleDB Express Edition (XE).
The disadvantage may be the high cost. But you are paying for high performance, excellent customer service, and several other benefits, the high cost may be justified.
Microsoft also offers an RDBMS. With a free entry version, which by the way has the same name as the competitor, Express.
It also focuses on large business applications with large infrastructure.
Just like Oracle’s system, it’s not open source either.
The downside is also the high cost and the fact that it is necessary to invest in Good hardware to run the latest versions.
SQLite is a popular open-source SQL database. It can store an entire collection in a single file.
One of the most significant advantages of this is that all data can be stored locally without the need to connect your database to a server. Widely used in mobile applications (Android and iOS).
NoSQL databases (also known as “not only SQL”) are not made up of tables, they store data differently from relational tables.
They appeared in the late 2000s when the cost of storage dropped dramatically.
NoSQL databases come in a variety of types based on your data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.
A common misunderstanding is to think that NoSQL databases do not store relational data well. NoSQL can store relational data – they only store it differently than the relational model does.
When compared to an SQL database, many find that modeling relationship data in NoSQL is easier than in SQL, because the related data does not need to be split between tables.
NoSQL data models allow related data to be nested within a single data structure.
So far, four main types of NoSQL databases have emerged:
Let’s get a look at each fellow.
Document type NoSQL stores data in structures similar to JSON objects (JavaScript Object Notation). Something like this:
{
"_id" : ObjectId("5ea3f99880aeb46f07d550fe"),
"id" : 1,
"password" : "pbkdf2_qg5uny4$9mVb1OqdkvvRubzHay0I3EzODn7i37icn6GQVHLqtiY=",
"is_superuser" : true,
"username" : "debug",
"first_name" : "Debug",
"last_name" : "Everything",
"email" : 'admin@debugeverything.com',
"date_joined" : ISODate("2020-04-25T08:49:28.513Z")
}
Each document contains pairs of fields and values. The values can typically be a variety of types, including things like strings, numbers, booleans, arrays, or objects.
Due to its variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used for general-purposes.
A Key-Value NoSQL stores data as a set of Keys and Values, also known as an associative array, organized into rows.
An example of what an electronics store database might look like.
Key-value databases are great for use cases where you need to store large amounts of data but do not need to perform complex queries to retrieve it. A very common use is to store user preferences or cache.
In this type of NoSQL database, the columns in each row are contained within that row. An image to explain better:
Each column family is a container of rows in a table. The key identifies the row that consists of multiple columns.
The rows do not have to contain the same number of columns or the same information. Columns can be added to any row at any time without having to add it to other rows.
Wide columns databases are great for when you need to store large amounts of data and you can predict what your query patterns will be
This type of model is often used to store user profile data.
Nodes and relationships are the essential constituents of this type of database. A node represents an entity. A relationship represents how two nodes are associated.
The graph model use nodes that contain lists of related records. These records represent relationships with other nodes, which eliminates the (time-consuming) search and comparison operation found in the relational database.
The relationships between the nodes are predetermined, that is, they are not determined at the time of consultation.
The most famous non-relational database.
MongoDB stores data in flexible JSON-type documents, which means that the fields can vary from document to document and the data structure can change over time.
Also, you will receive benefits such as full index support, high availability via WANs and LANs, along with easy replication, horizontal scalability, rich document-based queries, flexibility in data processing and aggregation, along with adequate training, support, and consultation.
Besides everything, MongoDB is free to use.
Redis, which stands for Remote Dictionary Server, is a fast, open-source storage database with stores key-value in memory to fast access.
One of the main attractions of using Redis is its speed. Redis data is kept in memory, while other databases store data on disk or on SSDs, which allows Redis to access information much faster. Redis will only store data on disk when it is necessary to persist.
Cassandra If you are looking for high availability and scalability without compromising on performance, then the Cassandra database is right for you.
Apache Cassandra is a free and open-source distributed software that uses the wide column standard.
Cassandra is highly scalable and high performance, designed to handle large amounts of data on many servers.
Neo4j is another open-source project, native graph NoSQL database that provides an ACID compatible transactional backend for your applications.
It is referred to as native because it efficiently implements the proprietary graph model down to the storage level. This means that the data is stored exactly as you store it, and it uses pointers to navigate and cross the graph.
I believe we were able to clarify the differences between SQL and NoSQL.
Most of the projects are still based on this first model. But large companies are already migrating some of their databases to NoSQL systems.
It may become a new standard in the future, or it may continue to co-exist with SQL. Only time will tell.
Don’t forget to subscribe to receive all the new and exclusive content!
I’ll see you next time.
Leave a Reply