Types of NoSQL databases and key criteria for choosing them
This is an excerpt from Chapter 1. NoSQL for Mere Mortals by Dan Sullivan, an independent database consultant and author. In the chapter, Sullivan takes a look at the four primary types of NoSQL databases (key-value, document, column family and graph databases) and provides insights into which applications are best suited for each of them. He also discusses the differences between relational and NoSQL database design, and the need for coexistence between relational and NoSQL technologies in many organizations.
In relational database design, the structure and relations of entities drive design; not so in NoSQL database design. Of course, you will model entities and relations, but performance is more important than preserving the relational model. The relational model emerged for pragmatic reasons, namely to address data anomalies and the difficulty of reusing existing databases for new applications. NoSQL databases also emerged for pragmatic reasons, specifically the inability of relational databases to scale to meet growing demands for high volumes of read and write operations. In exchange for improved read and write performance, you may lose other features of relational databases, such as immediate consistency and ACID transactions (although this is not always the case).
Throughout this book, queries have driven the design of data models. This is the case because queries describe how data will be used.
Queries are also a good starting point for understanding how well various NoSQL databases will meet your needs. You will also need to understand other factors, such as:
- The volume of reads and writes
- Tolerance for inconsistent data in replicas
- The nature of relations between entities and how that affects query patterns
- Availability and disaster recovery requirements
- The need for flexibility in data models
- Latency requirements
The following sections provide some sample use cases and some criteria for matching different NoSQL database models to different requirements.
Criteria for selecting key-value databases
Key-value databases are well-suited to applications that have frequent small reads and writes along with simple data models. The values stored in key-value databases may be simple scalar values, such as integers or Booleans, but they may also be structured data types, such as lists and JSON structures. Key-value databases generally have simple query facilities that allow you to look up a value by its key.
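As a rough illustration of lookup by key, the sketch below uses a plain Python dictionary as a stand-in for a key-value database. The `KeyValueStore` class, its methods and the key naming scheme are hypothetical and do not correspond to any particular product's API. The `get_range` method shows one way enumerated keys are sometimes used to emulate a range query: the caller generates each key in the range and looks it up individually.

```python
from typing import Any, Dict, List


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # The only native query: look up a value by its exact key.
        return self._data.get(key, default)

    def get_range(self, prefix: str, start: int, end: int) -> List[Any]:
        # Emulate a range query by generating enumerated keys
        # prefix:start .. prefix:end and looking each one up.
        values = []
        for i in range(start, end + 1):
            key = f"{prefix}:{i}"
            if key in self._data:
                values.append(self._data[key])
        return values


store = KeyValueStore()
store.put("cart:1001:item_count", 3)                       # scalar value
store.put("user:42:prefs", {"theme": "dark", "langs": ["en", "fr"]})  # structured value

# Enumerated keys: one key per day of a (made-up) month of sales totals.
for day, total in enumerate([120, 95, 143], start=1):
    store.put(f"sales:2024-01:{day}", total)

print(store.get("cart:1001:item_count"))        # 3
print(store.get_range("sales:2024-01", 2, 3))   # [95, 143]
```

The range lookup costs one key lookup per enumerated key, which is part of why this is described as a trick rather than a real query facility.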
Some key-value databases support search features that provide somewhat more flexibility. Developers can use tricks, such as enumerated keys, to implement range queries, but these databases usually lack the query capabilities of document, column family and graph databases. Key-value databases are used in a wide range of applications, such as the following:
- Caching data from relational databases to improve performance
- Tracking transient attributes in a Web application, such as a shopping cart
- Storing configuration and user data information for mobile applications
- Storing large objects, such as images and audio files
Use cases and criteria for selecting document databases
Document databases are designed for flexibility. If an application requires the ability to store varying attributes along with large amounts of data, then document databases are a good option.
For example, to represent products in a relational database, a modeler may use a table for common attributes and additional tables for each subtype of product to store attributes used only in that subtype. Document databases can handle this situation easily. Document databases provide for embedded documents, which are useful for denormalizing. Instead of storing data in different tables, data that is frequently queried together is stored together in the same document.
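The product example above can be sketched with plain Python dictionaries standing in for documents. All field names and values here are made up for illustration, and no specific document database is assumed. Each document carries only the attributes that apply to its subtype, and related data (specs, reviews) is embedded rather than split into separate tables.

```python
import json

# Hypothetical product catalog: two product subtypes in one collection.
products = [
    {
        "sku": "BK-001",
        "type": "book",
        "name": "Example Database Guide",
        "price": 39.99,
        "author": "A. Writer",   # book-only attribute
        "pages": 512,            # book-only attribute
    },
    {
        "sku": "LT-002",
        "type": "laptop",
        "name": "Example Laptop 14",
        "price": 1199.00,
        "specs": {"cpu_cores": 8, "ram_gb": 16},   # embedded document
        "reviews": [                               # embedded list of documents
            {"user": "pat", "rating": 5},
            {"user": "sam", "rating": 4},
        ],
    },
]


def average_rating(product):
    """Read embedded reviews without joining to another table."""
    reviews = product.get("reviews", [])
    if not reviews:
        return None
    return sum(r["rating"] for r in reviews) / len(reviews)


for product in products:
    print(product["sku"], sorted(product.keys()), average_rating(product))

# Documents of this shape serialize directly to JSON.
serialized = json.dumps(products)
```

The trade-off of embedding is duplication: if the same review or spec had to appear in several documents, each copy would need to be updated separately.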
Additionally, document databases improve on the query capabilities of key-value databases with indexing and the ability to filter documents based on attributes in the document. Document databases are probably the most popular of the NoSQL databases because of their flexibility, performance and ease of use. These databases are well-suited to a number of use cases, including:
- Back-end support for websites with high volumes of reads and writes
- Managing data types with variable attributes, such as products
- Tracking variable types of metadata
- Applications that use JSON data structures
- Applications benefiting from denormalization by embedding structures within structures
Document databases are also available from cloud services such as Microsoft Azure Document and Cloudant's database.
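The attribute-based filtering and indexing mentioned in this section can be sketched as follows. This is a minimal in-memory illustration, not any product's query language: `find` and `build_index` are hypothetical helper names, and the documents are invented. `find` scans every document, while `build_index` precomputes a lookup from one attribute's values to the matching documents, which is the basic idea behind a secondary index.

```python
from collections import defaultdict

# Invented sample documents; note that attributes vary per document.
docs = [
    {"_id": 1, "type": "book", "price": 40, "author": "A. Writer"},
    {"_id": 2, "type": "laptop", "price": 1200, "ram_gb": 16},
    {"_id": 3, "type": "book", "price": 25},
]


def find(collection, **criteria):
    """Return documents whose attributes equal every given criterion (full scan)."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


def build_index(collection, field):
    """Map each value of `field` to the documents that contain it."""
    index = defaultdict(list)
    for doc in collection:
        if field in doc:  # documents lacking the field are simply not indexed
            index[doc[field]].append(doc)
    return index


type_index = build_index(docs, "type")

print([d["_id"] for d in find(docs, type="book", price=25)])   # [3]
print([d["_id"] for d in type_index.get("book", [])])          # [1, 3]
```

A key-value store could only fetch these documents by `_id`; filtering on `type` or `price` is what the document model adds.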
Use cases and criteria for selecting column family databases
Column family databases are designed for large volumes of data, read and write performance, and high availability. Google introduced Bigtable to address the needs of its services. Facebook developed Cassandra to back its Inbox Search service. These database management systems run on clusters of multiple servers. If your data is small enough to run on a single server, then a column family database is probably more than you need; consider a document or key-value database instead.
Column family databases are well-suited for use with:
- Applications that require the ability to always write to the database
- Applications that are geographically distributed over multiple data centers
- Applications that can tolerate some short-term inconsistency in replicas
- Applications with dynamic fields
- Applications with the potential for truly large volumes of data, such as hundreds of terabytes
Google demonstrated the capabilities of Cassandra running on the Google Compute Engine. Google engineers deployed: 3.
Google Compute Engine virtual machines. TB Persistent Disk volumes.
Debian Linux. DataStax Cassandra 2. Data was written to two nodes (Quorum commit of 2)3. With this configuration, the Cassandra cluster reached 1 million writes per second, with 9. When one-third of the nodes were lost, the rate of 1 million writes per second was sustained, but with higher latency.
Several areas can use this kind of big data processing capability, such as:
- Security analytics using network traffic and log data mode
- Big Science, such as bioinformatics using genetic and proteomic data
- Stock market analysis using trade data
- Web-scale applications such as search
- Social network services
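The quorum commit of 2 mentioned in the Cassandra test above can be illustrated with a small simulation. This is a sketch under stated assumptions, not how Cassandra is implemented: the `Node` class and `quorum_write` function are hypothetical, and the choice of three replicas per key is an assumption made for the example (the excerpt gives only the quorum size). A write is reported as successful once at least two replicas acknowledge it, so it keeps succeeding when one of the three replicas is down.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical replica holding a copy of the data."""
    name: str
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> bool:
        if not self.up:
            return False        # an unavailable node cannot acknowledge
        self.data[key] = value
        return True


def quorum_write(replicas: List[Node], key: str, value: str,
                 required_acks: int = 2) -> bool:
    """Send the write to every replica; succeed if enough of them acknowledge.

    Note: this sketch does not roll back replicas that accepted a write
    when the quorum is not reached.
    """
    acks = sum(1 for node in replicas if node.write(key, value))
    return acks >= required_acks


# Assumed setup: three replicas for the key.
cluster = [Node("n1"), Node("n2"), Node("n3")]

print(quorum_write(cluster, "event:1", "login"))    # all replicas up
cluster[2].up = False                               # lose one of three replicas
print(quorum_write(cluster, "event:2", "logout"))   # quorum of 2 still reachable
cluster[1].up = False                               # lose a second replica
print(quorum_write(cluster, "event:3", "retry"))    # only one ack is possible
```

Requiring two acknowledgments rather than all three is what lets writes continue through a partial outage, at the cost of one replica being temporarily out of date.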
Key-value, document and column family databases are well-suited to a wide range of applications. Graph databases, however, are best suited to a particular type of problem.
Use cases and criteria for selecting graph databases
Problem domains that lend themselves to representations as networks of connected entities are well-suited for graph databases. One way to assess the usefulness of a graph database is to determine if instances of entities have relations to other instances of entities. For example, two orders in an e-commerce application probably have no connection to each other. They might be placed by the same customer, but that is a shared attribute, not a connection. Similarly, a game player's configuration and game state have little to do with other game players' configurations. Entities like these are readily modeled with key-value, document or relational databases.
Now, consider examples mentioned in the discussion of graph databases, such as highways connecting cities, proteins interacting with other proteins and employees working with other employees. In all of these cases, there is some type of connection, link or direct relationship between two instances of entities. These are the types of problem domains that are well-suited to graph databases.
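To make the highways-connecting-cities example concrete, the sketch below stores connections as an adjacency structure and uses breadth-first search to find a route with the fewest hops. The road list is illustrative only and is not meant as accurate geography, and `build_graph` and `shortest_path` are hypothetical helpers rather than a graph database API. A graph database provides this kind of traversal as a built-in operation over stored nodes and edges.

```python
from collections import defaultdict, deque

# Illustrative connections between cities (not a real road map).
roads = [
    ("Seattle", "Portland"),
    ("Portland", "Sacramento"),
    ("Sacramento", "San Francisco"),
    ("Seattle", "Spokane"),
]


def build_graph(edges):
    """Build an undirected adjacency map: city -> set of directly connected cities."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the path with the fewest hops, or None."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        # sorted() only makes the visiting order deterministic.
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


graph = build_graph(roads)
print(shortest_path(graph, "Spokane", "Sacramento"))
print(shortest_path(graph, "Seattle", "Honolulu"))   # no connection in this data
```

By contrast, two orders placed by the same customer would have no edge between them in this model; they would only share an attribute value.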
Other examples of these types of problem domains include:
- Network and IT infrastructure management
- Identity and access management
- Business process management
- Recommending products and services
- Social networking
From these examples, it is clear that when there is a need to model explicit relations between entities and rapidly traverse paths between entities, graph databases are a good option. Large-scale graph processing, such as with large social networks, may actually use column family databases for storage and retrieval. In that case, graph operations are built on top of the database management system. The Titan graph database and analysis platform takes this approach.
Key-value, document, column family and graph databases meet different types of needs.
Evaluating the different types of DBMS products
The database management system (DBMS) is the heart of today's operational and analytical business systems. Data is the lifeblood of the organization and the DBMS is the conduit by which data is stored, managed, secured and served to applications and users. But there are many different forms and types of DBMS products on the market, and each offers its own strengths and weaknesses. Relational databases, or RDBMSes, became the norm in IT more than 3. But some shortcomings became more apparent in the Web era and with the full computerization of business and much of daily life.
Today, IT departments trying to process unstructured data or data sets with a highly variable structure may also want to consider NoSQL technologies. Applications that require high-speed transactions and rapid response rates, or that perform complex analytics on data in real time or near real time, can benefit from in-memory databases. And some IT departments will want to consider combining multiple database technologies for some processing needs. The DBMS is central to modern applications, and choosing the proper database technology can affect the success or failure of your IT projects and systems.
Today's database landscape can be complex and confusing, so it is important to understand the types and categories of DBMSes, along with when and why to use them. Let this document serve as your roadmap.
DBMS categories and models
Until relatively recently, the RDBMS was the only category of DBMS worth considering. But the big data trend has brought new types of worthy DBMS products that compete well with relational software for certain use cases. Additionally, an onslaught of new technologies and capabilities is being added to DBMS products of all types, further complicating the database landscape.
The RDBMS: However, the undisputed leader in terms of revenue and installed base continues to be the RDBMS.
Based on the sound mathematics of set theory, relational databases provide data storage, access and protection with reasonable performance for most applications, whether operational or analytical in nature. For more than three decades, the primary operational DBMS has been relational, led by industry giants such as Oracle, Microsoft (SQL Server) and IBM (DB2). The RDBMS is adaptable to most use cases and reliable; it also has been bolstered by years of use in industry applications at Fortune 5. Of course, such stability comes at a cost: RDBMS products are not cheap. Support for ensuring transactional atomicity, consistency, isolation and durability (collectively known as the ACID properties) is a compelling feature of the RDBMS. ACID compliance guarantees that all transactions are completed correctly or that a database is returned to its previous state if a transaction fails to go through. Given the robust nature of the RDBMS, why are other types of database systems gaining popularity?
Web-scale data processing and big data requirements challenge the capabilities of the RDBMS. Although RDBMSes can be used in these realms, DBMS offerings with more flexible schemas, less rigid consistency models and reduced processing overhead can be advantageous in a rapidly changing and dynamic environment. Enter the NoSQL DBMS.
The NoSQL DBMS: Where the RDBMS requires a rigidly defined schema, a NoSQL database permits a flexible schema, in which every data element need not exist for every entity. For loosely defined data structures that may also evolve over time, a NoSQL DBMS can be a more practical solution. Another difference between NoSQL and relational DBMSes is how data consistency is provided.
The RDBMS can ensure the data it stores is always consistent. Most NoSQL DBMS products offer a more relaxed, eventually consistent approach (though some provide varying consistency models that can enable full ACID support). To be fair, most RDBMS products also offer varying levels of locking, consistency and isolation that can be used to implement eventual consistency, and many NoSQL DBMS products are adding options to support full ACID compliance. So NoSQL addresses some of the problems encountered by RDBMS technologies, making it simpler to work with large amounts of sparse data.
Data is considered to be sparse when not every element is populated and there is a lot of "empty space" between actual values. For example, think of a matrix with many zeroes and only a few actual values. But while certain types of data and use cases can benefit from the NoSQL approach, using NoSQL databases can come at the price of eliminating transactional integrity, flexible indexing and ease of querying. Further complicating the issue is that NoSQL is not a specific type of DBMS, but a broad descriptor of four primary categories of different DBMS offerings:
- Key-value
- Document
- Wide column store
- Graph
Each of these types of NoSQL DBMS uses a different data model with different strengths, weaknesses and use cases to consider.
A thorough evaluation of NoSQL DBMS technology requires more in-depth knowledge of each NoSQL category, along with the data and application needs that must be supported by the DBMS.
The in-memory DBMS: One last major category of DBMS to consider is the in-memory DBMS (IMDBMS), sometimes referred to as a main memory DBMS. An IMDBMS relies mostly on memory to store data, as opposed to disk-based storage. The primary use case for the IMDBMS is to improve performance. Because the data is maintained in memory, as opposed to on a disk storage device, I/O latency is greatly reduced. Mechanical disk movement, seek time and transfer to a buffer can be eliminated because the data is immediately accessible in memory. An IMDBMS can also be optimized to access data in memory, as opposed to a traditional DBMS that is optimized to access data from disk.
IMDBMS products can reduce overhead because the internal algorithms usually are simpler, with fewer CPU instructions.
A growing category of DBMS is the multi-model DBMS, which supports more than one type of storage engine. Many NoSQL offerings support more than one data model, for example document and key-value. RDBMS products are evolving to support NoSQL capabilities, such as adding a column store engine to their relational core.
Other DBMS categories exist, but are not as prevalent as relational, NoSQL and in-memory:
- XML DBMSes are architected to support XML data, similar to NoSQL document stores. However, most RDBMS products today provide XML support.
- A columnar database is a SQL database system that is popular for business intelligence and data warehousing because it is optimized for reading a few columns of many rows at once (and is not optimized for writing data).
- Popular in the 1. OO) DBMSes were designed to work with OO programming languages, similar to NoSQL document stores.
- Pre-relational DBMSes include hierarchical systems, such as IBM IMS, and network systems, such as CA IDMS, running on large mainframes. Both still exist and support legacy applications.
Additional considerations
As you examine the DBMS landscape, you will inevitably encounter many additional issues that require consideration. At the top of that list is platform support. The predominant computing environments today are Linux, Unix, Windows and the mainframe.
Not every DBMS is supported on each of these platforms. Another consideration is vendor support. Many DBMS offerings are open source, particularly in the NoSQL world. The open source approach increases flexibility and reduces initial cost of ownership.
However, open source software lacks support unless you purchase a commercial distribution. Total cost of ownership can also be higher when you factor in the related administration, support and ongoing costs. You might also choose to reduce the pain involved in acquisition and support by using a database appliance or deploying in the cloud. A database appliance is a preinstalled DBMS sold on hardware that is configured and optimized for database applications. Using an appliance can dramatically reduce the cost of implementation and support because the software and hardware are designed to work together.
Implementing your databases in the cloud goes one step further. Instead of implementing a DBMS at your shop, you can contract with a cloud database service provider to implement your databases using the provider's service.
This is referred to as DBaaS, or database as a service.