Running AI applications on Oracle’s Exadata platform

While artificial intelligence has been around for some 50 years, its use cases have only recently been taken seriously. The reason is that artificial intelligence requires large volumes of data, and these have only recently become available in the form of unstructured data, which now exceeds the volume of data held in relational databases.

“90% of the world’s data has been created in the last two-three years and 90% of the data are not in relational databases. They are all in big data, social media feeds. Every day tons of data are coming in through IoT and 5G will bring in even more data,” says Han Chung Heng, Senior Vice President, Systems EMEA JAPAC, Oracle.

While social media, social collaboration tools and smartphones are adding to the volume of unstructured data, the amount of data actually being mined for intelligence and insights is only 2% to 5% of the total, points out Chung Heng. A huge amount of data can be used for modelling, but people also need to spend a huge amount of time preparing it. This preparation effort explains why so little of the data is converted into insights.

“What is important in artificial intelligence is that you need to analyse high volumes of data. When you have high volumes of data, you need to be able to bring everything together. With Oracle’s Exadata and Engineered Systems you can actually reduce the bottleneck from reading the data source,” points out Chung Heng.

Exadata design

Oracle’s Exadata has the unique strength of combining machine learning and artificial intelligence solutions within the same environment of computing, networking, and storage. “Our design point for Exadata is very simple. We bring storage and compute to the database so that data does not have to travel. This makes it very fast and this is our design point for Exadata. We also bring algorithms to the database,” explains Chung Heng.

The traditional way of generating insights and intelligence from data has been to make data travel to where applications and algorithms are hosted. The focus was on how to bring data to the algorithms for modelling, which means the data has to do a lot of travelling.

This puts pressure on the infrastructure to move data quickly. Exadata has instead been designed to bring algorithms to the database, and to build compute, storage and networking around the database.
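
The difference between the two approaches can be illustrated with a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for a database, not Exadata or Oracle Database, and the table name readings and its columns are assumptions made purely for illustration. The first function pulls every row into the application before averaging; the second asks the database to compute the average, so only a single row travels.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical sensor table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
    rows = [(i % 10, float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn

def average_in_application(conn):
    """Data travels to the algorithm: fetch every row, then compute."""
    values = [row[0] for row in conn.execute("SELECT value FROM readings")]
    return sum(values) / len(values), len(values)

def average_in_database(conn):
    """Algorithm travels to the data: the database returns one row."""
    rows = conn.execute("SELECT AVG(value) FROM readings").fetchall()
    return rows[0][0], len(rows)

conn = build_demo_db()
app_avg, rows_moved_app = average_in_application(conn)
db_avg, rows_moved_db = average_in_database(conn)
print("computed in application:", app_avg, "rows moved:", rows_moved_app)
print("computed in database:   ", db_avg, "rows moved:", rows_moved_db)
```

Both functions return the same average, but the first moves 1,000 rows and the second moves one. On a real system the gap grows with the size of the table.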

“The combination of our database and our Exadata allows us to do analytics very fast,” says Chung Heng.

Auto indexing

By bringing computing, storage and networking closer to the database, Oracle is helping to boost the efficiency of applications that are built on machine learning and artificial intelligence algorithms. Generating insights from a database also requires that the data is well indexed. The larger and more complex the data, the longer it takes to index the database. However, once indexing has been completed, data retrieval is much faster.
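
The effect of an index on retrieval can be seen in a small Python sketch, again using the standard-library sqlite3 module as a stand-in rather than Oracle Database. The index here is created by hand, so this illustrates what an index does, not how automatic indexing works. The orders table and the index name idx_orders_customer are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 500, i * 1.5) for i in range(10000)],
)

lookup = "SELECT amount FROM orders WHERE customer_id = ?"

# Without an index the database has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# Building the index costs time up front but lets lookups skip the scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print("without index:", plan_before)
print("with index:   ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a table scan and the second reports a search that uses the new index.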

“Indexing is required to help you get the information you want fast. Now for many years when you do indexing it takes a long time. It took us about 15 years of experience to index about close to 9,000 indexes. With the new database and new Exadata we are able to reduce the 15 years to around 10 hours. Obviously, they were 6,000 indexes,” adds Chung Heng.

Auto indexing is now a built-in feature of the Exadata platform and works in conjunction with machine learning and artificial intelligence algorithms.

Unifying data

While large amounts of unstructured data are being generated, they also need to be consolidated into data lakes and then integrated with relational databases before algorithms can be applied to complete the contextual analysis. As an example, closed circuit TVs capture millions of images across the globe at any particular time. How do you link face recognition with biometric thumb print and identity card information?

“People are trying to pull big data into data lakes but they have not unified them. We are unifying big data and structured data into one pool. And we are using machine learning to make sense and combine it together to analyse both big data and relational data to make sense of what you have,” explains Chung Heng.

Engineered Systems from Oracle help to unify data from structured and unstructured sources. They include Oracle’s Data Fusion platform, which uses NoSQL and data integration tools.
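
As a rough illustration of what unifying the two kinds of data means, the Python sketch below attaches schema-free JSON documents to matching records in a relational table. It is a toy example built on the standard library and says nothing about how Oracle’s platform is implemented. The customers table, the sample documents and the unify function are all made up for illustration.

```python
import json
import sqlite3

# Structured side: a relational table of hypothetical customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])

# Unstructured side: JSON documents with no fixed schema (made-up sample data).
raw_documents = [
    '{"customer_id": 1, "channel": "social", "text": "Great service"}',
    '{"customer_id": 2, "channel": "iot", "reading": {"temp_c": 21.5}}',
    '{"customer_id": 1, "channel": "iot", "reading": {"temp_c": 19.0}}',
]

def unify(conn, documents):
    """Attach each parsed document to its matching relational record."""
    names = dict(conn.execute("SELECT customer_id, name FROM customers"))
    unified = {}
    for raw in documents:
        doc = json.loads(raw)
        key = doc.pop("customer_id")
        record = unified.setdefault(key, {"name": names[key], "events": []})
        record["events"].append(doc)
    return unified

unified = unify(conn, raw_documents)
for customer_id, record in sorted(unified.items()):
    print(customer_id, record["name"], len(record["events"]), "events")
```

Once both kinds of data sit in one pool keyed by the same identifier, an algorithm can analyse them together instead of treating the data lake and the relational database as separate sources.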

A NoSQL database provides a mechanism for storing and retrieving data that is modelled by means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications.

Training the data

Once the multiple databases have been unified, the data needs to be trained or modelled. Training the data is a compute-intensive operation and demands a large amount of energy. Training is meant to recognise certain trends, is usually very specific to an industry, and requires algorithms.

Chung Heng advocates training and modelling of the data in the cloud and not on-premises. Training or modelling of the data needs to be done only once, and is therefore better suited to the cloud. Once the data has been modelled, testing of the data and production can be done on-premises.

By moving the training of data into the cloud, end users can use the complete computing, networking, and storage power of the Exadata platform to run the use cases of artificial intelligence.

The overall process for developing an artificial intelligence use case is to carry out the training and modelling of data in the cloud, then move testing on-premises and, as the model matures, move it into production, again on-premises. Algorithms need to be embedded in the processes at every stage.
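
The train-once, run-elsewhere workflow can be sketched in a few lines of Python. The example fits a simple straight-line model by least squares, exports its parameters as JSON, then loads and tests them as a separate step. It is a deliberately small stand-in for a real machine learning pipeline, and the function names and sample data are assumptions for illustration only.

```python
import json

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def export_model(model):
    """Serialise the trained parameters so they can leave the training site."""
    return json.dumps(model)

def load_model(payload):
    """Rebuild the model from the exported parameters."""
    return json.loads(payload)

def predict(model, x):
    return model["slope"] * x + model["intercept"]

# Stage 1 (cloud, in the workflow described above): train once.
payload = export_model(train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))

# Stages 2 and 3 (on-premises): load the exported model, test it, then use it.
model = load_model(payload)
test_error = abs(predict(model, 5.0) - 11.0)
print("model:", model, "test error:", test_error)
```

Only the small exported model moves between sites. The compute-intensive fitting step happens once, where the capacity is.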

If training of the data is moved to the cloud, the full power of Exadata’s computing, networking and storage can be used for test and development and production. Reinforces Chung Heng, “This is very good since it provides for green technology innovation. Sustainability is a very important part of the process.”


Key takeaways

  • 90% of the world’s data has been created in the last two-three years.
  • 90% of the data are not in relational databases.
  • What is important in artificial intelligence is that you need to analyse high volumes of data.
  • When you have high volumes of data, you need to be able to bring everything together.
  • The larger and more complex the data, the longer it takes to index the database.
  • People are trying to pull big data into data lakes but they have not unified them.
  • Engineered Systems from Oracle help to unify the data from structured and unstructured sources.
  • Chung Heng advocates training and modelling of the data in the cloud and not on-premises.

Exadata

The Oracle Exadata Database Machine is engineered to deliver better performance, cost effectiveness, and availability for Oracle databases. Exadata features a modern cloud-enabled architecture with scale-out high-performance database servers, scale-out intelligent storage servers with state-of-the-art PCI Flash, and an ultra-fast InfiniBand internal fabric that connects all servers and storage. Algorithms and protocols in Exadata implement database intelligence in storage, compute, and InfiniBand networking to deliver higher performance and capacity at lower costs than other platforms.

Exadata runs all types of database workloads including Online Transaction Processing, Data Warehousing, In-Memory Analytics as well as consolidation of mixed workloads. Simple and fast to implement, the Exadata Database Machine powers and protects your most important databases. Exadata can be purchased and deployed on premises as the ideal foundation for a private database cloud, or it can be acquired using a subscription model and deployed in the Oracle Public Cloud or Cloud at Customer with all infrastructure management performed by Oracle.

The Exadata Database Machine integrates all the hardware needed for running Oracle Database. The database servers, storage servers and network are pre-configured, pre-tuned and pre-tested by Oracle experts, eliminating the effort typically required to deploy mission-critical systems. Extensive end-to-end testing ensures that all components, including database software, operating system, firmware and drivers, work seamlessly together and that there are no performance bottlenecks or single points of failure that can affect the complete system.

Since all Exadata Database Machines are identically configured, customers benefit from the experience of thousands of other users that have deployed the Exadata Database Machine for their mission critical applications. Customer machines are also identical to the machines Oracle Support uses for problem identification and resolution, the machines Oracle Development uses for development and testing of the Oracle Database, and the machines Oracle uses to implement its own SaaS and PaaS public cloud services. Exadata is the most thoroughly tested and tuned platform for running Oracle Database, and is also the most supportable platform.

The Exadata Database Machine uses a scale-out architecture for both database servers and storage servers. As workloads grow, database CPUs, storage, and networking are added to grow an Exadata Database Machine in a balanced fashion to scale without bottlenecks. The architecture allows elastic, seamless expansion from small to extremely large configurations to accommodate evolving workloads of any size.

With read latency from shared database storage as low as 19 microseconds and up to 16 million SQL IOPS, a full-rack Oracle Exadata X8M has 5X lower latency than storage from traditional vendors. Oracle Exadata X8M enables data loading for analytics at over 560 gigabytes per second, and at more than 1 terabyte per second from flash. Oracle Exadata X8M is the first enterprise database platform to incorporate RoCE access to shared persistent memory for Oracle Database, giving access to 27 TB of persistent memory. RDMA over Converged Ethernet (RoCE) gives access to persistent memory while bypassing the networking software stack and storage server CPUs, freeing up resources that can be used to increase analytics and mixed-workload performance.

Oracle Exadata X8M allows mixing of different types of storage servers to meet consolidated database needs. Combine high-performance all-flash storage with persistent memory, flash-accelerated high-capacity disk storage with persistent memory, and low-cost disk storage for low-use data. Oracle Exadata X8M runs all types of database workloads, including online transaction processing, data warehousing, and in-memory analytics, as well as consolidation of mixed workloads. Exadata also uses RDMA over Converged Ethernet (RoCE), a network protocol that allows remote direct memory access (RDMA) over an Ethernet network.

Big data and NoSQL

A NoSQL database provides a mechanism for storing and retrieving data that is modelled by means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called “Not only SQL” to emphasise that they may support SQL-like query languages, or sit alongside SQL databases.

Motivations for this approach include simplicity of design, simpler horizontal scaling to clusters of machines (which is a problem for relational databases), finer control over availability, and limiting the object-relational impedance mismatch.

The data structures used by NoSQL databases (key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve. The data structures used by NoSQL databases are also sometimes viewed as more flexible than relational database tables.