Big Time Series Analysis with JuliaDB

Dr. Josh Day of Julia Computing takes a look into the multi-indexed database of the future

The next generation of data analysis requires the next generation of tools. The most popular open-source packages for data analysis (Python’s pandas and various R packages) are designed to work with small files of basic data types, but ‘small’ and ‘basic’ do not describe the data landscape of the future. The amount of data in the world is growing exponentially, and as The Economist observes, it’s changing as it grows:

“The quality of data has changed, too. They are no longer mainly stocks of digital information – databases of names and other well-defined personal data, such as age, sex, and income. The new economy is more about analyzing rapid realtime flows of often unstructured data: the streams of photos and videos generated by users of social networks, the reams of information produced by commuters on their way to work, the flood of data from hundreds of sensors in a jet engine.”

Current-generation tools therefore face a number of difficulties in analyzing the next generation of data. The first is scale: distributed computing systems such as Hadoop and Spark can provide it, but at the cost of the ease of use that makes Python and R tools attractive.

Scaling an analysis also adds the cost of gluing together tools that may not support the same data types or operations (e.g., Spark DataFrame to pandas DataFrame to NumPy array to scikit-learn model). Another issue for current databases is storing nonstandard data types. A database can sometimes work around unsupported types (e.g., units and currencies) by attaching metadata to a field, but the same approach is harder to apply to more complicated data like images and video. The next-generation database should therefore offer the features that are lacking in the current generation:

  • Scalability (works equally well on Small and Big Data)
  • Ease of use (no need to glue together different formats)
  • Flexibility (stores data types that may not exist yet).

Introducing JuliaDB

JuliaDB aims to be the analytics database of the future. It is implemented entirely in Julia, a high-performance language for technical computing designed around modern technologies such as just-in-time compilation, type inference, and parallelism.
