Big Time Series Analysis with JuliaDB

Dr. Josh Day of Julia Computing takes a look into the multi-indexed database of the future

The next generation of data analysis requires the next generation of tools. The most popular open-source packages for data analysis (Python’s pandas and various R packages) are designed to work with small files of basic data types, but ‘small’ and ‘basic’ do not describe the data landscape of the future. The amount of data in the world is growing exponentially, and as The Economist observes, it’s changing as it grows:

“The quality of data has changed, too. They are no longer mainly stocks of digital information – databases of names and other well-defined personal data, such as age, sex, and income. The new economy is more about analyzing rapid realtime flows of often unstructured data: the streams of photos and videos generated by users of social networks, the reams of information produced by commuters on their way to work, the flood of data from hundreds of sensors in a jet engine.”

Current-generation tools therefore face a number of difficulties in analyzing the next generation of data. The first is scale: distributed computing systems like Hadoop and Spark can provide it, but at the cost of the ease of use that makes Python and R tools attractive.

Scaling an analysis also adds costs in the form of gluing together tools that may not support the same data types or operations (e.g., Spark DataFrame to pandas DataFrame to NumPy array to scikit-learn model). Another issue for current databases is storing nonstandard data types. A database can sometimes work around unsupported types (e.g., units and currencies) by attaching metadata to a field, but the same approach is harder to apply to more complicated data like images and video. The next-generation database should therefore offer the features that are lacking in the current generation:

  • Scalability (works equally well on Small and Big Data)
  • Ease of use (no need to glue together different formats)
  • Flexibility (stores data types that may not exist yet).

Introducing JuliaDB

JuliaDB aims to be the analytics database of the future. It is implemented entirely in Julia, a high-performance language for technical computing designed around modern technologies such as just-in-time compilation, type inference, and parallelism.
