Recently, I read "Why Big Data Projects Fail" by Stephen Brobst. I can’t
agree more with his opinions which exposed the problem I’ve been worried
about. In this article, I am going to further discuss this topic to remind
the enterprises to beware of falling into such pitfall of failure.
Let’s have a look on a positive example. As a successful enterprise in
leveraging big data, how does Google make use of the big data?
1. Collect the row data, capture the contents of each website, e-mail, or
Cookie, and extract the key information.
2. Create the complex syndetic index for this information. Needless to say,
the advertisement-related index must be also created.
3. Store these indices and corresponding contents in the distributed servers.
4. When users are browsing website and searching or viewing e-mails, Google
will arrange their requests to go through a complex transla... (more)
The data computation layer in between the data persistent layer and the
application layer is responsible for computing the data from data persistence
layer, and returning the result to the application layer. The data
computation layer of Java aims to reduce the coupling between these two
layers and shift the computational workload from them. The typical
computation layer is characterized with below features:
Ability to compute on the data from arbitrary data persistence layers, not
only databases, but also the non-database Excel, Txt, or XML files. Of all
these computations, the... (more)
In Java development, we may encounter the complex set operations. Java alone
is not powerful enough to save programmers' efforts in implementing the
computation details, which is time-consuming and poor in code reuse. In view
of this, programmers usually resort to dynamic calculation script for set
SQL is surely the first kind of script that comes into most programmers'
mind. However, to their disappointments, SQL does not support the explicit
set, and is unable to represent the sets of a set, ordered set, generic set,
and only the result set can be recognized as a set... (more)
In Java development, the typical data computation problems are characterized
Long computation procedure requiring a great deal of debugging Data may from
database, or Excel/Txt Data may from multiple databases, instead of just one.
Some computation goals are complex, such as relative position computation,
and set-related computation
Just suppose a sales department needs to make statistics on the top 3
outstanding salesmen ranking by their monthly sales in every month from Jan
to the previous month, based on the order data.
Java alone is difficult to handle such computations... (more)
What is IOE? I=IBM, O=Oracle, and E=EMC. They represent the typical high-end
database and data warehouse architecture. The high-end servers include HP,
IBM, and Fujitsu, the high-end database software includes Teradata, Oracle,
Greenplum; the high-end storages include EMC, Violin, and Fusion-io.
In the past, such typical high performance database architecture is the
preference of large and middle sized organizations. They can run stably with
superior performance, and became popular when the informatization degree was
not so high and the enterprise application was simple. With the ... (more)