Discussion:
Is TimeSeries the solution for this?
Cesar Inacio Martins
2013-04-06 00:17:17 UTC
Hi,

This question piqued my curiosity: is TimeSeries a good solution for this
kind of situation? Is it able to "beat" the open-source solutions?

Quoting part of the question:

"Which database could handle storage of billions/trillions of records?"

"We are looking at developing a tool to capture and analyze netflow data,
of which we gather tremendous amounts of. Each day we capture about ~1.4
billion flow records..."

"We would like to be able to do fast searches (less than 10 seconds) on the
data set..."

"The idea is to keep approximately one month of data, which would be ~43.2
billion records. A rough estimate that each record would contain about 480
bytes of data, would equate to ~18.7 terabytes of data in a month, and
maybe three times that with indexes. Eventually we would like to grow the
capacity of this system to store trillions of records."

For the complete question and details, follow the link.

http://dba.stackexchange.com/q/38793/16135
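
For reference, here is the arithmetic behind those quoted figures as a quick
sketch. This is my own reconstruction: the per-day rate is an assumption
chosen so that the quoted 43.2 billion records/month works out, and
"terabytes" is taken as binary (1024^4 bytes).

    # Rough sizing check of the figures quoted above.
    # Assumption: the "~1.4 billion" flows/day is really ~1.44 billion,
    # which is what makes the quoted 43.2 billion/month add up.
    records_per_day = 1_440_000_000
    days_retained = 30
    bytes_per_record = 480

    records_per_month = records_per_day * days_retained          # 43.2 billion
    raw_tb = records_per_month * bytes_per_record / 1024**4      # ~18.9 TB
    with_indexes_tb = 3 * raw_tb                                 # ~57 TB

    print(f"{records_per_month / 1e9:.1f} billion records, "
          f"{raw_tb:.1f} TB raw, ~{with_indexes_tb:.0f} TB with indexes")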


And now a question of mine.

Innovator-C + TimeSeries: does anyone have experience using them in a real
situation? Are they able to handle millions or billions of records?

And I dare to ask :) are they able to return fast searches?

Just saying... I don't have experience with TimeSeries (I'm waiting for a
bootcamp in Brazil), and at the moment I don't know how to give an example
of a real situation where these questions would apply.


Regards

Cesar
Art Kagel
2013-04-07 00:50:51 UTC
Well, not Innovator-C; the data volume is prohibitive. Advanced Enterprise
Edition with TimeSeries and Informix Warehouse Accelerator could handle the
data volume and the query-time requirements, but you would need lots of
hardware: memory, processors, and disk. However, FAR less than you would
need with any other system.

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions
and do not reflect on my employer, Advanced DataTools, the IIUG, nor any
other organization with which I am associated either explicitly,
implicitly, or by inference. Neither do those opinions reflect those of
other individuals affiliated with any entity with which I am affiliated nor
those of the entities themselves.


JJ
2013-04-08 10:14:09 UTC
Hi,

If this is "capturing continuously" netflow traffic then that would equate to:

1,400,000,000 / (24 * 60 * 60) entries a second = 16,204 a second.

Does the "480" bytes include the timestamp?

I can't see a storage limitation under Innovator-C; the main issue would be whether a single CPU VP could handle the traffic. With some of the "bulk loader" features now available in 12.10, I would say it is plausible.
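
As a quick sketch of that arithmetic (assuming a perfectly even arrival rate over 24 hours, which real netflow traffic certainly is not):

    # Sustained ingest rate implied by ~1.4 billion flows/day, assuming
    # flows arrive evenly over 24 hours (real traffic will have peaks).
    records_per_day = 1_400_000_000
    bytes_per_record = 480

    rows_per_second = records_per_day / (24 * 60 * 60)             # ~16,204
    mb_per_second = rows_per_second * bytes_per_record / 1024**2   # ~7.4 MB/s

    print(f"{rows_per_second:,.0f} rows/s, {mb_per_second:.1f} MB/s of raw data")

So the raw byte rate is modest; the real question is whether a single CPU VP can keep up with that many individual inserts, which is exactly where a bulk-load path helps.
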
Art Kagel
2013-04-08 10:31:02 UTC
Hmm, I wasn't thinking in terms of Innovator-C not handling the data flow
rate, though, as you point out, with a single CPU VP it's going to be
tight. I was thinking about the likelihood that I-C would be able to
query that kind of data volume in reasonable time, given that you can't get
any parallelism out of it. The server will be holding about 700GB of
captured data per day; in a week that's close to 5TB.
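
To put a rough number on the query side (a back-of-envelope sketch only,
assuming the worst case of a full scan of raw data with no parallelism and
no useful index):

    # Scan rate needed to brute-force the 10-second query target from the
    # original post; worst case: full scan, no index, no parallelism.
    gb_per_day = 700                 # approximate daily capture volume
    query_target_seconds = 10

    one_day_gb_per_s = gb_per_day / query_target_seconds          # 70 GB/s
    one_week_gb_per_s = gb_per_day * 7 / query_target_seconds     # 490 GB/s

    print(f"Full scan in 10s needs {one_day_gb_per_s:.0f} GB/s for one day, "
          f"{one_week_gb_per_s:.0f} GB/s for one week")

Rates like that are why an edition limited to a single CPU VP with no
parallel query is going to struggle, and why you end up needing either very
selective time-based access to the data or something like IWA to hit the
10-second target.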

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions
and do not reflect on my employer, Advanced DataTools, the IIUG, nor any
other organization with which I am associated either explicitly,
implicitly, or by inference. Neither do those opinions reflect those of
other individuals affiliated with any entity with which I am affiliated nor
those of the entities themselves.
Spokey Wheeler Gmail
2013-04-08 10:37:26 UTC
The 12.10 Innovator-C license appears to introduce a restriction of 8GB of storage, so this falls at the first hurdle.

Apart from that, ingesting the data on good commodity hardware, I would expect you would not get more than about 7,500-8,000 rows per second in, although if you opted for SSDs that might be different.

Finally, a lot depends on how you want to query the data. Aggregating multiple TS into a single virtual TS is not trivial.
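
Comparing that single-stream estimate with the required rate worked out earlier in the thread (a rough ratio only; both numbers are estimates):

    # Required ingest rate vs. the single-stream estimate above, as a rough
    # indication of how many parallel load streams would be needed.
    required_rows_per_s = 1_400_000_000 / (24 * 60 * 60)    # ~16,204 rows/s
    single_stream_rows_per_s = 8_000                         # upper end of estimate

    streams_needed = required_rows_per_s / single_stream_rows_per_s
    print(f"~{streams_needed:.1f} parallel load streams just to break even")

So even at the optimistic end of that estimate you would need at least two parallel load streams, with no headroom for traffic peaks or for serving queries at the same time.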