Sierra Ingestion Characterization Verification
Does the following Sierra ingestion characterization accurately represent the product?
As my understanding of this tool expands, this benchmark will grow to reflect it.
Initial conditions
* empty memory space
* XML data files, each ~20K
* utilizing REST interface
Ingestion time (defined as query-ready time - start time)
* start time is the time when Ingest.java starts, using the flags
  -s xxx -r xxx -h xxx.saffronsierra.com -sc xxx -f /x/x/x -p true
* query-ready time is the time when SaffronAdmin shows all jobs as Passed/Failed (not Working, Paused, Pending) and the ingested data is ready to be queried by SaffronAnalyst
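The definition above can be sketched in Java, the language of the Ingest.java client. This is only a sketch under stated assumptions: `IngestionTimer` and `JobStatus` are hypothetical names, not part of Sierra or SaffronAdmin, and the timestamps are illustrative. It checks only the job states; confirming that the data can actually be queried from SaffronAnalyst is not shown.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

class IngestionTimer {
    /** Job states named in the definition above (hypothetical enum, not a Sierra API). */
    enum JobStatus { PASSED, FAILED, WORKING, PAUSED, PENDING }

    /** True when every job has reached a terminal state (Passed or Failed). */
    static boolean allJobsFinished(List<JobStatus> jobs) {
        return jobs.stream()
                   .allMatch(s -> s == JobStatus.PASSED || s == JobStatus.FAILED);
    }

    /** Ingestion time = query-ready time - start time. */
    static Duration ingestionTime(Instant start, Instant queryReady) {
        return Duration.between(start, queryReady);
    }

    public static void main(String[] args) {
        // Illustrative timestamps only; not measurements from the benchmark.
        Instant start = Instant.parse("2011-02-09T10:00:00Z");
        Instant queryReady = Instant.parse("2011-02-09T10:05:45Z");
        List<JobStatus> jobs = List.of(JobStatus.PASSED, JobStatus.FAILED, JobStatus.PASSED);

        if (allJobsFinished(jobs)) {
            long seconds = ingestionTime(start, queryReady).getSeconds();
            System.out.println("Ingestion time: " + seconds + " s");
        }
    }
}
```

Failed jobs count as finished here, which matches the definition: a run with 10% bad data still reaches query-ready once nothing is Working, Paused, or Pending.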
Characterization1: simple schema ingest
Ingestion times using an XMLParser with an XSLT stylesheet that
normalizes 5 string categories.
Mapping columns and rows only.
90% of the data ingestion passing (i.e. 10% bad data).
XML files    Time
1            <10 seconds
10           ~30 seconds
150          ~5 minutes 45 seconds
1200
10500
Time (min:sec)    Ingests
2:00              40-45
5:00              127
10:00             258
20:00
~26 XML ingests per minute, or about 0.43 per second
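The per-minute figure can be checked against the table with a few lines of arithmetic. The sketch below uses the 5:00 and 10:00 checkpoints from the table; the class and method names are hypothetical and exist only for illustration.

```java
class Throughput {
    /** Cumulative rate: total ingests divided by elapsed minutes. */
    static double perMinute(int ingests, double elapsedMinutes) {
        return ingests / elapsedMinutes;
    }

    public static void main(String[] args) {
        // Checkpoints taken from the table above: {elapsed minutes, ingests}.
        int[][] checkpoints = { {5, 127}, {10, 258} };
        for (int[] c : checkpoints) {
            double rate = perMinute(c[1], c[0]);
            System.out.printf("%d min: %.1f per minute, %.2f per second%n",
                              c[0], rate, rate / 60.0);
        }
    }
}
```

Both checkpoints come out between 25 and 26 ingests per minute, a little over 0.4 per second.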
Characterization2: complex schema ingest
Ingestion times using an XMLParser with an XSLT stylesheet that
normalizes 13 categories, applies RegexExtractor and
WordnetExtractor to 2 text fields, and applies the Saffron "keyword
re-map" for data enhancement.
Mapping 30 row/col attributes
Having 4 memory attributes
Temporal (Yearly) time-slicing on 2 attributes.
90% of the data ingestion passing (i.e. 10% bad data).
Time (min:sec)    Ingests
2:00              55
10:00             274
20:00             365
~27 XML ingests per minute, or about 0.45 per second
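A cumulative average can hide changes in rate over the run, so it can help to compute the rate between consecutive checkpoints as well. The sketch below does this for the table above. It assumes zero ingests at time 0 (consistent with the empty memory space in the initial conditions), and the class and method names are hypothetical.

```java
class IntervalRate {
    /** Rate over one interval: ingests added divided by minutes elapsed. */
    static double between(double t0, int n0, double t1, int n1) {
        return (n1 - n0) / (t1 - t0);
    }

    public static void main(String[] args) {
        // Checkpoints from the table above, plus an assumed (0 min, 0 ingests) origin.
        double[] minutes = { 0, 2, 10, 20 };
        int[] ingests = { 0, 55, 274, 365 };
        for (int i = 1; i < minutes.length; i++) {
            double rate = between(minutes[i - 1], ingests[i - 1], minutes[i], ingests[i]);
            System.out.printf("%.0f-%.0f min: %.1f per minute%n",
                              minutes[i - 1], minutes[i], rate);
        }
    }
}
```

On these figures the first two intervals stay near 27 per minute, while the 10:00 to 20:00 interval works out to about 9 per minute. The table does not say why, so the ~27 figure is best read as describing the first 10 minutes of the run.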
2 Posted by Jamie Singer on 09 Feb, 2011 10:33 PM
I believe this characterization is constrained by client-side limitations, i.e. single-threaded HTTP transfers of 12k files.
Support Staff 3 Posted by David E. Young on 10 Feb, 2011 01:37 PM
Routed to developer.
4 Posted by Jamie Singer on 11 Feb, 2011 04:57 PM
This characterization inquiry can be closed.
After more experimenting with Sierra, I realized my numbers were constrained by my client-side application and not the Sierra back end.
Support Staff 5 Posted by Yen-Min Huang on 11 Feb, 2011 07:46 PM
close issue
Yen-Min Huang closed this discussion on 11 Feb, 2011 07:46 PM.