Sierra Ingestion Characterization Verification

Jamie Singer

09 Feb, 2011 07:14 PM via web

Does the following Sierra ingestion characterization accurately represent the product?

As my understanding of this tool grows, I will extend this benchmark to cover more of Sierra.

Initial conditions
* empty memory space
* XML data files, each ~20K
* utilizing REST interface

Ingestion time (defined as query-ready time - start time)
* Start time: the time when Ingest.java starts, run with the flags
  -s xxx -r xxx -h xxx.saffronsierra.com -sc xxx -f /x/x/x -p true
* Query-ready time: the time when SaffronAdmin shows all jobs as Passed/Failed (not Working, Paused, or Pending) and the ingested data is ready to be queried by SaffronAnalyst
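
The ingestion-time definition above can be expressed as a small Java sketch. It assumes the job states shown in SaffronAdmin are available to the client as a list; the `JobState` enum and `IngestionTimer` class are hypothetical names, and how the states would actually be fetched from SaffronAdmin is not shown. It checks only the job-state half of the definition, not whether SaffronAnalyst can already query the data.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical model of the job states SaffronAdmin displays.
enum JobState { PASSED, FAILED, WORKING, PAUSED, PENDING }

final class IngestionTimer {
    private IngestionTimer() {}

    // A job is finished once it has Passed or Failed.
    static boolean isTerminal(JobState state) {
        return state == JobState.PASSED || state == JobState.FAILED;
    }

    // Query-ready (job-state part only): at least one job exists and every job
    // is Passed/Failed, i.e. none is Working, Paused, or Pending.
    static boolean isQueryReady(List<JobState> jobs) {
        return !jobs.isEmpty() && jobs.stream().allMatch(IngestionTimer::isTerminal);
    }

    // Ingestion time = query-ready time - start time.
    static Duration ingestionTime(Instant start, Instant queryReady) {
        return Duration.between(start, queryReady);
    }
}
```

A client would record `Instant.now()` when Ingest.java starts, poll the job states until `isQueryReady` returns true, record a second instant, and pass both to `ingestionTime`.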

Characterization1: simple schema ingest
Ingestion times using an XMLParser with an XSLT stylesheet that normalizes 5 string categories.
Mapping column/row attributes only.
90% of ingested data passing (i.e., 10% bad data).

XML files   Ingestion time
1           <10 seconds
10          ~30 seconds
150         ~5 minutes 45 seconds
1200
10500

Elapsed time (min:sec)   Ingests completed
2:00                     40-45
5:00                     127
10:00                    258
20:00

~26 XML ingests per minute, about 0.43 per second
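
The per-minute and per-second figures come from dividing completed ingests by elapsed time. A minimal Java sketch of that arithmetic (the `Throughput` class is a hypothetical helper, not part of Sierra):

```java
final class Throughput {
    private Throughput() {}

    // Average completed ingests per minute since the start of the run.
    static double perMinute(int ingests, double elapsedMinutes) {
        if (elapsedMinutes <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return ingests / elapsedMinutes;
    }

    // The same average expressed per second.
    static double perSecond(int ingests, double elapsedMinutes) {
        return perMinute(ingests, elapsedMinutes) / 60.0;
    }
}
```

With the 10:00 sample, `perMinute(258, 10.0)` gives 25.8 ingests per minute and `perSecond(258, 10.0)` gives 0.43. The 150-file row of the first table (5 minutes 45 seconds, i.e. 5.75 minutes) gives about 26.1 per minute, consistent with the ~26 figure.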

Characterization2: complex schema ingest
Ingestion times using an XMLParser with an XSLT stylesheet that normalizes 13 categories, applies RegexExtractor and WordnetExtractor to 2 text fields, and applies the Saffron "keyword re-map" for data enhancement.
Mapping 30 row/col attributes.
Having 4 memory attributes.
Temporal (yearly) time-slicing on 2 attributes.
90% of ingested data passing (i.e., 10% bad data).

Elapsed time (min:sec)   Ingests completed
2:00                     55
10:00                    274
20:00                    365

~27 XML ingests per minute over the first 10 minutes, about 0.45 per second
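
A cumulative average can hide changes in rate during a run. This hypothetical `IntervalRate` helper (a sketch, not part of Sierra) instead computes the ingest rate within each interval between consecutive samples:

```java
final class IntervalRate {
    private IntervalRate() {}

    // Ingests per minute within each interval between consecutive samples,
    // as opposed to the cumulative average since the start of the run.
    // Returns one rate per interval, so samples.length - 1 values.
    static double[] perMinute(double[] minutes, int[] ingests) {
        if (minutes.length != ingests.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] rates = new double[Math.max(0, minutes.length - 1)];
        for (int i = 1; i < minutes.length; i++) {
            double dt = minutes[i] - minutes[i - 1];
            if (dt <= 0) {
                throw new IllegalArgumentException("samples must be in increasing time order");
            }
            rates[i - 1] = (ingests[i] - ingests[i - 1]) / dt;
        }
        return rates;
    }
}
```

For the Characterization2 samples (2:00/55, 10:00/274, 20:00/365) it returns 27.375 ingests per minute between 2:00 and 10:00 and 9.1 between 10:00 and 20:00.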

  1. Posted by Jamie Singer on 09 Feb, 2011 10:33 PM

    I believe this characterization is constrained by client-side limitations, i.e., single-threaded HTTP transfers of 12k files.

  2. Posted by David E. Young (Support Staff) on 10 Feb, 2011 01:37 PM

    Routed to developer.

  3. Posted by Jamie Singer on 11 Feb, 2011 04:57 PM

    This characterization inquiry can be closed.

    After further experimenting with Sierra, I realized my numbers were constrained by my client-side application, not the Sierra back end.

  4. Posted by Yen-Min Huang (Support Staff) on 11 Feb, 2011 07:46 PM

    Closing this issue.

  5. Yen-Min Huang closed this discussion on 11 Feb, 2011 07:46 PM.
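
Jamie's 09 Feb follow-up attributes the numbers to single-threaded HTTP transfers on the client. A minimal Java sketch of issuing transfers from a fixed thread pool instead, assuming Java 11+ for `java.net.http`. `ParallelIngest`, `uploadAll`, and `postXml` are hypothetical names; the endpoint URI and the `application/xml` Content-Type are assumptions, since the thread does not describe Sierra's REST resources or how Ingest.java uploads files.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ParallelIngest {
    private ParallelIngest() {}

    // Runs `upload` for every item on a fixed pool of `threads` workers and
    // returns how many uploads reported success. threads = 1 reproduces the
    // single-threaded client behavior described in the thread.
    static <T> int uploadAll(List<T> items, int threads, Predicate<T> upload)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (T item : items) {
                Callable<Boolean> task = () -> upload.test(item);
                results.add(pool.submit(task));
            }
            int passed = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    passed++;
                }
            }
            return passed;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical single-file upload: POSTs one XML file to a caller-supplied
    // endpoint. The URI and Content-Type are assumptions, not Sierra's documented API.
    static boolean postXml(HttpClient client, URI endpoint, Path file) {
        try {
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("Content-Type", "application/xml")
                    .POST(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2; // treat any 2xx as success
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller might share one `HttpClient` across workers and pass `path -> ParallelIngest.postXml(client, endpoint, path)` as the upload function. Comparing runs with `threads` set to 1 and to a larger value would help show whether the client side is the bottleneck.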