group
filter
join
pivot

unique values for each column
min and max for each column (esp date)
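
A minimal sketch of the two column summaries above, assuming rows are plain objects and dates are ISO strings (so string comparison orders them correctly). The `profile` helper and the sample rows are hypothetical, not part of any existing API.

```javascript
// Hypothetical sample rows; dates are ISO strings so "<" and ">" order them.
const rows = [
  { country: "china", date: "2016-03-01" },
  { country: "brazil", date: "2015-11-20" },
  { country: "china", date: "2016-01-15" },
];

// Collect the unique values, min and max of every column in one pass.
function profile(rows) {
  const summary = {};
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      if (!summary[column]) {
        summary[column] = { unique: new Set(), min: value, max: value };
      }
      const s = summary[column];
      s.unique.add(value);
      if (value < s.min) s.min = value;
      if (value > s.max) s.max = value;
    }
  }
  return summary;
}

const summary = profile(rows);
console.log([...summary.country.unique], summary.date.min, summary.date.max);
```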


g = groupby("country", "type")

g["deficiencies"].sum() / g.count() // mean deficiencies per group
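
One way the sum / count expression above could work, sketched over plain row objects. The `meanBy` helper, the sample rows, and the JSON-encoded group keys are assumptions made for illustration only.

```javascript
// Hypothetical inspection rows; field names follow the notes above.
const rows = [
  { country: "china", type: "oil tanker", deficiencies: 1 },
  { country: "china", type: "oil tanker", deficiencies: 0 },
  { country: "china", type: "cargo ship", deficiencies: 1 },
  { country: "brazil", type: "cargo ship", deficiencies: 1 },
];

// Group rows by the given keys, keeping a running sum and count per group.
function meanBy(rows, keys, field) {
  const groups = new Map();
  for (const row of rows) {
    const id = JSON.stringify(keys.map((k) => row[k]));
    const g = groups.get(id) || { sum: 0, count: 0 };
    g.sum += row[field];
    g.count += 1;
    groups.set(id, g);
  }
  // sum / count for each group, keyed by the encoded group id
  const means = {};
  for (const [id, g] of groups) means[id] = g.sum / g.count;
  return means;
}

const means = meanBy(rows, ["country", "type"], "deficiencies");
console.log(means);
```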

port -> field -> aggregate

port -> type -> field -> aggregate


result["china"]["deficiencies"] // returns sum
result.china.deficiencies

// type is a sub-aggregate
result["china"]["oil tanker"]["deficiencies"] // returns sum

result["china"]["type"]
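
A rough sketch of the nested lookup above, where each country holds its own sum and each type under it holds a sub-aggregate. The `aggregate` helper and the sample rows are hypothetical; storing the sum next to the type keys is just one possible layout.

```javascript
// Hypothetical inspection rows.
const rows = [
  { country: "china", type: "oil tanker", deficiencies: 1 },
  { country: "china", type: "oil tanker", deficiencies: 0 },
  { country: "china", type: "cargo ship", deficiencies: 1 },
  { country: "brazil", type: "cargo ship", deficiencies: 1 },
];

// Build country -> type -> field, with a sum stored at both levels.
function aggregate(rows, field) {
  const result = {};
  for (const row of rows) {
    if (!result[row.country]) result[row.country] = { [field]: 0 };
    const country = result[row.country];
    country[field] += row[field]; // country-level sum

    if (!country[row.type]) country[row.type] = { [field]: 0 };
    country[row.type][field] += row[field]; // type-level sub-aggregate
  }
  return result;
}

const result = aggregate(rows, "deficiencies");
console.log(result.china.deficiencies);
console.log(result["china"]["oil tanker"]["deficiencies"]);
```

Note that this layout mixes field names and type names in one object, so a type called "deficiencies" would collide with the sum. That is one argument for the array return explored next.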



array return

array of reduced values
indices match array of hierarchical categories

result = groupby("country")
result.levels() // => 1 (tiers, dimensions)

result.sum("deficiencies", "detained"); // => [[6, 5, 10, 7], [2, 0, 1, 2]]

result.china.sum("deficiencies", "detained"); // => [6, 2]
result.china.sum("deficiencies"); // => 6

result.china.values("deficiencies") // => [1, 1, 0, 1, 1, 0, 1, 0, 1]

result.groups() // => ["china", "brazil", "new zealand", "korea"]

result.china.groups() // => ["oil tanker", "cargo ship"] (needs a second level, e.g. groupby("country", "type"))
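
A minimal sketch of the array return for a single-level groupby, assuming plain row objects. The `groupby` function here is a stand-in written for this note, and the sample rows are made up; the point is only that the indices of `sum()` line up with `groups()`.

```javascript
// Hypothetical inspection rows.
const rows = [
  { country: "china", deficiencies: 1, detained: 0 },
  { country: "brazil", deficiencies: 1, detained: 1 },
  { country: "china", deficiencies: 1, detained: 1 },
  { country: "korea", deficiencies: 0, detained: 0 },
];

function groupby(rows, key) {
  const index = new Map(); // group name -> position in the output arrays
  const members = []; // members[i] holds the rows of group i
  for (const row of rows) {
    if (!index.has(row[key])) {
      index.set(row[key], members.length);
      members.push([]);
    }
    members[index.get(row[key])].push(row);
  }
  const sumField = (f) => members.map((g) => g.reduce((s, r) => s + r[f], 0));
  return {
    groups: () => [...index.keys()],
    // one field -> flat array; several fields -> one array per field
    sum: (...fields) =>
      fields.length === 1 ? sumField(fields[0]) : fields.map(sumField),
  };
}

const result = groupby(rows, "country");
const names = result.groups(); // ["china", "brazil", "korea"]
const sums = result.sum("deficiencies", "detained"); // [[2, 1, 0], [1, 1, 0]]
console.log(names, sums);
```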


ports = f.groupby("port")

ports.values("country") // => [["china", "china", "china"], ["brazil", "brazil"], ["new zealand"], ["korea"]]

ports.distinct("country") // => ["china", "brazil", "new zealand", "korea"]
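
A small sketch of how `values` and `distinct` could behave when grouping by port. The free functions, the port names, and the rows are all hypothetical; here `values` returns one array per group and `distinct` flattens and de-duplicates them.

```javascript
// Hypothetical rows; port names are invented for the example.
const rows = [
  { port: "shanghai", country: "china" },
  { port: "shanghai", country: "china" },
  { port: "santos", country: "brazil" },
  { port: "shanghai", country: "china" },
  { port: "santos", country: "brazil" },
  { port: "auckland", country: "new zealand" },
];

// One array of field values per group, in first-seen group order.
function values(rows, key, field) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row[field]);
  }
  return [...groups.values()];
}

// Distinct field values across all groups, in first-seen order.
function distinct(rows, key, field) {
  return [...new Set(values(rows, key, field).flat())];
}

const perPort = values(rows, "port", "country");
const countries = distinct(rows, "port", "country");
console.log(perPort, countries);
```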

object return

each call to an aggregation function returns a single value (on a leaf node)

## Priorities

* speed
* simple, intuitive interface
* deployable directly to production
* produce results usable in machine learning applications
* 1M - 10M row data sets (for now)

## Example Tasks

should at least be able to create limited prototypes for each of these

* domain category task (Tailwind)
	data: pin data, with domain and category
	pivot (two dimensional groupby)
		dimensions: domain, category
		groups: distinct values
		membership: equals (if the value equals a distinct value, it is a member)
		reduction: count
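
A sketch of the two dimensional count pivot described for this task, assuming pins are plain objects with `domain` and `category` fields. The `pivotCount` helper and the sample domains are hypothetical.

```javascript
// Hypothetical pin rows; the domains are placeholders.
const pins = [
  { domain: "a.example", category: "food" },
  { domain: "a.example", category: "food" },
  { domain: "a.example", category: "travel" },
  { domain: "b.example", category: "travel" },
];

// Groups are the distinct values of each dimension; membership is equality.
function pivotCount(rows, rowKey, colKey) {
  const rowNames = [...new Set(rows.map((r) => r[rowKey]))];
  const colNames = [...new Set(rows.map((r) => r[colKey]))];
  const counts = rowNames.map(() => colNames.map(() => 0));
  for (const r of rows) {
    counts[rowNames.indexOf(r[rowKey])][colNames.indexOf(r[colKey])] += 1;
  }
  return { rowNames, colNames, counts };
}

const pivot = pivotCount(pins, "domain", "category");
console.log(pivot.rowNames, pivot.colNames, pivot.counts);
```

The `indexOf` lookups keep the sketch short; a real implementation aiming at 1M+ rows would likely map values to indices with a hash instead.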

* board recommendation task (Tailwind)
	data: pin data, with board name and pin descriptions
		board name, vocabulary item, and occurrences
	pivot
		dimensions: board_name, description
		groups:
			board_name: distinct values
			description: all vocabulary words, non-partitioning (membership in more than one group is allowed)
				membership: contains (if the description contains the vocabulary word, it is a member of the group)
		reduction: count/sum

	* dimensionality reduction
	* direct usage
		create new sparse vectors from source
		similarity on sparse vectors with existing set

	text analysis
		parsing
		tokenization
		lemmatization/stemming
	document vector per board (bag-of-words (sparse?))
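
A sketch of the non-partitioning "contains" membership from the pivot above: each pin can fall into several vocabulary groups. The `vocabularyCounts` helper, the sample pins, and the vocabulary are hypothetical, and tokenization is a naive lowercase whitespace split with no lemmatization or stemming.

```javascript
// Hypothetical pins and vocabulary.
const pins = [
  { board_name: "recipes", description: "easy vegan pasta" },
  { board_name: "recipes", description: "vegan chocolate cake" },
  { board_name: "travel", description: "pasta tour of italy" },
];
const vocabulary = ["vegan", "pasta", "cake"];

// Count, per board, how many pin descriptions contain each vocabulary word.
function vocabularyCounts(pins, vocabulary) {
  const boards = [...new Set(pins.map((p) => p.board_name))];
  const counts = boards.map(() => vocabulary.map(() => 0));
  for (const pin of pins) {
    const tokens = new Set(pin.description.toLowerCase().split(/\s+/));
    const b = boards.indexOf(pin.board_name);
    vocabulary.forEach((word, w) => {
      // a pin may be a member of several word groups at once
      if (tokens.has(word)) counts[b][w] += 1;
    });
  }
  return { boards, counts };
}

const bag = vocabularyCounts(pins, vocabulary);
console.log(bag.boards, bag.counts);
```

Each row of `counts` is a dense bag-of-words vector for one board; a sparse layout would store only the non-zero entries.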

* user game matrix creation task (Crunch Magic)
	data: user gaming data, with userid, gameid and hours played
	sculpt: turn JSON into row data
		each entry in games array is expanded into a new row:
			userid, gameid, hours_played
			waylon, skyrim, 72
			waylon, horizon, 50
			janell, stardew, 40
	pivot:
		dimensions: gameid, userid
		groups: distinct values (sparse)
		reduction: sum of hours played

after framing, we have a dimensionality reduction task, which can be
approached with Alternating Least Squares (ALS)
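
A sketch of the sculpt and pivot steps for this task. The JSON shape (a `games` array per user), the `sculpt` and `pivotSum` helpers, and the nested `Map` used as a sparse matrix are all assumptions; the ALS step itself is not shown.

```javascript
// Hypothetical source JSON: one object per user with a games array.
const users = [
  {
    userid: "waylon",
    games: [
      { gameid: "skyrim", hours_played: 72 },
      { gameid: "horizon", hours_played: 50 },
    ],
  },
  { userid: "janell", games: [{ gameid: "stardew", hours_played: 40 }] },
];

// sculpt: expand each entry in a games array into its own row.
function sculpt(users) {
  const rows = [];
  for (const user of users) {
    for (const game of user.games) {
      rows.push({
        userid: user.userid,
        gameid: game.gameid,
        hours_played: game.hours_played,
      });
    }
  }
  return rows;
}

// pivot: gameid -> userid -> summed hours, storing only cells that occur.
function pivotSum(rows) {
  const matrix = new Map();
  for (const { gameid, userid, hours_played } of rows) {
    if (!matrix.has(gameid)) matrix.set(gameid, new Map());
    const byUser = matrix.get(gameid);
    byUser.set(userid, (byUser.get(userid) || 0) + hours_played);
  }
  return matrix;
}

const rows = sculpt(users);
const matrix = pivotSum(rows);
console.log(rows.length, matrix.get("skyrim").get("waylon"));
```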

* inspection data visualization task (Navis)







## Common Visualizations Data Structure

### Stacked bar chart
A display of two-dimensionally pivoted data. One axis (typically x) is the first
reduction dimension, the other axis is the variable reduced over, and each bar is
split into segments by the second reduction dimension.
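
A sketch of the data structure a stacked bar chart might consume, built from a two dimensional sum pivot. The `stackedBars` helper, the `categories`/`series` output shape, and the sample rows are hypothetical and not tied to any particular charting library.

```javascript
// Hypothetical inspection rows.
const rows = [
  { country: "china", type: "oil tanker", deficiencies: 2 },
  { country: "china", type: "cargo ship", deficiencies: 1 },
  { country: "brazil", type: "cargo ship", deficiencies: 3 },
  { country: "china", type: "oil tanker", deficiencies: 1 },
];

// xKey picks the bars, stackKey splits each bar, field is the value summed.
function stackedBars(rows, xKey, stackKey, field) {
  const categories = [...new Set(rows.map((r) => r[xKey]))];
  const stacks = [...new Set(rows.map((r) => r[stackKey]))];
  const series = stacks.map((name) => ({
    name,
    data: categories.map(() => 0), // one value per bar, aligned with categories
  }));
  for (const r of rows) {
    const s = stacks.indexOf(r[stackKey]);
    const c = categories.indexOf(r[xKey]);
    series[s].data[c] += r[field];
  }
  return { categories, series };
}

const chart = stackedBars(rows, "country", "type", "deficiencies");
console.log(JSON.stringify(chart));
```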




## Platform Layers

notebook (webnotebook)
---------
data management (dataship)
---------
analysis (frame and webnn)
---------
deployment
---------
