AI Research Literature Survey

Research Goals

Overview of our active research interests.

Inference Optimization

Making AI models run faster (down the street).

GenAI Market Evolution

Will AI spawn a revolution or evolution?

Transformer Optimizations

Change your AI engine into a faster Transformer.

AI Phone Home

Do big models really belong on a small phone?

Zero Multiplication

Multiplication has a bad reputation for running slow.

Quantization

Measure once, cut a billion times.

AI PCs

AI is coming to a desktop near you.

Model Pruning

Take the clippers to cut the links down to size.

Code Optimizations

Who knew that AI inference was just coding?

Logarithmic Models

Your AI's band should be called The Logarithmics.

Tokenization

A token of our appreciation for tokenizer research.

Hashing

A quick way to make a real hash of AI models.

Autoregression

AIs are regressing back to their early model-hood.

Embeddings

Getting stuck in the quagmire of embeddings research.

Advanced Mathematics

Abandon all hope, ye who enter here.

Multi-AI

Are the AI engines working together?

Token Pruning

Pruning the input text tokens.

Layer Pruning

Like cutting a piece of AI cake, except layer-wise.

Head Pruning

When are two heads worse than one?

Length Pruning

The third type of model pruning is length-wise.

Knowledge Distillation

Should models percolate or distill their knowledge?

Approximation

Making your AI models approximately intelligent.

FFN Pruning

Your Transformer without any Feeding Forward.

Shallow Decoder

A good AI is a shallow AI.

Norm Pruning

Apparently it is normal to prune.

Width Pruning

AI models are slimmable; who knew?

Matrix Algebra

Peas and carrots go together like matrices and vectors.

Zero Skipping

Everyone knows that zeroes should be skipped.

Early Exit

Your AI might escape by early exiting.

Weight Precomputations

Weights just sit around, so let's precompute.

Inference Loop Optimizations

Going round in circles, inferencing all the way.

Probabilistic Algorithms

Your AI engine will probably be smart.

Bloom Filters

How to make an AI model in full bloom.

Dual Pruning

Pruning left-to-right and top-to-bottom, in two dimensions.

Zero Padding

Padding with zeros works, but fastest is zero zero-padding.

Parameter Sharing

The kindest AI engines always share parameters.

Layer Fusion

With so many layers, AI models are very fusing.