case classes

ate my RAM

Grebennikov Roman / findify.io / @public_void_grv

Scalar 2018

About me

twitter: @public_void_grv email: grv@dfdx.me

Product recommendations

  • Large-scale, online
  • Personalized
  • Keep hot dataset in RAM <-- uh oh

thousand users

million users

Recipe for disaster

Large JVM heaps:

  • Cause GC hiccups: latency spikes
  • Operationally more costly

case class Pageview(id: String, time: Long)
					

Quiz


case class Pageview(id: String, time: Long)

Pageview(id = "p123", time = 0L)
					

How much heap does Pageview use?

  • ~24 bytes
  • ~48 bytes
  • ~72 bytes <-- correct!
  • ~96 bytes

dissecting case class

jol: live!

Pageview memory layout

Case class memory footprint

Object header12
String reference4
String object header12
String array reference4
String array header12
String char array size4
Char array4 or 8
millis8

80% overhead

Collections of case classes

case class ::[B](val head: B, var tl: List[B]) extends List[B] {
  override def tail : List[B] = tl
  override def isEmpty: Boolean = false
} 
  • reference to head
  • reference to next node

million pageviews


val pageviews = List.fill(1000000)(_ =>
  Pageview(
    id = Random.nextString(4),
    time = Random.nextLong()
  )
)
					

jamm: track real memory usage on HotSpot

million pageviews

collection of primitives

Welcome to the JVM world








Going native


class Pageview {
  std::string id;
  int64_t ts;
};

int main() {
  std::cout << sizeof(Pageview);
}

Native memory layout

Native vectors

Perks of native memory layout

  • Almost no RAM overhead
  • Better CPU cache usage

Can it be achieved within JVM?

Yes, but

  • Project valhalla: far future
  • ChronicleMap: complex API, ton of boilerplate
  • MapDB: map only, persistent
  • scala-offheap: dead
  • FlatBuffers: code generation

What if?

  • API: like scala collections
  • No boilerplate
  • Acceptable overhead

Scala-packed

Scala-packed

demo: packing pageviews

How it works: serde

  • basic type encoders/decoders
  • shapeless magic for case classes

How it works: case classes

  • case class Pageview(id: String, ts: Long)
  • String :: Long :: HNil
  • Iterate over HList
  • Typeclasses!

Typeclass derivation levels of doom


case class Easy(id: String, ts: Long)

case class Regular(id: String, easy: Easy)

case class Hardcore(id: String, other: List[Hardcore])

sealed trait Nightmare
case class Leaf(value: String) extends Nightmare
case class Node(left: Nightmare, right: Nightmare) extends Nightmare
					

Shapeless LabelledTypeClass

Implement typeclasses for:

  • HNil
  • H, T <: HList
  • CNil (optionally)
  • L, R <: Coproduct (optionally)

Integrating with collections

  • Builder for PackedList
  • CanBuildFrom
  • pain

How it works: lists

  • Allocate a block of Array[Byte]
  • Dump serialized Pageview data
  • Expand buffer as needed
  • Works well* with primitives

List[Int]

demo: list of ints

Maps

Packed* limitations

  • Immutable: in-place element update is hard
  • List/Map semantics is different
  • Not all scala.collection.* methods are optimized
  • Boxing for primitive values

Benchmark: List[Int], RAM

Benchmark: List[Pageview], RAM

Benchmark: List/CPU


Benchmark  listType  Mode  Cnt      Score     Error  Units
filter       Vector  avgt  100  11291.284 ± 117.354  ns/op
filter         List  avgt  100   6801.318 ±  31.750  ns/op
filter   PackedList  avgt  100  14877.129 ± 470.324  ns/op
foreach      Vector  avgt  100   3462.084 ±  79.947  ns/op
foreach        List  avgt  100   3061.652 ±  30.891  ns/op
foreach  PackedList  avgt  100   9579.002 ±  80.827  ns/op
					

Benchmark: Maps/CPU


Benchmark            mapType  Mode  Cnt       Score      Error  Units
build1000                Map  avgt  100  174976.058 ± 4784.830  ns/op
build1000          PackedMap  avgt  100  324695.548 ± 3652.811  ns/op
lookupExisting           Map  avgt  100      48.297 ±    0.267  ns/op
lookupExisting     PackedMap  avgt  100     128.332 ±    1.251  ns/op
lookupNonExisting        Map  avgt  100      40.084 ±    0.808  ns/op
lookupNonExisting  PackedMap  avgt  100      25.322 ±    0.751  ns/op
					

scala-packed and Findify

  • Using in production: not yet
  • Bugs: kek
  • Shapeless =

Links

Questions?

Insults100200300400500
Holy war100200300400500
Algorithms100200300400500
Cargo cult100200300400500
Emotional burn-out100200300400500
Haskell100200300400500
a slide by @nikitonsky