Aggregate data from a huge list under 50ms [Kotlin or Java]

I got this question as a coding challenge and was unable to get it done under 50 milliseconds (my solution takes >100 ms) 😀

Would you please review my code and share any ideas on how to do this within 50 ms?

Problem Description
One of our customers, a multinational company that manufactures industrial appliances, has an internal system to procure (purchase) all resources the company needs to operate. The procurement is done through the company’s own ERP (Enterprise Resource Planning) system.

A typical business process represented by the ERP system is procure-to-pay, which generally includes the following activities:

  • create purchase request
  • request approved
  • create purchase order
  • select supplier
  • receive goods
  • pay invoice

Whenever the company wants to buy something, they do so through their ERP system.

The company buys many resources, always using their ERP system. Each resource purchase can be considered a case, or single instance of this process. As it happens, the actual as-is process often deviates from the ideal to-be process. Sometimes purchase requests are raised but never get approved, sometimes a supplier is selected but the goods are never received, sometimes it simply takes a long time to complete the process, and so on. We call each unique sequence of activities a variant.

The customer provides us with extracted process data from their existing ERP system. The customer extracted one of their processes for analysis: Procure-to-pay. The logfiles contain three columns:

  • case id
  • activity name
  • timestamp

We want to analyse and compare process instances (cases) with each other.

Acceptance Criteria

  • Aggregate cases that have the same event execution order and list the
    10 variants with the most cases.
  • As that output is used by other highly interactive components, we
    need to be able to get the query results in well under 50
    milliseconds.
  • The sample data set is not sorted; please use the timestamp in the
    last column to ensure the correct order.
  • The time required to read the CSV file is not considered part of the
    50 milliseconds specified in the acceptance criteria.
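
Read literally, the acceptance criteria amount to: group rows by case id, order each case by timestamp, treat the resulting activity sequence as the variant, count cases per variant, and keep the 10 largest. A minimal Kotlin sketch of that reading follows. The LogEvent type and the topVariants function are hypothetical names, not part of the challenge, and nothing here is tuned or measured against the 50 ms target.

```kotlin
import java.time.LocalDateTime

// Hypothetical row type; the fields mirror the three log columns.
data class LogEvent(val caseId: String, val activity: String, val timestamp: LocalDateTime)

// Returns the `limit` most frequent variants with their case counts.
// A variant is the time-ordered list of activity names of one case.
fun topVariants(events: List<LogEvent>, limit: Int = 10): List<Pair<List<String>, Int>> =
    events
        .groupBy { it.caseId }                       // case id -> events of that case
        .values
        .map { caseEvents ->
            // sortedBy is stable, so events with equal timestamps are all kept
            caseEvents.sortedBy { it.timestamp }.map { it.activity }
        }
        .groupingBy { it }                           // lists compare by content
        .eachCount()                                 // variant -> number of cases
        .entries
        .sortedByDescending { it.value }
        .take(limit)
        .map { it.key to it.value }
```

Using the list of activity names as the map key avoids building a joined string per case; whether that matters for the timing would need to be measured.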

Sample data (the actual file contains 62,000 rows):

100430035020241420012015;Create purchase order item;2015-05-27 12:44:47.000
100430035020261980012015;Create MM invoice by vendor;2015-07-13 00:00:00.000
100430035020119700012015;Reduce purchase order item net value;2015-02-13 10:24:02.000
100430035020066380012015;Change purchase order item;2015-01-23 09:39:33.000
100430035020232560012015;Change purchase order item;2015-05-11 07:58:29.000
100430031000134820012015;Clear open item;2015-07-28 23:59:59.000
100430035020241250012015;Remove payment block;2015-06-04 16:36:26.000
100430035020193960012015;Enter goods receipt;2015-03-12 20:00:06.000
100430031000151590012015;Clear open item;2015-11-24 23:59:59.000
100430031000129230012015;Post invoice in FI;2015-06-01 12:00:37.000
100430035020228280012015;Create MM invoice by vendor;2015-04-07 00:00:00.000
100430031000113630012015;Clear open item;2015-03-24 23:59:59.000
100430035020260940012015;Enter goods receipt;2015-07-16 15:07:49.000
100430035020244540012015;Create purchase order item;2015-06-02 11:06:11.000
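
For reference, a row in this layout could be parsed as sketched below, assuming fields are separated by semicolons and the timestamp always has the form yyyy-MM-dd HH:mm:ss.SSS, as in the sample. ParsedRow and parseRow are hypothetical names; the CSVReader used in the code further down is not shown, so this is not necessarily how it works.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical row type: case id, activity name, timestamp (column order of the sample).
data class ParsedRow(val caseId: String, val eventName: String, val timestamp: LocalDateTime)

// Assumed timestamp layout, taken from the sample rows.
private val TIMESTAMP_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

// Splits one line on ';' into its three fields and parses the timestamp.
// Throws if the line has fewer than three fields or the timestamp is malformed.
fun parseRow(line: String): ParsedRow {
    val (caseId, eventName, timestamp) = line.split(';', limit = 3)
    return ParsedRow(caseId, eventName, LocalDateTime.parse(timestamp, TIMESTAMP_FORMAT))
}
```

Parsing the timestamp once while reading keeps that cost outside the timed section, since reading the CSV does not count towards the 50 milliseconds.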

My rejected code:

fun main(args: Array<String>) {
    val eventlogRows = CSVReader.readFile("samples/Activity_Log.csv")

    val begin = System.currentTimeMillis()

    // Group events by case, then build one key per case from its time-ordered activity names.
    val grouped = eventlogRows.groupBy { it.caseId }
    val map = hashMapOf<String, Int>()
    grouped.forEach {
        val toSortedSet = it.value.toSortedSet(compareBy { it.timestamp })
        val hash = toSortedSet.joinToString { it -> it.eventName }
        // Parentheses are needed here: `?:` binds more loosely than `+`.
        map[hash] = (map[hash] ?: 0) + 1
    }
    // Variants ordered by number of cases, most frequent first.
    val sortedByDescending = map.entries.sortedByDescending { it.value }
    val end = System.currentTimeMillis()

    println(String.format("Duration: %s milliseconds", end - begin))
}
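
One caveat about the measurement itself: a single timed run right after JVM start-up also includes class loading and JIT compilation, so it can overstate the steady-state cost. A small helper like the following times the work after a few untimed warm-up runs. bestOfMillis is a hypothetical name, and the warm-up and run counts are arbitrary. It is only a rough sketch; a benchmarking harness such as JMH would give more trustworthy numbers.

```kotlin
import kotlin.system.measureNanoTime

// Runs `block` `warmups` times untimed, then `runs` times timed,
// and returns the fastest timed run in milliseconds.
fun bestOfMillis(warmups: Int = 20, runs: Int = 10, block: () -> Unit): Double {
    repeat(warmups) { block() }
    var best = Long.MAX_VALUE
    repeat(runs) { best = minOf(best, measureNanoTime(block)) }
    return best / 1_000_000.0
}
```

The aggregation would be passed in as the block, with the CSV read once beforehand so that file I/O stays out of the timing.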