python – In Spark, can data be processed by querying a database before transforming to an RDD?


I am very new to Spark.

My System

  1. Data from Kafka is consumed in Spark.
  2. This data then needs to be processed by querying a database to get the relevant formula for the manipulation.
  3. The processed data is then run through RDD operations – map/aggregate/reduce… – to produce the resultant data.
  4. The resultant data and the processed data are then stored in the DB.

My Query

Is it right to do this in Spark? That is, can we query a database to get a formula for processing the data consumed in Spark?

If not, please guide me on how I should achieve this.
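In case it clarifies the question: the pattern I imagine is opening a database connection once per partition and looking the formula up there, roughly as below (the function, file, and table names are made up; SQLite stands in for the real database). In Spark this function would be passed to `rdd.mapPartitions`, since a database connection cannot be serialized and shipped with a plain `rdd.map` closure.

```python
import sqlite3

def apply_formulas(partition):
    """Process one partition of records: open a DB connection once,
    look up the formula parameters for each record's key, and yield
    the transformed rows. Intended for rdd.mapPartitions in Spark,
    so each executor opens its own connection."""
    conn = sqlite3.connect("formulas.db")  # hypothetical formula database
    try:
        for sensor, value in partition:
            scale, shift = conn.execute(
                "SELECT scale, shift FROM formulas WHERE sensor = ?", (sensor,)
            ).fetchone()
            yield sensor, value * scale + shift
    finally:
        conn.close()

# Sketch of the intended Spark usage:
# transformed = rdd.mapPartitions(apply_formulas)
```

Is this per-partition lookup a reasonable way to do it, or is there a better-suited mechanism (e.g. loading the formulas once and broadcasting them)?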