architectural patterns – Why not just use stream processing for everything?

Apache Kafka has become a standard for event-driven architectures, specifically stream processing. Streaming has always been contrasted with batch processing, whether that’s traditional ETL or something like ML training.

Many architectures show hybrid implementations that support both batch and streaming. But wouldn’t it make sense to just adopt a technology like Kafka for all data ingestion needs? Everything could be streamed in, but that doesn’t mean everything has to be processed in real time. Kafka can retain data for as long as needed, and it is really a type of distributed database, not just a message queue.

Why not simply use Kafka as a “central nervous system” for the entire architecture, with all data sources publishing to Kafka and all consuming applications subscribing to Kafka topics? Any batch process can then be a separate service that pulls data from Kafka when needed.
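To make the idea concrete, here is a toy sketch (plain Python, not Kafka’s actual API) of the pattern: one retained log that a streaming consumer and a batch consumer both read on their own schedules, each tracking its own offset.

```python
import time

class Topic:
    """Toy append-only log standing in for a Kafka topic with long retention."""
    def __init__(self):
        self.records = []  # nothing is deleted, like an indefinite retention policy

    def append(self, record):
        self.records.append((time.time(), record))

    def read_from(self, offset):
        # Consumers track their own offsets, so streaming and batch
        # readers can consume the same data at different times.
        return self.records[offset:]

prices = Topic()

# Every source publishes to the log.
for p in [100, 101, 99, 102]:
    prices.append({"price": p})

# A "streaming" consumer keeps up with the head of the log.
latest = prices.read_from(len(prices.records) - 1)[0][1]

# A batch job is just another consumer that replays from offset 0 when it runs.
history = [r for _, r in prices.read_from(0)]
average = sum(r["price"] for r in history) / len(history)
```

The point is only that “batch” stops being a separate ingestion path; it becomes a consumer with a different read schedule.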

Does anyone do this, or is streaming always added on as a second part of a hybrid architecture?

youtube – Why does my live stream have more Peak Concurrent Views than Unique Views?

I’m looking at the analytics for a live stream I ran in the past to get an idea of how many people watched it. I see some confusing data in the report.

According to the report, on the day of the stream, I had 21 Unique Viewers. However, my Peak Concurrent Viewers stat says that at one point, my stream had 33 simultaneous viewers.


It seems to me that Unique Viewers should be at least 33, since I had that many viewers at one specific moment in time. It might be possible that some viewers were watching on multiple devices simultaneously, but I doubt that explains what 12 of the 33 viewers were doing.

How do I make sense of this apparent discrepancy in my analytics report?

design – How should I handle dealing with a stream of incoming data?

I’m creating a NodeJS application that receives a ton of incoming financial data (prices) through WebSockets, anywhere from 1 to 5 data points per second, which I would then like to send to the front end (built with React) as the data arrives or changes. I will also have internal functions that simulate buy/sell orders, which depend on the current price as well.

Originally, I was planning to use a REST API, making one GET request every second and using memcached to store the current price. Each second, I would send the price stored in memcached to the front end over WebSockets, and if someone created a buy/sell order, I would grab the current price at the top of the function and use that as the order price. I’m not planning on saving the data right now, but if it comes to that, I would either write it to MongoDB (or a better-suited database) as it arrived, or hold it in the cache and batch-insert it every minute or so.

Now that I’ve switched from a REST API to WebSockets, I’m wondering if my idea is still sound. When I connected the WebSocket and started logging the incoming data, I was baffled by how fast it was coming, which made me doubt whether this could even work. Since memcached would be updated and read so frequently, would it be better to just hold a global variable called “Price” and update it as the data comes in? I have absolutely no experience designing a system like this, and I’m doing this as a personal project to learn.
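To make it concrete, the shape I’m imagining is something like this (sketched in Python rather than Node just to keep it short; in my app the cache would be a module-level variable, and all names here are made up):

```python
class PriceCache:
    """Latest-value cache: each tick overwrites the current price in memory."""

    def __init__(self, batch_size=3):
        self.price = None    # the "global variable" holding the current price
        self.buffer = []     # ticks waiting to be batch-inserted into a DB
        self.flushed = []    # stand-in for the database
        self.batch_size = batch_size

    def on_tick(self, price):
        # Called by the websocket handler for every incoming data point.
        self.price = price
        self.buffer.append(price)
        if len(self.buffer) >= self.batch_size:
            # One bulk insert instead of a write per tick.
            self.flushed.extend(self.buffer)
            self.buffer.clear()

    def place_order(self, side):
        # Orders read the price once, at the top, and use it throughout.
        return {"side": side, "price": self.price}

cache = PriceCache()
for tick in [10.0, 10.5, 10.2, 10.4]:
    cache.on_tick(tick)

order = cache.place_order("buy")
```

The buffer/flush part is the “batch insert every minute or so” idea from above.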

Thank you in advance!

Making a YouTube playlist a continuous stream

I have several YouTube playlists of whole LPs, for example Pink Floyd’s The Wall. The problem is that it takes a moment to go from one video to the next, sometimes putting a cut in what should be a continuous stream. This makes me wonder if there is any way to reduce the load time, or some kind of advance loading of the next video. I have found that the YouTube Music app does this pretty well, and I want to enjoy the same experience on the desktop.

linux – How do I resolve the Pandoc error "Cannot decode byte '\xfa': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream"?

I am trying to fetch the HTML content of a particular website with Pandoc, using the following command:

pandoc -s -r html -o c757.html

I have used this same command on other websites and obtained the content in a local file without any problem. But in this particular case it doesn’t work; I get the error:

pandoc: Cannot decode byte '\xfa':
Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

I have also tried using iconv, as the documentation suggests, with something like this:

iconv -t utf-8 | pandoc -s -r html -o c757.html | iconv -f utf-8

And, seeing that this web page is encoded in iso-8859-1, I tried something like this:

iconv -t iso-8859-1 | pandoc -s -r html -o c757.html | iconv -f utf-8

Neither approach works; I always get the same error.
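In case the flag direction is part of my problem: my understanding is that iconv’s -f is the source encoding and -t the target, so the page would presumably have to be converted from iso-8859-1 to utf-8 before pandoc reads it. A minimal check of that direction on a single byte (0xfa is "ú" in iso-8859-1; the curl part is hypothetical):

```shell
# With -f as the source and -t as the target, the pipeline would look like:
#   curl -s "$URL" | iconv -f iso-8859-1 -t utf-8 | pandoc -s -f html -o c757.html
# Checking the flag direction on one byte: 0xfa (octal 372) is "ú" in
# iso-8859-1 and becomes the two bytes c3 ba in UTF-8.
utf8_hex=$(printf '\372' | iconv -f iso-8859-1 -t utf-8 | od -An -tx1 | tr -d ' \n')
echo "$utf8_hex"
```

If that understanding is right, my attempts above, which pass -t iso-8859-1, would be converting in the wrong direction.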


  • OS: Ubuntu 18.04
  • Pandoc:

Using Pandoc from the terminal

project management – Revenue stream options of a closed-source software distributed free of charge

Assume a piece of software is distributed free of charge. It’s closed-source but free, and it has a good user base. These revenue streams can be imagined:

  1. Customization requests and subsequent contracts
  2. Locking some future features and unlocking them only for a subscription fee
  3. …?

What other options are there? Is there any reference I can study? Thanks :]


Mentioned by @Ewan:

  1. Advertisements
  2. Selling anonymized user data, collected with user consent
  3. …?

firebase – Check authentication state using stream not working in Flutter

I’m using GoogleSignIn.

On my splash screen, I’m using a StreamBuilder to check whether the user is logged in, and based on that data I show either the Login Screen or the Home Screen.

When I uninstall and reinstall the app, it shows the Login Screen the first time. After that, it always shows the Home Screen, even if I have logged out of the app.

Below is my code for Splash Screen:

class SplashScreen extends StatefulWidget {
  @override
  _SplashScreenState createState() => _SplashScreenState();
}

class _SplashScreenState extends State<SplashScreen> {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder(
      stream: FirebaseAuth.instance.authStateChanges(),
      builder: (BuildContext context, AsyncSnapshot<User> snapshot) {
        if (snapshot.hasData) {
          return AnimatedSplashScreen(
            splashIconSize: SizeConfig.blockSizeVertical * 16,
            splashTransition: SplashTransition.fadeTransition,
            nextScreen: HomeScreen(),
            splash: kLogoImage,
            duration: 800,
          );
        } else {
          return AnimatedSplashScreen(
            splashIconSize: SizeConfig.blockSizeVertical * 16,
            splashTransition: SplashTransition.fadeTransition,
            nextScreen: LoginScreen(),
            splash: kLogoImage,
            duration: 800,
          );
        }
      },
    );
  }
}

Below is my code for Home Screen:

class HomeScreen extends StatefulWidget {
  @override
  _HomeScreenState createState() => _HomeScreenState();
}

class _HomeScreenState extends State<HomeScreen> {
  final GoogleSignIn googleSignIn = GoogleSignIn();
  User firebaseUser;

  signOut() async {
    await googleSignIn.signOut();
    Navigator.pushReplacementNamed(context, MyRoutes.loginScreen);
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Column(
        children: [
          Center(
            child: Text('Hello'),
          ),
          // Button widget is a reconstruction; the original snippet only
          // preserved the child and onPressed lines.
          TextButton(
            child: Text('Log out'),
            onPressed: () {
              signOut();
            },
          ),
        ],
      ),
    );
  }
}

algorithms – Time complexity of finding median in data stream

I was reading a solution to the problem in the title on LeetCode, and the article says that the time complexity of the following approach is O(n):

  1. set up a data structure to hold the stream values, and insert each new element in its correct place using linear or binary search
  2. return median

My solution is as follows:

import bisect

class MedianFinder:

    def __init__(self):
        # initialize your data structure here
        self.arr = []

    def addNum(self, num: int) -> None:
        idx = bisect.bisect_left(self.arr, num)
        self.arr.insert(idx, num)

    def findMedian(self) -> float:
        if len(self.arr) % 2 != 0:
            return self.arr[len(self.arr) // 2]
        else:
            return (self.arr[len(self.arr) // 2 - 1] + self.arr[len(self.arr) // 2]) / 2

My question is about the time complexity of the addNum method. The binary search takes O(log n) to find the index, but the insert takes O(n). Since the method is called once per element of the stream, is the total complexity O(n^2) or O(n)?
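As I understand it, the usual alternative keeps two heaps (a max-heap of the lower half and a min-heap of the upper half), so each insertion is O(log n) and the whole stream costs O(n log n). A sketch using only the standard library, in case it helps frame the comparison:

```python
import heapq

class HeapMedianFinder:
    """Two-heap median: lo is a max-heap (stored negated) of the smaller half,
    hi is a min-heap of the larger half."""

    def __init__(self):
        self.lo = []  # max-heap via negation
        self.hi = []  # min-heap

    def addNum(self, num: int) -> None:  # O(log n) per call
        heapq.heappush(self.lo, -num)
        # Move the largest of the lower half up, keeping every element
        # of lo <= every element of hi.
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # Rebalance so lo holds an equal count or one extra element.
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findMedian(self) -> float:  # O(1)
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2

mf = HeapMedianFinder()
for n in [5, 2, 8, 1]:
    mf.addNum(n)  # stream so far, sorted: [1, 2, 5, 8]
```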


video streaming – How can I efficiently live stream from multiple cameras to individual streams?

I have a base of low-tech volunteers out there who want to live stream events, simple stuff. They’ve worked out OBS and a GoPro to stream to YouTube, no worries.

We want to have multiple simultaneous streams: multiple cameras connected to one laptop that provide coverage of different parts of the event. NOT multiple cameras feeding one stream, not one camera streaming to multiple platforms, but multiple cameras going to multiple streams on the one platform.

The obvious basic solution is to have each camera on its own laptop (or to run multiple instances of OBS), each with its own stream. This is not an option.

On the other side of the equation, I want to embed the viewers into our website. I’m hoping the solution to the above will allow me to simply put an HTML5 player on the page for each stream, so we can have several on the page.

Bonus segment: if I can get it to come to the website via a mixer (so I can easily overlay data from our database on the stream), I’d be very happy.

Google has a bajillion hits on streaming to multiple platforms from one camera, on using multiple cameras in an OBS scene, and on reviews of streaming services like Netflix (because that’s obviously the same thing, right…). I found one or two articles about ffmpeg that were highly technical and way beyond the ability of the people at an event. Has anyone succeeded in achieving what I’m after? I really don’t care who the intermediate streaming service is, as long as I can embed the stream on our site.