CTWC Stream Transparency/Cropping effect

Thread in 'Research & Development' started by Brandon Arnold, 18 Oct 2015.

  1. Hi all,

    I wonder how they set the NES Tetris scene up in the Tetris Championship stream. Here's an example of it on their practice stream video. They cropped out the playfield and stats, and overlaid them on the face cam with a transparency effect. Any idea what kind of production software can be used for that?

  2. If I'm not mistaken, they have a pretty powerful algorithm, made by one of the people behind the event (Trey), that recognizes the playfield. Because it knows the full game state, it can display any kind of information at the best possible quality. They could even change the shape of the blocks and whatnot, but they decided to keep it classic, since it's the Classic TWC I suppose ^^'
  3. Huh. I wonder if The Tetris Company is okay with people doing that. I mean, normal people, not the CTWC crew...

  4. Curious how they do it: not so much the algorithm itself (that seems pretty simple to me), but the actual implementation.
    Is it a plugin for OBS, or are they running an external program that reads the captured image?
  5. t2k


    Hey guys, I developed the software used to produce the CTWC 2015 stream - glad to see the appreciation for it here.

    It was built using a modular/node-based programming platform called "Salvation", which I have been developing for the past 15 years (in C++). In that time I've built up a large collection of "modules" in Salvation which handle things like live video capture, image loading, video file loading, output to OpenGL displays, processing with OpenGL GLSL shaders, and a ton of math/control value manipulations. Salvation is also the core/engine behind the "Ai" media server sold by Avolites Media out of the UK - http://www.avolites.com/products/video/media-servers/ex8

    All of the additions specific to processing live video from NES Tetris were added in the months leading up to the competition, in the form of a handful of new modules which deal with NES text recognition, game field recognition, and the next piece box. There is no direct support in Salvation for streaming to twitch/hitbox/etc, so for the stream tests I ran OBS on the same machine and did a window capture of the Salvation output. At the CTWC event, we output our display over HDMI at 1280x720. That was split 3 ways: 1 to the projector, 1 to a monitor for the commentators, and 1 to a machine provided by Hitbox, which captured the HDMI signal and streamed it to their servers using XSplit.

    There are no pixels from the NES that end up in the final composited output - after analysis, all of the graphics and text are regenerated/drawn in OpenGL at varying sizes, depending on the layout (all 4 players or 1 on 1).

    In the planning of the tournament we did submit screenshots and demo videos of the layout to Blue Planet and got their approval for it.

    The long bar counter / drought counter was the last thing to be completed and was added during the day on Saturday (1 day before the bracketed tournament on Sunday).

    I could go into way more detail, but I don't know how much you guys want to know =) If you have more questions, feel free to ask.
  6. Love it! Nice work, and thanks for the info. This is a lot of the same stuff I'm getting started on for TGM. Salvation, the platform your Tetris image-analysis code is written in, is pretty much all proprietary, right?

    I'm working on a way to record moves, then both display them and perform analysis on them. My preliminary research points to libVLC or libUVC for capturing the frames (from a UVC stream or a video file) and the C++ library CImg for analyzing them. I've also got a good start on a TGM engine in Scala to calculate the best next move, but I need data to work with, hence the first step of being able to record the moves.

    Anything you can give me to help?

  7. t2k


    Yeah, it's all proprietary. I'm not familiar with the libraries you mentioned, but any open source thing that simplifies the process of accessing live frames is a good idea. In my software (which runs primarily on Windows) I use DirectShow for live video input, which works well enough with the capture thingy I used (Hauppauge Live USB 2), although I did have to add a bit of code to do annoying crap in DirectShow 'graph' land, like adding crossbars.. or something; thankfully I don't remember it that well, hah.

    I don't have any code I can share with you, but once you get the capture and display stuff going, the analysis should be much more interesting to write. The nice thing about NES Tetris is that the game is on a black background. TGM, I believe, has various backgrounds, and extracting the game field from the background there may be more complicated. I'm able to look at a single pixel per block and figure out which block type it is (there are 3 possible block styles in NES Tetris, and the colors change with each level). I imagine you'd need to scan a bit more than 1 pixel to be sure you're actually detecting a block, and not falsely detecting something from the background. Looking forward to seeing what you come up with; post videos of your progress somewhere =)
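    To make the single-pixel idea concrete, here is a minimal Python sketch (not t2k's code, which was never shared). It assumes the playfield has already been located (`origin` and `cell` size in capture pixels) and uses placeholder reference colours in the hypothetical `EXAMPLE_PALETTE`; the real per-level NES colours would have to be measured from your own capture. Each cell is read from one pixel at its centre and snapped to the nearest reference colour, with near-black treated as empty.

```python
from typing import Dict, List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]

# Placeholder colours for ONE level; measure real values from your capture.
EXAMPLE_PALETTE: Dict[str, RGB] = {
    "style_a": (255, 255, 255),
    "style_b": (0, 88, 248),
    "style_c": (60, 188, 252),
}

BACKGROUND_MAX_BRIGHTNESS = 40  # assumed cut-off for the black background


def dist2(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance between two RGB colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_pixel(px: RGB, palette: Dict[str, RGB]) -> Optional[str]:
    """Closest block style for one pixel, or None if it looks like background."""
    if max(px) <= BACKGROUND_MAX_BRIGHTNESS:
        return None
    return min(palette, key=lambda name: dist2(px, palette[name]))


def read_board(frame: Sequence[Sequence[RGB]], origin: Tuple[int, int], cell: int,
               palette: Dict[str, RGB], cols: int = 10,
               rows: int = 20) -> List[List[Optional[str]]]:
    """Sample a single pixel at the centre of every cell of the playfield."""
    x0, y0 = origin
    board: List[List[Optional[str]]] = []
    for r in range(rows):
        row: List[Optional[str]] = []
        for c in range(cols):
            x = x0 + c * cell + cell // 2
            y = y0 + r * cell + cell // 2
            row.append(classify_pixel(frame[y][x], palette))
        board.append(row)
    return board
```

    On a busier background (TGM), the single centre pixel would be the first thing to replace with a small averaged patch.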
  8. zid


    As I mentioned to steadshot, for TGM it'd probably be easiest to rescale the image so that each block ends up as 1 pixel in the resulting image. Then the image processing code remains fixed and simple, and the mess and guts live in the scaler code.
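    A minimal sketch of this rescale-first architecture, assuming the playfield crop is already isolated and stored as rows of RGB tuples. `box_downscale` is a hypothetical helper using a plain area-average (box) filter, so each output pixel is the mean of every capture pixel inside one block; any other resampler could be swapped in without touching the downstream code.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def box_downscale(img: Sequence[Sequence[RGB]], out_w: int = 10,
                  out_h: int = 20) -> List[List[RGB]]:
    """Area-average (box filter) downscale: one output pixel per playfield block.

    The rest of the pipeline can then treat the playfield as a tiny
    out_w x out_h image, whatever the capture resolution was.
    """
    in_h, in_w = len(img), len(img[0])
    assert in_h >= out_h and in_w >= out_w, "crop must be at least 1 pixel per block"
    out: List[List[RGB]] = []
    for r in range(out_h):
        y0, y1 = r * in_h // out_h, (r + 1) * in_h // out_h
        row: List[RGB] = []
        for c in range(out_w):
            x0, x1 = c * in_w // out_w, (c + 1) * in_w // out_w
            box = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Integer mean of each channel over the whole box, edges included.
            row.append(tuple(sum(p[i] for p in box) // len(box) for i in range(3)))
        out.append(row)
    return out
```

    Note that a box filter includes each block's edges in the average, which is exactly the blurring concern raised in the reply below.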
  9. t2k


    That seems like it should work if the colors you are matching against vary by an amount that is greater than the error that will be introduced when the edges of neighboring blocks get blurred together during the rescale process. If I were writing this, my gut tells me to do it the same way I did the NES analyzer, but read a chunk of pixels at the block center and take the average value, rather than reading just a single pixel.
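    A sketch of the alternative described here, with hypothetical helper names: average only the interior of each block, skipping a `margin` of pixels on every side, so the blurred seams between neighbouring blocks never enter the average. It assumes a known playfield `origin` and a square `cell` size in capture pixels.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Image = Sequence[Sequence[RGB]]


def inner_mean(img: Image, x0: int, y0: int, cell: int, margin: int) -> RGB:
    """Mean colour of one cell, ignoring `margin` pixels on every side."""
    assert 0 <= margin and 2 * margin < cell, "margin must leave at least one pixel"
    pixels = [img[y][x]
              for y in range(y0 + margin, y0 + cell - margin)
              for x in range(x0 + margin, x0 + cell - margin)]
    return tuple(sum(p[i] for p in pixels) // len(pixels) for i in range(3))


def read_cells(img: Image, origin: Tuple[int, int], cell: int, margin: int,
               cols: int = 10, rows: int = 20) -> List[List[RGB]]:
    """Interior-averaged colour of every cell in a cols x rows playfield."""
    ox, oy = origin
    return [[inner_mean(img, ox + c * cell, oy + r * cell, cell, margin)
             for c in range(cols)]
            for r in range(rows)]
```

    The resulting colours can then be matched against reference colours exactly as with a single pixel, just with less noise.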
  10. zid


    But you are reading a chunk of pixels and taking an average, that's what the rescaler does. Except now you can use any rescaler method you like, and the rest of the code stays the same, it's just an architectural trick.
  11. t2k


    A rescaler will blend the edges of your blocks together. Go ahead and do it however you want and let me know how it works, I'm just telling you how I'd do it =)
  12. Thank you for your input. Was about to get to you after CTWC/your maxout video, but I guess I can directly speak here! :D

    As a matter of fact, some other people are working, or have worked, on a way to capture the field. The TGM1 problem came up, but it's not really a problem, since the background is much darker than the actual pieces (with the exception of the dark blue J, sometimes), so it's still possible to do some math here. Also, the background issue is TGM1-specific: TAP and TI don't have backgrounds within the playfield, which eases the following problem a bit:

    Some of us are mainly considering converting an invisible Tetris session into an actually visible one. The best shots that we have at it are looking at the piece flashing (this happens right when it gets locked), and looking at the color of the blocks when they clear, among other things.

    One huge problem with looking at pixels: line clearing doesn't have a nice animation at all. In TGM1, the blocks pop out and fall to the bottom. In TAP and TI, the blocks explode and you have pixels corresponding to their colors for a brief moment. In some potentially fast downstacking sequences (like a center well in Shirase/Death), there could be "dead" pixels to look at, which would badly confuse the recognition. In @steadshot's (pretty good) algorithm there is a threshold that generally takes care of that, but we didn't get into limit cases like the crazy downstacking I talked about. Do you happen to have an idea about cases like these?

    Also, on a related note, I was wondering how you capture text. This one is a tough cookie, because it's pretty difficult to tell, say, an S8 from an S7 fast enough. A lot of pixels need to be looked at live, and sometimes that even gets disrupted by the line clear pixels. Jago (@K) did a lot of work here, but even then the model has limits, because you have to tell the algorithm which pixels to look at. I imagine there are fewer problems in your case, but the digits are still small, so they're hard to tell apart. Which solution did you come up with?

    Last question (maybe more to come ;) ): with the signal you're receiving, is every block located at the exact same pixel position? Do you need to calibrate?

    Thanks in advance for your answers, CTWC was a very entertaining watch and the layout was pretty fantastic :)
  13. Hey there, Qlex--way cool. This is where I think video far exceeds static images. If there is a lot of variation in the past 10 frames of the playfield, you can reject all of them, until you get (say) five frames that have very little variation (during the piece spawn delay).
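    A minimal sketch of that frame-rejection idea in Python. The board representation (a grid of small integers, 0 = empty) and the `StableBoardFilter` name are assumptions for illustration; `need` plays the role of the "five quiet frames" and `max_changed_cells` is a tolerance for capture noise. Any frame that differs too much from the previous one resets the count, so animation frames are never reported.

```python
from collections import deque
from typing import Deque, List, Optional

Board = List[List[int]]  # hypothetical: 0 = empty cell, >0 = block colour id


def cells_changed(a: Board, b: Board) -> int:
    """Number of cells that differ between two recognized boards."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


class StableBoardFilter:
    """Report a board only after it stays (nearly) unchanged for `need` frames."""

    def __init__(self, need: int = 5, max_changed_cells: int = 0) -> None:
        self.need = need
        self.max_changed = max_changed_cells
        self.recent: Deque[Board] = deque(maxlen=need)
        self.last_reported: Optional[Board] = None

    def push(self, board: Board) -> Optional[Board]:
        if self.recent and cells_changed(self.recent[-1], board) > self.max_changed:
            self.recent.clear()  # too much variation (e.g. line-clear animation): start over
        self.recent.append(board)
        if len(self.recent) == self.need and board != self.last_reported:
            self.last_reported = board
            return board
        return None
```

    Each stable board is reported once; the next report only comes after the board changes and settles again.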

    I have been thinking of using something like Tesseract for the timer OCR. For bigger text with fewer variations like the rank (S9, m, etc), you can take a monochrome screen grab of those cropped letters and process them as a "convolution" matrix on the real-time feed.
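    One way to read that "convolution" idea, sketched in Python: threshold the cropped rank area to monochrome, then slide each stored template over it and count agreeing pixels. Strictly this is a brute-force sliding-window correlation rather than a true convolution, and the `threshold`/`best_match` helpers and the 128 cut-off are assumptions, not part of any existing tool.

```python
from typing import List, Sequence, Tuple

Bitmap = Sequence[Sequence[int]]  # 1 = ink, 0 = background


def threshold(gray: Sequence[Sequence[int]], level: int = 128) -> List[List[int]]:
    """Turn a greyscale crop (0-255 values) into a monochrome bitmap."""
    return [[1 if v >= level else 0 for v in row] for row in gray]


def best_match(region: Bitmap, template: Bitmap) -> Tuple[float, int, int]:
    """Slide `template` over `region`; return (score, x, y) of the best overlap.

    The score is the fraction of template pixels that agree with the region.
    """
    th, tw = len(template), len(template[0])
    rh, rw = len(region), len(region[0])
    best = (-1.0, 0, 0)
    for y in range(rh - th + 1):
        for x in range(rw - tw + 1):
            agree = sum(region[y + j][x + i] == template[j][i]
                        for j in range(th) for i in range(tw))
            score = agree / (th * tw)
            if score > best[0]:
                best = (score, x, y)
    return best
```

    To recognise a rank, run `best_match` once per stored template (one per rank) and keep the template with the highest score.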

    Also interested to hear Trey's perspective.

  14. Muf


    Moving to R&D.
  15. The method I've thought up so far (but which I haven't had the time to implement yet) works something like this:
    You keep a history of the last few frames, and you also keep track of the level counter and the next piece. As soon as the level counter changes (either from a line clear or just the standard increment by 1), you check those last few frames for the first "usable" one (meaning the algorithm detects the piece and doesn't mistake the locking J for background or whatever). This way you avoid the fireworks from the line clear, and you're also not forced to look for the lock flash, which might not be there at all if the recording dropped those frames. If it's a line clear, ignore the following increment of the level counter; it's just the next piece coming in. Fast combos aren't hard to deal with either: when you know both the type of line clear (single, double, triple, tetris) and the piece that caused it, you automatically know the placement, because there's only one possibility. A line clear followed by a very fast placement is the only potential problem, but I can think of a few ways to interpolate that placement regardless, e.g. by comparing the playfield a few frames after the placement (but still before the placement after that, so the explosion will have faded away) with the playfield right after the line clear: the difference is the placed piece. During level stops, you could track changes in the block count.
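    A rough Python sketch of the look-back step, under stated assumptions: boards are flattened tuples, the hypothetical `usable` callback decides whether a recognized board is clean (here, "contains no explosion marker"), and the caller invokes `ignore_next_change()` once it knows a line clear happened, so the following +1 (the next piece spawning) is skipped. Classifying line clears is left to the caller.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Board = Tuple[int, ...]  # hypothetical flattened playfield, one int per cell


class LevelTriggeredReader:
    """Look back through recent frames whenever the level counter changes."""

    def __init__(self, usable: Callable[[Board], bool], history: int = 8) -> None:
        self.usable = usable  # e.g. "piece recognized, no explosion pixels"
        self.boards: Deque[Board] = deque(maxlen=history)
        self.last_level: Optional[int] = None
        self.skip_next = False

    def ignore_next_change(self) -> None:
        """Call after a detected line clear: the next +1 is just the next piece."""
        self.skip_next = True

    def push(self, level: int, board: Board) -> Optional[Board]:
        """Feed one frame; returns a clean pre-change board when the counter changes."""
        changed = self.last_level is not None and level != self.last_level
        self.last_level = level
        snapshot = None
        if changed:
            if self.skip_next:
                self.skip_next = False
            else:
                # Walk back from the newest frame before the change.
                for old in reversed(self.boards):
                    if self.usable(old):
                        snapshot = old
                        break
        self.boards.append(board)
        return snapshot
```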

    As Brandon pointed out, this is not particularly difficult either. My idea is to extract the contours of each digit and match them against pre-calculated contours (within a certain tolerance). I think that would easily run fast enough for live streaming, but I'm curious how quick Tesseract would be.
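    A self-contained sketch of contour matching with a tolerance, assuming glyphs are already thresholded to 0/1 bitmaps. Here the "contour" is simply the set of ink pixels touching the background, and the score is a symmetric chamfer-style distance (mean distance to the nearest contour point). OpenCV has far more sophisticated contour tools, but nothing below depends on it; the names and the default `tolerance` are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]
Bitmap = Sequence[Sequence[int]]  # 1 = ink, 0 = background


def contour(bitmap: Bitmap) -> List[Point]:
    """Ink pixels with at least one 4-neighbour that is background or off-image."""
    h, w = len(bitmap), len(bitmap[0])
    pts: List[Point] = []
    for y in range(h):
        for x in range(w):
            if not bitmap[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not bitmap[ny][nx]:
                    pts.append((x, y))
                    break
    return pts


def contour_distance(a: List[Point], b: List[Point]) -> float:
    """Worst of the two mean nearest-point distances (0.0 for identical contours)."""
    if not a or not b:
        return math.inf

    def one_way(p: List[Point], q: List[Point]) -> float:
        return sum(min(math.dist(s, t) for t in q) for s in p) / len(p)

    return max(one_way(a, b), one_way(b, a))


def recognise(glyph: Bitmap, templates: Dict[str, List[Point]],
              tolerance: float = 1.0) -> Optional[str]:
    """Name of the closest template contour, or None if none is within tolerance."""
    pts = contour(glyph)
    best_name, best_d = None, math.inf
    for name, tpl in templates.items():
        d = contour_distance(pts, tpl)
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d <= tolerance else None
```

    Template contours are computed once, ahead of time, so each live frame only pays for extracting and comparing the on-screen contours.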
  16. t2k


    Regarding the fast downstacking question, I think I am not familiar enough with TGM1 to offer any useful advice.. if you point me to a relevant video though I'd take a look.

    Thinking more generally, if there is something characteristic about the piece lock-down mechanism, you could use that to maintain your own internal representation of the board, and simply process line clears yourself when you detect that lines have been completed. So rather than scanning the field, you just keep track of every piece placement and maintain your own board. Looking at this video, it seems like detecting that white flash would actually be pretty easy. If that flash always occurs when a piece is placed, this approach could probably be quite reliable. When you detect that piece placements have led to a line clear, you could just turn your detection algorithm off for the appropriate number of frames so that the line clear animation doesn't confuse you, update your internal representation of the board, and carry on. For the actual piece in play, you would still need to scan the field, but that should be easy enough.
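    A minimal sketch of the "maintain your own board" approach. It assumes something upstream (for example a white-flash detector) reports which cells a piece occupied when it locked; the hypothetical `TrackedBoard` then clears completed rows itself and switches field scanning off for `clear_anim_frames` frames, a per-game constant you would have to measure from footage.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]  # (row, col), row 0 = top


class TrackedBoard:
    """Own board model, updated from lock events instead of re-scanning the field."""

    def __init__(self, cols: int = 10, rows: int = 20) -> None:
        self.cols, self.rows = cols, rows
        self.grid: List[List[int]] = [[0] * cols for _ in range(rows)]
        self.blind_frames = 0  # frames left during which field scanning is off

    def lock(self, cells: Iterable[Cell], colour: int, clear_anim_frames: int = 0) -> int:
        """Add a locked piece (colour must be non-zero), clear full rows,
        and return how many rows were cleared."""
        for r, c in cells:
            self.grid[r][c] = colour
        kept = [row for row in self.grid if not all(row)]
        cleared = self.rows - len(kept)
        # Rows above a cleared line fall down; empty rows refill from the top.
        self.grid = [[0] * self.cols for _ in range(cleared)] + kept
        if cleared:
            self.blind_frames = clear_anim_frames
        return cleared

    def scanning_enabled(self) -> bool:
        """Call once per frame; False while the line-clear animation plays."""
        if self.blind_frames > 0:
            self.blind_frames -= 1
            return False
        return True
```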

    To scan all of the text out of the NES Tetris screen (lines, score, game level, and all 7 piece counts), I first tried using the Tesseract library. It was a pain in the ass to get working, and once I did have it "working", it was only accurate some of the time. So it's not really something I would recommend. I ended up just writing a brute-force character recognition algorithm that can only read the digits 0-9. It must know the exact rectangle where the digits are, and it must know how many digits it is expecting to find. The characters are 8x8, so for each on-screen character there are 64 comparisons against each possible character it might be, and each candidate is scored. For example, the game level is 2 characters long: 10 possible characters per on-screen character * 2 characters * 64 comparisons = 1280 comparisons. The character that matches most closely is assumed to be a match. I also assign a 'confidence' to the winning character, so that when the game is on other screens, it understands that even though '1' is the best match against a black background, the score is too low to be considered a good match. This is actually how I know that the game screen is up: by checking the confidence of the game level OCR.
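    A Python sketch of a brute-force 8x8 digit reader in the spirit of this description. The exact scoring rule was not published, so the one below is an assumption: +1 for each pixel lit in both the screen and the glyph, -1 for each mismatch, 0 where both are dark. With that rule a blank area scores negative for every digit and the glyph with the fewest lit pixels (typically '1') still wins, which reproduces the "best match, but too low to trust" behaviour described above. `read_number` and the font format are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Glyph = Sequence[Sequence[int]]  # rows of 0/1 pixels; 1 = lit (after thresholding)


def score_glyph(cell: Glyph, glyph: Glyph) -> int:
    """64 comparisons: +1 both lit, -1 where they disagree, 0 both dark."""
    score = 0
    for y in range(8):
        for x in range(8):
            a, b = cell[y][x], glyph[y][x]
            if a != b:
                score -= 1
            elif a:
                score += 1
    return score


def read_number(region: Glyph, n_digits: int, font: Dict[str, Glyph]) -> Tuple[str, float]:
    """Read `n_digits` fixed-width 8x8 characters from a pre-configured rectangle.

    Confidence is the weakest character's score divided by its glyph's lit-pixel
    count: 1.0 is a perfect match; near or below 0 means "probably not the game".
    """
    text, confidence = "", 1.0
    for i in range(n_digits):
        cell = [row[8 * i: 8 * i + 8] for row in region[:8]]
        ch, score = max(((c, score_glyph(cell, g)) for c, g in font.items()),
                        key=lambda pair: pair[1])
        lit = sum(sum(row) for row in font[ch])
        text += ch
        confidence = min(confidence, score / lit)
    return text, confidence
```

    With a full 0-9 font and a 2-character field, this performs exactly the 10 * 2 * 64 = 1280 pixel comparisons counted above.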

    If TGM1 characters are reasonably low-res and fixed-width, this is probably the best way to go about it. You could customize this further to reject non-matching characters more quickly, but it seemed to run fast enough even on my 4-year-old laptop, so I didn't bother to improve it.

    In terms of overall calibration.. the rectangles used to scan/read all of this information were configured ahead of time, and I thankfully did not have to adjust any of them from one NES to the next. When configuring a rectangle for text recognition, I kept an eye on the confidence indicator and nudged the rectangle 1 pixel wider/taller/left/right/etc until it seemed like I'd found the highest confidence when properly scanning the number. Because the pixel aspect ratio of the NES turns squares into slightly wider rectangles, my OCR algorithm actually had to handle un-scaling each character as it was performing the comparison.. but I don't think that's something you'll have to deal with.
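    The manual "nudge by one pixel and watch the confidence" routine can be automated. This sketch assumes you have some `confidence(rect)` function (for example the OCR confidence of a known number currently on screen) and tries every position and size within `radius` pixels of the starting rectangle, keeping the best. `nudge_calibrate` is a hypothetical name.

```python
from itertools import product
from typing import Callable, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in capture pixels


def nudge_calibrate(start: Rect, confidence: Callable[[Rect], float],
                    radius: int = 2) -> Tuple[Rect, float]:
    """Try every rectangle within `radius` pixels of `start` (position and size)
    and return the one with the highest confidence, plus that confidence."""
    x, y, w, h = start
    best_rect, best_conf = start, confidence(start)
    offsets = range(-radius, radius + 1)
    for dx, dy, dw, dh in product(offsets, repeat=4):
        cand = (x + dx, y + dy, w + dw, h + dh)
        if cand[2] <= 0 or cand[3] <= 0:
            continue  # skip degenerate rectangles
        c = confidence(cand)
        if c > best_conf:
            best_rect, best_conf = cand, c
    return best_rect, best_conf
```

    With `radius=2` this is 5^4 = 625 confidence evaluations, so it is meant to be run once during setup, not on every frame.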
  17. Great information, Trey. Thank you. I won't even try the Tesseract route, then--the numbers and ranks are all we're interested in, which even simplifies this a bit.

  18. Yeah, I'm considering using a kd-tree to store the data and optimize the character recognition algorithm, but from what Jago had said it looks pretty slow to try comparing 20x20 pixels. I'll give it a thorough try!

    That was an example of a more extreme case where pixels can be a problem. I thought of looking at neighboring pixels, or just taking more pixels into account for the average. Also, this is a very extreme case; most of the time it's actually okay :)

    And yeah knowing the state of the field beforehand definitely helps!
  19. t2k


    Hmmm interesting video, I see how it goes very fast and would be hard to detect... the lockdown white flash seems detectable but it doesn't actually do that when you do a line clear. Unfortunately I couldn't frame advance through the youtube video to see exactly what's going on during the line clear but if you want to send me a .mov/.mp4/something I can view offline I'll take a look. As far as the character detection goes - I wonder whether you could just scale their font down to 8x8 and do the comparison at a lower resolution? There is probably enough distinctiveness to each character that at 8x8 you'd still be able to tell what's what... in fact even without doing a downscale you could compare at 20x20, but only consider every 3rd column & row...
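    A sketch of the "every 3rd column and row" shortcut, assuming glyphs are 20x20 0/1 bitmaps. `make_matcher` is a hypothetical helper that subsamples each template once, then compares only 7x7 = 49 pixels per candidate instead of 400.

```python
from typing import Callable, Dict, Sequence, Tuple

Bitmap = Sequence[Sequence[int]]  # 0/1 pixels, e.g. a thresholded 20x20 glyph
Small = Tuple[Tuple[int, ...], ...]


def subsample(bitmap: Bitmap, step: int = 3) -> Small:
    """Keep every `step`-th row and column (no averaging)."""
    return tuple(tuple(row[::step]) for row in bitmap[::step])


def make_matcher(font: Dict[str, Bitmap],
                 step: int = 3) -> Callable[[Bitmap], Tuple[str, int]]:
    """Subsample every template once; the returned function matches one glyph."""
    small_font = {ch: subsample(g, step) for ch, g in font.items()}

    def match(glyph: Bitmap) -> Tuple[str, int]:
        small = subsample(glyph, step)

        def agreement(ch: str) -> int:
            # Count sampled pixels where glyph and template agree.
            return sum(a == b for ra, rb in zip(small, small_font[ch])
                       for a, b in zip(ra, rb))

        best = max(small_font, key=agreement)
        return best, agreement(best)

    return match
```

    The catch: a stroke thinner than the step can fall between sampled rows or columns and vanish, so the step must stay below the thinnest stroke in the font. Averaging each 3x3 block instead of picking one pixel avoids that at a small extra cost.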
  20. I'll try to give you a video when I get home, but you are indeed correct: during a line clear in TAP, the blocks of the piece that aren't part of the cleared lines flash, while the others are used in the line clear sequence and don't flash. If all the blocks are cleared (for example, a tetris), there is no flash at all. I should also look up what happens in TGM3.

    The only thing that can be useful is the line clear animation: you can distinguish some of the blocks by their colors (but only one block out of two is used for the animation; it does a little checker pattern and the rest of the blocks simply disappear).

    I'll look into it in more detail. In any case, the advice about scaling the font down could be pretty useful, but I don't know which algorithms can do the job. Are they generally fast?

    EDIT: A high-quality video I can give off the top of my head: https://www.dropbox.com/s/urp7w5vulgsr6eb/sick_death_recovery_gm.flv?dl=0

    ... I don't mean to brag, it was the only Death video I had of good quality ^^'
    Last edited: 6 Nov 2015
