Rendering Multiple NES Tetris thingos

XaeL · 18 Mar 2020

@Jago / K
Post your progress in here.

Hi, just chiming in here,

Also, I've already built a NesTris99 implementation:
Here's 16+ players playing over the internet

And here's 16+ games made from just scraping video straight off twitch:

I'm glad that real programmers are doing it properly.

@Jago
For the Nestris99 stuff it does the following:
1) Convert from any video source (screen cap, or direct from capture card) to game state in json. You could for example have one laptop with 4 capture cards and then 4 daemons running.
2) Send it off via network
3) Render in Unity3D
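
A minimal sketch of steps 1 and 2, assuming a hypothetical message layout. The key names (`player`, `score`, `lines`, `level`, `field`) are illustrative only and are not the project's actual schema; the resulting string is what a daemon would hand to whatever network transport is in use.

```python
import json

def encode_state(player, score, lines, level, field):
    """Serialize one scanned frame to a JSON string (hypothetical schema)."""
    # field: 20 rows, each a string of 10 cell codes in the range 0-3
    return json.dumps({
        "player": player,
        "score": score,
        "lines": lines,
        "level": level,
        "field": field,
    })

def decode_state(message):
    """Inverse of encode_state, as the renderer side might use it."""
    return json.loads(message)

empty_field = ["0" * 10 for _ in range(20)]
msg = encode_state("player1", 12345, 10, 18, empty_field)
```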

Right now Nestris99 can be played over the internet or anywhere else; we've used it for live qualification twice.

The only thing left to implement is kick/banning clients, and password protection / rooms.

There are already a few modes:
* Nestris99 (highest score wins, you all start at the same time)
* Qualification (Timed mode where highest score wins)
* Free play (just displays all fields)

Also, the server sends a sync signal to clients in Nestris99 mode, so they can hear when to press start. It then queues each client's state so that the games appear to start simultaneously, even though they actually start within a 5-10 second window across the world.

K · 27 Mar 2020

Just saw your comment in the other thread by chance, but i'll not reply there.
Effectively we are working on the same kind of solution.

XaeL said: ↑

It is definitely pretty trivial

It depends on your target accuracy. I just took some random examples from your videos, but I see a lot of artifacts that I wouldn't want to see in the solution I'm working on for CTEC:

This is a falling I piece:
This is a falling O piece :

Furthermore, I don't have an "unlimited amount of calculation power": the OpenCV Python program is currently running on a Raspberry Pi 3 hooked to the console. I also send data through the network, but JSON is just a waste of bandwidth. I encode each frame into a byte array, bringing each message down to around 80 bytes.
Right now I'm just finishing some stuff on the Unity program, then I'll work on the frame analysis again. There is a lot of room for improvement, and I aim to get things running on a Raspberry Pi Zero, which is slower and single core.
My visualisation app is also in Unity, but I did not add any animation or playfield position switching because the team wants things to stay static.

So far, here is a POC with 18 games running at the same time on my laptop:

Even if it's the same replay from a file sent through the network, the Unity app can maintain 60fps and there is a lot of room for optimization.
That screenshot is from last week or so. Now the digits are displayed and a specific "curtain drop" animation has been added. The CTEC logo animation is almost finished too.

K · 27 Mar 2020

Yo,
Just to clarify my previous post: there's no animosity in it, I just wanted to point out that computer vision accuracy isn't "pretty trivial".
I like what you did so far about the online mode. Providing a solution for all the online tournament running atm is really cool.

XaeL · 31 Mar 2020

I didn't get any notification; eugh

Those artifacts are due to deinterlacing. You can guess what an interlaced stream does to an average of 5px used for guessing block color.
In a tournament setting you can control the capture card / deinterlacing. This is run by non-tech-savvy internet users, so the data source can be non-ideal. Garbage in, garbage out.
The streams that don't show those artifacts are obviously either from an emulator or a properly deinterlaced output.

My field data is also exactly 80 bytes. I store it as json internally and can output it as either json or bitpacked. JSON is great if you're writing a generic pack to pipe to, say, a web browser or something.
I'm guessing you're also doing 4bits per cell (primary secondary white or black). I was considering further optimization like field diffs or active piece + previous field, but didnt feel it was necessary at this point.
You can further optimize by only tracking active piece and sending field once per piece, scanning score/field/lines once per piece etc. Atm i scan every frame which is expensive since OCR is the most expensive part.

`The OpenCV python program is currently running on a raspberry Pi 3 hooked to the console`

What are you using OpenCV for? I found that it was faster to just have a 256x224 size image and compare pixels. For text detection i did my own rudimentary template matching (same as opencv, i'm assuming its similar speed).
For C-like speed, if you're doing things in python you can use `numba`, which JIT-compiles to native machine code. If you're literally piping it into and out of OpenCV, then it's already almost native code the entire way.
If you're using OpenCV and grabbing from a capture card, then you are all good. My approach supports capturecard/screencapture/windowcapture,
which are progressively slower (especially screen/windowcapture, where the resolution is way bigger than the 256x224 required, so i have to add an expensive nearest-neighbor downscaling stage)
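
For illustration, a nearest-neighbor downscale to 256x224 could be sketched as below. This is a pure-Python version written for clarity, assuming the frame is a list of pixel rows; a real pipeline would presumably do the same index arithmetic on numpy arrays or inside OpenCV.

```python
def downscale_nearest(image, out_w=256, out_h=224):
    """Nearest-neighbor downscale of a frame given as a list of pixel rows."""
    in_h = len(image)
    in_w = len(image[0])
    # Precompute which source column feeds each output column.
    cols = [x * in_w // out_w for x in range(out_w)]
    # For each output row, pick the matching source row and sample the columns.
    return [[image[y * in_h // out_h][c] for c in cols] for y in range(out_h)]
```

Each output pixel is a single lookup with no blending, which is why it is the cheapest scaling option, but it still touches every output pixel on every frame.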

Running it off a pi zero is definitely ambitious. I can see a nice tournament setup of a cheap capture card + pi zero + wifi card hooked up together, then everything streaming to the unity display server.

As per Tetris Concept tradition i'd say our things are 99% similar to each other (i'm using python/opencv + unity too). It wouldnt be hard for either of us to add the other's functionality.

But my project's pretty dead coz no one uses it. There was a big hit last year when it was released but it fell by the wayside. I think if i added a proper automated ladder that might help.
I'm glad that you have a real tournament that would actually use it. There's a lot of cool tech that you guys do (e.g. custom carts) that people don't really notice.

Future extensions for my project are :
1) emulator support. You can read the data straight out of emulator so you don't have to do OCR which is expensive
2) proper ladder multiplayer support. It only supports qualification/KOTH atm.

For your use case it would be nice to use Unity3d's 3d effects.
I use 3d blocks but with orthographic camera. You can then do cool shots (think TGM 3 Ti intro scene) pans very easily.

You can also add custom animations. for example blocks exploding or such. No one really pays attention during an 20+ field screen anyway, so it would be good to add random effects (say 5-10% of the time) to draw peoples attention, and also make it obvious which players are post transition. I know the TO just want 20 screens statically, but i think automatically determining a "main" game or main two games to make slightly larger when people hit transition would be more interesting to watch. I mean Look at my actual qualifier video that i posted above. What were the things you noticed/didnt notice? was a live scoreboard good, or would it be better in a different screen? Did your eyes just gaze at whoever was transitioning? What could you do to aid it?

Muf · 31 Mar 2020

XaeL said: ↑

For your use case it would be nice to use Unity3d's 3d effects.
I use 3d blocks but with orthographic camera. You can then do cool shots (think TGM 3 Ti intro scene) pans very easily.

You can also add custom animations. for example blocks exploding or such. No one really pays attention during an 20+ field screen anyway, so it would be good to add random effects (say 5-10% of the time) to draw peoples attention, and also make it obvious which players are post transition. I know the TO just want 20 screens statically, but i think automatically determining a "main" game or main two games to make slightly larger when people hit transition would be more interesting to watch. I mean Look at my actual qualifier video that i posted above. What were the things you noticed/didnt notice? was a live scoreboard good, or would it be better in a different screen? Did your eyes just gaze at whoever was transitioning? What could you do to aid it?

I strongly think that 3D effects will just detract from the retro aesthetic. What's the last NES game that had 3D effects? I can't remember either.

As for keeping attention, I don't think adding visual fireworks (literally) will make the presentation more clear to the viewer. I think it will just make it more chaotic with so many fields in view already. I think there are two subtle effects that could work:
- Flash the level box with the NES level up sound when transition occurs
- Flash the score box with some other retro sound effect when a new high score occurs

Other than that, you have to remember that a real tournament setting is very different from just a demo video showing random names. Every player has their own story and people have favourites and players they want to see succeed, so they will gravitate towards watching those players, while ignoring names they don't recognise. I think having 8 fields on screen is the sweet spot: for CTEC we are planning to have 8 qualifying stations that are level 18 only, and only show those on stream. The high level players will be playing on those stations, and because of the speed, stations will cycle players more often, and it's more likely exciting games will be shown.

XaeL · 5 Apr 2020

Just a few things on OCR optimization (from my own github discussions):

Level 1 optimization
* I'm guessing you don't scan piece distribution.
* OCR -> text is the most expensive thing.
* You can ocr score only once per piece placed. (probably need to scan all 6, or you can use math to figure out how many digits based on previous score and current level)
* You can ocr lines only if score changed
* You can skip OCR of all the digits in lines, just get last digit on line clear
* You can skip OCR of level; just increment every 10 lines

This cuts down OCR'ing 3+6+2 (11) digits ->
6 digits on every piece
7 digits on every line clear
7 digits on every level up.

Level 2 optimization
Then more optimization, you can change the order:
1) Scan last digit of lines on every piece placed, and the last 2 digits of score.
2) if lines++, you can calculate all the score digits except for the last two (softdrop)
This means you only have to scan 1 digit of lines on every piece, and 2 digits of score
this is 3 digits per piece, instead of 11 digits/frame.
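
A minimal sketch of the Level 2 idea, using the standard NES Tetris line-clear values (40/100/300/1200 times level + 1). It assumes the drop points awarded for a single piece are always below 100, so the scanned last two digits are enough to pin down the full score; the function names are hypothetical.

```python
# Base points for clearing 0-4 lines, multiplied by (level + 1).
LINE_CLEAR_BASE = {0: 0, 1: 40, 2: 100, 3: 300, 4: 1200}

def lines_delta(prev_last_digit, new_last_digit):
    """Lines cleared by this piece, from the last digit of the lines counter."""
    return (new_last_digit - prev_last_digit) % 10

def reconstruct_score(prev_score, level, cleared, last_two_digits):
    """Rebuild the full score from the previous score and two scanned digits."""
    base = prev_score + LINE_CLEAR_BASE[cleared] * (level + 1)
    # Drop points: the smallest non-negative amount that makes the
    # last two digits of the total match what was scanned.
    drop_points = (last_two_digits - base) % 100
    return base + drop_points
```

For example, with a previous score of 1000 at level 0, a single cleared and `52` scanned as the last two digits, the reconstructed score is 1052.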

For piece distribution, i currently have two methods:
1) scan all digits every frame (expensive, 21 digits)
2) scan top of field (assume 0 dropped frames) to determine piece that spawned (checking 4-6 cells for non-black)

For "piece has been placed" i have one method:
scan a 2x4 grid of cells every frame.
If they were all blank last frame, and are now not blank, its a new piece.
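
The check above can be sketched as follows, assuming the field is a 20x10 grid of cell codes with 0 meaning blank. The position of the 2x4 window (top two rows, columns 3-6) is an assumption for illustration.

```python
def spawn_cells(field, top=0, left=3, height=2, width=4):
    """Return the 2x4 block of cells where pieces are assumed to spawn."""
    return [field[r][left:left + width] for r in range(top, top + height)]

def is_blank(cells):
    """True if every cell in the block is empty (code 0)."""
    return all(c == 0 for row in cells for c in row)

def piece_spawned(prev_field, cur_field):
    """New piece: spawn area was blank last frame and is not blank now."""
    return is_blank(spawn_cells(prev_field)) and not is_blank(spawn_cells(cur_field))
```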

Let me know your thoughts (i haven't implemented half the stuff, because performance was acceptable. You're using a RaspPi though so hopefully these help )

K · 9 Apr 2020

XaeL said: ↑

My field data is also exactly 80 bytes. I store it as json internally and can output it as either json or bitpacked. JSON is great if you're writing a generic pack to pipe to, say, a web browser or something.
I'm guessing you're also doing 4bits per cell (primary secondary white or black). I was considering further optimization like field diffs or active piece + previous field, but didnt feel it was necessary at this point.
You can further optimize by only tracking active piece and sending field once per piece, scanning score/field/lines once per piece etc. Atm i scan every frame which is expensive since OCR is the most expensive part.

My full frame message is 80 bytes (21 digits, next and playfield contents); actually I use 2 bits per block. I don't track the active piece either: it takes processing time, and so far it's too process-intensive for Python.
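
As a rough illustration of 2 bits per block, the 200 playfield cells fit in 50 bytes. The sketch below covers only the playfield part of a message; the bit order and function names are assumptions, and the digits and next piece would need their own encoding.

```python
def pack_field(cells):
    """Pack cell codes (0-3) four to a byte, most significant pair first."""
    if len(cells) % 4:
        raise ValueError("cell count must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(cells), 4):
        a, b, c, d = cells[i:i + 4]
        out.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(out)

def unpack_field(data):
    """Inverse of pack_field: expand each byte back into four cell codes."""
    cells = []
    for byte in data:
        cells.extend(((byte >> 6) & 3, (byte >> 4) & 3, (byte >> 2) & 3, byte & 3))
    return cells
```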

XaeL said: ↑

What are you using OpenCV for? I found that it was faster to just have a 256x224 size image and compare pixels. For text detection i did my own rudimentary template matching (same as opencv, i'm assuming its similar speed).
For C-like speed, if you're doing things in python you can use `numba`, which JIT-compiles to native machine code. If you're literally piping it into and out of OpenCV, then it's already almost native code the entire way.
If you're using OpenCV and grabbing from a capture card, then you are all good. My approach supports capturecard/screencapture/windowcapture,
which are progressively slower (especially screen/windowcapture, where the resolution is way bigger than the 256x224 required, so i have to add an expensive nearest-neighbor downscaling stage)

I thought I could use OpenCV to do all those sexy neural network thingies, but it's so expensive that I mostly ended up using matchTemplate.
I didn't know about numba; I'll check it out because I'm extensively manipulating numpy arrays.

As you can read, there is no room for creativity. Things have to stay as close as possible to the original so as not to confuse people.

So far my OCR program doesn't have much optimization: I wanted it to be able to transcribe the content of any frame image into a fully human-readable form without context (previous frame history). It's important to separate concerns in the pipeline. The server at the other end of the pipeline can handle game logic and display coherence when artifacts happen. That's why, if you write a good OCR program, to some extent it should be able to grab any source (capture card / emulation windows).

I'm not going to describe all the optimisations you can do, but for instance if you track only the last digit of each of the distribution stats (7 digits), you can then determine what changed on that frame:
- which active block just spawned
- any changes to the score, top, level and lines digits

If you know the shape of the active block that is supposed to "interact" with the playfield, you know that all the other blocks in the stack aren't supposed to "change" or glitch. The game behavior is pretty simple, so you can correct a lot of abnormal input provided by the OCR.

another example, you can detect the line clear tetris animation by analyzing the playfield content over time, instead of relying on the "flashing screen"..

XaeL · 10 Apr 2020

Right, spot on about those optimizations. It's a different set of optimizations for a different set of tradeoffs.
My particular set reduces OCR cost to 3 digits, at the cost of assuming there are no dropped frames and that field data is always correct. This gives you the lowest possible runtime.

Your set (7 distribution + 7 score digits) is 14 digits, but the benefit is you can calculate active piece with 100% accuracy even if the spawn frame(s) are dropped

I'm of course assuming that scanning the field is essentially 0 cost, and the cost of converting a 7x7 pixel array to a digit is expensive (due to template matching).
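
To make the cost concrete: matching one 7x7 glyph against 10 digit templates means 490 pixel comparisons per digit. A rudimentary matcher might look like the sketch below, which scores by sum of absolute differences. This is one simple option and not necessarily the metric either program uses.

```python
def match_digit(glyph, templates):
    """Return (digit, score) for the template closest to the glyph.

    glyph and each template are equal-sized grids of brightness values;
    the score is the sum of absolute differences, so lower is better.
    """
    best = None
    for digit, template in templates.items():
        diff = sum(
            abs(g - t)
            for glyph_row, template_row in zip(glyph, template)
            for g, t in zip(glyph_row, template_row)
        )
        if best is None or diff < best[1]:
            best = (digit, diff)
    return best
```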

But as you said, both of our ocr programs don't have much optimization.

K said: ↑

That's why, if you write a good OCR program, to some extent it should be able to grab any source (capture card / emulation windows).

Right; my current ocr program can grab anything: OpenCV/Screencap/Video. As long as it's de-interlaced i haven't had any problems. It supports multiple modes for detecting piece stats/field/stats, so you can choose which tradeoff you want.

@Jago, which capture card did you use for your rasp-pis? linux support for capture cards is quite sparse and even sparser for rasp-pi?

K · 10 Apr 2020

Sorry, I didn't really explain the OCR part, but there are actually a lot of optimizations in the digit recognition because it takes a lot of processing time.
At first I was using a tree with pixel comparison:

But it's not reliable when you have to deal with inconsistent sources. Even at the main event, we had to deal with signal problems (even with a pixel-perfect capture card) and cluttered PC configurations.
The capture device we are using is a basic UVC composite-to-USB dongle:

They are rather cheap, but the downside is 30fps at best,
with this kind of quality:

As you can see, the white digits are rather stable, but the red digits from the statistics are very inconsistent due to composite artifacts. You can't simply rely on a tree to "read" the digits. That's why I ended up using the matchTemplate() function extensively in OpenCV. Even with that, it's a huge bottleneck that required "tricks". But in the end I'm able to read all the information in one pass: 35 digits + Next + playfield content.
I also had a lot of trouble with the level color patterns because you can't rely on the console's output palette. So I had to work out a dynamic way to determine the block colors instead of using a lookup table. Altogether it took me months to get something reliable running on a Pi 3.
Again, the goal was to completely separate what the OCR "sees" from what the server analyses and corrects in the pipeline.
Now, back to optimization of the OCR part: of course for the Pi Zero I'll drop all unnecessary digits and work with a 1-frame history to avoid unnecessary template matching calculations.
I can surely lower the matchTemplate function calls to around 10% of the current implementation.

XaeL · 10 Apr 2020

K said: ↑

Sorry, I didn't really explain the OCR part, but there are actually a lot of optimizations in the digit recognition because it takes a lot of processing time.
At first I was using a tree with pixel comparison:
View attachment 1817
Click to expand...

Clever tree, but i was about to say you can't rely on single-pixel accuracy for non-white pixels, and even white pixels can sometimes be iffy. You did say that next. Another point
is that if you're using pure python you can further improve it by wrapping it in numba to get near-native performance. Most of the speedup comes from checking 5 pixels instead of 490;
but without numba the difference might look more like a 50 pixel check vs a 490 pixel check.

But it's not reliable when you have to deal with inconsistent sources. Even at the main event, we had to deal with signal problems (even with a pixel-perfect capture card) and cluttered PC configurations.
The capture device we are using is a basic UVC composite-to-USB dongle:
View attachment 1818
they are rather cheap but the downside is 30fps at best.
Click to expand...

These are the standard "EZCap" style capture cards. they are a lottery but most of them can do 60p; you get 30i and de-interlace to 60p.
Quite a few classic tetris members have that card, and are able to get 60p.

with that kind of quality :
View attachment 1819
As you can see, the white digits are rather stable, but the red digits from the statistics are very inconsistent due to composite artifacts. You can't simply rely on a tree to "read" the digits. That's why I ended up using the matchTemplate() function extensively in OpenCV. Even with that, it's a huge bottleneck that required "tricks". But in the end I'm able to read all the information in one pass: 35 digits + Next + playfield content.
Click to expand...

I also use my own matchtemplate for digits, but i don't template-match the red numbers, since there are just too many hacks required. Scanning 8 squares at the top of the playfield and determining the active piece was how i got around it.
If you're using custom roms anyway, just make piece stats white. Red always comes out trash in composite.

I also had a lot of trouble with the level color patterns because you can't rely on the console's output palette. So I had to work out a dynamic way to determine the block colors instead of using a lookup table.
Click to expand...

The way i did it was part of my calibration is to get colors from the stats window. Then when i convert from field -> color, i match against [white, black, color1, color2] -> [0,1,2,3]. That's why i use 2 bits per cell.
You can see in the calibrator that the J and Z pieces get checked. Preview is done pixel perfectly with just a 3 pixel lookup (look at the three red dots)
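
A minimal sketch of that field -> color matching, assuming the four reference colors come from calibration as RGB tuples. The RGB values below are made-up placeholders, and squared RGB distance is just one simple choice of metric.

```python
def nearest_palette_index(pixel, palette):
    """Return the index (0-3) of the palette entry closest to the pixel."""
    def dist2(a, b):
        # Squared Euclidean distance in RGB space.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(palette)), key=lambda i: dist2(pixel, palette[i]))

# Order follows the post: [white, black, color1, color2] -> [0, 1, 2, 3].
# color1/color2 would be sampled from the stats window during calibration.
palette = [(255, 255, 255), (0, 0, 0), (60, 90, 250), (90, 180, 250)]
```

Because the two level colors are sampled from the live capture instead of a fixed table, the same code keeps working when the palette shifts between consoles or capture devices.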

Altogether it took me months to get something reliable running on a Pi 3.
Click to expand...

It took me around a week to get something reliable on PC, and after that most of my efforts have been to get calibration working within a few clicks.
I think my existing library would fail spectacularly on rasppi, but i am now moving from a "throw more cores at it" approach to smart, single-threaded scanning,
due to quite a bit of collaboration with Yobi9, who uses a macbook; and macs don't support multiprocess properly.

Again, the goal was to completely separate what the OCR "sees" from what the server analyses and corrects in the pipeline.
Click to expand...

I do all corrections on the OCR side. For e.g, you can do stuff like sending level 30+ on the OCR side instead of sending 00, 0A, etc to the master and getting the master
to decode it. Originally my design was like yours; OCR tells master what it sees, and the master has to do corrections. But in the end i found it was much easier for the master
to assume perfect input and the OCR to do correction, since if you needed an extra piece of information for correction, the OCR program could be easily modified to attain that information.
Some things that master used to do and then i pushed into OCR:
* Level past 30
* GameID / new game detection
* LineClear is occurring (haven't done yet, but will eventually)
* if Lineclear, how many are being cleared
* Gamestate (menu, rocket, etc etc)

Now, back to optimization of the OCR part: of course for the Pi Zero I'll drop all unnecessary digits and work with a 1-frame history to avoid unnecessary template matching calculations.
I can surely lower the matchTemplate function calls to around 10% of the current implementation.
Click to expand...

Atm I have lots of config options: 1 thread, pipelined split work, or multithreaded with non-shared information. I think once i actually work on it, a single-threaded pipelined approach
will be fast enough that multi-thread will go away. By having history/state inside your OCR for optimization, do you agree that you're now pushing cleanup partially into the OCR?
This is what i meant: at the beginning i just sent "what it saw" to the master and the master did the cleanup, and now it's moving more and more towards the OCR cleaning it up fully.

Some state things that help decrease work:
previous score -> lets you know which digits to scan
previous lines -> lets you know which digits to scan
Level -> you never have to scan again after the start
previous piece distribution -> you can correct 7 vs 1 template match errors, since you can't move from xx0 to xx7. In my case i scan top of field and Completely ignore scanning stats window.
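
The 7 vs 1 correction could be sketched like this, assuming the matcher returns candidate digits ranked best-first and that a given stat counter increases by at most one between scans. The function name is hypothetical.

```python
def correct_stat_digit(prev_digit, ranked_candidates):
    """Pick the best-ranked candidate that is a legal successor of prev_digit.

    A piece counter's last digit can only stay the same or go up by one
    (wrapping 9 -> 0), so anything else is treated as a misread.
    """
    allowed = {prev_digit, (prev_digit + 1) % 10}
    for digit in ranked_candidates:
        if digit in allowed:
            return digit
    # No plausible candidate: keep the previous value.
    return prev_digit
```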

Muf · 10 Apr 2020

XaeL said: ↑

These are the standard "EZCap" style capture cards. they are a lottery but most of them can do 60p; you get 30i and de-interlace to 60p.
Quite a few classic tetris members have that card, and are able to get 60p.

That is incorrect. The case moulding used is identical to the "EZCap" dongles, but the insides are very different. Basically, a cheap composite to BT656 decoder chip jerry rigged to a cheap USB webcam ASIC. These dongles report as an UVC (USB Video Class - like USB mass storage but for video) device which doesn't require any drivers and is very limited in functionality. There is some rudimentary deinterlacing and scaling into a fixed frame buffer (limitation of the webcam chip) and update rate is never more than 30fps. I chose these devices because of their extremely low cost ($5 on Aliexpress) combined with not needing any drivers, so being able to be used on ARM-based Linux systems.

You have to remember that for a competition like CTEC, you need at least 16 or more computer vision processing nodes (one for each qualifying station), which balloons costs. You need a Raspi ($35 if we can't get it optimised to run on a Zero, $5 if we do), power supply, capture dongle, etc. Then multiply that cost number by 16.

XaeL · 10 Apr 2020

Muf said: ↑

That is incorrect. The case moulding used is identical to the "EZCap" dongles, but the insides are very different. Basically, a cheap composite to BT656 decoder chip jerry rigged to a cheap USB webcam ASIC. These dongles report as an UVC (USB Video Class - like USB mass storage but for video) device which doesn't require any drivers and is very limited in functionality.
Click to expand...

Thanks for the clarification

There is some rudimentary deinterlacing and scaling into a fixed frame buffer (limitation of the webcam chip) and update rate is never more than 30fps.
Click to expand...

Are you able to disable deinterlacing? The ezcaps i've seen in the wild have some ability in windows at least to disable deinterlacing, then do deinterlacing in OpenCV/OBS/whatever. I see that i was mistaken in assuming you have an ezcap; i assumed you paid $12-15 for a real one. War of the clones, of course.
It might be a different story in linux. If deinterlacing can be disabled, then you could get 60p.
If not, then yes, i completely agree that 30p is the limit. I think a $5 30p vs a $17 30i -> 60p is a fair tradeoff in terms of cost.

I chose these devices because of their extremely low cost ($5 on Aliexpress) combined with not needing any drivers, so being able to be used on ARM-based Linux systems.
Click to expand...

You have to remember that for a competition like CTEC, you **need** at least 16 or more computer vision processing nodes (one for each qualifying station), which balloons costs. You need a Raspi ($35 if we can't get it optimised to run on a Zero, $5 if we do), power supply, capture dongle, etc. Then multiply that cost number by 16.
Click to expand...

Budget is king; and it's not like they streamed all qualifiers before.
Remember that your own time is effectively $$$ and you're not(?!) getting paid, so while having 16 streams is a great goal, having 4 nice streams with fewer headaches and less dev time is still better than 0. I too would pick the cheapest possible capture card that ends up de-interlaced, whether it be 20p or 30p; it's only qualifiers after all. You can use the nice cards for the nice matches.

K · 10 Apr 2020

Good hint about the custom rom, but we can't do that. I also use the same technique to get the color calibration (the J and Z stats pieces)

XaeL said: ↑

By having history/state inside your OCR for optimization, do you agree that now you're pushing cleanup partially into OCR?

No. I'm not doing any "clean up" in the OCR. I'm just avoiding unnecessary digit recognition and unnecessary matchTemplate function calls. All the game logic "clean up" is still up to the server.

Muf · 11 Apr 2020

XaeL said: ↑

Are you able to disable deinterlacing? The ezcaps i've seen in the wild have some ability in windows at least to disable deinterlacing

It's not an EZCap. The only things it has in common with an EZCap are its low price and the shell moulding (injection moulds are expensive - reusing someone else's mould is cheap). Have you ever seen a webcam with a "disable deinterlacing" option? Like I said, there is a fixed size frame buffer which does not match the actual capture resolution, it gets scaled, and it needs to be deinterlaced before it gets scaled or else it would become a mess of moire and blended fields. Since it's an UVC device, the only options available are what you would expect from a $5 webcam, so you can adjust the brightness and contrast, I believe.

XaeL said: ↑

Remember that your own time is effective $$$ and you're not(?!) getting paid, so while having 16 streams is a great goal, having 4 nice streams with less headaches and dev time is still better than 0.

The development time is a fixed up front cost (and as you said, we're not getting paid - so it is "free"), whereas the equipment cost is multiplied every time you want to get more equipment. Making the vision solution work with the absolute shit-tier bottom of the barrel hardware available, also means that it will work with basically anything.

XaeL · 12 Apr 2020

First i would like to apologize if my tone/manner appears like I am not reading your posts thoroughly. I just wanted to double check that it indeed didn't have de-interlacing as an easy setting.
My thoughts on why i wanted a double check are below.

Muf said: ↑

It's not an EZCap. The only things it has in common with an EZCap are its low price and the shell moulding (injection moulds are expensive - reusing someone else's mould is cheap).
Click to expand...

Sorry, i shall call it notezcap henceforth.

Have you ever seen a webcam with a "disable deinterlacing" option?
Click to expand...

No, but I have seen options like brightness, contrast, sharpness, resolution selection/mode selection which may apply to this case. My "UVC 2.0 camera" for my laptop has resolution and [flip camera] <-- look an easy bool,
whereas my previous UVC 2.0 camera did not have [flip camera]. So my assumption about UVC is that the device gives the OS a list of bools/sliders in a "universal" way, and the OS reads from a camera color buffer
in a "universal" way. Microsoft seems to back me up about my assumption on how UVC works: https://docs.microsoft.com/en-us/windows-hardware/drivers/stream/uvc-camera-implementation-guide, where it says UVC devices can report different capture modes

Where i'm coming from is "just because it's UVC doesn't mean it has no adjustability, maybe you forgot to check the UVC settings and there indeed is an easy way to double your performance"

Since it's an UVC device, the only options available are what you would expect from a $5 webcam, so you can adjust the brightness and contrast, I believe.
Click to expand...

Right, so this is why i keep asking; If you had opened the magic UVC menu, your "I believe" would say "it only has options x". Can you see where I'm coming from?

Like I said, there is a fixed size frame buffer which does not match the actual capture resolution, it gets scaled, and it needs to be deinterlaced before it gets scaled or else it would become a mess of moire and blended fields.
Click to expand...

Right, in a device that does scaling internally, you'd have to de-interlace before you scale. How did you come to know about the whole fixed size buffer thing? Is it by examining the physical chips?

My card, for example, sends 720x480 -> 176x120, but I have no idea what resolution it does its first internal capture at. It could actually capture at the resolution it sends at, or it could capture at 720x480 (its max res) and downscale internally.
For my particular card, since it doesn't de-interlace, I can check the scanline thickness in the output image (e.g. in its 120p mode) to determine whether changing resolution modes affects its internal "capture" resolution.
But if my card only outputted progressive like yours seems to, there'd be no way for me to know the internal methodology, since [720x480i -> deinterlaced -> scaled] to [160p vs 160i-> deinterlaced] would both look not completely messy and there'd be no comparison image to compare against.

The development time is a fixed up front cost (and as you said, we're not getting paid - so it is "free"), whereas the equipment cost is multiplied every time you want to get more equipment.
Click to expand...

Right; but if your plan is to eventually have silky smooth 19 (you're in pal region, where 1f is more common than NTSC 1f), it would be cheaper to buy cards capable of 50i than it would be to buy 25p cards and have to rebuy all your cards in the future,
and just support fewer screens and slowly accumulate more capacity over time. If the Pi is just too weak to do 50p no matter how much optimization you do, you'd want to know that now, rather than buying 20 notezcaps and 20 pi3+, then when people
really want silky 19, you have to sell the hardware at a loss, and keep the software.

The software development time for either hardware solution is roughly equivalent; but the decision to buy 4 nice setups now vs 18 cheap setups now has a direct impact on:
1) Setup time and teething during the actual event (18 things are harder to set up than 4 things). Do you want to be the unpaid guy stressing over 4 setups, or the unpaid guy stressing over 18 setups?
2) Crowd wows (people are gonna be more wowed by 18 x 25p captures than 4 x 50p captures; this may lead to more audience/funding so you can get 18 nice setups)
3) Marginal dev time. If you have a faster CPU then you don't have to optimize as much.

Now I'm all in for the glory; if it was me I would also go 18 setups, and my code and hardware would be immaculate and everything would be pre-plug and play,
and I would set up 4-5 capture setups every 5 minutes, and the hardware would never fail. And I'm sure you're the same. But if the project is getting too stressful,
and deadlines are looming, there's no shame in only having 4 or 8 or 10 setups.

Making the vision solution work with the absolute shit-tier bottom of the barrel hardware available, also means that it will work with basically anything.

Alright, different topic here, and this ties into the Raspberry Pi vs de-interlacing thing. If your Pi takes 20ms to process each frame and you have 30p, your execution window is 33ms and it would work perfectly.
But if you upgraded to something that *could* de-interlace, and the Pi is doing the de-interlacing, then with the added time to de-interlace (let's say 5ms), the exact same algorithm on the exact same Pi would take 25ms per frame, while the window is now only 16.66ms.
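
To make that arithmetic concrete, here is a minimal Python sketch. The 20ms and 5ms figures are just the hypothetical numbers from this post, not measurements, and the function names are made up for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(processing_ms: float, fps: float) -> bool:
    """True if a frame is fully processed before the next one arrives."""
    return processing_ms <= frame_budget_ms(fps)


# Hypothetical figures from the post: 20 ms of vision work, 5 ms to de-interlace.
VISION_MS = 20.0
DEINTERLACE_MS = 5.0

for label, fps, cost in [
    ("30p, no de-interlacing", 30, VISION_MS),
    ("60p, de-interlacing on the Pi", 60, VISION_MS + DEINTERLACE_MS),
]:
    verdict = "ok" if fits_budget(cost, fps) else "too slow"
    print(f"{label}: needs {cost:.0f} ms, has {frame_budget_ms(fps):.2f} ms -> {verdict}")
```

The point is that the budget halves at the same moment the per-frame cost goes up, so a solution with comfortable headroom at 30p can miss the window at 60p.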

Also, optimizations for 60p and 30p could be different. With 60p you can make certain assumptions, like being guaranteed a solid frame of the piece in its spawn state on the frame it spawns. This means you can determine piece statistics just by looking at 8 cells/pixels at the top of the field,
which is an optimization that would be different in 30p, where the piece might have been rotated in the missing frame, or gravity has shifted it, or both, so you'd need slower code to handle more cases; and the additional cases might make it slower than just OCR'ing the last digits of piece stats.

Now in your particular case, I don't think you care about piece history; so the algorithm would just check that all 8 cells change from blank to at least 2(?) of them being filled to signify a piece spawn. Then you can scan minimal digits from score/lines,
or even queue reading those minimal digits over several frames, since score/lines can only change once every ~15 frames. (I am talking about staggering in the case that the Raspberry Pi can't even approach scanning 3-4 digits per frame.)
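
A rough Python sketch of that idea, under some assumptions: the caller has already sampled the 8 spawn cells into booleans (True = filled), the threshold of 2 is the tentative "2(?)" from above, and the slot names and the `StaggeredReader` class are hypothetical. The actual OCR call is left out.

```python
from typing import List, Optional, Sequence

MIN_FILLED = 2  # tentative threshold, the "at least 2(?)" above


def piece_spawned(prev_cells: Sequence[bool], cur_cells: Sequence[bool]) -> bool:
    """Spawn = every watched cell was blank last frame and >= MIN_FILLED are filled now."""
    was_blank = not any(prev_cells)
    return was_blank and sum(cur_cells) >= MIN_FILLED


class StaggeredReader:
    """After a spawn, hands out one digit slot to OCR per frame instead of all at once."""

    def __init__(self, digit_slots: Sequence[str]) -> None:
        self.digit_slots: List[str] = list(digit_slots)
        self.pending: List[str] = []

    def on_frame(self, prev_cells: Sequence[bool], cur_cells: Sequence[bool]) -> Optional[str]:
        """Return the slot to OCR on this frame, or None if nothing is queued."""
        if piece_spawned(prev_cells, cur_cells):
            self.pending = list(self.digit_slots)  # restart the queue on every spawn
        return self.pending.pop(0) if self.pending else None


reader = StaggeredReader(["lines_ones", "score_tens", "score_ones"])
blank = [False] * 8
filled = [False, True, True, False, False, True, True, False]
for prev, cur in [(blank, blank), (blank, filled), (filled, filled), (filled, filled)]:
    print(reader.on_frame(prev, cur))
```

With three slots the reads finish three frames after the spawn, which is well inside the ~15 frame gap mentioned above.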

Alternatively, you could scan 1 digit from lines every frame and last two digits of score (3 digit ocr every frame vs 0 digit every frame). With this method you no longer care when pieces spawn.
You can't make this decision until you have the pi and can check performance. Using this method scales to 60p fine, assuming that your frame processing time was already fast enough.
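
As a sketch of this second approach (again in Python, with made-up names): OCR the three trailing digits every frame and only flag a change when they differ from the previous frame. What you do on a change (a full score read, sending a network update, etc.) is up to you and not shown; the digits are assumed to come from whatever OCR you already have.

```python
from typing import Optional, Tuple

# (ones digit of lines, tens digit of score, ones digit of score)
Digits = Tuple[int, int, int]


class TrailingDigitWatcher:
    """Flags frames where any of the cheap-to-read trailing digits changed."""

    def __init__(self) -> None:
        self.last: Optional[Digits] = None

    def changed(self, digits: Digits) -> bool:
        """True when the digits differ from the previous frame (never on the first frame)."""
        is_change = self.last is not None and digits != self.last
        self.last = digits
        return is_change


watcher = TrailingDigitWatcher()
for frame_digits in [(0, 0, 0), (0, 0, 0), (0, 1, 2), (4, 1, 2)]:
    print(frame_digits, watcher.changed(frame_digits))
```

Watching the lines digit as well as the score digits matters because a Tetris on its own adds a multiple of 100 to the score, so the last two score digits can stay the same across a clear.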

Since I don't have the perf. numbers on your rasp-pi OCR, I'm hopeful that your implementation is lightning fast so scalability to better cards isn't a concern.

But the main point is that just because it works for "shit tier bottom hardware" doesn't mean it will still work when you upgrade to 60p.

Muf · 12 Apr 2020

My uncertainty about the exact details about the interface stems from having bought these in 2017, and I haven't really looked at them since. I did a very thorough investigation right when I first got them, and yes, I confirmed how the device works by cracking apart the shell and looking up datasheets for the chips inside. UVC devices can expose custom options like you mention, but the UVC interface talks to the webcam chip, when in fact it's the BT565 decoder chip that is doing the (fixed) scaling. As far as I could tell there is no additional communication between the two chips (this would require more circuitry and increase the BOM), so the decoder chip is just hard configured to output a buffer format that the webcam chip accepts (640x480), and that's all you get. There isn't even a PAL/NTSC switch. My recordings are the raw MJPG stream output by the device, and although it's reported as 60fps (the highest option allowed by the device) it just sends frames as they come in from the video decoder, which ends up at 25fps for a PAL input with the remainder output as dropped (dummy) frames by the capture software.

There were about 3 or 4 other devices I considered before landing on this particular model of capture dongle; I originally intended to use a device that had 4 or 8 composite video inputs, but further investigation and inspection of the circuitry revealed that literally all of those devices have extremely low update rates (in the order of 1 frame per second) and actually alternate between inputs in a round robin fashion. These are ostensibly for video surveillance but with the horrible capture quality combined with the glacial update rate I don't believe it would be useful for any kind of real world security application.

I'm sorry if I came off as impatient in reply to your questions, and in fact I don't care if you call them EZCap or not (I call them "UVC dongle" fwiw), I just meant to drive the point home that any assumptions made about classic EZCap/EasyCap devices don't apply to this as it's a mostly unrelated device. Just yesterday I learned about the plurality of competing Chinese factories as I examined four electric bug zapper rackets that one might assume are all identical: in fact they all had different PCBs and varying components, even when the outward appearance seemed identical.

As for 25/30fps vs 30/60fps capture, the actual ladder competition will be captured and streamed at full rate using the same machine vision software running on PC, so any code common to both capture methods can be shared, while the Raspberry Pi software will be optimised the heaviest. There is also an optional stretch goal for the software to eventually be able to be used for online tournaments like CTP, which necessitates it being more or less universal. There it also helps to have a lowest common denominator hardware that works: it lowers the barrier of entry for players.

XaeL · 12 Apr 2020

Muf said: ↑

which ends up at 25fps for a PAL input with the remainder output as dropped (dummy) frames by the capture software.

That's a little annoying since you have to spend precious cycles determining dummy frames. But at least it looks like it would work with NTSC too.

There were about 3 or 4 other devices I considered before landing on this particular model of capture dongle; I originally intended to use a device that had 4 or 8 composite video inputs, but further investigation and inspection of the circuitry revealed that literally all of those devices have extremely low update rates (in the order of 1 frame per second) and actually alternate between inputs in a round robin fashion. These are ostensibly for video surveillance but with the horrible capture quality combined with the glacial update rate I don't believe it would be useful for any kind of real world security application.

Imagine an enterprise-grade problem like security being solved with 1fps cameras :kappa:

I'm sorry if I came off as impatient in reply to your questions, and in fact I don't care if you call them EZCap or not (I call them "UVC dongle" fwiw), I just meant to drive the point home that any assumptions made about classic EZCap/EasyCap devices don't apply to this as it's a mostly unrelated device. Just yesterday I learned about the plurality of competing Chinese factories as I examined four electric bug zapper rackets that one might assume are all identical: in fact they all had different PCBs and varying components, even when the outward appearance seemed identical.

Yeah, my assumption about Chinese electronics was that if there were 20 different vendors, there were just 2-3 factories and 18 resellers. Like you said, it's obviously impossible to find out without buying them and cracking them open, or at least buying a sample first and hoping they don't change their processes in the 2 months of shipping time.

As for 25/30fps vs 30/60fps capture, the actual ladder competition will be captured and streamed at full rate using the same machine vision software running on PC, so any code common to both capture methods can be shared, while the Raspberry Pi software will be optimised the heaviest.

Right; I was more talking about if you wanted 50/60fps for the 18 player viewer eventually. Whatever solution you had last year for the matches was probably 50Hz anyway, and needn't change. You already fully developed it, so while it's ideal that they share the same codebase, if the 2 player capture method already works and is easy enough to set up then using the same codebase is less of a priority. I'm assuming they already use the same codebase but the 2p one is just an older version written last year, and hence the cost of updating is 0 anyway. Did you guys end up using a Raspberry Pi or a PC last year for OCR? I read Raspberry Pi but I might be mistaken.

There is also an optional stretch goal for the software to eventually be able to be used for online tournaments like CTP, which necessitates it being more or less universal. There it also helps to have a lowest common denominator hardware that works: it lowers the barrier of entry for players.

The main problem with online tournaments is getting the players to use the software and making onboarding as easy as possible.
I've tried my best on Nestris99, and around 20 players have used it successfully. I think the "best" case was CTM staff doing live qualifiers using it.
But you still get things like what you saw at the top, e.g. people not de-interlacing.

One way I've seen some people do it is that players restream as normal (to Twitch) and the host does the OCR on the streams. This is way easier for participants but also way harder for the host, since the host has to deal with stuff like 400kbps images that blur every time a tetris flash occurs.

Muf · 13 Apr 2020

XaeL said: ↑

That's a little annoying since you have to spend precious cycles determining dummy frames. But at least it looks like it would work with NTSC too.

Nah, the dummy frames are inserted by the capture tool (in this case VirtualDub). DirectShow is VFR; that is to say, frames come in whenever and have a timestamp attached. It's up to the capture application to deal with that if it wants to write it to a CFR file. I'm assuming video4linux works the same way.
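
For illustration, here is one way a capture application could map timestamped VFR frames onto fixed-rate slots, written as a minimal Python sketch. This is an assumption about the general technique, not a description of what VirtualDub actually does internally, and `vfr_to_cfr` is a made-up name.

```python
from typing import List, Tuple


def vfr_to_cfr(timestamps: List[float], out_fps: float) -> List[Tuple[int, bool]]:
    """Map variable-rate frames onto fixed-rate output slots.

    timestamps: arrival time of each source frame in seconds, ascending.
    Returns one (source_index, is_dummy) pair per output slot. A slot is a
    dummy when no new source frame arrived since the previous slot, so the
    previous frame is repeated.
    """
    if not timestamps:
        return []
    slot = 1.0 / out_fps
    start = timestamps[0]
    n_slots = int(round((timestamps[-1] - start) / slot)) + 1
    out: List[Tuple[int, bool]] = []
    src = 0
    last_used = -1
    for i in range(n_slots):
        t = start + i * slot
        # Advance to the newest source frame at or before this slot's time.
        # If several arrive within one slot, only the newest is kept.
        while src + 1 < len(timestamps) and timestamps[src + 1] <= t + 1e-9:
            src += 1
        out.append((src, src == last_used))
        last_used = src
    return out


# One second of 25fps PAL input written into a file declared as 60fps.
pal = [i / 25 for i in range(25)]
slots = vfr_to_cfr(pal, 60)
print(len(slots), "slots,", sum(dummy for _, dummy in slots), "dummies")
```

A vision program reading the device directly gets only the real frames with their timestamps, so it has no dummies to detect.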

XaeL said: ↑

Right; I was more talking about if you wanted 50/60fps for the 18 player viewer eventually. Whatever solution you had last year for the matches was probably 50Hz anyway, and needn't change. You already fully developed it, so while it's ideal that they share the same codebase, if the 2 player capture method already works and is easy enough to set up then using the same codebase is less of a priority. I'm assuming they already use the same codebase but the 2p one is just an older version written last year, and hence the cost of updating is 0 anyway. Did you guys end up using a Raspberry Pi or a PC last year for OCR? I read Raspberry Pi but I might be mistaken.

Last year's solution was written relatively hastily before the event and the OCR is pretty fragile. It requires pixel perfect RGB capture and low noise or else it breaks. Goals for this year are to have a unified architecture where both the qualification and the ladders use the same robust computer vision solution, slightly adapted to different hardware.

XaeL · 13 Apr 2020

Ahh ok thanks for clarification.
Yeah, pixel-perfect RGB is a great assumption for reducing CPU usage, but unfortunately with cheaper hardware the software needs to be both robust (i.e. take more cycles) and fast (because peasant CPU). A tough situation to be in for sure; hopefully it works out!

i've got a Pi3 that i'm gonna be testing on eventually anyway

K · 21 Apr 2020

Just a small update on my side. The rasp0 is not powerful enough even to grab each frame from the UVC capture card.
Even without the GUI, the rasp0 only grabs a frame every ~72ms.
Considering that 30fps delivers a frame every ~33ms, this platform is not viable. And I wasn't even doing any calculation at this point...
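
For anyone wanting to repeat this kind of check on other hardware, a minimal Python sketch of the measurement could look like this. The function names are made up; `grab` is any callable that blocks until a frame has been fetched, and the OpenCV lines in the comment are an assumed setup, not the exact code used here.

```python
import time
from typing import Callable


def mean_grab_interval_ms(grab: Callable[[], object], n_frames: int = 100) -> float:
    """Average wall-clock time per call to grab(), in milliseconds."""
    start = time.perf_counter()
    for _ in range(n_frames):
        grab()
    return (time.perf_counter() - start) * 1000.0 / n_frames


def keeps_up(interval_ms: float, fps: float) -> bool:
    """True if frames are grabbed at least as fast as the source delivers them."""
    return interval_ms <= 1000.0 / fps


# With OpenCV one could measure a real device like this (device index 0 is a guess):
#   cap = cv2.VideoCapture(0)
#   print(mean_grab_interval_ms(cap.read))
print(keeps_up(72.0, 30))  # the ~72ms figure reported above -> False
```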
