The trend seems to be going towards multiple decoder complexes. Recent designs from AMD and Intel do this.
It makes sense to me: if the distance between branches is small, a 10-wide decode may be wasted anyway. Better to decode multiple basic blocks in parallel
It makes sense to me: if the distance between branches is small, a 10-wide decode may be wasted anyway. Better to decode multiple basic blocks in parallel