Prerequisites

Before reading this article, you should be familiar with at least the following:
- Basic CPU cache concepts; see "CPU Cache: What Programmers Need to Know".
- Basic NUMA concepts; see the introductory post on my blog.
- How Linux currently schedules threads based on how busy each core of a multi-core CPU is; see the paper "Chip Multi Processing aware Linux Kernel Scheduler".
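1. About Huge Pages

Before starting the analysis, let's briefly cover the background and typical use cases of Huge Pages.

If you know the rough architecture of CPU caches, you have surely heard of the TLB. On Linux, the memory addresses a program can see and use are virtual addresses, and every program's addresses start at 0. Actual data access, however, goes through physical addresses. On every memory operation the CPU must therefore translate the virtual address into the corresponding physical address using the page table. For memory-intensive programs, these page table lookups become a bottleneck.

This is why modern CPUs have a TLB (Translation Lookaside Buffer), a cache that holds the mappings for a small number of hot addresses. Manufacturing cost and process limits apply to any cache whose response time must stay within a few CPU cycles, so such a cache can hold only a few dozen entries. When a program has to translate the virtual addresses of a large amount of hot data, the TLB quickly falls short.

Here is the arithmetic. With the standard Linux page size of 4K, a TLB that caches 64 entries covers only 4K * 64 = 256K of hot data, which is far from ideal. This is the problem Huge Pages were created to solve.

Tip: do not confuse virtual addresses here with virtual memory on Windows. The latter is a technique for coping with insufficient physical memory by swapping memory contents out to another device (similar to swap on Linux).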
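The arithmetic above is easy to check in a few lines of Python. This is a minimal sketch under one assumption: a single-level TLB with a fixed number of entries, each mapping exactly one page. Real CPUs have several TLB levels and often separate entries for huge pages, so treat the numbers as illustrative. The function names `tlb_reach` and `fits_in_tlb` are hypothetical helpers.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory whose translations the TLB can hold at once."""
    return entries * page_size


def fits_in_tlb(hot_bytes: int, entries: int, page_size: int) -> bool:
    """True if a contiguous, page-aligned hot set needs no more pages than
    the TLB has entries (simplified single-level TLB model)."""
    pages_needed = -(-hot_bytes // page_size)  # ceiling division
    return pages_needed <= entries


print(tlb_reach(64, 4 * KIB) // KIB, "KiB")  # 256 KiB
```

Passing a 16 MiB page size instead of 4 KiB multiplies the reach by 4096 without changing the entry count. That is exactly the lever Huge Pages pull.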
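Since the TLB's capacity cannot be changed, the only option is to increase, at the system level, the amount of physical memory each TLB entry maps. That raises the amount of hot data the TLB can cover. If we increase the Linux page size to 16M, the same 64-entry TLB covers 64 * 16M = 1G of hot data, which is far more practical than the 256K above. This technique of enlarging the page size is what Huge Pages are.

2. Are Huge Pages a cure-all?

Now that we know where Huge Pages come from and how they work, it is easy to see which programs can benefit from them. They must be programs whose hot data is scattered across more than 64 4K pages. Moreover, a program may not spend most of its running time on page table lookups after TLB misses. In that case, no matter how large the TLB or the page size becomes, it will not help. An introductory LWN article explains this principle and gives a fairly detailed estimation method. In short, you first use oprofile to measure what share of the program's total running time is caused by TLB misses. From that share you compute the performance gain you can expect from Huge Pages.

Put simply, suppose a program's hot data is only 256K and sits in contiguous memory pages. Then a 64-entry TLB is enough. At this point you may wonder about the opposite case. It is hard to predict whether a program's access pattern will benefit from Huge Pages. Huge Pages also seem to change nothing but the page size, so they should cost nothing. Why not simply use Huge Pages for every program?

That idea is completely wrong, and explaining why is one of the main goals of this article. On today's common NUMA architectures, Huge Pages are not a master key. Used inappropriately, they can even cut the performance of a program or database by as much as 10%. Let's analyze this in detail.

3. Huge Pages on NUMA

The authors of "Large Pages May Be Harmful on NUMA Systems" ran an experiment measuring how Huge Pages affect performance across a range of application scenarios in a NUMA environment. As the figure below shows, for a substantial share of these scenarios Huge Pages do not improve performance much. In some they even cause a performance loss of up to 10%.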
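There are two main reasons for the slowdown.

More CPU contention for the same page
For write-intensive applications, Huge Pages greatly increase the probability of cache write conflicts. Coherence between the CPUs' private caches is maintained with the MESI protocol, so a write conflict means the copies held in other CPUs' caches must be invalidated. This adds traffic on the CPU interconnect and causes cache misses on those CPUs. The database analogy is a lock that used to protect 10 rows and now protects 1,000 rows: contention for that lock among threads is bound to rise sharply.

Contiguous data must be read across CPUs (False Sharing)
As the figure below shows, consider data that could be allocated contiguously on 4K pages and that achieved locality on one CPU thanks to a high hit rate. With Huge Pages, part of that data is used to fill the space left over from the same program's previous allocation, so it ends up split across two pages. The portion that forms only a small share of its Huge Page carries little weight when CPU affinity is computed, so it naturally ends up attached to another CPU. As a result, data that should live as hot data in CPU2's L1 or L2 cache has to be fetched from a remote CPU over the CPU interconnect.

Suppose we declare two arrays in a row, Array A and Array B, each 1536K in size. After A is allocated, the first 2M page is not full, so Array B is split into two parts stored in two different pages. Because of the memory affinity settings, one of these pages is allocated in Zone 0 and the other in Zone 1. Whenever a thread accesses Array B, it has to fetch the other part of the data over the more expensive interconnect.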
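The Array A / Array B example can be reproduced with a short layout calculation. This sketch assumes the arrays are placed back to back from a page-aligned address with no allocator metadata or padding, which real allocators do not guarantee. It shows only which pages each array lands in, not which NUMA zone the kernel puts each page on. `page_layout` and `shared_pages` are hypothetical helper names.

```python
from collections import defaultdict

KIB = 1024
MIB = 1024 * KIB


def page_layout(sizes: list[int], page_size: int) -> list[dict[int, int]]:
    """Place arrays back to back starting at address 0 (assumed page-aligned,
    no padding) and return, for each array, {page_number: bytes_in_that_page}."""
    layout = []
    addr = 0
    for size in sizes:
        end = addr + size
        per_page = {}
        while addr < end:
            page = addr // page_size
            chunk_end = min((page + 1) * page_size, end)
            per_page[page] = chunk_end - addr
            addr = chunk_end
        layout.append(per_page)
    return layout


def shared_pages(layout: list[dict[int, int]]) -> list[int]:
    """Pages that contain bytes from more than one array."""
    owners = defaultdict(set)
    for idx, pages in enumerate(layout):
        for page in pages:
            owners[page].add(idx)
    return sorted(page for page, who in owners.items() if len(who) > 1)


sizes = [1536 * KIB, 1536 * KIB]                 # Array A, Array B
huge = page_layout(sizes, 2 * MIB)
print(huge[1])                                    # {0: 524288, 1: 1048576}
print(shared_pages(huge))                         # [0]
print(shared_pages(page_layout(sizes, 4 * KIB)))  # []
```

With 2M pages, page 0 holds all 1536K of A but only 512K of B, so B is the minority occupant of that page, which is the situation described above. With 4K pages the same two arrays share no page at all.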
"...delays resulting from traversing a greater physical distance to reach a remote node, are not the most important source of performance overhead. On the other hand, congestion on interconnect links and in memory controllers, which results from high volume of data flowing across the system, can dramatically hurt performance. Under interleaving, the memory latency reduces by a factor of 2.48 for Streamcluster and 1.39 for PCA. This effect is entirely responsible for performance improvement under the better policy. The question is, what is responsible for memory latency improvements? It turns out that interleaving dramatically reduces memory controller and interconnect congestion by alleviating the load imbalance and mitigating traffic hotspots."

4. Countermeasures

The ideal
Let's start with the ideal case. The main purpose of the paper mentioned above, "Large Pages May Be Harmful on NUMA Systems", is to present an automatic Huge Page memory management policy for NUMA architectures. Put simply, the policy is a Huge Page-aware variant of Carrefour. (Readers unfamiliar with Carrefour can see my earlier introductory blog post or read the original paper.)

Here is a brief summary of the techniques involved:
- To reduce cross-NUMA-zone accesses to read-only hot data, pages with a very high read/write ratio can be replicated, with a copy placed in the local memory of each NUMA zone to reduce response time.
- To reduce False Sharing, the pages that cause large numbers of cache misses are monitored, then split and regrouped so that data with the same CPU affinity is placed in the same page.
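The two techniques above can be sketched as a per-page decision rule. This is a toy illustration, not Carrefour or the paper's actual algorithm. `PageStats`, `choose_action` and the threshold of 10 are all hypothetical. Using "accessed from several nodes with a low read/write ratio" as the trigger for splitting is also a simplification of "pages that cause many cache misses".

```python
from dataclasses import dataclass, field


@dataclass
class PageStats:
    """Hypothetical per-page counters; gathering them precisely is costly
    without hardware PMU support."""
    reads: int
    writes: int
    nodes: set[int] = field(default_factory=set)  # NUMA nodes that accessed the page


def choose_action(stats: PageStats, rw_ratio_threshold: float = 10.0) -> str:
    """Toy placement decision in the spirit of replication and splitting.

    rw_ratio_threshold is an arbitrary illustrative value, not from the paper.
    """
    if len(stats.nodes) <= 1:
        return "keep"        # used by a single node: no cross-node traffic to fix
    if stats.writes == 0 or stats.reads / stats.writes >= rw_ratio_threshold:
        return "replicate"   # read-mostly and shared: copy into each node's local memory
    return "split"           # written from several nodes: split the huge page and regroup


print(choose_action(PageStats(reads=5000, writes=0, nodes={0, 1})))   # replicate
print(choose_action(PageStats(reads=300, writes=200, nodes={0, 1})))  # split
print(choose_action(PageStats(reads=300, writes=200, nodes={1})))     # keep
```

Replication trades memory for locality, so it only pays off for pages that are rarely written; every write to a replicated page would have to update all copies.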
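The reality
So much for the ideal; now let's look at reality, which is often harsh. Without hardware-level PMU (Performance Monitoring Unit) support, collecting precise page access and cache miss information is very expensive. The ideal above therefore remains at the experiment and paper stage. What should we do until it becomes practical?

There is only one answer: test.

Real test results are the most convincing. Real-world testing means putting the system being optimized under load that simulates the production environment. You then compare performance with Huge Pages enabled and disabled to verify whether Huge Pages actually help. For most applications, however, simulating production behavior is very hard. In that case we can fall back on the following theoretical test.

A theoretical test uses profiling to estimate the potential gain from Huge Pages. The idea is to compute the share of the application's total execution time spent on page walks caused by TLB misses. This method ignores the two kinds of performance loss described above, so it only gives an upper bound on the potential gain. If the computed value is very low, there is little to gain, and you can expect Huge Pages to cause a net performance loss. For the details, see the method described on LWN. The exact formula is shown in the figure below: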
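The ratio described above can be written as a small function. This is a minimal sketch, not the LWN article's exact formula. It assumes three measured inputs: the number of TLB misses and the total cycle count (for example from oprofile), and an average page-walk cost per miss (which, without PMU support, the article suggests measuring with calibrator). `best_case_speedup` is an added Amdahl-style framing. It assumes Huge Pages would remove all page-walk time, which is the upper bound mentioned above. The counter values in the example are hypothetical, not measurements.

```python
def tlb_miss_overhead(tlb_misses: int, cycles_per_miss: float, total_cycles: int) -> float:
    """Share of total execution time spent on page walks caused by TLB misses:

        overhead = (tlb_misses * cycles_per_miss) / total_cycles
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return tlb_misses * cycles_per_miss / total_cycles


def best_case_speedup(overhead: float) -> float:
    """Upper bound on speedup if huge pages removed ALL page-walk time.

    Ignores the NUMA costs discussed above, so real gains can only be lower.
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    return 1.0 / (1.0 - overhead)


# Hypothetical counter values, for illustration only.
overhead = tlb_miss_overhead(tlb_misses=2_000_000, cycles_per_miss=30, total_cycles=1_000_000_000)
print(f"page-walk share: {overhead:.1%}")                        # page-walk share: 6.0%
print(f"best-case speedup: {best_case_speedup(overhead):.3f}x")  # best-case speedup: 1.064x
```

A result of a few percent, as here, is the ceiling. Once the contention and false-sharing costs described earlier are included, the real effect can easily turn negative.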
Without hardware PMU support, the calculation requires oprofile and calibrator.

5. Summary

Not every optimization comes with zero performance cost. Thorough testing and an understanding of how an optimization works are prerequisites for optimizing successfully.

6. References
- Huge pages part 5: A deeper look at TLBs and costs (LWN)
- About Huge Page
- TLB (Wikipedia)
- Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems
- Large Pages May Be Harmful on NUMA Systems
About the author
卢钧轶 (Lu Junyi), original-content expert of the DBAplus community. He currently works on the Facebook MySQL Infra Team, where he is mainly responsible for operating large-scale MySQL databases. He has experience and his own insights in failover, backup, monitoring, optimization, and private database clouds. He previously worked at BesTV and Dianping, and has given talks on related topics at the Alibaba Carnival and the China Database Conference. Personal blog: http://cenalulu.github.io/